public inbox for linux-ia64@vger.kernel.org
* [PATCH] kexec on ia64
From: Khalid Aziz @ 2004-11-15 20:32 UTC (permalink / raw)
  To: linux-ia64

[-- Attachment #1: Type: text/plain, Size: 2763 bytes --]

I have been able to get kexec working on ia64. I am attaching the kernel
patch and the kexec-tools patch. For the kernel patch, start with the
2.6.8 kernel from kernel.org, apply the ia64 patch
<http://www.kernel.org/pub/linux/kernel/ports/ia64/v2.6/linux-2.6.8-ia64-040901.diff.bz2>,
apply Eric's 2.6.8.1-kexec3 patch
<http://www.xmission.com/~ebiederm/files/kexec/2.6.8.1-kexec3>, and then
apply the attached 2.6.8.1-kexec3-ia64.diff patch. For kexec-tools, apply
the attached kexec-tools-1.98-ia64.diff patch to Eric's kexec-tools 1.98
sources <http://www.xmission.com/~ebiederm/files/kexec/kexec-tools-1.98.tgz>.

At this point, I have done minimal testing. Here is what I know does not
work currently:

1. No support for initrd for kexec'd kernel

2. No support for new kernel parameters for kexec'd kernel.

3. If a kernel is booted up with "mem=" or "max_addr=" to restrict the
amount of memory, a kernel kexec'd from this kernel will only see the
same amount of memory as this one. This is not only due to the new
kernel being kexec'd with the same parameter, but also because the EFI
memory map as passed to the kernel by ELILO gets trimmed very early on
by the kernel. I have tried adding code to save the memory map early on
and then pass this saved memory map to the kexec'd kernel, but apparently
I am still not saving it early enough. I wait until the bootmem allocator
has been initialized so I can allocate memory to save the unmolested EFI
memory map in. In the process of initializing the bootmem allocator, the
kernel calls efi_memmap_walk(), which in turn trims the memory map. I am
looking into allocating memory out of the EFI memory map before the first
efi_memmap_walk() happens, so I can save a pristine EFI memmap for use
later by kexec.
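
To illustrate the ordering problem in item 3, here is a minimal user-space
sketch of "save first, trim later". It is not what the attached patch does:
the patch saves the map with alloc_bootmem() after the first walk, and the
plan above is to carve memory out of the EFI map itself. The sketch instead
assumes a fixed-size static buffer, which needs no allocator at all. The
names struct mem_desc, SAVED_MAP_MAX, save_pristine_map(), trim_map() and
restore_pristine_map() are all hypothetical stand-ins, not kernel symbols.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an EFI memory descriptor. */
struct mem_desc {
	uint64_t phys_addr;
	uint64_t num_pages;
};

#define SAVED_MAP_MAX 64	/* assumed upper bound, not taken from the patch */

static struct mem_desc saved_map[SAVED_MAP_MAX];
static size_t saved_count;

/* Copy the map into static storage before anything trims it.
 * Needs no allocator. Returns 0 on success, -1 if the map is too big. */
int save_pristine_map(const struct mem_desc *map, size_t count)
{
	if (count > SAVED_MAP_MAX)
		return -1;
	memcpy(saved_map, map, count * sizeof(*map));
	saved_count = count;
	return 0;
}

/* Stand-in for the trimming done during the walk: drop or shorten every
 * range that reaches max_addr or beyond. Returns the number of entries kept. */
size_t trim_map(struct mem_desc *map, size_t count,
		uint64_t max_addr, unsigned int page_shift)
{
	size_t i, kept = 0;

	for (i = 0; i < count; i++) {
		uint64_t end = map[i].phys_addr +
			       (map[i].num_pages << page_shift);

		if (map[i].phys_addr >= max_addr)
			continue;	/* entirely above the limit */
		if (end > max_addr)
			map[i].num_pages =
				(max_addr - map[i].phys_addr) >> page_shift;
		map[kept++] = map[i];
	}
	return kept;
}

/* Put the untrimmed copy back, e.g. just before handing over to the
 * next kernel. Returns the number of descriptors restored, 0 on failure. */
size_t restore_pristine_map(struct mem_desc *map, size_t capacity)
{
	if (saved_count > capacity)
		return 0;
	memcpy(map, saved_map, saved_count * sizeof(*map));
	return saved_count;
}
```

The point of the sketch is only the ordering: the copy has to be taken
before the first trimming pass runs, whatever storage ends up holding it.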

Here is what I have not tested yet:

1. I am not sure if the ACPI subsystem is happy in the kexec'd kernel. I
have not seen any problems, but I have not tested it enough either.

2. Stability of kexec'd kernel over long term. It ran fine for an hour
not doing much :)

Here is what I am working on next:

1. Save EFI memory map before it is trimmed.

2. Fix up /proc/iomem on ia64 so we can enable validating memory range
in kexec tools.

3. Add a /proc interface to enable reboots on panic and INIT (and
possibly MCA) to be kexec reboots.

4. Add initrd support.
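
For item 2 above, this is roughly the kind of check kexec-tools could make
once /proc/iomem is usable on ia64. It is a sketch only, assuming the file
uses the "start-end : name" line format seen on other architectures and
that usable memory is labelled "System RAM". The names struct mem_range,
parse_iomem_line() and segment_in_ranges() are made up for illustration and
are not the kexec-tools functions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical range type; end is inclusive, as in /proc/iomem. */
struct mem_range {
	unsigned long long start;
	unsigned long long end;
};

/* Parse one line such as "00100000-03ffffff : System RAM".
 * Returns 1 and fills *out if the line describes System RAM, else 0. */
int parse_iomem_line(const char *line, struct mem_range *out)
{
	unsigned long long start, end;
	int consumed = 0;

	if (sscanf(line, "%llx-%llx : %n", &start, &end, &consumed) != 2)
		return 0;
	if (consumed == 0)	/* the " : " separator was missing */
		return 0;
	if (strncmp(line + consumed, "System RAM", 10) != 0)
		return 0;
	out->start = start;
	out->end = end;
	return 1;
}

/* Return 1 if [mem, mem + memsz) fits entirely inside one range. */
int segment_in_ranges(const struct mem_range *r, int n,
		      unsigned long long mem, unsigned long long memsz)
{
	unsigned long long last;
	int i;

	if (memsz == 0)
		return 1;
	last = mem + memsz - 1;
	if (last < mem)		/* address wrapped around */
		return 0;
	for (i = 0; i < n; i++)
		if (mem >= r[i].start && last <= r[i].end)
			return 1;
	return 0;
}
```

A caller would read /proc/iomem line by line with fgets(), collect the
ranges that parse_iomem_line() accepts, and reject any segment for which
segment_in_ranges() returns 0.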

Any feedback on these patches is welcome. Any patch to fix problems in
these patches is very much appreciated :)

-- 
Khalid

====================================================================
Khalid Aziz                                Linux and Open Source Lab
(970)898-9214                                        Hewlett-Packard
khalid_aziz@hp.com                                  Fort Collins, CO

"The Linux kernel is subject to relentless development" 
				- Alessandro Rubini


[-- Attachment #2: 2.6.8.1-kexec3-ia64.diff --]
[-- Type: text/x-patch, Size: 19533 bytes --]

diff -urN linux-2.6.8/arch/ia64/Kconfig linux-2.6.8-ia64/arch/ia64/Kconfig
--- linux-2.6.8/arch/ia64/Kconfig	2004-08-13 23:38:04.000000000 -0600
+++ linux-2.6.8-ia64/arch/ia64/Kconfig	2004-11-12 09:32:23.000000000 -0700
@@ -278,6 +278,23 @@
 	  little bigger and slows down execution a bit, but it is generally
 	  a good idea to turn this on.  If you're unsure, say Y.
 
+config KEXEC
+	bool "kexec system call (EXPERIMENTAL)"
+	depends on EXPERIMENTAL
+	help
+         kexec is a system call that implements the ability to shut down your
+         current kernel, and to start another kernel.  It is like a reboot
+         but it is independent of the system firmware.   And like a reboot
+         you can start any kernel with it, not just Linux.
+
+         The name comes from the similarity to the exec system call.
+
+         It is an ongoing process to be certain the hardware in a machine
+         is properly shut down, so do not be surprised if this code does not
+         initially work for you.  It may help to enable device hotplugging
+         support.  As of this writing the exact hardware interface is
+         strongly in flux, so no good recommendation can be made.
+
 config IA64_PALINFO
 	tristate "/proc/pal support"
 	help
diff -urN linux-2.6.8/arch/ia64/kernel/Makefile linux-2.6.8-ia64/arch/ia64/kernel/Makefile
--- linux-2.6.8/arch/ia64/kernel/Makefile	2004-08-13 23:38:09.000000000 -0600
+++ linux-2.6.8-ia64/arch/ia64/kernel/Makefile	2004-11-12 09:32:23.000000000 -0700
@@ -17,6 +17,7 @@
 obj-$(CONFIG_SMP)		+= smp.o smpboot.o
 obj-$(CONFIG_PERFMON)		+= perfmon_default_smpl.o
 obj-$(CONFIG_IA64_CYCLONE)	+= cyclone.o
+obj-$(CONFIG_KEXEC)		+= machine_kexec.o relocate_kernel.o
 
 # The gate DSO image is built using a special linker script.
 targets += gate.so gate-syms.o
diff -urN linux-2.6.8/arch/ia64/kernel/efi.c linux-2.6.8-ia64/arch/ia64/kernel/efi.c
--- linux-2.6.8/arch/ia64/kernel/efi.c	2004-08-13 23:36:13.000000000 -0600
+++ linux-2.6.8-ia64/arch/ia64/kernel/efi.c	2004-11-15 11:05:39.000000000 -0700
@@ -17,6 +17,9 @@
  *
  * Goutham Rao: <goutham.rao@intel.com>
  *	Skip non-WB memory and ignore empty memory ranges.
+ *
+ * Nov 12, 2004: Added initial support for kexec 
+ * 				- Khalid Aziz <khalid.aziz@hp.com>
  */
 #include <linux/config.h>
 #include <linux/module.h>
@@ -37,6 +40,10 @@
 extern efi_status_t efi_call_phys (void *, ...);
 
 struct efi efi;
+#ifdef CONFIG_KEXEC
+unsigned long kexec_reboot = 0;
+unsigned long saved_efi_memmap_size;
+#endif
 EXPORT_SYMBOL(efi);
 static efi_runtime_services_t *runtime;
 static unsigned long mem_limit = ~0UL, max_addr = ~0UL;
@@ -464,6 +471,9 @@
 		 * Cannot write to CRx with PSR.ic=1
 		 */
 		psr = ia64_clear_ic();
+#ifdef CONFIG_KEXEC
+		ia64_ptr(0x01, vaddr & mask, IA64_GRANULE_SHIFT);
+#endif
 		ia64_itr(0x1, IA64_TR_PALCODE, vaddr & mask,
 			 pte_val(pfn_pte(md->phys_addr >> PAGE_SHIFT, PAGE_KERNEL)),
 			 IA64_GRANULE_SHIFT);
@@ -503,6 +513,14 @@
 			if (end != cp)
 				break;
 			cp = end;
+#ifdef CONFIG_KEXEC
+		} else if (memcmp(cp, "kexec_reboot", 12) == 0) {
+			cp += 12;
+			kexec_reboot = 1;
+			if (end != cp)
+				break;
+			cp = end;
+#endif
 		} else {
 			while (*cp != ' ' && *cp)
 				++cp;
@@ -595,6 +613,9 @@
 	}
 #endif
 
+#ifdef CONFIG_KEXEC
+	saved_efi_memmap_size = ia64_boot_param->efi_memmap_size;
+#endif
 	efi_map_pal_code();
 	efi_enter_virtual_mode();
 }
@@ -647,10 +668,16 @@
 		}
 	}
 
-	status = efi_call_phys(__va(runtime->set_virtual_address_map),
+#ifdef CONFIG_KEXEC
+	if (kexec_reboot) {
+		printk(KERN_INFO "kexec'd kernel: Not virtualizing EFI\n");
+		status = EFI_SUCCESS;
+	} else
+#endif
+		status = efi_call_phys(__va(runtime->set_virtual_address_map),
 			       ia64_boot_param->efi_memmap_size,
 			       efi_desc_size, ia64_boot_param->efi_memdesc_version,
 			       ia64_boot_param->efi_memmap);
 	if (status != EFI_SUCCESS) {
 		printk(KERN_WARNING "warning: unable to switch EFI into virtual mode "
 		       "(status=%lu)\n", status);
diff -urN linux-2.6.8/arch/ia64/kernel/entry.S linux-2.6.8-ia64/arch/ia64/kernel/entry.S
--- linux-2.6.8/arch/ia64/kernel/entry.S	2004-08-13 23:36:32.000000000 -0600
+++ linux-2.6.8-ia64/arch/ia64/kernel/entry.S	2004-11-12 09:32:23.000000000 -0700
@@ -1525,7 +1525,7 @@
 	data8 sys_mq_timedreceive		// 1265
 	data8 sys_mq_notify
 	data8 sys_mq_getsetattr
-	data8 sys_ni_syscall			// reserved for kexec_load
+	data8 sys_kexec_load
 	data8 sys_ni_syscall
 	data8 sys_ni_syscall			// 1270
 	data8 sys_ni_syscall
diff -urN linux-2.6.8/arch/ia64/kernel/machine_kexec.c linux-2.6.8-ia64/arch/ia64/kernel/machine_kexec.c
--- linux-2.6.8/arch/ia64/kernel/machine_kexec.c	1969-12-31 17:00:00.000000000 -0700
+++ linux-2.6.8-ia64/arch/ia64/kernel/machine_kexec.c	2004-11-12 09:45:19.000000000 -0700
@@ -0,0 +1,179 @@
+/*
+ * machine_kexec.c - handle transition of Linux booting another kernel
+ * Copyright (C) 2004 Khalid Aziz <khalid.aziz@hp.com>
+ * Copyright (C) 2004 Hewlett Packard Development Co
+ *
+ * This source code is licensed under the GNU General Public License,
+ * Version 2.  See the file COPYING for more details.
+ */
+
+#include <linux/kernel.h>
+#include <linux/config.h>
+#include <linux/mm.h>
+#include <linux/kexec.h>
+#include <asm/mmu_context.h>
+#include <asm/setup.h>
+#include <asm/mca.h>
+#include <asm/page.h>
+#include <asm/bitops.h>
+
+#define PHYS_UNCACHED_OFFSET	0x8000000000000000
+extern unsigned long ia64_iobase;
+static struct ia64_boot_param boot_param;
+extern unsigned long saved_efi_memmap_size;
+extern void *saved_efi_memmap;
+
+static void set_io_base(void)
+{
+	unsigned long phys_iobase;
+
+	/* set kr0 to iobase */
+	phys_iobase = __pa(ia64_iobase);
+	ia64_set_kr(IA64_KR_IO_BASE, PHYS_UNCACHED_OFFSET | phys_iobase);
+};
+
+typedef void (*relocate_new_kernel_t)(
+	unsigned long indirection_page, unsigned long start_address, 
+	unsigned long boot_param_address);
+
+//extern void relocate_new_kernel(unsigned long indirection_page, 
+//				unsigned long start_address,
+//				unsigned long boot_param_address);
+const extern unsigned long relocate_new_kernel[];
+const extern unsigned int relocate_new_kernel_size;
+extern void use_mm(struct mm_struct *mm);
+
+const extern unsigned char test_loader[];
+extern void test_loader_end(void);
+const extern unsigned int test_loader_size;
+
+volatile extern long kexec_cont;
+const extern unsigned char kexec_reloc[];
+extern long kexec_ptcebase, kexec_count0, kexec_count1;
+extern long kexec_stride0, kexec_stride1;
+extern long kexec_tlblist;
+
+
+/*
+ * Do whatever setup is needed on the image and the
+ * reboot code buffer to allow us to avoid allocations
+ * later.  Currently nothing.
+ */
+int machine_kexec_prepare(struct kimage *image)
+{
+	return 0;
+}
+
+void machine_kexec_cleanup(struct kimage *image)
+{
+}
+
+void machine_shutdown(void)
+{
+#ifdef CONFIG_SMP
+	int reboot_cpu_id;
+
+	/* The boot cpu is always logical cpu 0 */
+	reboot_cpu_id = 0;
+
+	/* Make certain the cpu I'm rebooting on is online */
+	if (!cpu_isset(reboot_cpu_id, cpu_online_map)) {
+		reboot_cpu_id = smp_processor_id();
+	}
+
+	/* Make certain I only run on the appropriate processor */
+	set_cpus_allowed(current, cpumask_of_cpu(reboot_cpu_id));
+	
+	/* O.K. Now that I'm on the appropriate processor, flush
+	 * TLB on all other CPUs and stop all of the others.
+	 */
+
+	/*smp_flush_tlb_all();*/
+	smp_send_stop();
+#endif
+}
+
+/*
+ * Do not allocate memory (or fail in any way) in machine_kexec().
+ * We are past the point of no return, committed to rebooting now. 
+ */
+void machine_kexec(struct kimage *image)
+{
+	unsigned long indirection_page;
+	void *control_code_buffer;
+	relocate_new_kernel_t rnk;
+	unsigned char *cmdline;
+	int cpu;
+	void *efi_map_start;
+
+	/* Interrupts aren't acceptable while we reboot */
+	local_irq_disable();
+
+
+	control_code_buffer = ((unsigned long)phys_to_virt(page_to_pfn(image->control_code_page) << PAGE_SHIFT) & (unsigned long)0x1fffffffffffffffL) | __IA64_UNCACHED_OFFSET;
+	indirection_page = image->head & PAGE_MASK;
+
+	/* copy it out */
+	memcpy((void *)control_code_buffer, relocate_new_kernel, relocate_new_kernel_size);
+
+#if 0
+	/* Build boot parameter list */
+	boot_param.efi_systab = ia64_tpa(efi.systab);
+	boot_param.efi_memmap = ia64_boot_param->efi_memmap;
+	boot_param.efi_memmap_size = ia64_boot_param->efi_memmap_size;
+	boot_param.efi_memdesc_size = ia64_boot_param->efi_memdesc_size;
+	boot_param.efi_memdesc_version = ia64_boot_param->efi_memdesc_version;
+	boot_param.fpswa = ia64_boot_param->fpswa;
+#endif
+
+	kexec_cont = (long)(page_to_pfn(image->control_code_page) << PAGE_SHIFT) + (long)kexec_reloc -  (long) relocate_new_kernel;
+
+	/* Save PTCE data for cache flush later */
+        kexec_ptcebase    = local_cpu_data->ptce_base; 
+	kexec_count0  = local_cpu_data->ptce_count[0]; 
+	kexec_count1  = local_cpu_data->ptce_count[1];
+	kexec_stride0 = local_cpu_data->ptce_stride[0];
+        kexec_stride1 = local_cpu_data->ptce_stride[1];
+
+	/* Save PAL mapping for TR flush later */
+	cpu = smp_processor_id();
+	kexec_tlblist = &ia64_mca_tlb_list;
+
+	/* set kr0 to the appropriate address */
+	set_io_base();
+
+	/* Now execute the control code.
+	 * We start by executing the control code linked into the kernel,
+	 * not the copy in the control code buffer page. When this code
+	 * switches to physical mode, we start executing the copy in the
+	 * control code buffer page. The reason is that we start execution
+	 * in virtual address space. Executing the newly copied code in
+	 * virtual address space would need an ITLB entry to avoid an ITLB
+	 * miss. By executing the code linked into the kernel, we take
+	 * advantage of the ITLB entry already in place for the kernel and
+	 * avoid making a new entry.
+	 */
+	control_code_buffer = relocate_new_kernel;
+	rnk = &control_code_buffer;
+	strcat(saved_command_line, " kexec_reboot");
+	cmdline = __va(ia64_boot_param->command_line);
+	strlcpy(cmdline, saved_command_line, COMMAND_LINE_SIZE);
+	/* Restore original EFI memory map */
+	memcpy(__va(ia64_boot_param->efi_memmap), saved_efi_memmap, saved_efi_memmap_size);
+	ia64_boot_param->efi_memmap_size = saved_efi_memmap_size;
+
+	{
+		unsigned long pta, impl_va_bits;
+
+#       define pte_bits                 3
+#       define vmlpt_bits               (impl_va_bits - PAGE_SHIFT + pte_bits)
+#       define POW2(n)                  (1ULL << (n))
+
+		/* Disable VHPT */
+		impl_va_bits = ffz(~(local_cpu_data->unimpl_va_mask | (7UL << 61)));
+		pta = POW2(61) - POW2(vmlpt_bits);
+		ia64_set_pta(pta | (0 << 8) | (vmlpt_bits << 2) | 0);
+	}
+
+	rnk(indirection_page, image->start, ia64_boot_param);
+}
diff -urN linux-2.6.8/arch/ia64/kernel/relocate_kernel.S linux-2.6.8-ia64/arch/ia64/kernel/relocate_kernel.S
--- linux-2.6.8/arch/ia64/kernel/relocate_kernel.S	1969-12-31 17:00:00.000000000 -0700
+++ linux-2.6.8-ia64/arch/ia64/kernel/relocate_kernel.S	2004-11-12 09:47:14.000000000 -0700
@@ -0,0 +1,228 @@
+/*
+ * relocate_kernel.S - put the kernel image in place to boot
+ * Copyright (C) 2002-2004 Eric Biederman  <ebiederm@xmission.com>
+ * Copyright (C) 2004 Khalid Aziz <khalid.aziz@hp.com>
+ * Copyright (C) 2004 Hewlett Packard Development Co
+ *
+ * This source code is licensed under the GNU General Public License,
+ * Version 2.  See the file COPYING for more details.
+ */
+#include <linux/config.h>
+#include <asm/asmmacro.h>
+#include <asm/kregs.h>
+#include <asm/page.h>
+#include <asm/pgtable.h>
+
+       /* Must be relocatable PIC code callable as a C function, that once
+        * it starts cannot use the previous process's stack.
+        *
+        */
+       /* Q: Do I want to setup an interrupt vector, so what happens
+        * when exceptions occur is well defined?
+        */
+	.text
+	.align 32
+	.global relocate_new_kernel#
+	.proc relocate_new_kernel#
+relocate_new_kernel:
+	mf
+	;;
+	/* Save the ptce information for translation cache purge later */
+	movl	r25=kexec_cont
+	movl	r27=kexec_ptcebase
+	movl	r28=kexec_count0
+	;;
+	ld8	r17=[r25]
+	ld8	r22=[r27]
+	ld8	r20=[r28]
+	;;
+	movl	r25=kexec_count1
+	movl	r27=kexec_stride0
+	movl	r28=kexec_stride1
+	;;
+	ld8	r21=[r25]
+	ld8	r23=[r27]
+	ld8	r24=[r28]
+	;;
+	movl	r27=kexec_tlblist
+	adds 	r25=48,r27
+	;;
+	ld8	r26=[r25]
+
+	{
+		flushrs
+		srlz.i
+	}
+	;;
+       /* See where I am running, and compute gp */
+	{
+		mov     ar.rsc = 0      /* Put RSE in enforced lazy, LE mode */
+		mov     gp = ip         /* gp == relocate_new_kernel */
+	}
+
+	movl r8=0x00000100000000
+	;;
+	mov cr.iva=r8
+
+	/* Transition from virtual to physical mode */
+	rsm	psr.i | psr.ic
+	srlz.i
+	movl	r16=(IA64_PSR_AC | IA64_PSR_BN | IA64_PSR_IC | IA64_PSR_MFL)
+	;;
+	mov	cr.ipsr=r16
+	;;
+	mov	cr.iip=r17
+	mov	cr.ifs=r0
+	;;
+	rfi
+	;;
+	.global kexec_reloc
+kexec_reloc:     /* Now we are in physical mode */
+	/* Setup the memory stack */
+	add     r12=(memory_stack_end - relocate_new_kernel),gp
+	/* Setup the register stack */
+	add     r8=(register_stack - relocate_new_kernel),gp
+	;;
+	loadrs
+	;;
+	mov     ar.bspstore=r8
+	;;
+
+	/* Do the copies */
+	mov     r8=r32
+	mov     b6=r33
+	tpa     r28=r34
+	mov     r9=0
+	mov     r11=PAGE_SIZE
+	;;
+	/* top, read another word for the indirection page */
+top:   ld8     r10=[r8], 8
+	;;
+	tbit.nz p6,p0 = r10, 0  /* Is it a destination page? */
+	tbit.nz p7,p0 = r10, 1  /* Is it an indirection page? */
+	tbit.nz p8,p0 = r10, 3  /* Is it the source indicator? */
+	tbit.nz p9,p0 = r10, 2  /* Is it the done indicator? */
+	addl	r19 = -4096, r0
+	;;
+	and	r10 = r10, r19	/* Clear the low 12 bits of r10 */
+	;;
+(p6)   mov     r9 = r10        /* destination addr */
+(p7)   mov     r8 = r10        /* indirection addr */
+(p8)   br.cond.sptk.few        source
+(p9)   br.cond.sptk.few        done
+	br.cond.sptk.few        top
+source:
+	add     r16 = r11, r10
+	add     r14 = 8, r10
+	add     r15 = 8, r9
+	;;
+0:
+	ld8     r17 = [r10],16
+	ld8     r18 = [r14],16
+	;;
+	st8     [r9]  = r17, 16
+	st8     [r15] = r18, 16
+	cmp.ne  p6,p0 = r16, r10
+	;;
+(p6)   br.cond.sptk.few        0b
+	br.cond.sptk.few        top
+done:
+	srlz.i
+	srlz.d
+	;;
+
+	/* Now purge local tlb */
+	mov r19 = r0
+	adds	r21=-1,r20
+	;;
+2:
+	cmp.ltu	p6,p7=r19,r20
+(p7)	br.cond.dpnt.few	4f
+	mov	ar.lc=r21
+3:
+	ptc.e	r22
+	;;
+	add	r22=r24,r22
+	br.cloop.sptk.few	3b
+	;;
+	add	r22=r23,r22
+	add	r19=1,r19
+	;;
+	br.sptk.few	2b
+4:
+	srlz.i ;;
+	
+       // Now purge addresses formerly mapped by TR registers
+	// Purge ITR&DTR for kernel.
+	movl r16=KERNEL_START
+	mov r18=KERNEL_TR_PAGE_SHIFT<<2
+	;;
+	ptr.i r16, r18
+	ptr.d r16, r18
+	;;
+	srlz.i
+	;;
+	srlz.d
+	;;
+	// Purge DTR for PERCPU data.
+	movl r16=PERCPU_ADDR
+	mov r18=PERCPU_PAGE_SHIFT<<2
+	;;
+	ptr.d r16,r18
+	;;
+	srlz.d
+	;;
+	// Purge ITR for PAL code
+	mov r18=IA64_GRANULE_SHIFT<<2
+	;;
+	ptr.i r26,r18
+	;;
+	srlz.i
+	;;
+	// Purge DTR for stack.
+	mov r16=IA64_KR(CURRENT_STACK)
+	;;
+	shl r16=r16,IA64_GRANULE_SHIFT
+	movl r19=PAGE_OFFSET
+	;;
+	add r16=r19,r16
+	mov r18=IA64_GRANULE_SHIFT<<2
+	;;
+	ptr.d r16,r18
+	;;
+	srlz.i
+	;;
+
+	br.sptk.few		b6
+	br.cond.sptk.few        0b
+	.endp relocate_new_kernel#
+
+	.balign 8192
+relocate_new_kernel_end:
+	.global relocate_new_kernel_size
+relocate_new_kernel_size:
+	.long relocate_new_kernel_end - relocate_new_kernel
+
+	.global kexec_cont
+	.align 8
+kexec_cont:	data8 0xdeadbeefdeadbeef
+	.global kexec_ptcebase
+kexec_ptcebase:	data8 0xdeadbeefdeadbeef
+	.global kexec_count0
+kexec_count0:	data8 0xdeadbeefdeadbeef
+	.global kexec_count1
+kexec_count1:	data8 0xdeadbeefdeadbeef
+	.global kexec_stride0
+kexec_stride0:	data8 0xdeadbeefdeadbeef
+	.global kexec_stride1
+kexec_stride1:	data8 0xdeadbeefdeadbeef
+	.global kexec_tlblist
+kexec_tlblist:	data8 0xdeadbeefdeadbeef
+
+
+register_stack:
+	.fill           8192, 1, 0
+register_stack_end:
+memory_stack:
+	.fill           8192, 1, 0
+memory_stack_end:
diff -urN linux-2.6.8/arch/ia64/mm/contig.c linux-2.6.8-ia64/arch/ia64/mm/contig.c
--- linux-2.6.8/arch/ia64/mm/contig.c	2004-08-13 23:36:45.000000000 -0600
+++ linux-2.6.8-ia64/arch/ia64/mm/contig.c	2004-11-15 12:22:15.000000000 -0700
@@ -29,6 +29,11 @@
 static unsigned long num_dma_physpages;
 #endif
 
+#ifdef CONFIG_KEXEC
+void *saved_efi_memmap;
+extern unsigned long saved_efi_memmap_size;
+#endif
+
 /**
  * show_mem - display a memory statistics summary
  *
@@ -164,6 +169,11 @@
 	/* Free all available memory, then mark bootmem-map as being in use. */
 	efi_memmap_walk(filter_rsvd_memory, free_bootmem);
 	reserve_bootmem(bootmap_start, bootmap_size);
+#ifdef CONFIG_KEXEC
+	/* Save EFI memory map for use later when kexec'ing a kernel */
+	saved_efi_memmap = alloc_bootmem(saved_efi_memmap_size);
+	memcpy(saved_efi_memmap, __va(ia64_boot_param->efi_memmap), saved_efi_memmap_size);
+#endif
 
 	find_initrd();
 }
diff -urN linux-2.6.8/arch/ia64/mm/discontig.c linux-2.6.8-ia64/arch/ia64/mm/discontig.c
--- linux-2.6.8/arch/ia64/mm/discontig.c	2004-11-12 09:48:47.000000000 -0700
+++ linux-2.6.8-ia64/arch/ia64/mm/discontig.c	2004-11-15 12:29:40.896657633 -0700
@@ -40,6 +40,11 @@
 
 static struct early_node_data mem_data[NR_NODES] __initdata;
 
+#ifdef CONFIG_KEXEC
+void *saved_efi_memmap;
+extern unsigned long saved_efi_memmap_size;
+#endif
+
 /**
  * reassign_cpu_only_nodes - called from find_memory to move CPU-only nodes to a memory node
  *
@@ -459,6 +464,12 @@
 	reserve_pernode_space();
 	initialize_pernode_data();
 
+#ifdef CONFIG_KEXEC
+	/* Save EFI memory map for use later when kexec'ing a kernel */
+	saved_efi_memmap = alloc_bootmem(saved_efi_memmap_size);
+	memcpy(saved_efi_memmap, __va(ia64_boot_param->efi_memmap), saved_efi_memmap_size);
+#endif
+
 	max_pfn = max_low_pfn;
 
 	find_initrd();
diff -urN linux-2.6.8/include/asm-ia64/kexec.h linux-2.6.8-ia64/include/asm-ia64/kexec.h
--- linux-2.6.8/include/asm-ia64/kexec.h	1969-12-31 17:00:00.000000000 -0700
+++ linux-2.6.8-ia64/include/asm-ia64/kexec.h	2004-11-12 09:32:23.000000000 -0700
@@ -0,0 +1,14 @@
+#ifndef _ASM_IA64_KEXEC_H
+#define _ASM_IA64_KEXEC_H
+
+
+/* Maximum physical address we can use pages from */
+#define KEXEC_SOURCE_MEMORY_LIMIT (-1UL)
+/* Maximum address we can reach in physical address mode */
+#define KEXEC_DESTINATION_MEMORY_LIMIT (-1UL)
+/* Maximum address we can use for the control code buffer */
+#define KEXEC_CONTROL_MEMORY_LIMIT TASK_SIZE
+
+#define KEXEC_CONTROL_CODE_SIZE (8192 + 8192 + 4096)
+
+#endif /* _ASM_IA64_KEXEC_H */
diff -urN linux-2.6.8/include/asm-ia64/mmu_context.h linux-2.6.8-ia64/include/asm-ia64/mmu_context.h
--- linux-2.6.8/include/asm-ia64/mmu_context.h	2004-08-13 23:36:16.000000000 -0600
+++ linux-2.6.8-ia64/include/asm-ia64/mmu_context.h	2004-11-12 09:32:23.000000000 -0700
@@ -203,5 +203,7 @@
 
 #define switch_mm(prev_mm,next_mm,next_task)	activate_mm(prev_mm, next_mm)
 
+extern void use_mm(struct mm_struct *mm);
+
 # endif /* ! __ASSEMBLY__ */
 #endif /* _ASM_IA64_MMU_CONTEXT_H */
diff -urN linux-2.6.8/kernel/sys.c linux-2.6.8-ia64/kernel/sys.c
--- linux-2.6.8/kernel/sys.c	2004-11-12 09:28:23.000000000 -0700
+++ linux-2.6.8-ia64/kernel/sys.c	2004-11-12 09:32:23.000000000 -0700
@@ -516,7 +516,7 @@
 			return -EINVAL;
 		}
 		notifier_call_chain(&reboot_notifier_list, SYS_RESTART, NULL);
-		system_state = SYSTEM_BOOTING;
+		system_state = SYSTEM_RESTART;
 		device_shutdown();
 		printk(KERN_EMERG "Starting new kernel\n");
 		machine_shutdown();
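
For readers who do not read ia64 assembly, the copy loop in
relocate_kernel.S above can be sketched in C roughly as follows. This is a
user-space illustration, not kernel code: "physical memory" is simulated by
a byte array, list entries hold byte offsets into that array, and the page
size is fixed at 4096. The flag bits match the tbit tests in the assembly
(bit 0 destination, bit 1 indirection, bit 2 done, bit 3 source). The names
SK_PAGE_SIZE and walk_indirection_list() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_PAGE_SIZE     4096u	/* assumed page size for this sketch */
#define IND_DESTINATION  0x1	/* entry sets the next destination page */
#define IND_INDIRECTION  0x2	/* entry points at the next list page */
#define IND_DONE         0x4	/* end of list */
#define IND_SOURCE       0x8	/* entry names a page to copy */

/* Walk the indirection list starting at byte offset head inside mem,
 * copying each source page to the current destination.
 * The list must be terminated by an IND_DONE entry.
 * Returns the number of pages copied. */
size_t walk_indirection_list(uint8_t *mem, uint64_t head)
{
	uint64_t entry_off = head;
	uint64_t dest = 0;
	size_t copied = 0;

	for (;;) {
		uint64_t entry, addr;

		/* Read the next 8-byte entry and step past it. */
		memcpy(&entry, mem + entry_off, sizeof(entry));
		entry_off += sizeof(entry);

		/* Strip the flag bits to get the page address. */
		addr = entry & ~(uint64_t)(SK_PAGE_SIZE - 1);

		if (entry & IND_DESTINATION) {
			dest = addr;
		} else if (entry & IND_INDIRECTION) {
			entry_off = addr;
		} else if (entry & IND_SOURCE) {
			memmove(mem + dest, mem + addr, SK_PAGE_SIZE);
			dest += SK_PAGE_SIZE;	/* next copy lands on next page */
			copied++;
		} else if (entry & IND_DONE) {
			break;
		}
		/* Entries with no flag set are skipped. */
	}
	return copied;
}
```

The real code additionally has to run from a page that the copy will not
overwrite and with translation switched off, which is why it is written in
position-independent assembly.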

[-- Attachment #3: kexec-tools-1.98-ia64.diff --]
[-- Type: text/x-patch, Size: 14472 bytes --]

diff -urN kexec-tools-1.98/Makefile.main kexec-tools-1.98-ia64/Makefile.main
--- kexec-tools-1.98/Makefile.main	2004-09-15 03:49:17.000000000 -0600
+++ kexec-tools-1.98-ia64/Makefile.main	2004-11-15 12:05:36.000000000 -0700
@@ -22,6 +22,7 @@
 BINARIES_ppc:=$(SBINDIR)/kexec
 BINARIES_ppc64:=$(SBINDIR)/kexec
 BINARIES_x86_64:=$(SBINDIR)/kexec $(BINDIR)/kexec_test
+BINARIES_ia64:=$(SBINDIR)/kexec
 BINARIES:=$(BINARIES_$(ARCH))
 
 TARGETS:=$(BINARIES) $(MAN_PAGES)
@@ -41,6 +42,12 @@
 	@echo ALPHA_AR=$(ALPHA_AR)
 	@echo ALPHA_LD=$(ALPHA_LD)
 
+install:
+	mkdir -p $(DESTDIR)/usr/sbin
+	install -m700 $(SBINDIR)/kexec $(DESTDIR)/usr/sbin
+	mkdir -p $(DESTDIR)/usr/share/man/man8
+	install -m644 kexec/kexec.8 $(DESTDIR)/usr/share/man/man8
+
 clean:
 	find $(OBJDIR) ! -name '*.d' -type f | xargs rm -f
 
diff -urN kexec-tools-1.98/kexec/Makefile kexec-tools-1.98-ia64/kexec/Makefile
--- kexec-tools-1.98/kexec/Makefile	2004-07-19 02:47:19.000000000 -0600
+++ kexec-tools-1.98-ia64/kexec/Makefile	2004-11-15 11:27:00.000000000 -0700
@@ -25,6 +25,10 @@
 ifeq ($(ARCH),ppc64)
 KEXEC_C_SRCS+= kexec/kexec-zImage-ppc64.c
 endif
+ifeq ($(ARCH),ia64)
+KEXEC_C_SRCS+= kexec/kexec-ia64.c kexec/kexec-elf64-ia64.c
+KEXEC_C_SRCS+= kexec/ifdown.c
+endif
 
 KEXEC_C_OBJS:= $(patsubst %.c, $(OBJDIR)/%.o, $(KEXEC_C_SRCS))
 KEXEC_C_DEPS:= $(patsubst %.c, $(OBJDIR)/%.d, $(KEXEC_C_SRCS))
diff -urN kexec-tools-1.98/kexec/kexec-elf64-ia64.c kexec-tools-1.98-ia64/kexec/kexec-elf64-ia64.c
--- kexec-tools-1.98/kexec/kexec-elf64-ia64.c	1969-12-31 17:00:00.000000000 -0700
+++ kexec-tools-1.98-ia64/kexec/kexec-elf64-ia64.c	2004-11-15 11:27:00.000000000 -0700
@@ -0,0 +1,306 @@
+/*
+ * kexec: Linux boots Linux
+ *
+ * Copyright (C) 2003,2004  Eric Biederman (ebiederm@xmission.com)
+ * Copyright (C) 2004 Albert Herranz
+ * Copyright (C) 2004 Silicon Graphics, Inc.
+ *   Jesse Barnes <jbarnes@sgi.com>
+ * Copyright (C) 2004 Khalid Aziz <khalid.aziz@hp.com> Hewlett Packard Co
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation (version 2 of the License).
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#define _GNU_SOURCE
+#include <stddef.h>
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <stdint.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/mman.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <getopt.h>
+#include <elf.h>
+#include <boot/elf_boot.h>
+#include <ip_checksum.h>
+#include "kexec.h"
+
+#define MAX_MEMORY_RANGES 64
+#define MAX_LINE 160
+static struct memory_range memory_range[MAX_MEMORY_RANGES];
+
+static int debug = 0;
+
+#define BOOTLOADER         "kexec"
+#define BOOTLOADER_VERSION VERSION
+#define MAX_COMMAND_LINE   256
+
+/*
+ * elf64_ia64_probe - sanity check the elf image
+ *
+ * Make sure that the file image has a reasonable chance of working.
+ */
+int elf64_ia64_probe(FILE * file)
+{
+	Elf64_Ehdr ehdr;
+	if (fseek(file, 0, SEEK_SET) < 0) {
+		fprintf(stderr, "seek error: %s\n", strerror(errno));
+		return -1;
+	}
+	if (fread(&ehdr, sizeof(ehdr), 1, file) != 1) {
+		if (debug) {
+			fprintf(stderr, "File too short.\n");
+		}
+		return 0;
+	}
+	if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0) {
+		/* No ELF header */
+		if (debug) {
+			fprintf(stderr, "No ELF header.\n");
+		}
+		return 0;
+	}
+	if (ehdr.e_ident[EI_CLASS] != ELFCLASS64) {
+		/* Not a 64bit ELF file */
+		if (debug) {
+			fprintf(stderr, "Not a 64bit ELF file.\n");
+		}
+		return 0;
+	}
+	if (ehdr.e_ident[EI_DATA] != ELFDATA2LSB) {
+		/* not a little endian ELF file */
+		if (debug) {
+			fprintf(stderr, "Not a little endian ELF file.\n");
+		}
+		return 0;
+	}
+	if ((ehdr.e_ident[EI_VERSION] != EV_CURRENT) ||
+	    (ehdr.e_version != EV_CURRENT)) {
+		/* unknown elf version */
+		if (debug) {
+			fprintf(stderr, "Unknown ELF file version.\n");
+		}
+		return 0;
+	}
+	if (ehdr.e_type != ET_EXEC) {
+		/* not an ELF executable */
+		if (debug) {
+			fprintf(stderr, "Not an ELF executable.\n");
+		}
+		return 0;
+	}
+
+	if (ehdr.e_ehsize != sizeof(Elf64_Ehdr)) {
+		/* invalid ELF header size */
+		if (debug) {
+			fprintf(stderr, "Invalid ELF header size.\n");
+		}
+		return 0;
+	}
+	if (ehdr.e_phentsize != sizeof(Elf64_Phdr)) {
+		/* invalid program header size */
+		if (debug) {
+			fprintf(stderr, "Invalid program header size.\n");
+		}
+		return 0;
+	}
+	if ((ehdr.e_phoff == 0) || (ehdr.e_phnum == 0)) {
+		/* no program header */
+		if (debug) {
+			fprintf(stderr, "No program header.\n");
+		}
+		return 0;
+	}
+	/* Verify the architecture specific bits */
+	if (ehdr.e_machine != EM_IA_64) {
+		/* for a different architecture */
+		if (debug) {
+			fprintf(stderr, "Not for this architecture.\n");
+		}
+		return 0;
+	}
+	return 1;
+}
+
+void elf64_ia64_usage(void)
+{
+	printf
+	    ("-d, --debug               Enable debugging to help spot a failure.\n"
+	     "    --command-line=STRING Set the kernel command line to STRING.\n"
+	     "    --append=STRING       Set the kernel command line to STRING.\n");
+}
+
+int elf64_ia64_load(FILE * file, int argc, char **argv,
+		    void **ret_entry, struct kexec_segment **ret_segments,
+		    int *ret_nr_segments)
+{
+	Elf64_Ehdr ehdr;
+	Elf64_Phdr *phdr;
+	struct kexec_segment *segment;
+	int nr_segments;
+	size_t phdr_bytes;
+	const char *command_line;
+	int command_line_len;
+	unsigned long mstart;
+	char *buf;
+	size_t size;
+	int i;
+	int opt;
+#define OPT_APPEND	(OPT_MAX+0)
+	static const struct option options[] = {
+		KEXEC_OPTIONS
+		{"debug", 0, 0, OPT_DEBUG},
+		{"command-line", 1, 0, OPT_APPEND},
+		{"append", 1, 0, OPT_APPEND},
+		{0, 0, 0, 0},
+	};
+
+	static const char short_options[] = KEXEC_OPT_STR "d";
+
+	debug = 0;
+	command_line = 0;
+	while ((opt = getopt_long(argc, argv, short_options, options, 0)) != -1) {
+		switch (opt) {
+		default:
+			/* Ignore core options */
+			if (opt < OPT_MAX) {
+				break;
+			}
+		case '?':
+			usage();
+			return -1;
+		case OPT_DEBUG:
+			debug = 1;
+			break;
+		case OPT_APPEND:
+			command_line = optarg;
+			break;
+		}
+	}
+	command_line_len = 0;
+	if (command_line) {
+		command_line_len = strlen(command_line) + 1;
+	}
+
+	/* Read in the Elf header */
+	if (fseek(file, 0, SEEK_SET) != 0) {
+		fprintf(stderr, "seek error: %s\n", strerror(errno));
+		return -1;
+	}
+	if (fread(&ehdr, sizeof(ehdr), 1, file) != 1) {
+		fprintf(stderr, "read error: %s\n", strerror(errno));
+		return -1;
+	}
+	/* do a sanity check on header */
+	if (ehdr.e_ident[EI_MAG0]  != 0x7f
+			|| ehdr.e_ident[EI_MAG1] != 'E'
+			|| ehdr.e_ident[EI_MAG2] != 'L'
+			|| ehdr.e_ident[EI_MAG3] != 'F'
+			|| ehdr.e_ident[EI_CLASS] != ELFCLASS64
+			|| ehdr.e_type != ET_EXEC
+			|| ehdr.e_machine != EM_IA_64) {
+		fprintf(stderr, "Not an elf64 executable\n");
+		return -1;
+	}
+	/* Read in the program header */
+	phdr_bytes = sizeof(*phdr) * ehdr.e_phnum;
+	phdr = malloc(phdr_bytes);
+	if (phdr == 0) {
+		fprintf(stderr, "malloc failed: %s\n", strerror(errno));
+		return -1;
+	}
+	if (fseek(file, ehdr.e_phoff, SEEK_SET) != 0) {
+		fprintf(stderr, "seek error: %s\n", strerror(errno));
+		return -1;
+	}
+	if (fread(phdr, phdr_bytes, 1, file) != 1) {
+		fprintf(stderr, "read error: %s\n", strerror(errno));
+		return -1;
+	}
+
+	/* Setup the segments */
+	segment = malloc(sizeof(*segment) * (ehdr.e_phnum + 1));
+	if (segment == 0) {
+		fprintf(stderr, "malloc failed: %s\n", strerror(errno));
+		return -1;
+	}
+
+	/* Skip the argument segment */
+	nr_segments = 0;
+	segment[nr_segments].buf = 0;
+	segment[nr_segments].bufsz = 0;
+	segment[nr_segments].mem = 0;
+	segment[nr_segments].memsz = 0;
+	nr_segments++;
+
+	/* Now all the rest of the segments */
+	for (i = 0; i < ehdr.e_phnum; i++) {
+		if (phdr[i].p_type != PT_LOAD) {
+			continue;
+		}
+		size = phdr[i].p_filesz;
+		if (size > phdr[i].p_memsz) {
+			size = phdr[i].p_memsz;
+		}
+		buf = malloc(size);
+		if (buf == 0) {
+			fprintf(stderr, "malloc failed: %s\n", strerror(errno));
+			return -1;
+		}
+		segment[nr_segments].buf = buf;
+		segment[nr_segments].bufsz = size;
+		mstart = phdr[i].p_paddr;
+		segment[nr_segments].mem = (void *)mstart;
+		segment[nr_segments].memsz = phdr[i].p_memsz;
+		if (valid_memory_range(segment + nr_segments) < 0) {
+			fprintf(stderr, "Invalid memory segment %p - %p\n",
+				segment[nr_segments].mem,
+				((char *)segment[nr_segments].mem) +
+				segment[nr_segments].memsz);
+			return -1;
+		}
+		nr_segments++;
+		if (size == 0) {
+			/* Don't do file I/O if there is nothing in the file */
+			continue;
+		}
+		if (fseek(file, phdr[i].p_offset, SEEK_SET) != 0) {
+			fprintf(stderr, "seek failed: %s\n", strerror(errno));
+			return -1;
+		}
+		if (fread(buf, size, 1, file) != 1) {
+			fprintf(stderr, "Read failed: %s\n", strerror(errno));
+			return -1;
+		}
+	}
+
+#if 0
+	/* Generate and setup the argument segment */
+	if (sort_segments(segment, nr_segments) < 0) {
+		return -1;
+	}
+
+	if (locate_arguments(segment, nr_segments, 4, ~0UL) < 0) {
+		return -1;
+	}
+
+#endif
+	*ret_entry = ehdr.e_entry;
+	*ret_nr_segments = nr_segments;
+	*ret_segments = segment;
+	return 0;
+}
diff -urN kexec-tools-1.98/kexec/kexec-ia64.c kexec-tools-1.98-ia64/kexec/kexec-ia64.c
--- kexec-tools-1.98/kexec/kexec-ia64.c	1969-12-31 17:00:00.000000000 -0700
+++ kexec-tools-1.98-ia64/kexec/kexec-ia64.c	2004-11-15 11:27:00.000000000 -0700
@@ -0,0 +1,47 @@
+/*
+ * kexec: Linux boots Linux
+ *
+ * Copyright (C) 2003,2004  Eric Biederman (ebiederm@xmission.com)
+ * Copyright (C) 2004 Albert Herranz
+ * Copyright (C) 2004 Silicon Graphics, Inc.
+ *   Jesse Barnes <jbarnes@sgi.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation (version 2 of the License).
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#define _GNU_SOURCE
+#include <stddef.h>
+#include <stdio.h>
+#include <errno.h>
+#include <stdint.h>
+#include <string.h>
+#include "kexec.h"
+#include "kexec-ia64.h"
+
+#define MAX_MEMORY_RANGES 64
+#define MAX_LINE 160
+static struct memory_range memory_range[MAX_MEMORY_RANGES];
+
+/* Return a sorted list of available memory ranges. */
+int get_memory_ranges(struct memory_range **range, int *ranges)
+{
+	return 0;
+}
+
+/* Supported file types and callbacks */
+struct file_type file_type[] = {
+       {"elf64-ia64", elf64_ia64_probe, elf64_ia64_load, elf64_ia64_usage},
+};
+int file_types = sizeof(file_type) / sizeof(file_type[0]);
+
diff -urN kexec-tools-1.98/kexec/kexec-ia64.h kexec-tools-1.98-ia64/kexec/kexec-ia64.h
--- kexec-tools-1.98/kexec/kexec-ia64.h	1969-12-31 17:00:00.000000000 -0700
+++ kexec-tools-1.98-ia64/kexec/kexec-ia64.h	2004-11-15 11:27:00.000000000 -0700
@@ -0,0 +1,9 @@
+#ifndef KEXEC_IA64_H
+#define KEXEC_IA64_H
+
+int elf64_ia64_probe(FILE *file);
+int elf64_ia64_load(FILE *file, int argc, char **argv,
+	void **ret_entry, struct kexec_segment **ret_segments, int *ret_nr_segments);
+void elf64_ia64_usage(void);
+
+#endif /* KEXEC_IA64_H */
diff -urN kexec-tools-1.98/kexec/kexec.8 kexec-tools-1.98-ia64/kexec/kexec.8
--- kexec-tools-1.98/kexec/kexec.8	1969-12-31 17:00:00.000000000 -0700
+++ kexec-tools-1.98-ia64/kexec/kexec.8	2004-11-15 11:27:00.000000000 -0700
@@ -0,0 +1,45 @@
+.\"                                      Hey, EMACS: -*- nroff -*-
+.\" First parameter, NAME, should be all caps
+.\" Second parameter, SECTION, should be 1-8, maybe w/ subsection
+.\" other parameters are allowed: see man(7), man(1)
+.TH KEXEC-TOOLS 8 "October 13, 2004"
+.\" Please adjust this date whenever revising the manpage.
+.\"
+.\" Some roff macros, for reference:
+.\" .nh        disable hyphenation
+.\" .hy        enable hyphenation
+.\" .ad l      left justify
+.\" .ad b      justify to both left and right margins
+.\" .nf        disable filling
+.\" .fi        enable filling
+.\" .br        insert line break
+.\" .sp <n>    insert n+1 empty lines
+.\" for manpage-specific macros, see man(7)
+.SH NAME
+kexec-tools \- Tool to load a kernel for warm reboot and initiate a warm reboot
+.SH SYNOPSIS
+.B kexec-tools
+.RI [ options ] " files" ...
+.SH DESCRIPTION
+.PP
+.\" TeX users may be more comfortable with the \fB<whatever>\fP and
+.\" \fI<whatever>\fP escape sequences to invoke bold face and italics,
+.\" respectively.
+\fBkexec-tools\fP does not have a man page yet. Please use "kexec -h" for help.
+.SH OPTIONS
+These programs follow the usual GNU command line syntax, with long
+options starting with two dashes (`-').
+A summary of options is included below.
+For a complete description, see the Info files.
+.TP
+.B \-h, \-\-help
+Show summary of options.
+.TP
+.B \-v, \-\-version
+Show version of program.
+.SH SEE ALSO
+.SH AUTHOR
+kexec-tools was written by Eric Biederman.
+.PP
+This manual page was written by Khalid Aziz <khalid_aziz@hp.com>,
+for the Debian project (but may be used by others).
diff -urN kexec-tools-1.98/kexec/kexec.c kexec-tools-1.98-ia64/kexec/kexec.c
--- kexec-tools-1.98/kexec/kexec.c	2004-08-20 00:07:07.000000000 -0600
+++ kexec-tools-1.98-ia64/kexec/kexec.c	2004-11-15 12:54:22.134096799 -0700
@@ -35,6 +35,16 @@
 {
 	int i;
 
+#ifdef __ia64__
+	/*
+	 * /proc/iomem on ia64 does not show where all memory is. If
+	 * that is fixed up, we can make use of that to validate
+	 * the memory range the kernel will be loaded in. Until then.....
+	 * -- Khalid Aziz
+	 */
+	return 0;
+#endif
+
 	for (i = 0; i < memory_ranges; i++) {
 		unsigned long mstart, mend;
 		unsigned long sstart, send;


* Re: [Fastboot] [PATCH] kexec on ia64
@ 2004-11-15 20:41 Khalid Aziz
  2004-11-16  3:46 ` Khalid Aziz
                   ` (3 more replies)
  0 siblings, 4 replies; 29+ messages in thread
From: Khalid Aziz @ 2004-11-15 20:41 UTC (permalink / raw)
  To: linux-ia64

On Mon, 2004-11-15 at 13:32, Khalid Aziz wrote:
> Here is what I am working on next:
> 
> 1. Save EFI memory map before it is trimmed.
> 
> 2. Fix up /proc/iomem on ia64 so we can enable validating memory range
> in kexec tools.
> 
> 3. Add a /proc interface to enable reboots on panic and INIT (and
> possibly MCA) to be kexec reboots.
> 
> 4. Add initrd support.

And

5. Port the patch to 2.6.9 kernel :) Or 2.6.10 if I do not get to it
soon enough.

-- 
Khalid

==================================
Khalid Aziz                                Linux and Open Source Lab
(970)898-9214                                        Hewlett-Packard
khalid_aziz@hp.com                                  Fort Collins, CO

"The Linux kernel is subject to relentless development" 
				- Alessandro Rubini




* RE: [PATCH] kexec on ia64
  2004-11-15 20:32 [PATCH] kexec on ia64 Khalid Aziz
@ 2004-11-15 21:15 ` Luck, Tony
  2004-11-15 22:03 ` David Mosberger
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Luck, Tony @ 2004-11-15 21:15 UTC (permalink / raw)
  To: linux-ia64

>Here is what I am working on next:
>
>1. Save EFI memory map before it is trimmed.

This code has been "evolving" for a long time now, more layers
get added to solve each new problem.  If you get time, please
step back about a half-mile and take a look at the big picture
and see if you can see a better way to do the scanning and
trimming and re-scanning.  The overall problem statement (ignore
anything except complete granules, honour the command-line arguments
max_mem/max_addr, allocate a temporary bitmap for bootmem) seems
like it shouldn't require such complex code :-)  You can add your
own new requirement to not modify the original EFI tables so that
they can be re-scanned by a new kernel after kexec (new kernel
might have a different granule size).
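
A minimal sketch of why an unmodified table matters, under stated
assumptions: `struct range` and `usable_range()` are hypothetical names
used only for illustration, not kernel interfaces, and the granule size
is passed in as a parameter instead of coming from the kernel
configuration. The same pristine range yields a different usable window
for each granule size, which only works if the input is never trimmed
in place.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open physical address range [start, end). Hypothetical type. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Compute the part of *r that a kernel with the given granule size may
 * use. The input is const: the firmware-provided range is never
 * modified, so a later kernel with a different granule size can redo
 * the computation. Returns an empty range (start == end) when nothing
 * usable is left.
 */
static struct range usable_range(const struct range *r, uint64_t granule,
				 uint64_t max_addr)
{
	struct range out;

	/* granule must be a non-zero power of two for the masks below */
	assert(granule != 0 && (granule & (granule - 1)) == 0);

	out.start = (r->start + granule - 1) & ~(granule - 1);	/* round up */
	out.end = r->end & ~(granule - 1);			/* round down */
	if (out.end > max_addr)
		out.end = max_addr;	/* honour max_addr= */
	if (out.end < out.start)
		out.end = out.start;	/* nothing usable */
	return out;
}
```

Calling this twice on one range, once with a 16 MB granule and once
with a 64 MB granule, gives two different windows while the range
itself stays as the firmware reported it.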

-Tony


* RE: [PATCH] kexec on ia64
  2004-11-15 20:32 [PATCH] kexec on ia64 Khalid Aziz
  2004-11-15 21:15 ` Luck, Tony
@ 2004-11-15 22:03 ` David Mosberger
  2004-11-15 22:14 ` Khalid Aziz
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: David Mosberger @ 2004-11-15 22:03 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Mon, 15 Nov 2004 13:15:25 -0800, "Luck, Tony" <tony.luck@intel.com> said:


  Tony> You can add your own new requirement to not modify the
  Tony> original EFI tables so that they can be re-scanned by a new
  Tony> kernel after kexec (new kernel might have a different granule
  Tony> size).

That certainly would be the right way to go about it.  It would also
make it less likely that something else might get confused when
changing the memory map underneath it.

	--david


* RE: [PATCH] kexec on ia64
  2004-11-15 20:32 [PATCH] kexec on ia64 Khalid Aziz
  2004-11-15 21:15 ` Luck, Tony
  2004-11-15 22:03 ` David Mosberger
@ 2004-11-15 22:14 ` Khalid Aziz
  2004-11-16 17:28 ` Khalid Aziz
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Khalid Aziz @ 2004-11-15 22:14 UTC (permalink / raw)
  To: linux-ia64

On Mon, 2004-11-15 at 14:15, Luck, Tony wrote:
> >Here is what I am working on next:
> >
> >1. Save EFI memory map before it is trimmed.
> 
> This code has been "evolving" for a long time now, more layers
> get added to solve each new problem.  If you get time, please
> step back about a half-mile and take a look at the big picture
> and see if you can see a better way to do the scanning and
> trimming and re-scanning.  The overall problem statement (ignore
> anything except complete granules, honour the command-line arguments
> max_mem/max_addr, allocate a temporary bitmap for bootmem) seems
> like it shouldn't require such complex code :-)  You can add your
> own new requirement to not modify the original EFI tables so that
> they can be re-scanned by a new kernel after kexec (new kernel
> might have a different granule size).
> 
> -Tony

Tony,

I definitely like this idea better. I have been talking to another
developer who is struggling with efi_memmap_walk() trimming the original
EFI memory map for "mem=" and "max_addr=". We have discussed splitting
efi_memmap_walk() into three separate routines: one to simply walk the
memory map and compute the physical memory size without touching the map,
one to trim the memory map for granule size, and one to trim the memory
map for "mem=" and "max_addr=". This will allow us to save an untouched
memory map in between calls to these routines. Now that I know you guys
are open to something like this, we will pursue it further :)
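
The three-routine split could be sketched roughly as follows. This is
an illustration under assumptions, not the code we will end up with:
`struct memdesc` is a simplified stand-in for the EFI descriptor, the
function names are hypothetical, the granule is assumed to be 16 MB,
and the granule pass treats each descriptor on its own instead of
looking at contiguous neighbours.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EFI_PAGE_SHIFT	12			/* EFI pages are 4 KB */
#define GRANULE		(UINT64_C(1) << 24)	/* assumed 16 MB granule */

/* Simplified stand-in for an EFI memory descriptor (hypothetical). */
struct memdesc {
	uint64_t phys_addr;
	uint64_t num_pages;
};

/* Pass 1: compute the memory size without touching the map. */
static uint64_t total_memory(const struct memdesc *map, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += map[i].num_pages << EFI_PAGE_SHIFT;
	return total;
}

/* Pass 2: shrink each descriptor to whole granules (per descriptor). */
static void trim_to_granules(struct memdesc *map, size_t n)
{
	assert(map != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		uint64_t end = map[i].phys_addr +
			       (map[i].num_pages << EFI_PAGE_SHIFT);
		uint64_t s = (map[i].phys_addr + GRANULE - 1) & ~(GRANULE - 1);
		uint64_t e = end & ~(GRANULE - 1);

		if (e <= s) {
			map[i].num_pages = 0;	/* no complete granule */
			continue;
		}
		map[i].phys_addr = s;
		map[i].num_pages = (e - s) >> EFI_PAGE_SHIFT;
	}
}

/* Pass 3: apply the max_addr= and mem= limits. */
static void apply_limits(struct memdesc *map, size_t n,
			 uint64_t max_addr, uint64_t mem_limit)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t s = map[i].phys_addr;
		uint64_t e = s + (map[i].num_pages << EFI_PAGE_SHIFT);

		if (e > max_addr)
			e = max_addr;
		if (e <= s || total >= mem_limit) {
			map[i].num_pages = 0;
			continue;
		}
		if (e - s > mem_limit - total)
			e = s + (mem_limit - total);
		map[i].num_pages = (e - s) >> EFI_PAGE_SHIFT;
		total += map[i].num_pages << EFI_PAGE_SHIFT;
	}
}
```

The point of the split is that passes 2 and 3 can run on a working
copy, so the map saved before pass 2 remains available to hand to a
kexec'd kernel.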

-- 
Khalid

==================================
Khalid Aziz                                Linux and Open Source Lab
(970)898-9214                                        Hewlett-Packard
khalid_aziz@hp.com                                  Fort Collins, CO

"The Linux kernel is subject to relentless development" 
				- Alessandro Rubini




* Re: [Fastboot] [PATCH] kexec on ia64
  2004-11-15 20:41 Khalid Aziz
@ 2004-11-16  3:46 ` Khalid Aziz
  2006-04-05  0:36 ` Zou, Nanhai
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 29+ messages in thread
From: Khalid Aziz @ 2004-11-16  3:46 UTC (permalink / raw)
  To: linux-ia64

Another limitation I forgot to mention. I have not added support for
compressed kernel to kexec-tools yet. It is on my list of things to do
next.

--
Khalid

On Mon, 2004-11-15 at 13:32, Khalid Aziz wrote:
> I have been able to get kexec working on ia64. I am attaching the kernel
> patch and kexec-tools patch. For the kernel patch, start with 2.6.8
> kernel from kernel.org, apply ia64 patch
> <http://www.kernel.org/pub/linux/kernel/ports/ia64/v2.6/linux-2.6.8-ia64-040901.diff.bz2>, apply Eric's 2.6.8.1-kexec3 patch <http://www.xmission.com/~ebiederm/files/kexec/2.6.8.1-kexec3> and apply attached 2.6.8.1-kexec3-ia64.diff patch. For kexec-tools, apply attached kexec-tools-1.98-ia64.diff patch to Eric's kexec-tools 1.98 sources <http://www.xmission.com/~ebiederm/files/kexec/kexec-tools-1.98.tgz>.
> 
> At this point, I have done minimal testing. Here is what I know does not
> work currently:
> 
> 1. No support for initrd for kexec'd kernel
> 
> 2. No support for new kernel parameters for kexec'd kernel.
> 
> 3. If a kernel is booted up with "mem=" or "max_addr=" to restrict the
> amount of memory, a kernel kexec'd from this kernel will only see the
> same amount of memory as this one. This is not only due to the new
> kernel being kexec'd with the same parameter, but also because the EFI
> memory map as passed to the kernel by ELILO gets trimmed very early on
> by the kernel. I have tried adding code to save the memory map early on
> and then pass this saved memory map to kexec'd kernel, but apparently I
> still am not saving it early enough. I wait until bootmem allocator has
> been initialized so I can allocate memory to save unmolested EFI memory
> map in. In the process of initializing bootmem allocator, kernel calls
> efi_memmap_walk() which in turn trims the memory map. I am looking
> into allocating memory out of the EFI memory map before the first
> efi_memmap_walk() happens, so I can save pristine EFI memmap for use
> later by kexec.
> 
> Here is what I have not tested yet:
> 
> 1. I am not sure if ACPI subsystem is happy in kexec'd kernel. I have
> not seen any problems, but I have not tested it enough either.
> 
> 2. Stability of kexec'd kernel over long term. It ran fine for an hour
> not doing much :)
> 
> Here is what I am working on next:
> 
> 1. Save EFI memory map before it is trimmed.
> 
> 2. Fix up /proc/iomem on ia64 so we can enable validating memory range
> in kexec tools.
> 
> 3. Add a /proc interface to enable reboots on panic and INIT (and
> possibly MCA) to be kexec reboots.
> 
> 4. Add initrd support.
> 
> Any feedback on these patches is welcome. Any patch to fix problems in
> these patches is very much appreciated :)
-- 
Khalid Aziz                                 Linux and Open Source Lab
(970)898-9214                                         Hewlett-Packard
khalid_aziz@hp.com                                   Fort Collins, CO



* RE: [PATCH] kexec on ia64
  2004-11-15 20:32 [PATCH] kexec on ia64 Khalid Aziz
                   ` (2 preceding siblings ...)
  2004-11-15 22:14 ` Khalid Aziz
@ 2004-11-16 17:28 ` Khalid Aziz
  2005-10-25 22:52 ` Khalid Aziz
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Khalid Aziz @ 2004-11-16 17:28 UTC (permalink / raw)
  To: linux-ia64

I have noticed that on x86, trimming memory with "mem=" has no effect on
RAM reported by /proc/iomem. I assume we want the same behavior on ia64.
This would mean we definitely need to save an untrimmed EFI memory map.
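
One way to check this from user space is to add up the "System RAM"
entries in /proc/iomem and compare the sum before and after booting
with "mem=". The following is only a sketch: `parse_system_ram()` and
`total_system_ram()` are hypothetical helper names, and the code
assumes the usual "start-end : name" line layout with inclusive end
addresses.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one /proc/iomem line of the form "start-end : name".
 * Returns 1 and fills *start and *end (inclusive) if the entry is
 * named "System RAM", otherwise returns 0.
 */
static int parse_system_ram(const char *line, uint64_t *start, uint64_t *end)
{
	uint64_t s, e;
	int pos = -1;	/* stays -1 unless the " : " separator matched */

	assert(line != NULL && start != NULL && end != NULL);
	if (sscanf(line, "%" SCNx64 "-%" SCNx64 " : %n", &s, &e, &pos) != 2 ||
	    pos < 0)
		return 0;
	if (strncmp(line + pos, "System RAM", 10) != 0)
		return 0;
	*start = s;
	*end = e;
	return 1;
}

/* Sum the sizes of all "System RAM" entries read from f. */
static uint64_t total_system_ram(FILE *f)
{
	char line[256];
	uint64_t s, e, total = 0;

	while (fgets(line, sizeof(line), f) != NULL)
		if (parse_system_ram(line, &s, &e))
			total += e - s + 1;
	return total;
}
```

A caller would open /proc/iomem with fopen() and pass the stream to
total_system_ram(). If the sum stays the same with and without "mem=",
the file is reporting the untrimmed map.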

--
Khalid

On Mon, 2004-11-15 at 15:03, David Mosberger wrote:
> >>>>> On Mon, 15 Nov 2004 13:15:25 -0800, "Luck, Tony" <tony.luck@intel.com> said:
> 
> 
>   Tony> You can add your own new requirement to not modify the
>   Tony> original EFI tables so that they can be re-scanned by a new
>   Tony> kernel after kexec (new kernel might have a different granule
>   Tony> size).
> 
> That certainly would be the right way to go about it.  It would also
> make it less likely that something else might get confused when
> changing the memory map underneath it.
> 
> 	--david
-- 

==================================
Khalid Aziz                                Linux and Open Source Lab
(970)898-9214                                        Hewlett-Packard
khalid_aziz@hp.com                                  Fort Collins, CO

"The Linux kernel is subject to relentless development" 
				- Alessandro Rubini




* [PATCH] kexec on ia64
  2004-11-15 20:32 [PATCH] kexec on ia64 Khalid Aziz
                   ` (3 preceding siblings ...)
  2004-11-16 17:28 ` Khalid Aziz
@ 2005-10-25 22:52 ` Khalid Aziz
  2005-10-26 18:28 ` Gerald Pfeifer
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Khalid Aziz @ 2005-10-25 22:52 UTC (permalink / raw)
  To: linux-ia64

[-- Attachment #1: Type: text/plain, Size: 1005 bytes --]

I have ported the original patch I had done for kexec on ia64 on 2.6.8
kernel and fixed a few bugs in the original patch. Attached is a patch
for kernel 2.6.14-rc4. It works with normal kexec reboot on an HP
rx2600. I am now working on adding support for crash kexec. I am also
working on kexec on INIT which I currently have working on 2.6.10
kernel. I am porting it to 2.6.14-rc kernel.

Attached patch needs to be applied on top of iomem and efi_memmapwalk
patches already in ia64 test tree (these patches attached as well for
those who may need them). 

Signed-off-by: Khalid Aziz <khalid.aziz@hp.com>

-- 
Khalid

====================================================================
Khalid Aziz                       Open Source and Linux Organization
(970)898-9214                                        Hewlett-Packard
khalid.aziz@hp.com                                  Fort Collins, CO

"The Linux kernel is subject to relentless development" 
                                - Alessandro Rubini

[-- Attachment #2: iomem-2.6.14-rc4.patch --]
[-- Type: text/x-patch, Size: 3518 bytes --]

--- a/arch/ia64/kernel/efi.c
+++ b/arch/ia64/kernel/efi.c
@@ -923,3 +923,90 @@ efi_memmap_init(unsigned long *s, unsign
 	*s = (u64)kern_memmap;
 	*e = (u64)++k;
 }
+
+void
+efi_initialize_iomem_resources(struct resource *code_resource,
+			       struct resource *data_resource)
+{
+	struct resource *res;
+	void *efi_map_start, *efi_map_end, *p;
+	efi_memory_desc_t *md;
+	u64 efi_desc_size;
+	char *name;
+	unsigned long flags;
+
+	efi_map_start = __va(ia64_boot_param->efi_memmap);
+	efi_map_end   = efi_map_start + ia64_boot_param->efi_memmap_size;
+	efi_desc_size = ia64_boot_param->efi_memdesc_size;
+
+	res = NULL;
+
+	for (p = efi_map_start; p < efi_map_end; p += efi_desc_size) {
+		md = p;
+
+		if (md->num_pages == 0) /* should not happen */
+			continue;
+
+		flags = IORESOURCE_MEM;
+		switch (md->type) {
+
+			case EFI_MEMORY_MAPPED_IO:
+			case EFI_MEMORY_MAPPED_IO_PORT_SPACE:
+				continue;
+
+			case EFI_LOADER_CODE:
+			case EFI_LOADER_DATA:
+			case EFI_BOOT_SERVICES_DATA:
+			case EFI_BOOT_SERVICES_CODE:
+			case EFI_CONVENTIONAL_MEMORY:
+				if (md->attribute & EFI_MEMORY_WP) {
+					name = "System ROM";
+					flags |= IORESOURCE_READONLY;
+				} else {
+					name = "System RAM";
+				}
+				break;
+
+			case EFI_ACPI_MEMORY_NVS:
+				name = "ACPI Non-volatile Storage";
+				flags |= IORESOURCE_BUSY;
+				break;
+
+			case EFI_UNUSABLE_MEMORY:
+				name = "reserved";
+				flags |= IORESOURCE_BUSY | IORESOURCE_DISABLED;
+				break;
+
+			case EFI_RESERVED_TYPE:
+			case EFI_RUNTIME_SERVICES_CODE:
+			case EFI_RUNTIME_SERVICES_DATA:
+			case EFI_ACPI_RECLAIM_MEMORY:
+			default:
+				name = "reserved";
+				flags |= IORESOURCE_BUSY;
+				break;
+		}
+
+		if ((res = kcalloc(1, sizeof(struct resource), GFP_KERNEL)) == NULL) {
+			printk(KERN_ERR "failed to allocate resource for iomem\n");
+			return;
+		}
+
+		res->name = name;
+		res->start = md->phys_addr;
+		res->end = md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT) - 1;
+		res->flags = flags;
+
+		if (insert_resource(&iomem_resource, res) < 0)
+			kfree(res);
+		else {
+			/*
+			 * We don't know which region contains
+			 * kernel data so we try it repeatedly and
+			 * let the resource manager test it.
+			 */
+			insert_resource(res, code_resource);
+			insert_resource(res, data_resource);
+		}
+	}
+}
--- a/arch/ia64/kernel/setup.c
+++ b/arch/ia64/kernel/setup.c
@@ -78,6 +78,19 @@ struct screen_info screen_info;
 unsigned long vga_console_iobase;
 unsigned long vga_console_membase;
 
+static struct resource data_resource = {
+	.name	= "Kernel data",
+	.flags	= IORESOURCE_BUSY | IORESOURCE_MEM
+};
+
+static struct resource code_resource = {
+	.name	= "Kernel code",
+	.flags	= IORESOURCE_BUSY | IORESOURCE_MEM
+};
+extern void efi_initialize_iomem_resources(struct resource *,
+		struct resource *);
+extern char _text[], _edata[], _etext[];
+
 unsigned long ia64_max_cacheline_size;
 unsigned long ia64_iobase;	/* virtual address for I/O accesses */
 EXPORT_SYMBOL(ia64_iobase);
@@ -171,6 +184,22 @@ sort_regions (struct rsvd_region *rsvd_r
 	}
 }
 
+/*
+ * Request address space for all standard resources
+ */
+static int __init register_memory(void)
+{
+	code_resource.start = ia64_tpa(_text);
+	code_resource.end   = ia64_tpa(_etext) - 1;
+	data_resource.start = ia64_tpa(_etext);
+	data_resource.end   = ia64_tpa(_edata) - 1;
+	efi_initialize_iomem_resources(&code_resource, &data_resource);
+
+	return 0;
+}
+
+__initcall(register_memory);
+
 /**
  * reserve_memory - setup reserved memory areas
  *

[-- Attachment #3: efi_memmapwalk-2.6.14-rc4.patch --]
[-- Type: text/x-patch, Size: 15529 bytes --]

--- a/arch/ia64/kernel/efi.c
+++ b/arch/ia64/kernel/efi.c
@@ -239,57 +239,30 @@ is_available_memory (efi_memory_desc_t *
 	return 0;
 }
 
-/*
- * Trim descriptor MD so its starts at address START_ADDR.  If the descriptor covers
- * memory that is normally available to the kernel, issue a warning that some memory
- * is being ignored.
- */
-static void
-trim_bottom (efi_memory_desc_t *md, u64 start_addr)
-{
-	u64 num_skipped_pages;
+typedef struct kern_memdesc {
+	u64 attribute;
+	u64 start;
+	u64 num_pages;
+} kern_memdesc_t;
 
-	if (md->phys_addr >= start_addr || !md->num_pages)
-		return;
-
-	num_skipped_pages = (start_addr - md->phys_addr) >> EFI_PAGE_SHIFT;
-	if (num_skipped_pages > md->num_pages)
-		num_skipped_pages = md->num_pages;
-
-	if (is_available_memory(md))
-		printk(KERN_NOTICE "efi.%s: ignoring %luKB of memory at 0x%lx due to granule hole "
-		       "at 0x%lx\n", __FUNCTION__,
-		       (num_skipped_pages << EFI_PAGE_SHIFT) >> 10,
-		       md->phys_addr, start_addr - IA64_GRANULE_SIZE);
-	/*
-	 * NOTE: Don't set md->phys_addr to START_ADDR because that could cause the memory
-	 * descriptor list to become unsorted.  In such a case, md->num_pages will be
-	 * zero, so the Right Thing will happen.
-	 */
-	md->phys_addr += num_skipped_pages << EFI_PAGE_SHIFT;
-	md->num_pages -= num_skipped_pages;
-}
+static kern_memdesc_t *kern_memmap;
 
 static void
-trim_top (efi_memory_desc_t *md, u64 end_addr)
+walk (efi_freemem_callback_t callback, void *arg, u64 attr)
 {
-	u64 num_dropped_pages, md_end_addr;
+	kern_memdesc_t *k;
+	u64 start, end, voff;
 
-	md_end_addr = md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT);
-
-	if (md_end_addr <= end_addr || !md->num_pages)
-		return;
-
-	num_dropped_pages = (md_end_addr - end_addr) >> EFI_PAGE_SHIFT;
-	if (num_dropped_pages > md->num_pages)
-		num_dropped_pages = md->num_pages;
-
-	if (is_available_memory(md))
-		printk(KERN_NOTICE "efi.%s: ignoring %luKB of memory at 0x%lx due to granule hole "
-		       "at 0x%lx\n", __FUNCTION__,
-		       (num_dropped_pages << EFI_PAGE_SHIFT) >> 10,
-		       md->phys_addr, end_addr);
-	md->num_pages -= num_dropped_pages;
+	voff = (attr == EFI_MEMORY_WB) ? PAGE_OFFSET : __IA64_UNCACHED_OFFSET;
+	for (k = kern_memmap; k->start != ~0UL; k++) {
+		if (k->attribute != attr)
+			continue;
+		start = PAGE_ALIGN(k->start);
+		end = (k->start + (k->num_pages << EFI_PAGE_SHIFT)) & PAGE_MASK;
+		if (start < end)
+			if ((*callback)(start + voff, end + voff, arg) < 0)
+				return;
+	}
 }
 
 /*
@@ -299,148 +272,19 @@ trim_top (efi_memory_desc_t *md, u64 end
 void
 efi_memmap_walk (efi_freemem_callback_t callback, void *arg)
 {
-	int prev_valid = 0;
-	struct range {
-		u64 start;
-		u64 end;
-	} prev, curr;
-	void *efi_map_start, *efi_map_end, *p, *q;
-	efi_memory_desc_t *md, *check_md;
-	u64 efi_desc_size, start, end, granule_addr, last_granule_addr, first_non_wb_addr = 0;
-	unsigned long total_mem = 0;
-
-	efi_map_start = __va(ia64_boot_param->efi_memmap);
-	efi_map_end   = efi_map_start + ia64_boot_param->efi_memmap_size;
-	efi_desc_size = ia64_boot_param->efi_memdesc_size;
-
-	for (p = efi_map_start; p < efi_map_end; p += efi_desc_size) {
-		md = p;
-
-		/* skip over non-WB memory descriptors; that's all we're interested in... */
-		if (!(md->attribute & EFI_MEMORY_WB))
-			continue;
-
-		/*
-		 * granule_addr is the base of md's first granule.
-		 * [granule_addr - first_non_wb_addr) is guaranteed to
-		 * be contiguous WB memory.
-		 */
-		granule_addr = GRANULEROUNDDOWN(md->phys_addr);
-		first_non_wb_addr = max(first_non_wb_addr, granule_addr);
-
-		if (first_non_wb_addr < md->phys_addr) {
-			trim_bottom(md, granule_addr + IA64_GRANULE_SIZE);
-			granule_addr = GRANULEROUNDDOWN(md->phys_addr);
-			first_non_wb_addr = max(first_non_wb_addr, granule_addr);
-		}
-
-		for (q = p; q < efi_map_end; q += efi_desc_size) {
-			check_md = q;
-
-			if ((check_md->attribute & EFI_MEMORY_WB) &&
-			    (check_md->phys_addr == first_non_wb_addr))
-				first_non_wb_addr += check_md->num_pages << EFI_PAGE_SHIFT;
-			else
-				break;		/* non-WB or hole */
-		}
-
-		last_granule_addr = GRANULEROUNDDOWN(first_non_wb_addr);
-		if (last_granule_addr < md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT))
-			trim_top(md, last_granule_addr);
-
-		if (is_available_memory(md)) {
-			if (md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT) >= max_addr) {
-				if (md->phys_addr >= max_addr)
-					continue;
-				md->num_pages = (max_addr - md->phys_addr) >> EFI_PAGE_SHIFT;
-				first_non_wb_addr = max_addr;
-			}
-
-			if (total_mem >= mem_limit)
-				continue;
-
-			if (total_mem + (md->num_pages << EFI_PAGE_SHIFT) > mem_limit) {
-				unsigned long limit_addr = md->phys_addr;
-
-				limit_addr += mem_limit - total_mem;
-				limit_addr = GRANULEROUNDDOWN(limit_addr);
-
-				if (md->phys_addr > limit_addr)
-					continue;
-
-				md->num_pages = (limit_addr - md->phys_addr) >>
-				                EFI_PAGE_SHIFT;
-				first_non_wb_addr = max_addr = md->phys_addr +
-				              (md->num_pages << EFI_PAGE_SHIFT);
-			}
-			total_mem += (md->num_pages << EFI_PAGE_SHIFT);
-
-			if (md->num_pages == 0)
-				continue;
-
-			curr.start = PAGE_OFFSET + md->phys_addr;
-			curr.end   = curr.start + (md->num_pages << EFI_PAGE_SHIFT);
-
-			if (!prev_valid) {
-				prev = curr;
-				prev_valid = 1;
-			} else {
-				if (curr.start < prev.start)
-					printk(KERN_ERR "Oops: EFI memory table not ordered!\n");
-
-				if (prev.end == curr.start) {
-					/* merge two consecutive memory ranges */
-					prev.end = curr.end;
-				} else {
-					start = PAGE_ALIGN(prev.start);
-					end = prev.end & PAGE_MASK;
-					if ((end > start) && (*callback)(start, end, arg) < 0)
-						return;
-					prev = curr;
-				}
-			}
-		}
-	}
-	if (prev_valid) {
-		start = PAGE_ALIGN(prev.start);
-		end = prev.end & PAGE_MASK;
-		if (end > start)
-			(*callback)(start, end, arg);
-	}
+	walk(callback, arg, EFI_MEMORY_WB);
 }
 
 /*
- * Walk the EFI memory map to pull out leftover pages in the lower
- * memory regions which do not end up in the regular memory map and
- * stick them into the uncached allocator
- *
- * The regular walk function is significantly more complex than the
- * uncached walk which means it really doesn't make sense to try and
- * marge the two.
+ * Walks the EFI memory map and calls CALLBACK once for each EFI memory descriptor that
+ * has memory that is available for uncached allocator.
  */
-void __init
-efi_memmap_walk_uc (efi_freemem_callback_t callback)
+void
+efi_memmap_walk_uc (efi_freemem_callback_t callback, void *arg)
 {
-	void *efi_map_start, *efi_map_end, *p;
-	efi_memory_desc_t *md;
-	u64 efi_desc_size, start, end;
-
-	efi_map_start = __va(ia64_boot_param->efi_memmap);
-	efi_map_end = efi_map_start + ia64_boot_param->efi_memmap_size;
-	efi_desc_size = ia64_boot_param->efi_memdesc_size;
-
-	for (p = efi_map_start; p < efi_map_end; p += efi_desc_size) {
-		md = p;
-		if (md->attribute == EFI_MEMORY_UC) {
-			start = PAGE_ALIGN(md->phys_addr);
-			end = PAGE_ALIGN((md->phys_addr+(md->num_pages << EFI_PAGE_SHIFT)) & PAGE_MASK);
-			if ((*callback)(start, end, NULL) < 0)
-				return;
-		}
-	}
+	walk(callback, arg, EFI_MEMORY_UC);
 }
 
-
 /*
  * Look for the PAL_CODE region reported by EFI and maps it using an
  * ITR to enable safe PAL calls in virtual mode.  See IA-64 Processor
@@ -862,3 +706,220 @@ efi_uart_console_only(void)
 	printk(KERN_ERR "Malformed %s value\n", name);
 	return 0;
 }
+
+#define efi_md_size(md)	(md->num_pages << EFI_PAGE_SHIFT)
+
+static inline u64
+kmd_end(kern_memdesc_t *kmd)
+{
+	return (kmd->start + (kmd->num_pages << EFI_PAGE_SHIFT));
+}
+
+static inline u64
+efi_md_end(efi_memory_desc_t *md)
+{
+	return (md->phys_addr + efi_md_size(md));
+}
+
+static inline int
+efi_wb(efi_memory_desc_t *md)
+{
+	return (md->attribute & EFI_MEMORY_WB);
+}
+
+static inline int
+efi_uc(efi_memory_desc_t *md)
+{
+	return (md->attribute & EFI_MEMORY_UC);
+}
+
+/*
+ * Look for the first granule-aligned memory descriptor
+ * that is big enough to hold the EFI memory map. Make sure this
+ * descriptor is at least granule sized so it does not get trimmed.
+ */
+struct kern_memdesc *
+find_memmap_space (void)
+{
+	u64	contig_low=0, contig_high=0;
+	u64	as = 0, ae;
+	void *efi_map_start, *efi_map_end, *p, *q;
+	efi_memory_desc_t *md, *pmd = NULL, *check_md;
+	u64	space_needed, efi_desc_size;
+	unsigned long total_mem = 0;
+
+	efi_map_start = __va(ia64_boot_param->efi_memmap);
+	efi_map_end   = efi_map_start + ia64_boot_param->efi_memmap_size;
+	efi_desc_size = ia64_boot_param->efi_memdesc_size;
+
+	/*
+	 * Worst case: we need 3 kernel descriptors for each efi descriptor
+	 * (if every entry has a WB part in the middle, and UC head and tail),
+	 * plus one for the end marker.
+	 */
+	space_needed = sizeof(kern_memdesc_t) *
+		(3 * (ia64_boot_param->efi_memmap_size/efi_desc_size) + 1);
+
+	for (p = efi_map_start; p < efi_map_end; pmd = md, p += efi_desc_size) {
+		md = p;
+		if (!efi_wb(md)) {
+			continue;
+		}
+		if (pmd == NULL || !efi_wb(pmd) || efi_md_end(pmd) != md->phys_addr) {
+			contig_low = GRANULEROUNDUP(md->phys_addr);
+			contig_high = efi_md_end(md);
+			for (q = p + efi_desc_size; q < efi_map_end; q += efi_desc_size) {
+				check_md = q;
+				if (!efi_wb(check_md))
+					break;
+				if (contig_high != check_md->phys_addr)
+					break;
+				contig_high = efi_md_end(check_md);
+			}
+			contig_high = GRANULEROUNDDOWN(contig_high);
+		}
+		if (!is_available_memory(md) || md->type == EFI_LOADER_DATA)
+			continue;
+
+		/* Round ends inward to granule boundaries */
+		as = max(contig_low, md->phys_addr);
+		ae = min(contig_high, efi_md_end(md));
+
+		/* keep within max_addr= command line arg */
+		ae = min(ae, max_addr);
+		if (ae <= as)
+			continue;
+
+		/* avoid going over mem= command line arg */
+		if (total_mem + (ae - as) > mem_limit)
+			ae -= total_mem + (ae - as) - mem_limit;
+
+		if (ae <= as)
+			continue;
+
+		if (ae - as > space_needed)
+			break;
+	}
+	if (p >= efi_map_end)
+		panic("Can't allocate space for kernel memory descriptors");
+
+	return __va(as);
+}
+
+/*
+ * Walk the EFI memory map and gather all memory available for kernel
+ * to use.  We can allocate partial granules only if the unavailable
+ * parts exist, and are WB.
+ */
+void
+efi_memmap_init(unsigned long *s, unsigned long *e)
+{
+	struct kern_memdesc *k, *prev = 0;
+	u64	contig_low=0, contig_high=0;
+	u64	as, ae, lim;
+	void *efi_map_start, *efi_map_end, *p, *q;
+	efi_memory_desc_t *md, *pmd = NULL, *check_md;
+	u64	efi_desc_size;
+	unsigned long total_mem = 0;
+
+	k = kern_memmap = find_memmap_space();
+
+	efi_map_start = __va(ia64_boot_param->efi_memmap);
+	efi_map_end   = efi_map_start + ia64_boot_param->efi_memmap_size;
+	efi_desc_size = ia64_boot_param->efi_memdesc_size;
+
+	for (p = efi_map_start; p < efi_map_end; pmd = md, p += efi_desc_size) {
+		md = p;
+		if (!efi_wb(md)) {
+			if (efi_uc(md) && (md->type == EFI_CONVENTIONAL_MEMORY ||
+				    	   md->type == EFI_BOOT_SERVICES_DATA)) {
+				k->attribute = EFI_MEMORY_UC;
+				k->start = md->phys_addr;
+				k->num_pages = md->num_pages;
+				k++;
+			}
+			continue;
+		}
+		if (pmd == NULL || !efi_wb(pmd) || efi_md_end(pmd) != md->phys_addr) {
+			contig_low = GRANULEROUNDUP(md->phys_addr);
+			contig_high = efi_md_end(md);
+			for (q = p + efi_desc_size; q < efi_map_end; q += efi_desc_size) {
+				check_md = q;
+				if (!efi_wb(check_md))
+					break;
+				if (contig_high != check_md->phys_addr)
+					break;
+				contig_high = efi_md_end(check_md);
+			}
+			contig_high = GRANULEROUNDDOWN(contig_high);
+		}
+		if (!is_available_memory(md))
+			continue;
+
+		/*
+		 * Round ends inward to granule boundaries
+		 * Give trimmings to uncached allocator
+		 */
+		if (md->phys_addr < contig_low) {
+			lim = min(efi_md_end(md), contig_low);
+			if (efi_uc(md)) {
+				if (k > kern_memmap && (k-1)->attribute == EFI_MEMORY_UC &&
+				    kmd_end(k-1) == md->phys_addr) {
+					(k-1)->num_pages += (lim - md->phys_addr) >> EFI_PAGE_SHIFT;
+				} else {
+					k->attribute = EFI_MEMORY_UC;
+					k->start = md->phys_addr;
+					k->num_pages = (lim - md->phys_addr) >> EFI_PAGE_SHIFT;
+					k++;
+				}
+			}
+			as = contig_low;
+		} else
+			as = md->phys_addr;
+
+		if (efi_md_end(md) > contig_high) {
+			lim = max(md->phys_addr, contig_high);
+			if (efi_uc(md)) {
+				if (lim == md->phys_addr && k > kern_memmap &&
+				    (k-1)->attribute == EFI_MEMORY_UC &&
+				    kmd_end(k-1) == md->phys_addr) {
+					(k-1)->num_pages += md->num_pages;
+				} else {
+					k->attribute = EFI_MEMORY_UC;
+					k->start = lim;
+					k->num_pages = (efi_md_end(md) - lim) >> EFI_PAGE_SHIFT;
+					k++;
+				}
+			}
+			ae = contig_high;
+		} else
+			ae = efi_md_end(md);
+
+		/* keep within max_addr= command line arg */
+		ae = min(ae, max_addr);
+		if (ae <= as)
+			continue;
+
+		/* avoid going over mem= command line arg */
+		if (total_mem + (ae - as) > mem_limit)
+			ae -= total_mem + (ae - as) - mem_limit;
+
+		if (ae <= as)
+			continue;
+		if (prev && kmd_end(prev) == md->phys_addr) {
+			prev->num_pages += (ae - as) >> EFI_PAGE_SHIFT;
+			total_mem += ae - as;
+			continue;
+		}
+		k->attribute = EFI_MEMORY_WB;
+		k->start = as;
+		k->num_pages = (ae - as) >> EFI_PAGE_SHIFT;
+		total_mem += ae - as;
+		prev = k++;
+	}
+	k->start = ~0L; /* end-marker */
+
+	/* reserve the memory we are using for kern_memmap */
+	*s = (u64)kern_memmap;
+	*e = (u64)++k;
+}
--- a/arch/ia64/kernel/setup.c
+++ b/arch/ia64/kernel/setup.c
@@ -211,6 +211,9 @@ reserve_memory (void)
 	}
 #endif
 
+	efi_memmap_init(&rsvd_region[n].start, &rsvd_region[n].end);
+	n++;
+
 	/* end of memory marker */
 	rsvd_region[n].start = ~0UL;
 	rsvd_region[n].end   = ~0UL;
--- a/arch/ia64/kernel/uncached.c
+++ b/arch/ia64/kernel/uncached.c
@@ -205,23 +205,18 @@ EXPORT_SYMBOL(uncached_free_page);
 static int __init
 uncached_build_memmap(unsigned long start, unsigned long end, void *arg)
 {
-	long length;
-	unsigned long vstart, vend;
+	long length = end - start;
 	int node;
 
-	length = end - start;
-	vstart = start + __IA64_UNCACHED_OFFSET;
-	vend = end + __IA64_UNCACHED_OFFSET;
-
 	dprintk(KERN_ERR "uncached_build_memmap(%lx %lx)\n", start, end);
 
-	memset((char *)vstart, 0, length);
+	memset((char *)start, 0, length);
 
-	node = paddr_to_nid(start);
+	node = paddr_to_nid(start - __IA64_UNCACHED_OFFSET);
 
-	for (; vstart < vend ; vstart += PAGE_SIZE) {
-		dprintk(KERN_INFO "sticking %lx into the pool!\n", vstart);
-		gen_pool_free(uncached_pool[node], vstart, PAGE_SIZE);
+	for (; start < end ; start += PAGE_SIZE) {
+		dprintk(KERN_INFO "sticking %lx into the pool!\n", start);
+		gen_pool_free(uncached_pool[node], start, PAGE_SIZE);
 	}
 
 	return 0;
--- a/include/asm-ia64/meminit.h
+++ b/include/asm-ia64/meminit.h
@@ -16,10 +16,11 @@
  * 	- initrd (optional)
  * 	- command line string
  * 	- kernel code & data
+ * 	- Kernel memory map built from EFI memory map
  *
  * More could be added if necessary
  */
-#define IA64_MAX_RSVD_REGIONS 5
+#define IA64_MAX_RSVD_REGIONS 6
 
 struct rsvd_region {
 	unsigned long start;	/* virtual address of beginning of element */
@@ -33,6 +34,7 @@ extern void find_memory (void);
 extern void reserve_memory (void);
 extern void find_initrd (void);
 extern int filter_rsvd_memory (unsigned long start, unsigned long end, void *arg);
+extern void efi_memmap_init(unsigned long *, unsigned long *);
 
 /*
  * For rounding an address to the next IA64_GRANULE_SIZE or order
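
For anyone following the mem= and max_addr= handling in efi_memmap_init() above, here is a minimal user-space sketch of just the clamping step. The helper name clamp_range() and its parameter list are made up for illustration and are not part of the patch; only the arithmetic is taken from the two command-line checks in the loop.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): clamp the usable range [as, ae)
 * the way the loop in efi_memmap_init() does for max_addr= and mem=.
 * Returns the number of bytes accepted, or 0 if the range is dropped.
 */
static uint64_t clamp_range(uint64_t as, uint64_t ae,
                            uint64_t max_addr, uint64_t mem_limit,
                            uint64_t total_mem)
{
    /* keep within max_addr= command line arg */
    if (ae > max_addr)
        ae = max_addr;
    if (ae <= as)
        return 0;

    /* avoid going over mem= command line arg */
    if (total_mem + (ae - as) > mem_limit)
        ae -= total_mem + (ae - as) - mem_limit;
    if (ae <= as)
        return 0;

    return ae - as;
}
```

A caller would add the returned size to its running total, as the patch does with total_mem, so that later ranges see a shrinking mem= budget.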

[-- Attachment #4: kexec-ia64-2.6.14-rc4.patch --]
[-- Type: text/x-patch, Size: 26143 bytes --]

diff -urNp linux-2.6.14-rc4/arch/ia64/hp/common/sba_iommu.c linux-2.6.14-rc4-kexec-ia64/arch/ia64/hp/common/sba_iommu.c
--- linux-2.6.14-rc4/arch/ia64/hp/common/sba_iommu.c	2005-08-28 17:41:01.000000000 -0600
+++ linux-2.6.14-rc4-kexec-ia64/arch/ia64/hp/common/sba_iommu.c	2005-10-24 09:18:19.000000000 -0600
@@ -1624,6 +1624,28 @@ ioc_iova_init(struct ioc *ioc)
 	READ_REG(ioc->ioc_hpa + IOC_IBASE);
 }
 
+#ifdef CONFIG_KEXEC
+void
+ioc_iova_disable(void)
+{
+	struct ioc *ioc;
+
+	ioc = ioc_list;
+
+	while (ioc != NULL) {
+		/* Disable IOVA translation */
+		WRITE_REG(ioc->ibase & 0xfffffffffffffffe, ioc->ioc_hpa + IOC_IBASE);
+		READ_REG(ioc->ioc_hpa + IOC_IBASE);
+
+		/* Clear I/O TLB of any possible entries */
+		WRITE_REG(ioc->ibase | (get_iovp_order(ioc->iov_size) + iovp_shift), ioc->ioc_hpa + IOC_PCOM);
+		READ_REG(ioc->ioc_hpa + IOC_PCOM);
+
+		ioc = ioc->next;
+	}
+}
+#endif
+
 static void __init
 ioc_resource_init(struct ioc *ioc)
 {
diff -urNp linux-2.6.14-rc4/arch/ia64/Kconfig linux-2.6.14-rc4-kexec-ia64/arch/ia64/Kconfig
--- linux-2.6.14-rc4/arch/ia64/Kconfig	2005-10-19 09:04:33.000000000 -0600
+++ linux-2.6.14-rc4-kexec-ia64/arch/ia64/Kconfig	2005-10-24 09:18:19.000000000 -0600
@@ -323,6 +323,23 @@ config PERFMON
 	  little bigger and slows down execution a bit, but it is generally
 	  a good idea to turn this on.  If you're unsure, say Y.
 
+config KEXEC
+	bool "kexec system call (EXPERIMENTAL)"
+	depends on EXPERIMENTAL
+	help
+         kexec is a system call that implements the ability to shut down your
+         current kernel, and to start another kernel.  It is like a reboot
+         but it is independent of the system firmware.   And like a reboot
+         you can start any kernel with it, not just Linux.  
+       
+         The name comes from the similarity to the exec system call.
+       
+         It is an ongoing process to be certain the hardware in a machine
+         is properly shut down, so do not be surprised if this code does not
+         initially work for you.  It may help to enable device hotplugging
+         support.  As of this writing the exact hardware interface is
+         strongly in flux, so no good recommendation can be made.
+
 config IA64_PALINFO
 	tristate "/proc/pal support"
 	help
diff -urNp linux-2.6.14-rc4/arch/ia64/kernel/crash.c linux-2.6.14-rc4-kexec-ia64/arch/ia64/kernel/crash.c
--- linux-2.6.14-rc4/arch/ia64/kernel/crash.c	1969-12-31 17:00:00.000000000 -0700
+++ linux-2.6.14-rc4-kexec-ia64/arch/ia64/kernel/crash.c	2005-10-24 11:06:50.000000000 -0600
@@ -0,0 +1,44 @@
+/*
+ * Architecture specific (ia64) functions for kexec based crash dumps.
+ *
+ * Created by: Khalid Aziz (khalid.aziz@hp.com)
+ *
+ * Copyright (C) Hewlett Packard, 2005. All rights reserved.
+ *
+ */
+#include <linux/init.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/smp.h>
+#include <linux/irq.h>
+#include <linux/reboot.h>
+#include <linux/kexec.h>
+#include <linux/irq.h>
+#include <linux/delay.h>
+#include <linux/elf.h>
+#include <linux/elfcore.h>
+
+note_buf_t crash_notes[NR_CPUS];
+
+void
+machine_crash_shutdown(struct pt_regs *pt)
+{
+	extern void terminate_irqs(void);
+
+	/* This function is only called after the system
+	 * has panicked or is otherwise in a critical state.
+	 * The minimum amount of code to allow a kexec'd kernel
+	 * to run successfully needs to happen here.
+	 *
+	 * In practice this means shooting down the other cpus in
+	 * an SMP system.
+	 */
+	if (in_interrupt()) {
+		terminate_irqs();
+		ia64_eoi();
+	}
+	system_state = SYSTEM_RESTART;
+	device_shutdown();
+	system_state = SYSTEM_BOOTING;
+	machine_shutdown();
+}
diff -urNp linux-2.6.14-rc4/arch/ia64/kernel/efi.c linux-2.6.14-rc4-kexec-ia64/arch/ia64/kernel/efi.c
--- linux-2.6.14-rc4/arch/ia64/kernel/efi.c	2005-10-20 16:44:30.000000000 -0600
+++ linux-2.6.14-rc4-kexec-ia64/arch/ia64/kernel/efi.c	2005-10-24 09:25:03.000000000 -0600
@@ -38,6 +38,9 @@
 extern efi_status_t efi_call_phys (void *, ...);
 
 struct efi efi;
+#ifdef CONFIG_KEXEC
+unsigned long kexec_reboot = 0;
+#endif
 EXPORT_SYMBOL(efi);
 static efi_runtime_services_t *runtime;
 static unsigned long mem_limit = ~0UL, max_addr = ~0UL;
@@ -526,6 +529,9 @@ efi_map_pal_code (void)
 	 * Cannot write to CRx with PSR.ic=1
 	 */
 	psr = ia64_clear_ic();
+#ifdef CONFIG_KEXEC
+	ia64_ptr(0x01, GRANULEROUNDDOWN((unsigned long) pal_vaddr), IA64_GRANULE_SHIFT);
+#endif
 	ia64_itr(0x1, IA64_TR_PALCODE, GRANULEROUNDDOWN((unsigned long) pal_vaddr),
 		 pte_val(pfn_pte(__pa(pal_vaddr) >> PAGE_SHIFT, PAGE_KERNEL)),
 		 IA64_GRANULE_SHIFT);
@@ -549,15 +555,22 @@ efi_init (void)
 		if (memcmp(cp, "mem=", 4) == 0) {
 			cp += 4;
 			mem_limit = memparse(cp, &end);
-			if (end != cp)
-				break;
 			cp = end;
+			while (*cp == ' ')
+				++cp;
 		} else if (memcmp(cp, "max_addr=", 9) == 0) {
 			cp += 9;
 			max_addr = GRANULEROUNDDOWN(memparse(cp, &end));
-			if (end != cp)
-				break;
 			cp = end;
+			while (*cp == ' ')
+				++cp;
+#ifdef CONFIG_KEXEC
+		} else if (memcmp(cp, "kexec_reboot", 12) == 0) {
+			cp += 12;
+			kexec_reboot = 1;
+			while (*cp == ' ')
+				++cp;
+#endif
 		} else {
 			while (*cp != ' ' && *cp)
 				++cp;
@@ -702,10 +715,17 @@ efi_enter_virtual_mode (void)
 		}
 	}
 
+#ifdef CONFIG_KEXEC
+	if (kexec_reboot == 0)
+#endif
 	status = efi_call_phys(__va(runtime->set_virtual_address_map),
 			       ia64_boot_param->efi_memmap_size,
 			       efi_desc_size, ia64_boot_param->efi_memdesc_version,
 			       ia64_boot_param->efi_memmap);
+#ifdef CONFIG_KEXEC
+	else
+		status = EFI_SUCCESS;
+#endif
 	if (status != EFI_SUCCESS) {
 		printk(KERN_WARNING "warning: unable to switch EFI into virtual mode "
 		       "(status=%lu)\n", status);
diff -urNp linux-2.6.14-rc4/arch/ia64/kernel/entry.S linux-2.6.14-rc4-kexec-ia64/arch/ia64/kernel/entry.S
--- linux-2.6.14-rc4/arch/ia64/kernel/entry.S	2005-10-19 09:04:34.000000000 -0600
+++ linux-2.6.14-rc4-kexec-ia64/arch/ia64/kernel/entry.S	2005-10-24 09:25:39.000000000 -0600
@@ -1588,7 +1588,7 @@ sys_call_table:
 	data8 sys_mq_timedreceive		// 1265
 	data8 sys_mq_notify
 	data8 sys_mq_getsetattr
-	data8 sys_ni_syscall			// reserved for kexec_load
+	data8 sys_kexec_load
 	data8 sys_ni_syscall			// reserved for vserver
 	data8 sys_waitid			// 1270
 	data8 sys_add_key
diff -urNp linux-2.6.14-rc4/arch/ia64/kernel/machine_kexec.c linux-2.6.14-rc4-kexec-ia64/arch/ia64/kernel/machine_kexec.c
--- linux-2.6.14-rc4/arch/ia64/kernel/machine_kexec.c	1969-12-31 17:00:00.000000000 -0700
+++ linux-2.6.14-rc4-kexec-ia64/arch/ia64/kernel/machine_kexec.c	2005-10-25 14:42:35.000000000 -0600
@@ -0,0 +1,224 @@
+/*
+ * machine_kexec.c - handle transition of Linux booting another kernel
+ * Copyright (C) 2002-2003 Eric Biederman  <ebiederm@xmission.com>
+ * Copyright (C) 2005 Khalid Aziz  <khalid.aziz@hp.com>
+ *
+ * This source code is licensed under the GNU General Public License,
+ * Version 2.  See the file COPYING for more details.
+ */
+
+#include <linux/kernel.h>
+#include <linux/config.h>
+#include <linux/mm.h>
+#include <linux/kexec.h>
+#include <linux/pci.h>
+#include <asm/mmu_context.h>
+#include <asm/setup.h>
+#include <asm/mca.h>
+#include <asm/page.h>
+#include <asm/bitops.h>
+#include <asm/tlbflush.h>
+
+DECLARE_PER_CPU(u64, ia64_mca_pal_base);
+
+unsigned int kexec_on_init = 0;
+extern unsigned long ia64_iobase;
+extern unsigned long kexec_reboot;
+extern void kexec_stop_this_cpu(void *);
+extern struct subsystem devices_subsys;
+
+static void set_io_base(void)
+{
+	unsigned long phys_iobase;
+
+	/* set kr0 to iobase */
+	phys_iobase = __pa(ia64_iobase);
+	ia64_set_kr(IA64_KR_IO_BASE, __IA64_UNCACHED_OFFSET | phys_iobase);
+};
+
+typedef void (*relocate_new_kernel_t)(
+	unsigned long indirection_page, unsigned long start_address, 
+	unsigned long boot_param_address);
+
+const extern unsigned long relocate_new_kernel[];
+const extern unsigned long kexec_fake_sal_rendez[];
+const extern unsigned int relocate_new_kernel_size;
+extern void use_mm(struct mm_struct *mm);
+extern void ioc_iova_disable(void);
+
+volatile extern long kexec_cont;
+volatile const extern unsigned char kexec_reloc[];
+volatile extern long kexec_rendez;
+volatile const extern unsigned char kexec_rendez_reloc[];
+volatile extern long kexec_ptcebase, kexec_count0, kexec_count1;
+volatile extern long kexec_stride0, kexec_stride1;
+volatile extern long kexec_pal_base;
+
+static void *kexec_boot_param;
+
+/*
+ * Do whatever setup is needed on the image and the
+ * reboot code buffer to allow us to avoid allocations
+ * later.
+ */
+int machine_kexec_prepare(struct kimage *image)
+{
+	void *control_code_buffer;
+	unsigned long cmdline_size;
+
+	/* 
+	 * We need to save the boot parameters in kernel pages.
+	 */
+	cmdline_size = (COMMAND_LINE_SIZE + PAGE_SIZE) & PAGE_MASK;
+	if (image->segment[0].bufsz > cmdline_size) {
+		printk(KERN_ERR "Not enough space to load kernel command line (%lu)\n", (unsigned long)image->segment[0].bufsz);
+		return -ENOMEM;
+	}
+	kexec_boot_param = kmalloc(cmdline_size, GFP_KERNEL);
+	if (kexec_boot_param == NULL) 
+		return -ENOMEM;
+	memset(kexec_boot_param, 0, cmdline_size);
+	memcpy(kexec_boot_param, image->segment[0].buf, 
+			image->segment[0].bufsz);
+	/* 
+	 * We do not want command line parameters loaded in memory later 
+	 * when kernel is relocated just before kexec. So zero out memory
+	 * size for command line param segment
+	 */
+	image->segment[0].memsz = 0;
+
+#if 0
+	/* Pre-load control code buffer in case of INIT */
+	control_code_buffer = ((unsigned long)phys_to_virt(page_to_pfn(image->control_code_page) << PAGE_SHIFT) & (unsigned long)0x1fffffffffffffffL) | __IA64_UNCACHED_OFFSET;
+	kexec_rendez = (long)(page_to_pfn(image->control_code_page) << PAGE_SHIFT) + (long)kexec_rendez_reloc -  (long)kexec_fake_sal_rendez;
+
+	/* copy it out */
+	memcpy((void *)control_code_buffer, kexec_fake_sal_rendez, relocate_new_kernel_size);
+#endif
+
+	return 0;
+}
+
+void machine_kexec_cleanup(struct kimage *image)
+{
+}
+
+void machine_shutdown(void)
+{
+	struct pci_dev *dev;
+	struct list_head *n;
+	u16 command;
+
+	/* Disable bus mastering on all PCI devices */
+	n = pci_devices.next;
+	while (n && (n != &pci_devices)) {
+		dev = pci_dev_g(n);
+		pci_read_config_word(dev, PCI_COMMAND, &command);
+		command &= ~PCI_COMMAND_MASTER;
+		pci_write_config_word(dev, PCI_COMMAND, command);
+		n = n->next;
+	}
+
+#ifdef CONFIG_SMP
+	int reboot_cpu_id;
+
+	/* The boot cpu is always logical cpu 0 */
+	reboot_cpu_id = 0;
+
+	/* Make certain the cpu I'm rebooting on is online */
+	if (!cpu_isset(reboot_cpu_id, cpu_online_map)) {
+		reboot_cpu_id = smp_processor_id();
+	}
+
+	/* Make certain I only run on the appropriate processor */
+	set_cpus_allowed(current, cpumask_of_cpu(reboot_cpu_id));
+#endif
+}
+
+/*
+ * Do not allocate memory (or fail in any way) in machine_kexec().
+ * We are past the point of no return, committed to rebooting now. 
+ */
+void machine_kexec(struct kimage *image)
+{
+	unsigned long indirection_page;
+	void *control_code_buffer;
+	relocate_new_kernel_t rnk;
+	unsigned char *cmdline;
+	int cpu;
+	unsigned long initrd_start, initrd_size;
+
+	control_code_buffer = (void *) (((unsigned long)phys_to_virt(page_to_pfn(image->control_code_page) << PAGE_SHIFT) & (unsigned long)0x1fffffffffffffffL) | __IA64_UNCACHED_OFFSET);
+	indirection_page = image->head & PAGE_MASK;
+
+	/* copy it out */
+	memcpy((void *)control_code_buffer, kexec_fake_sal_rendez, relocate_new_kernel_size);
+
+	/* Save PTCE data for cache flush later */
+	kexec_ptcebase	=  local_cpu_data->ptce_base;
+	kexec_count0	= local_cpu_data->ptce_count[0];
+	kexec_count1	= local_cpu_data->ptce_count[1];
+	kexec_stride0	= local_cpu_data->ptce_stride[0];
+	kexec_stride1	= local_cpu_data->ptce_stride[1];
+
+#ifdef CONFIG_SMP
+	kexec_rendez = (long)(page_to_pfn(image->control_code_page) << PAGE_SHIFT) + (long)kexec_rendez_reloc -  (long)kexec_fake_sal_rendez;
+	if (!kexec_on_init)
+		smp_call_function(kexec_stop_this_cpu, (void *)image->start, 0, 0);
+
+#endif
+	/* Interrupts aren't acceptable while we reboot */
+	local_irq_disable();
+
+	kexec_cont = (long)(page_to_pfn(image->control_code_page) << PAGE_SHIFT) + (long)kexec_reloc -  (long) kexec_fake_sal_rendez;
+
+	/* Save PAL mapping for TR flush later */
+	cpu = smp_processor_id();
+	kexec_pal_base = __get_cpu_var(ia64_mca_pal_base);
+
+	/* set kr0 to the appropriate address */
+	set_io_base();
+
+	/* Now execute the control code.
+	 * We start by executing the control code linked into the kernel,
+	 * as opposed to the copy in the control code buffer page. When
+	 * this code switches to physical mode, we start executing the
+	 * copy in the control code buffer page. The reason is that we
+	 * start code execution in virtual address space. If we were to
+	 * execute the newly copied code in virtual address space, we
+	 * would need to make an ITLB entry to avoid an ITLB miss. By
+	 * executing the code linked into the kernel, we take advantage
+	 * of the ITLB entry already in place for the kernel.
+	 */
+	control_code_buffer = (void *) relocate_new_kernel;
+	rnk = (relocate_new_kernel_t) &control_code_buffer;
+	if (strstr(kexec_boot_param, "kexec_reboot") == NULL)
+		strcat(kexec_boot_param, " kexec_reboot ");
+	cmdline = __va(ia64_boot_param->command_line);
+	strlcpy(cmdline, kexec_boot_param, COMMAND_LINE_SIZE);
+	initrd_start = image->segment[image->nr_segments-1].mem;
+	initrd_size = image->segment[image->nr_segments-1].memsz;
+	if (initrd_size != 0)
+		ia64_boot_param->initrd_start = initrd_start;
+	else
+		ia64_boot_param->initrd_start = 0UL;
+	ia64_boot_param->initrd_size = initrd_size;
+
+	{
+		unsigned long pta, impl_va_bits;
+
+#       define pte_bits                 3
+#       define vmlpt_bits               (impl_va_bits - PAGE_SHIFT + pte_bits)
+#       define POW2(n)                  (1ULL << (n))
+
+		/* Disable VHPT */
+		impl_va_bits = ffz(~(local_cpu_data->unimpl_va_mask | (7UL << 61)));
+		pta = POW2(61) - POW2(vmlpt_bits);
+		ia64_set_pta(pta | (0 << 8) | (vmlpt_bits << 2) | 0);
+	}
+
+#ifdef CONFIG_IA64_HP_ZX1
+	ioc_iova_disable();
+#endif
+	rnk(indirection_page, image->start, (unsigned long) ia64_boot_param);
+}
diff -urNp linux-2.6.14-rc4/arch/ia64/kernel/Makefile linux-2.6.14-rc4-kexec-ia64/arch/ia64/kernel/Makefile
--- linux-2.6.14-rc4/arch/ia64/kernel/Makefile	2005-10-19 09:04:34.000000000 -0600
+++ linux-2.6.14-rc4-kexec-ia64/arch/ia64/kernel/Makefile	2005-10-24 09:19:10.000000000 -0600
@@ -22,6 +22,7 @@ obj-$(CONFIG_PERFMON)		+= perfmon_defaul
 obj-$(CONFIG_IA64_CYCLONE)	+= cyclone.o
 obj-$(CONFIG_CPU_FREQ)		+= cpufreq/
 obj-$(CONFIG_IA64_MCA_RECOVERY)	+= mca_recovery.o
+obj-$(CONFIG_KEXEC)		+= machine_kexec.o relocate_kernel.o crash.o
 obj-$(CONFIG_KPROBES)		+= kprobes.o jprobes.o
 obj-$(CONFIG_IA64_UNCACHED_ALLOCATOR)	+= uncached.o
 mca_recovery-y			+= mca_drv.o mca_drv_asm.o
diff -urNp linux-2.6.14-rc4/arch/ia64/kernel/relocate_kernel.S linux-2.6.14-rc4-kexec-ia64/arch/ia64/kernel/relocate_kernel.S
--- linux-2.6.14-rc4/arch/ia64/kernel/relocate_kernel.S	1969-12-31 17:00:00.000000000 -0700
+++ linux-2.6.14-rc4-kexec-ia64/arch/ia64/kernel/relocate_kernel.S	2005-10-25 14:43:42.000000000 -0600
@@ -0,0 +1,385 @@
+/*
+ * relocate_kernel.S - Relocate kexec'able kernel and start it
+ * Copyright (C) 2005 Khalid Aziz  <khalid.aziz@hp.com>
+ *
+ * This source code is licensed under the GNU General Public License,
+ * Version 2.  See the file COPYING for more details.
+ */
+
+#include <linux/config.h>
+#include <asm/asmmacro.h>
+#include <asm/kregs.h>
+#include <asm/page.h>
+#include <asm/pgtable.h>
+
+       /* Must be relocatable PIC code callable as a C function, that once
+        * it starts cannot use the previous process's stack.
+        *
+        */
+       /* Q: Do I want to setup an interrupt vector, so what happens
+        * when exceptions occur is well defined?
+        */
+	.text
+	.align 32
+	.global kexec_fake_sal_rendez#
+	.proc kexec_fake_sal_rendez#
+kexec_fake_sal_rendez:
+	mf.a
+	;;
+	movl	r25=kexec_rendez
+	;;
+	ld8	r17=[r25]
+	{
+		flushrs
+		srlz.i
+	}
+	;;
+       /* See where I am running, and compute gp */
+	{
+		mov     ar.rsc = 0      /* Put RSE in enforced lazy, LE mode */
+		mov     gp = ip         /* gp == relocate_new_kernel */
+	}
+
+	movl r8=0x00000100000000
+	;;
+	mov cr.iva=r8
+	/* Transition from virtual to physical mode */
+	rsm	psr.i | psr.ic
+	srlz.i
+	movl	r16=(IA64_PSR_AC | IA64_PSR_BN | IA64_PSR_IC | IA64_PSR_MFL)
+	;;
+	mov	cr.ipsr=r16
+	;;
+	mov	cr.iip=r17
+	mov	cr.ifs=r0
+	;;
+	rfi
+	;;
+	.global kexec_rendez_reloc
+kexec_rendez_reloc:     /* Now we are in physical mode */
+
+	mov     b6=r32			/* _start addr */
+	mov	r8=r33			/* ap_wakeup_vector */
+	mov	r26=r34			/* PAL addr */
+	;;
+	/* Purge kernel TRs */
+	movl	r16=KERNEL_START
+	mov	r18=KERNEL_TR_PAGE_SHIFT<<2
+	;;
+	ptr.i	r16,r18
+	ptr.d	r16,r18
+	;;
+	srlz.i
+	;;
+	srlz.d
+	;;
+	/* Purge percpu TR */
+	movl	r16=PERCPU_ADDR
+	mov	r18=PERCPU_PAGE_SHIFT<<2
+	;;
+	ptr.d	r16,r18
+	;;
+	srlz.d
+	;;
+	/* Purge PAL TR */
+	mov	r18=IA64_GRANULE_SHIFT<<2
+	;;
+	ptr.i	r26,r18
+	;;
+	srlz.i
+	;;
+	/* Purge stack TR */
+	mov	r16=IA64_KR(CURRENT_STACK)
+	;;
+	shl	r16=r16,IA64_GRANULE_SHIFT
+	movl	r19=PAGE_OFFSET
+	;;
+	add	r16=r19,r16
+	mov	r18=IA64_GRANULE_SHIFT<<2
+	;;
+	ptr.d	r16,r18
+	;;
+	srlz.i
+	;;
+
+	/* Ensure we can read and clear external interrupts */
+	mov	cr.tpr=r0
+	srlz.d
+
+	shr.u	r9=r8,6			/* which irr */
+	;;
+	and	r8=63,r8		/* bit offset into irr */
+	;;
+	mov	r10=1;;
+	;;
+	shl	r10=r10,r8		/* bit mask of the irr bit we want */
+	cmp.eq	p6,p0=0,r9
+	;;
+(p6)	br.cond.sptk.few        check_irr0
+	cmp.eq	p7,p0=1,r9
+	;;
+(p7)	br.cond.sptk.few        check_irr1
+	cmp.eq	p8,p0=2,r9
+	;;
+(p8)	br.cond.sptk.few        check_irr2
+	cmp.eq	p9,p0=3,r9
+	;;
+(p9)	br.cond.sptk.few        check_irr3
+
+check_irr0:
+	mov	r8=cr.irr0
+	;;
+	and	r8=r8,r10
+	;;
+	cmp.eq	p6,p0=0,r8
+(p6)	br.cond.sptk.few	check_irr0
+	br.few	call_start
+	
+check_irr1:
+	mov	r8=cr.irr1
+	;;
+	and	r8=r8,r10
+	;;
+	cmp.eq	p6,p0=0,r8
+(p6)	br.cond.sptk.few	check_irr1
+	br.few	call_start
+	
+check_irr2:
+	mov	r8=cr.irr2
+	;;
+	and	r8=r8,r10
+	;;
+	cmp.eq	p6,p0=0,r8
+(p6)	br.cond.sptk.few	check_irr2
+	br.few	call_start
+	
+check_irr3:
+	mov	r8=cr.irr3
+	;;
+	and	r8=r8,r10
+	;;
+	cmp.eq	p6,p0=0,r8
+(p6)	br.cond.sptk.few	check_irr3
+	br.few	call_start
+	
+call_start:
+	mov	cr.eoi=r0
+	;;
+	srlz.d
+	;;
+	mov	r8=cr.ivr
+	;;
+	srlz.d
+	;;
+	cmp.eq	p0,p6=15,r8
+(p6)	br.cond.sptk.few	call_start
+	br.sptk.few		b6
+	.endp kexec_fake_sal_rendez#
+
+	.global relocate_new_kernel#
+	.proc relocate_new_kernel#
+relocate_new_kernel:
+	mf
+	;;
+	/* Save the ptce information for translation cache purge later */
+	movl	r25=kexec_cont
+	movl	r27=kexec_ptcebase
+	movl	r28=kexec_count0
+	;;
+	ld8	r17=[r25]
+	ld8	r22=[r27]
+	ld8	r20=[r28]
+	;;
+	movl	r25=kexec_count1
+	movl	r27=kexec_stride0
+	movl	r28=kexec_stride1
+	;;
+	ld8	r21=[r25]
+	ld8	r23=[r27]
+	ld8	r24=[r28]
+	;;
+	movl	r27=kexec_pal_base
+	;;
+	adds 	r25=48,r27
+	;;
+	ld8	r26=[r25]
+	;;
+
+	{
+		flushrs
+		srlz.i
+	}
+	;;
+       /* See where I am running, and compute gp */
+	{
+		mov     ar.rsc = 0      /* Put RSE in enforced lazy, LE mode */
+		mov     gp = ip         /* gp == relocate_new_kernel */
+	}
+
+	movl r8=0x00000100000000
+	;;
+	mov cr.iva=r8
+
+	/* Transition from virtual to physical mode */
+	rsm	psr.i | psr.ic
+	srlz.i
+	movl	r16=(IA64_PSR_AC | IA64_PSR_BN | IA64_PSR_IC | IA64_PSR_MFL)
+	;;
+	mov	cr.ipsr=r16
+	;;
+	mov	cr.iip=r17
+	mov	cr.ifs=r0
+	;;
+	rfi
+	;;
+	.global kexec_reloc
+kexec_reloc:     /* Now we are in physical mode */
+	/* Setup the memory stack */
+	add     r12=(memory_stack_end - relocate_new_kernel),gp
+	/* Setup the register stack */
+	add     r8=(register_stack - relocate_new_kernel),gp
+	;;
+	loadrs
+	;;
+	mov     ar.bspstore=r8
+	;;
+
+	/* Do the copies */
+	mov     r8=r32
+	mov     b6=r33
+	tpa     r28=r34
+	mov     r9=0
+	mov     r11=PAGE_SIZE
+	;;
+	/* top, read another word for the indirection page */
+top:   ld8     r10=[r8], 8
+	;;
+	tbit.nz p6,p0 = r10, 0  /* Is it a destination page? */
+	tbit.nz p7,p0 = r10, 1  /* Is it an indirection page? */
+	tbit.nz p8,p0 = r10, 3  /* Is it the source indicator? */
+	tbit.nz p9,p0 = r10, 2  /* Is it the done indicator? */
+	movl	r19 = PAGE_MASK
+	;;
+	and	r10 = r10, r19	/* Mask off the sub-page (flag) bits of r10 */
+	;;
+(p6)   mov     r9 = r10        /* destination addr */
+(p7)   mov     r8 = r10        /* indirection addr */
+(p8)   br.cond.sptk.few        source
+(p9)   br.cond.sptk.few        done
+	br.cond.sptk.few        top
+source:
+	add     r16 = r11, r10
+	add     r14 = 8, r10
+	add     r15 = 8, r9
+	;;
+0:
+	ld8     r17 = [r10],16
+	ld8     r18 = [r14],16
+	;;
+	st8     [r9]  = r17, 16
+	st8     [r15] = r18, 16
+	cmp.ne  p6,p0 = r16, r10
+	;;
+(p6)   br.cond.sptk.few        0b
+	br.cond.sptk.few        top
+done:
+	srlz.i
+	srlz.d
+	;;
+
+	/* Now purge local tlb */
+	mov r19 = r0
+	adds	r21=-1,r20
+	;;
+2:
+	cmp.ltu	p6,p7=r19,r20
+(p7)	br.cond.dpnt.few	4f
+	mov	ar.lc=r21
+3:
+	ptc.e	r22
+	;;
+	add	r22=r24,r22
+	br.cloop.sptk.few	3b
+	;;
+	add	r22=r23,r22
+	add	r19=1,r19
+	;;
+	br.sptk.few	2b
+4:
+	srlz.i ;;
+	
+       // Now purge addresses formerly mapped by TR registers
+	// Purge ITR&DTR for kernel.
+	movl r16=KERNEL_START
+	mov r18=KERNEL_TR_PAGE_SHIFT<<2
+	;;
+	ptr.i r16, r18
+	ptr.d r16, r18
+	;;
+	srlz.i
+	;;
+	srlz.d
+	;;
+	// Purge DTR for PERCPU data.
+	movl r16=PERCPU_ADDR
+	mov r18=PERCPU_PAGE_SHIFT<<2
+	;;
+	ptr.d r16,r18
+	;;
+	srlz.d
+	;;
+	// Purge ITR for PAL code
+	mov r18=IA64_GRANULE_SHIFT<<2
+	;;
+	ptr.i r26,r18
+	;;
+	srlz.i
+	;;
+	// Purge DTR for stack.
+	mov r16=IA64_KR(CURRENT_STACK)
+	;;
+	shl r16=r16,IA64_GRANULE_SHIFT
+	movl r19=PAGE_OFFSET
+	;;
+	add r16=r19,r16
+	mov r18=IA64_GRANULE_SHIFT<<2
+	;;
+	ptr.d r16,r18
+	;;
+	srlz.i
+	;;
+
+	br.sptk.few		b6
+	br.cond.sptk.few        0b
+	.endp relocate_new_kernel#
+
+	.balign 8192
+relocate_new_kernel_end:
+	.global relocate_new_kernel_size
+relocate_new_kernel_size:
+	.long relocate_new_kernel_end - kexec_fake_sal_rendez
+
+	.global kexec_cont
+	.align 8
+kexec_cont:	data8 0xdeadbeefdeadbeef
+	.global kexec_rendez
+kexec_rendez:	data8 0xdeadbeefdeadbeef
+	.global kexec_ptcebase
+kexec_ptcebase:	data8 0xdeadbeefdeadbeef
+	.global kexec_count0
+kexec_count0:	data8 0xdeadbeefdeadbeef
+	.global kexec_count1
+kexec_count1:	data8 0xdeadbeefdeadbeef
+	.global kexec_stride0
+kexec_stride0:	data8 0xdeadbeefdeadbeef
+	.global kexec_stride1
+kexec_stride1:	data8 0xdeadbeefdeadbeef
+	.global kexec_pal_base
+kexec_pal_base:	data8 0xdeadbeefdeadbeef
+
+register_stack:
+	.fill           8192, 1, 0
+register_stack_end:
+memory_stack:
+	.fill           8192, 1, 0
+memory_stack_end:
diff -urNp linux-2.6.14-rc4/arch/ia64/kernel/smp.c linux-2.6.14-rc4-kexec-ia64/arch/ia64/kernel/smp.c
--- linux-2.6.14-rc4/arch/ia64/kernel/smp.c	2005-08-28 17:41:01.000000000 -0600
+++ linux-2.6.14-rc4-kexec-ia64/arch/ia64/kernel/smp.c	2005-10-24 10:59:18.000000000 -0600
@@ -30,6 +30,9 @@
 #include <linux/delay.h>
 #include <linux/efi.h>
 #include <linux/bitops.h>
+#ifdef CONFIG_KEXEC
+#include <linux/kexec.h>
+#endif
 
 #include <asm/atomic.h>
 #include <asm/current.h>
@@ -84,6 +87,43 @@ unlock_ipi_calllock(void)
 	spin_unlock_irq(&call_lock);
 }
 
+#ifdef CONFIG_KEXEC
+extern void kexec_fake_sal_rendez(void *start, unsigned long wake_up,
+		unsigned long pal_base);
+
+#define pte_bits	3
+#define vmlpt_bits	(impl_va_bits - PAGE_SHIFT + pte_bits)
+#define POW2(n)		(1ULL << (n))
+
+DECLARE_PER_CPU(u64, ia64_mca_pal_base);
+
+/*
+ * Stop the CPU and put it in fake SAL rendezvous. This allows CPU to wake
+ * up with IPI from boot processor
+ */
+void
+kexec_stop_this_cpu (void *func)
+{
+	unsigned long pta, impl_va_bits, pal_base;
+
+	/*
+	 * Remove this CPU by putting it into fake SAL rendezvous
+	 */
+	cpu_clear(smp_processor_id(), cpu_online_map);
+	max_xtp();
+	ia64_eoi();
+
+	/* Disable VHPT */
+	impl_va_bits = ffz(~(local_cpu_data->unimpl_va_mask | (7UL << 61)));
+	pta = POW2(61) - POW2(vmlpt_bits);
+	ia64_set_pta(pta | (0 << 8) | (vmlpt_bits << 2) | 0);
+
+	local_irq_disable();
+	pal_base = __get_cpu_var(ia64_mca_pal_base);
+	kexec_fake_sal_rendez(func, ap_wakeup_vector, pal_base);
+}
+#endif
+
 static void
 stop_this_cpu (void)
 {
diff -urNp linux-2.6.14-rc4/include/asm-ia64/kexec.h linux-2.6.14-rc4-kexec-ia64/include/asm-ia64/kexec.h
--- linux-2.6.14-rc4/include/asm-ia64/kexec.h	1969-12-31 17:00:00.000000000 -0700
+++ linux-2.6.14-rc4-kexec-ia64/include/asm-ia64/kexec.h	2005-10-24 10:20:19.000000000 -0600
@@ -0,0 +1,22 @@
+#ifndef _ASM_IA64_KEXEC_H
+#define _ASM_IA64_KEXEC_H
+
+
+/* Maximum physical address we can use pages from */
+#define KEXEC_SOURCE_MEMORY_LIMIT (-1UL)
+/* Maximum address we can reach in physical address mode */
+#define KEXEC_DESTINATION_MEMORY_LIMIT (-1UL)
+/* Maximum address we can use for the control code buffer */
+#define KEXEC_CONTROL_MEMORY_LIMIT TASK_SIZE
+
+#define KEXEC_CONTROL_CODE_SIZE (8192 + 8192 + 4096)
+
+/* The native architecture */
+#define KEXEC_ARCH KEXEC_ARCH_IA_64
+
+#define MAX_NOTE_BYTES 1024
+typedef u32 note_buf_t[MAX_NOTE_BYTES/4];
+
+extern note_buf_t crash_notes[];
+
+#endif /* _ASM_IA64_KEXEC_H */
diff -urNp linux-2.6.14-rc4/kernel/irq/handle.c linux-2.6.14-rc4-kexec-ia64/kernel/irq/handle.c
--- linux-2.6.14-rc4/kernel/irq/handle.c	2005-10-19 09:04:59.000000000 -0600
+++ linux-2.6.14-rc4-kexec-ia64/kernel/irq/handle.c	2005-10-24 09:40:27.000000000 -0600
@@ -100,6 +100,26 @@ fastcall int handle_IRQ_event(unsigned i
 }
 
 /*
+ * Terminate any outstanding interrupts
+ */
+void terminate_irqs(void)
+{
+	struct irqaction * action;
+	irq_desc_t *idesc;
+	unsigned long flags;
+	int i;
+
+	for (i=0; i<NR_IRQS; i++) {
+		idesc = irq_descp(i);
+		action = idesc->action;
+		if (!action)
+			continue;
+		if (idesc->handler->end)
+			idesc->handler->end(i);
+	}
+}
+
+/*
  * do_IRQ handles all normal device IRQ's (the special
  * SMP cross-CPU interrupts have their own specific
  * handlers).
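
The copy loop in relocate_new_kernel (the "top:" / "source:" labels in relocate_kernel.S above) may be easier to follow in C. What follows is only a user-space sketch under stated assumptions: memory is modelled as a flat byte array, addresses are byte offsets into it, and the 64-byte page size and the walk_entries() name are invented for the example. The flag bits match what the tbit.nz instructions test (bit 0 destination, bit 1 indirection, bit 2 done, bit 3 source).

```c
#include <stdint.h>
#include <string.h>

#define SK_PAGE_SIZE    64u   /* toy page size, for this sketch only */
#define SK_PAGE_MASK    (~(uint64_t)(SK_PAGE_SIZE - 1))

/* Flag bits kept in the low bits of each entry */
#define IND_DESTINATION 0x1
#define IND_INDIRECTION 0x2
#define IND_DONE        0x4
#define IND_SOURCE      0x8

/*
 * Walk the entry list that starts at byte offset `head` inside `mem`
 * and perform the page copies it describes.  Returns the number of
 * pages copied.  The list must end with an IND_DONE entry.
 */
static unsigned walk_entries(unsigned char *mem, uint64_t head)
{
    uint64_t ptr = head;   /* next entry to read (r8 in the assembly) */
    uint64_t dest = 0;     /* current destination (r9 in the assembly) */
    unsigned copied = 0;

    for (;;) {
        uint64_t entry, addr;

        memcpy(&entry, mem + ptr, sizeof(entry));
        ptr += sizeof(entry);
        addr = entry & SK_PAGE_MASK;      /* strip the flag bits */

        if (entry & IND_DESTINATION)
            dest = addr;                  /* new destination page */
        if (entry & IND_INDIRECTION)
            ptr = addr;                   /* continue in another list page */
        if (entry & IND_SOURCE) {
            memcpy(mem + dest, mem + addr, SK_PAGE_SIZE);
            dest += SK_PAGE_SIZE;         /* destination advances per page */
            copied++;
        } else if (entry & IND_DONE) {
            break;
        }
    }
    return copied;
}
```

As in the assembly, a source entry is handled before the done check, and the destination pointer advances by one page after every copy, so consecutive source entries fill consecutive destination pages.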


* Re: [PATCH] kexec on ia64
  2004-11-15 20:32 [PATCH] kexec on ia64 Khalid Aziz
                   ` (4 preceding siblings ...)
  2005-10-25 22:52 ` Khalid Aziz
@ 2005-10-26 18:28 ` Gerald Pfeifer
  2005-10-26 19:02 ` Luck, Tony
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Gerald Pfeifer @ 2005-10-26 18:28 UTC (permalink / raw)
  To: linux-ia64

On Tue, 25 Oct 2005, Khalid Aziz wrote:
> I have ported the original patch I had done for kexec on ia64 on 2.6.8
> kernel and fixed a few bugs in the original patch. Attached is a patch
> for kernel 2.6.14-rc4. It works with normal kexec reboot on an HP
> rx2600. I am now working on adding support for crash kexec. I am also
> working on kexec on INIT which I currently have working on 2.6.10
> kernel. I am porting it to 2.6.14-rc kernel.
> 
> Attached patch needs to be applied on top of iomem and efi_memmapwalk
> patches already in ia64 test tree (these patches attached as well for
> those who may need them). 

Cool.  Tony, what are your plans for pushing this to Linus?  Will it make 
2.6.15?

Gerald


* RE: [PATCH] kexec on ia64
  2004-11-15 20:32 [PATCH] kexec on ia64 Khalid Aziz
                   ` (5 preceding siblings ...)
  2005-10-26 18:28 ` Gerald Pfeifer
@ 2005-10-26 19:02 ` Luck, Tony
  2005-10-26 20:25 ` Eric W. Biederman
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Luck, Tony @ 2005-10-26 19:02 UTC (permalink / raw)
  To: linux-ia64

> Cool.  Tony, what are your plans for pushing this to Linus?  Will it make 
> 2.6.15?

Nanhai Zou here at Intel expressed a few concerns to me last night
about Khalid's patch.  I'll paste them here to speed discussion about
this (as I expect Nanhai is asleep at the moment, he should be
around to start commenting for himself by 4-5pm Pacific).

> I think his patch is still not able to boot an unmodified kernel.
> It appends a kernel parameter to bypass the issue, thus the second kernel need to be modified.

> It also hardcoded initrd logic in kernel patch.
> Command line is still using old command line.
> No purgatory code support etc.

> How, I prefer to put a small and clean patch in kernel while leave most of the things in kexec-tools. 
> That will provide more flexibility.

> There are also some other issues I can see, like, 
> 1. icache flusing miss
> 2. rendez code is fake, I prefer to use hotplug API.
> 3. Disable PCI master code should be in generic PCI driver code instead of IA64 arch code.

Nanhai has his own patches for kexec/kexec-tools, which are
stuck in some Intel bureaucracy at the moment ...  I'm trying
to get them unstuck so that we can get some meaningful
commentary from the community about both versions.

My biggest issue with both patches at the moment is that I
can't see how either of them can be extended to be useful
for use in crash-dump case without some significant surgery.
Both of them over-write the existing kernel with the new one,
which is a big problem when you'd like to dump the data space
of the old kernel.  Ia64 is quite happy to run a kernel loaded
at any suitably aligned address ... so why not load the new
kernel in some different location from the old kernel?

Including this in 2.6.15?  It's possible, but it's looking like
this might be a rush.  Assuming Linus releases 2.6.14 by the
end of this week, we only have a couple of weeks to check that
this runs on all of the weird configurations.  I'd need to see
a lot of "tested on xxx-config ... no problems" e-mail to get
confidence in this.

-Tony


* Re: [PATCH] kexec on ia64
  2004-11-15 20:32 [PATCH] kexec on ia64 Khalid Aziz
                   ` (6 preceding siblings ...)
  2005-10-26 19:02 ` Luck, Tony
@ 2005-10-26 20:25 ` Eric W. Biederman
  2005-10-26 21:43 ` Luck, Tony
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Eric W. Biederman @ 2005-10-26 20:25 UTC (permalink / raw)
  To: linux-ia64

"Luck, Tony" <tony.luck@intel.com> writes:

>> Cool.  Tony, what are your plans for pushing this to Linus?  Will it make 
>> 2.6.15?
>
> Nanhai Zou here at Intel expressed a few concerns to me last night
> about Khalid's patch.  I'll paste them here to speed discussion about
> this (as I expect Nanhai is asleep at the moment, he should be
> around to start commenting for himself by 4-5pm Pacific).
>
>> I think his patch is still not able to boot an unmodified kernel.
>> It appends a kernel parameter to bypass the issue, thus the second kernel need
> to be modified.
>
>> It also hardcoded initrd logic in kernel patch.
>> Command line is still using old command line.
>> No purgatory code support etc.

I agree that is an issue that should be addressed.
It would be nice if there was a kernel option to not virtually
map EFI.  Reusing a supplied virtual address is also good, but
it means we can't boot an unpatched kernel.

>> How, I prefer to put a small and clean patch in kernel while leave most of the
> things in kexec-tools.
>> That will provide more flexibility.
>
>> There are also some other issues I can see, like, 
>> 1. icache flusing miss
>> 2. rendez code is fake, I prefer to use hotplug API.
>> 3. Disable PCI master code should be in generic PCI driver code instead of
> IA64 arch code.
>
> Nanhai has his own patches for kexec/kexec-tools, which are
> stuck in some Intel bureaucracy at the moment ...  I'm trying
> to get them unstuck so that we can get some meaningful
> commentary from the community about both versions.
>
> My biggest issue with both patches at the moment is that I
> can't see how either of them can be extended to be useful
> for use in crash-dump case without some significant surgery.
> Both of them over-write the existing kernel with the new one,
> which is a big problem when you'd like to dump the data space
> of the old kernel.  Ia64 is quite happy to run a kernel loaded
> at any suitably aligned address ... so why not load the new
> kernel in some different location from the old kernel?

Interesting.  This should be a decision made by kexec-tools,
not by the kernel.  On x86 the kernel just verifies we load the
crash kernel into the reserved chunk of the address space.  I haven't
looked closely enough to see if the architecture part has fixed
address assumptions yet.  

Tony what were you seeing that made you conclude that the code
would always load over the existing kernel?

I also didn't see the trivial patch to put the 32bit compat support
in.  It's not terribly important or useful but there is no reason
not to include it.

Eric


* Re: [PATCH] kexec on ia64
  2004-11-15 20:32 [PATCH] kexec on ia64 Khalid Aziz
                   ` (7 preceding siblings ...)
  2005-10-26 20:25 ` Eric W. Biederman
@ 2005-10-26 21:43 ` Luck, Tony
  2005-10-26 21:49 ` Khalid Aziz
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Luck, Tony @ 2005-10-26 21:43 UTC (permalink / raw)
  To: linux-ia64

On Wed, Oct 26, 2005 at 02:25:56PM -0600, Eric W. Biederman wrote:
> Interesting.  This should be a decision made by kexec-tools,
> not by the kernel.  On x86 the kernel just verifies we load the
> crash kernel into the reserved chunk of the address space.  I haven't
> looked closely enough to see if the architecture part has fixed
> address assumptions yet.  
> 
> Tony what were you seeing that made you conclude that the code
> would always load over the existing kernel?

Ok .. kexectools should be able to make a decision about where to load the
new kernel based on what it finds in /proc/iomem (and in the Elf header
of the new kernel).  I don't know enough Elf (elvish? :-) to know
whether the Elf header we currently generate for a kernel describes
things in a way that would convey that it is OK to drop the image
at any (suitably aligned) address, or whether there will have to be
some ia64 specific magic in the kexectools to choose the load address.
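
A minimal user-space sketch of the idea above: read "System RAM"
ranges in the /proc/iomem text format and pick an aligned load address
that avoids the running kernel. The struct and function names
(mem_range, parse_ram_line, pick_load_addr) are made up for
illustration and are not taken from kexec-tools; the alignment is left
as a parameter because the thread only says "suitably aligned".

```c
#include <stdio.h>
#include <string.h>

/* One address range; `end` is inclusive, as printed in /proc/iomem. */
struct mem_range {
	unsigned long long start;
	unsigned long long end;
};

/*
 * Parse one line of the form "01000000-3fffffff : System RAM".
 * Returns 1 and fills *r only for "System RAM" lines, else returns 0.
 */
static int parse_ram_line(const char *line, struct mem_range *r)
{
	unsigned long long start, end;
	int consumed = 0;

	if (sscanf(line, " %llx-%llx : %n", &start, &end, &consumed) < 2)
		return 0;
	if (consumed == 0 || strncmp(line + consumed, "System RAM", 10) != 0)
		return 0;
	r->start = start;
	r->end = end;
	return 1;
}

/*
 * Pick the lowest address in *r, aligned to `align` (a power of two),
 * where `size` bytes fit without overlapping *avoid.
 * Returns 1 and stores the address in *out, or 0 if nothing fits.
 * Address arithmetic overflow is not handled in this sketch.
 */
static int pick_load_addr(const struct mem_range *r,
			  unsigned long long size, unsigned long long align,
			  const struct mem_range *avoid,
			  unsigned long long *out)
{
	unsigned long long addr;

	if (size == 0 || align == 0 || (align & (align - 1)) != 0)
		return 0;

	addr = (r->start + align - 1) & ~(align - 1);

	/* If the candidate overlaps the range to avoid, retry just past it. */
	if (addr <= avoid->end && addr + size - 1 >= avoid->start)
		addr = (avoid->end + align) & ~(align - 1);

	if (addr < r->start || addr + size - 1 > r->end)
		return 0;
	*out = addr;
	return 1;
}
```

A caller would feed each line of /proc/iomem to parse_ram_line() and
pass the range occupied by the running kernel as `avoid`. Whether the
kernel's ELF header allows the image to be placed there is a separate
question that this sketch does not address.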

> I also didn't see the trivial patch to put the 32bit compat support
> in.  It's not terribly important or useful but there is no reason
> not to include it.

Usefulness is the key here.  The kexectools definitely include some
architecture specific components.  So taking the x86 version of the
"kexec" binary onto an ia64 system isn't going to be very useful even
if the kernel did happen to have an ia32 entry point for kexec
enabled.  Building an ia32 binary, but with all the ia64 specific
parts enabled would seem to be _challenging_ (Nanhai's version has
purgatory/arch/ia64/entry.S!).  Perhaps there might be a better outlet
for that much creativity? [Which is another way of saying that I'm
not interested in seeing a patch to enable the ia32 kexec entry point
on ia64 ... so don't waste any time creating one].

-Tony


* RE: [PATCH] kexec on ia64
  2004-11-15 20:32 [PATCH] kexec on ia64 Khalid Aziz
                   ` (8 preceding siblings ...)
  2005-10-26 21:43 ` Luck, Tony
@ 2005-10-26 21:49 ` Khalid Aziz
  2005-10-26 23:21 ` Zou Nan hai
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Khalid Aziz @ 2005-10-26 21:49 UTC (permalink / raw)
  To: linux-ia64

On Wed, 2005-10-26 at 12:02 -0700, Luck, Tony wrote:
> > Cool.  Tony, what are your plans for pushing this to Linus?  Will it make 
> > 2.6.15?
> 
> Nanhai Zou here at Intel expressed a few concerns to me last night
> about Khalid's patch.  I'll paste them here to speed discussion about
> this (as I expect Nanhai is asleep at the moment, he should be
> around to start commenting for himself by 4-5pm Pacific).
> 
> > I think his patch is still not able to boot an unmodified kernel.
> > It appends a kernel parameter to bypass the issue, thus the second kernel need to be modified.
> 

True. The only time I use this parameter is to determine whether to
virtualize EFI or not. EFI does not respond well to being virtualized
once it has been virtualized already. So the kernel needs to know if EFI
has already been virtualized by previous kernel. It is possible to pass
this information to the next kernel as a command line parameter, as I
have done, or in one of the kexec segments. One way or the other, kernel
needs to know this. I have not found a way around it. If there is one, I
would like to hear about it. That would enable an unmodified kernel to
be booted.
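
A minimal sketch of the command-line variant described above, written
as plain user-space C rather than kernel code. The parameter name
efi_already_virtual is invented here for illustration; this part of
the thread does not name the parameter the patch actually appends. The
helper only shows whole-word matching on a command line string.

```c
#include <string.h>

/*
 * Return 1 if `param` appears in `cmdline` as a whole,
 * space-delimited word, else 0.
 */
static int cmdline_has_param(const char *cmdline, const char *param)
{
	size_t plen = strlen(param);
	const char *p = cmdline;

	if (plen == 0)
		return 0;

	while ((p = strstr(p, param)) != NULL) {
		int starts = (p == cmdline) || (p[-1] == ' ');
		int ends = (p[plen] == '\0') || (p[plen] == ' ');

		if (starts && ends)
			return 1;
		p += plen;	/* keep scanning after this partial match */
	}
	return 0;
}
```

A kernel booted via kexec could skip the EFI virtual mapping call when
such a helper finds the flag on its command line. The same information
could equally be carried in a kexec segment, as noted above.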

> > It also hardcoded initrd logic in kernel patch.

I could not find a better way to pass initrd image to ia64 kernel since
it is not placed in a fixed location. Using a fixed kexec segment looked
fairly logical to me. Alternative would be to add a type field to struct
kexec_segment, then kernel can determine which segment holds initrd
image without having to use a fixed kexec segment.
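
A rough sketch of the alternative mentioned above, a segment that
carries a type tag so the initrd can be found without relying on a
fixed segment index. Everything here is hypothetical: struct
typed_kexec_segment, enum seg_type and find_segment() are illustrative
names, and the existing struct kexec_segment has no such field.

```c
#include <stddef.h>

/* Hypothetical segment tags; none of these exist in the kexec ABI. */
enum seg_type { SEG_KERNEL, SEG_INITRD, SEG_CMDLINE, SEG_OTHER };

/* Hypothetical variant of struct kexec_segment with a type tag added. */
struct typed_kexec_segment {
	const void *buf;	/* source buffer */
	size_t bufsz;
	unsigned long mem;	/* destination physical address */
	size_t memsz;
	enum seg_type type;	/* what this segment holds */
};

/* Return the first segment of the requested type, or NULL if none. */
static const struct typed_kexec_segment *
find_segment(const struct typed_kexec_segment *segs, size_t nr,
	     enum seg_type type)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (segs[i].type == type)
			return &segs[i];
	return NULL;
}
```

With a tag like this the kernel side would look up SEG_INITRD instead
of assuming the initrd sits at a particular index. The cost is a
change to the structure shared between the kernel and kexec-tools.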

> > Command line is still using old command line.

Please explain.

> > No purgatory code support etc.
> 
> > How, I prefer to put a small and clean patch in kernel while leave most of the things in kexec-tools. 
> > That will provide more flexibility.
> 
> > There are also some other issues I can see, like, 
> > 1. icache flusing miss
> > 2. rendez code is fake, I prefer to use hotplug API.

That would be preferable, and would be a good enhancement over current
code if it can be made to work reliably. I was planning to look into it
after initial implementation (I wrote initial implementation before CPU
hotplug API was available).

> > 3. Disable PCI master code should be in generic PCI driver code instead of IA64 arch code.

Agreed. This is part of some of the cleanup that can still be done.

> 
> Nanhai has his own patches for kexec/kexec-tools, which are
> stuck in some Intel bureaucracy at the moment ...  I'm trying
> to get them unstuck so that we can get some meaningful
> commentary from the community about both versions.
> 
> My biggest issue with both patches at the moment is that I
> can't see how either of them can be extended to be useful
> for use in crash-dump case without some significant surgery.
> Both of them over-write the existing kernel with the new one,
> which is a big problem when you'd like to dump the data space
> of the old kernel.  Ia64 is quite happy to run a kernel loaded
> at any suitably aligned address ... so why not load the new
> kernel in some different location from the old kernel?
> 
> Including this in 2.6.15?  It's possible, but it's looking like
> this might be a rush.  Assuming Linus releases 2.6.14 by the
> end of this week, we only have a couple of weeks to check that
> this runs on all of the weird configurations.  I'd need to see
> a lot of "tested on xxx-config ... no problems" e-mail to get
> confidence in this.
> 
> -Tony

-- 
Khalid

==================================
Khalid Aziz                       Open Source and Linux Organization
(970)898-9214                                        Hewlett-Packard
khalid.aziz@hp.com                                  Fort Collins, CO

"The Linux kernel is subject to relentless development" 
                                - Alessandro Rubini



* RE: [PATCH] kexec on ia64
  2004-11-15 20:32 [PATCH] kexec on ia64 Khalid Aziz
                   ` (9 preceding siblings ...)
  2005-10-26 21:49 ` Khalid Aziz
@ 2005-10-26 23:21 ` Zou Nan hai
  2005-10-27  7:10 ` Eric W. Biederman
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Zou Nan hai @ 2005-10-26 23:21 UTC (permalink / raw)
  To: linux-ia64

Hi Khalid,

On Thu, 2005-10-27 at 05:49, Khalid Aziz wrote:
> On Wed, 2005-10-26 at 12:02 -0700, Luck, Tony wrote:
> > > Cool.  Tony, what are your plans for pushing this to Linus?  Will
> it make 
> > > 2.6.15?
> > 
> > Nanhai Zou here at Intel expressed a few concerns to me last night
> > about Khalid's patch.  I'll paste them here to speed discussion
> about
> > this (as I expect Nanhai is asleep at the moment, he should be
> > around to start commenting for himself by 4-5pm Pacific).
> > 
> > > I think his patch is still not able to boot an unmodified kernel.
> > > It appends a kernel parameter to bypass the issue, thus the second
> kernel need to be modified.
> > 
> 
> True. The only time I use this parameter is to determine whether to
> virtualize EFI or not. EFI does not respond well to being virtualized
> once it has been virtualized already. So the kernel needs to know if
> EFI
> has already been virtualized by previous kernel. It is possible to
> pass
> this information to the next kernel as a command line parameter, as I
> have done, or in one of the kexec segments. One way or the other,
> kernel
> needs to know this. I have not found a way around it. If there is one,
> I
> would like to hear about it. That will make enable unmodified kernel
> to
> be booted.
> 
> > > It also hardcoded initrd logic in kernel patch.
> 
> I could not find a better way to pass initrd image to ia64 kernel
> since
> it is not placed in a fixed location. Using a fixed kexec segment
> looked
> fairly logical to me. Alternative would be to add a type field to
> struct
> kexec_segment, then kernel can determine which segment holds initrd
> image without having to use a fixed kexec segment.
> 
> > > Command line is still using old command line.
> 
> Please explain.
> 
  Sorry, I see now how your patch deals with the command line. I missed
the machine_kexec_prepare part at first look. However, I prefer to put
the command line and initrd logic in kexec-tools instead of hacking on
the segment index.

> > > No purgatory code support etc.
> > 
> > > How, I prefer to put a small and clean patch in kernel while leave
> most of the things in kexec-tools. 
> > > That will provide more flexibility.
> > 
> > > There are also some other issues I can see, like, 
> > > 1. icache flusing miss
> > > 2. rendez code is fake, I prefer to use hotplug API.
> 
> That would be preferable, and would be a good enhancement over current
> code if it can be made to work reliably. I was planning to look into
> it
> after initial implementation (I wrote initial implementation before
> CPU
> hotplug API was available).
> 
> > > 3. Disable PCI master code should be in generic PCI driver code
> instead of IA64 arch code.
> 
> Agreed. This is part of some of the cleanup that can still be done.
> 
> > 
> > Nanhai has his own patches for kexec/kexec-tools, which are
> > stuck in some Intel bureaucracy at the moment ...  I'm trying
> > to get them unstuck so that we can get some meaningful
> > commentary from the community about both versions.
> > 
> > My biggest issue with both patches at the moment is that I
> > can't see how either of them can be extended to be useful
> > for use in crash-dump case without some significant surgery.
> > Both of them over-write the existing kernel with the new one,
> > which is a big problem when you'd like to dump the data space
> > of the old kernel.  Ia64 is quite happy to run a kernel loaded
> > at any suitably aligned address ... so why not load the new
> > kernel in some different location from the old kernel?
> > 
> > Including this in 2.6.15?  It's possible, but it's looking like
> > this might be a rush.  Assuming Linus releases 2.6.14 by the
> > end of this week, we only have a couple of weeks to check that
> > this runs on all of the weird configurations.  I'd need to see
> > a lot of "tested on xxx-config ... no problems" e-mail to get
> > confidence in this.
> > 
> > -Tony
> 
> -- 
> Khalid
> 
	
	As Tony said, my kexec and kexec-tools patches solve those
issues. They can boot any unmodified kernel. But they are stuck in
Intel bureaucracy.

I hope I can send them out to the community for comments soon.

Thanks
Zou Nan hai





* Re: [PATCH] kexec on ia64
  2004-11-15 20:32 [PATCH] kexec on ia64 Khalid Aziz
                   ` (10 preceding siblings ...)
  2005-10-26 23:21 ` Zou Nan hai
@ 2005-10-27  7:10 ` Eric W. Biederman
  2005-10-27 19:05 ` Khalid Aziz
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Eric W. Biederman @ 2005-10-27  7:10 UTC (permalink / raw)
  To: linux-ia64

"Luck, Tony" <tony.luck@intel.com> writes:

> On Wed, Oct 26, 2005 at 02:25:56PM -0600, Eric W. Biederman wrote:
>> Interesting.  This should be a decision made by kexec-tools,
>> not by the kernel.  On x86 the kernel just verifies we load the
>> crash kernel into the reserved chunk of the address space.  I haven't
>> looked closely enough to see if the architecture part has fixed
>> address assumptions yet.  
>> 
>> Tony what were you seeing that made you conclude that the code
>> would always load over the existing kernel?
>
> Ok .. kexectools should be able to make a decision about where to load the
> new kernel based on what it finds in /proc/iomem (and in the Elf header
> of the new kernel).  I don't know enough Elf (elvish? :-) to know
> whether the Elf header we currently generate for a kernel describes
> things in a way that would convey that it is OK to drop the image
> at any (suitably aligned) address, or whether there will have to be
> some ia64 specific magic in the kexectools to choose the load address.

I don't think ld can be talked into setting ET_REL instead of ET_EXEC right
now, without building as a shared library.  (readelf -a on the kernel
will tell you) but since that is a general problem it is likely worth
an extra flag to /sbin/kexec to tell it to assume an ELF executable is
relocatable even if it doesn't say ET_REL.
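
A small sketch of the check being discussed: read e_type from a 64-bit
ELF header and decide whether to treat the image as relocatable, with
an override standing in for the proposed extra flag. It assumes a
Linux/glibc <elf.h> and a header in host byte order. The function
names and the force_flag parameter are illustrative, not part of
/sbin/kexec. ET_DYN is accepted here because that is the type a
shared-library build produces.

```c
#include <elf.h>
#include <string.h>

/*
 * Return the e_type field of a 64-bit ELF header held in buf,
 * or -1 if buf is too short or lacks the ELF magic.
 */
static int elf64_type(const unsigned char *buf, size_t len)
{
	Elf64_Ehdr hdr;

	if (len < sizeof(hdr))
		return -1;
	memcpy(&hdr, buf, sizeof(hdr));	/* avoid unaligned access */
	if (memcmp(hdr.e_ident, ELFMAG, SELFMAG) != 0)
		return -1;
	return hdr.e_type;
}

/*
 * Hypothetical policy: ET_REL and ET_DYN are relocatable as-is;
 * ET_EXEC is treated as relocatable only when the caller forces it.
 */
static int treat_as_relocatable(int e_type, int force_flag)
{
	if (e_type == ET_REL || e_type == ET_DYN)
		return 1;
	return e_type == ET_EXEC && force_flag;
}
```

"readelf -h" on the kernel image reports the same field as "Type:".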

>> I also didn't see the trivial patch to put the 32bit compat support
>> in.  It's not terribly important or useful but there is no reason
>> not to include it.
>
> Usefullness is a key here.  The kexectools definitely include some
> architecture specific components.  So taking the x86 version of the
> "kexec" binary onto an ia64 system isn't going to be very useful even
> if the kernel did happen to have an ia32 entry point for kexec
> enabled.  Building an ia32 binary, but with all the ia64 specific
> parts enabled would seem to be _challenging_ (Nanhai's version has
> purgatory/arch/ia64/entry.S!).  Perhaps there might be a better outlet
> for that much creativity? [Which is another way of saying that I'm
> not interested in seeing a patch to enable the ia32 kexec entry point
> on ia64 ... so don't waste any time creating one].

I know of at least one application that before it flashes your rom
chip checks to see if you have kexec in your kernel.  And it does
that by calling sys_kexec and seeing if it gets -EINVAL instead
of -ENOSYS.  At least with kexec present it knows that if something
goes terribly wrong it has the chance to load another kernel, in the
event the mtd drivers in the kernel don't handle some subtle hardware
bug.  That application can safely be distributed as a 32bit binary
on i386, x86_64, and ia64.
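
The probe described above can be sketched roughly as follows. This is
not the code of the application mentioned; it is a guess at the
technique, assuming a Linux libc that exposes SYS_kexec_load. The call
passes deliberately invalid flags and segment count so that nothing
can be loaded. Note the assumption that a caller without CAP_SYS_BOOT
gets EPERM before the arguments are checked, which is why EPERM is
reported as "unknown" rather than "present".

```c
#define _GNU_SOURCE
#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>

enum kexec_support { KEXEC_ABSENT, KEXEC_PRESENT, KEXEC_UNKNOWN };

/* Map the errno left by an intentionally invalid kexec_load call. */
enum kexec_support classify_kexec_errno(int err)
{
	switch (err) {
	case ENOSYS:
		return KEXEC_ABSENT;	/* syscall not implemented */
	case EINVAL:
		return KEXEC_PRESENT;	/* implemented, rejected our args */
	default:
		return KEXEC_UNKNOWN;	/* e.g. EPERM without CAP_SYS_BOOT */
	}
}

/* Probe the running kernel; never loads or unloads an image. */
enum kexec_support probe_kexec(void)
{
#ifdef SYS_kexec_load
	/* All-ones segment count and flags are deliberately invalid. */
	long ret = syscall(SYS_kexec_load, 0UL, ~0UL, (void *)0, ~0UL);

	if (ret == 0)
		return KEXEC_PRESENT;	/* not expected with these args */
	return classify_kexec_errno(errno);
#else
	return KEXEC_UNKNOWN;		/* libc does not know the syscall */
#endif
}
```

A 32bit build of such a probe only gives a meaningful answer on a
64bit kernel if the compat syscall entry is wired up, which is the
point being argued here.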

I'm not quite certain what build issues would be involved, but it
wouldn't surprise me if one of the architectures that normally run
a 32bit user space with a 64bit kernel happened to solve the issue.
So I only expect to use the code that comes pretty much for free.

The kernel side of the implementation already exists and I suspect
it is as useful as any other ia32 compat syscall entry point on the 
ia64 kernel.  I care as this is a completeness issue and I don't
see a reason not to enable the kernel side.

Eric


* RE: [PATCH] kexec on ia64
  2004-11-15 20:32 [PATCH] kexec on ia64 Khalid Aziz
                   ` (11 preceding siblings ...)
  2005-10-27  7:10 ` Eric W. Biederman
@ 2005-10-27 19:05 ` Khalid Aziz
  2005-10-27 23:17 ` Zou Nan hai
  2006-04-03 22:20 ` Khalid Aziz
  14 siblings, 0 replies; 29+ messages in thread
From: Khalid Aziz @ 2005-10-27 19:05 UTC (permalink / raw)
  To: linux-ia64

On Thu, 2005-10-27 at 07:21 +0800, Zou Nan hai wrote:
> 	
> 	As Tony said, I have my kexec and kexec-tools patches
> solved those issues. It can boots any unmodified kernel. But they are
> pending at Intel bureaucracy.

Can you give us some idea of how you got around the EFI virtualization
issue?

-- 
Khalid

==================================
Khalid Aziz                       Open Source and Linux Organization
(970)898-9214                                        Hewlett-Packard
khalid.aziz@hp.com                                  Fort Collins, CO

"The Linux kernel is subject to relentless development" 
                                - Alessandro Rubini



* RE: [PATCH] kexec on ia64
  2004-11-15 20:32 [PATCH] kexec on ia64 Khalid Aziz
                   ` (12 preceding siblings ...)
  2005-10-27 19:05 ` Khalid Aziz
@ 2005-10-27 23:17 ` Zou Nan hai
  2006-04-03 22:20 ` Khalid Aziz
  14 siblings, 0 replies; 29+ messages in thread
From: Zou Nan hai @ 2005-10-27 23:17 UTC (permalink / raw)
  To: linux-ia64

On Fri, 2005-10-28 at 03:05, Khalid Aziz wrote:
> On Thu, 2005-10-27 at 07:21 +0800, Zou Nan hai wrote:
> > 	
> > 	As Tony said, I have my kexec and kexec-tools patches
> > solved those issues. It can boots any unmodified kernel. But they are
> > pending at Intel bureaucracy.
> 
> Can you give us some idea of how you got around the EFI virtualization
> issue?

 I patched the EFI bootparam pointer in the purgatory code to point to
an empty dummy function.

 Zou Nan hai



* [PATCH] kexec on ia64
  2004-11-15 20:32 [PATCH] kexec on ia64 Khalid Aziz
                   ` (13 preceding siblings ...)
  2005-10-27 23:17 ` Zou Nan hai
@ 2006-04-03 22:20 ` Khalid Aziz
  2006-04-04  4:20   ` Andrew Morton
  2006-04-04 18:13   ` [Fastboot] " Eric W. Biederman
  14 siblings, 2 replies; 29+ messages in thread
From: Khalid Aziz @ 2006-04-03 22:20 UTC (permalink / raw)
  To: LKML, Fastboot mailing list, Linux ia64

Add kexec support on ia64.

Signed-off-by: Khalid Aziz <khalid.aziz@hp.com>
---

diff -urNp linux-2.6.16/arch/ia64/hp/common/sba_iommu.c linux-2.6.16-kexec/arch/ia64/hp/common/sba_iommu.c
--- linux-2.6.16/arch/ia64/hp/common/sba_iommu.c	2006-03-19 22:53:29.000000000 -0700
+++ linux-2.6.16-kexec/arch/ia64/hp/common/sba_iommu.c	2006-03-27 15:42:47.000000000 -0700
@@ -1624,6 +1624,28 @@ ioc_iova_init(struct ioc *ioc)
 	READ_REG(ioc->ioc_hpa + IOC_IBASE);
 }
 
+#ifdef CONFIG_KEXEC
+void
+ioc_iova_disable(void)
+{
+	struct ioc *ioc;
+
+	ioc = ioc_list;
+
+	while (ioc != NULL) {
+		/* Disable IOVA translation */
+		WRITE_REG(ioc->ibase & 0xfffffffffffffffe, ioc->ioc_hpa + IOC_IBASE);
+		READ_REG(ioc->ioc_hpa + IOC_IBASE);
+
+		/* Clear I/O TLB of any possible entries */
+		WRITE_REG(ioc->ibase | (get_iovp_order(ioc->iov_size) + iovp_shift), ioc->ioc_hpa + IOC_PCOM);
+		READ_REG(ioc->ioc_hpa + IOC_PCOM);
+
+		ioc = ioc->next;
+	}
+}
+#endif
+
 static void __init
 ioc_resource_init(struct ioc *ioc)
 {
diff -urNp linux-2.6.16/arch/ia64/Kconfig linux-2.6.16-kexec/arch/ia64/Kconfig
--- linux-2.6.16/arch/ia64/Kconfig	2006-03-19 22:53:29.000000000 -0700
+++ linux-2.6.16-kexec/arch/ia64/Kconfig	2006-03-27 15:42:47.000000000 -0700
@@ -376,6 +376,23 @@ config IA64_PALINFO
 config SGI_SN
 	def_bool y if (IA64_SGI_SN2 || IA64_GENERIC)
 
+config KEXEC
+	bool "kexec system call (EXPERIMENTAL)"
+	depends on EXPERIMENTAL
+	help
+	  kexec is a system call that implements the ability to shutdown your
+	  current kernel, and to start another kernel.  It is like a reboot
+	  but it is independent of the system firmware.   And like a reboot
+	  you can start any kernel with it, not just Linux.
+
+	  The name comes from the similarity to the exec system call.
+
+	  It is an ongoing process to be certain the hardware in a machine
+	  is properly shutdown, so do not be surprised if this code does not
+	  initially work for you.  It may help to enable device hotplugging
+	  support.  As of this writing the exact hardware interface is
+	  strongly in flux, so no good recommendation can be made.
+
 source "drivers/firmware/Kconfig"
 
 source "fs/Kconfig.binfmt"
diff -urNp linux-2.6.16/arch/ia64/kernel/crash.c linux-2.6.16-kexec/arch/ia64/kernel/crash.c
--- linux-2.6.16/arch/ia64/kernel/crash.c	1969-12-31 17:00:00.000000000 -0700
+++ linux-2.6.16-kexec/arch/ia64/kernel/crash.c	2006-03-27 15:49:44.000000000 -0700
@@ -0,0 +1,43 @@
+/*
+ * arch/ia64/kernel/crash.c
+ *
+ * Architecture specific (ia64) functions for kexec based crash dumps.
+ *
+ * Created by: Khalid Aziz <khalid.aziz@hp.com>
+ *
+ * Copyright (C) 2005 Hewlett-Packard Development Company, L.P.
+ *
+ */
+#include <linux/init.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/smp.h>
+#include <linux/irq.h>
+#include <linux/reboot.h>
+#include <linux/kexec.h>
+#include <linux/irq.h>
+#include <linux/delay.h>
+#include <linux/elf.h>
+#include <linux/elfcore.h>
+#include <linux/device.h>
+
+void
+machine_crash_shutdown(struct pt_regs *pt)
+{
+	/* This function is only called after the system
+	 * has panicked or is otherwise in a critical state.
+	 * The minimum amount of code to allow a kexec'd kernel
+	 * to run successfully needs to happen here.
+	 *
+	 * In practice this means shooting down the other cpus in
+	 * an SMP system.
+	 */
+	if (in_interrupt()) {
+		terminate_irqs();
+		ia64_eoi();
+	}
+	system_state = SYSTEM_RESTART;
+	device_shutdown();
+	system_state = SYSTEM_BOOTING;
+	machine_shutdown();
+}
diff -urNp linux-2.6.16/arch/ia64/kernel/entry.S linux-2.6.16-kexec/arch/ia64/kernel/entry.S
--- linux-2.6.16/arch/ia64/kernel/entry.S	2006-03-19 22:53:29.000000000 -0700
+++ linux-2.6.16-kexec/arch/ia64/kernel/entry.S	2006-03-27 15:42:47.000000000 -0700
@@ -1590,7 +1590,7 @@ sys_call_table:
 	data8 sys_mq_timedreceive		// 1265
 	data8 sys_mq_notify
 	data8 sys_mq_getsetattr
-	data8 sys_ni_syscall			// reserved for kexec_load
+	data8 sys_kexec_load
 	data8 sys_ni_syscall			// reserved for vserver
 	data8 sys_waitid			// 1270
 	data8 sys_add_key
diff -urNp linux-2.6.16/arch/ia64/kernel/machine_kexec.c linux-2.6.16-kexec/arch/ia64/kernel/machine_kexec.c
--- linux-2.6.16/arch/ia64/kernel/machine_kexec.c	1969-12-31 17:00:00.000000000 -0700
+++ linux-2.6.16-kexec/arch/ia64/kernel/machine_kexec.c	2006-04-03 13:42:09.000000000 -0600
@@ -0,0 +1,149 @@
+/*
+ * arch/ia64/kernel/machine_kexec.c 
+ *
+ * Handle transition of Linux booting another kernel
+ * Copyright (C) 2005 Hewlett-Packard Development Company, L.P.
+ * Copyright (C) 2005 Khalid Aziz <khalid.aziz@hp.com>
+ * Copyright (C) 2006 Intel Corp, Zou Nan hai <nanhai.zou@intel.com>
+ *
+ * This source code is licensed under the GNU General Public License,
+ * Version 2.  See the file COPYING for more details.
+ */
+
+#include <linux/kernel.h>
+#include <linux/config.h>
+#include <linux/mm.h>
+#include <linux/kexec.h>
+#include <linux/pci.h>
+#include <linux/cpu.h>
+#include <asm/mmu_context.h>
+#include <asm/setup.h>
+#include <asm/mca.h>
+#include <asm/page.h>
+#include <asm/bitops.h>
+#include <asm/tlbflush.h>
+#include <asm/delay.h>
+#include <asm/meminit.h>
+
+extern unsigned long ia64_iobase;
+
+static void set_io_base(void)
+{
+	unsigned long phys_iobase;
+
+	/* set kr0 to iobase */
+	phys_iobase = __pa(ia64_iobase);
+	ia64_set_kr(IA64_KR_IO_BASE, __IA64_UNCACHED_OFFSET | phys_iobase);
+};
+
+typedef void (*relocate_new_kernel_t)( unsigned long, unsigned long, 
+		struct ia64_boot_param *, unsigned long);
+
+/*
+ * Do whatever setup is needed on image and the
+ * reboot code buffer to allow us to avoid allocations
+ * later.
+ */
+int machine_kexec_prepare(struct kimage *image)
+{
+	void *control_code_buffer;
+	const unsigned long *func;
+
+	func = (unsigned long *)&relocate_new_kernel;
+	/* Pre-load control code buffer to minimize work in kexec path */
+	control_code_buffer = page_address(image->control_code_page);
+	memcpy((void *)control_code_buffer, (const void *)func[0], 
+			relocate_new_kernel_size);
+	flush_icache_range((unsigned long)control_code_buffer, 
+			(unsigned long)control_code_buffer + relocate_new_kernel_size);
+
+	return 0;
+}
+
+void machine_kexec_cleanup(struct kimage *image)
+{
+}
+
+#ifdef CONFIG_PCI
+void machine_shutdown(void)
+{
+	struct pci_dev *dev;
+	irq_desc_t *idesc;
+	cpumask_t mask = CPU_MASK_NONE;
+
+	/* Disable all PCI devices */
+	list_for_each_entry(dev, &pci_devices, global_list) {
+		if (!(dev->is_enabled))
+			continue;
+		idesc = irq_descp(dev->irq);
+		if (!idesc)
+			continue;
+		cpu_set(0, mask);
+		disable_irq_nosync(dev->irq);
+		idesc->handler->end(dev->irq);
+		idesc->handler->set_affinity(dev->irq, mask);
+		idesc->action = NULL;
+		pci_disable_device(dev);
+		pci_set_power_state(dev, 0);
+	}
+}
+#endif
+
+/*
+ * Do not allocate memory (or fail in any way) in machine_kexec().
+ * We are past the point of no return, committed to rebooting now. 
+ */
+void machine_kexec(struct kimage *image)
+{
+	unsigned long indirection_page;
+	relocate_new_kernel_t rnk;
+	unsigned long pta, impl_va_bits;
+	void *pal_addr = efi_get_pal_addr();
+	unsigned long code_addr = (unsigned long)page_address(image->control_code_page);
+
+#ifdef CONFIG_HOTPLUG_CPU
+	int cpu;
+
+	for_each_online_cpu(cpu) {
+		if (cpu != smp_processor_id())
+			cpu_down(cpu);
+	}
+#elif CONFIG_SMP
+	smp_call_function(kexec_stop_this_cpu, (void *)image->start, 0, 0);
+#endif
+
+	ia64_set_itv(1<<16);
+	/* Interrupts aren't acceptable while we reboot */
+	local_irq_disable();
+
+	/* set kr0 to the appropriate address */
+	set_io_base();
+
+	/* Disable VHPT */
+	impl_va_bits = ffz(~(local_cpu_data->unimpl_va_mask | (7UL << 61)));
+	pta = POW2(61) - POW2(vmlpt_bits);
+	ia64_set_pta(pta | (0 << 8) | (vmlpt_bits << 2) | 0);
+
+#ifdef CONFIG_IA64_HP_ZX1
+	ioc_iova_disable();
+#endif
+	/* now execute the control code.
+	 * We will start by executing the control code linked into the 
+	 * kernel as opposed to the code we copied in control code buffer
+	 * page. When this code switches to physical mode, we will start
+	 * executing the code in control code buffer page. Reason for
+	 * doing this is we start code execution in virtual address space.
+	 * If we were to try to execute the newly copied code in virtual
+	 * address space, we will need to make an ITLB entry to avoid ITLB 
+	 * miss. By executing the code linked into kernel, we take advantage
+	 * of the ITLB entry already in place for kernel and avoid making
+	 * a new entry.
+	 */
+	indirection_page = image->head & PAGE_MASK;
+
+	rnk = (relocate_new_kernel_t)&code_addr;
+	(*rnk)(indirection_page, image->start, ia64_boot_param,
+		     GRANULEROUNDDOWN((unsigned long) pal_addr));
+	BUG();
+	for (;;)
+		;
+}
diff -urNp linux-2.6.16/arch/ia64/kernel/Makefile linux-2.6.16-kexec/arch/ia64/kernel/Makefile
--- linux-2.6.16/arch/ia64/kernel/Makefile	2006-03-19 22:53:29.000000000 -0700
+++ linux-2.6.16-kexec/arch/ia64/kernel/Makefile	2006-03-27 15:42:47.000000000 -0700
@@ -28,6 +28,7 @@ obj-$(CONFIG_IA64_CYCLONE)	+= cyclone.o
 obj-$(CONFIG_CPU_FREQ)		+= cpufreq/
 obj-$(CONFIG_IA64_MCA_RECOVERY)	+= mca_recovery.o
 obj-$(CONFIG_KPROBES)		+= kprobes.o jprobes.o
+obj-$(CONFIG_KEXEC)		+= machine_kexec.o relocate_kernel.o crash.o
 obj-$(CONFIG_IA64_UNCACHED_ALLOCATOR)	+= uncached.o
 mca_recovery-y			+= mca_drv.o mca_drv_asm.o
 
diff -urNp linux-2.6.16/arch/ia64/kernel/relocate_kernel.S linux-2.6.16-kexec/arch/ia64/kernel/relocate_kernel.S
--- linux-2.6.16/arch/ia64/kernel/relocate_kernel.S	1969-12-31 17:00:00.000000000 -0700
+++ linux-2.6.16-kexec/arch/ia64/kernel/relocate_kernel.S	2006-03-31 09:04:10.000000000 -0700
@@ -0,0 +1,359 @@
+/*
+ * arch/ia64/kernel/relocate_kernel.S 
+ *
+ * Relocate kexec'able kernel and start it
+ *
+ * Copyright (C) 2005 Hewlett-Packard Development Company, L.P.
+ * Copyright (C) 2005 Khalid Aziz  <khalid.aziz@hp.com>
+ * Copyright (C) 2005 Intel Corp,  Zou Nan hai <nanhai.zou@intel.com>
+ *
+ * This source code is licensed under the GNU General Public License,
+ * Version 2.  See the file COPYING for more details.
+ */
+#include <linux/config.h>
+#include <asm/asmmacro.h>
+#include <asm/kregs.h>
+#include <asm/page.h>
+#include <asm/pgtable.h>
+#include <asm/mca_asm.h>
+
+       /* Must be relocatable PIC code callable as a C function, that once
+        * it starts can not use the previous processes stack.
+        *
+        */
+GLOBAL_ENTRY(relocate_new_kernel)
+	.prologue
+	alloc r31=ar.pfs,4,0,0,0
+        .body
+.reloc_entry:
+{
+	rsm psr.i| psr.ic
+	mov r2=ip
+}
+	;;
+{
+        flushrs                         // must be first insn in group
+        srlz.i
+}
+	;;
+
+	//first switch to physical mode
+	add r3=1f-.reloc_entry, r2
+	movl r16 = IA64_PSR_AC|IA64_PSR_BN|IA64_PSR_IC|IA64_PSR_MFL
+	mov ar.rsc=0	          	// put RSE in enforced lazy mode
+	;;
+	add r2=(memory_stack-.reloc_entry), r2
+	;;
+	add sp=(memory_stack_end - .reloc_entry),r2
+	add r8=(register_stack - .reloc_entry),r2
+	;;
+	tpa sp=sp
+	tpa r3=r3
+	;;
+	loadrs
+	;;
+	mov r18=ar.rnat
+	mov ar.bspstore=r8
+	;;
+        mov cr.ipsr=r16
+        mov cr.iip=r3
+        mov cr.ifs=r0
+	srlz.i
+	;;
+	mov ar.rnat=r18
+	rfi
+	;;
+1:
+	//physical mode code begin
+	mov b6=in1
+	tpa r28=in2			// tpa must come before TLB purge
+
+	// purge all TC entries
+#define O(member)       IA64_CPUINFO_##member##_OFFSET
+        GET_THIS_PADDR(r2, cpu_info)    // load phys addr of cpu_info into r2
+        ;;
+        addl r17=O(PTCE_STRIDE),r2
+        addl r2=O(PTCE_BASE),r2
+        ;;
+        ld8 r18=[r2],(O(PTCE_COUNT)-O(PTCE_BASE));;    	// r18=ptce_base
+        ld4 r19=[r2],4                                  // r19=ptce_count[0]
+        ld4 r21=[r17],4                                 // r21=ptce_stride[0]
+        ;;
+        ld4 r20=[r2]                                    // r20=ptce_count[1]
+        ld4 r22=[r17]                                   // r22=ptce_stride[1]
+        mov r24=r0
+        ;;
+        adds r20=-1,r20
+        ;;
+#undef O
+2:
+        cmp.ltu p6,p7=r24,r19
+(p7)    br.cond.dpnt.few 4f
+        mov ar.lc=r20
+3:
+        ptc.e r18
+        ;;
+        add r18=r22,r18
+        br.cloop.sptk.few 3b
+        ;;
+        add r18=r21,r18
+        add r24=1,r24
+        ;;
+        br.sptk.few 2b
+4:
+        srlz.i
+        ;;
+	//purge TR entry for kernel text and data
+        movl r16=KERNEL_START
+        mov r18=KERNEL_TR_PAGE_SHIFT<<2
+        ;;
+        ptr.i r16, r18
+        ptr.d r16, r18
+        ;;
+        srlz.i
+        ;;
+
+	// purge TR entry for percpu data
+        movl r16=PERCPU_ADDR
+        mov r18=PERCPU_PAGE_SHIFT<<2
+        ;;
+        ptr.d r16,r18
+        ;;
+        srlz.d
+	;;
+
+        // purge TR entry for pal code
+        mov r16=in3
+        mov r18=IA64_GRANULE_SHIFT<<2
+        ;;
+        ptr.i r16,r18
+        ;;
+        srlz.i
+	;;
+
+        // purge TR entry for stack
+        mov r16=IA64_KR(CURRENT_STACK)
+        ;;
+        shl r16=r16,IA64_GRANULE_SHIFT
+        movl r19=PAGE_OFFSET
+        ;;
+        add r16=r19,r16
+        mov r18=IA64_GRANULE_SHIFT<<2
+        ;;
+        ptr.d r16,r18
+        ;;
+        srlz.i
+	;;
+
+	// copy kexec kernel segments
+	movl r16=PAGE_MASK
+	ld8  r30=[in0],8;;			// in0 is page_list
+	br.sptk.few .dest_page
+	;;
+.loop:
+	ld8  r30=[in0], 8;;
+.dest_page:
+	tbit.z p0, p6=r30, 0;;    	// 0x1 dest page
+(p6)	and r17=r30, r16
+(p6)	br.cond.sptk.few .loop;;
+
+	tbit.z p0, p6=r30, 1;;		// 0x2 indirect page
+(p6)	and in0=r30, r16
+(p6)	br.cond.sptk.few .loop;;
+
+	tbit.z p0, p6=r30, 2;;		// 0x4 end flag
+(p6)	br.cond.sptk.few .end_loop;;
+
+	tbit.z p6, p0=r30, 3;;		// 0x8 source page
+(p6)	br.cond.sptk.few .loop
+
+	and r18=r30, r16
+
+	// simple copy page, may optimize later
+	movl r14=PAGE_SIZE/8 - 1;;
+	mov ar.lc=r14;;
+1:
+	ld8 r14=[r18], 8;;
+	st8 [r17]=r14, 8;;
+	fc.i r17
+	br.ctop.sptk.few 1b
+	br.sptk.few .loop
+	;;
+
+.end_loop:
+	sync.i			// for fc.i
+	;;
+	srlz.i
+	;;
+	srlz.d
+	;;
+	br.call.sptk.many b0=b6;;
+memory_stack:
+	.fill           8192, 1, 0
+memory_stack_end:
+register_stack:
+	.fill           8192, 1, 0
+register_stack_end:
+relocate_new_kernel_end:
+END(relocate_new_kernel)
+
+GLOBAL_ENTRY(kexec_fake_sal_rendez)
+	.prologue
+	alloc r31=ar.pfs,3,0,0,0
+	.body
+.rendez_entry:
+	rsm	psr.i | psr.ic
+	mov r25=ip
+	;;
+	{
+		flushrs
+		srlz.i
+	}
+	;;
+       /* See where I am running, and compute gp */
+	{
+		mov     ar.rsc = 0      /* Put RSE in enforced lazy, LE mode */
+		mov     gp = ip         /* gp = relocate_new_kernel */
+	}
+
+	movl r8=0x00000100000000
+	;;
+	mov cr.iva=r8
+	/* Transition from virtual to physical mode */
+	srlz.i
+	;;
+	add	r17=5f-.rendez_entry, r25
+	movl	r16=(IA64_PSR_AC | IA64_PSR_BN | IA64_PSR_IC | IA64_PSR_MFL)
+	;;
+	tpa	r17=r17
+	mov	cr.ipsr=r16
+	;;
+	mov	cr.iip=r17
+	mov	cr.ifs=r0
+	;;
+	rfi
+	;;
+5:
+	mov     b6=in0			/* _start addr */
+	mov	r8=in1			/* ap_wakeup_vector */
+	mov	r26=in2			/* PAL addr */
+	;;
+	/* Purge kernel TRs */
+	movl	r16=KERNEL_START
+	mov	r18=KERNEL_TR_PAGE_SHIFT<<2
+	;;
+	ptr.i	r16,r18
+	ptr.d	r16,r18
+	;;
+	srlz.i
+	;;
+	srlz.d
+	;;
+	/* Purge percpu TR */
+	movl	r16=PERCPU_ADDR
+	mov	r18=PERCPU_PAGE_SHIFT<<2
+	;;
+	ptr.d	r16,r18
+	;;
+	srlz.d
+	;;
+	/* Purge PAL TR */
+	mov	r18=IA64_GRANULE_SHIFT<<2
+	;;
+	ptr.i	r26,r18
+	;;
+	srlz.i
+	;;
+	/* Purge stack TR */
+	mov	r16=IA64_KR(CURRENT_STACK)
+	;;
+	shl	r16=r16,IA64_GRANULE_SHIFT
+	movl	r19=PAGE_OFFSET
+	;;
+	add	r16=r19,r16
+	mov	r18=IA64_GRANULE_SHIFT<<2
+	;;
+	ptr.d	r16,r18
+	;;
+	srlz.i
+	;;
+
+	/* Ensure we can read and clear external interrupts */
+	mov	cr.tpr=r0
+	srlz.d
+
+	shr.u	r9=r8,6			/* which irr */
+	;;
+	and	r8=63,r8		/* bit offset into irr */
+	;;
+	mov	r10=1;;
+	;;
+	shl	r10=r10,r8		/* bit mask of irr we want */
+	cmp.eq	p6,p0=0,r9
+	;;
+(p6)	br.cond.sptk.few        check_irr0
+	cmp.eq	p7,p0=1,r9
+	;;
+(p7)	br.cond.sptk.few        check_irr1
+	cmp.eq	p8,p0=2,r9
+	;;
+(p8)	br.cond.sptk.few        check_irr2
+	cmp.eq	p9,p0=3,r9
+	;;
+(p9)	br.cond.sptk.few        check_irr3
+
+check_irr0:
+	mov	r8=cr.irr0
+	;;
+	and	r8=r8,r10
+	;;
+	cmp.eq	p6,p0=0,r8
+(p6)	br.cond.sptk.few	check_irr0
+	br.few	call_start
+	
+check_irr1:
+	mov	r8=cr.irr1
+	;;
+	and	r8=r8,r10
+	;;
+	cmp.eq	p6,p0=0,r8
+(p6)	br.cond.sptk.few	check_irr1
+	br.few	call_start
+	
+check_irr2:
+	mov	r8=cr.irr2
+	;;
+	and	r8=r8,r10
+	;;
+	cmp.eq	p6,p0=0,r8
+(p6)	br.cond.sptk.few	check_irr2
+	br.few	call_start
+	
+check_irr3:
+	mov	r8=cr.irr3
+	;;
+	and	r8=r8,r10
+	;;
+	cmp.eq	p6,p0=0,r8
+(p6)	br.cond.sptk.few	check_irr3
+	br.few	call_start
+	
+call_start:
+	mov	cr.eoi=r0
+	;;
+	srlz.d
+	;;
+	mov	r8=cr.ivr
+	;;
+	srlz.d
+	;;
+	cmp.eq	p0,p6=15,r8
+(p6)	br.cond.sptk.few	call_start
+	br.sptk.few		b6
+kexec_fake_sal_rendez_end:
+END(kexec_fake_sal_rendez)
+
+	.global relocate_new_kernel_size
+relocate_new_kernel_size:
+	data8	kexec_fake_sal_rendez_end - relocate_new_kernel
+
diff -urNp linux-2.6.16/arch/ia64/kernel/smp.c linux-2.6.16-kexec/arch/ia64/kernel/smp.c
--- linux-2.6.16/arch/ia64/kernel/smp.c	2006-03-19 22:53:29.000000000 -0700
+++ linux-2.6.16-kexec/arch/ia64/kernel/smp.c	2006-03-27 17:14:04.000000000 -0700
@@ -30,6 +30,7 @@
 #include <linux/delay.h>
 #include <linux/efi.h>
 #include <linux/bitops.h>
+#include <linux/kexec.h>
 
 #include <asm/atomic.h>
 #include <asm/current.h>
@@ -84,6 +85,34 @@ unlock_ipi_calllock(void)
 	spin_unlock_irq(&call_lock);
 }
 
+#ifdef CONFIG_KEXEC
+/*
+ * Stop the CPU and put it in fake SAL rendezvous. This allows CPU to wake
+ * up with IPI from boot processor
+ */
+void
+kexec_stop_this_cpu (void *func)
+{
+	unsigned long pta, impl_va_bits, pal_base;
+
+	/*
+	 * Remove this CPU by putting it into fake SAL rendezvous
+	 */
+	cpu_clear(smp_processor_id(), cpu_online_map);
+	max_xtp();
+	ia64_eoi();
+
+	/* Disable VHPT */
+	impl_va_bits = ffz(~(local_cpu_data->unimpl_va_mask | (7UL << 61)));
+	pta = POW2(61) - POW2(vmlpt_bits);
+	ia64_set_pta(pta | (0 << 8) | (vmlpt_bits << 2) | 0);
+
+	local_irq_disable();
+	pal_base = __get_cpu_var(ia64_mca_pal_base);
+	kexec_fake_sal_rendez(func, ap_wakeup_vector, pal_base);
+}
+#endif
+
 static void
 stop_this_cpu (void)
 {
diff -urNp linux-2.6.16/include/asm-ia64/kexec.h linux-2.6.16-kexec/include/asm-ia64/kexec.h
--- linux-2.6.16/include/asm-ia64/kexec.h	1969-12-31 17:00:00.000000000 -0700
+++ linux-2.6.16-kexec/include/asm-ia64/kexec.h	2006-03-30 11:46:46.000000000 -0700
@@ -0,0 +1,36 @@
+#ifndef _ASM_IA64_KEXEC_H
+#define _ASM_IA64_KEXEC_H
+
+
+/* Maximum physical address we can use pages from */
+#define KEXEC_SOURCE_MEMORY_LIMIT (-1UL)
+/* Maximum address we can reach in physical address mode */
+#define KEXEC_DESTINATION_MEMORY_LIMIT (-1UL)
+/* Maximum address we can use for the control code buffer */
+#define KEXEC_CONTROL_MEMORY_LIMIT TASK_SIZE
+
+#define KEXEC_CONTROL_CODE_SIZE (8192 + 8192 + 4096)
+
+/* The native architecture */
+#define KEXEC_ARCH KEXEC_ARCH_IA_64
+
+#define MAX_NOTE_BYTES 1024
+
+#define pte_bits	3
+#define vmlpt_bits	(impl_va_bits - PAGE_SHIFT + pte_bits)
+#define POW2(n)		(1ULL << (n))
+
+DECLARE_PER_CPU(u64, ia64_mca_pal_base);
+
+const extern unsigned int relocate_new_kernel_size;
+volatile extern long kexec_rendez;
+extern void relocate_new_kernel(unsigned long, unsigned long, 
+		struct ia64_boot_param *, unsigned long);
+extern void kexec_fake_sal_rendez(void *start, unsigned long wake_up,
+		unsigned long pal_base);
+
+static inline void
+crash_setup_regs(struct pt_regs *newregs, struct pt_regs *oldregs)
+{
+}
+#endif /* _ASM_IA64_KEXEC_H */
diff -urNp linux-2.6.16/include/asm-ia64/machvec_hpzx1.h linux-2.6.16-kexec/include/asm-ia64/machvec_hpzx1.h
--- linux-2.6.16/include/asm-ia64/machvec_hpzx1.h	2006-03-19 22:53:29.000000000 -0700
+++ linux-2.6.16-kexec/include/asm-ia64/machvec_hpzx1.h	2006-03-27 15:58:38.000000000 -0700
@@ -34,4 +34,6 @@ extern ia64_mv_dma_mapping_error	sba_dma
 #define platform_dma_supported			sba_dma_supported
 #define platform_dma_mapping_error		sba_dma_mapping_error
 
+extern void ioc_iova_disable(void);
+
 #endif /* _ASM_IA64_MACHVEC_HPZX1_h */
diff -urNp linux-2.6.16/include/asm-ia64/smp.h linux-2.6.16-kexec/include/asm-ia64/smp.h
--- linux-2.6.16/include/asm-ia64/smp.h	2006-03-19 22:53:29.000000000 -0700
+++ linux-2.6.16-kexec/include/asm-ia64/smp.h	2006-03-27 15:52:51.000000000 -0700
@@ -129,6 +129,9 @@ extern void smp_send_reschedule (int cpu
 extern void lock_ipi_calllock(void);
 extern void unlock_ipi_calllock(void);
 extern void identify_siblings (struct cpuinfo_ia64 *);
+#ifdef CONFIG_KEXEC
+extern void kexec_stop_this_cpu(void *);
+#endif
 
 #else
 
diff -urNp linux-2.6.16/include/linux/irq.h linux-2.6.16-kexec/include/linux/irq.h
--- linux-2.6.16/include/linux/irq.h	2006-03-19 22:53:29.000000000 -0700
+++ linux-2.6.16-kexec/include/linux/irq.h	2006-03-27 15:49:27.000000000 -0700
@@ -94,6 +94,7 @@ irq_descp (int irq)
 #include <asm/hw_irq.h> /* the arch dependent stuff */
 
 extern int setup_irq(unsigned int irq, struct irqaction * new);
+extern void terminate_irqs(void);
 
 #ifdef CONFIG_GENERIC_HARDIRQS
 extern cpumask_t irq_affinity[NR_IRQS];
diff -urNp linux-2.6.16/kernel/irq/manage.c linux-2.6.16-kexec/kernel/irq/manage.c
--- linux-2.6.16/kernel/irq/manage.c	2006-03-19 22:53:29.000000000 -0700
+++ linux-2.6.16-kexec/kernel/irq/manage.c	2006-03-27 17:02:08.000000000 -0700
@@ -377,3 +377,22 @@ int request_irq(unsigned int irq,
 
 EXPORT_SYMBOL(request_irq);
 
+/*
+ * Terminate any outstanding interrupts
+ */
+void terminate_irqs(void)
+{
+	struct irqaction * action;
+	irq_desc_t *idesc;
+	int i;
+
+	for (i=0; i<NR_IRQS; i++) {
+		idesc = irq_descp(i);
+		action = idesc->action;
+		if (!action)
+			continue;
+		if (idesc->handler->end)
+			idesc->handler->end(i);
+	}
+}
+
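
As a side note on the wakeup-vector polling in kexec_fake_sal_rendez
above: the vector is split into a register index (vector >> 6) and a bit
position (vector & 63), which selects one bit in one of the four 64-bit
IRR registers. A minimal user-space sketch of that arithmetic follows.
The struct and function names are illustrative only and do not appear in
the patch.

```c
#include <stdint.h>

/* An interrupt vector 0..255 maps onto four 64-bit IRR registers. */
struct irr_pos {
	unsigned int reg;	/* index 0..3, i.e. cr.irr0 .. cr.irr3 */
	uint64_t mask;		/* single bit to test in that register */
};

/* Hypothetical helper: compute where a vector's pending bit lives. */
static struct irr_pos irr_position(unsigned int vector)
{
	struct irr_pos pos;

	pos.reg = vector >> 6;			/* upper two bits pick the register */
	pos.mask = 1ULL << (vector & 63);	/* low six bits pick the bit */
	return pos;
}
```

The assembly then spins on the selected register until the masked bit
becomes non-zero, which is how the stopped CPU notices the wakeup IPI.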



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] kexec on ia64
  2006-04-03 22:20 ` Khalid Aziz
@ 2006-04-04  4:20   ` Andrew Morton
  2006-04-04  6:07     ` [Fastboot] " Michael Ellerman
  2006-04-05 16:11     ` Khalid Aziz
  2006-04-04 18:13   ` [Fastboot] " Eric W. Biederman
  1 sibling, 2 replies; 29+ messages in thread
From: Andrew Morton @ 2006-04-04  4:20 UTC (permalink / raw)
  To: Khalid Aziz; +Cc: linux-kernel, fastboot, linux-ia64

Khalid Aziz <khalid_aziz@hp.com> wrote:
>
> Add kexec support on ia64.
> 

Neat.  How well does it work?

> +#ifdef CONFIG_PCI
> +void machine_shutdown(void)
> +{
> +	struct pci_dev *dev;
> +	irq_desc_t *idesc;
> +	cpumask_t mask = CPU_MASK_NONE;
> +
> +	/* Disable all PCI devices */
> +	list_for_each_entry(dev, &pci_devices, global_list) {
> +		if (!(dev->is_enabled))
> +			continue;
> +		idesc = irq_descp(dev->irq);
> +		if (!idesc)
> +			continue;
> +		cpu_set(0, mask);
> +		disable_irq_nosync(dev->irq);
> +		idesc->handler->end(dev->irq);
> +		idesc->handler->set_affinity(dev->irq, mask);
> +		idesc->action = NULL;
> +		pci_disable_device(dev);
> +		pci_set_power_state(dev, 0);
> +	}
> +}
> +#endif

Ahem.

  /* Do NOT directly access these two variables, unless you are arch specific pci
   * code, or pci core code. */
  extern struct list_head pci_root_buses;	/* list of all known PCI buses */
  extern struct list_head pci_devices;	/* list of all devices */

I think it would be kinder to the API to use pci_find_device(PCI_ANY_ID,
PCI_ANY_ID, ...) here.

> +/*
> + * Do not allocate memory (or fail in any way) in machine_kexec().
> + * We are past the point of no return, committed to rebooting now. 
> + */
> +void machine_kexec(struct kimage *image)
> +{
> +	unsigned long indirection_page;
> +	relocate_new_kernel_t rnk;
> +	unsigned long pta, impl_va_bits;
> +	void *pal_addr = efi_get_pal_addr();
> +	unsigned long code_addr = (unsigned long)page_address(image->control_code_page);
> +
> +#ifdef CONFIG_HOTPLUG_CPU
> +	int cpu;
> +
> +	for_each_online_cpu(cpu) {
> +		if (cpu != smp_processor_id())
> +			cpu_down(cpu);
> +	}
> +#elif CONFIG_SMP

This will generate a CPP warning if CONFIG_SMP is not defined.

	#elif defined(CONFIG_SMP)

would be preferred.
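
A minimal sketch of the difference, as a standalone program rather than
kernel code. The CONFIG_* names below are stand-ins that are deliberately
left undefined, and stop_cpus_strategy() is a hypothetical helper. With
"#elif CONFIG_SMP" an undefined symbol is silently evaluated as 0 (and
warned about under -Wundef); defined() tests for existence explicitly.

```c
/* Illustrative stand-ins for Kconfig symbols; neither is defined here. */
/* #define CONFIG_HOTPLUG_CPU 1 */
/* #define CONFIG_SMP 1 */

/* Returns the name of the branch the preprocessor selected. */
static const char *stop_cpus_strategy(void)
{
#if defined(CONFIG_HOTPLUG_CPU)
	return "cpu_down";
#elif defined(CONFIG_SMP)	/* no -Wundef warning when CONFIG_SMP is absent */
	return "smp_call_function";
#else
	return "uniprocessor";
#endif
}
```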

> --- linux-2.6.16/kernel/irq/manage.c	2006-03-19 22:53:29.000000000 -0700
> +++ linux-2.6.16-kexec/kernel/irq/manage.c	2006-03-27 17:02:08.000000000 -0700
> @@ -377,3 +377,22 @@ int request_irq(unsigned int irq,
>  
>  EXPORT_SYMBOL(request_irq);
>  
> +/*
> + * Terminate any outstanding interrupts
> + */
> +void terminate_irqs(void)
> +{
> +	struct irqaction * action;
> +	irq_desc_t *idesc;
> +	int i;
> +
> +	for (i=0; i<NR_IRQS; i++) {

	for (i = 0; i < NR_IRQS; i++) {

> +		idesc = irq_descp(i);
> +		action = idesc->action;
> +		if (!action)
> +			continue;
> +		if (idesc->handler->end)
> +			idesc->handler->end(i);
> +	}
> +}

Could we have a bit more description of what this function does, and why we
need it?

Should other kexec-using architectures be using this?  If not, why does
ia64 need it?

Thanks.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Fastboot] Re: [PATCH] kexec on ia64
  2006-04-04  4:20   ` Andrew Morton
@ 2006-04-04  6:07     ` Michael Ellerman
  2006-04-05 16:11     ` Khalid Aziz
  1 sibling, 0 replies; 29+ messages in thread
From: Michael Ellerman @ 2006-04-04  6:07 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Khalid Aziz, linux-ia64, fastboot, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1153 bytes --]

On Mon, 2006-04-03 at 21:20 -0700, Andrew Morton wrote:
> Khalid Aziz <khalid_aziz@hp.com> wrote:
> > +/*
> > + * Terminate any outstanding interrupts
> > + */
> > +void terminate_irqs(void)
> > +{
> > +	struct irqaction * action;
> > +	irq_desc_t *idesc;
> > +	int i;
> > +
> > +	for (i=0; i<NR_IRQS; i++) {
> 
> 	for (i = 0; i < NR_IRQS; i++) {
> 
> > +		idesc = irq_descp(i);
> > +		action = idesc->action;
> > +		if (!action)
> > +			continue;
> > +		if (idesc->handler->end)
> > +			idesc->handler->end(i);
> > +	}
> > +}
> 
> Could we have a bit more description of what this function does, and why we
> need it?
> 
> Should other kexec-using architectures be using this?  If not, why does
> ia64 need it?

We've been kicking around a patch to do something similar; we also EOI
anything that's outstanding. I can't find the patch just now, but I think
it's on linuxppc somewhere.

cheers

-- 
Michael Ellerman
IBM OzLabs

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 191 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Fastboot] [PATCH] kexec on ia64
  2006-04-03 22:20 ` Khalid Aziz
  2006-04-04  4:20   ` Andrew Morton
@ 2006-04-04 18:13   ` Eric W. Biederman
  2006-04-05 16:34     ` Khalid Aziz
  1 sibling, 1 reply; 29+ messages in thread
From: Eric W. Biederman @ 2006-04-04 18:13 UTC (permalink / raw)
  To: Khalid Aziz; +Cc: LKML, Fastboot mailing list, Linux ia64

Khalid Aziz <khalid_aziz@hp.com> writes:

> Add kexec support on ia64.

This looks like a starting place but this patch needs some
more work.

> Signed-off-by: Khalid Aziz <khalid.aziz@hp.com>
> ---
>
> diff -urNp linux-2.6.16/arch/ia64/hp/common/sba_iommu.c linux-2.6.16-kexec/arch/ia64/hp/common/sba_iommu.c
> --- linux-2.6.16/arch/ia64/hp/common/sba_iommu.c 2006-03-19 22:53:29.000000000 -0700
> +++ linux-2.6.16-kexec/arch/ia64/hp/common/sba_iommu.c 2006-03-27 15:42:47.000000000 -0700
> @@ -1624,6 +1624,28 @@ ioc_iova_init(struct ioc *ioc)
>  	READ_REG(ioc->ioc_hpa + IOC_IBASE);
>  }
>  
> +#ifdef CONFIG_KEXEC
> +void
> +ioc_iova_disable(void)
> +{
> +	struct ioc *ioc;
> +
> +	ioc = ioc_list;
> +
> +	while (ioc != NULL) {
> +		/* Disable IOVA translation */
> +		WRITE_REG(ioc->ibase & 0xfffffffffffffffe, ioc->ioc_hpa + IOC_IBASE);
> +		READ_REG(ioc->ioc_hpa + IOC_IBASE);
> +
> +		/* Clear I/O TLB of any possible entries */
> +		WRITE_REG(ioc->ibase | (get_iovp_order(ioc->iov_size) + iovp_shift), ioc->ioc_hpa + IOC_PCOM);
> +		READ_REG(ioc->ioc_hpa + IOC_PCOM);
> +
> +		ioc = ioc->next;
> +	}
> +}
> +#endif
> +
>  static void __init
>  ioc_resource_init(struct ioc *ioc)
>  {
> diff -urNp linux-2.6.16/arch/ia64/Kconfig linux-2.6.16-kexec/arch/ia64/Kconfig
> --- linux-2.6.16/arch/ia64/Kconfig	2006-03-19 22:53:29.000000000 -0700
> +++ linux-2.6.16-kexec/arch/ia64/Kconfig 2006-03-27 15:42:47.000000000 -0700
> @@ -376,6 +376,23 @@ config IA64_PALINFO
>  config SGI_SN
>  	def_bool y if (IA64_SGI_SN2 || IA64_GENERIC)
>  
> +config KEXEC
> +	bool "kexec system call (EXPERIMENTAL)"
> +	depends on EXPERIMENTAL
> +	help
> +	  kexec is a system call that implements the ability to shut down your
> +	  current kernel, and to start another kernel.  It is like a reboot
> +	  but it is independent of the system firmware.   And like a reboot
> +	  you can start any kernel with it, not just Linux.
> +
> +	  The name comes from the similarity to the exec system call.
> +
> +	  It is an ongoing process to be certain the hardware in a machine
> +	  is properly shutdown, so do not be surprised if this code does not
> +	  initially work for you.  It may help to enable device hotplugging
> +	  support.  As of this writing the exact hardware interface is
> +	  strongly in flux, so no good recommendation can be made.
> +
>  source "drivers/firmware/Kconfig"
>  
>  source "fs/Kconfig.binfmt"
> diff -urNp linux-2.6.16/arch/ia64/kernel/crash.c linux-2.6.16-kexec/arch/ia64/kernel/crash.c
> --- linux-2.6.16/arch/ia64/kernel/crash.c 1969-12-31 17:00:00.000000000 -0700
> +++ linux-2.6.16-kexec/arch/ia64/kernel/crash.c 2006-03-27 15:49:44.000000000 -0700
> @@ -0,0 +1,43 @@
> +/*
> + * arch/ia64/kernel/crash.c
> + *
> + * Architecture specific (ia64) functions for kexec based crash dumps.
> + *
> + * Created by: Khalid Aziz <khalid.aziz@hp.com>
> + *
> + * Copyright (C) 2005 Hewlett-Packard Development Company, L.P.
> + *
> + */
> +#include <linux/init.h>
> +#include <linux/types.h>
> +#include <linux/kernel.h>
> +#include <linux/smp.h>
> +#include <linux/irq.h>
> +#include <linux/reboot.h>
> +#include <linux/kexec.h>
> +#include <linux/irq.h>
> +#include <linux/delay.h>
> +#include <linux/elf.h>
> +#include <linux/elfcore.h>
> +#include <linux/device.h>
> +
> +void
> +machine_crash_shutdown(struct pt_regs *pt)
> +{
> +	/* This function is only called after the system
> +	 * has paniced or is otherwise in a critical state.
> +	 * The minimum amount of code to allow a kexec'd kernel
> +	 * to run successfully needs to happen here.
> +	 *
> +	 * In practice this means shooting down the other cpus in
> +	 * an SMP system.
> +	 */
> +	if (in_interrupt()) {
> +		terminate_irqs();
> +		ia64_eoi();
> +	}
> +	system_state = SYSTEM_RESTART;
> +	device_shutdown();
> +	system_state = SYSTEM_BOOTING;
> +	machine_shutdown();
> +}

machine_crash_shutdown must not call device_shutdown.  That has
been shown to far exceed the minimum necessary to shut down a system.
I would prefer this to be a no-op stub that doesn't work at all than
something like this that does way too much and makes people think
the code will work.

As for terminate_irqs, on x86 we do that on bootup, not in the middle
of a crash shutdown.  The APICs and xAPICs are close enough that you
should be able to do the same on ia64.

You display remarkable faith in a kernel that has panicked.

> +#ifdef CONFIG_PCI
> +void machine_shutdown(void)
> +{
> +	struct pci_dev *dev;
> +	irq_desc_t *idesc;
> +	cpumask_t mask = CPU_MASK_NONE;
> +
> +	/* Disable all PCI devices */
> +	list_for_each_entry(dev, &pci_devices, global_list) {
> +		if (!(dev->is_enabled))
> +			continue;
> +		idesc = irq_descp(dev->irq);
> +		if (!idesc)
> +			continue;
> +		cpu_set(0, mask);
> +		disable_irq_nosync(dev->irq);
> +		idesc->handler->end(dev->irq);
> +		idesc->handler->set_affinity(dev->irq, mask);
> +		idesc->action = NULL;
> +		pci_disable_device(dev);
> +		pci_set_power_state(dev, 0);
> +	}
> +}
> +#endif

This is peculiar but almost sane.  We don't do this on x86,
because devices are peculiar enough that no generic sequence works.
What you have above belongs in the shutdown methods of the pci
devices.  There is no way to get this right in the general case.

Some of the irq disable logic may in fact be sane.

Unless there is a good reason not to, machine_shutdown needs
to be called from machine_restart, so that the code is routinely
used and tested.

Having machine_shutdown only build when you have PCI present
and then not making KEXEC depend on PCI is wrong.

The #ifdef needs to move inside machine_shutdown.
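
A minimal sketch of that restructuring, written as a standalone program
rather than kernel code. CONFIG_PCI is a stand-in left undefined here,
and quiesce_pci_devices() is a hypothetical placeholder for the PCI loop
in the patch. The point is only that machine_shutdown() always exists,
while the PCI-specific work is compiled in conditionally.

```c
/* Hypothetical stand-in for the Kconfig symbol; left undefined here. */
/* #define CONFIG_PCI 1 */

static int pci_devices_quiesced;	/* counts work done by the PCI branch */

#ifdef CONFIG_PCI
/* Hypothetical helper standing in for the PCI loop from the patch. */
static void quiesce_pci_devices(void)
{
	pci_devices_quiesced++;
}
#endif

/* Always defined, so callers link whether or not CONFIG_PCI is set. */
void machine_shutdown(void)
{
#ifdef CONFIG_PCI
	quiesce_pci_devices();
#endif
}
```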

> +
> +/*
> + * Do not allocate memory (or fail in any way) in machine_kexec().
> + * We are past the point of no return, committed to rebooting now. 
> + */
> +void machine_kexec(struct kimage *image)
> +{
> +	unsigned long indirection_page;
> +	relocate_new_kernel_t rnk;
> +	unsigned long pta, impl_va_bits;
> +	void *pal_addr = efi_get_pal_addr();
> +	unsigned long code_addr = (unsigned long)page_address(image->control_code_page);
> +
> +#ifdef CONFIG_HOTPLUG_CPU
> +	int cpu;
> +
> +	for_each_online_cpu(cpu) {
> +		if (cpu != smp_processor_id())
> +			cpu_down(cpu);
> +	}
> +#elif CONFIG_SMP
> +	smp_call_function(kexec_stop_this_cpu, (void *)image->start, 0, 0);
> +#endif

This CPU and HOTPLUG_CPU stuff belongs in machine_shutdown.

> +
> +	ia64_set_itv(1<<16);
> +	/* Interrupts aren't acceptable while we reboot */
> +	local_irq_disable();
> +
> +	/* set kr0 to the appropriate address */
> +	set_io_base();
> +
> +	/* Disable VHPT */
> +	impl_va_bits = ffz(~(local_cpu_data->unimpl_va_mask | (7UL << 61)));
> +	pta = POW2(61) - POW2(vmlpt_bits);
> +	ia64_set_pta(pta | (0 << 8) | (vmlpt_bits << 2) | 0);
> +
> +#ifdef CONFIG_IA64_HP_ZX1
> +	ioc_iova_disable();
> +#endif

This also looks like it needs to be part of machine_shutdown.
I have no confidence in ioc_iova_disable when the machine is crashing.
Basically anything that touches a pointer is likely to be bad.

> +	/* now execute the control code.
> +	 * We will start by executing the control code linked into the 
> +	 * kernel as opposed to the code we copied in control code buffer
> +	 * page. When this code switches to physical mode, we will start
> +	 * executing the code in control code buffer page. Reason for
> +	 * doing this is we start code execution in virtual address space.
> +	 * If we were to try to execute the newly copied code in virtual
> +	 * address space, we will need to make an ITLB entry to avoid ITLB 
> +	 * miss. By executing the code linked into kernel, we take advantage
> +	 * of the ITLB entry already in place for kernel and avoid making
> +	 * a new entry.
> +	 */
> +	indirection_page = image->head & PAGE_MASK;
> +
> +	rnk = (relocate_new_kernel_t)&code_addr;
> +	(*rnk)(indirection_page, image->start, ia64_boot_param,
> +		     GRANULEROUNDDOWN((unsigned long) pal_addr));
> +	BUG();
> +	for (;;)
> +		;
> +}
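
For readers following along: the indirection page handed to
relocate_new_kernel is the head of a list of tagged entries, and the
assembly copy loop in the patch decodes the low bits of each entry
(0x1 destination, 0x2 indirection, 0x4 end, 0x8 source). The following
is a user-space model of that walk, a sketch only. The MODEL_* constants
and walk_page_list() are illustrative names, and the 4096-byte page size
is an assumption for the model, not the ia64 PAGE_SIZE.

```c
#include <stdint.h>
#include <string.h>

#define MODEL_PAGE_SIZE	4096UL		/* model value, not the ia64 PAGE_SIZE */
#define MODEL_PAGE_MASK	(~(MODEL_PAGE_SIZE - 1))

/* Entry tags, taken from the comments in the assembly copy loop. */
#define IND_DESTINATION	0x1UL
#define IND_INDIRECTION	0x2UL
#define IND_DONE	0x4UL
#define IND_SOURCE	0x8UL

/*
 * Walk the tagged list and copy each source page to the current
 * destination. Returns the number of pages copied.
 */
static unsigned long walk_page_list(const uintptr_t *entry)
{
	unsigned char *dest = NULL;
	unsigned long copied = 0;

	for (;;) {
		uintptr_t e = *entry++;
		uintptr_t addr = e & MODEL_PAGE_MASK;

		if (e & IND_DESTINATION) {
			dest = (unsigned char *)addr;		/* set copy target */
		} else if (e & IND_INDIRECTION) {
			entry = (const uintptr_t *)addr;	/* continue in next list page */
		} else if (e & IND_DONE) {
			break;					/* end of list */
		} else if (e & IND_SOURCE) {
			memcpy(dest, (const void *)addr, MODEL_PAGE_SIZE);
			dest += MODEL_PAGE_SIZE;		/* destination advances per page */
			copied++;
		}
	}
	return copied;
}
```

The model assumes a destination entry always precedes the first source
entry, as the real list does; it performs no validation of its own.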


Eric

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [Fastboot] [PATCH] kexec on ia64
  2004-11-15 20:41 Khalid Aziz
  2004-11-16  3:46 ` Khalid Aziz
@ 2006-04-05  0:36 ` Zou, Nanhai
       [not found]   ` <20060405101243.e3e4f772.kamezawa.hiroyu@jp.fujitsu.com>
  2006-04-05  1:13 ` Zou, Nanhai
  2006-04-05  1:34 ` Zou, Nanhai
  3 siblings, 1 reply; 29+ messages in thread
From: Zou, Nanhai @ 2006-04-05  0:36 UTC (permalink / raw)
  To: Eric W. Biederman, Khalid Aziz; +Cc: LKML, Fastboot mailing list, Linux ia64

> -----Original Message-----
> From: linux-ia64-owner@vger.kernel.org
> [mailto:linux-ia64-owner@vger.kernel.org] On Behalf Of Eric W. Biederman
> Sent: April 5, 2006 2:14
> To: Khalid Aziz
> Cc: LKML; Fastboot mailing list; Linux ia64
> Subject: Re: [Fastboot] [PATCH] kexec on ia64
> 
> Khalid Aziz <khalid_aziz@hp.com> writes:
> 
> > Add kexec support on ia64.
> 
> This looks like a starting place but this patch needs some
> more work.
> 
Eric,
	Khalid is also merging my ia64 kdump patch, posted at http://lkml.org/lkml/2006/3/14/46.
	Hopefully the issues you pointed out will be solved once the kdump patch is merged.

Thanks
Zou Nan hai

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [Fastboot] [PATCH] kexec on ia64
  2004-11-15 20:41 Khalid Aziz
  2004-11-16  3:46 ` Khalid Aziz
  2006-04-05  0:36 ` Zou, Nanhai
@ 2006-04-05  1:13 ` Zou, Nanhai
  2006-04-05  1:27   ` KAMEZAWA Hiroyuki
  2006-04-05  1:34 ` Zou, Nanhai
  3 siblings, 1 reply; 29+ messages in thread
From: Zou, Nanhai @ 2006-04-05  1:13 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: ebiederm, khalid_aziz, linux-kernel, fastboot, linux-ia64


> -----Original Message-----
> From: KAMEZAWA Hiroyuki [mailto:kamezawa.hiroyu@jp.fujitsu.com]
> Sent: April 5, 2006 9:13
> To: Zou, Nanhai
> Cc: ebiederm@xmission.com; khalid_aziz@hp.com;
> linux-kernel@vger.kernel.org; fastboot@lists.osdl.org;
> linux-ia64@vger.kernel.org
> Subject: Re: [Fastboot] [PATCH] kexec on ia64
> 
> On Wed, 5 Apr 2006 08:36:07 +0800
> "Zou, Nanhai" <nanhai.zou@intel.com> wrote:
> 
> > > -----Original Message-----
> > > From: linux-ia64-owner@vger.kernel.org
> > > [mailto:linux-ia64-owner@vger.kernel.org] On Behalf Of Eric W. Biederman
> > > Sent: April 5, 2006 2:14
> > > To: Khalid Aziz
> > > Cc: LKML; Fastboot mailing list; Linux ia64
> > > Subject: Re: [Fastboot] [PATCH] kexec on ia64
> > >
> > > Khalid Aziz <khalid_aziz@hp.com> writes:
> > >
> > > > Add kexec support on ia64.
> > >
> > > This looks like a starting place but this patch needs some
> > > more work.
> > >
> > Eric,
> > 	Khalid is also merging my ia64 kdump patch posted in
> http://lkml.org/lkml/2006/3/14/46.
> > 	Hopefully those issues you pointed out will be solved once the kdump patch
> is merged.
> >
> Hi, I have a question about kexec/kdump.
> 
> How does kdump know memory layout (of old kernel) now ?
> 
> I'm working for memory hotplug. When memory is hot-added, memory layout
> changes.
> But I think there is no code to manage memory layout information of added memory.
> 
 It reads the memory layout from /proc/iomem.
 If memory is hotplugged, I think we need to reload kdump.
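
As an illustration of what reading the layout from /proc/iomem involves,
here is a minimal sketch that parses one line of that file and reports
whether it describes a "System RAM" range. This is not the kexec-tools
code (which is not shown in this thread); parse_system_ram() is a
hypothetical helper, and the line format is assumed to be
"start-end : name" with hexadecimal addresses.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one /proc/iomem-style line such as
 *   "00100000-03ffffff : System RAM"
 * Returns 1 and fills start/end if the line describes System RAM,
 * otherwise returns 0.
 */
static int parse_system_ram(const char *line,
			    unsigned long long *start,
			    unsigned long long *end)
{
	char name[64];

	if (sscanf(line, " %llx-%llx : %63[^\n]", start, end, name) != 3)
		return 0;
	return strcmp(name, "System RAM") == 0;
}
```

Because a snapshot like this is taken when the crash kernel is loaded,
ranges added later by hotplug are not seen until the load is repeated.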


> Thanks,
> - Kame

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Fastboot] [PATCH] kexec on ia64
  2006-04-05  1:13 ` Zou, Nanhai
@ 2006-04-05  1:27   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 29+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-04-05  1:27 UTC (permalink / raw)
  To: Zou, Nanhai; +Cc: ebiederm, khalid_aziz, linux-kernel, fastboot, linux-ia64

On Wed, 5 Apr 2006 09:13:36 +0800
"Zou, Nanhai" <nanhai.zou@intel.com> wrote:
> > I'm working for memory hotplug. When memory is hot-added, memory layout
> > changes.
> > But I think there is no code to manage memory layout information of added memory.
> > 
>  It reads memory layout from /proc/iomem...,
>  If memory is hotpluged, I think we need a reload of kdump.
> 
If /proc/iomem were updated on a hotplug event (it is not updated now),
would that solve the problem?

Would calling insert_resource(), as efi_initialize_iomem_resources() does,
be a good approach?

-Kame


^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [Fastboot] [PATCH] kexec on ia64
  2004-11-15 20:41 Khalid Aziz
                   ` (2 preceding siblings ...)
  2006-04-05  1:13 ` Zou, Nanhai
@ 2006-04-05  1:34 ` Zou, Nanhai
  3 siblings, 0 replies; 29+ messages in thread
From: Zou, Nanhai @ 2006-04-05  1:34 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: ebiederm, khalid_aziz, linux-kernel, fastboot, linux-ia64

> -----Original Message-----
> From: KAMEZAWA Hiroyuki [mailto:kamezawa.hiroyu@jp.fujitsu.com]
> Sent: April 5, 2006 9:28
> To: Zou, Nanhai
> Cc: ebiederm@xmission.com; khalid_aziz@hp.com; linux-kernel@vger.kernel.org;
> fastboot@lists.osdl.org; linux-ia64@vger.kernel.org
> Subject: Re: [Fastboot] [PATCH] kexec on ia64
> 
> On Wed, 5 Apr 2006 09:13:36 +0800
> "Zou, Nanhai" <nanhai.zou@intel.com> wrote:
> > > I'm working for memory hotplug. When memory is hot-added, memory layout
> > > changes.
> > > But I think there is no code to manage memory layout information of added
> memory.
> > >
> >  It reads memory layout from /proc/iomem...,
> >  If memory is hotpluged, I think we need a reload of kdump.
> >
> If /proc/iomem is updated at hotplug event (this is not updated now),
> is there no problem ?
> 
 The crash dump kernel also needs a reload, because the physical memory list is read and saved when the kdump kernel is loaded, not at crash time.

Zou Nan hai

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Fastboot] [PATCH] kexec on ia64
       [not found]   ` <20060405101243.e3e4f772.kamezawa.hiroyu@jp.fujitsu.com>
@ 2006-04-05  2:49     ` Eric W. Biederman
  2006-04-05  4:31       ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 29+ messages in thread
From: Eric W. Biederman @ 2006-04-05  2:49 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Zou, Nanhai, khalid_aziz, linux-kernel, fastboot, linux-ia64

KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> writes:

> Hi, I have a question about kexec/kdump.
>
> How does kdump know memory layout (of old kernel) now ?
>
> I'm working for memory hotplug. When memory is hot-added, memory layout changes.
> But I think there is no code to manage memory layout information of added
> memory.

It is passed from one kernel to another, and it is memorized when you load
the crash dump kernel.  If your memory layout changes you need to reload
the crash dump kernel from user space with the appropriate hotplug script.  

Unless this happens often it shouldn't be a problem. 

And yes this does leave a small race during which kexec on panic won't
work.

Eric



* Re: [Fastboot] [PATCH] kexec on ia64
  2006-04-05  2:49     ` Eric W. Biederman
@ 2006-04-05  4:31       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 29+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-04-05  4:31 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: nanhai.zou, khalid_aziz, linux-kernel, fastboot, linux-ia64

On Tue, 04 Apr 2006 20:49:49 -0600
ebiederm@xmission.com (Eric W. Biederman) wrote:

> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> writes:
> 
> > Hi, I have a question about kexec/kdump.
> >
> > How does kdump know memory layout (of old kernel) now ?
> >
> > I'm working on memory hotplug. When memory is hot-added, the memory layout changes.
> > But I think there is no code to manage memory layout information of added
> > memory.
> 
> It is passed from one kernel to another, and it is memorized when you load
> the crash dump kernel.  If your memory layout changes you need to reload
> the crash dump kernel from user space with the appropriate hotplug script.  
> 
> Unless this happens often it shouldn't be a problem. 
> 

> And yes this does leave a small race during which kexec on panic won't
> work.

Hmm, okay.
Until the kdump kernel is reloaded, kdump continues to use the old information.
(When memory is only being added, that should not be a big problem.)

Thank you.
- Kame



* Re: [PATCH] kexec on ia64
  2006-04-04  4:20   ` Andrew Morton
  2006-04-04  6:07     ` [Fastboot] " Michael Ellerman
@ 2006-04-05 16:11     ` Khalid Aziz
  1 sibling, 0 replies; 29+ messages in thread
From: Khalid Aziz @ 2006-04-05 16:11 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, Fastboot mailing list, Linux ia64

On Mon, 2006-04-03 at 21:20 -0700, Andrew Morton wrote:
> Khalid Aziz <khalid_aziz@hp.com> wrote:
> >
> > Add kexec support on ia64.
> > 
> 
> Neat.  How well does it work?

Works well on my test machines - HP rx2600 and HP cx2600. Hopefully
others can test it on other machines.

> > +/*
> > + * Terminate any outstanding interrupts
> > + */
> > +void terminate_irqs(void)
> > +{
> > +	struct irqaction * action;
> > +	irq_desc_t *idesc;
> > +	int i;
> > +
> > +	for (i=0; i<NR_IRQS; i++) {
> 
> 	for (i = 0; i < NR_IRQS; i++) {
> 
> > +		idesc = irq_descp(i);
> > +		action = idesc->action;
> > +		if (!action)
> > +			continue;
> > +		if (idesc->handler->end)
> > +			idesc->handler->end(i);
> > +	}
> > +}
> 
> Could we have a bit more description of what this function does, and why we
> need it?
> 
> Should other kexec-using architectures be using this?  If not, why does
> ia64 need it?
> 
> Thanks.

This function terminates any outstanding interrupts. I found it to be
necessary for devices that use level-triggered interrupts. If such a
device asserts its interrupt as the kernel goes into panic, nobody
acknowledges the interrupt, so it stays asserted as the new kernel
comes up. Every driver should clear any pending interrupts in its
initialization routine, but most do not. As a result, when the driver
attempts to use the interrupt, it cannot: the interrupt is already
asserted, and any new interrupts from the device simply cause the
interrupt line to remain asserted. terminate_irqs() tries to acknowledge
any pending interrupts so the interrupts will be usable when the new
kernel comes up. This is not specific to ia64, and I would think this
problem would show up on other architectures as well. I happened to find
it on ia64 because the HP rx2600 uses level-triggered interrupts for its
SCSI controller.
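
The failure mode can be illustrated with a toy model in plain C. This is only a sketch under simplifying assumptions, not IOSAPIC or kernel code: the names toy_line, toy_raise, toy_eoi and toy_terminate_all are hypothetical, and the model reduces the hardware to a single "awaiting EOI" flag per line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one level-triggered interrupt line. */
struct toy_line {
	bool awaiting_eoi;	/* set on delivery, cleared by an EOI */
	int delivered;		/* interrupts the CPU actually saw */
};

/* Device raises the line; returns true only if the CPU gets an interrupt. */
static bool toy_raise(struct toy_line *l)
{
	assert(l != NULL);
	if (l->awaiting_eoi)
		return false;	/* still unacknowledged: nothing new is delivered */
	l->awaiting_eoi = true;
	l->delivered++;
	return true;
}

/* Acknowledge the interrupt so the line can deliver again. */
static void toy_eoi(struct toy_line *l)
{
	assert(l != NULL);
	l->awaiting_eoi = false;
}

/* What a terminate_irqs()-style sweep amounts to in this model. */
static void toy_terminate_all(struct toy_line *lines, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (lines[i].awaiting_eoi)
			toy_eoi(&lines[i]);
	}
}
```

In this model, a line that was delivered but never acknowledged before the panic returns false from toy_raise() forever; after toy_terminate_all() the next toy_raise() is delivered again.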

--
Khalid
 
==================================
Khalid Aziz                       Open Source and Linux Organization
(970)898-9214                                        Hewlett-Packard
khalid.aziz@hp.com                                  Fort Collins, CO

"The Linux kernel is subject to relentless development" 
                                - Alessandro Rubini




* Re: [Fastboot] [PATCH] kexec on ia64
  2006-04-04 18:13   ` [Fastboot] " Eric W. Biederman
@ 2006-04-05 16:34     ` Khalid Aziz
  0 siblings, 0 replies; 29+ messages in thread
From: Khalid Aziz @ 2006-04-05 16:34 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: LKML, Fastboot mailing list, Linux ia64

On Tue, 2006-04-04 at 12:13 -0600, Eric W. Biederman wrote:
> Khalid Aziz <khalid_aziz@hp.com> writes:
> > +void
> > +machine_crash_shutdown(struct pt_regs *pt)
> > +{
> > +	/* This function is only called after the system
> > +	 * has paniced or is otherwise in a critical state.
> > +	 * The minimum amount of code to allow a kexec'd kernel
> > +	 * to run successfully needs to happen here.
> > +	 *
> > +	 * In practice this means shooting down the other cpus in
> > +	 * an SMP system.
> > +	 */
> > +	if (in_interrupt()) {
> > +		terminate_irqs();
> > +		ia64_eoi();
> > +	}
> > +	system_state = SYSTEM_RESTART;
> > +	device_shutdown();
> > +	system_state = SYSTEM_BOOTING;
> > +	machine_shutdown();
> > +}
> 
> machine_crash_shutdown must not call device_shutdown.  That has
> been shown to far exceed the minimum necessary to shut down a system.
> I would rather this be a no-op stub that doesn't work at all than
> something like this that does way too much and makes people think
> the code will work.
> 
> As for terminate_irqs, on x86 we do that on bootup, not in the middle
> of a crash shutdown.  The apics and xapics are close enough that you
> should be able to do the same on ia64.
> 
> You display remarkable faith in a kernel that has panicked.

I will look into eliminating this as much as possible.

> Having machine_shutdown only build when you have PCI present
> and then not making KEXEC depend on PCI is wrong.
> 
> The #ifdef needs to move inside machine_shutdown.

Fixed.

> 
> > +
> > +/*
> > + * Do not allocate memory (or fail in any way) in machine_kexec().
> > + * We are past the point of no return, committed to rebooting now. 
> > + */
> > +void machine_kexec(struct kimage *image)
> > +{
> > +	unsigned long indirection_page;
> > +	relocate_new_kernel_t rnk;
> > +	unsigned long pta, impl_va_bits;
> > +	void *pal_addr = efi_get_pal_addr();
> > +	unsigned long code_addr = (unsigned long)page_address(image->control_code_page);
> > +
> > +#ifdef CONFIG_HOTPLUG_CPU
> > +	int cpu;
> > +
> > +	for_each_online_cpu(cpu) {
> > +		if (cpu != smp_processor_id())
> > +			cpu_down(cpu);
> > +	}
> > +#elif CONFIG_SMP
> > +	smp_call_function(kexec_stop_this_cpu, (void *)image->start, 0, 0);
> > +#endif
> 
> This CPU and HOTPLUG_CPU stuff belongs in machine_shutdown.

Moved to machine_shutdown().

> 
> > +
> > +	ia64_set_itv(1<<16);
> > +	/* Interrupts aren't acceptable while we reboot */
> > +	local_irq_disable();
> > +
> > +	/* set kr0 to the appropriate address */
> > +	set_io_base();
> > +
> > +	/* Disable VHPT */
> > +	impl_va_bits = ffz(~(local_cpu_data->unimpl_va_mask | (7UL << 61)));
> > +	pta = POW2(61) - POW2(vmlpt_bits);
> > +	ia64_set_pta(pta | (0 << 8) | (vmlpt_bits << 2) | 0);
> > +
> > +#ifdef CONFIG_IA64_HP_ZX1
> > +	ioc_iova_disable();
> > +#endif
> 
> This also looks like it needs to be part of machine_shutdown.
> I have no confidence in ioc_iova_disable when the machine is crashing.
> Basically anything that touches a pointer is likely to be bad.

I have moved the above code to machine_shutdown(). I would prefer to
delay disabling the VHPT as long as possible, but machine_kexec() is
called soon after machine_shutdown(), and at that point we should be
executing only kernel code, which uses pinned TR entries, so disabling
the VHPT earlier should not have any deleterious effect.
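
For readers following the ia64_set_pta() call in the quoted patch, here is a small standalone sketch of how that value is composed. It assumes the PTA layout from the Itanium architecture manual (ve in bit 0, size in bits 7:2, vf in bit 8, base in bits 63:15); the names TOY_POW2 and toy_make_pta are hypothetical, and 40 in the usage note is just an example value for vmlpt_bits.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_POW2(n) (UINT64_C(1) << (n))

/*
 * Compose a PTA register value from its fields.
 * ve == 0 means the VHPT walker is disabled.
 */
static uint64_t toy_make_pta(uint64_t base, unsigned size_bits,
			     unsigned vf, unsigned ve)
{
	assert(size_bits < 64 && vf <= 1 && ve <= 1);
	/* The base field starts at bit 15, so the low 15 bits must be clear. */
	assert((base & (TOY_POW2(15) - 1)) == 0);

	return base | ((uint64_t)vf << 8) | ((uint64_t)size_bits << 2) | ve;
}
```

With vmlpt_bits = 40, toy_make_pta(TOY_POW2(61) - TOY_POW2(40), 40, 0, 0) mirrors the expression in the patch: same base and size, short format, and the enable bit left at zero.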

> 
> > +	/* now execute the control code.
> > +	 * We will start by executing the control code linked into the
> > +	 * kernel as opposed to the code we copied in control code buffer
> > +	 * page. When this code switches to physical mode, we will start
> > +	 * executing the code in control code buffer page. Reason for
> > +	 * doing this is we start code execution in virtual address space.
> > +	 * If we were to try to execute the newly copied code in virtual
> > +	 * address space, we will need to make an ITLB entry to avoid ITLB 
> > +	 * miss. By executing the code linked into kernel, we take advantage
> > +	 * of the ITLB entry already in place for kernel and avoid making
> > +	 * a new entry.
> > +	 */
> > +	indirection_page = image->head & PAGE_MASK;
> > +
> > +	rnk = (relocate_new_kernel_t)&code_addr;
> > +	(*rnk)(indirection_page, image->start, ia64_boot_param,
> > +		     GRANULEROUNDDOWN((unsigned long) pal_addr));
> > +	BUG();
> > +	for (;;)
> > +		;
> > +}
> 
> 
> Eric

Thanks for the review.

-- 
Khalid

==================================
Khalid Aziz                       Open Source and Linux Organization
(970)898-9214                                        Hewlett-Packard
khalid.aziz@hp.com                                  Fort Collins, CO

"The Linux kernel is subject to relentless development" 
                                - Alessandro Rubini




end of thread, other threads:[~2006-04-05 16:34 UTC | newest]

Thread overview: 29+ messages
2004-11-15 20:32 [PATCH] kexec on ia64 Khalid Aziz
2004-11-15 21:15 ` Luck, Tony
2004-11-15 22:03 ` David Mosberger
2004-11-15 22:14 ` Khalid Aziz
2004-11-16 17:28 ` Khalid Aziz
2005-10-25 22:52 ` Khalid Aziz
2005-10-26 18:28 ` Gerald Pfeifer
2005-10-26 19:02 ` Luck, Tony
2005-10-26 20:25 ` Eric W. Biederman
2005-10-26 21:43 ` Luck, Tony
2005-10-26 21:49 ` Khalid Aziz
2005-10-26 23:21 ` Zou Nan hai
2005-10-27  7:10 ` Eric W. Biederman
2005-10-27 19:05 ` Khalid Aziz
2005-10-27 23:17 ` Zou Nan hai
2006-04-03 22:20 ` Khalid Aziz
2006-04-04  4:20   ` Andrew Morton
2006-04-04  6:07     ` [Fastboot] " Michael Ellerman
2006-04-05 16:11     ` Khalid Aziz
2006-04-04 18:13   ` [Fastboot] " Eric W. Biederman
2006-04-05 16:34     ` Khalid Aziz
  -- strict thread matches above, loose matches on Subject: below --
2004-11-15 20:41 Khalid Aziz
2004-11-16  3:46 ` Khalid Aziz
2006-04-05  0:36 ` Zou, Nanhai
     [not found]   ` <20060405101243.e3e4f772.kamezawa.hiroyu@jp.fujitsu.com>
2006-04-05  2:49     ` Eric W. Biederman
2006-04-05  4:31       ` KAMEZAWA Hiroyuki
2006-04-05  1:13 ` Zou, Nanhai
2006-04-05  1:27   ` KAMEZAWA Hiroyuki
2006-04-05  1:34 ` Zou, Nanhai
