* RE: [PATCH] kexec on ia64
2004-11-15 20:32 [PATCH] kexec on ia64 Khalid Aziz
@ 2004-11-15 21:15 ` Luck, Tony
2004-11-15 22:03 ` David Mosberger
` (13 subsequent siblings)
14 siblings, 0 replies; 29+ messages in thread
From: Luck, Tony @ 2004-11-15 21:15 UTC
To: linux-ia64
>Here is what I am working on next:
>
>1. Save EFI memory map before it is trimmed.
This code has been "evolving" for a long time now, more layers
get added to solve each new problem. If you get time, please
step back about a half-mile and take a look at the big picture
and see if you can find a better way to do the scanning and
trimming and re-scanning. The overall problem statement (ignore
anything except complete granules, honour the command-line arguments
max_mem/max_addr, allocate a temporary bitmap for bootmem) seems
like it shouldn't require such complex code :-) You can add your
own new requirement to not modify the original EFI tables so that
they can be re-scanned by a new kernel after kexec (new kernel
might have a different granule size).
-Tony
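
As a rough illustration of the problem statement above, here is a minimal
sketch in plain C. It is not the kernel code: GRANULE_SIZE, struct range and
usable_range() are hypothetical names, the 16 MB granule is an assumed value,
and rounding the mem= budget down to a granule is a choice made only for this
sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed granule size for illustration (16 MB); the real
 * IA64_GRANULE_SIZE depends on the kernel configuration. */
#define GRANULE_SIZE (1ULL << 24)

struct range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

static uint64_t granule_round_down(uint64_t addr)
{
	return addr & ~(GRANULE_SIZE - 1);
}

static uint64_t granule_round_up(uint64_t addr)
{
	return (addr + GRANULE_SIZE - 1) & ~(GRANULE_SIZE - 1);
}

/*
 * Return the usable part of one contiguous range: complete granules
 * only, nothing above max_addr, and at most `budget` bytes (what is
 * left of the mem= limit). An empty result has start == end.
 */
static struct range usable_range(struct range in, uint64_t max_addr,
				 uint64_t budget)
{
	struct range out;

	assert(in.start <= in.end);

	/* Round both ends inward to granule boundaries. */
	out.start = granule_round_up(in.start);
	out.end = granule_round_down(in.end);

	/* Honour max_addr=. */
	if (out.end > max_addr)
		out.end = granule_round_down(max_addr);

	if (out.end <= out.start) {
		out.end = out.start;
		return out;
	}

	/* Honour mem=. */
	if (out.end - out.start > budget)
		out.end = out.start + granule_round_down(budget);

	return out;
}
```

For a range from 24 MB to 88 MB with no limits, this yields 32 MB to 80 MB,
that is, only the three complete 16 MB granules.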
* RE: [PATCH] kexec on ia64
2004-11-15 20:32 [PATCH] kexec on ia64 Khalid Aziz
2004-11-15 21:15 ` Luck, Tony
@ 2004-11-15 22:03 ` David Mosberger
2004-11-15 22:14 ` Khalid Aziz
` (12 subsequent siblings)
14 siblings, 0 replies; 29+ messages in thread
From: David Mosberger @ 2004-11-15 22:03 UTC
To: linux-ia64
>>>>> On Mon, 15 Nov 2004 13:15:25 -0800, "Luck, Tony" <tony.luck@intel.com> said:
Tony> You can add your own new requirement to not modify the
Tony> original EFI tables so that they can be re-scanned by a new
Tony> kernel after kexec (new kernel might have a different granule
Tony> size).
That certainly would be the right way to go about it. It would also
make it less likely that something else might get confused when
changing the memory map underneath it.
--david
* RE: [PATCH] kexec on ia64
2004-11-15 20:32 [PATCH] kexec on ia64 Khalid Aziz
2004-11-15 21:15 ` Luck, Tony
2004-11-15 22:03 ` David Mosberger
@ 2004-11-15 22:14 ` Khalid Aziz
2004-11-16 17:28 ` Khalid Aziz
` (11 subsequent siblings)
14 siblings, 0 replies; 29+ messages in thread
From: Khalid Aziz @ 2004-11-15 22:14 UTC
To: linux-ia64
On Mon, 2004-11-15 at 14:15, Luck, Tony wrote:
> >Here is what I am working on next:
> >
> >1. Save EFI memory map before it is trimmed.
>
> This code has been "evolving" for a long time now, more layers
> get added to solve each new problem. If you get time, please
> step back about a half-mile and take a look at the big picture
> and see if you can find a better way to do the scanning and
> trimming and re-scanning. The overall problem statement (ignore
> anything except complete granules, honour the command-line arguments
> max_mem/max_addr, allocate a temporary bitmap for bootmem) seems
> like it shouldn't require such complex code :-) You can add your
> own new requirement to not modify the original EFI tables so that
> they can be re-scanned by a new kernel after kexec (new kernel
> might have a different granule size).
>
> -Tony
Tony,
I definitely like this idea better. I have been talking to another
developer who is struggling with efi_memmap_walk() trimming the original
EFI memory map for "mem=" and "max_addr=". We have discussed splitting
efi_memmap_walk() into three separate routines: one to simply walk the
memory map and compute the physical memory size without touching the map,
one to trim the memory map for granule size, and one to trim the memory
map for "mem=" and "max_addr=". This will allow us to save an untouched
memory map in between calls to these routines. Now that I know you guys
are open to something like this, we will pursue it further :)
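
The three-routine split described above could look roughly like the
following sketch. All names here (struct memdesc, copy_map(),
map_total_bytes(), trim_to_granules(), trim_to_limit()) are hypothetical and
not taken from any patch, and the granule trim is simplified to work on one
descriptor at a time instead of on contiguous runs. The point it illustrates
is that the two trimming passes operate on a working copy, so the firmware
map stays untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EFI_PAGE_SHIFT 12	/* EFI pages are 4 KB */

/* Simplified stand-in for an EFI memory descriptor. */
struct memdesc {
	uint64_t phys_addr;
	uint64_t num_pages;
};

/* Take a working copy so the original map is never modified. */
static void copy_map(struct memdesc *dst, const struct memdesc *src, size_t n)
{
	memcpy(dst, src, n * sizeof(*src));
}

/* Routine 1: read-only walk that returns the total size in bytes. */
static uint64_t map_total_bytes(const struct memdesc *map, size_t n)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += map[i].num_pages << EFI_PAGE_SHIFT;
	return total;
}

/* Routine 2: trim every descriptor inward to granule boundaries. */
static void trim_to_granules(struct memdesc *map, size_t n, uint64_t granule)
{
	size_t i;

	assert(granule != 0 && (granule & (granule - 1)) == 0);
	for (i = 0; i < n; i++) {
		uint64_t end = map[i].phys_addr +
			       (map[i].num_pages << EFI_PAGE_SHIFT);
		uint64_t start = (map[i].phys_addr + granule - 1) &
				 ~(granule - 1);

		end &= ~(granule - 1);
		map[i].phys_addr = start;
		map[i].num_pages = end > start ?
				   (end - start) >> EFI_PAGE_SHIFT : 0;
	}
}

/* Routine 3: drop whatever exceeds the mem= limit. */
static void trim_to_limit(struct memdesc *map, size_t n, uint64_t mem_limit)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t size = map[i].num_pages << EFI_PAGE_SHIFT;

		if (size > mem_limit - total)
			map[i].num_pages = (mem_limit - total) >> EFI_PAGE_SHIFT;
		total += map[i].num_pages << EFI_PAGE_SHIFT;
	}
}
```

A caller would copy the firmware map once, run routines 2 and 3 on the copy,
and keep the original available for /proc/iomem or for a kexec'd kernel.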
--
Khalid
==================================
Khalid Aziz Linux and Open Source Lab
(970)898-9214 Hewlett-Packard
khalid_aziz@hp.com Fort Collins, CO
"The Linux kernel is subject to relentless development"
- Alessandro Rubini
* RE: [PATCH] kexec on ia64
2004-11-15 20:32 [PATCH] kexec on ia64 Khalid Aziz
` (2 preceding siblings ...)
2004-11-15 22:14 ` Khalid Aziz
@ 2004-11-16 17:28 ` Khalid Aziz
2005-10-25 22:52 ` Khalid Aziz
` (10 subsequent siblings)
14 siblings, 0 replies; 29+ messages in thread
From: Khalid Aziz @ 2004-11-16 17:28 UTC
To: linux-ia64
I have noticed that on x86, trimming memory with "mem=" has no effect on
RAM reported by /proc/iomem. I assume we want the same behavior on ia64.
This would mean we definitely need to save an untrimmed EFI memory map.
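
One way to check this behavior from user space is to add up the "System RAM"
entries in /proc/iomem with and without "mem=". The helper below is only a
sketch: system_ram_bytes() is a hypothetical name, and it assumes the usual
"start-end : name" line layout with hexadecimal, inclusive addresses.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one line of /proc/iomem. Return the size in bytes if the line
 * describes a "System RAM" region, and 0 for anything else (other
 * resource names, nested entries such as "Kernel code", bad input).
 */
static unsigned long long system_ram_bytes(const char *line)
{
	unsigned long long start, end;
	char name[64];

	assert(line != NULL);

	if (sscanf(line, "%llx-%llx : %63[^\n]", &start, &end, name) != 3)
		return 0;
	if (strcmp(name, "System RAM") != 0 || end < start)
		return 0;

	/* The end address in /proc/iomem is inclusive. */
	return end - start + 1;
}
```

Feeding every line of the file through this helper and summing the results
gives the RAM size as the resource tree reports it.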
--
Khalid
On Mon, 2004-11-15 at 15:03, David Mosberger wrote:
> >>>>> On Mon, 15 Nov 2004 13:15:25 -0800, "Luck, Tony" <tony.luck@intel.com> said:
>
>
> Tony> You can add your own new requirement to not modify the
> Tony> original EFI tables so that they can be re-scanned by a new
> Tony> kernel after kexec (new kernel might have a different granule
> Tony> size).
>
> That certainly would be the right way to go about it. It would also
> make it less likely that something else might get confused when
> changing the memory map underneath it.
>
> --david
--
==================================
Khalid Aziz Linux and Open Source Lab
(970)898-9214 Hewlett-Packard
khalid_aziz@hp.com Fort Collins, CO
"The Linux kernel is subject to relentless development"
- Alessandro Rubini
* [PATCH] kexec on ia64
2004-11-15 20:32 [PATCH] kexec on ia64 Khalid Aziz
` (3 preceding siblings ...)
2004-11-16 17:28 ` Khalid Aziz
@ 2005-10-25 22:52 ` Khalid Aziz
2005-10-26 18:28 ` Gerald Pfeifer
` (9 subsequent siblings)
14 siblings, 0 replies; 29+ messages in thread
From: Khalid Aziz @ 2005-10-25 22:52 UTC
To: linux-ia64
[-- Attachment #1: Type: text/plain, Size: 1005 bytes --]
I have ported my original kexec-on-ia64 patch from the 2.6.8 kernel and
fixed a few bugs in it. Attached is a patch for kernel 2.6.14-rc4. It
works with a normal kexec reboot on an HP rx2600. I am now working on
adding support for crash kexec. I am also working on kexec on INIT, which
I currently have working on the 2.6.10 kernel and am porting to the
2.6.14-rc kernel.
The attached patch needs to be applied on top of the iomem and
efi_memmapwalk patches already in the ia64 test tree (these patches are
attached as well for those who may need them).
Signed-off-by: Khalid Aziz <khalid.aziz@hp.com>
--
Khalid
====================================================================
Khalid Aziz Open Source and Linux Organization
(970)898-9214 Hewlett-Packard
khalid.aziz@hp.com Fort Collins, CO
"The Linux kernel is subject to relentless development"
- Alessandro Rubini
[-- Attachment #2: iomem-2.6.14-rc4.patch --]
[-- Type: text/x-patch, Size: 3518 bytes --]
--- a/arch/ia64/kernel/efi.c
+++ b/arch/ia64/kernel/efi.c
@@ -923,3 +923,90 @@ efi_memmap_init(unsigned long *s, unsign
*s = (u64)kern_memmap;
*e = (u64)++k;
}
+
+void
+efi_initialize_iomem_resources(struct resource *code_resource,
+ struct resource *data_resource)
+{
+ struct resource *res;
+ void *efi_map_start, *efi_map_end, *p;
+ efi_memory_desc_t *md;
+ u64 efi_desc_size;
+ char *name;
+ unsigned long flags;
+
+ efi_map_start = __va(ia64_boot_param->efi_memmap);
+ efi_map_end = efi_map_start + ia64_boot_param->efi_memmap_size;
+ efi_desc_size = ia64_boot_param->efi_memdesc_size;
+
+ res = NULL;
+
+ for (p = efi_map_start; p < efi_map_end; p += efi_desc_size) {
+ md = p;
+
+ if (md->num_pages == 0) /* should not happen */
+ continue;
+
+ flags = IORESOURCE_MEM;
+ switch (md->type) {
+
+ case EFI_MEMORY_MAPPED_IO:
+ case EFI_MEMORY_MAPPED_IO_PORT_SPACE:
+ continue;
+
+ case EFI_LOADER_CODE:
+ case EFI_LOADER_DATA:
+ case EFI_BOOT_SERVICES_DATA:
+ case EFI_BOOT_SERVICES_CODE:
+ case EFI_CONVENTIONAL_MEMORY:
+ if (md->attribute & EFI_MEMORY_WP) {
+ name = "System ROM";
+ flags |= IORESOURCE_READONLY;
+ } else {
+ name = "System RAM";
+ }
+ break;
+
+ case EFI_ACPI_MEMORY_NVS:
+ name = "ACPI Non-volatile Storage";
+ flags |= IORESOURCE_BUSY;
+ break;
+
+ case EFI_UNUSABLE_MEMORY:
+ name = "reserved";
+ flags |= IORESOURCE_BUSY | IORESOURCE_DISABLED;
+ break;
+
+ case EFI_RESERVED_TYPE:
+ case EFI_RUNTIME_SERVICES_CODE:
+ case EFI_RUNTIME_SERVICES_DATA:
+ case EFI_ACPI_RECLAIM_MEMORY:
+ default:
+ name = "reserved";
+ flags |= IORESOURCE_BUSY;
+ break;
+ }
+
+ if ((res = kcalloc(1, sizeof(struct resource), GFP_KERNEL)) == NULL) {
+ printk(KERN_ERR "failed to allocate resource for iomem\n");
+ return;
+ }
+
+ res->name = name;
+ res->start = md->phys_addr;
+ res->end = md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT) - 1;
+ res->flags = flags;
+
+ if (insert_resource(&iomem_resource, res) < 0)
+ kfree(res);
+ else {
+ /*
+ * We don't know which region contains
+ * kernel data so we try it repeatedly and
+ * let the resource manager test it.
+ */
+ insert_resource(res, code_resource);
+ insert_resource(res, data_resource);
+ }
+ }
+}
--- a/arch/ia64/kernel/setup.c
+++ b/arch/ia64/kernel/setup.c
@@ -78,6 +78,19 @@ struct screen_info screen_info;
unsigned long vga_console_iobase;
unsigned long vga_console_membase;
+static struct resource data_resource = {
+ .name = "Kernel data",
+ .flags = IORESOURCE_BUSY | IORESOURCE_MEM
+};
+
+static struct resource code_resource = {
+ .name = "Kernel code",
+ .flags = IORESOURCE_BUSY | IORESOURCE_MEM
+};
+extern void efi_initialize_iomem_resources(struct resource *,
+ struct resource *);
+extern char _text[], _edata[], _etext[];
+
unsigned long ia64_max_cacheline_size;
unsigned long ia64_iobase; /* virtual address for I/O accesses */
EXPORT_SYMBOL(ia64_iobase);
@@ -171,6 +184,22 @@ sort_regions (struct rsvd_region *rsvd_r
}
}
+/*
+ * Request address space for all standard resources
+ */
+static int __init register_memory(void)
+{
+ code_resource.start = ia64_tpa(_text);
+ code_resource.end = ia64_tpa(_etext) - 1;
+ data_resource.start = ia64_tpa(_etext);
+ data_resource.end = ia64_tpa(_edata) - 1;
+ efi_initialize_iomem_resources(&code_resource, &data_resource);
+
+ return 0;
+}
+
+__initcall(register_memory);
+
/**
* reserve_memory - setup reserved memory areas
*
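
The region naming and the inclusive end address used by the iomem patch
above can be summarized in a small standalone sketch. The enum, the struct
and describe_region() are hypothetical stand-ins for illustration only; they
cover just a few of the EFI memory types and none of the IORESOURCE_* flags.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EFI_PAGE_SHIFT 12	/* EFI pages are 4 KB */

/* Stand-in for a few of the EFI memory types handled by the patch. */
enum sketch_type {
	SK_CONVENTIONAL,
	SK_ACPI_NVS,
	SK_UNUSABLE,
	SK_RUNTIME_DATA
};

struct sketch_res {
	const char *name;
	uint64_t start;
	uint64_t end;	/* inclusive, as in struct resource */
};

static struct sketch_res describe_region(enum sketch_type type,
					 int write_protected,
					 uint64_t phys_addr,
					 uint64_t num_pages)
{
	struct sketch_res r;

	assert(num_pages > 0);

	switch (type) {
	case SK_CONVENTIONAL:
		r.name = write_protected ? "System ROM" : "System RAM";
		break;
	case SK_ACPI_NVS:
		r.name = "ACPI Non-volatile Storage";
		break;
	default:
		r.name = "reserved";
		break;
	}

	r.start = phys_addr;
	/* Last byte of the region, hence the "- 1". */
	r.end = phys_addr + (num_pages << EFI_PAGE_SHIFT) - 1;
	return r;
}
```

Sixteen EFI pages starting at 0x100000 therefore show up as
00100000-0010ffff rather than ending at 0x110000.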
[-- Attachment #3: efi_memmapwalk-2.6.14-rc4.patch --]
[-- Type: text/x-patch, Size: 15529 bytes --]
--- a/arch/ia64/kernel/efi.c
+++ b/arch/ia64/kernel/efi.c
@@ -239,57 +239,30 @@ is_available_memory (efi_memory_desc_t *
return 0;
}
-/*
- * Trim descriptor MD so its starts at address START_ADDR. If the descriptor covers
- * memory that is normally available to the kernel, issue a warning that some memory
- * is being ignored.
- */
-static void
-trim_bottom (efi_memory_desc_t *md, u64 start_addr)
-{
- u64 num_skipped_pages;
+typedef struct kern_memdesc {
+ u64 attribute;
+ u64 start;
+ u64 num_pages;
+} kern_memdesc_t;
- if (md->phys_addr >= start_addr || !md->num_pages)
- return;
-
- num_skipped_pages = (start_addr - md->phys_addr) >> EFI_PAGE_SHIFT;
- if (num_skipped_pages > md->num_pages)
- num_skipped_pages = md->num_pages;
-
- if (is_available_memory(md))
- printk(KERN_NOTICE "efi.%s: ignoring %luKB of memory at 0x%lx due to granule hole "
- "at 0x%lx\n", __FUNCTION__,
- (num_skipped_pages << EFI_PAGE_SHIFT) >> 10,
- md->phys_addr, start_addr - IA64_GRANULE_SIZE);
- /*
- * NOTE: Don't set md->phys_addr to START_ADDR because that could cause the memory
- * descriptor list to become unsorted. In such a case, md->num_pages will be
- * zero, so the Right Thing will happen.
- */
- md->phys_addr += num_skipped_pages << EFI_PAGE_SHIFT;
- md->num_pages -= num_skipped_pages;
-}
+static kern_memdesc_t *kern_memmap;
static void
-trim_top (efi_memory_desc_t *md, u64 end_addr)
+walk (efi_freemem_callback_t callback, void *arg, u64 attr)
{
- u64 num_dropped_pages, md_end_addr;
+ kern_memdesc_t *k;
+ u64 start, end, voff;
- md_end_addr = md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT);
-
- if (md_end_addr <= end_addr || !md->num_pages)
- return;
-
- num_dropped_pages = (md_end_addr - end_addr) >> EFI_PAGE_SHIFT;
- if (num_dropped_pages > md->num_pages)
- num_dropped_pages = md->num_pages;
-
- if (is_available_memory(md))
- printk(KERN_NOTICE "efi.%s: ignoring %luKB of memory at 0x%lx due to granule hole "
- "at 0x%lx\n", __FUNCTION__,
- (num_dropped_pages << EFI_PAGE_SHIFT) >> 10,
- md->phys_addr, end_addr);
- md->num_pages -= num_dropped_pages;
+ voff = (attr == EFI_MEMORY_WB) ? PAGE_OFFSET : __IA64_UNCACHED_OFFSET;
+ for (k = kern_memmap; k->start != ~0UL; k++) {
+ if (k->attribute != attr)
+ continue;
+ start = PAGE_ALIGN(k->start);
+ end = (k->start + (k->num_pages << EFI_PAGE_SHIFT)) & PAGE_MASK;
+ if (start < end)
+ if ((*callback)(start + voff, end + voff, arg) < 0)
+ return;
+ }
}
/*
@@ -299,148 +272,19 @@ trim_top (efi_memory_desc_t *md, u64 end
void
efi_memmap_walk (efi_freemem_callback_t callback, void *arg)
{
- int prev_valid = 0;
- struct range {
- u64 start;
- u64 end;
- } prev, curr;
- void *efi_map_start, *efi_map_end, *p, *q;
- efi_memory_desc_t *md, *check_md;
- u64 efi_desc_size, start, end, granule_addr, last_granule_addr, first_non_wb_addr = 0;
- unsigned long total_mem = 0;
-
- efi_map_start = __va(ia64_boot_param->efi_memmap);
- efi_map_end = efi_map_start + ia64_boot_param->efi_memmap_size;
- efi_desc_size = ia64_boot_param->efi_memdesc_size;
-
- for (p = efi_map_start; p < efi_map_end; p += efi_desc_size) {
- md = p;
-
- /* skip over non-WB memory descriptors; that's all we're interested in... */
- if (!(md->attribute & EFI_MEMORY_WB))
- continue;
-
- /*
- * granule_addr is the base of md's first granule.
- * [granule_addr - first_non_wb_addr) is guaranteed to
- * be contiguous WB memory.
- */
- granule_addr = GRANULEROUNDDOWN(md->phys_addr);
- first_non_wb_addr = max(first_non_wb_addr, granule_addr);
-
- if (first_non_wb_addr < md->phys_addr) {
- trim_bottom(md, granule_addr + IA64_GRANULE_SIZE);
- granule_addr = GRANULEROUNDDOWN(md->phys_addr);
- first_non_wb_addr = max(first_non_wb_addr, granule_addr);
- }
-
- for (q = p; q < efi_map_end; q += efi_desc_size) {
- check_md = q;
-
- if ((check_md->attribute & EFI_MEMORY_WB) &&
- (check_md->phys_addr == first_non_wb_addr))
- first_non_wb_addr += check_md->num_pages << EFI_PAGE_SHIFT;
- else
- break; /* non-WB or hole */
- }
-
- last_granule_addr = GRANULEROUNDDOWN(first_non_wb_addr);
- if (last_granule_addr < md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT))
- trim_top(md, last_granule_addr);
-
- if (is_available_memory(md)) {
- if (md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT) >= max_addr) {
- if (md->phys_addr >= max_addr)
- continue;
- md->num_pages = (max_addr - md->phys_addr) >> EFI_PAGE_SHIFT;
- first_non_wb_addr = max_addr;
- }
-
- if (total_mem >= mem_limit)
- continue;
-
- if (total_mem + (md->num_pages << EFI_PAGE_SHIFT) > mem_limit) {
- unsigned long limit_addr = md->phys_addr;
-
- limit_addr += mem_limit - total_mem;
- limit_addr = GRANULEROUNDDOWN(limit_addr);
-
- if (md->phys_addr > limit_addr)
- continue;
-
- md->num_pages = (limit_addr - md->phys_addr) >>
- EFI_PAGE_SHIFT;
- first_non_wb_addr = max_addr = md->phys_addr +
- (md->num_pages << EFI_PAGE_SHIFT);
- }
- total_mem += (md->num_pages << EFI_PAGE_SHIFT);
-
- if (md->num_pages == 0)
- continue;
-
- curr.start = PAGE_OFFSET + md->phys_addr;
- curr.end = curr.start + (md->num_pages << EFI_PAGE_SHIFT);
-
- if (!prev_valid) {
- prev = curr;
- prev_valid = 1;
- } else {
- if (curr.start < prev.start)
- printk(KERN_ERR "Oops: EFI memory table not ordered!\n");
-
- if (prev.end == curr.start) {
- /* merge two consecutive memory ranges */
- prev.end = curr.end;
- } else {
- start = PAGE_ALIGN(prev.start);
- end = prev.end & PAGE_MASK;
- if ((end > start) && (*callback)(start, end, arg) < 0)
- return;
- prev = curr;
- }
- }
- }
- }
- if (prev_valid) {
- start = PAGE_ALIGN(prev.start);
- end = prev.end & PAGE_MASK;
- if (end > start)
- (*callback)(start, end, arg);
- }
+ walk(callback, arg, EFI_MEMORY_WB);
}
/*
- * Walk the EFI memory map to pull out leftover pages in the lower
- * memory regions which do not end up in the regular memory map and
- * stick them into the uncached allocator
- *
- * The regular walk function is significantly more complex than the
- * uncached walk which means it really doesn't make sense to try and
- * marge the two.
+ * Walks the EFI memory map and calls CALLBACK once for each EFI memory descriptor that
+ * has memory that is available for uncached allocator.
*/
-void __init
-efi_memmap_walk_uc (efi_freemem_callback_t callback)
+void
+efi_memmap_walk_uc (efi_freemem_callback_t callback, void *arg)
{
- void *efi_map_start, *efi_map_end, *p;
- efi_memory_desc_t *md;
- u64 efi_desc_size, start, end;
-
- efi_map_start = __va(ia64_boot_param->efi_memmap);
- efi_map_end = efi_map_start + ia64_boot_param->efi_memmap_size;
- efi_desc_size = ia64_boot_param->efi_memdesc_size;
-
- for (p = efi_map_start; p < efi_map_end; p += efi_desc_size) {
- md = p;
- if (md->attribute == EFI_MEMORY_UC) {
- start = PAGE_ALIGN(md->phys_addr);
- end = PAGE_ALIGN((md->phys_addr+(md->num_pages << EFI_PAGE_SHIFT)) & PAGE_MASK);
- if ((*callback)(start, end, NULL) < 0)
- return;
- }
- }
+ walk(callback, arg, EFI_MEMORY_UC);
}
-
/*
* Look for the PAL_CODE region reported by EFI and maps it using an
* ITR to enable safe PAL calls in virtual mode. See IA-64 Processor
@@ -862,3 +706,220 @@ efi_uart_console_only(void)
printk(KERN_ERR "Malformed %s value\n", name);
return 0;
}
+
+#define efi_md_size(md) (md->num_pages << EFI_PAGE_SHIFT)
+
+static inline u64
+kmd_end(kern_memdesc_t *kmd)
+{
+ return (kmd->start + (kmd->num_pages << EFI_PAGE_SHIFT));
+}
+
+static inline u64
+efi_md_end(efi_memory_desc_t *md)
+{
+ return (md->phys_addr + efi_md_size(md));
+}
+
+static inline int
+efi_wb(efi_memory_desc_t *md)
+{
+ return (md->attribute & EFI_MEMORY_WB);
+}
+
+static inline int
+efi_uc(efi_memory_desc_t *md)
+{
+ return (md->attribute & EFI_MEMORY_UC);
+}
+
+/*
+ * Look for the first granule-aligned memory descriptor
+ * that is big enough to hold the EFI memory map. Make sure this
+ * descriptor is at least granule sized so it does not get trimmed.
+ */
+struct kern_memdesc *
+find_memmap_space (void)
+{
+ u64 contig_low=0, contig_high=0;
+ u64 as = 0, ae;
+ void *efi_map_start, *efi_map_end, *p, *q;
+ efi_memory_desc_t *md, *pmd = NULL, *check_md;
+ u64 space_needed, efi_desc_size;
+ unsigned long total_mem = 0;
+
+ efi_map_start = __va(ia64_boot_param->efi_memmap);
+ efi_map_end = efi_map_start + ia64_boot_param->efi_memmap_size;
+ efi_desc_size = ia64_boot_param->efi_memdesc_size;
+
+ /*
+ * Worst case: we need 3 kernel descriptors for each efi descriptor
+ * (if every entry has a WB part in the middle, and UC head and tail),
+ * plus one for the end marker.
+ */
+ space_needed = sizeof(kern_memdesc_t) *
+ (3 * (ia64_boot_param->efi_memmap_size/efi_desc_size) + 1);
+
+ for (p = efi_map_start; p < efi_map_end; pmd = md, p += efi_desc_size) {
+ md = p;
+ if (!efi_wb(md)) {
+ continue;
+ }
+ if (pmd == NULL || !efi_wb(pmd) || efi_md_end(pmd) != md->phys_addr) {
+ contig_low = GRANULEROUNDUP(md->phys_addr);
+ contig_high = efi_md_end(md);
+ for (q = p + efi_desc_size; q < efi_map_end; q += efi_desc_size) {
+ check_md = q;
+ if (!efi_wb(check_md))
+ break;
+ if (contig_high != check_md->phys_addr)
+ break;
+ contig_high = efi_md_end(check_md);
+ }
+ contig_high = GRANULEROUNDDOWN(contig_high);
+ }
+ if (!is_available_memory(md) || md->type == EFI_LOADER_DATA)
+ continue;
+
+ /* Round ends inward to granule boundaries */
+ as = max(contig_low, md->phys_addr);
+ ae = min(contig_high, efi_md_end(md));
+
+ /* keep within max_addr= command line arg */
+ ae = min(ae, max_addr);
+ if (ae <= as)
+ continue;
+
+ /* avoid going over mem= command line arg */
+ if (total_mem + (ae - as) > mem_limit)
+ ae -= total_mem + (ae - as) - mem_limit;
+
+ if (ae <= as)
+ continue;
+
+ if (ae - as > space_needed)
+ break;
+ }
+ if (p >= efi_map_end)
+ panic("Can't allocate space for kernel memory descriptors");
+
+ return __va(as);
+}
+
+/*
+ * Walk the EFI memory map and gather all memory available for kernel
+ * to use. We can allocate partial granules only if the unavailable
+ * parts exist, and are WB.
+ */
+void
+efi_memmap_init(unsigned long *s, unsigned long *e)
+{
+ struct kern_memdesc *k, *prev = 0;
+ u64 contig_low=0, contig_high=0;
+ u64 as, ae, lim;
+ void *efi_map_start, *efi_map_end, *p, *q;
+ efi_memory_desc_t *md, *pmd = NULL, *check_md;
+ u64 efi_desc_size;
+ unsigned long total_mem = 0;
+
+ k = kern_memmap = find_memmap_space();
+
+ efi_map_start = __va(ia64_boot_param->efi_memmap);
+ efi_map_end = efi_map_start + ia64_boot_param->efi_memmap_size;
+ efi_desc_size = ia64_boot_param->efi_memdesc_size;
+
+ for (p = efi_map_start; p < efi_map_end; pmd = md, p += efi_desc_size) {
+ md = p;
+ if (!efi_wb(md)) {
+ if (efi_uc(md) && (md->type == EFI_CONVENTIONAL_MEMORY ||
+ md->type == EFI_BOOT_SERVICES_DATA)) {
+ k->attribute = EFI_MEMORY_UC;
+ k->start = md->phys_addr;
+ k->num_pages = md->num_pages;
+ k++;
+ }
+ continue;
+ }
+ if (pmd == NULL || !efi_wb(pmd) || efi_md_end(pmd) != md->phys_addr) {
+ contig_low = GRANULEROUNDUP(md->phys_addr);
+ contig_high = efi_md_end(md);
+ for (q = p + efi_desc_size; q < efi_map_end; q += efi_desc_size) {
+ check_md = q;
+ if (!efi_wb(check_md))
+ break;
+ if (contig_high != check_md->phys_addr)
+ break;
+ contig_high = efi_md_end(check_md);
+ }
+ contig_high = GRANULEROUNDDOWN(contig_high);
+ }
+ if (!is_available_memory(md))
+ continue;
+
+ /*
+ * Round ends inward to granule boundaries
+ * Give trimmings to uncached allocator
+ */
+ if (md->phys_addr < contig_low) {
+ lim = min(efi_md_end(md), contig_low);
+ if (efi_uc(md)) {
+ if (k > kern_memmap && (k-1)->attribute == EFI_MEMORY_UC &&
+ kmd_end(k-1) == md->phys_addr) {
+ (k-1)->num_pages += (lim - md->phys_addr) >> EFI_PAGE_SHIFT;
+ } else {
+ k->attribute = EFI_MEMORY_UC;
+ k->start = md->phys_addr;
+ k->num_pages = (lim - md->phys_addr) >> EFI_PAGE_SHIFT;
+ k++;
+ }
+ }
+ as = contig_low;
+ } else
+ as = md->phys_addr;
+
+ if (efi_md_end(md) > contig_high) {
+ lim = max(md->phys_addr, contig_high);
+ if (efi_uc(md)) {
+ if (lim == md->phys_addr && k > kern_memmap &&
+ (k-1)->attribute == EFI_MEMORY_UC &&
+ kmd_end(k-1) == md->phys_addr) {
+ (k-1)->num_pages += md->num_pages;
+ } else {
+ k->attribute = EFI_MEMORY_UC;
+ k->start = lim;
+ k->num_pages = (efi_md_end(md) - lim) >> EFI_PAGE_SHIFT;
+ k++;
+ }
+ }
+ ae = contig_high;
+ } else
+ ae = efi_md_end(md);
+
+ /* keep within max_addr= command line arg */
+ ae = min(ae, max_addr);
+ if (ae <= as)
+ continue;
+
+ /* avoid going over mem= command line arg */
+ if (total_mem + (ae - as) > mem_limit)
+ ae -= total_mem + (ae - as) - mem_limit;
+
+ if (ae <= as)
+ continue;
+ if (prev && kmd_end(prev) == md->phys_addr) {
+ prev->num_pages += (ae - as) >> EFI_PAGE_SHIFT;
+ total_mem += ae - as;
+ continue;
+ }
+ k->attribute = EFI_MEMORY_WB;
+ k->start = as;
+ k->num_pages = (ae - as) >> EFI_PAGE_SHIFT;
+ total_mem += ae - as;
+ prev = k++;
+ }
+ k->start = ~0L; /* end-marker */
+
+ /* reserve the memory we are using for kern_memmap */
+ *s = (u64)kern_memmap;
+ *e = (u64)++k;
+}
--- a/arch/ia64/kernel/setup.c
+++ b/arch/ia64/kernel/setup.c
@@ -211,6 +211,9 @@ reserve_memory (void)
}
#endif
+ efi_memmap_init(&rsvd_region[n].start, &rsvd_region[n].end);
+ n++;
+
/* end of memory marker */
rsvd_region[n].start = ~0UL;
rsvd_region[n].end = ~0UL;
--- a/arch/ia64/kernel/uncached.c
+++ b/arch/ia64/kernel/uncached.c
@@ -205,23 +205,18 @@ EXPORT_SYMBOL(uncached_free_page);
static int __init
uncached_build_memmap(unsigned long start, unsigned long end, void *arg)
{
- long length;
- unsigned long vstart, vend;
+ long length = end - start;
int node;
- length = end - start;
- vstart = start + __IA64_UNCACHED_OFFSET;
- vend = end + __IA64_UNCACHED_OFFSET;
-
dprintk(KERN_ERR "uncached_build_memmap(%lx %lx)\n", start, end);
- memset((char *)vstart, 0, length);
+ memset((char *)start, 0, length);
- node = paddr_to_nid(start);
+ node = paddr_to_nid(start - __IA64_UNCACHED_OFFSET);
- for (; vstart < vend ; vstart += PAGE_SIZE) {
- dprintk(KERN_INFO "sticking %lx into the pool!\n", vstart);
- gen_pool_free(uncached_pool[node], vstart, PAGE_SIZE);
+ for (; start < end ; start += PAGE_SIZE) {
+ dprintk(KERN_INFO "sticking %lx into the pool!\n", start);
+ gen_pool_free(uncached_pool[node], start, PAGE_SIZE);
}
return 0;
--- a/include/asm-ia64/meminit.h
+++ b/include/asm-ia64/meminit.h
@@ -16,10 +16,11 @@
* - initrd (optional)
* - command line string
* - kernel code & data
+ * - Kernel memory map built from EFI memory map
*
* More could be added if necessary
*/
-#define IA64_MAX_RSVD_REGIONS 5
+#define IA64_MAX_RSVD_REGIONS 6
struct rsvd_region {
unsigned long start; /* virtual address of beginning of element */
@@ -33,6 +34,7 @@ extern void find_memory (void);
extern void reserve_memory (void);
extern void find_initrd (void);
extern int filter_rsvd_memory (unsigned long start, unsigned long end, void *arg);
+extern void efi_memmap_init(unsigned long *, unsigned long *);
/*
* For rounding an address to the next IA64_GRANULE_SIZE or order
[-- Attachment #4: kexec-ia64-2.6.14-rc4.patch --]
[-- Type: text/x-patch, Size: 26143 bytes --]
diff -urNp linux-2.6.14-rc4/arch/ia64/hp/common/sba_iommu.c linux-2.6.14-rc4-kexec-ia64/arch/ia64/hp/common/sba_iommu.c
--- linux-2.6.14-rc4/arch/ia64/hp/common/sba_iommu.c 2005-08-28 17:41:01.000000000 -0600
+++ linux-2.6.14-rc4-kexec-ia64/arch/ia64/hp/common/sba_iommu.c 2005-10-24 09:18:19.000000000 -0600
@@ -1624,6 +1624,28 @@ ioc_iova_init(struct ioc *ioc)
READ_REG(ioc->ioc_hpa + IOC_IBASE);
}
+#ifdef CONFIG_KEXEC
+void
+ioc_iova_disable(void)
+{
+ struct ioc *ioc;
+
+ ioc = ioc_list;
+
+ while (ioc != NULL) {
+ /* Disable IOVA translation */
+ WRITE_REG(ioc->ibase & 0xfffffffffffffffe, ioc->ioc_hpa + IOC_IBASE);
+ READ_REG(ioc->ioc_hpa + IOC_IBASE);
+
+ /* Clear I/O TLB of any possible entries */
+ WRITE_REG(ioc->ibase | (get_iovp_order(ioc->iov_size) + iovp_shift), ioc->ioc_hpa + IOC_PCOM);
+ READ_REG(ioc->ioc_hpa + IOC_PCOM);
+
+ ioc = ioc->next;
+ }
+}
+#endif
+
static void __init
ioc_resource_init(struct ioc *ioc)
{
diff -urNp linux-2.6.14-rc4/arch/ia64/Kconfig linux-2.6.14-rc4-kexec-ia64/arch/ia64/Kconfig
--- linux-2.6.14-rc4/arch/ia64/Kconfig 2005-10-19 09:04:33.000000000 -0600
+++ linux-2.6.14-rc4-kexec-ia64/arch/ia64/Kconfig 2005-10-24 09:18:19.000000000 -0600
@@ -323,6 +323,23 @@ config PERFMON
little bigger and slows down execution a bit, but it is generally
a good idea to turn this on. If you're unsure, say Y.
+config KEXEC
+ bool "kexec system call (EXPERIMENTAL)"
+ depends on EXPERIMENTAL
+ help
+ kexec is a system call that implements the ability to shut down your
+ current kernel, and to start another kernel. It is like a reboot
+ but it is independent of the system firmware. And like a reboot
+ you can start any kernel with it, not just Linux.
+
+ The name comes from the similarity to the exec system call.
+
+ It is an ongoing process to be certain the hardware in a machine
+ is properly shut down, so do not be surprised if this code does not
+ initially work for you. It may help to enable device hotplugging
+ support. As of this writing the exact hardware interface is
+ strongly in flux, so no good recommendation can be made.
+
config IA64_PALINFO
tristate "/proc/pal support"
help
diff -urNp linux-2.6.14-rc4/arch/ia64/kernel/crash.c linux-2.6.14-rc4-kexec-ia64/arch/ia64/kernel/crash.c
--- linux-2.6.14-rc4/arch/ia64/kernel/crash.c 1969-12-31 17:00:00.000000000 -0700
+++ linux-2.6.14-rc4-kexec-ia64/arch/ia64/kernel/crash.c 2005-10-24 11:06:50.000000000 -0600
@@ -0,0 +1,44 @@
+/*
+ * Architecture specific (ia64) functions for kexec based crash dumps.
+ *
+ * Created by: Khalid Aziz (khalid.aziz@hp.com)
+ *
+ * Copyright (C) Hewlett Packard, 2005. All rights reserved.
+ *
+ */
+#include <linux/init.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/smp.h>
+#include <linux/irq.h>
+#include <linux/reboot.h>
+#include <linux/kexec.h>
+#include <linux/irq.h>
+#include <linux/delay.h>
+#include <linux/elf.h>
+#include <linux/elfcore.h>
+
+note_buf_t crash_notes[NR_CPUS];
+
+void
+machine_crash_shutdown(struct pt_regs *pt)
+{
+ extern void terminate_irqs(void);
+
+ /* This function is only called after the system
+ * has panicked or is otherwise in a critical state.
+ * The minimum amount of code to allow a kexec'd kernel
+ * to run successfully needs to happen here.
+ *
+ * In practice this means shooting down the other cpus in
+ * an SMP system.
+ */
+ if (in_interrupt()) {
+ terminate_irqs();
+ ia64_eoi();
+ }
+ system_state = SYSTEM_RESTART;
+ device_shutdown();
+ system_state = SYSTEM_BOOTING;
+ machine_shutdown();
+}
diff -urNp linux-2.6.14-rc4/arch/ia64/kernel/efi.c linux-2.6.14-rc4-kexec-ia64/arch/ia64/kernel/efi.c
--- linux-2.6.14-rc4/arch/ia64/kernel/efi.c 2005-10-20 16:44:30.000000000 -0600
+++ linux-2.6.14-rc4-kexec-ia64/arch/ia64/kernel/efi.c 2005-10-24 09:25:03.000000000 -0600
@@ -38,6 +38,9 @@
extern efi_status_t efi_call_phys (void *, ...);
struct efi efi;
+#ifdef CONFIG_KEXEC
+unsigned long kexec_reboot = 0;
+#endif
EXPORT_SYMBOL(efi);
static efi_runtime_services_t *runtime;
static unsigned long mem_limit = ~0UL, max_addr = ~0UL;
@@ -526,6 +529,9 @@ efi_map_pal_code (void)
* Cannot write to CRx with PSR.ic=1
*/
psr = ia64_clear_ic();
+#ifdef CONFIG_KEXEC
+ ia64_ptr(0x01, GRANULEROUNDDOWN((unsigned long) pal_vaddr), IA64_GRANULE_SHIFT);
+#endif
ia64_itr(0x1, IA64_TR_PALCODE, GRANULEROUNDDOWN((unsigned long) pal_vaddr),
pte_val(pfn_pte(__pa(pal_vaddr) >> PAGE_SHIFT, PAGE_KERNEL)),
IA64_GRANULE_SHIFT);
@@ -549,15 +555,22 @@ efi_init (void)
if (memcmp(cp, "mem=", 4) == 0) {
cp += 4;
mem_limit = memparse(cp, &end);
- if (end != cp)
- break;
cp = end;
+ while (*cp == ' ')
+ ++cp;
} else if (memcmp(cp, "max_addr=", 9) == 0) {
cp += 9;
max_addr = GRANULEROUNDDOWN(memparse(cp, &end));
- if (end != cp)
- break;
cp = end;
+ while (*cp == ' ')
+ ++cp;
+#ifdef CONFIG_KEXEC
+ } else if (memcmp(cp, "kexec_reboot", 12) == 0) {
+ cp += 12;
+ kexec_reboot = 1;
+ while (*cp == ' ')
+ ++cp;
+#endif
} else {
while (*cp != ' ' && *cp)
++cp;
@@ -702,10 +715,17 @@ efi_enter_virtual_mode (void)
}
}
+#ifdef CONFIG_KEXEC
+ if (kexec_reboot == 0)
+#endif
status = efi_call_phys(__va(runtime->set_virtual_address_map),
ia64_boot_param->efi_memmap_size,
efi_desc_size, ia64_boot_param->efi_memdesc_version,
ia64_boot_param->efi_memmap);
+#ifdef CONFIG_KEXEC
+ else
+ status = EFI_SUCCESS;
+#endif
if (status != EFI_SUCCESS) {
printk(KERN_WARNING "warning: unable to switch EFI into virtual mode "
"(status=%lu)\n", status);
diff -urNp linux-2.6.14-rc4/arch/ia64/kernel/entry.S linux-2.6.14-rc4-kexec-ia64/arch/ia64/kernel/entry.S
--- linux-2.6.14-rc4/arch/ia64/kernel/entry.S 2005-10-19 09:04:34.000000000 -0600
+++ linux-2.6.14-rc4-kexec-ia64/arch/ia64/kernel/entry.S 2005-10-24 09:25:39.000000000 -0600
@@ -1588,7 +1588,7 @@ sys_call_table:
data8 sys_mq_timedreceive // 1265
data8 sys_mq_notify
data8 sys_mq_getsetattr
- data8 sys_ni_syscall // reserved for kexec_load
+ data8 sys_kexec_load
data8 sys_ni_syscall // reserved for vserver
data8 sys_waitid // 1270
data8 sys_add_key
diff -urNp linux-2.6.14-rc4/arch/ia64/kernel/machine_kexec.c linux-2.6.14-rc4-kexec-ia64/arch/ia64/kernel/machine_kexec.c
--- linux-2.6.14-rc4/arch/ia64/kernel/machine_kexec.c 1969-12-31 17:00:00.000000000 -0700
+++ linux-2.6.14-rc4-kexec-ia64/arch/ia64/kernel/machine_kexec.c 2005-10-25 14:42:35.000000000 -0600
@@ -0,0 +1,224 @@
+/*
+ * machine_kexec.c - handle transition of Linux booting another kernel
+ * Copyright (C) 2002-2003 Eric Biederman <ebiederm@xmission.com>
+ * Copyright (C) 2005 Khalid Aziz <khalid.aziz@hp.com>
+ *
+ * This source code is licensed under the GNU General Public License,
+ * Version 2. See the file COPYING for more details.
+ */
+
+#include <linux/kernel.h>
+#include <linux/config.h>
+#include <linux/mm.h>
+#include <linux/kexec.h>
+#include <linux/pci.h>
+#include <asm/mmu_context.h>
+#include <asm/setup.h>
+#include <asm/mca.h>
+#include <asm/page.h>
+#include <asm/bitops.h>
+#include <asm/tlbflush.h>
+
+DECLARE_PER_CPU(u64, ia64_mca_pal_base);
+
+unsigned int kexec_on_init = 0;
+extern unsigned long ia64_iobase;
+extern unsigned long kexec_reboot;
+extern void kexec_stop_this_cpu(void *);
+extern struct subsystem devices_subsys;
+
+static void set_io_base(void)
+{
+ unsigned long phys_iobase;
+
+ /* set kr0 to iobase */
+ phys_iobase = __pa(ia64_iobase);
+ ia64_set_kr(IA64_KR_IO_BASE, __IA64_UNCACHED_OFFSET | phys_iobase);
+}
+
+typedef void (*relocate_new_kernel_t)(
+ unsigned long indirection_page, unsigned long start_address,
+ unsigned long boot_param_address);
+
+const extern unsigned long relocate_new_kernel[];
+const extern unsigned long kexec_fake_sal_rendez[];
+const extern unsigned int relocate_new_kernel_size;
+extern void use_mm(struct mm_struct *mm);
+extern void ioc_iova_disable(void);
+
+volatile extern long kexec_cont;
+volatile const extern unsigned char kexec_reloc[];
+volatile extern long kexec_rendez;
+volatile const extern unsigned char kexec_rendez_reloc[];
+volatile extern long kexec_ptcebase, kexec_count0, kexec_count1;
+volatile extern long kexec_stride0, kexec_stride1;
+volatile extern long kexec_pal_base;
+
+static void *kexec_boot_param;
+
+/*
+ * Do whatever setup is needed on image and the
+ * reboot code buffer to allow us to avoid allocations
+ * later.
+ */
+int machine_kexec_prepare(struct kimage *image)
+{
+ void *control_code_buffer;
+ unsigned long cmdline_size;
+
+ /*
+ * We need to save the boot parameters in kernel pages.
+ */
+ cmdline_size = (COMMAND_LINE_SIZE + PAGE_SIZE) & PAGE_MASK;
+ if (image->segment[0].bufsz > cmdline_size) {
+ printk(KERN_ERR "Not enough space to load kernel command line (%zu)\n", image->segment[0].bufsz);
+ return -ENOMEM;
+ }
+ kexec_boot_param = kmalloc(cmdline_size, GFP_KERNEL);
+ if (kexec_boot_param == NULL)
+ return -ENOMEM;
+ memset(kexec_boot_param, 0, cmdline_size);
+ memcpy(kexec_boot_param, image->segment[0].buf,
+ image->segment[0].bufsz);
+ /*
+ * We do not want command line parameters loaded in memory later
+ * when kernel is relocated just before kexec. So zero out memory
+ * size for command line param segment
+ */
+ image->segment[0].memsz = 0;
+
+#if 0
+ /* Pre-load control code buffer in case of INIT */
+ control_code_buffer = ((unsigned long)phys_to_virt(page_to_pfn(image->control_code_page) << PAGE_SHIFT) & (unsigned long)0x1fffffffffffffffL) | __IA64_UNCACHED_OFFSET;
+ kexec_rendez = (long)(page_to_pfn(image->control_code_page) << PAGE_SHIFT) + (long)kexec_rendez_reloc - (long)kexec_fake_sal_rendez;
+
+ /* copy it out */
+ memcpy((void *)control_code_buffer, kexec_fake_sal_rendez, relocate_new_kernel_size);
+#endif
+
+ return 0;
+}
+
+void machine_kexec_cleanup(struct kimage *image)
+{
+}
+
+void machine_shutdown(void)
+{
+ struct pci_dev *dev;
+ struct list_head *n;
+ u16 command;
+
+ /* Disable bus mastering on all PCI devices */
+ n = pci_devices.next;
+ while (n && (n != &pci_devices)) {
+ dev = pci_dev_g(n);
+ pci_read_config_word(dev, PCI_COMMAND, &command);
+ command &= ~PCI_COMMAND_MASTER;
+ pci_write_config_word(dev, PCI_COMMAND, command);
+ n = n->next;
+ }
+
+#ifdef CONFIG_SMP
+ int reboot_cpu_id;
+
+ /* The boot cpu is always logical cpu 0 */
+ reboot_cpu_id = 0;
+
+ /* Make certain the cpu I'm rebooting on is online */
+ if (!cpu_isset(reboot_cpu_id, cpu_online_map)) {
+ reboot_cpu_id = smp_processor_id();
+ }
+
+ /* Make certain I only run on the appropriate processor */
+ set_cpus_allowed(current, cpumask_of_cpu(reboot_cpu_id));
+#endif
+}
+
+/*
+ * Do not allocate memory (or fail in any way) in machine_kexec().
+ * We are past the point of no return, committed to rebooting now.
+ */
+void machine_kexec(struct kimage *image)
+{
+ unsigned long indirection_page;
+ void *control_code_buffer;
+ relocate_new_kernel_t rnk;
+ unsigned char *cmdline;
+ int cpu;
+ unsigned long initrd_start, initrd_size;
+
+ control_code_buffer = (void *) (((unsigned long)phys_to_virt(page_to_pfn(image->control_code_page) << PAGE_SHIFT) & (unsigned long)0x1fffffffffffffffL) | __IA64_UNCACHED_OFFSET);
+ indirection_page = image->head & PAGE_MASK;
+
+ /* copy it out */
+ memcpy((void *)control_code_buffer, kexec_fake_sal_rendez, relocate_new_kernel_size);
+
+ /* Save PTCE data for cache flush later */
+ kexec_ptcebase = local_cpu_data->ptce_base;
+ kexec_count0 = local_cpu_data->ptce_count[0];
+ kexec_count1 = local_cpu_data->ptce_count[1];
+ kexec_stride0 = local_cpu_data->ptce_stride[0];
+ kexec_stride1 = local_cpu_data->ptce_stride[1];
+
+#ifdef CONFIG_SMP
+ kexec_rendez = (long)(page_to_pfn(image->control_code_page) << PAGE_SHIFT) + (long)kexec_rendez_reloc - (long)kexec_fake_sal_rendez;
+ if (!kexec_on_init)
+ smp_call_function(kexec_stop_this_cpu, (void *)image->start, 0, 0);
+
+#endif
+ /* Interrupts aren't acceptable while we reboot */
+ local_irq_disable();
+
+ kexec_cont = (long)(page_to_pfn(image->control_code_page) << PAGE_SHIFT) + (long)kexec_reloc - (long) kexec_fake_sal_rendez;
+
+ /* Save PAL mapping for TR flush later */
+ cpu = smp_processor_id();
+ kexec_pal_base = __get_cpu_var(ia64_mca_pal_base);
+
+ /* set kr0 to the appropriate address */
+ set_io_base();
+
+ /* now execute the control code
+ * We will start by executing the control code linked into the
+ * kernel as opposed to the code we copied in control code buffer
+ * page. When this code switches to physical mode, we will start
+ * executing the code in control code buffer page. Reason for
+ * doing this is we start code execution in virtual address space.
+ * If we were to try to execute the newly copied code in virtual
+ * address space, we will need to make an ITLB entry to avoid ITLB
+ * miss. By executing the code linked into kernel, we take advantage
+ * of the ITLB entry already in place of kernel and avoid making
+ * a new entry.
+ */
+ control_code_buffer = (void *) relocate_new_kernel;
+ rnk = (relocate_new_kernel_t) &control_code_buffer;
+ if (strstr(kexec_boot_param, "kexec_reboot") == NULL)
+ strcat(kexec_boot_param, " kexec_reboot ");
+ cmdline = __va(ia64_boot_param->command_line);
+ strlcpy(cmdline, kexec_boot_param, COMMAND_LINE_SIZE);
+ initrd_start = image->segment[image->nr_segments-1].mem;
+ initrd_size = image->segment[image->nr_segments-1].memsz;
+ if (initrd_size != 0)
+ ia64_boot_param->initrd_start = initrd_start;
+ else
+ ia64_boot_param->initrd_start = 0UL;
+ ia64_boot_param->initrd_size = initrd_size;
+
+ {
+ unsigned long pta, impl_va_bits;
+
+# define pte_bits 3
+# define vmlpt_bits (impl_va_bits - PAGE_SHIFT + pte_bits)
+# define POW2(n) (1ULL << (n))
+
+ /* Disable VHPT */
+ impl_va_bits = ffz(~(local_cpu_data->unimpl_va_mask | (7UL << 61)));
+ pta = POW2(61) - POW2(vmlpt_bits);
+ ia64_set_pta(pta | (0 << 8) | (vmlpt_bits << 2) | 0);
+ }
+
+#ifdef CONFIG_IA64_HP_ZX1
+ ioc_iova_disable();
+#endif
+ rnk(indirection_page, image->start, (unsigned long) ia64_boot_param);
+}
diff -urNp linux-2.6.14-rc4/arch/ia64/kernel/Makefile linux-2.6.14-rc4-kexec-ia64/arch/ia64/kernel/Makefile
--- linux-2.6.14-rc4/arch/ia64/kernel/Makefile 2005-10-19 09:04:34.000000000 -0600
+++ linux-2.6.14-rc4-kexec-ia64/arch/ia64/kernel/Makefile 2005-10-24 09:19:10.000000000 -0600
@@ -22,6 +22,7 @@ obj-$(CONFIG_PERFMON) += perfmon_defaul
obj-$(CONFIG_IA64_CYCLONE) += cyclone.o
obj-$(CONFIG_CPU_FREQ) += cpufreq/
obj-$(CONFIG_IA64_MCA_RECOVERY) += mca_recovery.o
+obj-$(CONFIG_KEXEC) += machine_kexec.o relocate_kernel.o crash.o
obj-$(CONFIG_KPROBES) += kprobes.o jprobes.o
obj-$(CONFIG_IA64_UNCACHED_ALLOCATOR) += uncached.o
mca_recovery-y += mca_drv.o mca_drv_asm.o
diff -urNp linux-2.6.14-rc4/arch/ia64/kernel/relocate_kernel.S linux-2.6.14-rc4-kexec-ia64/arch/ia64/kernel/relocate_kernel.S
--- linux-2.6.14-rc4/arch/ia64/kernel/relocate_kernel.S 1969-12-31 17:00:00.000000000 -0700
+++ linux-2.6.14-rc4-kexec-ia64/arch/ia64/kernel/relocate_kernel.S 2005-10-25 14:43:42.000000000 -0600
@@ -0,0 +1,385 @@
+/*
+ * relocate_kernel.S - Relocate kexec'able kernel and start it
+ * Copyright (C) 2005 Khalid Aziz <khalid.aziz@hp.com>
+ *
+ * This source code is licensed under the GNU General Public License,
+ * Version 2. See the file COPYING for more details.
+ */
+
+#include <linux/config.h>
+#include <asm/asmmacro.h>
+#include <asm/kregs.h>
+#include <asm/page.h>
+#include <asm/pgtable.h>
+
+ /* Must be relocatable PIC code callable as a C function, that once
+ * it starts can not use the previous process's stack.
+ *
+ */
+ /* Q: Do I want to setup an interrupt vector, so what happens
+ * when exceptions occur is well defined?
+ */
+ .text
+ .align 32
+ .global kexec_fake_sal_rendez#
+ .proc kexec_fake_sal_rendez#
+kexec_fake_sal_rendez:
+ mf.a
+ ;;
+ movl r25=kexec_rendez
+ ;;
+ ld8 r17=[r25]
+ {
+ flushrs
+ srlz.i
+ }
+ ;;
+ /* See where I am running, and compute gp */
+ {
+ mov ar.rsc = 0 /* Put RSE in enforced lazy, LE mode */
+ mov gp = ip /* gp == relocate_new_kernel */
+ }
+
+ movl r8=0x00000100000000
+ ;;
+ mov cr.iva=r8
+ /* Transition from virtual to physical mode */
+ rsm psr.i | psr.ic
+ srlz.i
+ movl r16=(IA64_PSR_AC | IA64_PSR_BN | IA64_PSR_IC | IA64_PSR_MFL)
+ ;;
+ mov cr.ipsr=r16
+ ;;
+ mov cr.iip=r17
+ mov cr.ifs=r0
+ ;;
+ rfi
+ ;;
+ .global kexec_rendez_reloc
+kexec_rendez_reloc: /* Now we are in physical mode */
+
+ mov b6=r32 /* _start addr */
+ mov r8=r33 /* ap_wakeup_vector */
+ mov r26=r34 /* PAL addr */
+ ;;
+ /* Purge kernel TRs */
+ movl r16=KERNEL_START
+ mov r18=KERNEL_TR_PAGE_SHIFT<<2
+ ;;
+ ptr.i r16,r18
+ ptr.d r16,r18
+ ;;
+ srlz.i
+ ;;
+ srlz.d
+ ;;
+ /* Purge percpu TR */
+ movl r16=PERCPU_ADDR
+ mov r18=PERCPU_PAGE_SHIFT<<2
+ ;;
+ ptr.d r16,r18
+ ;;
+ srlz.d
+ ;;
+ /* Purge PAL TR */
+ mov r18=IA64_GRANULE_SHIFT<<2
+ ;;
+ ptr.i r26,r18
+ ;;
+ srlz.i
+ ;;
+ /* Purge stack TR */
+ mov r16=IA64_KR(CURRENT_STACK)
+ ;;
+ shl r16=r16,IA64_GRANULE_SHIFT
+ movl r19=PAGE_OFFSET
+ ;;
+ add r16=r19,r16
+ mov r18=IA64_GRANULE_SHIFT<<2
+ ;;
+ ptr.d r16,r18
+ ;;
+ srlz.i
+ ;;
+
+ /* Ensure we can read and clear external interrupts */
+ mov cr.tpr=r0
+ srlz.d
+
+ shr.u r9=r8,6 /* which irr */
+ ;;
+ and r8=63,r8 /* bit offset into irr */
+ ;;
+ mov r10=1;;
+ ;;
+ shl r10=r10,r8 /* bit mask off irr we want */
+ cmp.eq p6,p0=0,r9
+ ;;
+(p6) br.cond.sptk.few check_irr0
+ cmp.eq p7,p0=1,r9
+ ;;
+(p7) br.cond.sptk.few check_irr1
+ cmp.eq p8,p0=2,r9
+ ;;
+(p8) br.cond.sptk.few check_irr2
+ cmp.eq p9,p0=3,r9
+ ;;
+(p9) br.cond.sptk.few check_irr3
+
+check_irr0:
+ mov r8=cr.irr0
+ ;;
+ and r8=r8,r10
+ ;;
+ cmp.eq p6,p0=0,r8
+(p6) br.cond.sptk.few check_irr0
+ br.few call_start
+
+check_irr1:
+ mov r8=cr.irr1
+ ;;
+ and r8=r8,r10
+ ;;
+ cmp.eq p6,p0=0,r8
+(p6) br.cond.sptk.few check_irr1
+ br.few call_start
+
+check_irr2:
+ mov r8=cr.irr2
+ ;;
+ and r8=r8,r10
+ ;;
+ cmp.eq p6,p0=0,r8
+(p6) br.cond.sptk.few check_irr2
+ br.few call_start
+
+check_irr3:
+ mov r8=cr.irr3
+ ;;
+ and r8=r8,r10
+ ;;
+ cmp.eq p6,p0=0,r8
+(p6) br.cond.sptk.few check_irr3
+ br.few call_start
+
+call_start:
+ mov cr.eoi=r0
+ ;;
+ srlz.d
+ ;;
+ mov r8=cr.ivr
+ ;;
+ srlz.d
+ ;;
+ cmp.eq p0,p6=15,r8
+(p6) br.cond.sptk.few call_start
+ br.sptk.few b6
+ .endp kexec_fake_sal_rendez#
+
+ .global relocate_new_kernel#
+ .proc relocate_new_kernel#
+relocate_new_kernel:
+ mf
+ ;;
+ /* Save the ptce information for translation cache purge later */
+ movl r25=kexec_cont
+ movl r27=kexec_ptcebase
+ movl r28=kexec_count0
+ ;;
+ ld8 r17=[r25]
+ ld8 r22=[r27]
+ ld8 r20=[r28]
+ ;;
+ movl r25=kexec_count1
+ movl r27=kexec_stride0
+ movl r28=kexec_stride1
+ ;;
+ ld8 r21=[r25]
+ ld8 r23=[r27]
+ ld8 r24=[r28]
+ ;;
+ movl r27=kexec_pal_base
+ ;;
+ adds r25=48,r27
+ ;;
+ ld8 r26=[r25]
+ ;;
+
+ {
+ flushrs
+ srlz.i
+ }
+ ;;
+ /* See where I am running, and compute gp */
+ {
+ mov ar.rsc = 0 /* Put RSE in enforced lazy, LE mode */
+ mov gp = ip /* gp == relocate_new_kernel */
+ }
+
+ movl r8=0x00000100000000
+ ;;
+ mov cr.iva=r8
+
+ /* Transition from virtual to physical mode */
+ rsm psr.i | psr.ic
+ srlz.i
+ movl r16=(IA64_PSR_AC | IA64_PSR_BN | IA64_PSR_IC | IA64_PSR_MFL)
+ ;;
+ mov cr.ipsr=r16
+ ;;
+ mov cr.iip=r17
+ mov cr.ifs=r0
+ ;;
+ rfi
+ ;;
+ .global kexec_reloc
+kexec_reloc: /* Now we are in physical mode */
+ /* Setup the memory stack */
+ add r12=(memory_stack_end - relocate_new_kernel),gp
+ /* Setup the register stack */
+ add r8=(register_stack - relocate_new_kernel),gp
+ ;;
+ loadrs
+ ;;
+ mov ar.bspstore=r8
+ ;;
+
+ /* Do the copies */
+ mov r8=r32
+ mov b6=r33
+ tpa r28=r34
+ mov r9=0
+ mov r11=PAGE_SIZE
+ ;;
+ /* top, read another word for the indirection page */
+top: ld8 r10=[r8], 8
+ ;;
+ tbit.nz p6,p0 = r10, 0 /* Is it a destination page? */
+ tbit.nz p7,p0 = r10, 1 /* Is it an indirection page? */
+ tbit.nz p8,p0 = r10, 3 /* Is it the source indicator? */
+ tbit.nz p9,p0 = r10, 2 /* Is it the done indicator? */
+ movl r19 = PAGE_MASK
+ ;;
+ and r10 = r10, r19 /* Clear the low 12 bits of r10 */
+ ;;
+(p6) mov r9 = r10 /* destination addr */
+(p7) mov r8 = r10 /* indirection addr */
+(p8) br.cond.sptk.few source
+(p9) br.cond.sptk.few done
+ br.cond.sptk.few top
+source:
+ add r16 = r11, r10
+ add r14 = 8, r10
+ add r15 = 8, r9
+ ;;
+0:
+ ld8 r17 = [r10],16
+ ld8 r18 = [r14],16
+ ;;
+ st8 [r9] = r17, 16
+ st8 [r15] = r18, 16
+ cmp.ne p6,p0 = r16, r10
+ ;;
+(p6) br.cond.sptk.few 0b
+ br.cond.sptk.few top
+done:
+ srlz.i
+ srlz.d
+ ;;
+
+ /* Now purge local tlb */
+ mov r19 = r0
+ adds r21=-1,r20
+ ;;
+2:
+ cmp.ltu p6,p7=r19,r20
+(p7) br.cond.dpnt.few 4f
+ mov ar.lc=r21
+3:
+ ptc.e r22
+ ;;
+ add r22=r24,r22
+ br.cloop.sptk.few 3b
+ ;;
+ add r22=r23,r22
+ add r19=1,r19
+ ;;
+ br.sptk.few 2b
+4:
+ srlz.i ;;
+
+ // Now purge addresses formerly mapped by TR registers
+ // Purge ITR&DTR for kernel.
+ movl r16=KERNEL_START
+ mov r18=KERNEL_TR_PAGE_SHIFT<<2
+ ;;
+ ptr.i r16, r18
+ ptr.d r16, r18
+ ;;
+ srlz.i
+ ;;
+ srlz.d
+ ;;
+ // Purge DTR for PERCPU data.
+ movl r16=PERCPU_ADDR
+ mov r18=PERCPU_PAGE_SHIFT<<2
+ ;;
+ ptr.d r16,r18
+ ;;
+ srlz.d
+ ;;
+ // Purge ITR for PAL code
+ mov r18=IA64_GRANULE_SHIFT<<2
+ ;;
+ ptr.i r26,r18
+ ;;
+ srlz.i
+ ;;
+ // Purge DTR for stack.
+ mov r16=IA64_KR(CURRENT_STACK)
+ ;;
+ shl r16=r16,IA64_GRANULE_SHIFT
+ movl r19=PAGE_OFFSET
+ ;;
+ add r16=r19,r16
+ mov r18=IA64_GRANULE_SHIFT<<2
+ ;;
+ ptr.d r16,r18
+ ;;
+ srlz.i
+ ;;
+
+ br.sptk.few b6
+ br.cond.sptk.few 0b
+ .endp relocate_new_kernel#
+
+ .balign 8192
+relocate_new_kernel_end:
+ .global relocate_new_kernel_size
+relocate_new_kernel_size:
+ .long relocate_new_kernel_end - kexec_fake_sal_rendez
+
+ .global kexec_cont
+ .align 8
+kexec_cont: data8 0xdeadbeefdeadbeef
+ .global kexec_rendez
+kexec_rendez: data8 0xdeadbeefdeadbeef
+ .global kexec_ptcebase
+kexec_ptcebase: data8 0xdeadbeefdeadbeef
+ .global kexec_count0
+kexec_count0: data8 0xdeadbeefdeadbeef
+ .global kexec_count1
+kexec_count1: data8 0xdeadbeefdeadbeef
+ .global kexec_stride0
+kexec_stride0: data8 0xdeadbeefdeadbeef
+ .global kexec_stride1
+kexec_stride1: data8 0xdeadbeefdeadbeef
+ .global kexec_pal_base
+kexec_pal_base: data8 0xdeadbeefdeadbeef
+
+register_stack:
+ .fill 8192, 1, 0
+register_stack_end:
+memory_stack:
+ .fill 8192, 1, 0
+memory_stack_end:
diff -urNp linux-2.6.14-rc4/arch/ia64/kernel/smp.c linux-2.6.14-rc4-kexec-ia64/arch/ia64/kernel/smp.c
--- linux-2.6.14-rc4/arch/ia64/kernel/smp.c 2005-08-28 17:41:01.000000000 -0600
+++ linux-2.6.14-rc4-kexec-ia64/arch/ia64/kernel/smp.c 2005-10-24 10:59:18.000000000 -0600
@@ -30,6 +30,9 @@
#include <linux/delay.h>
#include <linux/efi.h>
#include <linux/bitops.h>
+#ifdef CONFIG_KEXEC
+#include <linux/kexec.h>
+#endif
#include <asm/atomic.h>
#include <asm/current.h>
@@ -84,6 +87,43 @@ unlock_ipi_calllock(void)
spin_unlock_irq(&call_lock);
}
+#ifdef CONFIG_KEXEC
+extern void kexec_fake_sal_rendez(void *start, unsigned long wake_up,
+ unsigned long pal_base);
+
+#define pte_bits 3
+#define vmlpt_bits (impl_va_bits - PAGE_SHIFT + pte_bits)
+#define POW2(n) (1ULL << (n))
+
+DECLARE_PER_CPU(u64, ia64_mca_pal_base);
+
+/*
+ * Stop the CPU and put it in fake SAL rendezvous. This allows CPU to wake
+ * up with IPI from boot processor
+ */
+void
+kexec_stop_this_cpu (void *func)
+{
+ unsigned long pta, impl_va_bits, pal_base;
+
+ /*
+ * Remove this CPU by putting it into fake SAL rendezvous
+ */
+ cpu_clear(smp_processor_id(), cpu_online_map);
+ max_xtp();
+ ia64_eoi();
+
+ /* Disable VHPT */
+ impl_va_bits = ffz(~(local_cpu_data->unimpl_va_mask | (7UL << 61)));
+ pta = POW2(61) - POW2(vmlpt_bits);
+ ia64_set_pta(pta | (0 << 8) | (vmlpt_bits << 2) | 0);
+
+ local_irq_disable();
+ pal_base = __get_cpu_var(ia64_mca_pal_base);
+ kexec_fake_sal_rendez(func, ap_wakeup_vector, pal_base);
+}
+#endif
+
static void
stop_this_cpu (void)
{
diff -urNp linux-2.6.14-rc4/include/asm-ia64/kexec.h linux-2.6.14-rc4-kexec-ia64/include/asm-ia64/kexec.h
--- linux-2.6.14-rc4/include/asm-ia64/kexec.h 1969-12-31 17:00:00.000000000 -0700
+++ linux-2.6.14-rc4-kexec-ia64/include/asm-ia64/kexec.h 2005-10-24 10:20:19.000000000 -0600
@@ -0,0 +1,22 @@
+#ifndef _ASM_IA64_KEXEC_H
+#define _ASM_IA64_KEXEC_H
+
+
+/* Maximum physical address we can use pages from */
+#define KEXEC_SOURCE_MEMORY_LIMIT (-1UL)
+/* Maximum address we can reach in physical address mode */
+#define KEXEC_DESTINATION_MEMORY_LIMIT (-1UL)
+/* Maximum address we can use for the control code buffer */
+#define KEXEC_CONTROL_MEMORY_LIMIT TASK_SIZE
+
+#define KEXEC_CONTROL_CODE_SIZE (8192 + 8192 + 4096)
+
+/* The native architecture */
+#define KEXEC_ARCH KEXEC_ARCH_IA_64
+
+#define MAX_NOTE_BYTES 1024
+typedef u32 note_buf_t[MAX_NOTE_BYTES/4];
+
+extern note_buf_t crash_notes[];
+
+#endif /* _ASM_IA64_KEXEC_H */
diff -urNp linux-2.6.14-rc4/kernel/irq/handle.c linux-2.6.14-rc4-kexec-ia64/kernel/irq/handle.c
--- linux-2.6.14-rc4/kernel/irq/handle.c 2005-10-19 09:04:59.000000000 -0600
+++ linux-2.6.14-rc4-kexec-ia64/kernel/irq/handle.c 2005-10-24 09:40:27.000000000 -0600
@@ -100,6 +100,26 @@ fastcall int handle_IRQ_event(unsigned i
}
/*
+ * Terminate any outstanding interrupts
+ */
+void terminate_irqs(void)
+{
+ struct irqaction * action;
+ irq_desc_t *idesc;
+ unsigned long flags;
+ int i;
+
+ for (i=0; i<NR_IRQS; i++) {
+ idesc = irq_descp(i);
+ action = idesc->action;
+ if (!action)
+ continue;
+ if (idesc->handler->end)
+ idesc->handler->end(i);
+ }
+}
+
+/*
* do_IRQ handles all normal device IRQ's (the special
* SMP cross-CPU interrupts have their own specific
* handlers).
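The copy loop in relocate_new_kernel walks kexec's indirection list: each 8-byte entry is a page address with a flag in its low bits (bit 0 destination, bit 1 indirection, bit 2 done, bit 3 source, matching the tbit tests in the assembly). As a rough illustration only, the same walk can be sketched in user-space C. The function name, the SKETCH_* constants and the 4096-byte page size are assumptions made for this sketch and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Flag bits in the low bits of each list entry; the values follow the
 * tbit tests in relocate_new_kernel (bits 0, 1, 2 and 3). */
#define IND_DESTINATION 0x1UL  /* entry sets the copy destination    */
#define IND_INDIRECTION 0x2UL  /* entry points at the next list page */
#define IND_DONE        0x4UL  /* end of list                        */
#define IND_SOURCE      0x8UL  /* entry names a source page to copy  */

/* Assumption for this sketch: an illustrative 4 KB page size. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Walk the list starting at 'entry', copying every source page to the
 * current destination. Returns the number of pages copied. */
static size_t walk_indirection_list(const uintptr_t *entry)
{
	unsigned char *dest = NULL;
	size_t copied = 0;

	for (;;) {
		uintptr_t e = *entry++;
		uintptr_t addr = e & SKETCH_PAGE_MASK;

		if (e & IND_DESTINATION) {
			dest = (unsigned char *)addr;
		} else if (e & IND_INDIRECTION) {
			entry = (const uintptr_t *)addr;
		} else if (e & IND_DONE) {
			break;
		} else if (e & IND_SOURCE) {
			/* A source entry is only valid after a destination. */
			assert(dest != NULL);
			memcpy(dest, (const void *)addr, SKETCH_PAGE_SIZE);
			dest += SKETCH_PAGE_SIZE;
			copied++;
		}
	}
	return copied;
}
```

The destination pointer advances by one page after each copy, which mirrors the post-incremented stores in the assembly loop. Unlike the real code, this sketch runs with translation enabled and does no TLB or cache maintenance.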
* Re: [PATCH] kexec on ia64
From: Gerald Pfeifer @ 2005-10-26 18:28 UTC (permalink / raw)
To: linux-ia64
On Tue, 25 Oct 2005, Khalid Aziz wrote:
> I have ported the original patch I had done for kexec on ia64 on 2.6.8
> kernel and fixed a few bugs in the original patch. Attached is a patch
> for kernel 2.6.14-rc4. It works with normal kexec reboot on an HP
> rx2600. I am now working on adding support for crash kexec. I am also
> working on kexec on INIT which I currently have working on 2.6.10
> kernel. I am porting it to 2.6.14-rc kernel.
>
> Attached patch needs to be applied on top of iomem and efi_memmapwalk
> patches already in ia64 test tree (these patches attached as well for
> those who may need them).
Cool. Tony, what are your plans for pushing this to Linus? Will it make
2.6.15?
Gerald
* RE: [PATCH] kexec on ia64
From: Luck, Tony @ 2005-10-26 19:02 UTC (permalink / raw)
To: linux-ia64
> Cool. Tony, what are your plans for pushing this to Linus? Will it make
> 2.6.15?
Nanhai Zou here at Intel expressed a few concerns to me last night
about Khalid's patch. I'll paste them here to speed discussion about
this (as I expect Nanhai is asleep at the moment, he should be
around to start commenting for himself by 4-5pm Pacific).
> I think his patch is still not able to boot an unmodified kernel.
> It appends a kernel parameter to bypass the issue, thus the second kernel need to be modified.
> It also hardcoded initrd logic in kernel patch.
> Command line is still using old command line.
> No purgatory code support etc.
> However, I prefer to put a small and clean patch in the kernel while leaving most of the things in kexec-tools.
> That will provide more flexibility.
> There are also some other issues I can see, like,
> 1. icache flushing is missing
> 2. rendez code is fake, I prefer to use hotplug API.
> 3. Disable PCI master code should be in generic PCI driver code instead of IA64 arch code.
Nanhai has his own patches for kexec/kexec-tools, which are
stuck in some Intel bureaucracy at the moment ... I'm trying
to get them unstuck so that we can get some meaningful
commentary from the community about both versions.
My biggest issue with both patches at the moment is that I
can't see how either of them can be extended to be useful
for use in crash-dump case without some significant surgery.
Both of them over-write the existing kernel with the new one,
which is a big problem when you'd like to dump the data space
of the old kernel. Ia64 is quite happy to run a kernel loaded
at any suitably aligned address ... so why not load the new
kernel in some different location from the old kernel?
Including this in 2.6.15? It's possible, but it's looking like
this might be a rush. Assuming Linus releases 2.6.14 by the
end of this week, we only have a couple of weeks to check that
this runs on all of the weird configurations. I'd need to see
a lot of "tested on xxx-config ... no problems" e-mail to get
confidence in this.
-Tony
* Re: [PATCH] kexec on ia64
From: Eric W. Biederman @ 2005-10-26 20:25 UTC (permalink / raw)
To: linux-ia64
"Luck, Tony" <tony.luck@intel.com> writes:
>> Cool. Tony, what are your plans for pushing this to Linus? Will it make
>> 2.6.15?
>
> Nanhai Zou here at Intel expressed a few concerns to me last night
> about Khalid's patch. I'll paste them here to speed discussion about
> this (as I expect Nanhai is asleep at the moment, he should be
> around to start commenting for himself by 4-5pm Pacific).
>
>> I think his patch is still not able to boot an unmodified kernel.
>> It appends a kernel parameter to bypass the issue, thus the second kernel need to be modified.
>
>> It also hardcoded initrd logic in kernel patch.
>> Command line is still using old command line.
>> No purgatory code support etc.
I agree that is an issue that should be addressed.
It would be nice if there was a kernel option to not virtually
map EFI. Reusing a supplied virtual address is also good, but
it means we can't boot an unpatched kernel.
>> However, I prefer to put a small and clean patch in the kernel while leaving most of the things in kexec-tools.
>> That will provide more flexibility.
>
>> There are also some other issues I can see, like,
>> 1. icache flushing is missing
>> 2. rendez code is fake, I prefer to use hotplug API.
>> 3. Disable PCI master code should be in generic PCI driver code instead of IA64 arch code.
>
> Nanhai has his own patches for kexec/kexec-tools, which are
> stuck in some Intel bureaucracy at the moment ... I'm trying
> to get them unstuck so that we can get some meaningful
> commentary from the community about both versions.
>
> My biggest issue with both patches at the moment is that I
> can't see how either of them can be extended to be useful
> for use in crash-dump case without some significant surgery.
> Both of them over-write the existing kernel with the new one,
> which is a big problem when you'd like to dump the data space
> of the old kernel. Ia64 is quite happy to run a kernel loaded
> at any suitably aligned address ... so why not load the new
> kernel in some different location from the old kernel?
Interesting. This should be a decision made by kexec-tools,
not by the kernel. On x86 the kernel just verifies we load the
crash kernel into the reserved chunk of the address space. I haven't
looked closely enough to see if the architecture part has fixed
address assumptions yet.
Tony what were you seeing that made you conclude that the code
would always load over the existing kernel?
I also didn't see the trivial patch to put the 32bit compat support
in. It's not terribly important or useful but there is no reason
not to include it.
Eric
* Re: [PATCH] kexec on ia64
From: Luck, Tony @ 2005-10-26 21:43 UTC (permalink / raw)
To: linux-ia64
On Wed, Oct 26, 2005 at 02:25:56PM -0600, Eric W. Biederman wrote:
> Interesting. This should be a decision made by kexec-tools,
> not by the kernel. On x86 the kernel just verifies we load the
> crash kernel into the reserved chunk of the address space. I haven't
> looked closely enough to see if the architecture part has fixed
> address assumptions yet.
>
> Tony what were you seeing that made you conclude that the code
> would always load over the existing kernel?
Ok .. kexectools should be able to make a decision about where to load the
new kernel based on what it finds in /proc/iomem (and in the Elf header
of the new kernel). I don't know enough Elf (elvish? :-) to know
whether the Elf header we currently generate for a kernel describes
things in a way that would convey that it is OK to drop the image
at any (suitably aligned) address, or whether there will have to be
some ia64 specific magic in the kexectools to choose the load address.
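As a rough sketch of the /proc/iomem idea only (this is not code from kexec-tools), the following C function scans iomem-style text for a top-level "System RAM" range that can hold an image of a given size at a given alignment, while steering clear of one range to preserve, for example the running kernel. The function name, its parameters and the use of 0 as the "nothing found" value are assumptions made for illustration; nested child resources are simply skipped, and nothing here inspects the Elf header.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Return the lowest 'align'-aligned address inside a top-level
 * "System RAM" range of 'iomem' that has room for 'size' bytes and does
 * not overlap [avoid_start, avoid_end]. Returns 0 if nothing fits.
 * 'align' must be a power of two. */
static unsigned long pick_load_addr(const char *iomem,
				    unsigned long size, unsigned long align,
				    unsigned long avoid_start,
				    unsigned long avoid_end)
{
	const char *line = iomem;

	assert(size > 0);
	assert(align != 0 && (align & (align - 1)) == 0);

	while (*line) {
		unsigned long start, end, base;
		char name[64];

		/* Indented lines are child resources; only parse top level. */
		if (*line != ' ' &&
		    sscanf(line, "%lx-%lx : %63[^\n]", &start, &end, name) == 3 &&
		    strcmp(name, "System RAM") == 0 &&
		    start <= ULONG_MAX - (align - 1)) {
			base = (start + align - 1) & ~(align - 1);
			while (base <= end && end - base >= size - 1) {
				unsigned long last = base + (size - 1);

				if (last < avoid_start || base > avoid_end)
					return base;
				if (base > ULONG_MAX - align)
					break;
				base += align;
			}
		}
		line = strchr(line, '\n');
		if (line == NULL)
			break;
		line++;
	}
	return 0;
}
```

Passing the range occupied by the old kernel as the range to avoid is what keeps the old kernel's data intact for a later dump. A real tool would also have to honour every other reserved region.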
> I also didn't see the trivial patch to put the 32bit compat support
> in. It's not terribly important or useful but there is no reason
> not to include it.
Usefulness is key here. The kexectools definitely include some
architecture specific components. So taking the x86 version of the
"kexec" binary onto an ia64 system isn't going to be very useful even
if the kernel did happen to have an ia32 entry point for kexec
enabled. Building an ia32 binary, but with all the ia64 specific
parts enabled would seem to be _challenging_ (Nanhai's version has
purgatory/arch/ia64/entry.S!). Perhaps there might be a better outlet
for that much creativity? [Which is another way of saying that I'm
not interested in seeing a patch to enable the ia32 kexec entry point
on ia64 ... so don't waste any time creating one].
-Tony
* RE: [PATCH] kexec on ia64
From: Khalid Aziz @ 2005-10-26 21:49 UTC (permalink / raw)
To: linux-ia64
On Wed, 2005-10-26 at 12:02 -0700, Luck, Tony wrote:
> > Cool. Tony, what are your plans for pushing this to Linus? Will it make
> > 2.6.15?
>
> Nanhai Zou here at Intel expressed a few concerns to me last night
> about Khalid's patch. I'll paste them here to speed discussion about
> this (as I expect Nanhai is asleep at the moment, he should be
> around to start commenting for himself by 4-5pm Pacific).
>
> > I think his patch is still not able to boot an unmodified kernel.
> > It appends a kernel parameter to bypass the issue, thus the second kernel need to be modified.
>
True. The only time I use this parameter is to determine whether to
virtualize EFI or not. EFI does not respond well to being virtualized
once it has been virtualized already. So the kernel needs to know if EFI
has already been virtualized by previous kernel. It is possible to pass
this information to the next kernel as a command line parameter, as I
have done, or in one of the kexec segments. One way or the other, kernel
needs to know this. I have not found a way around it. If there is one, I
would like to hear about it. That would enable an unmodified kernel to
be booted.
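To illustrate the check described above, here is a minimal user-space sketch in C. It is not the code from the patch, which matches the first 12 bytes with memcmp() inside the existing command-line parser. This version matches "kexec_reboot" only as a whole space-delimited word, and both function names are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if 'token' appears in 'cmdline' as a whole
 * space-delimited word. */
static bool cmdline_has_token(const char *cmdline, const char *token)
{
	size_t len = strlen(token);
	const char *cp = cmdline;

	assert(len > 0);
	while (*cp) {
		const char *start;

		while (*cp == ' ')
			++cp;
		start = cp;
		while (*cp && *cp != ' ')
			++cp;
		if ((size_t)(cp - start) == len &&
		    memcmp(start, token, len) == 0)
			return true;
	}
	return false;
}

/* EFI should be switched to virtual mode only on a first boot, that is,
 * when the previous kernel did not leave the kexec_reboot marker. */
static bool should_virtualize_efi(const char *cmdline)
{
	return !cmdline_has_token(cmdline, "kexec_reboot");
}
```

The whole-word match is a deliberate difference from the prefix match in the patch: a parameter such as "kexec_reboot_extra" does not count as the marker here.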
> > It also hardcoded initrd logic in kernel patch.
I could not find a better way to pass initrd image to ia64 kernel since
it is not placed in a fixed location. Using a fixed kexec segment looked
fairly logical to me. Alternative would be to add a type field to struct
kexec_segment, then kernel can determine which segment holds initrd
image without having to use a fixed kexec segment.
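A minimal sketch of that alternative, assuming a type field were added: the struct and enum below are hypothetical stand-ins written for illustration, not the existing struct kexec_segment and not anything in the posted patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical segment types; not part of the posted patch or of the
 * existing struct kexec_segment. */
enum sketch_seg_type {
	SKETCH_SEG_OTHER = 0,
	SKETCH_SEG_CMDLINE,
	SKETCH_SEG_KERNEL,
	SKETCH_SEG_INITRD
};

/* Illustrative stand-in for a kexec segment that carries a type tag. */
struct sketch_segment {
	enum sketch_seg_type type;
	unsigned long mem;	/* physical destination address */
	size_t memsz;		/* size at the destination      */
};

/* Return the first segment of the given type, or NULL if there is none. */
static const struct sketch_segment *
find_segment(const struct sketch_segment *segs, size_t nr,
	     enum sketch_seg_type type)
{
	size_t i;

	assert(segs != NULL || nr == 0);
	for (i = 0; i < nr; i++)
		if (segs[i].type == type)
			return &segs[i];
	return NULL;
}
```

With a lookup like this the kernel would no longer depend on the initrd being the last segment or the command line being the first, at the cost of a change to the segment layout shared with user space.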
> > Command line is still using old command line.
Please explain.
> > No purgatory code support etc.
>
> > However, I prefer to put a small and clean patch in the kernel while leaving most of the things in kexec-tools.
> > That will provide more flexibility.
>
> > There are also some other issues I can see, like,
> > 1. icache flushing is missing
> > 2. rendez code is fake, I prefer to use hotplug API.
That would be preferable, and would be a good enhancement over current
code if it can be made to work reliably. I was planning to look into it
after initial implementation (I wrote initial implementation before CPU
hotplug API was available).
> > 3. Disable PCI master code should be in generic PCI driver code instead of IA64 arch code.
Agreed. This is part of some of the cleanup that can still be done.
>
> Nanhai has his own patches for kexec/kexec-tools, which are
> stuck in some Intel bureaucracy at the moment ... I'm trying
> to get them unstuck so that we can get some meaningful
> commentary from the community about both versions.
>
> My biggest issue with both patches at the moment is that I
> can't see how either of them can be extended to be useful
> for use in crash-dump case without some significant surgery.
> Both of them over-write the existing kernel with the new one,
> which is a big problem when you'd like to dump the data space
> of the old kernel. Ia64 is quite happy to run a kernel loaded
> at any suitably aligned address ... so why not load the new
> kernel in some different location from the old kernel?
>
> Including this in 2.6.15? It's possible, but it's looking like
> this might be a rush. Assuming Linus releases 2.6.14 by the
> end of this week, we only have a couple of weeks to check that
> this runs on all of the weird configurations. I'd need to see
> a lot of "tested on xxx-config ... no problems" e-mail to get
> confidence in this.
>
> -Tony
--
Khalid
==================================
Khalid Aziz Open Source and Linux Organization
(970)898-9214 Hewlett-Packard
khalid.aziz@hp.com Fort Collins, CO
"The Linux kernel is subject to relentless development"
- Alessandro Rubini
* RE: [PATCH] kexec on ia64
From: Zou Nan hai @ 2005-10-26 23:21 UTC (permalink / raw)
To: linux-ia64
Hi Khalid,
On Thu, 2005-10-27 at 05:49, Khalid Aziz wrote:
> On Wed, 2005-10-26 at 12:02 -0700, Luck, Tony wrote:
> > > Cool. Tony, what are your plans for pushing this to Linus? Will it make
> > > 2.6.15?
> >
> > Nanhai Zou here at Intel expressed a few concerns to me last night
> > about Khalid's patch. I'll paste them here to speed discussion about
> > this (as I expect Nanhai is asleep at the moment, he should be
> > around to start commenting for himself by 4-5pm Pacific).
> >
> > > I think his patch is still not able to boot an unmodified kernel.
> > > It appends a kernel parameter to bypass the issue, thus the second
> > > kernel need to be modified.
> >
>
> True. The only time I use this parameter is to determine whether to
> virtualize EFI or not. EFI does not respond well to being virtualized
> once it has been virtualized already. So the kernel needs to know if
> EFI has already been virtualized by previous kernel. It is possible to
> pass this information to the next kernel as a command line parameter,
> as I have done, or in one of the kexec segments. One way or the other,
> kernel needs to know this. I have not found a way around it. If there
> is one, I would like to hear about it. That will make enable unmodified
> kernel to be booted.
>
> > > It also hardcoded initrd logic in kernel patch.
>
> I could not find a better way to pass initrd image to ia64 kernel since
> it is not placed in a fixed location. Using a fixed kexec segment looked
> fairly logical to me. Alternative would be to add a type field to struct
> kexec_segment, then kernel can determine which segment holds initrd
> image without having to use a fixed kexec segment.
>
> > > Command line is still using old command line.
>
> Please explain.
>
Sorry, I now see how your patch deals with the command line; I missed the
machine_kexec_prepare part at first look. However, I would prefer to put
the command line and initrd logic in kexec-tools instead of hacking on the
segment index.
> > > No purgatory code support etc.
> >
> > > How, I prefer to put a small and clean patch in kernel while leave
> > > most of the things in kexec-tools.
> > > That will provide more flexibility.
> >
> > > There are also some other issues I can see, like,
> > > 1. icache flusing miss
> > > 2. rendez code is fake, I prefer to use hotplug API.
>
> That would be preferable, and would be a good enhancement over current
> code if it can be made to work reliably. I was planning to look into it
> after initial implementation (I wrote initial implementation before CPU
> hotplug API was available).
>
> > > 3. Disable PCI master code should be in generic PCI driver code
> > > instead of IA64 arch code.
>
> Agreed. This is part of some of the cleanup that can still be done.
>
> >
> > Nanhai has his own patches for kexec/kexec-tools, which are
> > stuck in some Intel bureaucracy at the moment ... I'm trying
> > to get them unstuck so that we can get some meaningful
> > commentary from the community about both versions.
> >
> > My biggest issue with both patches at the moment is that I
> > can't see how either of them can be extended to be useful
> > for use in crash-dump case without some significant surgery.
> > Both of them over-write the existing kernel with the new one,
> > which is a big problem when you'd like to dump the data space
> > of the old kernel. Ia64 is quite happy to run a kernel loaded
> > at any suitably aligned address ... so why not load the new
> > kernel in some different location from the old kernel?
> >
> > Including this in 2.6.15? It's possible, but it's looking like
> > this might be a rush. Assuming Linus releases 2.6.14 by the
> > end of this week, we only have a couple of weeks to check that
> > this runs on all of the weird configurations. I'd need to see
> > a lot of "tested on xxx-config ... no problems" e-mail to get
> > confidence in this.
> >
> > -Tony
>
> --
> Khalid
>
As Tony said, my kexec and kexec-tools patches solve those issues and can
boot any unmodified kernel. But they are pending in Intel bureaucracy.
I hope I can send them out to the community for comments soon.
Thanks
Zou Nan hai
* Re: [PATCH] kexec on ia64
2004-11-15 20:32 [PATCH] kexec on ia64 Khalid Aziz
` (10 preceding siblings ...)
2005-10-26 23:21 ` Zou Nan hai
@ 2005-10-27 7:10 ` Eric W. Biederman
2005-10-27 19:05 ` Khalid Aziz
` (2 subsequent siblings)
14 siblings, 0 replies; 29+ messages in thread
From: Eric W. Biederman @ 2005-10-27 7:10 UTC (permalink / raw)
To: linux-ia64
"Luck, Tony" <tony.luck@intel.com> writes:
> On Wed, Oct 26, 2005 at 02:25:56PM -0600, Eric W. Biederman wrote:
>> Interesting. This should be a decision made by kexec-tools,
>> not by the kernel. On x86 the kernel just verifies we load the
>> crash kernel into the reserved chunk of the address space. I haven't
>> looked closely enough to see if the architecture part has fixed
>> address assumptions yet.
>>
>> Tony what were you seeing that made you conclude that the code
>> would always load over the existing kernel?
>
> Ok .. kexectools should be able to make a decision about where to load the
> new kernel based on what it finds in /proc/iomem (and in the Elf header
> of the new kernel). I don't know enough Elf (elvish? :-) to know
> whether the Elf header we currently generate for a kernel describes
> things in a way that would convey that it is OK to drop the image
> at any (suitably aligned) address, or whether there will have to be
> some ia64 specific magic in the kexectools to choose the load address.
I don't think ld can be talked into setting ET_REL instead of ET_EXEC right
now, without building as a shared library. (readelf -a on the kernel
will tell you) but since that is a general problem it is likely worth
an extra flag to /sbin/kexec to tell it to assume an ELF executable is
relocatable even if it doesn't say ET_REL.
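
For reference, the type that readelf reports lives in the e_type field
right after the 16-byte e_ident array of the ELF header. A minimal sketch
of reading it, assuming the header bytes are already in a buffer;
elf_type_name and the ELF_ET_* names are made up for this illustration
and are not part of kexec-tools, while the offsets and values follow the
ELF specification:

```c
#include <stddef.h>
#include <stdint.h>

/* e_type values from the ELF specification. */
enum { ELF_ET_REL = 1, ELF_ET_EXEC = 2, ELF_ET_DYN = 3 };

/*
 * Return a name for the e_type of an ELF header held in buf, or
 * "not-elf" if the buffer is too short or the magic is wrong.
 * elf_type_name is a hypothetical helper used only for illustration.
 */
static const char *elf_type_name(const unsigned char *buf, size_t len)
{
	uint16_t type;

	/* e_ident is 16 bytes; e_type is the 2 bytes that follow it. */
	if (len < 18 || buf[0] != 0x7f || buf[1] != 'E' ||
	    buf[2] != 'L' || buf[3] != 'F')
		return "not-elf";

	/* e_ident[5] (EI_DATA): 1 = little endian, 2 = big endian. */
	if (buf[5] == 2)
		type = (uint16_t)((buf[16] << 8) | buf[17]);
	else
		type = (uint16_t)(buf[16] | (buf[17] << 8));

	switch (type) {
	case ELF_ET_REL:
		return "ET_REL";
	case ELF_ET_EXEC:
		return "ET_EXEC";
	case ELF_ET_DYN:
		return "ET_DYN";
	default:
		return "other";
	}
}
```

A loader flag of the kind suggested above would simply override the
answer this check gives for a kernel image marked ET_EXEC.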
>> I also didn't see the trivial patch to put the 32bit compat support
>> in. It's not terribly important or useful but there is no reason
>> not to include it.
>
> Usefullness is a key here. The kexectools definitely include some
> architecture specific components. So taking the x86 version of the
> "kexec" binary onto an ia64 system isn't going to be very useful even
> if the kernel did happen to have an ia32 entry point for kexec
> enabled. Building an ia32 binary, but with all the ia64 specific
> parts enabled would seem to be _challenging_ (Nanhai's version has
> purgatory/arch/ia64/entry.S!). Perhaps there might be a better outlet
> for that much creativity? [Which is another way of saying that I'm
> not interested in seeing a patch to enable the ia32 kexec entry point
> on ia64 ... so don't waste any time creating one].
I know of at least one application that before it flashes your rom
chip checks to see if you have kexec in your kernel. And it does
that by calling sys_kexec and seeing if it gets -EINVAL instead
of -ENOSYS. At least with kexec present it knows that if something
goes terribly wrong it has the chance to load another kernel, in the
event the mtd drivers in the kernel don't handle some subtle hardware
bug. That application can safely be distributed as a 32bit binary
on i386, x86_64, and ia64.
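
The probe described here comes down to reading errno after a deliberately
bogus call. A minimal sketch of that decision, assuming the caller has
already made the call and saved errno; classify_kexec_errno and the enum
are names invented for this example, and treating every other error as
"unknown" is an assumption (a kernel that checks privileges before
arguments may hand EPERM to an unprivileged caller):

```c
#include <errno.h>

enum kexec_support { KEXEC_ABSENT, KEXEC_PRESENT, KEXEC_UNKNOWN };

/*
 * Interpret the errno left behind by a deliberately invalid kexec call.
 * classify_kexec_errno is a hypothetical helper used for illustration.
 */
static enum kexec_support classify_kexec_errno(int err)
{
	switch (err) {
	case ENOSYS:	/* the syscall slot is not wired up */
		return KEXEC_ABSENT;
	case EINVAL:	/* the kernel looked at the arguments and rejected them */
		return KEXEC_PRESENT;
	default:	/* e.g. EPERM: cannot tell from this call alone */
		return KEXEC_UNKNOWN;
	}
}
```

The point of the compat entry is that a 32bit binary doing this check
gets the same answer as a native one.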
I'm not quite certain what build issues would be involved, but it
wouldn't surprise me if one of the architectures that normally run
a 32bit user space with a 64bit kernel has already solved the issue.
So I only expect to use the code that comes pretty much for free.
The kernel side of the implementation already exists and I suspect
it is as useful as any other ia32 compat syscall entry point on the
ia64 kernel. I care as this is a completeness issue and I don't
see a reason not to enable the kernel side.
Eric
* RE: [PATCH] kexec on ia64
2004-11-15 20:32 [PATCH] kexec on ia64 Khalid Aziz
` (11 preceding siblings ...)
2005-10-27 7:10 ` Eric W. Biederman
@ 2005-10-27 19:05 ` Khalid Aziz
2005-10-27 23:17 ` Zou Nan hai
2006-04-03 22:20 ` Khalid Aziz
14 siblings, 0 replies; 29+ messages in thread
From: Khalid Aziz @ 2005-10-27 19:05 UTC (permalink / raw)
To: linux-ia64
On Thu, 2005-10-27 at 07:21 +0800, Zou Nan hai wrote:
>
> As Tony said, my kexec and kexec-tools patches solve those issues and can
> boot any unmodified kernel. But they are pending in Intel bureaucracy.
Can you give us some idea of how you got around the EFI virtualization
issue?
--
Khalid
==================================
Khalid Aziz Open Source and Linux Organization
(970)898-9214 Hewlett-Packard
khalid.aziz@hp.com Fort Collins, CO
"The Linux kernel is subject to relentless development"
- Alessandro Rubini
* RE: [PATCH] kexec on ia64
2004-11-15 20:32 [PATCH] kexec on ia64 Khalid Aziz
` (12 preceding siblings ...)
2005-10-27 19:05 ` Khalid Aziz
@ 2005-10-27 23:17 ` Zou Nan hai
2006-04-03 22:20 ` Khalid Aziz
14 siblings, 0 replies; 29+ messages in thread
From: Zou Nan hai @ 2005-10-27 23:17 UTC (permalink / raw)
To: linux-ia64
On Fri, 2005-10-28 at 03:05, Khalid Aziz wrote:
> On Thu, 2005-10-27 at 07:21 +0800, Zou Nan hai wrote:
> >
> > As Tony said, my kexec and kexec-tools patches solve those issues and can
> > boot any unmodified kernel. But they are pending in Intel bureaucracy.
>
> Can you give us some idea of how you got around the EFI virtualization
> issue?
I patched the EFI bootparam pointer in purgatory code to an empty dummy
function.
Zou Nan hai
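
One way to picture the approach described above is a table of firmware
entry points in which the "switch to virtual mode" slot is replaced by a
stub before the next kernel starts. The sketch below is only a guess at
the general shape, not the actual purgatory code: the table layout and
every name in it (fw_runtime_table, fw_set_virtual_map, dummy_fw_call,
patch_table_for_next_kernel) are hypothetical.

```c
#include <stddef.h>

typedef long (*fw_call_t)(unsigned long, unsigned long);

/* Hypothetical stand-in for a firmware runtime-services table. */
struct fw_runtime_table {
	fw_call_t set_virtual_map;
};

/* Models firmware that can only be switched to virtual mode once. */
static int fw_virtualized;

static long fw_set_virtual_map(unsigned long map, unsigned long size)
{
	(void)map;
	(void)size;
	if (fw_virtualized)
		return -1;	/* a second call is the failure case */
	fw_virtualized = 1;
	return 0;
}

/* Empty stub: reports success and touches nothing. */
static long dummy_fw_call(unsigned long map, unsigned long size)
{
	(void)map;
	(void)size;
	return 0;
}

/* What a purgatory-like stage could do before starting the next kernel. */
static void patch_table_for_next_kernel(struct fw_runtime_table *t)
{
	t->set_virtual_map = dummy_fw_call;
}
```

With the slot patched, a second kernel that calls through the table
unconditionally gets a success return without the firmware being asked
to virtualize again, so it needs no extra command line parameter.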
* [PATCH] kexec on ia64
2004-11-15 20:32 [PATCH] kexec on ia64 Khalid Aziz
` (13 preceding siblings ...)
2005-10-27 23:17 ` Zou Nan hai
@ 2006-04-03 22:20 ` Khalid Aziz
2006-04-04 4:20 ` Andrew Morton
2006-04-04 18:13 ` [Fastboot] " Eric W. Biederman
14 siblings, 2 replies; 29+ messages in thread
From: Khalid Aziz @ 2006-04-03 22:20 UTC (permalink / raw)
To: LKML, Fastboot mailing list, Linux ia64
Add kexec support on ia64.
Signed-off-by: Khalid Aziz <khalid.aziz@hp.com>
---
diff -urNp linux-2.6.16/arch/ia64/hp/common/sba_iommu.c linux-2.6.16-kexec/arch/ia64/hp/common/sba_iommu.c
--- linux-2.6.16/arch/ia64/hp/common/sba_iommu.c 2006-03-19 22:53:29.000000000 -0700
+++ linux-2.6.16-kexec/arch/ia64/hp/common/sba_iommu.c 2006-03-27 15:42:47.000000000 -0700
@@ -1624,6 +1624,28 @@ ioc_iova_init(struct ioc *ioc)
READ_REG(ioc->ioc_hpa + IOC_IBASE);
}
+#ifdef CONFIG_KEXEC
+void
+ioc_iova_disable(void)
+{
+ struct ioc *ioc;
+
+ ioc = ioc_list;
+
+ while (ioc != NULL) {
+ /* Disable IOVA translation */
+ WRITE_REG(ioc->ibase & 0xfffffffffffffffe, ioc->ioc_hpa + IOC_IBASE);
+ READ_REG(ioc->ioc_hpa + IOC_IBASE);
+
+ /* Clear I/O TLB of any possible entries */
+ WRITE_REG(ioc->ibase | (get_iovp_order(ioc->iov_size) + iovp_shift), ioc->ioc_hpa + IOC_PCOM);
+ READ_REG(ioc->ioc_hpa + IOC_PCOM);
+
+ ioc = ioc->next;
+ }
+}
+#endif
+
static void __init
ioc_resource_init(struct ioc *ioc)
{
diff -urNp linux-2.6.16/arch/ia64/Kconfig linux-2.6.16-kexec/arch/ia64/Kconfig
--- linux-2.6.16/arch/ia64/Kconfig 2006-03-19 22:53:29.000000000 -0700
+++ linux-2.6.16-kexec/arch/ia64/Kconfig 2006-03-27 15:42:47.000000000 -0700
@@ -376,6 +376,23 @@ config IA64_PALINFO
config SGI_SN
def_bool y if (IA64_SGI_SN2 || IA64_GENERIC)
+config KEXEC
+ bool "kexec system call (EXPERIMENTAL)"
+ depends on EXPERIMENTAL
+ help
+ kexec is a system call that implements the ability to shutdown your
+ current kernel, and to start another kernel. It is like a reboot
+ but it is independent of the system firmware. And like a reboot
+ you can start any kernel with it, not just Linux.
+
+ The name comes from the similarity to the exec system call.
+
+ It is an ongoing process to be certain the hardware in a machine
+ is properly shutdown, so do not be surprised if this code does not
+ initially work for you. It may help to enable device hotplugging
+ support. As of this writing the exact hardware interface is
+ strongly in flux, so no good recommendation can be made.
+
source "drivers/firmware/Kconfig"
source "fs/Kconfig.binfmt"
diff -urNp linux-2.6.16/arch/ia64/kernel/crash.c linux-2.6.16-kexec/arch/ia64/kernel/crash.c
--- linux-2.6.16/arch/ia64/kernel/crash.c 1969-12-31 17:00:00.000000000 -0700
+++ linux-2.6.16-kexec/arch/ia64/kernel/crash.c 2006-03-27 15:49:44.000000000 -0700
@@ -0,0 +1,43 @@
+/*
+ * arch/ia64/kernel/crash.c
+ *
+ * Architecture specific (ia64) functions for kexec based crash dumps.
+ *
+ * Created by: Khalid Aziz <khalid.aziz@hp.com>
+ *
+ * Copyright (C) 2005 Hewlett-Packard Development Company, L.P.
+ *
+ */
+#include <linux/init.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/smp.h>
+#include <linux/irq.h>
+#include <linux/reboot.h>
+#include <linux/kexec.h>
+#include <linux/irq.h>
+#include <linux/delay.h>
+#include <linux/elf.h>
+#include <linux/elfcore.h>
+#include <linux/device.h>
+
+void
+machine_crash_shutdown(struct pt_regs *pt)
+{
+ /* This function is only called after the system
+ * has panicked or is otherwise in a critical state.
+ * The minimum amount of code to allow a kexec'd kernel
+ * to run successfully needs to happen here.
+ *
+ * In practice this means shooting down the other cpus in
+ * an SMP system.
+ */
+ if (in_interrupt()) {
+ terminate_irqs();
+ ia64_eoi();
+ }
+ system_state = SYSTEM_RESTART;
+ device_shutdown();
+ system_state = SYSTEM_BOOTING;
+ machine_shutdown();
+}
diff -urNp linux-2.6.16/arch/ia64/kernel/entry.S linux-2.6.16-kexec/arch/ia64/kernel/entry.S
--- linux-2.6.16/arch/ia64/kernel/entry.S 2006-03-19 22:53:29.000000000 -0700
+++ linux-2.6.16-kexec/arch/ia64/kernel/entry.S 2006-03-27 15:42:47.000000000 -0700
@@ -1590,7 +1590,7 @@ sys_call_table:
data8 sys_mq_timedreceive // 1265
data8 sys_mq_notify
data8 sys_mq_getsetattr
- data8 sys_ni_syscall // reserved for kexec_load
+ data8 sys_kexec_load
data8 sys_ni_syscall // reserved for vserver
data8 sys_waitid // 1270
data8 sys_add_key
diff -urNp linux-2.6.16/arch/ia64/kernel/machine_kexec.c linux-2.6.16-kexec/arch/ia64/kernel/machine_kexec.c
--- linux-2.6.16/arch/ia64/kernel/machine_kexec.c 1969-12-31 17:00:00.000000000 -0700
+++ linux-2.6.16-kexec/arch/ia64/kernel/machine_kexec.c 2006-04-03 13:42:09.000000000 -0600
@@ -0,0 +1,149 @@
+/*
+ * arch/ia64/kernel/machine_kexec.c
+ *
+ * Handle transition of Linux booting another kernel
+ * Copyright (C) 2005 Hewlett-Packard Development Company, L.P.
+ * Copyright (C) 2005 Khalid Aziz <khalid.aziz@hp.com>
+ * Copyright (C) 2006 Intel Corp, Zou Nan hai <nanhai.zou@intel.com>
+ *
+ * This source code is licensed under the GNU General Public License,
+ * Version 2. See the file COPYING for more details.
+ */
+
+#include <linux/kernel.h>
+#include <linux/config.h>
+#include <linux/mm.h>
+#include <linux/kexec.h>
+#include <linux/pci.h>
+#include <linux/cpu.h>
+#include <asm/mmu_context.h>
+#include <asm/setup.h>
+#include <asm/mca.h>
+#include <asm/page.h>
+#include <asm/bitops.h>
+#include <asm/tlbflush.h>
+#include <asm/delay.h>
+#include <asm/meminit.h>
+
+extern unsigned long ia64_iobase;
+
+static void set_io_base(void)
+{
+ unsigned long phys_iobase;
+
+ /* set kr0 to iobase */
+ phys_iobase = __pa(ia64_iobase);
+ ia64_set_kr(IA64_KR_IO_BASE, __IA64_UNCACHED_OFFSET | phys_iobase);
+};
+
+typedef void (*relocate_new_kernel_t)( unsigned long, unsigned long,
+ struct ia64_boot_param *, unsigned long);
+
+/*
+ * Do whatever setup is needed on image and the
+ * reboot code buffer to allow us to avoid allocations
+ * later.
+ */
+int machine_kexec_prepare(struct kimage *image)
+{
+ void *control_code_buffer;
+ const unsigned long *func;
+
+ func = (unsigned long *)&relocate_new_kernel;
+ /* Pre-load control code buffer to minimize work in kexec path */
+ control_code_buffer = page_address(image->control_code_page);
+ memcpy((void *)control_code_buffer, (const void *)func[0],
+ relocate_new_kernel_size);
+ flush_icache_range((unsigned long)control_code_buffer,
+ (unsigned long)control_code_buffer + relocate_new_kernel_size);
+
+ return 0;
+}
+
+void machine_kexec_cleanup(struct kimage *image)
+{
+}
+
+#ifdef CONFIG_PCI
+void machine_shutdown(void)
+{
+ struct pci_dev *dev;
+ irq_desc_t *idesc;
+ cpumask_t mask = CPU_MASK_NONE;
+
+ /* Disable all PCI devices */
+ list_for_each_entry(dev, &pci_devices, global_list) {
+ if (!(dev->is_enabled))
+ continue;
+ idesc = irq_descp(dev->irq);
+ if (!idesc)
+ continue;
+ cpu_set(0, mask);
+ disable_irq_nosync(dev->irq);
+ idesc->handler->end(dev->irq);
+ idesc->handler->set_affinity(dev->irq, mask);
+ idesc->action = NULL;
+ pci_disable_device(dev);
+ pci_set_power_state(dev, 0);
+ }
+}
+#endif
+
+/*
+ * Do not allocate memory (or fail in any way) in machine_kexec().
+ * We are past the point of no return, committed to rebooting now.
+ */
+void machine_kexec(struct kimage *image)
+{
+ unsigned long indirection_page;
+ relocate_new_kernel_t rnk;
+ unsigned long pta, impl_va_bits;
+ void *pal_addr = efi_get_pal_addr();
+ unsigned long code_addr = (unsigned long)page_address(image->control_code_page);
+
+#ifdef CONFIG_HOTPLUG_CPU
+ int cpu;
+
+ for_each_online_cpu(cpu) {
+ if (cpu != smp_processor_id())
+ cpu_down(cpu);
+ }
+#elif CONFIG_SMP
+ smp_call_function(kexec_stop_this_cpu, (void *)image->start, 0, 0);
+#endif
+
+ ia64_set_itv(1<<16);
+ /* Interrupts aren't acceptable while we reboot */
+ local_irq_disable();
+
+ /* set kr0 to the appropriate address */
+ set_io_base();
+
+ /* Disable VHPT */
+ impl_va_bits = ffz(~(local_cpu_data->unimpl_va_mask | (7UL << 61)));
+ pta = POW2(61) - POW2(vmlpt_bits);
+ ia64_set_pta(pta | (0 << 8) | (vmlpt_bits << 2) | 0);
+
+#ifdef CONFIG_IA64_HP_ZX1
+ ioc_iova_disable();
+#endif
+ /* now execute the control code.
+ * We will start by executing the control code linked into the
+ * kernel as opposed to the code we copied in control code buffer
+ * page. When this code switches to physical mode, we will start
+ * executing the code in control code buffer page. Reason for
+ * doing this is we start code execution in virtual address space.
+ * If we were to try to execute the newly copied code in virtual
+ * address space, we will need to make an ITLB entry to avoid ITLB
+ * miss. By executing the code linked into kernel, we take advantage
+ * of the ITLB entry already in place for kernel and avoid making
+ * a new entry.
+ */
+ indirection_page = image->head & PAGE_MASK;
+
+ rnk = (relocate_new_kernel_t)&code_addr;
+ (*rnk)(indirection_page, image->start, ia64_boot_param,
+ GRANULEROUNDDOWN((unsigned long) pal_addr));
+ BUG();
+ for (;;)
+ ;
+}
diff -urNp linux-2.6.16/arch/ia64/kernel/Makefile linux-2.6.16-kexec/arch/ia64/kernel/Makefile
--- linux-2.6.16/arch/ia64/kernel/Makefile 2006-03-19 22:53:29.000000000 -0700
+++ linux-2.6.16-kexec/arch/ia64/kernel/Makefile 2006-03-27 15:42:47.000000000 -0700
@@ -28,6 +28,7 @@ obj-$(CONFIG_IA64_CYCLONE) += cyclone.o
obj-$(CONFIG_CPU_FREQ) += cpufreq/
obj-$(CONFIG_IA64_MCA_RECOVERY) += mca_recovery.o
obj-$(CONFIG_KPROBES) += kprobes.o jprobes.o
+obj-$(CONFIG_KEXEC) += machine_kexec.o relocate_kernel.o crash.o
obj-$(CONFIG_IA64_UNCACHED_ALLOCATOR) += uncached.o
mca_recovery-y += mca_drv.o mca_drv_asm.o
diff -urNp linux-2.6.16/arch/ia64/kernel/relocate_kernel.S linux-2.6.16-kexec/arch/ia64/kernel/relocate_kernel.S
--- linux-2.6.16/arch/ia64/kernel/relocate_kernel.S 1969-12-31 17:00:00.000000000 -0700
+++ linux-2.6.16-kexec/arch/ia64/kernel/relocate_kernel.S 2006-03-31 09:04:10.000000000 -0700
@@ -0,0 +1,359 @@
+/*
+ * arch/ia64/kernel/relocate_kernel.S
+ *
+ * Relocate kexec'able kernel and start it
+ *
+ * Copyright (C) 2005 Hewlett-Packard Development Company, L.P.
+ * Copyright (C) 2005 Khalid Aziz <khalid.aziz@hp.com>
+ * Copyright (C) 2005 Intel Corp, Zou Nan hai <nanhai.zou@intel.com>
+ *
+ * This source code is licensed under the GNU General Public License,
+ * Version 2. See the file COPYING for more details.
+ */
+#include <linux/config.h>
+#include <asm/asmmacro.h>
+#include <asm/kregs.h>
+#include <asm/page.h>
+#include <asm/pgtable.h>
+#include <asm/mca_asm.h>
+
+ /* Must be relocatable PIC code callable as a C function, that once
+ * it starts can not use the previous processes stack.
+ *
+ */
+GLOBAL_ENTRY(relocate_new_kernel)
+ .prologue
+ alloc r31=ar.pfs,4,0,0,0
+ .body
+.reloc_entry:
+{
+ rsm psr.i| psr.ic
+ mov r2=ip
+}
+ ;;
+{
+ flushrs // must be first insn in group
+ srlz.i
+}
+ ;;
+
+ //first switch to physical mode
+ add r3=1f-.reloc_entry, r2
+ movl r16 = IA64_PSR_AC|IA64_PSR_BN|IA64_PSR_IC|IA64_PSR_MFL
+ mov ar.rsc=0 // put RSE in enforced lazy mode
+ ;;
+ add r2=(memory_stack-.reloc_entry), r2
+ ;;
+ add sp=(memory_stack_end - .reloc_entry),r2
+ add r8=(register_stack - .reloc_entry),r2
+ ;;
+ tpa sp=sp
+ tpa r3=r3
+ ;;
+ loadrs
+ ;;
+ mov r18=ar.rnat
+ mov ar.bspstore=r8
+ ;;
+ mov cr.ipsr=r16
+ mov cr.iip=r3
+ mov cr.ifs=r0
+ srlz.i
+ ;;
+ mov ar.rnat=r18
+ rfi
+ ;;
+1:
+ //physical mode code begin
+ mov b6=in1
+ tpa r28=in2 // tpa must come before TLB purge
+
+ // purge all TC entries
+#define O(member) IA64_CPUINFO_##member##_OFFSET
+ GET_THIS_PADDR(r2, cpu_info) // load phys addr of cpu_info into r2
+ ;;
+ addl r17=O(PTCE_STRIDE),r2
+ addl r2=O(PTCE_BASE),r2
+ ;;
+ ld8 r18=[r2],(O(PTCE_COUNT)-O(PTCE_BASE));; // r18=ptce_base
+ ld4 r19=[r2],4 // r19=ptce_count[0]
+ ld4 r21=[r17],4 // r21=ptce_stride[0]
+ ;;
+ ld4 r20=[r2] // r20=ptce_count[1]
+ ld4 r22=[r17] // r22=ptce_stride[1]
+ mov r24=r0
+ ;;
+ adds r20=-1,r20
+ ;;
+#undef O
+2:
+ cmp.ltu p6,p7=r24,r19
+(p7) br.cond.dpnt.few 4f
+ mov ar.lc=r20
+3:
+ ptc.e r18
+ ;;
+ add r18=r22,r18
+ br.cloop.sptk.few 3b
+ ;;
+ add r18=r21,r18
+ add r24=1,r24
+ ;;
+ br.sptk.few 2b
+4:
+ srlz.i
+ ;;
+ //purge TR entry for kernel text and data
+ movl r16=KERNEL_START
+ mov r18=KERNEL_TR_PAGE_SHIFT<<2
+ ;;
+ ptr.i r16, r18
+ ptr.d r16, r18
+ ;;
+ srlz.i
+ ;;
+
+ // purge TR entry for percpu data
+ movl r16=PERCPU_ADDR
+ mov r18=PERCPU_PAGE_SHIFT<<2
+ ;;
+ ptr.d r16,r18
+ ;;
+ srlz.d
+ ;;
+
+ // purge TR entry for pal code
+ mov r16=in3
+ mov r18=IA64_GRANULE_SHIFT<<2
+ ;;
+ ptr.i r16,r18
+ ;;
+ srlz.i
+ ;;
+
+ // purge TR entry for stack
+ mov r16=IA64_KR(CURRENT_STACK)
+ ;;
+ shl r16=r16,IA64_GRANULE_SHIFT
+ movl r19=PAGE_OFFSET
+ ;;
+ add r16=r19,r16
+ mov r18=IA64_GRANULE_SHIFT<<2
+ ;;
+ ptr.d r16,r18
+ ;;
+ srlz.i
+ ;;
+
+ // copy kexec kernel segments
+ movl r16=PAGE_MASK
+ ld8 r30=[in0],8;; // in0 is page_list
+ br.sptk.few .dest_page
+ ;;
+.loop:
+ ld8 r30=[in0], 8;;
+.dest_page:
+ tbit.z p0, p6=r30, 0;; // 0x1 dest page
+(p6) and r17=r30, r16
+(p6) br.cond.sptk.few .loop;;
+
+ tbit.z p0, p6=r30, 1;; // 0x2 indirect page
+(p6) and in0=r30, r16
+(p6) br.cond.sptk.few .loop;;
+
+ tbit.z p0, p6=r30, 2;; // 0x4 end flag
+(p6) br.cond.sptk.few .end_loop;;
+
+ tbit.z p6, p0=r30, 3;; // 0x8 source page
+(p6) br.cond.sptk.few .loop
+
+ and r18=r30, r16
+
+ // simple copy page, may optimize later
+ movl r14=PAGE_SIZE/8 - 1;;
+ mov ar.lc=r14;;
+1:
+ ld8 r14=[r18], 8;;
+ st8 [r17]=r14, 8;;
+ fc.i r17
+ br.ctop.sptk.few 1b
+ br.sptk.few .loop
+ ;;
+
+.end_loop:
+ sync.i // for fc.i
+ ;;
+ srlz.i
+ ;;
+ srlz.d
+ ;;
+ br.call.sptk.many b0=b6;;
+memory_stack:
+ .fill 8192, 1, 0
+memory_stack_end:
+register_stack:
+ .fill 8192, 1, 0
+register_stack_end:
+relocate_new_kernel_end:
+END(relocate_new_kernel)
+
+GLOBAL_ENTRY(kexec_fake_sal_rendez)
+ .prologue
+ alloc r31=ar.pfs,3,0,0,0
+ .body
+.rendez_entry:
+ rsm psr.i | psr.ic
+ mov r25=ip
+ ;;
+ {
+ flushrs
+ srlz.i
+ }
+ ;;
+ /* See where I am running, and compute gp */
+ {
+ mov ar.rsc = 0 /* Put RSE in enforced lazy, LE mode */
+ mov gp = ip /* gp = relocate_new_kernel */
+ }
+
+ movl r8=0x00000100000000
+ ;;
+ mov cr.iva=r8
+ /* Transition from virtual to physical mode */
+ srlz.i
+ ;;
+ add r17=5f-.rendez_entry, r25
+ movl r16=(IA64_PSR_AC | IA64_PSR_BN | IA64_PSR_IC | IA64_PSR_MFL)
+ ;;
+ tpa r17=r17
+ mov cr.ipsr=r16
+ ;;
+ mov cr.iip=r17
+ mov cr.ifs=r0
+ ;;
+ rfi
+ ;;
+5:
+ mov b6=in0 /* _start addr */
+ mov r8=in1 /* ap_wakeup_vector */
+ mov r26=in2 /* PAL addr */
+ ;;
+ /* Purge kernel TRs */
+ movl r16=KERNEL_START
+ mov r18=KERNEL_TR_PAGE_SHIFT<<2
+ ;;
+ ptr.i r16,r18
+ ptr.d r16,r18
+ ;;
+ srlz.i
+ ;;
+ srlz.d
+ ;;
+ /* Purge percpu TR */
+ movl r16=PERCPU_ADDR
+ mov r18=PERCPU_PAGE_SHIFT<<2
+ ;;
+ ptr.d r16,r18
+ ;;
+ srlz.d
+ ;;
+ /* Purge PAL TR */
+ mov r18=IA64_GRANULE_SHIFT<<2
+ ;;
+ ptr.i r26,r18
+ ;;
+ srlz.i
+ ;;
+ /* Purge stack TR */
+ mov r16=IA64_KR(CURRENT_STACK)
+ ;;
+ shl r16=r16,IA64_GRANULE_SHIFT
+ movl r19=PAGE_OFFSET
+ ;;
+ add r16=r19,r16
+ mov r18=IA64_GRANULE_SHIFT<<2
+ ;;
+ ptr.d r16,r18
+ ;;
+ srlz.i
+ ;;
+
+ /* Ensure we can read and clear external interrupts */
+ mov cr.tpr=r0
+ srlz.d
+
+ shr.u r9=r8,6 /* which irr */
+ ;;
+ and r8=63,r8 /* bit offset into irr */
+ ;;
+ mov r10=1;;
+ ;;
+ shl r10=r10,r8 /* bit mask of irr we want */
+ cmp.eq p6,p0=0,r9
+ ;;
+(p6) br.cond.sptk.few check_irr0
+ cmp.eq p7,p0=1,r9
+ ;;
+(p7) br.cond.sptk.few check_irr1
+ cmp.eq p8,p0=2,r9
+ ;;
+(p8) br.cond.sptk.few check_irr2
+ cmp.eq p9,p0=3,r9
+ ;;
+(p9) br.cond.sptk.few check_irr3
+
+check_irr0:
+ mov r8=cr.irr0
+ ;;
+ and r8=r8,r10
+ ;;
+ cmp.eq p6,p0=0,r8
+(p6) br.cond.sptk.few check_irr0
+ br.few call_start
+
+check_irr1:
+ mov r8=cr.irr1
+ ;;
+ and r8=r8,r10
+ ;;
+ cmp.eq p6,p0=0,r8
+(p6) br.cond.sptk.few check_irr1
+ br.few call_start
+
+check_irr2:
+ mov r8=cr.irr2
+ ;;
+ and r8=r8,r10
+ ;;
+ cmp.eq p6,p0=0,r8
+(p6) br.cond.sptk.few check_irr2
+ br.few call_start
+
+check_irr3:
+ mov r8=cr.irr3
+ ;;
+ and r8=r8,r10
+ ;;
+ cmp.eq p6,p0=0,r8
+(p6) br.cond.sptk.few check_irr3
+ br.few call_start
+
+call_start:
+ mov cr.eoi=r0
+ ;;
+ srlz.d
+ ;;
+ mov r8=cr.ivr
+ ;;
+ srlz.d
+ ;;
+ cmp.eq p0,p6=15,r8
+(p6) br.cond.sptk.few call_start
+ br.sptk.few b6
+kexec_fake_sal_rendez_end:
+END(kexec_fake_sal_rendez)
+
+ .global relocate_new_kernel_size
+relocate_new_kernel_size:
+ data8 kexec_fake_sal_rendez_end - relocate_new_kernel
+
diff -urNp linux-2.6.16/arch/ia64/kernel/smp.c linux-2.6.16-kexec/arch/ia64/kernel/smp.c
--- linux-2.6.16/arch/ia64/kernel/smp.c 2006-03-19 22:53:29.000000000 -0700
+++ linux-2.6.16-kexec/arch/ia64/kernel/smp.c 2006-03-27 17:14:04.000000000 -0700
@@ -30,6 +30,7 @@
#include <linux/delay.h>
#include <linux/efi.h>
#include <linux/bitops.h>
+#include <linux/kexec.h>
#include <asm/atomic.h>
#include <asm/current.h>
@@ -84,6 +85,34 @@ unlock_ipi_calllock(void)
spin_unlock_irq(&call_lock);
}
+#ifdef CONFIG_KEXEC
+/*
+ * Stop the CPU and put it in fake SAL rendezvous. This allows CPU to wake
+ * up with IPI from boot processor
+ */
+void
+kexec_stop_this_cpu (void *func)
+{
+ unsigned long pta, impl_va_bits, pal_base;
+
+ /*
+ * Remove this CPU by putting it into fake SAL rendezvous
+ */
+ cpu_clear(smp_processor_id(), cpu_online_map);
+ max_xtp();
+ ia64_eoi();
+
+ /* Disable VHPT */
+ impl_va_bits = ffz(~(local_cpu_data->unimpl_va_mask | (7UL << 61)));
+ pta = POW2(61) - POW2(vmlpt_bits);
+ ia64_set_pta(pta | (0 << 8) | (vmlpt_bits << 2) | 0);
+
+ local_irq_disable();
+ pal_base = __get_cpu_var(ia64_mca_pal_base);
+ kexec_fake_sal_rendez(func, ap_wakeup_vector, pal_base);
+}
+#endif
+
static void
stop_this_cpu (void)
{
diff -urNp linux-2.6.16/include/asm-ia64/kexec.h linux-2.6.16-kexec/include/asm-ia64/kexec.h
--- linux-2.6.16/include/asm-ia64/kexec.h 1969-12-31 17:00:00.000000000 -0700
+++ linux-2.6.16-kexec/include/asm-ia64/kexec.h 2006-03-30 11:46:46.000000000 -0700
@@ -0,0 +1,36 @@
+#ifndef _ASM_IA64_KEXEC_H
+#define _ASM_IA64_KEXEC_H
+
+
+/* Maximum physical address we can use pages from */
+#define KEXEC_SOURCE_MEMORY_LIMIT (-1UL)
+/* Maximum address we can reach in physical address mode */
+#define KEXEC_DESTINATION_MEMORY_LIMIT (-1UL)
+/* Maximum address we can use for the control code buffer */
+#define KEXEC_CONTROL_MEMORY_LIMIT TASK_SIZE
+
+#define KEXEC_CONTROL_CODE_SIZE (8192 + 8192 + 4096)
+
+/* The native architecture */
+#define KEXEC_ARCH KEXEC_ARCH_IA_64
+
+#define MAX_NOTE_BYTES 1024
+
+#define pte_bits 3
+#define vmlpt_bits (impl_va_bits - PAGE_SHIFT + pte_bits)
+#define POW2(n) (1ULL << (n))
+
+DECLARE_PER_CPU(u64, ia64_mca_pal_base);
+
+const extern unsigned int relocate_new_kernel_size;
+volatile extern long kexec_rendez;
+extern void relocate_new_kernel(unsigned long, unsigned long,
+ struct ia64_boot_param *, unsigned long);
+extern void kexec_fake_sal_rendez(void *start, unsigned long wake_up,
+ unsigned long pal_base);
+
+static inline void
+crash_setup_regs(struct pt_regs *newregs, struct pt_regs *oldregs)
+{
+}
+#endif /* _ASM_IA64_KEXEC_H */
diff -urNp linux-2.6.16/include/asm-ia64/machvec_hpzx1.h linux-2.6.16-kexec/include/asm-ia64/machvec_hpzx1.h
--- linux-2.6.16/include/asm-ia64/machvec_hpzx1.h 2006-03-19 22:53:29.000000000 -0700
+++ linux-2.6.16-kexec/include/asm-ia64/machvec_hpzx1.h 2006-03-27 15:58:38.000000000 -0700
@@ -34,4 +34,6 @@ extern ia64_mv_dma_mapping_error sba_dma
#define platform_dma_supported sba_dma_supported
#define platform_dma_mapping_error sba_dma_mapping_error
+extern void ioc_iova_disable(void);
+
#endif /* _ASM_IA64_MACHVEC_HPZX1_h */
diff -urNp linux-2.6.16/include/asm-ia64/smp.h linux-2.6.16-kexec/include/asm-ia64/smp.h
--- linux-2.6.16/include/asm-ia64/smp.h 2006-03-19 22:53:29.000000000 -0700
+++ linux-2.6.16-kexec/include/asm-ia64/smp.h 2006-03-27 15:52:51.000000000 -0700
@@ -129,6 +129,9 @@ extern void smp_send_reschedule (int cpu
extern void lock_ipi_calllock(void);
extern void unlock_ipi_calllock(void);
extern void identify_siblings (struct cpuinfo_ia64 *);
+#ifdef CONFIG_KEXEC
+extern void kexec_stop_this_cpu(void *);
+#endif
#else
diff -urNp linux-2.6.16/include/linux/irq.h linux-2.6.16-kexec/include/linux/irq.h
--- linux-2.6.16/include/linux/irq.h 2006-03-19 22:53:29.000000000 -0700
+++ linux-2.6.16-kexec/include/linux/irq.h 2006-03-27 15:49:27.000000000 -0700
@@ -94,6 +94,7 @@ irq_descp (int irq)
#include <asm/hw_irq.h> /* the arch dependent stuff */
extern int setup_irq(unsigned int irq, struct irqaction * new);
+extern void terminate_irqs(void);
#ifdef CONFIG_GENERIC_HARDIRQS
extern cpumask_t irq_affinity[NR_IRQS];
diff -urNp linux-2.6.16/kernel/irq/manage.c linux-2.6.16-kexec/kernel/irq/manage.c
--- linux-2.6.16/kernel/irq/manage.c 2006-03-19 22:53:29.000000000 -0700
+++ linux-2.6.16-kexec/kernel/irq/manage.c 2006-03-27 17:02:08.000000000 -0700
@@ -377,3 +377,22 @@ int request_irq(unsigned int irq,
EXPORT_SYMBOL(request_irq);
+/*
+ * Terminate any outstanding interrupts
+ */
+void terminate_irqs(void)
+{
+ struct irqaction * action;
+ irq_desc_t *idesc;
+ int i;
+
+ for (i=0; i<NR_IRQS; i++) {
+ idesc = irq_descp(i);
+ action = idesc->action;
+ if (!action)
+ continue;
+ if (idesc->handler->end)
+ idesc->handler->end(i);
+ }
+}
+
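
The copy loop in relocate_kernel.S walks the kexec page list using four
flag bits in the low bits of each entry (0x1 destination, 0x2
indirection, 0x4 end, 0x8 source). A small userspace model of that walk
may make the control flow easier to follow. It is a sketch under stated
assumptions: byte offsets into a buffer stand in for physical addresses,
the page size is shrunk to 64 bytes, and walk_page_list and the SIM_*
names are invented here. The IND_* names follow the generic kexec code.

```c
#include <stddef.h>
#include <string.h>

#define SIM_PAGE_SIZE	64UL			/* tiny page for the model */
#define SIM_PAGE_MASK	(~(SIM_PAGE_SIZE - 1))

/* Flag bits as tested by the loop in relocate_kernel.S. */
#define IND_DESTINATION	0x1UL
#define IND_INDIRECTION	0x2UL
#define IND_DONE	0x4UL
#define IND_SOURCE	0x8UL

/*
 * Walk a kexec-style entry list that starts at offset 'list' in mem.
 * Returns the number of pages copied. walk_page_list is a hypothetical
 * model of the assembly loop, not kernel code.
 */
static size_t walk_page_list(unsigned char *mem, unsigned long list)
{
	unsigned long dest = 0;
	size_t copied = 0;

	for (;;) {
		unsigned long entry;

		/* Fetch the next entry and step past it. */
		memcpy(&entry, mem + list, sizeof(entry));
		list += sizeof(entry);

		if (entry & IND_DESTINATION) {
			/* Following source pages are copied to this address. */
			dest = entry & SIM_PAGE_MASK;
		} else if (entry & IND_INDIRECTION) {
			/* Continue reading entries from another list page. */
			list = entry & SIM_PAGE_MASK;
		} else if (entry & IND_DONE) {
			return copied;
		} else if (entry & IND_SOURCE) {
			/* Copy one page; the destination then advances. */
			memcpy(mem + dest, mem + (entry & SIM_PAGE_MASK),
			       SIM_PAGE_SIZE);
			dest += SIM_PAGE_SIZE;
			copied++;
		}
	}
}
```

The model leaves out everything that makes the real code hard: running
in physical mode, purging the TLB first, and flushing the instruction
cache for each copied line.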
* Re: [PATCH] kexec on ia64
2006-04-03 22:20 ` Khalid Aziz
@ 2006-04-04 4:20 ` Andrew Morton
2006-04-04 6:07 ` [Fastboot] " Michael Ellerman
2006-04-05 16:11 ` Khalid Aziz
2006-04-04 18:13 ` [Fastboot] " Eric W. Biederman
1 sibling, 2 replies; 29+ messages in thread
From: Andrew Morton @ 2006-04-04 4:20 UTC (permalink / raw)
To: Khalid Aziz; +Cc: linux-kernel, fastboot, linux-ia64
Khalid Aziz <khalid_aziz@hp.com> wrote:
>
> Add kexec support on ia64.
>
Neat. How well does it work?
> +#ifdef CONFIG_PCI
> +void machine_shutdown(void)
> +{
> + struct pci_dev *dev;
> + irq_desc_t *idesc;
> + cpumask_t mask = CPU_MASK_NONE;
> +
> + /* Disable all PCI devices */
> + list_for_each_entry(dev, &pci_devices, global_list) {
> + if (!(dev->is_enabled))
> + continue;
> + idesc = irq_descp(dev->irq);
> + if (!idesc)
> + continue;
> + cpu_set(0, mask);
> + disable_irq_nosync(dev->irq);
> + idesc->handler->end(dev->irq);
> + idesc->handler->set_affinity(dev->irq, mask);
> + idesc->action = NULL;
> + pci_disable_device(dev);
> + pci_set_power_state(dev, 0);
> + }
> +}
> +#endif
Ahem.
/* Do NOT directly access these two variables, unless you are arch specific pci
* code, or pci core code. */
extern struct list_head pci_root_buses; /* list of all known PCI buses */
extern struct list_head pci_devices; /* list of all devices */
I think it would be kinder to the API to use pci_find_device(PCI_ANY_ID,
PCI_ANY_ID, ...) here.
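A minimal user-space sketch of that iteration pattern, under stated
assumptions: every mock_* name and MOCK_ANY_ID below is hypothetical, the
vendor/device numbers are made up, and mock_find_device() only imitates the
(vendor, device, from) cursor shape of pci_find_device(). It is not kernel
code. The point is that the caller walks devices through an accessor and
never touches the list head itself.

```c
#include <stddef.h>

#define MOCK_ANY_ID (~0u)	/* hypothetical wildcard, in the role of PCI_ANY_ID */

struct mock_dev {
	unsigned int vendor;
	unsigned int device;
	int is_enabled;
};

/* Hypothetical stand-in for the PCI core's private device list. */
static struct mock_dev mock_devices[] = {
	{ 0x103c, 0x1290, 1 },
	{ 0x1000, 0x0030, 0 },
	{ 0x8086, 0x1229, 1 },
};
#define MOCK_NDEVS (sizeof(mock_devices) / sizeof(mock_devices[0]))

/*
 * Return the first matching device after 'from', or the first matching
 * device overall when 'from' is NULL. Returns NULL when none is left.
 */
struct mock_dev *mock_find_device(unsigned int vendor, unsigned int device,
				  const struct mock_dev *from)
{
	size_t i = from ? (size_t)(from - mock_devices) + 1 : 0;

	for (; i < MOCK_NDEVS; i++) {
		struct mock_dev *d = &mock_devices[i];

		if ((vendor == MOCK_ANY_ID || d->vendor == vendor) &&
		    (device == MOCK_ANY_ID || d->device == device))
			return d;
	}
	return NULL;
}

/* Walk every device through the accessor and disable the enabled ones. */
int mock_disable_all(void)
{
	struct mock_dev *dev = NULL;
	int disabled = 0;

	while ((dev = mock_find_device(MOCK_ANY_ID, MOCK_ANY_ID, dev)) != NULL) {
		if (!dev->is_enabled)
			continue;
		dev->is_enabled = 0;
		disabled++;
	}
	return disabled;
}
```

Passing the previous result back in as the cursor is what lets the list
itself stay private to the core code.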
> +/*
> + * Do not allocate memory (or fail in any way) in machine_kexec().
> + * We are past the point of no return, committed to rebooting now.
> + */
> +void machine_kexec(struct kimage *image)
> +{
> + unsigned long indirection_page;
> + relocate_new_kernel_t rnk;
> + unsigned long pta, impl_va_bits;
> + void *pal_addr = efi_get_pal_addr();
> + unsigned long code_addr = (unsigned long)page_address(image->control_code_page);
> +
> +#ifdef CONFIG_HOTPLUG_CPU
> + int cpu;
> +
> + for_each_online_cpu(cpu) {
> + if (cpu != smp_processor_id())
> + cpu_down(cpu);
> + }
> +#elif CONFIG_SMP
This will generate a CPP warning if CONFIG_SMP is not defined.
#elif defined(CONFIG_SMP)
would be preferred.
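A small stand-alone sketch of the difference, assuming a build where neither
option is set (the enum and stop_strategy() are hypothetical names used only
for illustration). With "#elif CONFIG_SMP" the undefined macro is evaluated
as 0, which is what triggers the warning under -Wundef; testing with
defined() selects the same branch without evaluating the macro.

```c
/* Simulate a uniprocessor build: neither option is defined. */
#undef CONFIG_HOTPLUG_CPU
#undef CONFIG_SMP

enum stop_strategy {
	STOP_VIA_CPU_DOWN,	/* CONFIG_HOTPLUG_CPU: take CPUs offline */
	STOP_VIA_IPI,		/* CONFIG_SMP only: cross-call the other CPUs */
	STOP_NOTHING,		/* uniprocessor: no other CPUs to stop */
};

enum stop_strategy stop_strategy(void)
{
#ifdef CONFIG_HOTPLUG_CPU
	return STOP_VIA_CPU_DOWN;
#elif defined(CONFIG_SMP)	/* no -Wundef warning when CONFIG_SMP is unset */
	return STOP_VIA_IPI;
#else
	return STOP_NOTHING;
#endif
}
```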
> --- linux-2.6.16/kernel/irq/manage.c 2006-03-19 22:53:29.000000000 -0700
> +++ linux-2.6.16-kexec/kernel/irq/manage.c 2006-03-27 17:02:08.000000000 -0700
> @@ -377,3 +377,22 @@ int request_irq(unsigned int irq,
>
> EXPORT_SYMBOL(request_irq);
>
> +/*
> + * Terminate any outstanding interrupts
> + */
> +void terminate_irqs(void)
> +{
> + struct irqaction * action;
> + irq_desc_t *idesc;
> + int i;
> +
> + for (i=0; i<NR_IRQS; i++) {
for (i = 0; i < NR_IRQS; i++) {
> + idesc = irq_descp(i);
> + action = idesc->action;
> + if (!action)
> + continue;
> + if (idesc->handler->end)
> + idesc->handler->end(i);
> + }
> +}
Could we have a bit more description of what this function does, and why we
need it?
Should other kexec-using architectures be using this? If not, why does
ia64 need it?
Thanks.
* Re: [Fastboot] Re: [PATCH] kexec on ia64
2006-04-04 4:20 ` Andrew Morton
@ 2006-04-04 6:07 ` Michael Ellerman
2006-04-05 16:11 ` Khalid Aziz
1 sibling, 0 replies; 29+ messages in thread
From: Michael Ellerman @ 2006-04-04 6:07 UTC (permalink / raw)
To: Andrew Morton; +Cc: Khalid Aziz, linux-ia64, fastboot, linux-kernel
On Mon, 2006-04-03 at 21:20 -0700, Andrew Morton wrote:
> Khalid Aziz <khalid_aziz@hp.com> wrote:
> > +/*
> > + * Terminate any outstanding interrupts
> > + */
> > +void terminate_irqs(void)
> > +{
> > + struct irqaction * action;
> > + irq_desc_t *idesc;
> > + int i;
> > +
> > + for (i=0; i<NR_IRQS; i++) {
>
> for (i = 0; i < NR_IRQS; i++) {
>
> > + idesc = irq_descp(i);
> > + action = idesc->action;
> > + if (!action)
> > + continue;
> > + if (idesc->handler->end)
> > + idesc->handler->end(i);
> > + }
> > +}
>
> Could we have a bit more description of what this function does, and why we
> need it?
>
> Should other kexec-using architectures be using this? If not, why does
> ia64 need it?
We've been kicking around a patch to do something similar, we also eoi
anything that's outstanding. I can't find the patch just now, but it's
on linuxppc somewhere I think.
cheers
--
Michael Ellerman
IBM OzLabs
wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)
We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person
* Re: [PATCH] kexec on ia64
2006-04-04 4:20 ` Andrew Morton
2006-04-04 6:07 ` [Fastboot] " Michael Ellerman
@ 2006-04-05 16:11 ` Khalid Aziz
1 sibling, 0 replies; 29+ messages in thread
From: Khalid Aziz @ 2006-04-05 16:11 UTC (permalink / raw)
To: Andrew Morton; +Cc: LKML, Fastboot mailing list, Linux ia64
On Mon, 2006-04-03 at 21:20 -0700, Andrew Morton wrote:
> Khalid Aziz <khalid_aziz@hp.com> wrote:
> >
> > Add kexec support on ia64.
> >
>
> Neat. How well does it work?
Works well on my test machines - HP rx2600 and HP cx2600. Hopefully
others can test it on other machines.
> > +/*
> > + * Terminate any outstanding interrupts
> > + */
> > +void terminate_irqs(void)
> > +{
> > + struct irqaction * action;
> > + irq_desc_t *idesc;
> > + int i;
> > +
> > + for (i=0; i<NR_IRQS; i++) {
>
> for (i = 0; i < NR_IRQS; i++) {
>
> > + idesc = irq_descp(i);
> > + action = idesc->action;
> > + if (!action)
> > + continue;
> > + if (idesc->handler->end)
> > + idesc->handler->end(i);
> > + }
> > +}
>
> Could we have a bit more description of what this function does, and why we
> need it?
>
> Should other kexec-using architectures be using this? If not, why does
> ia64 need it?
>
> Thanks.
This function terminates any outstanding interrupts. I found it to be
necessary for devices that use level-triggered interrupts. If such a
device asserts its interrupt as the kernel goes into panic, nobody
acknowledges the interrupt, so it stays asserted as the new kernel
comes up. All drivers should clear any pending interrupts in their
initialization routines, but most do not. As a result, when the driver
attempts to use the interrupt, it cannot: the line is already asserted,
and any new interrupts from the device simply keep it asserted.
terminate_irqs() tries to acknowledge any pending interrupts so that
they will be usable when the new kernel comes up. This is not specific
to ia64, and I would expect the problem to show up on other
architectures as well. I happened to find it on ia64 because the HP
rx2600 uses level interrupts for its SCSI controller.
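The loop can be sketched in user space as follows. This is only a model
under stated assumptions: all mock_* names are hypothetical, the "asserted"
flag stands in for a level-triggered line that nobody has acknowledged, and
the real irq_desc_t and handler types are not used.

```c
#include <stddef.h>

#define MOCK_NR_IRQS 4

struct mock_irqaction { const char *name; };

struct mock_handler {
	void (*end)(unsigned int irq);	/* may be NULL, as in the patch */
};

struct mock_irq_desc {
	struct mock_irqaction *action;	/* NULL when no driver is attached */
	struct mock_handler *handler;
	int asserted;			/* models a line left asserted */
};

struct mock_irq_desc mock_desc[MOCK_NR_IRQS];

/* Acknowledging the interrupt lets the line drop. */
static void mock_end(unsigned int irq)
{
	mock_desc[irq].asserted = 0;
}

static struct mock_handler level_handler = { mock_end };
static struct mock_handler no_end_handler = { NULL };
static struct mock_irqaction scsi_action = { "scsi" };

void mock_setup(void)
{
	/* irq 0: driver attached, line stuck asserted at panic time */
	mock_desc[0] = (struct mock_irq_desc){ &scsi_action, &level_handler, 1 };
	/* irq 1: no driver attached, so the loop skips it */
	mock_desc[1] = (struct mock_irq_desc){ NULL, &level_handler, 1 };
	/* irq 2: driver attached, but the handler has no end() method */
	mock_desc[2] = (struct mock_irq_desc){ &scsi_action, &no_end_handler, 1 };
	/* irq 3: driver attached, nothing pending */
	mock_desc[3] = (struct mock_irq_desc){ &scsi_action, &level_handler, 0 };
}

/* Same shape as terminate_irqs(); returns how many end() calls were made. */
int mock_terminate_irqs(void)
{
	int i, ended = 0;

	for (i = 0; i < MOCK_NR_IRQS; i++) {
		struct mock_irq_desc *idesc = &mock_desc[i];

		if (!idesc->action)
			continue;
		if (idesc->handler->end) {
			idesc->handler->end(i);
			ended++;
		}
	}
	return ended;
}
```

In this model only irq 0 actually had a stuck line to clear. irq 3 is ended
as well because, like the patch, the loop does not check whether anything is
pending.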
--
Khalid
==================================
Khalid Aziz Open Source and Linux Organization
(970)898-9214 Hewlett-Packard
khalid.aziz@hp.com Fort Collins, CO
"The Linux kernel is subject to relentless development"
- Alessandro Rubini
* Re: [Fastboot] [PATCH] kexec on ia64
2006-04-03 22:20 ` Khalid Aziz
2006-04-04 4:20 ` Andrew Morton
@ 2006-04-04 18:13 ` Eric W. Biederman
2006-04-05 16:34 ` Khalid Aziz
1 sibling, 1 reply; 29+ messages in thread
From: Eric W. Biederman @ 2006-04-04 18:13 UTC (permalink / raw)
To: Khalid Aziz; +Cc: LKML, Fastboot mailing list, Linux ia64
Khalid Aziz <khalid_aziz@hp.com> writes:
> Add kexec support on ia64.
This looks like a starting place but this patch needs some
more work.
> Signed-off-by: Khalid Aziz <khalid.aziz@hp.com>
> ---
>
> diff -urNp linux-2.6.16/arch/ia64/hp/common/sba_iommu.c linux-2.6.16-kexec/arch/ia64/hp/common/sba_iommu.c
> --- linux-2.6.16/arch/ia64/hp/common/sba_iommu.c 2006-03-19 22:53:29.000000000 -0700
> +++ linux-2.6.16-kexec/arch/ia64/hp/common/sba_iommu.c 2006-03-27 15:42:47.000000000 -0700
> @@ -1624,6 +1624,28 @@ ioc_iova_init(struct ioc *ioc)
> READ_REG(ioc->ioc_hpa + IOC_IBASE);
> }
>
> +#ifdef CONFIG_KEXEC
> +void
> +ioc_iova_disable(void)
> +{
> + struct ioc *ioc;
> +
> + ioc = ioc_list;
> +
> + while (ioc != NULL) {
> + /* Disable IOVA translation */
> + WRITE_REG(ioc->ibase & 0xfffffffffffffffe, ioc->ioc_hpa + IOC_IBASE);
> + READ_REG(ioc->ioc_hpa + IOC_IBASE);
> +
> + /* Clear I/O TLB of any possible entries */
> + WRITE_REG(ioc->ibase | (get_iovp_order(ioc->iov_size) + iovp_shift), ioc->ioc_hpa + IOC_PCOM);
> + READ_REG(ioc->ioc_hpa + IOC_PCOM);
> +
> + ioc = ioc->next;
> + }
> +}
> +#endif
> +
> static void __init
> ioc_resource_init(struct ioc *ioc)
> {
> diff -urNp linux-2.6.16/arch/ia64/Kconfig linux-2.6.16-kexec/arch/ia64/Kconfig
> --- linux-2.6.16/arch/ia64/Kconfig 2006-03-19 22:53:29.000000000 -0700
> +++ linux-2.6.16-kexec/arch/ia64/Kconfig 2006-03-27 15:42:47.000000000 -0700
> @@ -376,6 +376,23 @@ config IA64_PALINFO
> config SGI_SN
> def_bool y if (IA64_SGI_SN2 || IA64_GENERIC)
>
> +config KEXEC
> + bool "kexec system call (EXPERIMENTAL)"
> + depends on EXPERIMENTAL
> + help
> + kexec is a system call that implements the ability to shutdown your
> + current kernel, and to start another kernel. It is like a reboot
> + but it is independent of the system firmware. And like a reboot
> + you can start any kernel with it, not just Linux.
> +
> + The name comes from the similarity to the exec system call.
> +
> + It is an ongoing process to be certain the hardware in a machine
> + is properly shutdown, so do not be surprised if this code does not
> + initially work for you. It may help to enable device hotplugging
> + support. As of this writing the exact hardware interface is
> + strongly in flux, so no good recommendation can be made.
> +
> source "drivers/firmware/Kconfig"
>
> source "fs/Kconfig.binfmt"
> diff -urNp linux-2.6.16/arch/ia64/kernel/crash.c linux-2.6.16-kexec/arch/ia64/kernel/crash.c
> --- linux-2.6.16/arch/ia64/kernel/crash.c 1969-12-31 17:00:00.000000000 -0700
> +++ linux-2.6.16-kexec/arch/ia64/kernel/crash.c 2006-03-27 15:49:44.000000000 -0700
> @@ -0,0 +1,43 @@
> +/*
> + * arch/ia64/kernel/crash.c
> + *
> + * Architecture specific (ia64) functions for kexec based crash dumps.
> + *
> + * Created by: Khalid Aziz <khalid.aziz@hp.com>
> + *
> + * Copyright (C) 2005 Hewlett-Packard Development Company, L.P.
> + *
> + */
> +#include <linux/init.h>
> +#include <linux/types.h>
> +#include <linux/kernel.h>
> +#include <linux/smp.h>
> +#include <linux/irq.h>
> +#include <linux/reboot.h>
> +#include <linux/kexec.h>
> +#include <linux/irq.h>
> +#include <linux/delay.h>
> +#include <linux/elf.h>
> +#include <linux/elfcore.h>
> +#include <linux/device.h>
> +
> +void
> +machine_crash_shutdown(struct pt_regs *pt)
> +{
> + /* This function is only called after the system
> + * has panicked or is otherwise in a critical state.
> + * The minimum amount of code to allow a kexec'd kernel
> + * to run successfully needs to happen here.
> + *
> + * In practice this means shooting down the other cpus in
> + * an SMP system.
> + */
> + if (in_interrupt()) {
> + terminate_irqs();
> + ia64_eoi();
> + }
> + system_state = SYSTEM_RESTART;
> + device_shutdown();
> + system_state = SYSTEM_BOOTING;
> + machine_shutdown();
> +}
machine_crash_shutdown must not call device_shutdown. That has
been shown to way exceed the minimum necessary to shutdown a system.
I would prefer this to be a noop stub that doesn't work at all than
something like this that does way too much, and makes people think
the code will work.
As for terminate_irqs on x86 we do that on bootup not in the middle
of a crash shutdown. The apics and xapics are close enough you
should be able to do the same on ia64.
You display remarkable faith in a kernel that has panicked.
> +#ifdef CONFIG_PCI
> +void machine_shutdown(void)
> +{
> + struct pci_dev *dev;
> + irq_desc_t *idesc;
> + cpumask_t mask = CPU_MASK_NONE;
> +
> + /* Disable all PCI devices */
> + list_for_each_entry(dev, &pci_devices, global_list) {
> + if (!(dev->is_enabled))
> + continue;
> + idesc = irq_descp(dev->irq);
> + if (!idesc)
> + continue;
> + cpu_set(0, mask);
> + disable_irq_nosync(dev->irq);
> + idesc->handler->end(dev->irq);
> + idesc->handler->set_affinity(dev->irq, mask);
> + idesc->action = NULL;
> + pci_disable_device(dev);
> + pci_set_power_state(dev, 0);
> + }
> +}
> +#endif
This is peculiar but almost sane. We don't do this on x86,
because devices are peculiar enough that no generic sequence works.
What you have above belongs in the shutdown methods of the pci
devices. There is no way to get this right in the general case.
Some of the irq disable logic may in fact be sane.
Unless there is a good reason not to, machine_shutdown needs
to be called from machine_restart, so that the code is routinely
used and tested.
Having machine_shutdown only build when you have PCI present
and then not making KEXEC depend on PCI is wrong.
The #ifdef needs to move inside machine_shutdown.
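A minimal sketch of that layout, assuming a non-PCI build (the mock_* names
are hypothetical flags used only to make the effect observable). The
function is always defined, so callers link on every configuration, and only
the PCI-specific work is compiled out.

```c
/* Simulate a configuration without PCI support. */
#undef CONFIG_PCI

/* Hypothetical flag: work that every configuration must do. */
int mock_cpus_stopped;
/* Hypothetical flag: PCI-only device quiescing. */
int mock_pci_quiesced;

void mock_machine_shutdown(void)
{
	/* Unconditional part of the shutdown path. */
	mock_cpus_stopped = 1;

#ifdef CONFIG_PCI
	/* PCI-only work; compiled out when CONFIG_PCI is not set. */
	mock_pci_quiesced = 1;
#endif
}
```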
> +
> +/*
> + * Do not allocate memory (or fail in any way) in machine_kexec().
> + * We are past the point of no return, committed to rebooting now.
> + */
> +void machine_kexec(struct kimage *image)
> +{
> + unsigned long indirection_page;
> + relocate_new_kernel_t rnk;
> + unsigned long pta, impl_va_bits;
> + void *pal_addr = efi_get_pal_addr();
> + unsigned long code_addr = (unsigned long)page_address(image->control_code_page);
> +
> +#ifdef CONFIG_HOTPLUG_CPU
> + int cpu;
> +
> + for_each_online_cpu(cpu) {
> + if (cpu != smp_processor_id())
> + cpu_down(cpu);
> + }
> +#elif CONFIG_SMP
> + smp_call_function(kexec_stop_this_cpu, (void *)image->start, 0, 0);
> +#endif
This CPU and HOTPLUG_CPU stuff belongs in machine_shutdown.
> +
> + ia64_set_itv(1<<16);
> + /* Interrupts aren't acceptable while we reboot */
> + local_irq_disable();
> +
> + /* set kr0 to the appropriate address */
> + set_io_base();
> +
> + /* Disable VHPT */
> + impl_va_bits = ffz(~(local_cpu_data->unimpl_va_mask | (7UL << 61)));
> + pta = POW2(61) - POW2(vmlpt_bits);
> + ia64_set_pta(pta | (0 << 8) | (vmlpt_bits << 2) | 0);
> +
> +#ifdef CONFIG_IA64_HP_ZX1
> + ioc_iova_disable();
> +#endif
This also looks like it needs to be part of machine_shutdown.
I have no confidence in ioc_iova_disable when the machine is crashing.
Basically anything that touches a pointer is likely to be bad.
> + /* now execute the control code.
> + * We will start by executing the control code linked into the
> + * kernel as opposed to the code we copied in control code buffer
> + * page. When this code switches to physical mode, we will start
> + * executing the code in control code buffer page. Reason for
> + * doing this is we start code execution in virtual address space.
> + * If we were to try to execute the newly copied code in virtual
> + * address space, we will need to make an ITLB entry to avoid ITLB
> + * miss. By executing the code linked into kernel, we take advantage
> + * of the ITLB entry already in place for kernel and avoid making
> + * a new entry.
> + */
> + indirection_page = image->head & PAGE_MASK;
> +
> + rnk = (relocate_new_kernel_t)&code_addr;
> + (*rnk)(indirection_page, image->start, ia64_boot_param,
> + GRANULEROUNDDOWN((unsigned long) pal_addr));
> + BUG();
> + for (;;)
> + ;
> +}
Eric
* Re: [Fastboot] [PATCH] kexec on ia64
2006-04-04 18:13 ` [Fastboot] " Eric W. Biederman
@ 2006-04-05 16:34 ` Khalid Aziz
0 siblings, 0 replies; 29+ messages in thread
From: Khalid Aziz @ 2006-04-05 16:34 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: LKML, Fastboot mailing list, Linux ia64
On Tue, 2006-04-04 at 12:13 -0600, Eric W. Biederman wrote:
> Khalid Aziz <khalid_aziz@hp.com> writes:
> > +void
> > +machine_crash_shutdown(struct pt_regs *pt)
> > +{
> > + /* This function is only called after the system
> > + * has panicked or is otherwise in a critical state.
> > + * The minimum amount of code to allow a kexec'd kernel
> > + * to run successfully needs to happen here.
> > + *
> > + * In practice this means shooting down the other cpus in
> > + * an SMP system.
> > + */
> > + if (in_interrupt()) {
> > + terminate_irqs();
> > + ia64_eoi();
> > + }
> > + system_state = SYSTEM_RESTART;
> > + device_shutdown();
> > + system_state = SYSTEM_BOOTING;
> > + machine_shutdown();
> > +}
>
> machine_crash_shutdown must not call device_shutdown. That has
> been shown to way exceed the minimum necessary to shutdown a system.
> I would prefer this to be a noop stub that doesn't work at all than
> something like this that does way too much, and makes people think
> the code will work.
>
> As for terminate_irqs on x86 we do that on bootup not in the middle
> of a crash shutdown. The apics and xapics are close enough you
> should be able to do the same on ia64.
>
> You display remarkable faith in a kernel that has panicked.
I will look into eliminating this as much as possible.
> Having machine_shutdown only build when you have PCI present
> and then not making KEXEC depend on PCI is wrong.
>
> The #ifdef needs to move inside machine_shutdown.
Fixed.
>
> > +
> > +/*
> > + * Do not allocate memory (or fail in any way) in machine_kexec().
> > + * We are past the point of no return, committed to rebooting now.
> > + */
> > +void machine_kexec(struct kimage *image)
> > +{
> > + unsigned long indirection_page;
> > + relocate_new_kernel_t rnk;
> > + unsigned long pta, impl_va_bits;
> > + void *pal_addr = efi_get_pal_addr();
> > + unsigned long code_addr = (unsigned long)page_address(image->control_code_page);
> > +
> > +#ifdef CONFIG_HOTPLUG_CPU
> > + int cpu;
> > +
> > + for_each_online_cpu(cpu) {
> > + if (cpu != smp_processor_id())
> > + cpu_down(cpu);
> > + }
> > +#elif CONFIG_SMP
> > + smp_call_function(kexec_stop_this_cpu, (void *)image->start, 0, 0);
> > +#endif
>
> This CPU and HOTPLUG_CPU stuff belongs in machine_shutdown.
Moved to machine_shutdown().
>
> > +
> > + ia64_set_itv(1<<16);
> > + /* Interrupts aren't acceptable while we reboot */
> > + local_irq_disable();
> > +
> > + /* set kr0 to the appropriate address */
> > + set_io_base();
> > +
> > + /* Disable VHPT */
> > + impl_va_bits = ffz(~(local_cpu_data->unimpl_va_mask | (7UL << 61)));
> > + pta = POW2(61) - POW2(vmlpt_bits);
> > + ia64_set_pta(pta | (0 << 8) | (vmlpt_bits << 2) | 0);
> > +
> > +#ifdef CONFIG_IA64_HP_ZX1
> > + ioc_iova_disable();
> > +#endif
>
> This also looks like it needs to be part of machine_shutdown.
> I have no confidence in ioc_iova_disable when the machine is crashing.
> Basically anything that touches a pointer is likely to be bad.
I have moved the above code to machine_shutdown. I would prefer to delay
disabling the VHPT as long as possible, but machine_kexec is called
soon after machine_shutdown, and at this point we should be executing
strictly kernel code, which uses pinned TR entries. Disabling the VHPT
should therefore not have any deleterious effect.
>
> > + /* now execute the control code.
> > + * We will start by executing the control code linked into the
> > + * kernel as opposed to the code we copied in control code buffer
> > + * page. When this code switches to physical mode, we will start
> > + * executing the code in control code buffer page. Reason for
> > + * doing this is we start code execution in virtual address space.
> > + * If we were to try to execute the newly copied code in virtual
> > + * address space, we will need to make an ITLB entry to avoid ITLB
> > + * miss. By executing the code linked into kernel, we take advantage
> > + * of the ITLB entry already in place for kernel and avoid making
> > + * a new entry.
> > + */
> > + indirection_page = image->head & PAGE_MASK;
> > +
> > + rnk = (relocate_new_kernel_t)&code_addr;
> > + (*rnk)(indirection_page, image->start, ia64_boot_param,
> > + GRANULEROUNDDOWN((unsigned long) pal_addr));
> > + BUG();
> > + for (;;)
> > + ;
> > +}
>
>
> Eric
Thanks for the review.
--
Khalid
==================================
Khalid Aziz Open Source and Linux Organization
(970)898-9214 Hewlett-Packard
khalid.aziz@hp.com Fort Collins, CO
"The Linux kernel is subject to relentless development"
- Alessandro Rubini