* kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
From: caiqian @ 2010-09-26 3:11 UTC
To: Yinghai Lu, H. Peter Anvin; +Cc: linux-next, kexec
# /sbin/kexec -p '--command-line=ro root=/dev/mapper/VolGroup-lv_root rd_LVM_LV=VolGroup/lv_root rd_LVM_LV=VolGroup/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb quiet console=tty0 console=ttyS0,115200 crashkernel=128M irqpoll maxcpus=1 reset_devices cgroup_disable=memory ' --initrd=/boot/initrd-2.6.36-rc3+kdump.img /boot/vmlinuz-2.6.36-rc3+
BUG: unable to handle kernel paging request at ffff8800dfffe400
IP: [<ffffffff8113376b>] per_cpu_ptr_to_phys+0x3b/0x120
PGD 1a26063 PUD 1fffc067 PMD 1fffd067 PTE 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu0/crash_notes
CPU 3
Modules linked in: ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 virtio_balloon pcspkr 8139too 8139cp mii snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc sg i2c_piix4 i2c_core ext4 mbcache jbd2 floppy sd_mod crc_t10dif virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mod [last unloaded: scsi_wait_scan]
Pid: 5671, comm: kexec Not tainted 2.6.35+ #11 /KVM
RIP: 0010:[<ffffffff8113376b>] [<ffffffff8113376b>] per_cpu_ptr_to_phys+0x3b/0x120
RSP: 0018:ffff88064567fe38 EFLAGS: 00010286
RAX: ffff8800df440000 RBX: ffff8800df41d990 RCX: ffff8800df400000
RDX: ffff8800dfff6400 RSI: 0000000000001000 RDI: ffff8800df41d990
RBP: ffff88064567fe58 R08: ffffffff81651f20 R09: ffff8800df40cb38
R10: 0000000000000001 R11: 0000000000000000 R12: ffff88083dcd18e0
R13: ffff88064567ff48 R14: 0000000000001000 R15: 00007f969401b000
FS: 00007f96952e4700(0000) GS:ffff8800df4c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff8800dfffe400 CR3: 0000000818130000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kexec (pid: 5671, threadinfo ffff88064567e000, task ffff8808ddba0180)
Stack:
ffff88064567fe68 ffff88083d4f8000 ffff88083dcd18e0 ffff88064567ff48
<0> ffff88064567fe78 ffffffff812ea28b ffff88064567fe78 ffff88083dcd18c0
<0> ffff88064567fe88 ffffffff812e4f0f ffff88064567fee8 ffffffff811a5d11
Call Trace:
[<ffffffff812ea28b>] show_crash_notes+0x2b/0x50
[<ffffffff812e4f0f>] sysdev_show+0x1f/0x30
[<ffffffff811a5d11>] sysfs_read_file+0x111/0x1f0
[<ffffffff8113e7e5>] vfs_read+0xb5/0x1a0
[<ffffffff810b5952>] ? audit_syscall_entry+0x252/0x280
[<ffffffff8113e921>] sys_read+0x51/0x90
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Code: 00 00 48 8b 05 bf 81 e2 00 8b 35 dd 46 9c 00 48 8b 15 0a 47 9c 00 48 89 fb 48 8b 48 18 8b 05 a5 46 9c 00 c1 e0 0c 48 98 48 01 c8 <48> 03 04 f2 48 39 c7 0f 83 a0 00 00 00 8b 05 aa 46 9c 00 48 03
RIP [<ffffffff8113376b>] per_cpu_ptr_to_phys+0x3b/0x120
RSP <ffff88064567fe38>
CR2: ffff8800dfffe400
---[ end trace 1f847047fea7430c ]---
This regression was introduced by the following commit:
commit a9ce6bc15100023b411f8117e53a016d61889800
Author: Yinghai Lu <yinghai@kernel.org>
Date: Wed Aug 25 13:39:17 2010 -0700
x86, memblock: Replace e820_/_early string with memblock_
1. Include linux/memblock.h directly, so that references to e820.h can be reduced later.
2. This patch was done mainly with sed scripts.
-v2: use MEMBLOCK_ERROR instead of -1ULL or -1UL
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 8406ed7..8e4a165 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -90,7 +90,7 @@ extern void __iomem *efi_ioremap(unsigned long addr, unsigned long size,
#endif /* CONFIG_X86_32 */
extern int add_efi_memmap;
-extern void efi_reserve_early(void);
+extern void efi_memblock_x86_reserve_range(void);
extern void efi_call_phys_prelog(void);
extern void efi_call_phys_epilog(void);
diff --git a/arch/x86/kernel/acpi/sleep.c b/arch/x86/kernel/acpi/sleep.c
index fcc3c61..d829e75 100644
--- a/arch/x86/kernel/acpi/sleep.c
+++ b/arch/x86/kernel/acpi/sleep.c
@@ -7,6 +7,7 @@
#include <linux/acpi.h>
#include <linux/bootmem.h>
+#include <linux/memblock.h>
#include <linux/dmi.h>
#include <linux/cpumask.h>
#include <asm/segment.h>
@@ -125,7 +126,7 @@ void acpi_restore_state_mem(void)
*/
void __init acpi_reserve_wakeup_memory(void)
{
- unsigned long mem;
+ phys_addr_t mem;
if ((&wakeup_code_end - &wakeup_code_start) > WAKEUP_SIZE) {
printk(KERN_ERR
@@ -133,15 +134,15 @@ void __init acpi_reserve_wakeup_memory(void)
return;
}
- mem = find_e820_area(0, 1<<20, WAKEUP_SIZE, PAGE_SIZE);
+ mem = memblock_find_in_range(0, 1<<20, WAKEUP_SIZE, PAGE_SIZE);
- if (mem == -1L) {
+ if (mem == MEMBLOCK_ERROR) {
printk(KERN_ERR "ACPI: Cannot allocate lowmem, S3 disabled.\n");
return;
}
acpi_realmode = (unsigned long) phys_to_virt(mem);
acpi_wakeup_address = mem;
- reserve_early(mem, mem + WAKEUP_SIZE, "ACPI WAKEUP");
+ memblock_x86_reserve_range(mem, mem + WAKEUP_SIZE, "ACPI WAKEUP");
}
diff --git a/arch/x86/kernel/apic/numaq_32.c b/arch/x86/kernel/apic/numaq_32.c
index 3e28401..960f26a 100644
--- a/arch/x86/kernel/apic/numaq_32.c
+++ b/arch/x86/kernel/apic/numaq_32.c
@@ -26,6 +26,7 @@
#include <linux/nodemask.h>
#include <linux/topology.h>
#include <linux/bootmem.h>
+#include <linux/memblock.h>
#include <linux/threads.h>
#include <linux/cpumask.h>
#include <linux/kernel.h>
@@ -88,7 +89,7 @@ static inline void numaq_register_node(int node, struct sys_cfg_data *scd)
node_end_pfn[node] =
MB_TO_PAGES(eq->hi_shrd_mem_start + eq->hi_shrd_mem_size);
- e820_register_active_regions(node, node_start_pfn[node],
+ memblock_x86_register_active_regions(node, node_start_pfn[node],
node_end_pfn[node]);
memory_present(node, node_start_pfn[node], node_end_pfn[node]);
diff --git a/arch/x86/kernel/efi.c b/arch/x86/kernel/efi.c
index c2fa9b8..0fe27d7 100644
--- a/arch/x86/kernel/efi.c
+++ b/arch/x86/kernel/efi.c
@@ -30,6 +30,7 @@
#include <linux/init.h>
#include <linux/efi.h>
#include <linux/bootmem.h>
+#include <linux/memblock.h>
#include <linux/spinlock.h>
#include <linux/uaccess.h>
#include <linux/time.h>
@@ -275,7 +276,7 @@ static void __init do_add_efi_memmap(void)
sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map);
}
-void __init efi_reserve_early(void)
+void __init efi_memblock_x86_reserve_range(void)
{
unsigned long pmap;
@@ -290,7 +291,7 @@ void __init efi_reserve_early(void)
boot_params.efi_info.efi_memdesc_size;
memmap.desc_version = boot_params.efi_info.efi_memdesc_version;
memmap.desc_size = boot_params.efi_info.efi_memdesc_size;
- reserve_early(pmap, pmap + memmap.nr_map * memmap.desc_size,
+ memblock_x86_reserve_range(pmap, pmap + memmap.nr_map * memmap.desc_size,
"EFI memmap");
}
diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c
index da60aa8..74e4cf6 100644
--- a/arch/x86/kernel/head32.c
+++ b/arch/x86/kernel/head32.c
@@ -42,7 +42,7 @@ void __init i386_start_kernel(void)
memblock_x86_reserve_range(PAGE_SIZE, PAGE_SIZE + PAGE_SIZE, "EX TRAMPOLINE");
#endif
- reserve_early(__pa_symbol(&_text), __pa_symbol(&__bss_stop), "TEXT DATA BSS");
+ memblock_x86_reserve_range(__pa_symbol(&_text), __pa_symbol(&__bss_stop), "TEXT DATA BSS");
#ifdef CONFIG_BLK_DEV_INITRD
/* Reserve INITRD */
@@ -51,7 +51,7 @@ void __init i386_start_kernel(void)
u64 ramdisk_image = boot_params.hdr.ramdisk_image;
u64 ramdisk_size = boot_params.hdr.ramdisk_size;
u64 ramdisk_end = PAGE_ALIGN(ramdisk_image + ramdisk_size);
- reserve_early(ramdisk_image, ramdisk_end, "RAMDISK");
+ memblock_x86_reserve_range(ramdisk_image, ramdisk_end, "RAMDISK");
}
#endif
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 8ee930f..97adf98 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -101,7 +101,7 @@ void __init x86_64_start_reservations(char *real_mode_data)
memblock_init();
- reserve_early(__pa_symbol(&_text), __pa_symbol(&__bss_stop), "TEXT DATA BSS");
+ memblock_x86_reserve_range(__pa_symbol(&_text), __pa_symbol(&__bss_stop), "TEXT DATA BSS");
#ifdef CONFIG_BLK_DEV_INITRD
/* Reserve INITRD */
@@ -110,7 +110,7 @@ void __init x86_64_start_reservations(char *real_mode_data)
unsigned long ramdisk_image = boot_params.hdr.ramdisk_image;
unsigned long ramdisk_size = boot_params.hdr.ramdisk_size;
unsigned long ramdisk_end = PAGE_ALIGN(ramdisk_image + ramdisk_size);
- reserve_early(ramdisk_image, ramdisk_end, "RAMDISK");
+ memblock_x86_reserve_range(ramdisk_image, ramdisk_end, "RAMDISK");
}
#endif
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index bbe0aaf..a4f0173 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -302,7 +302,7 @@ static inline void init_gbpages(void)
static void __init reserve_brk(void)
{
if (_brk_end > _brk_start)
- reserve_early(__pa(_brk_start), __pa(_brk_end), "BRK");
+ memblock_x86_reserve_range(__pa(_brk_start), __pa(_brk_end), "BRK");
/* Mark brk area as locked down and no longer taking any
new allocations */
@@ -324,17 +324,16 @@ static void __init relocate_initrd(void)
char *p, *q;
/* We need to move the initrd down into lowmem */
- ramdisk_here = find_e820_area(0, end_of_lowmem, area_size,
+ ramdisk_here = memblock_find_in_range(0, end_of_lowmem, area_size,
PAGE_SIZE);
- if (ramdisk_here == -1ULL)
+ if (ramdisk_here == MEMBLOCK_ERROR)
panic("Cannot find place for new RAMDISK of size %lld\n",
ramdisk_size);
/* Note: this includes all the lowmem currently occupied by
the initrd, we rely on that fact to keep the data intact. */
- reserve_early(ramdisk_here, ramdisk_here + area_size,
- "NEW RAMDISK");
+ memblock_x86_reserve_range(ramdisk_here, ramdisk_here + area_size, "NEW RAMDISK");
initrd_start = ramdisk_here + PAGE_OFFSET;
initrd_end = initrd_start + ramdisk_size;
printk(KERN_INFO "Allocated new RAMDISK: %08llx - %08llx\n",
@@ -390,7 +389,7 @@ static void __init reserve_initrd(void)
initrd_start = 0;
if (ramdisk_size >= (end_of_lowmem>>1)) {
- free_early(ramdisk_image, ramdisk_end);
+ memblock_x86_free_range(ramdisk_image, ramdisk_end);
printk(KERN_ERR "initrd too large to handle, "
"disabling initrd\n");
return;
@@ -413,7 +412,7 @@ static void __init reserve_initrd(void)
relocate_initrd();
- free_early(ramdisk_image, ramdisk_end);
+ memblock_x86_free_range(ramdisk_image, ramdisk_end);
}
#else
static void __init reserve_initrd(void)
@@ -469,7 +468,7 @@ static void __init e820_reserve_setup_data(void)
e820_print_map("reserve setup_data");
}
-static void __init reserve_early_setup_data(void)
+static void __init memblock_x86_reserve_range_setup_data(void)
{
struct setup_data *data;
u64 pa_data;
@@ -481,7 +480,7 @@ static void __init reserve_early_setup_data(void)
while (pa_data) {
data = early_memremap(pa_data, sizeof(*data));
sprintf(buf, "setup data %x", data->type);
- reserve_early(pa_data, pa_data+sizeof(*data)+data->len, buf);
+ memblock_x86_reserve_range(pa_data, pa_data+sizeof(*data)+data->len, buf);
pa_data = data->next;
early_iounmap(data, sizeof(*data));
}
@@ -519,23 +518,23 @@ static void __init reserve_crashkernel(void)
if (crash_base <= 0) {
const unsigned long long alignment = 16<<20; /* 16M */
- crash_base = find_e820_area(alignment, ULONG_MAX, crash_size,
+ crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
alignment);
- if (crash_base == -1ULL) {
+ if (crash_base == MEMBLOCK_ERROR) {
pr_info("crashkernel reservation failed - No suitable area found.\n");
return;
}
} else {
unsigned long long start;
- start = find_e820_area(crash_base, ULONG_MAX, crash_size,
+ start = memblock_find_in_range(crash_base, ULONG_MAX, crash_size,
1<<20);
if (start != crash_base) {
pr_info("crashkernel reservation failed - memory is in use.\n");
return;
}
}
- reserve_early(crash_base, crash_base + crash_size, "CRASH KERNEL");
+ memblock_x86_reserve_range(crash_base, crash_base + crash_size, "CRASH KERNEL");
printk(KERN_INFO "Reserving %ldMB of memory at %ldMB "
"for crashkernel (System RAM: %ldMB)\n",
@@ -786,7 +785,7 @@ void __init setup_arch(char **cmdline_p)
#endif
4)) {
efi_enabled = 1;
- efi_reserve_early();
+ efi_memblock_x86_reserve_range();
}
#endif
@@ -846,7 +845,7 @@ void __init setup_arch(char **cmdline_p)
vmi_activate();
/* after early param, so could get panic from serial */
- reserve_early_setup_data();
+ memblock_x86_reserve_range_setup_data();
if (acpi_mps_check()) {
#ifdef CONFIG_X86_LOCAL_APIC
diff --git a/arch/x86/kernel/trampoline.c b/arch/x86/kernel/trampoline.c
index c652ef6..7c2102c 100644
--- a/arch/x86/kernel/trampoline.c
+++ b/arch/x86/kernel/trampoline.c
@@ -1,7 +1,7 @@
#include <linux/io.h>
+#include <linux/memblock.h>
#include <asm/trampoline.h>
-#include <asm/e820.h>
#if defined(CONFIG_X86_64) && defined(CONFIG_ACPI_SLEEP)
#define __trampinit
@@ -16,15 +16,15 @@ unsigned char *__trampinitdata trampoline_base;
void __init reserve_trampoline_memory(void)
{
- unsigned long mem;
+ phys_addr_t mem;
/* Has to be in very low memory so we can execute real-mode AP code. */
- mem = find_e820_area(0, 1<<20, TRAMPOLINE_SIZE, PAGE_SIZE);
- if (mem == -1L)
+ mem = memblock_find_in_range(0, 1<<20, TRAMPOLINE_SIZE, PAGE_SIZE);
+ if (mem == MEMBLOCK_ERROR)
panic("Cannot allocate trampoline\n");
trampoline_base = __va(mem);
- reserve_early(mem, mem + TRAMPOLINE_SIZE, "TRAMPOLINE");
+ memblock_x86_reserve_range(mem, mem + TRAMPOLINE_SIZE, "TRAMPOLINE");
}
/*
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index b278535..c0e28a1 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -2,6 +2,7 @@
#include <linux/initrd.h>
#include <linux/ioport.h>
#include <linux/swap.h>
+#include <linux/memblock.h>
#include <asm/cacheflush.h>
#include <asm/e820.h>
@@ -33,6 +34,7 @@ static void __init find_early_table_space(unsigned long end, int use_pse,
int use_gbpages)
{
unsigned long puds, pmds, ptes, tables, start;
+ phys_addr_t base;
puds = (end + PUD_SIZE - 1) >> PUD_SHIFT;
tables = roundup(puds * sizeof(pud_t), PAGE_SIZE);
@@ -75,12 +77,12 @@ static void __init find_early_table_space(unsigned long end, int use_pse,
#else
start = 0x8000;
#endif
- e820_table_start = find_e820_area(start, max_pfn_mapped<<PAGE_SHIFT,
+ base = memblock_find_in_range(start, max_pfn_mapped<<PAGE_SHIFT,
tables, PAGE_SIZE);
- if (e820_table_start == -1UL)
+ if (base == MEMBLOCK_ERROR)
panic("Cannot find space for the kernel page tables");
- e820_table_start >>= PAGE_SHIFT;
+ e820_table_start = base >> PAGE_SHIFT;
e820_table_end = e820_table_start;
e820_table_top = e820_table_start + (tables >> PAGE_SHIFT);
@@ -299,7 +301,7 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
__flush_tlb_all();
if (!after_bootmem && e820_table_end > e820_table_start)
- reserve_early(e820_table_start << PAGE_SHIFT,
+ memblock_x86_reserve_range(e820_table_start << PAGE_SHIFT,
e820_table_end << PAGE_SHIFT, "PGTABLE");
if (!after_bootmem)
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 90e0545..63b09ba 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -25,6 +25,7 @@
#include <linux/pfn.h>
#include <linux/poison.h>
#include <linux/bootmem.h>
+#include <linux/memblock.h>
#include <linux/proc_fs.h>
#include <linux/memory_hotplug.h>
#include <linux/initrd.h>
@@ -712,14 +713,14 @@ void __init initmem_init(unsigned long start_pfn, unsigned long end_pfn,
highstart_pfn = highend_pfn = max_pfn;
if (max_pfn > max_low_pfn)
highstart_pfn = max_low_pfn;
- e820_register_active_regions(0, 0, highend_pfn);
+ memblock_x86_register_active_regions(0, 0, highend_pfn);
sparse_memory_present_with_active_regions(0);
printk(KERN_NOTICE "%ldMB HIGHMEM available.\n",
pages_to_mb(highend_pfn - highstart_pfn));
num_physpages = highend_pfn;
high_memory = (void *) __va(highstart_pfn * PAGE_SIZE - 1) + 1;
#else
- e820_register_active_regions(0, 0, max_low_pfn);
+ memblock_x86_register_active_regions(0, 0, max_low_pfn);
sparse_memory_present_with_active_regions(0);
num_physpages = max_low_pfn;
high_memory = (void *) __va(max_low_pfn * PAGE_SIZE - 1) + 1;
@@ -776,16 +777,16 @@ void __init setup_bootmem_allocator(void)
{
#ifndef CONFIG_NO_BOOTMEM
int nodeid;
- unsigned long bootmap_size, bootmap;
+ phys_addr_t bootmap_size, bootmap;
/*
* Initialize the boot-time allocator (with low memory only):
*/
bootmap_size = bootmem_bootmap_pages(max_low_pfn)<<PAGE_SHIFT;
- bootmap = find_e820_area(0, max_pfn_mapped<<PAGE_SHIFT, bootmap_size,
+ bootmap = memblock_find_in_range(0, max_pfn_mapped<<PAGE_SHIFT, bootmap_size,
PAGE_SIZE);
- if (bootmap == -1L)
+ if (bootmap == MEMBLOCK_ERROR)
panic("Cannot find bootmem map of size %ld\n", bootmap_size);
- reserve_early(bootmap, bootmap + bootmap_size, "BOOTMAP");
+ memblock_x86_reserve_range(bootmap, bootmap + bootmap_size, "BOOTMAP");
#endif
printk(KERN_INFO " mapped low ram: 0 - %08lx\n",
@@ -1069,3 +1070,4 @@ void mark_rodata_ro(void)
#endif
}
#endif
+
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 634fa08..592b236 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -21,6 +21,7 @@
#include <linux/initrd.h>
#include <linux/pagemap.h>
#include <linux/bootmem.h>
+#include <linux/memblock.h>
#include <linux/proc_fs.h>
#include <linux/pci.h>
#include <linux/pfn.h>
@@ -577,18 +578,18 @@ void __init initmem_init(unsigned long start_pfn, unsigned long end_pfn,
unsigned long bootmap_size, bootmap;
bootmap_size = bootmem_bootmap_pages(end_pfn)<<PAGE_SHIFT;
- bootmap = find_e820_area(0, end_pfn<<PAGE_SHIFT, bootmap_size,
+ bootmap = memblock_find_in_range(0, end_pfn<<PAGE_SHIFT, bootmap_size,
PAGE_SIZE);
- if (bootmap == -1L)
+ if (bootmap == MEMBLOCK_ERROR)
panic("Cannot find bootmem map of size %ld\n", bootmap_size);
- reserve_early(bootmap, bootmap + bootmap_size, "BOOTMAP");
+ memblock_x86_reserve_range(bootmap, bootmap + bootmap_size, "BOOTMAP");
/* don't touch min_low_pfn */
bootmap_size = init_bootmem_node(NODE_DATA(0), bootmap >> PAGE_SHIFT,
0, end_pfn);
- e820_register_active_regions(0, start_pfn, end_pfn);
+ memblock_x86_register_active_regions(0, start_pfn, end_pfn);
free_bootmem_with_active_regions(0, end_pfn);
#else
- e820_register_active_regions(0, start_pfn, end_pfn);
+ memblock_x86_register_active_regions(0, start_pfn, end_pfn);
#endif
}
#endif
diff --git a/arch/x86/mm/k8topology_64.c b/arch/x86/mm/k8topology_64.c
index 970ed57..966de93 100644
--- a/arch/x86/mm/k8topology_64.c
+++ b/arch/x86/mm/k8topology_64.c
@@ -11,6 +11,8 @@
#include <linux/string.h>
#include <linux/module.h>
#include <linux/nodemask.h>
+#include <linux/memblock.h>
+
#include <asm/io.h>
#include <linux/pci_ids.h>
#include <linux/acpi.h>
@@ -222,7 +224,7 @@ int __init k8_scan_nodes(void)
for_each_node_mask(i, node_possible_map) {
int j;
- e820_register_active_regions(i,
+ memblock_x86_register_active_regions(i,
nodes[i].start >> PAGE_SHIFT,
nodes[i].end >> PAGE_SHIFT);
for (j = apicid_base; j < cores + apicid_base; j++)
diff --git a/arch/x86/mm/memtest.c b/arch/x86/mm/memtest.c
index 18d244f..92faf3a 100644
--- a/arch/x86/mm/memtest.c
+++ b/arch/x86/mm/memtest.c
@@ -6,8 +6,7 @@
#include <linux/smp.h>
#include <linux/init.h>
#include <linux/pfn.h>
-
-#include <asm/e820.h>
+#include <linux/memblock.h>
static u64 patterns[] __initdata = {
0,
@@ -35,7 +34,7 @@ static void __init reserve_bad_mem(u64 pattern, u64 start_bad, u64 end_bad)
(unsigned long long) pattern,
(unsigned long long) start_bad,
(unsigned long long) end_bad);
- reserve_early(start_bad, end_bad, "BAD RAM");
+ memblock_x86_reserve_range(start_bad, end_bad, "BAD RAM");
}
static void __init memtest(u64 pattern, u64 start_phys, u64 size)
@@ -74,7 +73,7 @@ static void __init do_one_pass(u64 pattern, u64 start, u64 end)
u64 size = 0;
while (start < end) {
- start = find_e820_area_size(start, &size, 1);
+ start = memblock_x86_find_in_range_size(start, &size, 1);
/* done ? */
if (start >= end)
diff --git a/arch/x86/mm/numa_32.c b/arch/x86/mm/numa_32.c
index 809baaa..ddf9730 100644
--- a/arch/x86/mm/numa_32.c
+++ b/arch/x86/mm/numa_32.c
@@ -24,6 +24,7 @@
#include <linux/mm.h>
#include <linux/bootmem.h>
+#include <linux/memblock.h>
#include <linux/mmzone.h>
#include <linux/highmem.h>
#include <linux/initrd.h>
@@ -120,7 +121,7 @@ int __init get_memcfg_numa_flat(void)
node_start_pfn[0] = 0;
node_end_pfn[0] = max_pfn;
- e820_register_active_regions(0, 0, max_pfn);
+ memblock_x86_register_active_regions(0, 0, max_pfn);
memory_present(0, 0, max_pfn);
node_remap_size[0] = node_memmap_size_bytes(0, 0, max_pfn);
@@ -161,14 +162,14 @@ static void __init allocate_pgdat(int nid)
NODE_DATA(nid) = (pg_data_t *)node_remap_start_vaddr[nid];
else {
unsigned long pgdat_phys;
- pgdat_phys = find_e820_area(min_low_pfn<<PAGE_SHIFT,
+ pgdat_phys = memblock_find_in_range(min_low_pfn<<PAGE_SHIFT,
max_pfn_mapped<<PAGE_SHIFT,
sizeof(pg_data_t),
PAGE_SIZE);
NODE_DATA(nid) = (pg_data_t *)(pfn_to_kaddr(pgdat_phys>>PAGE_SHIFT));
memset(buf, 0, sizeof(buf));
sprintf(buf, "NODE_DATA %d", nid);
- reserve_early(pgdat_phys, pgdat_phys + sizeof(pg_data_t), buf);
+ memblock_x86_reserve_range(pgdat_phys, pgdat_phys + sizeof(pg_data_t), buf);
}
printk(KERN_DEBUG "allocate_pgdat: node %d NODE_DATA %08lx\n",
nid, (unsigned long)NODE_DATA(nid));
@@ -291,15 +292,15 @@ static __init unsigned long calculate_numa_remap_pages(void)
PTRS_PER_PTE);
node_kva_target <<= PAGE_SHIFT;
do {
- node_kva_final = find_e820_area(node_kva_target,
+ node_kva_final = memblock_find_in_range(node_kva_target,
((u64)node_end_pfn[nid])<<PAGE_SHIFT,
((u64)size)<<PAGE_SHIFT,
LARGE_PAGE_BYTES);
node_kva_target -= LARGE_PAGE_BYTES;
- } while (node_kva_final == -1ULL &&
+ } while (node_kva_final == MEMBLOCK_ERROR &&
(node_kva_target>>PAGE_SHIFT) > (node_start_pfn[nid]));
- if (node_kva_final == -1ULL)
+ if (node_kva_final == MEMBLOCK_ERROR)
panic("Can not get kva ram\n");
node_remap_size[nid] = size;
@@ -318,9 +319,9 @@ static __init unsigned long calculate_numa_remap_pages(void)
* but we could have some hole in high memory, and it will only
* check page_is_ram(pfn) && !page_is_reserved_early(pfn) to decide
* to use it as free.
- * So reserve_early here, hope we don't run out of that array
+ * So memblock_x86_reserve_range here, hope we don't run out of that array
*/
- reserve_early(node_kva_final,
+ memblock_x86_reserve_range(node_kva_final,
node_kva_final+(((u64)size)<<PAGE_SHIFT),
"KVA RAM");
@@ -367,14 +368,14 @@ void __init initmem_init(unsigned long start_pfn, unsigned long end_pfn,
kva_target_pfn = round_down(max_low_pfn - kva_pages, PTRS_PER_PTE);
do {
- kva_start_pfn = find_e820_area(kva_target_pfn<<PAGE_SHIFT,
+ kva_start_pfn = memblock_find_in_range(kva_target_pfn<<PAGE_SHIFT,
max_low_pfn<<PAGE_SHIFT,
kva_pages<<PAGE_SHIFT,
PTRS_PER_PTE<<PAGE_SHIFT) >> PAGE_SHIFT;
kva_target_pfn -= PTRS_PER_PTE;
- } while (kva_start_pfn == -1UL && kva_target_pfn > min_low_pfn);
+ } while (kva_start_pfn == MEMBLOCK_ERROR && kva_target_pfn > min_low_pfn);
- if (kva_start_pfn == -1UL)
+ if (kva_start_pfn == MEMBLOCK_ERROR)
panic("Can not get kva space\n");
printk(KERN_INFO "kva_start_pfn ~ %lx max_low_pfn ~ %lx\n",
@@ -382,7 +383,7 @@ void __init initmem_init(unsigned long start_pfn, unsigned long end_pfn,
printk(KERN_INFO "max_pfn = %lx\n", max_pfn);
/* avoid clash with initrd */
- reserve_early(kva_start_pfn<<PAGE_SHIFT,
+ memblock_x86_reserve_range(kva_start_pfn<<PAGE_SHIFT,
(kva_start_pfn + kva_pages)<<PAGE_SHIFT,
"KVA PG");
#ifdef CONFIG_HIGHMEM
diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index 3d54f9f..984b1ff 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -87,16 +87,16 @@ static int __init allocate_cachealigned_memnodemap(void)
addr = 0x8000;
nodemap_size = roundup(sizeof(s16) * memnodemapsize, L1_CACHE_BYTES);
- nodemap_addr = find_e820_area(addr, max_pfn<<PAGE_SHIFT,
+ nodemap_addr = memblock_find_in_range(addr, max_pfn<<PAGE_SHIFT,
nodemap_size, L1_CACHE_BYTES);
- if (nodemap_addr == -1UL) {
+ if (nodemap_addr == MEMBLOCK_ERROR) {
printk(KERN_ERR
"NUMA: Unable to allocate Memory to Node hash map\n");
nodemap_addr = nodemap_size = 0;
return -1;
}
memnodemap = phys_to_virt(nodemap_addr);
- reserve_early(nodemap_addr, nodemap_addr + nodemap_size, "MEMNODEMAP");
+ memblock_x86_reserve_range(nodemap_addr, nodemap_addr + nodemap_size, "MEMNODEMAP");
printk(KERN_DEBUG "NUMA: Allocated memnodemap from %lx - %lx\n",
nodemap_addr, nodemap_addr + nodemap_size);
@@ -227,7 +227,7 @@ setup_node_bootmem(int nodeid, unsigned long start, unsigned long end)
if (node_data[nodeid] == NULL)
return;
nodedata_phys = __pa(node_data[nodeid]);
- reserve_early(nodedata_phys, nodedata_phys + pgdat_size, "NODE_DATA");
+ memblock_x86_reserve_range(nodedata_phys, nodedata_phys + pgdat_size, "NODE_DATA");
printk(KERN_INFO " NODE_DATA [%016lx - %016lx]\n", nodedata_phys,
nodedata_phys + pgdat_size - 1);
nid = phys_to_nid(nodedata_phys);
@@ -246,7 +246,7 @@ setup_node_bootmem(int nodeid, unsigned long start, unsigned long end)
* Find a place for the bootmem map
* nodedata_phys could be on other nodes by alloc_bootmem,
* so need to sure bootmap_start not to be small, otherwise
- * early_node_mem will get that with find_e820_area instead
+ * early_node_mem will get that with memblock_find_in_range instead
* of alloc_bootmem, that could clash with reserved range
*/
bootmap_pages = bootmem_bootmap_pages(last_pfn - start_pfn);
@@ -258,12 +258,12 @@ setup_node_bootmem(int nodeid, unsigned long start, unsigned long end)
bootmap = early_node_mem(nodeid, bootmap_start, end,
bootmap_pages<<PAGE_SHIFT, PAGE_SIZE);
if (bootmap == NULL) {
- free_early(nodedata_phys, nodedata_phys + pgdat_size);
+ memblock_x86_free_range(nodedata_phys, nodedata_phys + pgdat_size);
node_data[nodeid] = NULL;
return;
}
bootmap_start = __pa(bootmap);
- reserve_early(bootmap_start, bootmap_start+(bootmap_pages<<PAGE_SHIFT),
+ memblock_x86_reserve_range(bootmap_start, bootmap_start+(bootmap_pages<<PAGE_SHIFT),
"BOOTMAP");
bootmap_size = init_bootmem_node(NODE_DATA(nodeid),
@@ -417,7 +417,7 @@ static int __init split_nodes_interleave(u64 addr, u64 max_addr,
nr_nodes = MAX_NUMNODES;
}
- size = (max_addr - addr - e820_hole_size(addr, max_addr)) / nr_nodes;
+ size = (max_addr - addr - memblock_x86_hole_size(addr, max_addr)) / nr_nodes;
/*
* Calculate the number of big nodes that can be allocated as a result
* of consolidating the remainder.
@@ -453,7 +453,7 @@ static int __init split_nodes_interleave(u64 addr, u64 max_addr,
* non-reserved memory is less than the per-node size.
*/
while (end - physnodes[i].start -
- e820_hole_size(physnodes[i].start, end) < size) {
+ memblock_x86_hole_size(physnodes[i].start, end) < size) {
end += FAKE_NODE_MIN_SIZE;
if (end > physnodes[i].end) {
end = physnodes[i].end;
@@ -467,7 +467,7 @@ static int __init split_nodes_interleave(u64 addr, u64 max_addr,
* this one must extend to the boundary.
*/
if (end < dma32_end && dma32_end - end -
- e820_hole_size(end, dma32_end) < FAKE_NODE_MIN_SIZE)
+ memblock_x86_hole_size(end, dma32_end) < FAKE_NODE_MIN_SIZE)
end = dma32_end;
/*
@@ -476,7 +476,7 @@ static int __init split_nodes_interleave(u64 addr, u64 max_addr,
* physical node.
*/
if (physnodes[i].end - end -
- e820_hole_size(end, physnodes[i].end) < size)
+ memblock_x86_hole_size(end, physnodes[i].end) < size)
end = physnodes[i].end;
/*
@@ -504,7 +504,7 @@ static u64 __init find_end_of_node(u64 start, u64 max_addr, u64 size)
{
u64 end = start + size;
- while (end - start - e820_hole_size(start, end) < size) {
+ while (end - start - memblock_x86_hole_size(start, end) < size) {
end += FAKE_NODE_MIN_SIZE;
if (end > max_addr) {
end = max_addr;
@@ -533,7 +533,7 @@ static int __init split_nodes_size_interleave(u64 addr, u64 max_addr, u64 size)
* creates a uniform distribution of node sizes across the entire
* machine (but not necessarily over physical nodes).
*/
- min_size = (max_addr - addr - e820_hole_size(addr, max_addr)) /
+ min_size = (max_addr - addr - memblock_x86_hole_size(addr, max_addr)) /
MAX_NUMNODES;
min_size = max(min_size, FAKE_NODE_MIN_SIZE);
if ((min_size & FAKE_NODE_MIN_HASH_MASK) < min_size)
@@ -566,7 +566,7 @@ static int __init split_nodes_size_interleave(u64 addr, u64 max_addr, u64 size)
* this one must extend to the boundary.
*/
if (end < dma32_end && dma32_end - end -
- e820_hole_size(end, dma32_end) < FAKE_NODE_MIN_SIZE)
+ memblock_x86_hole_size(end, dma32_end) < FAKE_NODE_MIN_SIZE)
end = dma32_end;
/*
@@ -575,7 +575,7 @@ static int __init split_nodes_size_interleave(u64 addr, u64 max_addr, u64 size)
* physical node.
*/
if (physnodes[i].end - end -
- e820_hole_size(end, physnodes[i].end) < size)
+ memblock_x86_hole_size(end, physnodes[i].end) < size)
end = physnodes[i].end;
/*
@@ -639,7 +639,7 @@ static int __init numa_emulation(unsigned long start_pfn,
*/
remove_all_active_ranges();
for_each_node_mask(i, node_possible_map) {
- e820_register_active_regions(i, nodes[i].start >> PAGE_SHIFT,
+ memblock_x86_register_active_regions(i, nodes[i].start >> PAGE_SHIFT,
nodes[i].end >> PAGE_SHIFT);
setup_node_bootmem(i, nodes[i].start, nodes[i].end);
}
@@ -692,7 +692,7 @@ void __init initmem_init(unsigned long start_pfn, unsigned long last_pfn,
node_set(0, node_possible_map);
for (i = 0; i < nr_cpu_ids; i++)
numa_set_node(i, 0);
- e820_register_active_regions(0, start_pfn, last_pfn);
+ memblock_x86_register_active_regions(0, start_pfn, last_pfn);
setup_node_bootmem(0, start_pfn << PAGE_SHIFT, last_pfn << PAGE_SHIFT);
}
diff --git a/arch/x86/mm/srat_32.c b/arch/x86/mm/srat_32.c
index 9324f13..a17dffd 100644
--- a/arch/x86/mm/srat_32.c
+++ b/arch/x86/mm/srat_32.c
@@ -25,6 +25,7 @@
*/
#include <linux/mm.h>
#include <linux/bootmem.h>
+#include <linux/memblock.h>
#include <linux/mmzone.h>
#include <linux/acpi.h>
#include <linux/nodemask.h>
@@ -264,7 +265,7 @@ int __init get_memcfg_from_srat(void)
if (node_read_chunk(chunk->nid, chunk))
continue;
- e820_register_active_regions(chunk->nid, chunk->start_pfn,
+ memblock_x86_register_active_regions(chunk->nid, chunk->start_pfn,
min(chunk->end_pfn, max_pfn));
}
/* for out of order entries in SRAT */
diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index f9897f7..7f44eb6 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -16,6 +16,7 @@
#include <linux/module.h>
#include <linux/topology.h>
#include <linux/bootmem.h>
+#include <linux/memblock.h>
#include <linux/mm.h>
#include <asm/proto.h>
#include <asm/numa.h>
@@ -98,15 +99,15 @@ void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
unsigned long phys;
length = slit->header.length;
- phys = find_e820_area(0, max_pfn_mapped<<PAGE_SHIFT, length,
+ phys = memblock_find_in_range(0, max_pfn_mapped<<PAGE_SHIFT, length,
PAGE_SIZE);
- if (phys == -1L)
+ if (phys == MEMBLOCK_ERROR)
panic(" Can not save slit!\n");
acpi_slit = __va(phys);
memcpy(acpi_slit, slit, length);
- reserve_early(phys, phys + length, "ACPI SLIT");
+ memblock_x86_reserve_range(phys, phys + length, "ACPI SLIT");
}
/* Callback for Proximity Domain -> x2APIC mapping */
@@ -324,7 +325,7 @@ static int __init nodes_cover_memory(const struct bootnode *nodes)
pxmram = 0;
}
- e820ram = max_pfn - (e820_hole_size(0, max_pfn<<PAGE_SHIFT)>>PAGE_SHIFT);
+ e820ram = max_pfn - (memblock_x86_hole_size(0, max_pfn<<PAGE_SHIFT)>>PAGE_SHIFT);
/* We seem to lose 3 pages somewhere. Allow 1M of slack. */
if ((long)(e820ram - pxmram) >= (1<<(20 - PAGE_SHIFT))) {
printk(KERN_ERR
@@ -421,7 +422,7 @@ int __init acpi_scan_nodes(unsigned long start, unsigned long end)
}
for_each_node_mask(i, nodes_parsed)
- e820_register_active_regions(i, nodes[i].start >> PAGE_SHIFT,
+ memblock_x86_register_active_regions(i, nodes[i].start >> PAGE_SHIFT,
nodes[i].end >> PAGE_SHIFT);
/* for out of order entries in SRAT */
sort_node_map();
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 914f046..b511f19 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -44,6 +44,7 @@
#include <linux/bug.h>
#include <linux/module.h>
#include <linux/gfp.h>
+#include <linux/memblock.h>
#include <asm/pgtable.h>
#include <asm/tlbflush.h>
@@ -1735,7 +1736,7 @@ __init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd,
__xen_write_cr3(true, __pa(pgd));
xen_mc_issue(PARAVIRT_LAZY_CPU);
- reserve_early(__pa(xen_start_info->pt_base),
+ memblock_x86_reserve_range(__pa(xen_start_info->pt_base),
__pa(xen_start_info->pt_base +
xen_start_info->nr_pt_frames * PAGE_SIZE),
"XEN PAGETABLES");
@@ -1773,7 +1774,7 @@ __init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd,
pin_pagetable_pfn(MMUEXT_PIN_L3_TABLE, PFN_DOWN(__pa(swapper_pg_dir)));
- reserve_early(__pa(xen_start_info->pt_base),
+ memblock_x86_reserve_range(__pa(xen_start_info->pt_base),
__pa(xen_start_info->pt_base +
xen_start_info->nr_pt_frames * PAGE_SIZE),
"XEN PAGETABLES");
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index ad0047f..2ac8f29 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -8,6 +8,7 @@
#include <linux/sched.h>
#include <linux/mm.h>
#include <linux/pm.h>
+#include <linux/memblock.h>
#include <asm/elf.h>
#include <asm/vdso.h>
@@ -61,7 +62,7 @@ char * __init xen_memory_setup(void)
* - xen_start_info
* See comment above "struct start_info" in <xen/interface/xen.h>
*/
- reserve_early(__pa(xen_start_info->mfn_list),
+ memblock_x86_reserve_range(__pa(xen_start_info->mfn_list),
__pa(xen_start_info->pt_base),
"XEN START INFO");
diff --git a/mm/bootmem.c b/mm/bootmem.c
index fda01a2..13b0caa 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -436,7 +436,7 @@ void __init free_bootmem_node(pg_data_t *pgdat, unsigned long physaddr,
{
#ifdef CONFIG_NO_BOOTMEM
kmemleak_free_part(__va(physaddr), size);
- free_early(physaddr, physaddr + size);
+ memblock_x86_free_range(physaddr, physaddr + size);
#else
unsigned long start, end;
@@ -462,7 +462,7 @@ void __init free_bootmem(unsigned long addr, unsigned long size)
{
#ifdef CONFIG_NO_BOOTMEM
kmemleak_free_part(__va(addr), size);
- free_early(addr, addr + size);
+ memblock_x86_free_range(addr, addr + size);
#else
unsigned long start, end;
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
2010-09-26 3:11 ` caiqian
@ 2010-09-26 6:44 ` Yinghai Lu
2010-09-26 6:55 ` CAI Qian
0 siblings, 1 reply; 25+ messages in thread
From: Yinghai Lu @ 2010-09-26 6:44 UTC (permalink / raw)
To: caiqian; +Cc: linux-next, kexec, H. Peter Anvin
On 09/25/2010 08:11 PM, caiqian@redhat.com wrote:
> # /sbin/kexec -p '--command-line=ro root=/dev/mapper/VolGroup-lv_root rd_LVM_LV=VolGroup/lv_root rd_LVM_LV=VolGroup/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb quiet console=tty0 console=ttyS0,115200 crashkernel=128M irqpoll maxcpus=1 reset_devices cgroup_disable=memory ' --initrd=/boot/initrd-2.6.36-rc3+kdump.img /boot/vmlinuz-2.6.36-rc3+
>
> BUG: unable to handle kernel paging request at ffff8800dfffe400
> IP: [<ffffffff8113376b>] per_cpu_ptr_to_phys+0x3b/0x120
> PGD 1a26063 PUD 1fffc067 PMD 1fffd067 PTE 0
> Oops: 0000 [#1] SMP
> last sysfs file: /sys/devices/system/cpu/cpu0/crash_notes
> CPU 3
> Modules linked in: ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 virtio_balloon pcspkr 8139too 8139cp mii snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc sg i2c_piix4 i2c_core ext4 mbcache jbd2 floppy sd_mod crc_t10dif virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mod [last unloaded: scsi_wait_scan]
>
> Pid: 5671, comm: kexec Not tainted 2.6.35+ #11 /KVM
> RIP: 0010:[<ffffffff8113376b>] [<ffffffff8113376b>] per_cpu_ptr_to_phys+0x3b/0x120
Are you kexec'ing from 2.6.35+ to 2.6.36-rc3+?
Yinghai
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
2010-09-26 6:44 ` Yinghai Lu
@ 2010-09-26 6:55 ` CAI Qian
2010-09-26 6:56 ` Yinghai Lu
0 siblings, 1 reply; 25+ messages in thread
From: CAI Qian @ 2010-09-26 6:55 UTC (permalink / raw)
To: Yinghai Lu; +Cc: linux-next, kexec, H. Peter Anvin
----- "Yinghai Lu" <yinghai@kernel.org> wrote:
> On 09/25/2010 08:11 PM, caiqian@redhat.com wrote:
> > # /sbin/kexec -p '--command-line=ro root=/dev/mapper/VolGroup-lv_root rd_LVM_LV=VolGroup/lv_root rd_LVM_LV=VolGroup/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb quiet console=tty0 console=ttyS0,115200 crashkernel=128M irqpoll maxcpus=1 reset_devices cgroup_disable=memory ' --initrd=/boot/initrd-2.6.36-rc3+kdump.img /boot/vmlinuz-2.6.36-rc3+
> >
> > BUG: unable to handle kernel paging request at ffff8800dfffe400
> > IP: [<ffffffff8113376b>] per_cpu_ptr_to_phys+0x3b/0x120
> > PGD 1a26063 PUD 1fffc067 PMD 1fffd067 PTE 0
> > Oops: 0000 [#1] SMP
> > last sysfs file: /sys/devices/system/cpu/cpu0/crash_notes
> > CPU 3
> > Modules linked in: ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 virtio_balloon pcspkr 8139too 8139cp mii snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc sg i2c_piix4 i2c_core ext4 mbcache jbd2 floppy sd_mod crc_t10dif virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mod [last unloaded: scsi_wait_scan]
> >
> > Pid: 5671, comm: kexec Not tainted 2.6.35+ #11 /KVM
> > RIP: 0010:[<ffffffff8113376b>] [<ffffffff8113376b>] per_cpu_ptr_to_phys+0x3b/0x120
>
> are you kexec from 2.6.35+ to 2.6.36-rc3+?
No, both kernels were the same version. Sorry, the logs above were misleading: they were copy-and-pasted from different kernel versions.
>
> Yinghai
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
2010-09-26 6:55 ` CAI Qian
@ 2010-09-26 6:56 ` Yinghai Lu
2010-09-26 10:37 ` CAI Qian
0 siblings, 1 reply; 25+ messages in thread
From: Yinghai Lu @ 2010-09-26 6:56 UTC (permalink / raw)
To: CAI Qian; +Cc: linux-next, kexec, H. Peter Anvin
On 09/25/2010 11:55 PM, CAI Qian wrote:
>>
>> are you kexec from 2.6.35+ to 2.6.36-rc3+?
> No, both kernels were the same version. I am sorry the above logs were misleading that were copy-and-pasted from different kernel versions.
Can you check the tip tree instead of linux-next?
Yinghai
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
2010-09-26 6:56 ` Yinghai Lu
@ 2010-09-26 10:37 ` CAI Qian
0 siblings, 0 replies; 25+ messages in thread
From: CAI Qian @ 2010-09-26 10:37 UTC (permalink / raw)
To: Yinghai Lu; +Cc: linux-next, kexec, H. Peter Anvin
----- "Yinghai Lu" <yinghai@kernel.org> wrote:
> On 09/25/2010 11:55 PM, CAI Qian wrote:
> >>
> >> are you kexec from 2.6.35+ to 2.6.36-rc3+?
> > No, both kernels were the same version. I am sorry the above logs were misleading that were copy-and-pasted from different kernel versions.
>
> can you check tip instead of next tree?
Which of the patches there do you think would make the regression go away?
>
> Yinghai
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
[not found] <1834151968.1996101285512089968.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
@ 2010-09-26 14:47 ` caiqian
2010-09-26 19:42 ` Yinghai Lu
0 siblings, 1 reply; 25+ messages in thread
From: caiqian @ 2010-09-26 14:47 UTC (permalink / raw)
To: Yinghai Lu; +Cc: linux-next, kexec, H. Peter Anvin
----- "Yinghai Lu" <yinghai@kernel.org> wrote:
> On 09/25/2010 11:55 PM, CAI Qian wrote:
> >>
> >> are you kexec from 2.6.35+ to 2.6.36-rc3+?
> > No, both kernels were the same version. I am sorry the above logs were misleading that were copy-and-pasted from different kernel versions.
>
> can you check tip instead of next tree?
No dice,
# /sbin/kexec -p '--command-line=ro root=/dev/mapper/VolGroup-lv_root rd_LVM_LV=VolGroup/lv_root rd_LVM_LV=VolGroup/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb quiet console=tty0 console=ttyS0,115200 crashkernel=128M irqpoll maxcpus=1 reset_devices cgroup_disable=memory ' --initrd=/boot/initrd-2.6.36-rc5-tip+kdump.img /boot/vmlinuz-2.6.36-rc5-tip+
Could not find a free area of memory of a000 bytes...
locate_hole failed
After reverting all of the following memblock commits, it worked again:
7950c407c0288b223a200c1bba8198941599ca37
fb74fb6db91abc3c1ceeb9d2c17b44866a12c63e
f88eff74aa848e58b1ea49768c0bbb874b31357f
27de794365786b4cdc3461ed4e23af2a33f40612
9dc5d569c133819c1ce069ebb1d771c62de32580
4d5cf86ce187c0d3a4cdf233ab0cc6526ccbe01f
88ba088c18457caaf8d2e5f8d36becc731a3d4f6
edbe7d23b4482e7f33179290bcff3b1feae1c5f3
6bcc8176d07f108da3b1af17fb2c0e82c80e948e
b52c17ce854125700c4e19d4427d39bf2504ff63
e82d42be24bd5d75bf6f81045636e6ca95ab55f2
301ff3e88ef9ff4bdb92f36a3e6170fce4c9dd34
72d7c3b33c980843e756681fb4867dc1efd62a76
a9ce6bc15100023b411f8117e53a016d61889800
a587d2daebcd2bc159d4348b6a7b028950a6d803
6f2a75369e7561e800d86927ecd83c970996b21f
With crashkernel=128M, /proc/iomem looks like this; the crash kernel was placed at a huge offset:
00000000-00000fff : reserved
00001000-0009f3ff : System RAM
0009f400-0009ffff : reserved
000f0000-000fffff : reserved
00100000-dfffafff : System RAM
01000000-0149a733 : Kernel code
0149a734-01afc46f : Kernel data
01d9c000-022b18f7 : Kernel bss
dfffb000-dfffffff : reserved
f0000000-f1ffffff : 0000:00:02.0
f2000000-f2000fff : 0000:00:02.0
f2010000-f201ffff : 0000:00:02.0
f2020000-f20200ff : 0000:00:03.0
f2020000-f20200ff : 8139cp
f2030000-f203ffff : 0000:00:03.0
fec00000-fec003ff : IOAPIC 0
fee00000-fee00fff : Local APIC
fffbc000-ffffffff : reserved
100000000-c9fffffff : System RAM
c98000000-c9fffffff : Crash kernel
On working kernels, the 32M offset was found automatically:
00000000-0000ffff : reserved
00010000-0009f3ff : System RAM
0009f400-0009ffff : reserved
000f0000-000fffff : reserved
00100000-dfffafff : System RAM
01000000-014250bf : Kernel code
014250c0-018aca8f : Kernel data
01b1f000-01ff7c07 : Kernel bss
02000000-09ffffff : Crash kernel
dfffb000-dfffffff : reserved
f0000000-f1ffffff : 0000:00:02.0
f2000000-f2000fff : 0000:00:02.0
f2010000-f201ffff : 0000:00:02.0
f2020000-f20200ff : 0000:00:03.0
f2020000-f20200ff : 8139cp
f2030000-f203ffff : 0000:00:03.0
fec00000-fec003ff : IOAPIC 0
fee00000-fee00fff : Local APIC
fffbc000-ffffffff : reserved
100000000-c9fffffff : System RAM
With a fixed offset such as crashkernel=128M@32M, the reservation failed:
initial memory mapped : 0 - 20000000
init_memory_mapping: 0000000000000000-00000000dfffb000
0000000000 - 00dfe00000 page 2M
00dfe00000 - 00dfffb000 page 4k
kernel direct mapping tables up to dfffb000 @ 1fffa000-20000000
init_memory_mapping: 0000000100000000-0000000ca0000000
0100000000 - 0ca0000000 page 2M
kernel direct mapping tables up to ca0000000 @ dffc7000-dfffb000
RAMDISK: 37599000 - 37ff0000
crashkernel reservation failed - memory is in use.
After reverting those commits, it looks like this:
init_memory_mapping: 0000000000000000-00000000dfffb000
0000000000 - 00dfe00000 page 2M
00dfe00000 - 00dfffb000 page 4k
kernel direct mapping tables up to dfffb000 @ 16000-1c000
init_memory_mapping: 0000000100000000-0000000ca0000000
0100000000 - 0ca0000000 page 2M
kernel direct mapping tables up to ca0000000 @ 1a000-4e000
RAMDISK: 375c9000 - 37ff0000
Reserving 128MB of memory at 32MB for crashkernel (System RAM: 51712MB)
I can't tell what is using the memory at 32MB, but after reverting those commits I can see the early reservation information:
Subtract (76 early reservations)
#1 [0001000000 - 0001ff7c08] TEXT DATA BSS
#2 [00375c9000 - 0037ff0000] RAMDISK
#3 [0001ff8000 - 0001ff8079] BRK
#4 [000009f400 - 00000f7fb0] BIOS reserved
#5 [00000f7fb0 - 00000f7fc0] MP-table mpf
#6 [00000f822c - 0000100000] BIOS reserved
#7 [00000f7fc0 - 00000f822c] MP-table mpc
#8 [0000010000 - 0000012000] TRAMPOLINE
#9 [0000012000 - 0000016000] ACPI WAKEUP
#10 [0000016000 - 000001a000] PGTABLE
#11 [000001a000 - 0000049000] PGTABLE
#12 [0002000000 - 000a000000] CRASH KERNEL
But with those commits applied, this information is gone.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
2010-09-26 14:47 ` caiqian
@ 2010-09-26 19:42 ` Yinghai Lu
0 siblings, 0 replies; 25+ messages in thread
From: Yinghai Lu @ 2010-09-26 19:42 UTC (permalink / raw)
To: caiqian; +Cc: linux-next, kexec, H. Peter Anvin
On 09/26/2010 07:47 AM, caiqian@redhat.com wrote:
>
> ----- "Yinghai Lu" <yinghai@kernel.org> wrote:
>
>> On 09/25/2010 11:55 PM, CAI Qian wrote:
>>>>
>>>> are you kexec from 2.6.35+ to 2.6.36-rc3+?
>>> No, both kernels were the same version. I am sorry the above logs were misleading that were copy-and-pasted from different kernel versions.
>>
>> can you check tip instead of next tree?
> No dice,
> # /sbin/kexec -p '--command-line=ro root=/dev/mapper/VolGroup-lv_root rd_LVM_LV=VolGroup/lv_root rd_LVM_LV=VolGroup/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb quiet console=tty0 console=ttyS0,115200 crashkernel=128M irqpoll maxcpus=1 reset_devices cgroup_disable=memory ' --initrd=/boot/initrd-2.6.36-rc5-tip+kdump.img /boot/vmlinuz-2.6.36-rc5-tip+
> Could not find a free area of memory of a000 bytes...
> locate_hole failed
Looks like you need to update your kexec-tools package.
Please run the following script in the first kernel:
cd /sys/firmware/memmap
for dir in * ; do
	start=$(cat "$dir"/start)
	end=$(cat "$dir"/end)
	type=$(cat "$dir"/type)
	printf "%016x-%016x (%s)\n" $start $(( end + 1 )) "$type"
done
Also enable kexec debug output to see which memmap kexec parses.
>
> After reverted the whole memblock commits, it was working again,
> 7950c407c0288b223a200c1bba8198941599ca37
> fb74fb6db91abc3c1ceeb9d2c17b44866a12c63e
> f88eff74aa848e58b1ea49768c0bbb874b31357f
> 27de794365786b4cdc3461ed4e23af2a33f40612
> 9dc5d569c133819c1ce069ebb1d771c62de32580
> 4d5cf86ce187c0d3a4cdf233ab0cc6526ccbe01f
> 88ba088c18457caaf8d2e5f8d36becc731a3d4f6
> edbe7d23b4482e7f33179290bcff3b1feae1c5f3
> 6bcc8176d07f108da3b1af17fb2c0e82c80e948e
> b52c17ce854125700c4e19d4427d39bf2504ff63
> e82d42be24bd5d75bf6f81045636e6ca95ab55f2
> 301ff3e88ef9ff4bdb92f36a3e6170fce4c9dd34
> 72d7c3b33c980843e756681fb4867dc1efd62a76
> a9ce6bc15100023b411f8117e53a016d61889800
> a587d2daebcd2bc159d4348b6a7b028950a6d803
> 6f2a75369e7561e800d86927ecd83c970996b21f
>
> If used crashkernel=128M, the /proc/iomem looks like this. It used a huge offset.
> 00000000-00000fff : reserved
> 00001000-0009f3ff : System RAM
> 0009f400-0009ffff : reserved
> 000f0000-000fffff : reserved
> 00100000-dfffafff : System RAM
> 01000000-0149a733 : Kernel code
> 0149a734-01afc46f : Kernel data
> 01d9c000-022b18f7 : Kernel bss
> dfffb000-dfffffff : reserved
> f0000000-f1ffffff : 0000:00:02.0
> f2000000-f2000fff : 0000:00:02.0
> f2010000-f201ffff : 0000:00:02.0
> f2020000-f20200ff : 0000:00:03.0
> f2020000-f20200ff : 8139cp
> f2030000-f203ffff : 0000:00:03.0
> fec00000-fec003ff : IOAPIC 0
> fee00000-fee00fff : Local APIC
> fffbc000-ffffffff : reserved
> 100000000-c9fffffff : System RAM
> c98000000-c9fffffff : Crash kernel
>
> On kernels that are working, it automatically found the offset at 32M.
> 00000000-0000ffff : reserved
> 00010000-0009f3ff : System RAM
> 0009f400-0009ffff : reserved
> 000f0000-000fffff : reserved
> 00100000-dfffafff : System RAM
> 01000000-014250bf : Kernel code
> 014250c0-018aca8f : Kernel data
> 01b1f000-01ff7c07 : Kernel bss
> 02000000-09ffffff : Crash kernel
> dfffb000-dfffffff : reserved
> f0000000-f1ffffff : 0000:00:02.0
> f2000000-f2000fff : 0000:00:02.0
> f2010000-f201ffff : 0000:00:02.0
> f2020000-f20200ff : 0000:00:03.0
> f2020000-f20200ff : 8139cp
> f2030000-f203ffff : 0000:00:03.0
> fec00000-fec003ff : IOAPIC 0
> fee00000-fee00fff : Local APIC
> fffbc000-ffffffff : reserved
> 100000000-c9fffffff : System RAM
>
> If specified a fixed offset like crashkernel=128M@32M, it failed reservation.
> initial memory mapped : 0 - 20000000
> init_memory_mapping: 0000000000000000-00000000dfffb000
> 0000000000 - 00dfe00000 page 2M
> 00dfe00000 - 00dfffb000 page 4k
> kernel direct mapping tables up to dfffb000 @ 1fffa000-20000000
> init_memory_mapping: 0000000100000000-0000000ca0000000
> 0100000000 - 0ca0000000 page 2M
> kernel direct mapping tables up to ca0000000 @ dffc7000-dfffb000
> RAMDISK: 37599000 - 37ff0000
> crashkernel reservation failed - memory is in use.
>
> After reverted those commits, it looks like this,
> init_memory_mapping: 0000000000000000-00000000dfffb000
> 0000000000 - 00dfe00000 page 2M
> 00dfe00000 - 00dfffb000 page 4k
> kernel direct mapping tables up to dfffb000 @ 16000-1c000
> init_memory_mapping: 0000000100000000-0000000ca0000000
> 0100000000 - 0ca0000000 page 2M
> kernel direct mapping tables up to ca0000000 @ 1a000-4e000
> RAMDISK: 375c9000 - 37ff0000
> Reserving 128MB of memory at 32MB for crashkernel (System RAM: 51712MB)
Yes, the default memblock find_range is top-down,
while the old early_res allocated bottom-up.
During the conversion we did have an x86 find_range that went bottom-up, but later
it seemed top-down worked in all test cases (32-bit etc.).
Subject: [PATCH] x86, memblock: Add x86 version of memblock_find_in_range()
The generic version goes from high to low, and it seems it cannot find
an area that is compact enough.
The x86 version goes from goal to limit, the same way
early_res did.
Use ARCH_MEMBLOCK_FIND_AREA to select between them.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
arch/x86/Kconfig | 8 +++++++
arch/x86/mm/memblock.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++
mm/memblock.c | 2 +-
3 files changed, 63 insertions(+), 1 deletion(-)
Index: linux-2.6/arch/x86/mm/memblock.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/memblock.c
+++ linux-2.6/arch/x86/mm/memblock.c
@@ -352,3 +352,57 @@ u64 __init memblock_x86_hole_size(u64 st
return end - start - ((u64)ram << PAGE_SHIFT);
}
+
+#ifdef CONFIG_ARCH_MEMBLOCK_FIND_AREA
+/* Check for already reserved areas */
+static inline bool __init check_with_memblock_reserved(u64 *addrp, u64 size, u64 align)
+{
+ u64 addr = *addrp;
+ bool changed = false;
+ struct memblock_region *r;
+again:
+ for_each_memblock(reserved, r) {
+ if ((addr + size) > r->base && addr < (r->base + r->size)) {
+ addr = round_up(r->base + r->size, align);
+ changed = true;
+ goto again;
+ }
+ }
+
+ if (changed)
+ *addrp = addr;
+
+ return changed;
+}
+
+/*
+ * Find a free area with specified alignment in a specific range.
+ */
+u64 __init memblock_find_in_range(u64 start, u64 end, u64 size, u64 align)
+{
+ struct memblock_region *r;
+
+ for_each_memblock(memory, r) {
+ u64 ei_start = r->base;
+ u64 ei_last = ei_start + r->size;
+ u64 addr, last;
+
+ addr = round_up(ei_start, align);
+ if (addr < start)
+ addr = round_up(start, align);
+ if (addr >= ei_last)
+ continue;
+ while (check_with_memblock_reserved(&addr, size, align) && addr+size <= ei_last)
+ ;
+ last = addr + size;
+ if (last > ei_last)
+ continue;
+ if (last > end)
+ continue;
+
+ return addr;
+ }
+
+ return MEMBLOCK_ERROR;
+}
+#endif
Index: linux-2.6/arch/x86/Kconfig
===================================================================
--- linux-2.6.orig/arch/x86/Kconfig
+++ linux-2.6/arch/x86/Kconfig
@@ -569,6 +569,14 @@ config PARAVIRT_DEBUG
Enable to debug paravirt_ops internals. Specifically, BUG if
a paravirt_op is missing when it is called.
+config ARCH_MEMBLOCK_FIND_AREA
+ default y
+ bool "Use x86's own memblock_find_in_range()"
+ ---help---
+ Use the x86 version of memblock_find_in_range() instead of the generic
+ one. It searches for free areas upward from low addresses;
+ the generic version searches downward from the limit.
+
config NO_BOOTMEM
def_bool y
Index: linux-2.6/mm/memblock.c
===================================================================
--- linux-2.6.orig/mm/memblock.c
+++ linux-2.6/mm/memblock.c
@@ -165,7 +165,7 @@ static phys_addr_t __init_memblock membl
/*
* Find a free area with specified alignment in a specific range.
*/
-u64 __init_memblock memblock_find_in_range(u64 start, u64 end, u64 size, u64 align)
+u64 __init_memblock __weak memblock_find_in_range(u64 start, u64 end, u64 size, u64 align)
{
return memblock_find_base(size, align, start, end);
}
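The difference between the two search directions can be reproduced with a small userspace model. This is a hedged illustration, not kernel code: `struct region`, `find_bottom_up()`, `find_top_down()` and `FIND_ERROR` are hypothetical names. The bottom-up loop mirrors the logic of the x86 `memblock_find_in_range()` in the patch above, while the top-down loop is only a rough approximation of the generic allocator. The example assumes a 128MB request with 16MB alignment and uses just three reservations (kernel text/data/bss, BRK and RAMDISK) from the early reservation list in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model (not the kernel API): ranges are [base, base + size). */
struct region {
	uint64_t base;
	uint64_t size;
};

#define FIND_ERROR (~(uint64_t)0) /* stand-in for MEMBLOCK_ERROR */

static uint64_t align_up(uint64_t x, uint64_t align)
{
	assert(align && (align & (align - 1)) == 0); /* power of two only */
	return (x + align - 1) & ~(align - 1);
}

static uint64_t align_down(uint64_t x, uint64_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return x & ~(align - 1);
}

/* First reserved region overlapping [addr, addr + size), or NULL if none. */
static const struct region *overlap(const struct region *rsv, size_t nr,
				    uint64_t addr, uint64_t size)
{
	for (size_t i = 0; i < nr; i++)
		if (addr + size > rsv[i].base && addr < rsv[i].base + rsv[i].size)
			return &rsv[i];
	return NULL;
}

/* Bottom-up, following the patch: lowest aligned free fit at or above start. */
uint64_t find_bottom_up(const struct region *mem, size_t nr_mem,
			const struct region *rsv, size_t nr_rsv,
			uint64_t start, uint64_t end, uint64_t size, uint64_t align)
{
	for (size_t i = 0; i < nr_mem; i++) {
		uint64_t last = mem[i].base + mem[i].size;
		uint64_t addr = align_up(mem[i].base > start ? mem[i].base : start, align);
		const struct region *r;

		/* Step past each reservation we collide with, as the patch does. */
		while (addr + size <= last &&
		       (r = overlap(rsv, nr_rsv, addr, size)) != NULL)
			addr = align_up(r->base + r->size, align);
		if (addr + size <= last && addr + size <= end)
			return addr;
	}
	return FIND_ERROR;
}

/* Rough top-down approximation: highest aligned free fit below end. */
uint64_t find_top_down(const struct region *mem, size_t nr_mem,
		       const struct region *rsv, size_t nr_rsv,
		       uint64_t start, uint64_t end, uint64_t size, uint64_t align)
{
	for (size_t i = nr_mem; i-- > 0;) {
		uint64_t lo = mem[i].base > start ? mem[i].base : start;
		uint64_t hi = mem[i].base + mem[i].size;
		uint64_t addr;

		if (hi > end)
			hi = end;
		if (hi < lo + size)
			continue; /* region too small once clipped */
		addr = align_down(hi - size, align);
		while (addr >= lo) {
			const struct region *r = overlap(rsv, nr_rsv, addr, size);

			if (r == NULL)
				return addr;
			if (r->base < lo + size)
				break; /* no room left below this reservation */
			addr = align_down(r->base - size, align);
		}
	}
	return FIND_ERROR;
}
```

Fed the System RAM ranges from /sys/firmware/memmap, the bottom-up model steps past the kernel image and returns 0x2000000 (32MB), while the top-down model simply takes the highest aligned 128MB and returns 0xc98000000, matching the two placements reported in the thread.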
>
> I can't tell where the memory at 32MB was used, but after reverted those commits I can see those early reservations information,
> Subtract (76 early reservations)
> #1 [0001000000 - 0001ff7c08] TEXT DATA BSS
> #2 [00375c9000 - 0037ff0000] RAMDISK
> #3 [0001ff8000 - 0001ff8079] BRK
> #4 [000009f400 - 00000f7fb0] BIOS reserved
> #5 [00000f7fb0 - 00000f7fc0] MP-table mpf
> #6 [00000f822c - 0000100000] BIOS reserved
> #7 [00000f7fc0 - 00000f822c] MP-table mpc
> #8 [0000010000 - 0000012000] TRAMPOLINE
> #9 [0000012000 - 0000016000] ACPI WAKEUP
> #10 [0000016000 - 000001a000] PGTABLE
> #11 [000001a000 - 0000049000] PGTABLE
> #12 [0002000000 - 000a000000] CRASH KERNEL
>
> But after those commits, those information was gone.
memblock may merge reserved areas, so it cannot keep the name tags.
I have a local patchset that can print those name tags;
please check:
git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-2.6-yinghai.git memblock
Yinghai
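Why merging loses the tags can be shown with a toy model. In this hedged sketch, `struct tagged_region`, `struct region_list`, `reserve()` and `tag_at()` are hypothetical and deliberately simplified (no overlap handling, and only one neighbour is merged); it is not how memblock stores its regions. Feeding it the adjacent TRAMPOLINE, ACPI WAKEUP and PGTABLE ranges from the early reservation list quoted above collapses them into a single entry that still carries only the first tag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a reserved-region array that carries a name tag. */
struct tagged_region {
	uint64_t base;
	uint64_t size;
	const char *name;
};

#define MAX_REGIONS 16

struct region_list {
	struct tagged_region r[MAX_REGIONS];
	size_t cnt;
};

/*
 * Reserve [base, base + size). If the new range touches an existing entry it
 * is folded into that entry (simplified: no overlap handling, and only one
 * neighbour is merged). The existing entry keeps its own tag, so the new tag
 * is dropped. Returns 0 on success, -1 if the array is full.
 */
int reserve(struct region_list *l, uint64_t base, uint64_t size, const char *name)
{
	assert(size > 0);
	for (size_t i = 0; i < l->cnt; i++) {
		struct tagged_region *r = &l->r[i];

		if (r->base + r->size == base || base + size == r->base) {
			if (base < r->base)
				r->base = base;
			r->size += size;
			return 0; /* merged: 'name' is not recorded anywhere */
		}
	}
	if (l->cnt == MAX_REGIONS)
		return -1;
	l->r[l->cnt++] = (struct tagged_region){ base, size, name };
	return 0;
}

/* Tag of the entry covering addr, or NULL if addr is not reserved. */
const char *tag_at(const struct region_list *l, uint64_t addr)
{
	for (size_t i = 0; i < l->cnt; i++)
		if (addr >= l->r[i].base && addr < l->r[i].base + l->r[i].size)
			return l->r[i].name;
	return NULL;
}
```

After reserving [0x10000-0x12000) TRAMPOLINE, [0x12000-0x16000) ACPI WAKEUP and [0x16000-0x1a000) PGTABLE, the list holds one entry [0x10000-0x1a000), and looking up 0x16000 reports TRAMPOLINE rather than PGTABLE. Keeping per-reservation names would require a separate, unmerged record, which is presumably what a tag-printing patchset has to add.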
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
[not found] <1346740216.2003261285553562018.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
@ 2010-09-27 2:42 ` caiqian
2010-09-27 5:58 ` Yinghai Lu
2010-09-27 6:31 ` Yinghai Lu
0 siblings, 2 replies; 25+ messages in thread
From: caiqian @ 2010-09-27 2:42 UTC (permalink / raw)
To: Yinghai Lu; +Cc: linux-next, kexec, H. Peter Anvin
----- "Yinghai Lu" <yinghai@kernel.org> wrote:
> On 09/26/2010 07:47 AM, caiqian@redhat.com wrote:
> >
> > ----- "Yinghai Lu" <yinghai@kernel.org> wrote:
> >
> >> On 09/25/2010 11:55 PM, CAI Qian wrote:
> >>>>
> >>>> are you kexec from 2.6.35+ to 2.6.36-rc3+?
> >>> No, both kernels were the same version. I am sorry the above logs were misleading that were copy-and-pasted from different kernel versions.
> >>
> >> can you check tip instead of next tree?
> > No dice,
> > # /sbin/kexec -p '--command-line=ro root=/dev/mapper/VolGroup-lv_root rd_LVM_LV=VolGroup/lv_root rd_LVM_LV=VolGroup/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb quiet console=tty0 console=ttyS0,115200 crashkernel=128M irqpoll maxcpus=1 reset_devices cgroup_disable=memory ' --initrd=/boot/initrd-2.6.36-rc5-tip+kdump.img /boot/vmlinuz-2.6.36-rc5-tip+
> > Could not find a free area of memory of a000 bytes...
> > locate_hole failed
>
> looks like you need to update your kexec-tools package.
Same results using the latest kexec-tools git version.
>
> please run following scripts in first kernel.
>
> cd /sys/firmware/memmap
> for dir in * ; do
> start=$(cat $dir/start)
> end=$(cat $dir/end)
> type=$(cat $dir/type)
> printf "%016x-%016x (%s)\n" $start $[ $end +1] "$type"
> done
0000000000000000-000000000009f400 (System RAM)
000000000009f400-00000000000a0000 (reserved)
00000000000f0000-0000000000100000 (reserved)
0000000000100000-00000000dfffb000 (System RAM)
00000000dfffb000-00000000e0000000 (reserved)
00000000fffbc000-0000000100000000 (reserved)
0000000100000000-0000000ca0000000 (System RAM)
>
> also enable kexec debug to see what memmap kexec parse.
-d did not help here.
# /sbin/kexec -p -d '--command-line=ro root=/dev/mapper/VolGroup-lv_root rd_LVM_LV=VolGroup/lv_root rd_LVM_LV=VolGroup/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb quiet console=tty0 console=ttyS0,115200 crashkernel=128M irqpoll maxcpus=1 reset_devices cgroup_disable=memory ' --initrd=/boot/initrd-2.6.36-rc5-tip+kdump.img /boot/vmlinuz-2.6.36-rc5-tip+
Could not find a free area of memory of a000 bytes...
locate_hole failed
>
> >
> > After reverted the whole memblock commits, it was working again,
> > 7950c407c0288b223a200c1bba8198941599ca37
> > fb74fb6db91abc3c1ceeb9d2c17b44866a12c63e
> > f88eff74aa848e58b1ea49768c0bbb874b31357f
> > 27de794365786b4cdc3461ed4e23af2a33f40612
> > 9dc5d569c133819c1ce069ebb1d771c62de32580
> > 4d5cf86ce187c0d3a4cdf233ab0cc6526ccbe01f
> > 88ba088c18457caaf8d2e5f8d36becc731a3d4f6
> > edbe7d23b4482e7f33179290bcff3b1feae1c5f3
> > 6bcc8176d07f108da3b1af17fb2c0e82c80e948e
> > b52c17ce854125700c4e19d4427d39bf2504ff63
> > e82d42be24bd5d75bf6f81045636e6ca95ab55f2
> > 301ff3e88ef9ff4bdb92f36a3e6170fce4c9dd34
> > 72d7c3b33c980843e756681fb4867dc1efd62a76
> > a9ce6bc15100023b411f8117e53a016d61889800
> > a587d2daebcd2bc159d4348b6a7b028950a6d803
> > 6f2a75369e7561e800d86927ecd83c970996b21f
> >
> > If used crashkernel=128M, the /proc/iomem looks like this. It used a huge offset.
> > 00000000-00000fff : reserved
> > 00001000-0009f3ff : System RAM
> > 0009f400-0009ffff : reserved
> > 000f0000-000fffff : reserved
> > 00100000-dfffafff : System RAM
> > 01000000-0149a733 : Kernel code
> > 0149a734-01afc46f : Kernel data
> > 01d9c000-022b18f7 : Kernel bss
> > dfffb000-dfffffff : reserved
> > f0000000-f1ffffff : 0000:00:02.0
> > f2000000-f2000fff : 0000:00:02.0
> > f2010000-f201ffff : 0000:00:02.0
> > f2020000-f20200ff : 0000:00:03.0
> > f2020000-f20200ff : 8139cp
> > f2030000-f203ffff : 0000:00:03.0
> > fec00000-fec003ff : IOAPIC 0
> > fee00000-fee00fff : Local APIC
> > fffbc000-ffffffff : reserved
> > 100000000-c9fffffff : System RAM
> > c98000000-c9fffffff : Crash kernel
> >
> > On kernels that are working, it automatically found the offset at 32M.
> > 00000000-0000ffff : reserved
> > 00010000-0009f3ff : System RAM
> > 0009f400-0009ffff : reserved
> > 000f0000-000fffff : reserved
> > 00100000-dfffafff : System RAM
> > 01000000-014250bf : Kernel code
> > 014250c0-018aca8f : Kernel data
> > 01b1f000-01ff7c07 : Kernel bss
> > 02000000-09ffffff : Crash kernel
> > dfffb000-dfffffff : reserved
> > f0000000-f1ffffff : 0000:00:02.0
> > f2000000-f2000fff : 0000:00:02.0
> > f2010000-f201ffff : 0000:00:02.0
> > f2020000-f20200ff : 0000:00:03.0
> > f2020000-f20200ff : 8139cp
> > f2030000-f203ffff : 0000:00:03.0
> > fec00000-fec003ff : IOAPIC 0
> > fee00000-fee00fff : Local APIC
> > fffbc000-ffffffff : reserved
> > 100000000-c9fffffff : System RAM
> >
> > If specified a fixed offset like crashkernel=128M@32M, it failed reservation.
> > initial memory mapped : 0 - 20000000
> > init_memory_mapping: 0000000000000000-00000000dfffb000
> > 0000000000 - 00dfe00000 page 2M
> > 00dfe00000 - 00dfffb000 page 4k
> > kernel direct mapping tables up to dfffb000 @ 1fffa000-20000000
> > init_memory_mapping: 0000000100000000-0000000ca0000000
> > 0100000000 - 0ca0000000 page 2M
> > kernel direct mapping tables up to ca0000000 @ dffc7000-dfffb000
> > RAMDISK: 37599000 - 37ff0000
> > crashkernel reservation failed - memory is in use.
> >
> > After reverted those commits, it looks like this,
> > init_memory_mapping: 0000000000000000-00000000dfffb000
> > 0000000000 - 00dfe00000 page 2M
> > 00dfe00000 - 00dfffb000 page 4k
> > kernel direct mapping tables up to dfffb000 @ 16000-1c000
> > init_memory_mapping: 0000000100000000-0000000ca0000000
> > 0100000000 - 0ca0000000 page 2M
> > kernel direct mapping tables up to ca0000000 @ 1a000-4e000
> > RAMDISK: 375c9000 - 37ff0000
> > Reserving 128MB of memory at 32MB for crashkernel (System RAM: 51712MB)
>
> yes, default memblock find_range is top_down.
>
> old early_res is from bottom_up.
>
> during the convecting, we do have one x86 find_range from bottom_up, but later
> it seems top_down was working on all test cases. ( 32bit etc)
>
> Subject: [PATCH] x86, memblock: Add x86 version of memblock_find_in_range()
Yes, this patch did help.
Reserving 128MB of memory at 32MB for crashkernel (System RAM: 51712MB)
>
> Generic version is going from high to low, and it seems it can not find
> right area compact enough.
>
> the x86 version will go from goal to limit and just like the way We used
> for early_res
>
> use ARCH_FIND_MEMBLOCK_AREA to select from them.
>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
> arch/x86/Kconfig | 8 +++++++
> arch/x86/mm/memblock.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++
> mm/memblock.c | 2 -
> 3 files changed, 63 insertions(+), 1 deletion(-)
>
> Index: linux-2.6/arch/x86/mm/memblock.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/mm/memblock.c
> +++ linux-2.6/arch/x86/mm/memblock.c
> @@ -352,3 +352,57 @@ u64 __init memblock_x86_hole_size(u64 st
>
> return end - start - ((u64)ram << PAGE_SHIFT);
> }
> +
> +#ifdef CONFIG_ARCH_MEMBLOCK_FIND_AREA
> +/* Check for already reserved areas */
> +static inline bool __init check_with_memblock_reserved(u64 *addrp, u64 size, u64 align)
> +{
> + u64 addr = *addrp;
> + bool changed = false;
> + struct memblock_region *r;
> +again:
> + for_each_memblock(reserved, r) {
> + if ((addr + size) > r->base && addr < (r->base + r->size)) {
> + addr = round_up(r->base + r->size, align);
> + changed = true;
> + goto again;
> + }
> + }
> +
> + if (changed)
> + *addrp = addr;
> +
> + return changed;
> +}
> +
> +/*
> + * Find a free area with specified alignment in a specific range.
> + */
> +u64 __init memblock_find_in_range(u64 start, u64 end, u64 size, u64 align)
> +{
> + struct memblock_region *r;
> +
> + for_each_memblock(memory, r) {
> + u64 ei_start = r->base;
> + u64 ei_last = ei_start + r->size;
> + u64 addr, last;
> +
> + addr = round_up(ei_start, align);
> + if (addr < start)
> + addr = round_up(start, align);
> + if (addr >= ei_last)
> + continue;
> + while (check_with_memblock_reserved(&addr, size, align) && addr+size <= ei_last)
> + ;
> + last = addr + size;
> + if (last > ei_last)
> + continue;
> + if (last > end)
> + continue;
> +
> + return addr;
> + }
> +
> + return MEMBLOCK_ERROR;
> +}
> +#endif
> Index: linux-2.6/arch/x86/Kconfig
> ===================================================================
> --- linux-2.6.orig/arch/x86/Kconfig
> +++ linux-2.6/arch/x86/Kconfig
> @@ -569,6 +569,14 @@ config PARAVIRT_DEBUG
> Enable to debug paravirt_ops internals. Specifically, BUG if
> a paravirt_op is missing when it is called.
>
> +config ARCH_MEMBLOCK_FIND_AREA
> + default y
> + bool "Use x86 own memblock_find_in_range()"
> + ---help---
> + Use the x86-specific memblock_find_in_range() instead of the generic
> + version; it searches for a free area bottom-up, starting from low memory.
> + The generic one searches top-down, starting from the limit.
> +
> config NO_BOOTMEM
> def_bool y
>
> Index: linux-2.6/mm/memblock.c
> ===================================================================
> --- linux-2.6.orig/mm/memblock.c
> +++ linux-2.6/mm/memblock.c
> @@ -165,7 +165,7 @@ static phys_addr_t __init_memblock membl
> /*
> * Find a free area with specified alignment in a specific range.
> */
> -u64 __init_memblock memblock_find_in_range(u64 start, u64 end, u64 size, u64 align)
> +u64 __init_memblock __weak memblock_find_in_range(u64 start, u64 end, u64 size, u64 align)
> {
> return memblock_find_base(size, align, start, end);
> }
>
>
> >
> > I can't tell where the memory at 32MB was used, but after reverting those
> > commits I can see the early reservation information:
> > Subtract (76 early reservations)
> > #1 [0001000000 - 0001ff7c08] TEXT DATA BSS
> > #2 [00375c9000 - 0037ff0000] RAMDISK
> > #3 [0001ff8000 - 0001ff8079] BRK
> > #4 [000009f400 - 00000f7fb0] BIOS reserved
> > #5 [00000f7fb0 - 00000f7fc0] MP-table mpf
> > #6 [00000f822c - 0000100000] BIOS reserved
> > #7 [00000f7fc0 - 00000f822c] MP-table mpc
> > #8 [0000010000 - 0000012000] TRAMPOLINE
> > #9 [0000012000 - 0000016000] ACPI WAKEUP
> > #10 [0000016000 - 000001a000] PGTABLE
> > #11 [000001a000 - 0000049000] PGTABLE
> > #12 [0002000000 - 000a000000] CRASH KERNEL
> >
> > After those commits, that information is gone.
>
> memblock may merge reserved areas, so it cannot keep the name tags.
>
> I have a local patch set that can print those name tags;
> please check
Looks like so.
>
>
> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-2.6-yinghai.git
> memblock
>
> Yinghai
>
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
2010-09-27 2:42 ` caiqian
@ 2010-09-27 5:58 ` Yinghai Lu
2010-09-27 6:31 ` Yinghai Lu
1 sibling, 0 replies; 25+ messages in thread
From: Yinghai Lu @ 2010-09-27 5:58 UTC (permalink / raw)
To: caiqian; +Cc: linux-next, kexec, H. Peter Anvin
Please check this one on top of tip or next.
Thanks
Yinghai
[PATCH] x86, memblock: Fix crashkernel allocation
Cai Qian found that crashkernel is broken with the x86 memblock changes:
1. crashkernel=128M@32M always reports that the range is in use, even when the
first kernel is small and nothing uses that range.
2. "kexec -p" always reports:
Could not find a free area of memory of a000 bytes...
locate_hole failed
The root cause is that the generic memblock_find_in_range() searches top-down,
but crashkernel needs to be allocated from low memory within a specified range.
Limit the target range to crash_base + crash_size to make sure that we get the
range from the bottom.
Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
arch/x86/kernel/setup.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)
Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -516,19 +516,28 @@ static void __init reserve_crashkernel(v
/* 0 means: find the address automatically */
if (crash_base <= 0) {
+ unsigned long long start = 0;
const unsigned long long alignment = 16<<20; /* 16M */
- crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
- alignment);
- if (crash_base == MEMBLOCK_ERROR) {
+ crash_base = alignment;
+ while (crash_base < 0xffffffff) {
+ start = memblock_find_in_range(crash_base,
+ crash_base + crash_size, crash_size, alignment);
+
+ if (start == crash_base)
+ break;
+
+ crash_base += alignment;
+ }
+ if (start != crash_base) {
pr_info("crashkernel reservation failed - No suitable area found.\n");
return;
}
} else {
unsigned long long start;
- start = memblock_find_in_range(crash_base, ULONG_MAX, crash_size,
- 1<<20);
+ start = memblock_find_in_range(crash_base,
+ crash_base + crash_size, crash_size, 1<<20);
if (start != crash_base) {
pr_info("crashkernel reservation failed - memory is in use.\n");
return;
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
2010-09-27 2:42 ` caiqian
2010-09-27 5:58 ` Yinghai Lu
@ 2010-09-27 6:31 ` Yinghai Lu
2010-09-27 9:16 ` CAI Qian
1 sibling, 1 reply; 25+ messages in thread
From: Yinghai Lu @ 2010-09-27 6:31 UTC (permalink / raw)
To: caiqian; +Cc: Ingo Molnar, kexec, linux-kernel@vger.kernel.org, H. Peter Anvin
Please check this one on top of tip or next.
Thanks
Yinghai
[PATCH] x86, memblock: Fix crashkernel allocation
Cai Qian found that crashkernel is broken with the x86 memblock changes:
1. crashkernel=128M@32M always reports that the range is in use, even when the
first kernel is small and nothing uses that range.
2. "kexec -p" always reports:
Could not find a free area of memory of a000 bytes...
locate_hole failed
The root cause is that the generic memblock_find_in_range() searches top-down,
but crashkernel needs to be allocated from low memory within a specified range.
Limit the target range to crash_base + crash_size to make sure that we get the
range from the bottom.
Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
arch/x86/kernel/setup.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)
Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -516,19 +516,28 @@ static void __init reserve_crashkernel(v
/* 0 means: find the address automatically */
if (crash_base <= 0) {
+ unsigned long long start = 0;
const unsigned long long alignment = 16<<20; /* 16M */
- crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
- alignment);
- if (crash_base == MEMBLOCK_ERROR) {
+ crash_base = alignment;
+ while (crash_base < 0xffffffff) {
+ start = memblock_find_in_range(crash_base,
+ crash_base + crash_size, crash_size, alignment);
+
+ if (start == crash_base)
+ break;
+
+ crash_base += alignment;
+ }
+ if (start != crash_base) {
pr_info("crashkernel reservation failed - No suitable area found.\n");
return;
}
} else {
unsigned long long start;
- start = memblock_find_in_range(crash_base, ULONG_MAX, crash_size,
- 1<<20);
+ start = memblock_find_in_range(crash_base,
+ crash_base + crash_size, crash_size, 1<<20);
if (start != crash_base) {
pr_info("crashkernel reservation failed - memory is in use.\n");
return;
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
2010-09-27 6:31 ` Yinghai Lu
@ 2010-09-27 9:16 ` CAI Qian
0 siblings, 0 replies; 25+ messages in thread
From: CAI Qian @ 2010-09-27 9:16 UTC (permalink / raw)
To: Yinghai Lu; +Cc: Ingo Molnar, kexec, linux-kernel, H. Peter Anvin
----- "Yinghai Lu" <yinghai@kernel.org> wrote:
> Please check this one on top of tip or next.
This fails to apply to both trees.
[root@localhost linux-next]# patch -Np1 <memblock.patch
patching file arch/x86/kernel/setup.c
Hunk #1 FAILED at 516.
1 out of 1 hunk FAILED -- saving rejects to file arch/x86/kernel/setup.c.rej
>
> Thanks
>
> Yinghai
>
> [PATCH] x86, memblock: Fix crashkernel allocation
>
> Cai Qian found that crashkernel is broken with the x86 memblock changes:
> 1. crashkernel=128M@32M always reports that the range is in use, even
> when the first kernel is small and nothing uses that range.
> 2. "kexec -p" always reports:
> Could not find a free area of memory of a000 bytes...
> locate_hole failed
>
> The root cause is that the generic memblock_find_in_range() searches
> top-down, but crashkernel needs to be allocated from low memory within a
> specified range.
>
> Limit the target range to crash_base + crash_size to make sure that we
> get the range from the bottom.
>
> Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>
> ---
> arch/x86/kernel/setup.c | 19 ++++++++++++++-----
> 1 file changed, 14 insertions(+), 5 deletions(-)
>
> Index: linux-2.6/arch/x86/kernel/setup.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/setup.c
> +++ linux-2.6/arch/x86/kernel/setup.c
> @@ -516,19 +516,28 @@ static void __init reserve_crashkernel(v
>
> /* 0 means: find the address automatically */
> if (crash_base <= 0) {
> + unsigned long long start = 0;
> const unsigned long long alignment = 16<<20; /* 16M */
>
> - crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
> - alignment);
> - if (crash_base == MEMBLOCK_ERROR) {
> + crash_base = alignment;
> + while (crash_base < 0xffffffff) {
> + start = memblock_find_in_range(crash_base,
> + crash_base + crash_size, crash_size, alignment);
> +
> + if (start == crash_base)
> + break;
> +
> + crash_base += alignment;
> + }
> + if (start != crash_base) {
> pr_info("crashkernel reservation failed - No suitable area found.\n");
> return;
> }
> } else {
> unsigned long long start;
>
> - start = memblock_find_in_range(crash_base, ULONG_MAX, crash_size,
> - 1<<20);
> + start = memblock_find_in_range(crash_base,
> + crash_base + crash_size, crash_size, 1<<20);
> if (start != crash_base) {
> pr_info("crashkernel reservation failed - memory is in use.\n");
> return;
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
[not found] <1909915255.2046011285586388234.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
@ 2010-09-27 11:21 ` caiqian
2010-09-27 22:22 ` Yinghai Lu
0 siblings, 1 reply; 25+ messages in thread
From: caiqian @ 2010-09-27 11:21 UTC (permalink / raw)
To: Yinghai Lu; +Cc: Ingo Molnar, kexec, linux-kernel, H. Peter Anvin
----- "CAI Qian" <caiqian@redhat.com> wrote:
> ----- "Yinghai Lu" <yinghai@kernel.org> wrote:
>
> > Please check this one on top of tip or next.
> This fails to apply to both trees.
> [root@localhost linux-next]# patch -Np1 <memblock.patch
> patching file arch/x86/kernel/setup.c
> Hunk #1 FAILED at 516.
> 1 out of 1 hunk FAILED -- saving rejects to file arch/x86/kernel/setup.c.rej
After manually applying the patch on top of the latest mmotm tree, /proc/vmcore was no longer exported in the second kernel. This could be the result of other recent commits in mmotm, though. It said:
Warning: Core image elf header is notsane
Kdump: vmcore not initialized
Here is the dmesg from the second kernel,
Initializing cgroup subsys cpuset
Linux version 2.6.36-rc5-mm1+ (root@localhost.localdomain) (gcc version 4.4.4 20100726 (Red Hat 4.4.4-13) (GCC) ) #6 SMP Mon Sep 27 07:00:15 EDT 2010
Command line: ro root=/dev/mapper/VolGroup-lv_root rd_LVM_LV=VolGroup/lv_root rd_LVM_LV=VolGroup/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb quiet console=tty0 console=ttyS0,115200 crashkernel=128M irqpoll maxcpus=1 reset_devices cgroup_disable=memory memmap=exactmap memmap=640K@0K memmap=130408K@32768K elfcorehdr=163176K kexec_jump_back_entry=0x000000000232f063
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000100 - 000000000009f400 (usable)
BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000dfffb000 (usable)
BIOS-e820: 00000000dfffb000 - 00000000e0000000 (reserved)
BIOS-e820: 00000000fffbc000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 0000000ca0000000 (usable)
last_pfn = 0xca0000 max_arch_pfn = 0x400000000
NX (Execute Disable) protection: active
user-defined physical RAM map:
user: 0000000000000000 - 00000000000a0000 (usable)
user: 0000000002000000 - 0000000009f5a000 (usable)
DMI 2.4 present.
e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
No AGP bridge found
last_pfn = 0x9f5a max_arch_pfn = 0x400000000
MTRR default type: write-back
MTRR fixed ranges enabled:
00000-9FFFF write-back
A0000-BFFFF uncachable
C0000-FFFFF write-protect
MTRR variable ranges enabled:
0 base 00E0000000 mask FFE0000000 uncachable
1 disabled
2 disabled
3 disabled
4 disabled
5 disabled
6 disabled
7 disabled
PAT not supported by CPU.
found SMP MP-table at [ffff8800000f7fb0] f7fb0
initial memory mapped : 0 - 20000000
init_memory_mapping: 0000000000000000-0000000009f5a000
0000000000 - 0009e00000 page 2M
0009e00000 - 0009f5a000 page 4k
kernel direct mapping tables up to 9f5a000 @ 9f57000-9f5a000
RAMDISK: 09ae5000 - 09f49000
crashkernel reservation failed - No suitable area found.
ACPI: RSDP 00000000000f7f60 00014 (v00 BOCHS )
ACPI: RSDT 00000000dfffd890 00030 (v01 BOCHS BXPCRSDT 00000001 BXPC 00000001)
ACPI: FACP 00000000dffffa30 00074 (v01 BOCHS BXPCFACP 00000001 BXPC 00000001)
ACPI: DSDT 00000000dfffdb70 01E4B (v01 BXPC BXDSDT 00000001 INTL 20090123)
ACPI: FACS 00000000dffff9c0 00040
ACPI: SSDT 00000000dfffda40 0012F (v01 BOCHS BXPCSSDT 00000001 BXPC 00000001)
ACPI: APIC 00000000dfffd8c0 0010A (v01 BOCHS BXPCAPIC 00000001 BXPC 00000001)
ACPI: Local APIC address 0xfee00000
No NUMA configuration found
Faking a node at 0000000000000000-0000000009f5a000
Initmem setup node 0 0000000000000000-0000000009f5a000
NODE_DATA [0000000009abe000 - 0000000009ae4fff]
kvm-clock: Using msrs 12 and 11
kvm-clock: cpu 0, msr 0:28c3741, boot clock
[ffffea0000000000-ffffea00003fffff] PMD -> [ffff880008e00000-ffff8800091fffff] on node 0
sizeof(struct page) = 56
Zone PFN ranges:
DMA 0x00000010 -> 0x00001000
DMA32 0x00001000 -> 0x00100000
Normal empty
Movable zone start PFN for each node
early_node_map[2] active PFN ranges
0: 0x00000010 -> 0x000000a0
0: 0x00002000 -> 0x00009f5a
On node 0 totalpages: 32746
DMA zone: 56 pages used for memmap
DMA zone: 7 pages reserved
DMA zone: 81 pages, LIFO batch:0
DMA32 zone: 502 pages used for memmap
DMA32 zone: 32100 pages, LIFO batch:7
ACPI: PM-Timer IO Port: 0xb008
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] enabled)
ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] enabled)
ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] enabled)
ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] enabled)
ACPI: LAPIC (acpi_id[0x08] lapic_id[0x08] enabled)
ACPI: LAPIC (acpi_id[0x09] lapic_id[0x09] enabled)
ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x0a] enabled)
ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x0b] enabled)
ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x0c] enabled)
ACPI: LAPIC (acpi_id[0x0d] lapic_id[0x0d] enabled)
ACPI: LAPIC (acpi_id[0x0e] lapic_id[0x0e] enabled)
ACPI: LAPIC (acpi_id[0x0f] lapic_id[0x0f] enabled)
ACPI: LAPIC (acpi_id[0x10] lapic_id[0x10] enabled)
ACPI: LAPIC (acpi_id[0x11] lapic_id[0x11] enabled)
ACPI: LAPIC (acpi_id[0x12] lapic_id[0x12] enabled)
ACPI: LAPIC (acpi_id[0x13] lapic_id[0x13] enabled)
ACPI: IOAPIC (id[0x14] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 20, version 17, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ5 used by override.
ACPI: IRQ9 used by override.
ACPI: IRQ10 used by override.
ACPI: IRQ11 used by override.
Using ACPI (MADT) for SMP configuration information
SMP: Allowing 20 CPUs, 0 hotplug CPUs
nr_irqs_gsi: 40
PM: Registered nosave memory: 00000000000a0000 - 0000000002000000
Allocating PCI resources starting at 9f5a000 (gap: 9f5a000:f60a6000)
Booting paravirtualized kernel on KVM
setup_percpu: NR_CPUS:4096 nr_cpumask_bits:20 nr_cpu_ids:20 nr_node_ids:1
PERCPU: Embedded 29 pages/cpu @ffff880009400000 s86912 r8192 d23680 u262144
pcpu-alloc: s86912 r8192 d23680 u262144 alloc=1*2097152
pcpu-alloc: [0] 00 01 02 03 04 05 06 07 [0] 08 09 10 11 12 13 14 15
pcpu-alloc: [0] 16 17 18 19 -- -- -- --
kvm-clock: cpu 0, msr 0:9414741, primary cpu clock
Built 1 zonelists in Node order, mobility grouping on. Total pages: 32181
Policy zone: DMA32
Kernel command line: ro root=/dev/mapper/VolGroup-lv_root rd_LVM_LV=VolGroup/lv_root rd_LVM_LV=VolGroup/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb quiet console=tty0 console=ttyS0,115200 crashkernel=128M irqpoll maxcpus=1 reset_devices cgroup_disable=memory memmap=exactmap memmap=640K@0K memmap=130408K@32768K elfcorehdr=163176K kexec_jump_back_entry=0x000000000232f063
Misrouted IRQ fixup and polling support enabled
This may significantly impact system performance
Disabling memory control group subsystem
PID hash table entries: 512 (order: 0, 4096 bytes)
Checking aperture...
No AGP bridge found
Memory: 103484k/163176k available (4267k kernel code, 32192k absent, 27500k reserved, 4617k data, 2484k init)
Hierarchical RCU implementation.
RCU-based detection of stalled CPUs is disabled.
Verbose stalled-CPUs detection is disabled.
NR_IRQS:262400 nr_irqs:840
Spurious LAPIC timer interrupt on cpu 0
Console: colour VGA+ 80x25
console [tty0] enabled
console [ttyS0] enabled
Detected 1995.358 MHz processor.
Calibrating delay loop (skipped) preset value.. 3990.71 BogoMIPS (lpj=1995358)
pid_max: default: 32768 minimum: 301
Security Framework initialized
SELinux: Initializing.
SELinux: Starting in permissive mode
Dentry cache hash table entries: 16384 (order: 5, 131072 bytes)
Inode-cache hash table entries: 8192 (order: 4, 65536 bytes)
Mount-cache hash table entries: 256
Initializing cgroup subsys ns
Initializing cgroup subsys cpuacct
Initializing cgroup subsys memory
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
Initializing cgroup subsys net_cls
mce: CPU supports 10 MCE banks
Performance Events: p6 PMU driver.
... version: 0
... bit width: 32
... generic registers: 2
... value mask: 00000000ffffffff
... max period: 000000007fffffff
... fixed-purpose events: 0
... event mask: 0000000000000003
SMP alternatives: switching to UP code
ACPI: Core revision 20100702
Setting APIC routing to physical flat
..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
CPU0: Intel QEMU Virtual CPU version (cpu64-rhel6) stepping 03
Brought up 1 CPUs
Total of 1 processors activated (3990.71 BogoMIPS).
devtmpfs: initialized
regulator: core version 0.5
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using configuration type 1 for base access
bio: create slab <bio-0> at 0
IRQ 9: starting IRQFIXUP_POLL
ACPI: EC: Look up EC in DSDT
ACPI: Interpreter enabled
ACPI: (supports S0 S3 S4 S5)
ACPI: Using IOAPIC for interrupt routing
ACPI: No dock devices found.
PCI: Ignoring host bridge windows from ACPI; if necessary, use "pci=use_crs" and report a bug
ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
pci_root PNP0A03:00: host bridge window [io 0x0000-0x0cf7] (ignored)
pci_root PNP0A03:00: host bridge window [io 0x0d00-0xffff] (ignored)
pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff] (ignored)
pci_root PNP0A03:00: host bridge window [mem 0xe0000000-0xfebfffff] (ignored)
pci 0000:00:01.1: reg 20: [io 0xc000-0xc00f]
pci 0000:00:01.2: reg 20: [io 0xc020-0xc03f]
pci 0000:00:01.3: quirk: [io 0xb000-0xb03f] claimed by PIIX4 ACPI
pci 0000:00:01.3: quirk: [io 0xb100-0xb10f] claimed by PIIX4 SMB
pci 0000:00:02.0: reg 10: [mem 0xf0000000-0xf1ffffff pref]
pci 0000:00:02.0: reg 14: [mem 0xf2000000-0xf2000fff]
pci 0000:00:02.0: reg 30: [mem 0xf2010000-0xf201ffff pref]
pci 0000:00:03.0: reg 10: [io 0xc100-0xc1ff]
pci 0000:00:03.0: reg 14: [mem 0xf2020000-0xf20200ff]
pci 0000:00:03.0: reg 30: [mem 0xf2030000-0xf203ffff pref]
pci 0000:00:04.0: reg 10: [io 0xc400-0xc7ff]
pci 0000:00:04.0: reg 14: [io 0xc800-0xc8ff]
pci 0000:00:05.0: reg 10: [io 0xc900-0xc91f]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
ACPI: PCI Interrupt Link [LNKB] (IRQs 5 10 11) *0, disabled.
ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
vgaarb: loaded
SCSI subsystem initialized
libata version 3.00 loaded.
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
PCI: Using ACPI for IRQ routing
PCI: pci_cache_line_size set to 64 bytes
reserve RAM buffer: 0000000009f5a000 - 000000000bffffff
NetLabel: Initializing
NetLabel: domain hash size = 128
NetLabel: protocols = UNLABELED CIPSOv4
NetLabel: unlabeled traffic allowed by default
Switching to clocksource kvm-clock
pnp: PnP ACPI init
ACPI: bus type pnp registered
pnp: PnP ACPI: found 6 devices
ACPI: ACPI bus type pnp unregistered
pci_bus 0000:00: resource 0 [io 0x0000-0xffff]
pci_bus 0000:00: resource 1 [mem 0x00000000-0xffffffffffffffff]
NET: Registered protocol family 2
IP route cache hash table entries: 1024 (order: 1, 8192 bytes)
TCP established hash table entries: 4096 (order: 4, 65536 bytes)
TCP bind hash table entries: 4096 (order: 4, 65536 bytes)
TCP: Hash tables configured (established 4096 bind 4096)
TCP reno registered
UDP hash table entries: 128 (order: 0, 4096 bytes)
UDP-Lite hash table entries: 128 (order: 0, 4096 bytes)
NET: Registered protocol family 1
pci 0000:00:00.0: Limiting direct PCI/PCI transfers
pci 0000:00:01.0: Activating ISA DMA hang workarounds
pci 0000:00:02.0: Boot video device
PCI: CLS 64 bytes, default 64
Trying to unpack rootfs image as initramfs...
Freeing initrd memory: 4496k freed
audit: initializing netlink socket (disabled)
type=2000 audit(1285586109.207:1): initialized
HugeTLB registered 2 MB page size, pre-allocated 0 pages
VFS: Disk quotas dquot_6.5.2
Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
Warning: Core image elf header is notsane
Kdump: vmcore not initialized
>
> >
> > Thanks
> >
> > Yinghai
> >
> > [PATCH] x86, memblock: Fix crashkernel allocation
> >
> > Cai Qian found that crashkernel is broken with the x86 memblock changes:
> > 1. crashkernel=128M@32M always reports that the range is in use, even
> > when the first kernel is small and nothing uses that range.
> > 2. "kexec -p" always reports:
> > Could not find a free area of memory of a000 bytes...
> > locate_hole failed
> >
> > The root cause is that the generic memblock_find_in_range() searches
> > top-down, but crashkernel needs to be allocated from low memory within a
> > specified range.
> >
> > Limit the target range to crash_base + crash_size to make sure that we
> > get the range from the bottom.
> >
> > Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com>
> > Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> >
> > ---
> > arch/x86/kernel/setup.c | 19 ++++++++++++++-----
> > 1 file changed, 14 insertions(+), 5 deletions(-)
> >
> > Index: linux-2.6/arch/x86/kernel/setup.c
> > ===================================================================
> > --- linux-2.6.orig/arch/x86/kernel/setup.c
> > +++ linux-2.6/arch/x86/kernel/setup.c
> > @@ -516,19 +516,28 @@ static void __init reserve_crashkernel(v
> >
> > /* 0 means: find the address automatically */
> > if (crash_base <= 0) {
> > + unsigned long long start = 0;
> > const unsigned long long alignment = 16<<20; /* 16M */
> >
> > - crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
> > - alignment);
> > - if (crash_base == MEMBLOCK_ERROR) {
> > + crash_base = alignment;
> > + while (crash_base < 0xffffffff) {
> > + start = memblock_find_in_range(crash_base,
> > + crash_base + crash_size, crash_size, alignment);
> > +
> > + if (start == crash_base)
> > + break;
> > +
> > + crash_base += alignment;
> > + }
> > + if (start != crash_base) {
> > pr_info("crashkernel reservation failed - No suitable area found.\n");
> > return;
> > }
> > } else {
> > unsigned long long start;
> >
> > - start = memblock_find_in_range(crash_base, ULONG_MAX, crash_size,
> > - 1<<20);
> > + start = memblock_find_in_range(crash_base,
> > + crash_base + crash_size, crash_size, 1<<20);
> > if (start != crash_base) {
> > pr_info("crashkernel reservation failed - memory is in use.\n");
> > return;
> >
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
2010-09-27 11:21 ` kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_" caiqian
@ 2010-09-27 22:22 ` Yinghai Lu
2010-09-27 22:50 ` H. Peter Anvin
0 siblings, 1 reply; 25+ messages in thread
From: Yinghai Lu @ 2010-09-27 22:22 UTC (permalink / raw)
To: caiqian; +Cc: Ingo Molnar, kexec, linux-kernel, H. Peter Anvin
[-- Attachment #1: Type: text/plain, Size: 2632 bytes --]
On 09/27/2010 04:21 AM, caiqian@redhat.com wrote:
>
> ----- "CAI Qian" <caiqian@redhat.com> wrote:
>
>> ----- "Yinghai Lu" <yinghai@kernel.org> wrote:
>>
>>> Please check this one on top of tip or next.
>> This fails to apply to both trees.
>> [root@localhost linux-next]# patch -Np1 <memblock.patch
>> patching file arch/x86/kernel/setup.c
>> Hunk #1 FAILED at 516.
>> 1 out of 1 hunk FAILED -- saving rejects to file arch/x86/kernel/setup.c.rej
> After manually applying the patch on top of the latest mmotm tree, /proc/vmcore was no longer exported in the second kernel. This could be the result of other recent commits in mmotm, though. It said:
>
> Warning: Core image elf header is notsane
> Kdump: vmcore not initialized
>
> Here is the dmesg from the second kernel,
>
> Initializing cgroup subsys cpuset
> Linux version 2.6.36-rc5-mm1+ (root@localhost.localdomain) (gcc version 4.4.4 20100726 (Red Hat 4.4.4-13) (GCC) ) #6 SMP Mon Sep 27 07:00:15 EDT 2010
> Command line: ro root=/dev/mapper/VolGroup-lv_root rd_LVM_LV=VolGroup/lv_root rd_LVM_LV=VolGroup/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb quiet console=tty0 console=ttyS0,115200 crashkernel=128M irqpoll maxcpus=1 reset_devices cgroup_disable=memory memmap=exactmap memmap=640K@0K memmap=130408K@32768K elfcorehdr=163176K kexec_jump_back_entry=0x000000000232f063
> BIOS-provided physical RAM map:
> BIOS-e820: 0000000000000100 - 000000000009f400 (usable)
> BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
> BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
> BIOS-e820: 0000000000100000 - 00000000dfffb000 (usable)
> BIOS-e820: 00000000dfffb000 - 00000000e0000000 (reserved)
> BIOS-e820: 00000000fffbc000 - 0000000100000000 (reserved)
> BIOS-e820: 0000000100000000 - 0000000ca0000000 (usable)
> last_pfn = 0xca0000 max_arch_pfn = 0x400000000
> NX (Execute Disable) protection: active
> user-defined physical RAM map:
> user: 0000000000000000 - 00000000000a0000 (usable)
> user: 0000000002000000 - 0000000009f5a000 (usable)
...
> Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
> Warning: Core image elf header is notsane
> Kdump: vmcore not initialized
>
>>
It should work on tip; I tested it on RHEL 6.0 beta
with
/etc/init.d/kdump restart
BTW, the second kernel is not supposed to get crashkernel=128M again;
the /etc/init.d/kdump script removes it when reusing /proc/cmdline.
Please refer to
http://people.redhat.com/mingo/tip.git/readme.txt
to get tip/master,
and apply the attached patch:
cat crashkernel_limit.patch | patch -p1
Thanks
Yinghai
[-- Attachment #2: crashkernel_limit.patch --]
[-- Type: text/x-patch, Size: 2230 bytes --]
[PATCH -v2] x86, memblock: Fix crashkernel allocation
Cai Qian found that crashkernel is broken with the x86 memblock changes:
1. crashkernel=128M@32M always reports that the range is in use, even when the
first kernel is small and nothing uses that range.
2. "kexec -p" always reports:
Could not find a free area of memory of a000 bytes...
locate_hole failed
The root cause is that the generic memblock_find_in_range() searches top-down,
but crashkernel needs to be allocated from low memory within a specified range.
Limit the target range to crash_base + crash_size to make sure that we get the
range from the bottom.
-v2: don't cap the search at 0xffffffff, in case kexec uses the bzImage 64-bit
entry point or vmlinux and tries to allocate a huge area for crashkernel.
Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
arch/x86/kernel/setup.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)
Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -516,19 +516,28 @@ static void __init reserve_crashkernel(v
/* 0 means: find the address automatically */
if (crash_base <= 0) {
+ unsigned long long start = 0;
const unsigned long long alignment = 16<<20; /* 16M */
- crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
- alignment);
- if (crash_base == MEMBLOCK_ERROR) {
+ crash_base = alignment;
+ while ((crash_base + crash_size) <= total_mem) {
+ start = memblock_find_in_range(crash_base,
+ crash_base + crash_size, crash_size, alignment);
+
+ if (start == crash_base)
+ break;
+
+ crash_base += alignment;
+ }
+ if (start != crash_base) {
pr_info("crashkernel reservation failed - No suitable area found.\n");
return;
}
} else {
unsigned long long start;
- start = memblock_find_in_range(crash_base, ULONG_MAX, crash_size,
- 1<<20);
+ start = memblock_find_in_range(crash_base,
+ crash_base + crash_size, crash_size, 1<<20);
if (start != crash_base) {
pr_info("crashkernel reservation failed - memory is in use.\n");
return;
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
2010-09-27 22:22 ` Yinghai Lu
@ 2010-09-27 22:50 ` H. Peter Anvin
2010-09-27 23:20 ` Yinghai Lu
0 siblings, 1 reply; 25+ messages in thread
From: H. Peter Anvin @ 2010-09-27 22:50 UTC (permalink / raw)
To: Yinghai Lu; +Cc: Ingo Molnar, kexec, caiqian, linux-kernel
> + crash_base = alignment;
> + while ((crash_base + crash_size) <= total_mem) {
> + start = memblock_find_in_range(crash_base,
> + crash_base + crash_size, crash_size, alignment);
> +
> + if (start == crash_base)
> + break;
> +
> + crash_base += alignment;
> + }
> + if (start != crash_base) {
Open-coded crap violation error!
Seriously, these kinds of open-coded loops are *never* acceptable, since
they are really "let's violate the interface by making it do something
it wasn't intended to do" -- it means we need a new interface.
Alternatively, if we really need the lowest possible address, why do we
need to search?
-hpa
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
2010-09-27 22:50 ` H. Peter Anvin
@ 2010-09-27 23:20 ` Yinghai Lu
2010-09-27 23:26 ` H. Peter Anvin
0 siblings, 1 reply; 25+ messages in thread
From: Yinghai Lu @ 2010-09-27 23:20 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Ingo Molnar, kexec, caiqian, linux-kernel
On 09/27/2010 03:50 PM, H. Peter Anvin wrote:
> + crash_base = alignment;
> + while ((crash_base + crash_size) <= total_mem) {
> + start = memblock_find_in_range(crash_base,
> + crash_base + crash_size, crash_size, alignment);
> +
> + if (start == crash_base)
> + break;
> +
> + crash_base += alignment;
> + }
> + if (start != crash_base) {
>
> Open-coded crap violation error!
>
> Seriously, these kinds of open-coded loops are *never* acceptable, since
> they are really "let's violate the interface by making it do something
> it wasn't intended to do" -- it means we need a new interface.
>
> Alternatively, if we really need the lowest possible address, why do we
> need to search?
An x86-specific version of find_area?
Subject: [PATCH] x86, memblock: Add x86 version of memblock_find_in_range()
The generic version goes from high to low, and it seems it cannot find
a suitable area that is compact enough.
The x86 version goes from the goal to the limit, just like the way we used
for early_res.
Use ARCH_MEMBLOCK_FIND_AREA to select between them.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
arch/x86/Kconfig | 8 +++++++
arch/x86/mm/memblock.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++
mm/memblock.c | 2 -
3 files changed, 63 insertions(+), 1 deletion(-)
Index: linux-2.6/arch/x86/mm/memblock.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/memblock.c
+++ linux-2.6/arch/x86/mm/memblock.c
@@ -352,3 +352,57 @@ u64 __init memblock_x86_hole_size(u64 st
return end - start - ((u64)ram << PAGE_SHIFT);
}
+
+#ifdef CONFIG_ARCH_MEMBLOCK_FIND_AREA
+/* Check for already reserved areas */
+static inline bool __init check_with_memblock_reserved(u64 *addrp, u64 size, u64 align)
+{
+ u64 addr = *addrp;
+ bool changed = false;
+ struct memblock_region *r;
+again:
+ for_each_memblock(reserved, r) {
+ if ((addr + size) > r->base && addr < (r->base + r->size)) {
+ addr = round_up(r->base + r->size, align);
+ changed = true;
+ goto again;
+ }
+ }
+
+ if (changed)
+ *addrp = addr;
+
+ return changed;
+}
+
+/*
+ * Find a free area with specified alignment in a specific range.
+ */
+u64 __init memblock_find_in_range(u64 start, u64 end, u64 size, u64 align)
+{
+ struct memblock_region *r;
+
+ for_each_memblock(memory, r) {
+ u64 ei_start = r->base;
+ u64 ei_last = ei_start + r->size;
+ u64 addr, last;
+
+ addr = round_up(ei_start, align);
+ if (addr < start)
+ addr = round_up(start, align);
+ if (addr >= ei_last)
+ continue;
+ while (check_with_memblock_reserved(&addr, size, align) && addr+size <= ei_last)
+ ;
+ last = addr + size;
+ if (last > ei_last)
+ continue;
+ if (last > end)
+ continue;
+
+ return addr;
+ }
+
+ return MEMBLOCK_ERROR;
+}
+#endif
Index: linux-2.6/arch/x86/Kconfig
===================================================================
--- linux-2.6.orig/arch/x86/Kconfig
+++ linux-2.6/arch/x86/Kconfig
@@ -569,6 +569,14 @@ config PARAVIRT_DEBUG
Enable to debug paravirt_ops internals. Specifically, BUG if
a paravirt_op is missing when it is called.
+config ARCH_MEMBLOCK_FIND_AREA
+ default y
+ bool "Use x86 own memblock_find_in_range()"
+ ---help---
+ Use the x86 version of memblock_find_in_range() instead of the generic one; it
+ searches for a free area bottom-up from low addresses.
+ The generic one searches for a free area top-down from the limit.
+
config NO_BOOTMEM
def_bool y
Index: linux-2.6/mm/memblock.c
===================================================================
--- linux-2.6.orig/mm/memblock.c
+++ linux-2.6/mm/memblock.c
@@ -165,7 +165,7 @@ static phys_addr_t __init_memblock membl
/*
* Find a free area with specified alignment in a specific range.
*/
-u64 __init_memblock memblock_find_in_range(u64 start, u64 end, u64 size, u64 align)
+u64 __init_memblock __weak memblock_find_in_range(u64 start, u64 end, u64 size, u64 align)
{
return memblock_find_base(size, align, start, end);
}
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
2010-09-27 23:20 ` Yinghai Lu
@ 2010-09-27 23:26 ` H. Peter Anvin
2010-09-27 23:32 ` Yinghai Lu
0 siblings, 1 reply; 25+ messages in thread
From: H. Peter Anvin @ 2010-09-27 23:26 UTC (permalink / raw)
To: Yinghai Lu; +Cc: Ingo Molnar, kexec, caiqian, linux-kernel
On 09/27/2010 04:20 PM, Yinghai Lu wrote:
>
> An x86-specific version of find_area?
>
No, double no.
Same kind of crap: overloading an interface with semantics it shouldn't
have. The right thing is to introduce a new interface which carries the
explicitly needed policy with it... e.g. memblock_find_in_range_lowest().
That interface would have the explicit semantics of returning the lowest
possible address, as opposed to any suitable address (which may change
if policy requirements change.)
The other question is why does kexec need this in the first place? Is
this due to a design bug in kexec or is there some fundamental reason
for this?
-hpa
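A minimal sketch of the explicit "lowest address" semantics proposed here, using a toy model of sorted free ranges. find_in_range_lowest() is a hypothetical userspace stand-in, not the memblock API. Unlike a top-down search, whose answer can move if the allocation policy changes, the result is by definition the lowest fitting address.

```c
#include <assert.h>
#include <stdint.h>

#define FIND_ERROR UINT64_MAX /* stand-in for MEMBLOCK_ERROR */
#define MiB (1ULL << 20)

/* One free (usable and unreserved) physical range, end exclusive. */
struct range {
	uint64_t start, end;
};

/*
 * Toy bottom-up search: walk the free ranges from the lowest one up and
 * return the first align-aligned base such that [base, base + size) lies
 * inside both the free range and [start, end). Ranges must be sorted and
 * non-overlapping.
 */
uint64_t find_in_range_lowest(const struct range *avail, int n,
			      uint64_t start, uint64_t end,
			      uint64_t size, uint64_t align)
{
	assert(align && (align & (align - 1)) == 0); /* power of two */
	for (int i = 0; i < n; i++) {
		uint64_t lo = avail[i].start > start ? avail[i].start : start;
		uint64_t hi = avail[i].end < end ? avail[i].end : end;
		uint64_t base = (lo + align - 1) & ~(align - 1); /* round up */

		if (base < hi && hi - base >= size)
			return base;
	}
	return FIND_ERROR;
}
```

With a hypothetical hole at 48M to 64M, a 128M request aligned to 16M and starting at 16M returns 64M, the first window that fits; a 16M request returns 16M.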
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
2010-09-27 23:26 ` H. Peter Anvin
@ 2010-09-27 23:32 ` Yinghai Lu
2010-09-27 23:34 ` H. Peter Anvin
0 siblings, 1 reply; 25+ messages in thread
From: Yinghai Lu @ 2010-09-27 23:32 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Ingo Molnar, kexec, caiqian, linux-kernel
On 09/27/2010 04:26 PM, H. Peter Anvin wrote:
> On 09/27/2010 04:20 PM, Yinghai Lu wrote:
>>
>> An x86-specific version of find_area?
>>
>
> No, double no.
>
> Same kind of crap: overloading an interface with semantics it shouldn't
> have. The right thing is to introduce a new interface which carries the
> explicitly needed policy with it... e.g. memblock_find_in_range_lowest().
>
> That interface would have the explicit semantics of returning the lowest
> possible address, as opposed to any suitable address (which may change
> if policy requirements change.)
>
> The other question is why does kexec need this in the first place? Is
> this due to a design bug in kexec or is there some fundamental reason
> for this?
bzImage is used here, so we need a range below 4G.
Yinghai
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
2010-09-27 23:32 ` Yinghai Lu
@ 2010-09-27 23:34 ` H. Peter Anvin
2010-09-27 23:41 ` Yinghai Lu
0 siblings, 1 reply; 25+ messages in thread
From: H. Peter Anvin @ 2010-09-27 23:34 UTC (permalink / raw)
To: Yinghai Lu; +Cc: Ingo Molnar, kexec, caiqian, linux-kernel
On 09/27/2010 04:32 PM, Yinghai Lu wrote:
> On 09/27/2010 04:26 PM, H. Peter Anvin wrote:
>> On 09/27/2010 04:20 PM, Yinghai Lu wrote:
>>>
>>> An x86-specific version of find_area?
>>>
>>
>> No, double no.
>>
>> Same kind of crap: overloading an interface with semantics it shouldn't
>> have. The right thing is to introduce a new interface which carries the
>> explicitly needed policy with it... e.g. memblock_find_in_range_lowest().
>>
>> That interface would have the explicit semantics of returning the lowest
>> possible address, as opposed to any suitable address (which may change
>> if policy requirements change.)
>>
>> The other question is why does kexec need this in the first place? Is
>> this due to a design bug in kexec or is there some fundamental reason
>> for this?
>
> bzImage is used here, so we need a range below 4G.
>
OK, so why don't you cap the range to 4 GiB and then pass that down to
the existing interface? That's different from "lowest possible address".
-hpa
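The cap-then-search alternative can be modelled the same way: keep the top-down search but cap end at 4 GiB and, as a hypothetical fallback for the crashkernel=4096M case Yinghai raises in response, retry without the cap. Both functions are hypothetical userspace stand-ins over a toy map of sorted free ranges. Note that the capped search returns the highest fitting address below 4 GiB, not the lowest one.

```c
#include <assert.h>
#include <stdint.h>

#define FIND_ERROR UINT64_MAX /* stand-in for MEMBLOCK_ERROR */
#define MiB (1ULL << 20)
#define GiB (1ULL << 30)

/* One free (usable and unreserved) physical range, end exclusive. */
struct range {
	uint64_t start, end;
};

/* Toy top-down search: highest align-aligned fit inside [start, end). */
uint64_t find_top_down(const struct range *avail, int n, uint64_t start,
		       uint64_t end, uint64_t size, uint64_t align)
{
	assert(align && (align & (align - 1)) == 0); /* power of two */
	for (int i = n - 1; i >= 0; i--) {
		uint64_t lo = avail[i].start > start ? avail[i].start : start;
		uint64_t hi = avail[i].end < end ? avail[i].end : end;

		if (lo > hi || hi - lo < size)
			continue; /* window empty or too small */
		uint64_t base = (hi - size) & ~(align - 1); /* round down */
		if (base >= lo)
			return base;
	}
	return FIND_ERROR;
}

/* Prefer [start, cap); fall back to the whole address space if that fails. */
uint64_t find_capped_then_any(const struct range *avail, int n, uint64_t start,
			      uint64_t cap, uint64_t size, uint64_t align)
{
	uint64_t base = find_top_down(avail, n, start, cap, size, align);

	if (base == FIND_ERROR)
		base = find_top_down(avail, n, start, UINT64_MAX, size, align);
	return base;
}
```

With a toy map of RAM at 1M to 3G and 4G to 8G, a capped 128M request lands at 2944M, just below the 3G hole, and a 4096M request only succeeds in the uncapped second pass, at 4096M.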
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
2010-09-27 23:34 ` H. Peter Anvin
@ 2010-09-27 23:41 ` Yinghai Lu
2010-09-28 0:53 ` Vivek Goyal
0 siblings, 1 reply; 25+ messages in thread
From: Yinghai Lu @ 2010-09-27 23:41 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Ingo Molnar, kexec, caiqian, linux-kernel
On 09/27/2010 04:34 PM, H. Peter Anvin wrote:
> On 09/27/2010 04:32 PM, Yinghai Lu wrote:
>> On 09/27/2010 04:26 PM, H. Peter Anvin wrote:
>>> On 09/27/2010 04:20 PM, Yinghai Lu wrote:
>>>>
>>>> An x86-specific version of find_area?
>>>>
>>>
>>> No, double no.
>>>
>>> Same kind of crap: overloading an interface with semantics it shouldn't
>>> have. The right thing is to introduce a new interface which carries the
>>> explicitly needed policy with it... e.g. memblock_find_in_range_lowest().
>>>
>>> That interface would have the explicit semantics of returning the lowest
>>> possible address, as opposed to any suitable address (which may change
>>> if policy requirements change.)
>>>
>>> The other question is why does kexec need this in the first place? Is
>>> this due to a design bug in kexec or is there some fundamental reason
>>> for this?
>>
>> bzImage is used here, so we need a range below 4G.
>>
>
> OK, so why don't you cap the range to 4 GiB and then pass that down to
> the existing interface? That's different from "lowest possible address".
But if bzImage later uses the 64-bit entry and kexec honors it, or kexec uses a 64-bit vmlinux directly,
and crashkernel=4096M, we could get a failure again.
Maybe something like this; I will give it a try, and hope kexec doesn't have other limitations.
[PATCH -v3] x86, memblock: Fix crashkernel allocation
Cai Qian found that crashkernel is broken by the x86 memblock changes:
1. crashkernel=128M@32M always reported that the range is in use, even though the first kernel is small
and no one uses that range.
2. We always get the following report when using "kexec -p":
Could not find a free area of memory of a000 bytes...
locate_hole failed
The root cause is that the generic memblock_find_in_range() searches for a range top-down,
but crashkernel needs a range from the bottom, within the specified range.
Limit the target range to crash_base + crash_size to make sure that
we get the range from the bottom.
-v3: don't use a loop to find the low one.
Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
arch/x86/kernel/setup.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)
Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -518,17 +518,23 @@ static void __init reserve_crashkernel(v
if (crash_base <= 0) {
const unsigned long long alignment = 16<<20; /* 16M */
- crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
- alignment);
+ crash_base = memblock_find_in_range(alignment, 0xffffffff,
+ crash_size, alignment);
+
if (crash_base == MEMBLOCK_ERROR) {
- pr_info("crashkernel reservation failed - No suitable area found.\n");
- return;
+ crash_base = memblock_find_in_range(alignment,
+ ULONG_MAX, crash_size, alignment);
+
+ if (crash_base == MEMBLOCK_ERROR) {
+ pr_info("crashkernel reservation failed - No suitable area found.\n");
+ return;
+ }
}
} else {
unsigned long long start;
- start = memblock_find_in_range(crash_base, ULONG_MAX, crash_size,
- 1<<20);
+ start = memblock_find_in_range(crash_base,
+ crash_base + crash_size, crash_size, 1<<20);
if (start != crash_base) {
pr_info("crashkernel reservation failed - memory is in use.\n");
return;
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
2010-09-27 23:41 ` Yinghai Lu
@ 2010-09-28 0:53 ` Vivek Goyal
2010-09-28 2:41 ` Yinghai Lu
2010-09-28 3:46 ` H. Peter Anvin
0 siblings, 2 replies; 25+ messages in thread
From: Vivek Goyal @ 2010-09-28 0:53 UTC (permalink / raw)
To: Yinghai Lu; +Cc: linux-kernel, Ingo Molnar, kexec, caiqian, H. Peter Anvin
On Mon, Sep 27, 2010 at 04:41:31PM -0700, Yinghai Lu wrote:
> On 09/27/2010 04:34 PM, H. Peter Anvin wrote:
> > On 09/27/2010 04:32 PM, Yinghai Lu wrote:
> >> On 09/27/2010 04:26 PM, H. Peter Anvin wrote:
> >>> On 09/27/2010 04:20 PM, Yinghai Lu wrote:
> >>>>
> >>>> An x86-specific version of find_area?
> >>>>
> >>>
> >>> No, double no.
> >>>
> >>> Same kind of crap: overloading an interface with semantics it shouldn't
> >>> have. The right thing is to introduce a new interface which carries the
> >>> explicitly needed policy with it... e.g. memblock_find_in_range_lowest().
> >>>
> >>> That interface would have the explicit semantics of returning the lowest
> >>> possible address, as opposed to any suitable address (which may change
> >>> if policy requirements change.)
> >>>
> >>> The other question is why does kexec need this in the first place? Is
> >>> this due to a design bug in kexec or is there some fundamental reason
> >>> for this?
> >>
> >> bzImage is used here, so we need a range below 4G.
> >>
> >
> > OK, so why don't you cap the range to 4 GiB and then pass that down to
> > the existing interface? That's different from "lowest possible address".
>
> But if bzImage later uses the 64-bit entry and kexec honors it, or kexec uses a 64-bit vmlinux directly,
> and crashkernel=4096M, we could get a failure again.
>
> Maybe something like this; I will give it a try, and hope kexec doesn't have other limitations.
>
> [PATCH -v3] x86, memblock: Fix crashkernel allocation
>
> Cai Qian found that crashkernel is broken by the x86 memblock changes:
> 1. crashkernel=128M@32M always reported that the range is in use, even though the first kernel is small
> and no one uses that range.
> 2. We always get the following report when using "kexec -p":
> Could not find a free area of memory of a000 bytes...
> locate_hole failed
>
> The root cause is that the generic memblock_find_in_range() searches for a range top-down,
> but crashkernel needs a range from the bottom, within the specified range.
>
> Limit the target range to crash_base + crash_size to make sure that
> we get the range from the bottom.
>
> -v3: don't use a loop to find the low one.
>
> Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>
> ---
> arch/x86/kernel/setup.c | 19 ++++++++++++++-----
> 1 file changed, 14 insertions(+), 5 deletions(-)
>
> Index: linux-2.6/arch/x86/kernel/setup.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/setup.c
> +++ linux-2.6/arch/x86/kernel/setup.c
> @@ -518,17 +518,23 @@ static void __init reserve_crashkernel(v
> if (crash_base <= 0) {
> const unsigned long long alignment = 16<<20; /* 16M */
>
> - crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
> - alignment);
> + crash_base = memblock_find_in_range(alignment, 0xffffffff,
> + crash_size, alignment);
> +
Actually, hardcoding the upper limit to 4G is probably not the best idea.
Kexec loads the relocatable binary (purgatory), and I remember that
one of the generated relocation types was signed 32-bit and allowed a max value
of only 2G. So IIRC, purgatory code always needed to be loaded below 2G.
I liked HPA's other idea better, of introducing memblock_find_in_range_lowest(),
so that we search bottom-up and do not rely on a specific upper limit.
Thanks
Vivek
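The 2G figure is consistent with how a sign-extended 32-bit relocation behaves. Assuming the relocation in question is x86-64's R_X86_64_32S (an assumption; the message only says "signed 32 bit"), the stored 32-bit field is sign-extended, so an absolute address is only representable if it is below 2 GiB or in the top 2 GiB of the 64-bit space. The helper names below are hypothetical; a minimal check:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * A sign-extended 32-bit field can represent exactly the 64-bit values in
 * [-2^31, 2^31 - 1]; read as unsigned addresses that is [0, 0x7fffffff]
 * plus [0xffffffff80000000, 0xffffffffffffffff].
 */
bool fits_sign_extended_32(uint64_t value)
{
	return value <= 0x7fffffffULL || value >= 0xffffffff80000000ULL;
}

/* Does every byte of the segment [base, base + size) fit that range? */
bool segment_fits_32s(uint64_t base, uint64_t size)
{
	assert(size > 0);
	return fits_sign_extended_32(base) &&
	       fits_sign_extended_32(base + size - 1);
}
```

The 0xffffffff case is the relevant one: an address that passes a 4 GiB cap can still be out of range for such a relocation.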
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
2010-09-28 0:53 ` Vivek Goyal
@ 2010-09-28 2:41 ` Yinghai Lu
2010-09-28 3:46 ` H. Peter Anvin
1 sibling, 0 replies; 25+ messages in thread
From: Yinghai Lu @ 2010-09-28 2:41 UTC (permalink / raw)
To: Vivek Goyal; +Cc: linux-kernel, Ingo Molnar, kexec, caiqian, H. Peter Anvin
On 09/27/2010 05:53 PM, Vivek Goyal wrote:
> Actually, hardcoding the upper limit to 4G is probably not the best idea.
> Kexec loads the relocatable binary (purgatory), and I remember that
> one of the generated relocation types was signed 32-bit and allowed a max value
> of only 2G. So IIRC, purgatory code always needed to be loaded below 2G.
Also, kexec wants bzImage below 0x37ffffff.
>
> I liked HPA's other idea better, of introducing memblock_find_in_range_lowest(),
> so that we search bottom-up and do not rely on a specific upper limit.
>
Please check.
[PATCH -v4] x86, memblock: Fix crashkernel allocation
Cai Qian found that crashkernel is broken by the x86 memblock changes:
1. crashkernel=128M@32M always reported that the range is in use, even though the first kernel is small
and no one uses that range.
2. We always get the following report when using "kexec -p":
Could not find a free area of memory of a000 bytes...
locate_hole failed
The root cause is that the generic memblock_find_in_range() searches for a range top-down,
but crashkernel needs a range from the bottom, within the specified range.
Limit the target range to crash_base + crash_size to make sure that
we get the range from the bottom.
-v4: add memblock_find_in_range_lowest(), as suggested by hpa and Vivek.
Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
arch/x86/include/asm/memblock.h | 2 +
arch/x86/kernel/setup.c | 8 +++---
arch/x86/mm/memblock.c | 52 ++++++++++++++++++++++++++++++++++++++++
3 files changed, 58 insertions(+), 4 deletions(-)
Index: linux-2.6/arch/x86/mm/memblock.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/memblock.c
+++ linux-2.6/arch/x86/mm/memblock.c
@@ -352,3 +352,55 @@ u64 __init memblock_x86_hole_size(u64 st
return end - start - ((u64)ram << PAGE_SHIFT);
}
+
+/* Check for already reserved areas */
+static inline bool __init check_with_memblock_reserved(u64 *addrp, u64 size, u64 align)
+{
+ u64 addr = *addrp;
+ bool changed = false;
+ struct memblock_region *r;
+again:
+ for_each_memblock(reserved, r) {
+ if ((addr + size) > r->base && addr < (r->base + r->size)) {
+ addr = round_up(r->base + r->size, align);
+ changed = true;
+ goto again;
+ }
+ }
+
+ if (changed)
+ *addrp = addr;
+
+ return changed;
+}
+
+/*
+ * Find a free area with specified alignment in a specific range from bottom up
+ */
+u64 __init memblock_find_in_range_lowest(u64 start, u64 end, u64 size, u64 align)
+{
+ struct memblock_region *r;
+
+ for_each_memblock(memory, r) {
+ u64 ei_start = r->base;
+ u64 ei_last = ei_start + r->size;
+ u64 addr, last;
+
+ addr = round_up(ei_start, align);
+ if (addr < start)
+ addr = round_up(start, align);
+ if (addr >= ei_last)
+ continue;
+ while (check_with_memblock_reserved(&addr, size, align) && addr+size <= ei_last)
+ ;
+ last = addr + size;
+ if (last > ei_last)
+ continue;
+ if (last > end)
+ continue;
+
+ return addr;
+ }
+
+ return MEMBLOCK_ERROR;
+}
Index: linux-2.6/arch/x86/include/asm/memblock.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/memblock.h
+++ linux-2.6/arch/x86/include/asm/memblock.h
@@ -18,4 +18,6 @@ u64 memblock_x86_find_in_range_node(int
u64 memblock_x86_free_memory_in_range(u64 addr, u64 limit);
u64 memblock_x86_memory_in_range(u64 addr, u64 limit);
+u64 memblock_find_in_range_lowest(u64 start, u64 end, u64 size, u64 align);
+
#endif
Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -518,8 +518,8 @@ static void __init reserve_crashkernel(v
if (crash_base <= 0) {
const unsigned long long alignment = 16<<20; /* 16M */
- crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
- alignment);
+ crash_base = memblock_find_in_range_lowest(alignment,
+ ULONG_MAX, crash_size, alignment);
if (crash_base == MEMBLOCK_ERROR) {
pr_info("crashkernel reservation failed - No suitable area found.\n");
return;
@@ -527,8 +527,8 @@ static void __init reserve_crashkernel(v
} else {
unsigned long long start;
- start = memblock_find_in_range(crash_base, ULONG_MAX, crash_size,
- 1<<20);
+ start = memblock_find_in_range(crash_base,
+ crash_base + crash_size, crash_size, 1<<20);
if (start != crash_base) {
pr_info("crashkernel reservation failed - memory is in use.\n");
return;
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
2010-09-28 0:53 ` Vivek Goyal
2010-09-28 2:41 ` Yinghai Lu
@ 2010-09-28 3:46 ` H. Peter Anvin
2010-09-28 7:14 ` Yinghai Lu
2010-09-28 13:54 ` Vivek Goyal
1 sibling, 2 replies; 25+ messages in thread
From: H. Peter Anvin @ 2010-09-28 3:46 UTC (permalink / raw)
To: Vivek Goyal; +Cc: kexec, Ingo Molnar, Yinghai Lu, caiqian, linux-kernel
On 09/27/2010 05:53 PM, Vivek Goyal wrote:
>
> Actually, hardcoding the upper limit to 4G is probably not the best idea.
> Kexec loads the relocatable binary (purgatory), and I remember that
> one of the generated relocation types was signed 32-bit and allowed a max value
> of only 2G. So IIRC, purgatory code always needed to be loaded below 2G.
>
> I liked HPA's other idea better, of introducing memblock_find_in_range_lowest(),
> so that we search bottom-up and do not rely on a specific upper limit.
>
No, it's just another crappy hack which is broken in the same way. It's
better than open-coding, but it's still a hack.
The Right Thing[TM] to do is for kexec to communicate the topmost
address it wants to this code, so it has both the upper and the lower
boundaries available to it instead of just one.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
2010-09-28 3:46 ` H. Peter Anvin
@ 2010-09-28 7:14 ` Yinghai Lu
2010-09-28 14:01 ` Vivek Goyal
2010-09-28 13:54 ` Vivek Goyal
1 sibling, 1 reply; 25+ messages in thread
From: Yinghai Lu @ 2010-09-28 7:14 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Ingo Molnar, kexec, caiqian, Vivek Goyal, linux-kernel
On 09/27/2010 08:46 PM, H. Peter Anvin wrote:
> On 09/27/2010 05:53 PM, Vivek Goyal wrote:
>>
>> Actually, hardcoding the upper limit to 4G is probably not the best idea.
>> Kexec loads the relocatable binary (purgatory), and I remember that
>> one of the generated relocation types was signed 32-bit and allowed a max value
>> of only 2G. So IIRC, purgatory code always needed to be loaded below 2G.
>>
>> I liked HPA's other idea better, of introducing memblock_find_in_range_lowest(),
>> so that we search bottom-up and do not rely on a specific upper limit.
>>
>
> No, it's just another crappy hack which is broken in the same way. It's
> better than open-coding, but it's still a hack.
>
> The Right Thing[TM] to do is for kexec to communicate the topmost
> address it wants to this code, so it has both the upper and the lower
> boundaries available to it instead of just one.
Hope you are happy with this one.
[PATCH -v5] x86, memblock: Fix crashkernel allocation
Cai Qian found that crashkernel is broken by the x86 memblock changes:
1. crashkernel=128M@32M always reported that the range is in use, even though the first kernel is small
and no one uses that range.
2. We always get the following report when using "kexec -p":
Could not find a free area of memory of a000 bytes...
locate_hole failed
The root cause is that the generic memblock_find_in_range() searches for a range top-down,
but crashkernel needs a range from the bottom, within the specified range.
Limit the target range to crash_base + crash_size to make sure that
we get the range from the bottom.
-v5: use DEFAULT_BZIMAGE_ADDR_MAX to limit the area that could be used by bzImage,
and add a second try for vmlinux, or for new kexec-tools that will use the bzImage 64-bit entry.
Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
arch/x86/kernel/setup.c | 24 ++++++++++++++++++------
1 file changed, 18 insertions(+), 6 deletions(-)
Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -501,6 +501,7 @@ static inline unsigned long long get_tot
return total << PAGE_SHIFT;
}
+#define DEFAULT_BZIMAGE_ADDR_MAX 0x37FFFFFF
static void __init reserve_crashkernel(void)
{
unsigned long long total_mem;
@@ -518,17 +519,28 @@ static void __init reserve_crashkernel(v
if (crash_base <= 0) {
const unsigned long long alignment = 16<<20; /* 16M */
- crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
- alignment);
+ /*
+ * Assume half of crash_size is for bzImage;
+ * kexec wants bzImage below DEFAULT_BZIMAGE_ADDR_MAX.
+ */
+ crash_base = memblock_find_in_range(alignment,
+ DEFAULT_BZIMAGE_ADDR_MAX + crash_size/2,
+ crash_size, alignment);
+
if (crash_base == MEMBLOCK_ERROR) {
- pr_info("crashkernel reservation failed - No suitable area found.\n");
- return;
+ crash_base = memblock_find_in_range(alignment,
+ ULONG_MAX, crash_size, alignment);
+
+ if (crash_base == MEMBLOCK_ERROR) {
+ pr_info("crashkernel reservation failed - No suitable area found.\n");
+ return;
+ }
}
} else {
unsigned long long start;
- start = memblock_find_in_range(crash_base, ULONG_MAX, crash_size,
- 1<<20);
+ start = memblock_find_in_range(crash_base,
+ crash_base + crash_size, crash_size, 1<<20);
if (start != crash_base) {
pr_info("crashkernel reservation failed - memory is in use.\n");
return;
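The first-pass bound in -v5 is plain arithmetic, sketched below. DEFAULT_BZIMAGE_ADDR_MAX is the value from the patch; v5_first_pass_end() and highest_base_if_free() are hypothetical helpers, and the latter assumes the search returns the highest aligned base with base + size <= end and that everything below end is free, which real memory maps will not guarantee.

```c
#include <assert.h>
#include <stdint.h>

#define DEFAULT_BZIMAGE_ADDR_MAX 0x37FFFFFFULL /* value from the -v5 patch */
#define MiB (1ULL << 20)

/* Upper bound passed to the first memblock_find_in_range() call in -v5. */
uint64_t v5_first_pass_end(uint64_t crash_size)
{
	return DEFAULT_BZIMAGE_ADDR_MAX + crash_size / 2;
}

/*
 * Highest align-aligned base with base + size <= end, i.e. what a top-down
 * search would return if the whole window below end were free.
 */
uint64_t highest_base_if_free(uint64_t end, uint64_t size, uint64_t align)
{
	assert(size <= end && align && (align & (align - 1)) == 0);
	return (end - size) & ~(align - 1);
}
```

For crashkernel=128M the window ends at 0x3BFFFFFF; under those assumptions the reservation starts at 0x33000000, so its lower 64 MiB ends at 0x37000000, below DEFAULT_BZIMAGE_ADDR_MAX. The bound still depends on guessing what kexec will load, which is the kind of hardcoding Vivek objects to in his reply.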
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
2010-09-28 3:46 ` H. Peter Anvin
2010-09-28 7:14 ` Yinghai Lu
@ 2010-09-28 13:54 ` Vivek Goyal
1 sibling, 0 replies; 25+ messages in thread
From: Vivek Goyal @ 2010-09-28 13:54 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: kexec, Ingo Molnar, Yinghai Lu, caiqian, linux-kernel
On Mon, Sep 27, 2010 at 08:46:42PM -0700, H. Peter Anvin wrote:
> On 09/27/2010 05:53 PM, Vivek Goyal wrote:
> >
> > Actually, hardcoding the upper limit to 4G is probably not the best idea.
> > Kexec loads the relocatable binary (purgatory), and I remember that
> > one of the generated relocation types was signed 32-bit and allowed a max value
> > of only 2G. So IIRC, purgatory code always needed to be loaded below 2G.
> >
> > I liked HPA's other idea better, of introducing memblock_find_in_range_lowest(),
> > so that we search bottom-up and do not rely on a specific upper limit.
> >
>
> No, it's just another crappy hack which is broken in the same way. It's
> better than open-coding, but it's still a hack.
>
> The Right Thing[TM] to do is for kexec to communicate the topmost
> address it wants to this code, so it has both the upper and the lower
> boundaries available to it instead of just one.
>
Being able to specify the upper limit would be the best thing, so that the
kernel does not have to make any assumptions or hardcode anything.
The question is how to determine the upper limit.
- The upper limit will depend on what is being loaded in the reserved region.
Reserving memory using crashkernel= is a boot-time option, and at that point
in time kexec has not even run. So we don't know what the upper
limit is.
We could do an extra reboot to make it happen. Boot the first kernel without
reserving any memory. Introduce an option in kexec which tells the user what
segments kexec would like to load (for a given binary) and what
their upper memory limits are; the user then modifies the
command line and reboots the kernel.
This all does not sound very clean, especially since the upper limit might change based
on the binary being loaded, and a user might have to reboot again.
So to me, trying to get the lowest available memory for crashkernel
reservations is not that bad an idea. It is certainly better than making
hardcoded assumptions about the upper limit.
Thanks
Vivek
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
2010-09-28 7:14 ` Yinghai Lu
@ 2010-09-28 14:01 ` Vivek Goyal
0 siblings, 0 replies; 25+ messages in thread
From: Vivek Goyal @ 2010-09-28 14:01 UTC (permalink / raw)
To: Yinghai Lu; +Cc: linux-kernel, Ingo Molnar, kexec, caiqian, H. Peter Anvin
On Tue, Sep 28, 2010 at 12:14:31AM -0700, Yinghai Lu wrote:
> On 09/27/2010 08:46 PM, H. Peter Anvin wrote:
> > On 09/27/2010 05:53 PM, Vivek Goyal wrote:
> >>
> >> Actually, hardcoding the upper limit to 4G is probably not the best idea.
> >> Kexec loads the relocatable binary (purgatory), and I remember that
> >> one of the generated relocation types was signed 32-bit and allowed a max value
> >> of only 2G. So IIRC, purgatory code always needed to be loaded below 2G.
> >>
> >> I liked HPA's other idea better, of introducing memblock_find_in_range_lowest(),
> >> so that we search bottom-up and do not rely on a specific upper limit.
> >>
> >
> > No, it's just another crappy hack which is broken in the same way. It's
> > better than open-coding, but it's still a hack.
> >
> > The Right Thing[TM] to do is for kexec to communicate the topmost
> > address it wants to this code, so it has both the upper and the lower
> > boundaries available to it instead of just one.
>
> Hope you are happy with this one.
>
> [PATCH -v5] x86, memblock: Fix crashkernel allocation
>
> Cai Qian found that crashkernel is broken by the x86 memblock changes:
> 1. crashkernel=128M@32M always reported that the range is in use, even though the first kernel is small
> and no one uses that range.
> 2. We always get the following report when using "kexec -p":
> Could not find a free area of memory of a000 bytes...
> locate_hole failed
>
> The root cause is that the generic memblock_find_in_range() searches for a range top-down,
> but crashkernel needs a range from the bottom, within the specified range.
>
> Limit the target range to crash_base + crash_size to make sure that
> we get the range from the bottom.
>
> -v5: use DEFAULT_BZIMAGE_ADDR_MAX to limit the area that could be used by bzImage,
> and add a second try for vmlinux, or for new kexec-tools that will use the bzImage 64-bit entry.
>
> Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>
> ---
> arch/x86/kernel/setup.c | 24 ++++++++++++++++++------
> 1 file changed, 18 insertions(+), 6 deletions(-)
>
> Index: linux-2.6/arch/x86/kernel/setup.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/setup.c
> +++ linux-2.6/arch/x86/kernel/setup.c
> @@ -501,6 +501,7 @@ static inline unsigned long long get_tot
>  	return total << PAGE_SHIFT;
>  }
>  
> +#define DEFAULT_BZIMAGE_ADDR_MAX 0x37FFFFFF
>  static void __init reserve_crashkernel(void)
>  {
>  	unsigned long long total_mem;
> @@ -518,17 +519,28 @@ static void __init reserve_crashkernel(v
>  	if (crash_base <= 0) {
>  		const unsigned long long alignment = 16<<20;	/* 16M */
>  
> -		crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
> -				 alignment);
> +		/*
> +		 * Assume half crash_size is for bzImage
> +		 * kexec want bzImage is below DEFAULT_BZIMAGE_ADDR_MAX
> +		 */
> +		crash_base = memblock_find_in_range(alignment,
> +				DEFAULT_BZIMAGE_ADDR_MAX + crash_size/2,
> +				crash_size, alignment);
> +
IMHO, this kind of hardcoding is worse than finding the lowest possible
address. It assumes that kexec is going to load a bzImage.
So we have the following three options, sorted from best to worst:
- Specify the upper limit in the "crashkernel=" command-line syntax.
- Find the lowest possible address for crashkernel reservations.
- Hardcode the upper limit based on certain factors.
Because the upper limit depends on the image being loaded and can also vary
as kexec-tools changes, knowing it for sure would require an extra reboot. It
also makes the command-line syntax more complicated, as we would need to
introduce another field to specify the upper limit, especially for the
following case:
crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset]
So personally, I think we can stick to the second-best option, that is,
finding the lowest possible memory area.
Thanks
Vivek
end of thread
Thread overview: 25+ messages
[not found] <1909915255.2046011285586388234.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
2010-09-27 11:21 ` kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_" caiqian
2010-09-27 22:22 ` Yinghai Lu
2010-09-27 22:50 ` H. Peter Anvin
2010-09-27 23:20 ` Yinghai Lu
2010-09-27 23:26 ` H. Peter Anvin
2010-09-27 23:32 ` Yinghai Lu
2010-09-27 23:34 ` H. Peter Anvin
2010-09-27 23:41 ` Yinghai Lu
2010-09-28 0:53 ` Vivek Goyal
2010-09-28 2:41 ` Yinghai Lu
2010-09-28 3:46 ` H. Peter Anvin
2010-09-28 7:14 ` Yinghai Lu
2010-09-28 14:01 ` Vivek Goyal
2010-09-28 13:54 ` Vivek Goyal
[not found] <1346740216.2003261285553562018.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
2010-09-27 2:42 ` caiqian
2010-09-27 5:58 ` Yinghai Lu
2010-09-27 6:31 ` Yinghai Lu
2010-09-27 9:16 ` CAI Qian
[not found] <1834151968.1996101285512089968.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
2010-09-26 14:47 ` caiqian
2010-09-26 19:42 ` Yinghai Lu
[not found] <1614106428.1991831285470588200.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
2010-09-26 3:11 ` caiqian
2010-09-26 6:44 ` Yinghai Lu
2010-09-26 6:55 ` CAI Qian
2010-09-26 6:56 ` Yinghai Lu
2010-09-26 10:37 ` CAI Qian