Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"

public inbox for linux-kernel@vger.kernel.org

* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
       [not found] <870873343.2003871285555329846.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
@ 2010-09-27  6:31 ` Yinghai Lu
  2010-09-27  9:16   ` CAI Qian
  0 siblings, 1 reply; 16+ messages in thread
From: Yinghai Lu @ 2010-09-27  6:31 UTC (permalink / raw)
  To: caiqian; +Cc: kexec, H. Peter Anvin, Ingo Molnar, linux-kernel@vger.kernel.org

Please check this one on top of tip or next.

Thanks

Yinghai

[PATCH] x86, memblock: Fix crashkernel allocation

Cai Qian found that crashkernel is broken by the x86 memblock changes:
1. crashkernel=128M@32M always reports that the range is in use, even when
   the first kernel is small and nothing uses that range.
2. "kexec -p" always reports:
	Could not find a free area of memory of a000 bytes...
	locate_hole failed

The root cause is that the generic memblock_find_in_range() searches from
the top down, but crashkernel needs an area at the low end of the
specified range.

Limit the target range to crash_base + crash_size to make sure that we
get the range from the bottom.

Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/kernel/setup.c |   19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -516,19 +516,28 @@ static void __init reserve_crashkernel(v
 
 	/* 0 means: find the address automatically */
 	if (crash_base <= 0) {
+		unsigned long long start = 0;
 		const unsigned long long alignment = 16<<20;	/* 16M */
 
-		crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
-				 alignment);
-		if (crash_base == MEMBLOCK_ERROR) {
+		crash_base = alignment;
+		while (crash_base < 0xffffffff) {
+			start = memblock_find_in_range(crash_base,
+				crash_base + crash_size, crash_size, alignment);
+
+			if (start == crash_base)
+				break;
+
+			crash_base += alignment;
+		}
+		if (start != crash_base) {
 			pr_info("crashkernel reservation failed - No suitable area found.\n");
 			return;
 		}
 	} else {
 		unsigned long long start;
 
-		start = memblock_find_in_range(crash_base, ULONG_MAX, crash_size,
-				 1<<20);
+		start = memblock_find_in_range(crash_base,
+				 crash_base + crash_size, crash_size, 1<<20);
 		if (start != crash_base) {
 			pr_info("crashkernel reservation failed - memory is in use.\n");
 			return;
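A minimal userspace sketch of why the search direction matters, under a simplified model that is an assumption, not kernel code: free RAM is a sorted array of [base, end) ranges (real memblock keeps memory and reserved regions separately), and find_top_down()/find_bottom_up() are hypothetical helpers. The RAM ranges used below come from the BIOS-e820 map in the dmesg later in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model: free (unreserved) RAM as sorted, non-overlapping [base, end) ranges.
 * This is NOT the kernel's memblock API; it only illustrates search direction. */
struct free_range {
	uint64_t base;
	uint64_t end;
};

#define FIND_ERROR UINT64_MAX /* plays the role of MEMBLOCK_ERROR */

static uint64_t align_up(uint64_t x, uint64_t a)   { return (x + a - 1) / a * a; }
static uint64_t align_down(uint64_t x, uint64_t a) { return x / a * a; }

/* Top-down: start from the highest range and place the block as high as possible. */
static uint64_t find_top_down(const struct free_range *r, size_t n, uint64_t start,
			      uint64_t end, uint64_t size, uint64_t align)
{
	for (size_t i = n; i-- > 0;) {
		uint64_t lo = r[i].base > start ? r[i].base : start;
		uint64_t hi = r[i].end < end ? r[i].end : end;
		if (hi <= lo || hi - lo < size)
			continue;
		uint64_t addr = align_down(hi - size, align);
		if (addr >= lo)
			return addr;
	}
	return FIND_ERROR;
}

/* Bottom-up: start from the lowest range and place the block as low as possible. */
static uint64_t find_bottom_up(const struct free_range *r, size_t n, uint64_t start,
			       uint64_t end, uint64_t size, uint64_t align)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t lo = r[i].base > start ? r[i].base : start;
		uint64_t hi = r[i].end < end ? r[i].end : end;
		if (hi <= lo)
			continue;
		uint64_t addr = align_up(lo, align);
		if (addr < hi && hi - addr >= size)
			return addr;
	}
	return FIND_ERROR;
}
```

With a 16M alignment and a 128M request, the top-down search lands near the top of RAM above 4G (0xc98000000), while the bottom-up search returns 16M. For crashkernel=128M@32M, an unbounded top-down search never returns 32M, so the "start != crash_base" check fails; bounding the window to crash_base + crash_size, as the patch does, leaves 32M as the only candidate.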

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-27  6:31 ` Yinghai Lu
@ 2010-09-27  9:16   ` CAI Qian
  0 siblings, 0 replies; 16+ messages in thread
From: CAI Qian @ 2010-09-27  9:16 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Ingo Molnar, kexec, linux-kernel, H. Peter Anvin


----- "Yinghai Lu" <yinghai@kernel.org> wrote:

> Please check this one on top of tip or next.
This failed to apply to both trees:
[root@localhost linux-next]# patch -Np1 <memblock.patch
patching file arch/x86/kernel/setup.c
Hunk #1 FAILED at 516.
1 out of 1 hunk FAILED -- saving rejects to file arch/x86/kernel/setup.c.rej


* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
       [not found] <1909915255.2046011285586388234.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
@ 2010-09-27 11:21 ` caiqian
  2010-09-27 22:22   ` Yinghai Lu
  0 siblings, 1 reply; 16+ messages in thread
From: caiqian @ 2010-09-27 11:21 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Ingo Molnar, kexec, linux-kernel, H. Peter Anvin


----- "CAI Qian" <caiqian@redhat.com> wrote:

> ----- "Yinghai Lu" <yinghai@kernel.org> wrote:
> 
> > Please check this one on top of tip or next.
> This failed for both trees.
> [root@localhost linux-next]# patch -Np1 <memblock.patch
> patching file arch/x86/kernel/setup.c
> Hunk #1 FAILED at 516.
> 1 out of 1 hunk FAILED -- saving rejects to file arch/x86/kernel/setup.c.rej
After manually applying the patch on top of the latest mmotm tree, /proc/vmcore is no longer exported to the second kernel. This could be the result of other recent commits in mmotm, though. It said:

Warning: Core image elf header is notsane
Kdump: vmcore not initialized

Here is the dmesg from the second kernel,

Initializing cgroup subsys cpuset
Linux version 2.6.36-rc5-mm1+ (root@localhost.localdomain) (gcc version 4.4.4 20100726 (Red Hat 4.4.4-13) (GCC) ) #6 SMP Mon Sep 27 07:00:15 EDT 2010
Command line: ro root=/dev/mapper/VolGroup-lv_root rd_LVM_LV=VolGroup/lv_root rd_LVM_LV=VolGroup/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb quiet console=tty0 console=ttyS0,115200 crashkernel=128M irqpoll maxcpus=1 reset_devices cgroup_disable=memory  memmap=exactmap memmap=640K@0K memmap=130408K@32768K elfcorehdr=163176K kexec_jump_back_entry=0x000000000232f063
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000100 - 000000000009f400 (usable)
 BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000dfffb000 (usable)
 BIOS-e820: 00000000dfffb000 - 00000000e0000000 (reserved)
 BIOS-e820: 00000000fffbc000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000ca0000000 (usable)
last_pfn = 0xca0000 max_arch_pfn = 0x400000000
NX (Execute Disable) protection: active
user-defined physical RAM map:
 user: 0000000000000000 - 00000000000a0000 (usable)
 user: 0000000002000000 - 0000000009f5a000 (usable)
DMI 2.4 present.
e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
No AGP bridge found
last_pfn = 0x9f5a max_arch_pfn = 0x400000000
MTRR default type: write-back
MTRR fixed ranges enabled:
  00000-9FFFF write-back
  A0000-BFFFF uncachable
  C0000-FFFFF write-protect
MTRR variable ranges enabled:
  0 base 00E0000000 mask FFE0000000 uncachable
  1 disabled
  2 disabled
  3 disabled
  4 disabled
  5 disabled
  6 disabled
  7 disabled
PAT not supported by CPU.
found SMP MP-table at [ffff8800000f7fb0] f7fb0
initial memory mapped : 0 - 20000000
init_memory_mapping: 0000000000000000-0000000009f5a000
 0000000000 - 0009e00000 page 2M
 0009e00000 - 0009f5a000 page 4k
kernel direct mapping tables up to 9f5a000 @ 9f57000-9f5a000
RAMDISK: 09ae5000 - 09f49000
crashkernel reservation failed - No suitable area found.
ACPI: RSDP 00000000000f7f60 00014 (v00 BOCHS )
ACPI: RSDT 00000000dfffd890 00030 (v01 BOCHS  BXPCRSDT 00000001 BXPC 00000001)
ACPI: FACP 00000000dffffa30 00074 (v01 BOCHS  BXPCFACP 00000001 BXPC 00000001)
ACPI: DSDT 00000000dfffdb70 01E4B (v01   BXPC   BXDSDT 00000001 INTL 20090123)
ACPI: FACS 00000000dffff9c0 00040
ACPI: SSDT 00000000dfffda40 0012F (v01 BOCHS  BXPCSSDT 00000001 BXPC 00000001)
ACPI: APIC 00000000dfffd8c0 0010A (v01 BOCHS  BXPCAPIC 00000001 BXPC 00000001)
ACPI: Local APIC address 0xfee00000
No NUMA configuration found
Faking a node at 0000000000000000-0000000009f5a000
Initmem setup node 0 0000000000000000-0000000009f5a000
  NODE_DATA [0000000009abe000 - 0000000009ae4fff]
kvm-clock: Using msrs 12 and 11
kvm-clock: cpu 0, msr 0:28c3741, boot clock
 [ffffea0000000000-ffffea00003fffff] PMD -> [ffff880008e00000-ffff8800091fffff] on node 0
sizeof(struct page) = 56
Zone PFN ranges:
  DMA      0x00000010 -> 0x00001000
  DMA32    0x00001000 -> 0x00100000
  Normal   empty
Movable zone start PFN for each node
early_node_map[2] active PFN ranges
    0: 0x00000010 -> 0x000000a0
    0: 0x00002000 -> 0x00009f5a
On node 0 totalpages: 32746
  DMA zone: 56 pages used for memmap
  DMA zone: 7 pages reserved
  DMA zone: 81 pages, LIFO batch:0
  DMA32 zone: 502 pages used for memmap
  DMA32 zone: 32100 pages, LIFO batch:7
ACPI: PM-Timer IO Port: 0xb008
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] enabled)
ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] enabled)
ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] enabled)
ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] enabled)
ACPI: LAPIC (acpi_id[0x08] lapic_id[0x08] enabled)
ACPI: LAPIC (acpi_id[0x09] lapic_id[0x09] enabled)
ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x0a] enabled)
ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x0b] enabled)
ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x0c] enabled)
ACPI: LAPIC (acpi_id[0x0d] lapic_id[0x0d] enabled)
ACPI: LAPIC (acpi_id[0x0e] lapic_id[0x0e] enabled)
ACPI: LAPIC (acpi_id[0x0f] lapic_id[0x0f] enabled)
ACPI: LAPIC (acpi_id[0x10] lapic_id[0x10] enabled)
ACPI: LAPIC (acpi_id[0x11] lapic_id[0x11] enabled)
ACPI: LAPIC (acpi_id[0x12] lapic_id[0x12] enabled)
ACPI: LAPIC (acpi_id[0x13] lapic_id[0x13] enabled)
ACPI: IOAPIC (id[0x14] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 20, version 17, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ5 used by override.
ACPI: IRQ9 used by override.
ACPI: IRQ10 used by override.
ACPI: IRQ11 used by override.
Using ACPI (MADT) for SMP configuration information
SMP: Allowing 20 CPUs, 0 hotplug CPUs
nr_irqs_gsi: 40
PM: Registered nosave memory: 00000000000a0000 - 0000000002000000
Allocating PCI resources starting at 9f5a000 (gap: 9f5a000:f60a6000)
Booting paravirtualized kernel on KVM
setup_percpu: NR_CPUS:4096 nr_cpumask_bits:20 nr_cpu_ids:20 nr_node_ids:1
PERCPU: Embedded 29 pages/cpu @ffff880009400000 s86912 r8192 d23680 u262144
pcpu-alloc: s86912 r8192 d23680 u262144 alloc=1*2097152
pcpu-alloc: [0] 00 01 02 03 04 05 06 07 [0] 08 09 10 11 12 13 14 15 
pcpu-alloc: [0] 16 17 18 19 -- -- -- -- 
kvm-clock: cpu 0, msr 0:9414741, primary cpu clock
Built 1 zonelists in Node order, mobility grouping on.  Total pages: 32181
Policy zone: DMA32
Kernel command line: ro root=/dev/mapper/VolGroup-lv_root rd_LVM_LV=VolGroup/lv_root rd_LVM_LV=VolGroup/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb quiet console=tty0 console=ttyS0,115200 crashkernel=128M irqpoll maxcpus=1 reset_devices cgroup_disable=memory  memmap=exactmap memmap=640K@0K memmap=130408K@32768K elfcorehdr=163176K kexec_jump_back_entry=0x000000000232f063
Misrouted IRQ fixup and polling support enabled
This may significantly impact system performance
Disabling memory control group subsystem
PID hash table entries: 512 (order: 0, 4096 bytes)
Checking aperture...
No AGP bridge found
Memory: 103484k/163176k available (4267k kernel code, 32192k absent, 27500k reserved, 4617k data, 2484k init)
Hierarchical RCU implementation.
	RCU-based detection of stalled CPUs is disabled.
	Verbose stalled-CPUs detection is disabled.
NR_IRQS:262400 nr_irqs:840
Spurious LAPIC timer interrupt on cpu 0
Console: colour VGA+ 80x25
console [tty0] enabled
console [ttyS0] enabled
Detected 1995.358 MHz processor.
Calibrating delay loop (skipped) preset value.. 3990.71 BogoMIPS (lpj=1995358)
pid_max: default: 32768 minimum: 301
Security Framework initialized
SELinux:  Initializing.
SELinux:  Starting in permissive mode
Dentry cache hash table entries: 16384 (order: 5, 131072 bytes)
Inode-cache hash table entries: 8192 (order: 4, 65536 bytes)
Mount-cache hash table entries: 256
Initializing cgroup subsys ns
Initializing cgroup subsys cpuacct
Initializing cgroup subsys memory
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
Initializing cgroup subsys net_cls
mce: CPU supports 10 MCE banks
Performance Events: p6 PMU driver.
... version:                0
... bit width:              32
... generic registers:      2
... value mask:             00000000ffffffff
... max period:             000000007fffffff
... fixed-purpose events:   0
... event mask:             0000000000000003
SMP alternatives: switching to UP code
ACPI: Core revision 20100702
Setting APIC routing to physical flat
..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
CPU0: Intel QEMU Virtual CPU version (cpu64-rhel6) stepping 03
Brought up 1 CPUs
Total of 1 processors activated (3990.71 BogoMIPS).
devtmpfs: initialized
regulator: core version 0.5
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using configuration type 1 for base access
bio: create slab <bio-0> at 0
IRQ 9: starting IRQFIXUP_POLL
ACPI: EC: Look up EC in DSDT
ACPI: Interpreter enabled
ACPI: (supports S0 S3 S4 S5)
ACPI: Using IOAPIC for interrupt routing
ACPI: No dock devices found.
PCI: Ignoring host bridge windows from ACPI; if necessary, use "pci=use_crs" and report a bug
ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
pci_root PNP0A03:00: host bridge window [io  0x0000-0x0cf7] (ignored)
pci_root PNP0A03:00: host bridge window [io  0x0d00-0xffff] (ignored)
pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff] (ignored)
pci_root PNP0A03:00: host bridge window [mem 0xe0000000-0xfebfffff] (ignored)
pci 0000:00:01.1: reg 20: [io  0xc000-0xc00f]
pci 0000:00:01.2: reg 20: [io  0xc020-0xc03f]
pci 0000:00:01.3: quirk: [io  0xb000-0xb03f] claimed by PIIX4 ACPI
pci 0000:00:01.3: quirk: [io  0xb100-0xb10f] claimed by PIIX4 SMB
pci 0000:00:02.0: reg 10: [mem 0xf0000000-0xf1ffffff pref]
pci 0000:00:02.0: reg 14: [mem 0xf2000000-0xf2000fff]
pci 0000:00:02.0: reg 30: [mem 0xf2010000-0xf201ffff pref]
pci 0000:00:03.0: reg 10: [io  0xc100-0xc1ff]
pci 0000:00:03.0: reg 14: [mem 0xf2020000-0xf20200ff]
pci 0000:00:03.0: reg 30: [mem 0xf2030000-0xf203ffff pref]
pci 0000:00:04.0: reg 10: [io  0xc400-0xc7ff]
pci 0000:00:04.0: reg 14: [io  0xc800-0xc8ff]
pci 0000:00:05.0: reg 10: [io  0xc900-0xc91f]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
ACPI: PCI Interrupt Link [LNKB] (IRQs 5 10 11) *0, disabled.
ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
vgaarb: loaded
SCSI subsystem initialized
libata version 3.00 loaded.
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
PCI: Using ACPI for IRQ routing
PCI: pci_cache_line_size set to 64 bytes
reserve RAM buffer: 0000000009f5a000 - 000000000bffffff 
NetLabel: Initializing
NetLabel:  domain hash size = 128
NetLabel:  protocols = UNLABELED CIPSOv4
NetLabel:  unlabeled traffic allowed by default
Switching to clocksource kvm-clock
pnp: PnP ACPI init
ACPI: bus type pnp registered
pnp: PnP ACPI: found 6 devices
ACPI: ACPI bus type pnp unregistered
pci_bus 0000:00: resource 0 [io  0x0000-0xffff]
pci_bus 0000:00: resource 1 [mem 0x00000000-0xffffffffffffffff]
NET: Registered protocol family 2
IP route cache hash table entries: 1024 (order: 1, 8192 bytes)
TCP established hash table entries: 4096 (order: 4, 65536 bytes)
TCP bind hash table entries: 4096 (order: 4, 65536 bytes)
TCP: Hash tables configured (established 4096 bind 4096)
TCP reno registered
UDP hash table entries: 128 (order: 0, 4096 bytes)
UDP-Lite hash table entries: 128 (order: 0, 4096 bytes)
NET: Registered protocol family 1
pci 0000:00:00.0: Limiting direct PCI/PCI transfers
pci 0000:00:01.0: Activating ISA DMA hang workarounds
pci 0000:00:02.0: Boot video device
PCI: CLS 64 bytes, default 64
Trying to unpack rootfs image as initramfs...
Freeing initrd memory: 4496k freed
audit: initializing netlink socket (disabled)
type=2000 audit(1285586109.207:1): initialized
HugeTLB registered 2 MB page size, pre-allocated 0 pages
VFS: Disk quotas dquot_6.5.2
Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
Warning: Core image elf header is notsane
Kdump: vmcore not initialized
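The memmap= and elfcorehdr= values on the second kernel's command line above are in KiB. A small sketch, using hypothetical helpers memmap_kib() and fits(), converts them to byte addresses: 640K@0K is 0x0-0xa0000 and 130408K@32768K is 0x2000000-0x9f5a000, matching the "user:" lines of the map, while elfcorehdr=163176K points at 0x9f5a000, just past the usable range. It also checks that a 16M-aligned 128M block cannot fit in either usable range, which is consistent with the "crashkernel reservation failed - No suitable area found." line in this second-kernel log.

```c
#include <stdint.h>

/* Byte range [start, end) derived from a memmap=SIZE@START option given in KiB. */
struct byte_range {
	uint64_t start;
	uint64_t end;
};

static struct byte_range memmap_kib(uint64_t size_kib, uint64_t start_kib)
{
	struct byte_range r = { start_kib * 1024, (start_kib + size_kib) * 1024 };
	return r;
}

/* Can a block of `size` bytes, starting on an `align` boundary, fit inside r? */
static int fits(struct byte_range r, uint64_t size, uint64_t align)
{
	uint64_t addr = (r.start + align - 1) / align * align;
	return addr < r.end && r.end - addr >= size;
}
```

The main usable range is 130408K, about 127.4M, which is already smaller than the 128M request before alignment is even considered.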


* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-27 11:21 ` kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_" caiqian
@ 2010-09-27 22:22   ` Yinghai Lu
  2010-09-27 22:50     ` H. Peter Anvin
  0 siblings, 1 reply; 16+ messages in thread
From: Yinghai Lu @ 2010-09-27 22:22 UTC (permalink / raw)
  To: caiqian; +Cc: Ingo Molnar, kexec, linux-kernel, H. Peter Anvin

[-- Attachment #1: Type: text/plain, Size: 2632 bytes --]

On 09/27/2010 04:21 AM, caiqian@redhat.com wrote:
> 
> ----- "CAI Qian" <caiqian@redhat.com> wrote:
> 
>> ----- "Yinghai Lu" <yinghai@kernel.org> wrote:
>>
>>> Please check this one on top of tip or next.
>> This failed for both trees.
>> [root@localhost linux-next]# patch -Np1 <memblock.patch
>> patching file arch/x86/kernel/setup.c
>> Hunk #1 FAILED at 516.
>> 1 out of 1 hunk FAILED -- saving rejects to file arch/x86/kernel/setup.c.rej
> After manually applying the patch on top of the latest mmotm tree, /proc/vmcore is no longer exported to the second kernel. This could be the result of other recent commits in mmotm, though. It said:
> 
> Warning: Core image elf header is notsane
> Kdump: vmcore not initialized
> 
> Here is the dmesg from the second kernel,
> 
> Initializing cgroup subsys cpuset
> Linux version 2.6.36-rc5-mm1+ (root@localhost.localdomain) (gcc version 4.4.4 20100726 (Red Hat 4.4.4-13) (GCC) ) #6 SMP Mon Sep 27 07:00:15 EDT 2010
> Command line: ro root=/dev/mapper/VolGroup-lv_root rd_LVM_LV=VolGroup/lv_root rd_LVM_LV=VolGroup/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb quiet console=tty0 console=ttyS0,115200 crashkernel=128M irqpoll maxcpus=1 reset_devices cgroup_disable=memory  memmap=exactmap memmap=640K@0K memmap=130408K@32768K elfcorehdr=163176K kexec_jump_back_entry=0x000000000232f063
> BIOS-provided physical RAM map:
>  BIOS-e820: 0000000000000100 - 000000000009f400 (usable)
>  BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
>  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
>  BIOS-e820: 0000000000100000 - 00000000dfffb000 (usable)
>  BIOS-e820: 00000000dfffb000 - 00000000e0000000 (reserved)
>  BIOS-e820: 00000000fffbc000 - 0000000100000000 (reserved)
>  BIOS-e820: 0000000100000000 - 0000000ca0000000 (usable)
> last_pfn = 0xca0000 max_arch_pfn = 0x400000000
> NX (Execute Disable) protection: active
> user-defined physical RAM map:
>  user: 0000000000000000 - 00000000000a0000 (usable)
>  user: 0000000002000000 - 0000000009f5a000 (usable)
...

> Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
> Warning: Core image elf header is notsane
> Kdump: vmcore not initialized
> 
>>

It should work on tip; I tested it on RHEL 6.0 beta
with
/etc/init.d/kdump restart

BTW, the second kernel is not supposed to get crashkernel=128M again.
The /etc/init.d/kdump script removes it when it builds the second
kernel's command line from /proc/cmdline.

Please refer to
http://people.redhat.com/mingo/tip.git/readme.txt
to get tip/master,

and apply the attached patch:
cat crashkernel_limit.patch | patch -p1

Thanks

Yinghai


[-- Attachment #2: crashkernel_limit.patch --]
[-- Type: text/x-patch, Size: 2230 bytes --]

[PATCH -v2] x86, memblock: Fix crashkernel allocation

Cai Qian found that crashkernel is broken by the x86 memblock changes:
1. crashkernel=128M@32M always reports that the range is in use, even when
   the first kernel is small and nothing uses that range.
2. "kexec -p" always reports:
	Could not find a free area of memory of a000 bytes...
	locate_hole failed

The root cause is that the generic memblock_find_in_range() searches from
the top down, but crashkernel needs an area at the low end of the
specified range.

Limit the target range to crash_base + crash_size to make sure that we
get the range from the bottom.

-v2: don't limit it to 0xffffffff, in case kexec uses the 64-bit bzImage
     entry point or vmlinux and tries to allocate a huge area for crashkernel.

Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/kernel/setup.c |   19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -516,19 +516,28 @@ static void __init reserve_crashkernel(v
 
 	/* 0 means: find the address automatically */
 	if (crash_base <= 0) {
+		unsigned long long start = 0;
 		const unsigned long long alignment = 16<<20;	/* 16M */
 
-		crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
-				 alignment);
-		if (crash_base == MEMBLOCK_ERROR) {
+		crash_base = alignment;
+		while ((crash_base + crash_size) <= total_mem) {
+			start = memblock_find_in_range(crash_base,
+				crash_base + crash_size, crash_size, alignment);
+
+			if (start == crash_base)
+				break;
+
+			crash_base += alignment;
+		}
+		if (start != crash_base) {
 			pr_info("crashkernel reservation failed - No suitable area found.\n");
 			return;
 		}
 	} else {
 		unsigned long long start;
 
-		start = memblock_find_in_range(crash_base, ULONG_MAX, crash_size,
-				 1<<20);
+		start = memblock_find_in_range(crash_base,
+				 crash_base + crash_size, crash_size, 1<<20);
 		if (start != crash_base) {
 			pr_info("crashkernel reservation failed - memory is in use.\n");
 			return;
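A userspace sketch of the shape of the -v2 loop, assuming toy region arrays in place of memblock and a plain total_mem parameter; probe_window() and lowest_crash_base() are hypothetical names. Because each probe asks for a window exactly crash_size bytes long, the only possible answer is crash_base itself, so each probe reduces to a yes/no check.

```c
#include <stddef.h>
#include <stdint.h>

struct region {
	uint64_t base;
	uint64_t size;
};

#define FIND_ERROR UINT64_MAX /* plays the role of MEMBLOCK_ERROR */

static int overlaps(uint64_t a, uint64_t asz, const struct region *r)
{
	return a < r->base + r->size && r->base < a + asz;
}

/* Toy stand-in for memblock_find_in_range(base, base + size, size, align):
 * succeeds only if [base, base + size) lies inside one RAM region and
 * hits no reserved region. */
static uint64_t probe_window(const struct region *ram, size_t nram,
			     const struct region *rsv, size_t nrsv,
			     uint64_t base, uint64_t size)
{
	int in_ram = 0;

	for (size_t i = 0; i < nram; i++)
		if (base >= ram[i].base && base + size <= ram[i].base + ram[i].size)
			in_ram = 1;
	if (!in_ram)
		return FIND_ERROR;
	for (size_t i = 0; i < nrsv; i++)
		if (overlaps(base, size, &rsv[i]))
			return FIND_ERROR;
	return base;
}

/* Step crash_base by `align` until a probe succeeds or the window would
 * pass total_mem, mirroring the loop in the patch. */
static uint64_t lowest_crash_base(const struct region *ram, size_t nram,
				  const struct region *rsv, size_t nrsv,
				  uint64_t size, uint64_t align, uint64_t total_mem)
{
	for (uint64_t base = align; base + size <= total_mem; base += align)
		if (probe_window(ram, nram, rsv, nrsv, base, size) == base)
			return base;
	return FIND_ERROR;
}
```

With a reservation at 16M-40M, the loop rejects 16M and 32M and settles on 48M. Each step is a full search call over the region lists, one per 16M of address space, which is the open-coded pattern objected to in the reply below.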


* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-27 22:22   ` Yinghai Lu
@ 2010-09-27 22:50     ` H. Peter Anvin
  2010-09-27 23:20       ` Yinghai Lu
  0 siblings, 1 reply; 16+ messages in thread
From: H. Peter Anvin @ 2010-09-27 22:50 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: caiqian, Ingo Molnar, kexec, linux-kernel

+		crash_base = alignment;
+		while ((crash_base + crash_size) <= total_mem) {
+			start = memblock_find_in_range(crash_base,
+				crash_base + crash_size, crash_size, alignment);
+
+			if (start == crash_base)
+				break;
+
+			crash_base += alignment;
+		}
+		if (start != crash_base) {

Open-coded crap violation error!

Seriously, these kinds of open-coded loops are *never* acceptable, since
they are really "let's violate the interface by making it do something
it wasn't intended to do" -- it means we need a new interface.

Alternatively, if we really need the lowest possible address, why do we
need to search?

	-hpa


* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-27 22:50     ` H. Peter Anvin
@ 2010-09-27 23:20       ` Yinghai Lu
  2010-09-27 23:26         ` H. Peter Anvin
  0 siblings, 1 reply; 16+ messages in thread
From: Yinghai Lu @ 2010-09-27 23:20 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: caiqian, Ingo Molnar, kexec, linux-kernel

On 09/27/2010 03:50 PM, H. Peter Anvin wrote:
> +		crash_base = alignment;
> +		while ((crash_base + crash_size) <= total_mem) {
> +			start = memblock_find_in_range(crash_base,
> +				crash_base + crash_size, crash_size, alignment);
> +
> +			if (start == crash_base)
> +				break;
> +
> +			crash_base += alignment;
> +		}
> +		if (start != crash_base) {
> 
> Open-coded crap violation error!
> 
> Seriously, these kinds of open-coded loops are *never* acceptable, since
> they are really "let's violate the interface by making it do something
> it wasn't intended to do" -- it means we need a new interface.
> 
> Alternatively, if we really need the lowest possible address, why do we
> need to search?

An x86-specific version of find_area?

Subject: [PATCH] x86, memblock: Add x86 version of memblock_find_in_range()

The generic version searches from high to low, and it seems it cannot find
a sufficiently compact area.

The x86 version searches from goal to limit, just like the way we did it
for early_res.

Use CONFIG_ARCH_MEMBLOCK_FIND_AREA to select between them.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/Kconfig       |    8 +++++++
 arch/x86/mm/memblock.c |   54 +++++++++++++++++++++++++++++++++++++++++++++++++
 mm/memblock.c          |    2 -
 3 files changed, 63 insertions(+), 1 deletion(-)

Index: linux-2.6/arch/x86/mm/memblock.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/memblock.c
+++ linux-2.6/arch/x86/mm/memblock.c
@@ -352,3 +352,57 @@ u64 __init memblock_x86_hole_size(u64 st
 
 	return end - start - ((u64)ram << PAGE_SHIFT);
 }
+
+#ifdef CONFIG_ARCH_MEMBLOCK_FIND_AREA
+/* Check for already reserved areas */
+static inline bool __init check_with_memblock_reserved(u64 *addrp, u64 size, u64 align)
+{
+	u64 addr = *addrp;
+	bool changed = false;
+	struct memblock_region *r;
+again:
+	for_each_memblock(reserved, r) {
+		if ((addr + size) > r->base && addr < (r->base + r->size)) {
+			addr = round_up(r->base + r->size, align);
+			changed = true;
+			goto again;
+		}
+	}
+
+	if (changed)
+		*addrp = addr;
+
+	return changed;
+}
+
+/*
+ * Find a free area with specified alignment in a specific range.
+ */
+u64 __init memblock_find_in_range(u64 start, u64 end, u64 size, u64 align)
+{
+	struct memblock_region *r;
+
+	for_each_memblock(memory, r) {
+		u64 ei_start = r->base;
+		u64 ei_last = ei_start + r->size;
+		u64 addr, last;
+
+		addr = round_up(ei_start, align);
+		if (addr < start)
+			addr = round_up(start, align);
+		if (addr >= ei_last)
+			continue;
+		while (check_with_memblock_reserved(&addr, size, align) && addr+size <= ei_last)
+			;
+		last = addr + size;
+		if (last > ei_last)
+			continue;
+		if (last > end)
+			continue;
+
+		return addr;
+	}
+
+	return MEMBLOCK_ERROR;
+}
+#endif
Index: linux-2.6/arch/x86/Kconfig
===================================================================
--- linux-2.6.orig/arch/x86/Kconfig
+++ linux-2.6/arch/x86/Kconfig
@@ -569,6 +569,14 @@ config PARAVIRT_DEBUG
 	  Enable to debug paravirt_ops internals.  Specifically, BUG if
 	  a paravirt_op is missing when it is called.
 
+config ARCH_MEMBLOCK_FIND_AREA
+	default y
+	bool "Use x86 own memblock_find_in_range()"
+	---help---
+	  Use the x86 version of memblock_find_in_range() instead of the
+	  generic one. The x86 version searches for a free area upward from
+	  the low end; the generic one searches downward from the limit.
+
 config NO_BOOTMEM
 	def_bool y
 
Index: linux-2.6/mm/memblock.c
===================================================================
--- linux-2.6.orig/mm/memblock.c
+++ linux-2.6/mm/memblock.c
@@ -165,7 +165,7 @@ static phys_addr_t __init_memblock membl
 /*
  * Find a free area with specified alignment in a specific range.
  */
-u64 __init_memblock memblock_find_in_range(u64 start, u64 end, u64 size, u64 align)
+u64 __init_memblock __weak memblock_find_in_range(u64 start, u64 end, u64 size, u64 align)
 {
 	return memblock_find_base(size, align, start, end);
 }
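For readers following the allocator change: the x86 version above searches bottom-up, walking the memory regions in ascending order and bumping the candidate address past any overlapping reserved region. The following is a minimal userspace sketch of that first-fit, bottom-up idea, not the kernel code. The `struct region` arrays, `find_in_range_lowest()`, `round_up_pow2()`, `skip_reserved()`, and `FIND_ERROR` are hypothetical stand-ins for memblock's memory/reserved lists, the patch's function, `round_up()`, `check_with_memblock_reserved()`, and `MEMBLOCK_ERROR`. It assumes the memory regions are sorted by base address and that `end` bounds the end of the allocation, as the `last > end` check in the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIND_ERROR UINT64_MAX	/* stands in for MEMBLOCK_ERROR */

struct region {			/* one memblock-style [base, base + size) range */
	uint64_t base;
	uint64_t size;
};

/* Round x up to a multiple of align; align must be a power of two. */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/*
 * Push *addrp past every reserved region overlapping [*addrp, *addrp + size).
 * After each move the scan restarts, since the new address may now overlap
 * a region that was already checked (the patch uses "goto again" for this).
 */
static void skip_reserved(uint64_t *addrp, uint64_t size, uint64_t align,
			  const struct region *rsv, size_t nrsv)
{
	size_t i = 0;

	while (i < nrsv) {
		uint64_t rend = rsv[i].base + rsv[i].size;

		if (*addrp + size > rsv[i].base && *addrp < rend) {
			*addrp = round_up_pow2(rend, align);
			i = 0;
		} else {
			i++;
		}
	}
}

/*
 * Return the lowest align-aligned addr >= start such that [addr, addr + size)
 * lies inside one memory region, overlaps no reserved region, and ends at or
 * below end. Memory regions must be sorted by base. Returns FIND_ERROR if none.
 */
static uint64_t find_in_range_lowest(uint64_t start, uint64_t end,
				     uint64_t size, uint64_t align,
				     const struct region *mem, size_t nmem,
				     const struct region *rsv, size_t nrsv)
{
	for (size_t i = 0; i < nmem; i++) {
		uint64_t mend = mem[i].base + mem[i].size;
		uint64_t lo = mem[i].base > start ? mem[i].base : start;
		uint64_t addr = round_up_pow2(lo, align);

		if (addr >= mend)
			continue;
		skip_reserved(&addr, size, align, rsv, nrsv);
		if (addr + size > mend || addr + size > end)
			continue;	/* try the next (higher) memory region */
		return addr;
	}
	return FIND_ERROR;
}
```

With RAM from 1M to 1G and 16M to 32M reserved, a 128M, 16M-aligned request starting at 16M lands at 32M, just past the reservation. Restarting the reserved-region scan after every move is what keeps the result correct when the reserved list is not sorted.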

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-27 23:20       ` Yinghai Lu
@ 2010-09-27 23:26         ` H. Peter Anvin
  2010-09-27 23:32           ` Yinghai Lu
  0 siblings, 1 reply; 16+ messages in thread
From: H. Peter Anvin @ 2010-09-27 23:26 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: caiqian, Ingo Molnar, kexec, linux-kernel

On 09/27/2010 04:20 PM, Yinghai Lu wrote:
> 
> x86 own version for find_area?
> 

No, double no.

Same kind of crap: overloading an interface with semantics it shouldn't
have.  The right thing is to introduce a new interface which carries the
explicitly needed policy with it... e.g. memblock_find_in_range_lowest().

That interface would have the explicit semantics of returning the lowest
possible address, as opposed to any suitable address (which may change
if policy requirements change.)

The other question is why does kexec need this in the first place?  Is
this due to a design bug in kexec or is there some fundamental reason
for this?

	-hpa

* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-27 23:26         ` H. Peter Anvin
@ 2010-09-27 23:32           ` Yinghai Lu
  2010-09-27 23:34             ` H. Peter Anvin
  0 siblings, 1 reply; 16+ messages in thread
From: Yinghai Lu @ 2010-09-27 23:32 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: caiqian, Ingo Molnar, kexec, linux-kernel

On 09/27/2010 04:26 PM, H. Peter Anvin wrote:
> On 09/27/2010 04:20 PM, Yinghai Lu wrote:
>>
>> x86 own version for find_area?
>>
> 
> No, double no.
> 
> Same kind of crap: overloading an interface with semantics it shouldn't
> have.  The right thing is to introduce a new interface with carries the
> explicitly needed policy with it... e.g. memblock_find_in_range_lowest().
> 
> That interface would have the explicit semantics of returning the lowest
> possible address, as opposed to any suitable address (which may change
> if policy requirements change.)
> 
> The other question is why does kexec need this in the first place?  Is
> this due to a design bug in kexec or is there some fundamental reason
> for this?

bzImage is used here, so we need a range below 4G.

Yinghai

* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-27 23:32           ` Yinghai Lu
@ 2010-09-27 23:34             ` H. Peter Anvin
  2010-09-27 23:41               ` Yinghai Lu
  0 siblings, 1 reply; 16+ messages in thread
From: H. Peter Anvin @ 2010-09-27 23:34 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: caiqian, Ingo Molnar, kexec, linux-kernel

On 09/27/2010 04:32 PM, Yinghai Lu wrote:
> On 09/27/2010 04:26 PM, H. Peter Anvin wrote:
>> On 09/27/2010 04:20 PM, Yinghai Lu wrote:
>>>
>>> x86 own version for find_area?
>>>
>>
>> No, double no.
>>
>> Same kind of crap: overloading an interface with semantics it shouldn't
>> have.  The right thing is to introduce a new interface with carries the
>> explicitly needed policy with it... e.g. memblock_find_in_range_lowest().
>>
>> That interface would have the explicit semantics of returning the lowest
>> possible address, as opposed to any suitable address (which may change
>> if policy requirements change.)
>>
>> The other question is why does kexec need this in the first place?  Is
>> this due to a design bug in kexec or is there some fundamental reason
>> for this?
> 
> bzImage is used here. so need range below 4g.
> 

OK, so why don't you cap the range to 4 GiB and then pass that down to
the existing interface?  That's different from "lowest possible address".

	-hpa


* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-27 23:34             ` H. Peter Anvin
@ 2010-09-27 23:41               ` Yinghai Lu
  2010-09-28  0:53                 ` Vivek Goyal
  0 siblings, 1 reply; 16+ messages in thread
From: Yinghai Lu @ 2010-09-27 23:41 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: caiqian, Ingo Molnar, kexec, linux-kernel

On 09/27/2010 04:34 PM, H. Peter Anvin wrote:
> On 09/27/2010 04:32 PM, Yinghai Lu wrote:
>> On 09/27/2010 04:26 PM, H. Peter Anvin wrote:
>>> On 09/27/2010 04:20 PM, Yinghai Lu wrote:
>>>>
>>>> x86 own version for find_area?
>>>>
>>>
>>> No, double no.
>>>
>>> Same kind of crap: overloading an interface with semantics it shouldn't
>>> have.  The right thing is to introduce a new interface with carries the
>>> explicitly needed policy with it... e.g. memblock_find_in_range_lowest().
>>>
>>> That interface would have the explicit semantics of returning the lowest
>>> possible address, as opposed to any suitable address (which may change
>>> if policy requirements change.)
>>>
>>> The other question is why does kexec need this in the first place?  Is
>>> this due to a design bug in kexec or is there some fundamental reason
>>> for this?
>>
>> bzImage is used here. so need range below 4g.
>>
> 
> OK, so why don't you cap the range to 4 GiB and then pass that down to
> the existing interface?  That's different from "lowest possible address".

But if bzImage later uses its 64-bit entry point and kexec honors it, or a
64-bit vmlinux is used directly, then with crashkernel=4096M we could get a
failure again.

Maybe something like this; I will give it a try. Hopefully kexec doesn't
have other limitations.

[PATCH -v3] x86, memblock: Fix crashkernel allocation

Cai Qian found that crashkernel is broken with the x86 memblock changes:
1. crashkernel=128M@32M always reported that the range was in use, even when
   the first kernel was small and nothing was using that range.
2. "kexec -p" always printed the following:
	Could not find a free area of memory of a000 bytes...
	locate_hole failed

The root cause is that the generic memblock_find_in_range() searches for a
range top-down, but crashkernel needs a range near the bottom of memory,
within a specified window.

Limit the target range to crash_base + crash_size to make sure that we get
the range from the bottom.

-v3: don't use a loop to find the low one

Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/kernel/setup.c |   19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -518,17 +518,23 @@ static void __init reserve_crashkernel(v
 	if (crash_base <= 0) {
 		const unsigned long long alignment = 16<<20;	/* 16M */
 
-		crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
-				 alignment);
+		crash_base = memblock_find_in_range(alignment, 0xffffffff,
+				crash_size, alignment);
+
 		if (crash_base == MEMBLOCK_ERROR) {
-			pr_info("crashkernel reservation failed - No suitable area found.\n");
-			return;
+			crash_base = memblock_find_in_range(alignment,
+					 ULONG_MAX, crash_size, alignment);
+
+			if (crash_base == MEMBLOCK_ERROR) {
+				pr_info("crashkernel reservation failed - No suitable area found.\n");
+				return;
+			}
 		}
 	} else {
 		unsigned long long start;
 
-		start = memblock_find_in_range(crash_base, ULONG_MAX, crash_size,
-				 1<<20);
+		start = memblock_find_in_range(crash_base,
+				 crash_base + crash_size, crash_size, 1<<20);
 		if (start != crash_base) {
 			pr_info("crashkernel reservation failed - memory is in use.\n");
 			return;
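Why did crashkernel=128M@32M always report "memory is in use"? The fixed-address branch asks the finder for a range starting at or above crash_base and fails unless it gets exactly crash_base back. A top-down finder returns the highest fit, so with an end of ULONG_MAX it almost never returns crash_base; capping the end at crash_base + crash_size leaves only one candidate. The sketch below models this with a single free window. `find_top_down()` and `crash_range_usable()` are hypothetical helpers, not memblock functions, and real memblock walks many regions.

```c
#include <assert.h>
#include <stdint.h>

/* Round x down to a multiple of align; align must be a power of two. */
static uint64_t round_down_pow2(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return x & ~(align - 1);
}

/*
 * Hypothetical top-down search over one free window [free_lo, free_hi):
 * return the highest aligned addr >= start with [addr, addr + size) inside
 * the window and ending at or below end, or UINT64_MAX if nothing fits.
 */
static uint64_t find_top_down(uint64_t free_lo, uint64_t free_hi,
			      uint64_t start, uint64_t end,
			      uint64_t size, uint64_t align)
{
	uint64_t lo = free_lo > start ? free_lo : start;
	uint64_t hi = free_hi < end ? free_hi : end;
	uint64_t addr;

	if (hi < size || hi - size < lo)
		return UINT64_MAX;
	addr = round_down_pow2(hi - size, align);
	return addr >= lo ? addr : UINT64_MAX;
}

/* The fixed-address check in reserve_crashkernel(): only an exact hit counts. */
static int crash_range_usable(uint64_t free_lo, uint64_t free_hi,
			      uint64_t crash_base, uint64_t crash_size,
			      uint64_t end)
{
	return find_top_down(free_lo, free_hi, crash_base, end,
			     crash_size, 1 << 20) == crash_base;
}
```

With 16M to 1G free, an uncapped end yields 0x38000000 (896M) rather than 32M, so the check fails even though 32M is free; with end = crash_base + crash_size the only candidate is 32M itself.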

* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-27 23:41               ` Yinghai Lu
@ 2010-09-28  0:53                 ` Vivek Goyal
  2010-09-28  2:41                   ` Yinghai Lu
  2010-09-28  3:46                   ` H. Peter Anvin
  0 siblings, 2 replies; 16+ messages in thread
From: Vivek Goyal @ 2010-09-28  0:53 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: H. Peter Anvin, Ingo Molnar, kexec, caiqian, linux-kernel

On Mon, Sep 27, 2010 at 04:41:31PM -0700, Yinghai Lu wrote:
> On 09/27/2010 04:34 PM, H. Peter Anvin wrote:
> > On 09/27/2010 04:32 PM, Yinghai Lu wrote:
> >> On 09/27/2010 04:26 PM, H. Peter Anvin wrote:
> >>> On 09/27/2010 04:20 PM, Yinghai Lu wrote:
> >>>>
> >>>> x86 own version for find_area?
> >>>>
> >>>
> >>> No, double no.
> >>>
> >>> Same kind of crap: overloading an interface with semantics it shouldn't
> >>> have.  The right thing is to introduce a new interface with carries the
> >>> explicitly needed policy with it... e.g. memblock_find_in_range_lowest().
> >>>
> >>> That interface would have the explicit semantics of returning the lowest
> >>> possible address, as opposed to any suitable address (which may change
> >>> if policy requirements change.)
> >>>
> >>> The other question is why does kexec need this in the first place?  Is
> >>> this due to a design bug in kexec or is there some fundamental reason
> >>> for this?
> >>
> >> bzImage is used here. so need range below 4g.
> >>
> > 
> > OK, so why don't you cap the range to 4 GiB and then pass that down to
> > the existing interface?  That's different from "lowest possible address".
> 
> but if later bzImage will use 64 entry and kexec honor it, or use 64bit vmlinux directly.
> and crashkernel=4096M, we could get failure again.
> 
> maybe something like this, will give it a try, hope kexec doesn't have other limitation.
> 
> [PATCH -v3] x86, memblock: Fix crashkernel allocation
> 
> Cai Qian found crashkernel is broken with x86 memblock changes
> 1. crashkernel=128M@32M always reported that range is used, even first kernel is small
>    no one use that range
> 2. always get following report when using "kexec -p"
> 	Could not find a free area of memory of a000 bytes...
> 	locate_hole failed
> 
> The root cause is that generic memblock_find_in_range() will try to get range from top_down.
> But crashkernel do need from low and specified range.
> 
> Let's limit the target range with rash_base + crash_size to make sure that
> We get range from bottom.
> 
> -v3: don't use loop for find low one
> 
> Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> 
> ---
>  arch/x86/kernel/setup.c |   19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
> 
> Index: linux-2.6/arch/x86/kernel/setup.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/setup.c
> +++ linux-2.6/arch/x86/kernel/setup.c
> @@ -518,17 +518,23 @@ static void __init reserve_crashkernel(v
>  	if (crash_base <= 0) {
>  		const unsigned long long alignment = 16<<20;	/* 16M */
>  
> -		crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
> -				 alignment);
> +		crash_base = memblock_find_in_range(alignment, 0xffffffff,
> +				crash_size, alignment);
> +

Actually, hardcoding the upper limit to 4G is probably not the best idea.
Kexec loads the relocatable binary (purgatory), and I remember that one of
the generated relocation types was signed 32-bit, which allowed a maximum
value of only 2G. So IIRC, the purgatory code always needed to be loaded
below 2G.

I liked HPA's other idea better, of introducing memblock_find_in_range_lowest(),
so that we search bottom-up and do not rely on a specific upper limit.
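To make the 2G limit concrete: a signed 32-bit relocation stores 32 bits that are sign-extended to 64 bits, so only targets in the low 2G or the top 2G of the address space are encodable. On x86-64 this would be R_X86_64_32S; which relocation purgatory actually used is an assumption here, not stated in the thread. A minimal check, with a hypothetical helper name:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * A signed 32-bit field, sign-extended to 64 bits, can only reach
 * [0, 0x7fffffff] and [0xffffffff80000000, 0xffffffffffffffff].
 * For a physical load address, that effectively means "below 2G".
 */
static bool fits_signed32_reloc(uint64_t target)
{
	return target <= 0x7fffffffULL || target >= 0xffffffff80000000ULL;
}
```

A 4G cap therefore does not help such code: anything loaded between 2G and 4G would still fail to relocate.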

Thanks
Vivek

* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-28  0:53                 ` Vivek Goyal
@ 2010-09-28  2:41                   ` Yinghai Lu
  2010-09-28  3:46                   ` H. Peter Anvin
  1 sibling, 0 replies; 16+ messages in thread
From: Yinghai Lu @ 2010-09-28  2:41 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: H. Peter Anvin, Ingo Molnar, kexec, caiqian, linux-kernel

On 09/27/2010 05:53 PM, Vivek Goyal wrote:
> Actually, hardcoding the upper limit to 4G is probably not the best idea.
> Kexec loads the the relocatable binary (purgatory) and I remember that
> one of the generated relocation type was signed 32 bit and allowed max value
> to be 2G only. So IIRC, purgatory code always needed to be loaded below 2G.

Also, kexec wants the bzImage below 0x37ffffff.

> 
> I liked HPA's other idea better of introducing memblock_find_in_range_lowest() 
> so that we search bottom up and not rely on a specific upper limit.
> 

Please check.

[PATCH -v4] x86, memblock: Fix crashkernel allocation

Cai Qian found that crashkernel is broken with the x86 memblock changes:
1. crashkernel=128M@32M always reported that the range was in use, even when
   the first kernel was small and nothing was using that range.
2. "kexec -p" always printed the following:
	Could not find a free area of memory of a000 bytes...
	locate_hole failed

The root cause is that the generic memblock_find_in_range() searches for a
range top-down, but crashkernel needs a range near the bottom of memory,
within a specified window.

Limit the target range to crash_base + crash_size to make sure that we get
the range from the bottom.

-v4: add memblock_find_in_range_lowest(), as suggested by hpa and Vivek.

Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/include/asm/memblock.h |    2 +
 arch/x86/kernel/setup.c         |    8 +++---
 arch/x86/mm/memblock.c          |   52 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 58 insertions(+), 4 deletions(-)

Index: linux-2.6/arch/x86/mm/memblock.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/memblock.c
+++ linux-2.6/arch/x86/mm/memblock.c
@@ -352,3 +352,55 @@ u64 __init memblock_x86_hole_size(u64 st
 
 	return end - start - ((u64)ram << PAGE_SHIFT);
 }
+
+/* Check for already reserved areas */
+static inline bool __init check_with_memblock_reserved(u64 *addrp, u64 size, u64 align)
+{
+	u64 addr = *addrp;
+	bool changed = false;
+	struct memblock_region *r;
+again:
+	for_each_memblock(reserved, r) {
+		if ((addr + size) > r->base && addr < (r->base + r->size)) {
+			addr = round_up(r->base + r->size, align);
+			changed = true;
+			goto again;
+		}
+	}
+
+	if (changed)
+		*addrp = addr;
+
+	return changed;
+}
+
+/*
+ * Find a free area with specified alignment in a specific range from bottom up
+ */
+u64 __init memblock_find_in_range_lowest(u64 start, u64 end, u64 size, u64 align)
+{
+	struct memblock_region *r;
+
+	for_each_memblock(memory, r) {
+		u64 ei_start = r->base;
+		u64 ei_last = ei_start + r->size;
+		u64 addr, last;
+
+		addr = round_up(ei_start, align);
+		if (addr < start)
+			addr = round_up(start, align);
+		if (addr >= ei_last)
+			continue;
+		while (check_with_memblock_reserved(&addr, size, align) && addr+size <= ei_last)
+			;
+		last = addr + size;
+		if (last > ei_last)
+			continue;
+		if (last > end)
+			continue;
+
+		return addr;
+	}
+
+	return MEMBLOCK_ERROR;
+}
Index: linux-2.6/arch/x86/include/asm/memblock.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/memblock.h
+++ linux-2.6/arch/x86/include/asm/memblock.h
@@ -18,4 +18,6 @@ u64 memblock_x86_find_in_range_node(int
 u64 memblock_x86_free_memory_in_range(u64 addr, u64 limit);
 u64 memblock_x86_memory_in_range(u64 addr, u64 limit);
 
+u64 memblock_find_in_range_lowest(u64 start, u64 end, u64 size, u64 align);
+
 #endif
Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -518,8 +518,8 @@ static void __init reserve_crashkernel(v
 	if (crash_base <= 0) {
 		const unsigned long long alignment = 16<<20;	/* 16M */
 
-		crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
-				 alignment);
+		crash_base = memblock_find_in_range_lowest(alignment,
+					 ULONG_MAX, crash_size, alignment);
 		if (crash_base == MEMBLOCK_ERROR) {
 			pr_info("crashkernel reservation failed - No suitable area found.\n");
 			return;
@@ -527,8 +527,8 @@ static void __init reserve_crashkernel(v
 	} else {
 		unsigned long long start;
 
-		start = memblock_find_in_range(crash_base, ULONG_MAX, crash_size,
-				 1<<20);
+		start = memblock_find_in_range(crash_base,
+				 crash_base + crash_size, crash_size, 1<<20);
 		if (start != crash_base) {
 			pr_info("crashkernel reservation failed - memory is in use.\n");
 			return;

* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-28  0:53                 ` Vivek Goyal
  2010-09-28  2:41                   ` Yinghai Lu
@ 2010-09-28  3:46                   ` H. Peter Anvin
  2010-09-28  7:14                     ` Yinghai Lu
  2010-09-28 13:54                     ` Vivek Goyal
  1 sibling, 2 replies; 16+ messages in thread
From: H. Peter Anvin @ 2010-09-28  3:46 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Yinghai Lu, Ingo Molnar, kexec, caiqian, linux-kernel

On 09/27/2010 05:53 PM, Vivek Goyal wrote:
> 
> Actually, hardcoding the upper limit to 4G is probably not the best idea.
> Kexec loads the the relocatable binary (purgatory) and I remember that
> one of the generated relocation type was signed 32 bit and allowed max value
> to be 2G only. So IIRC, purgatory code always needed to be loaded below 2G.
> 
> I liked HPA's other idea better of introducing memblock_find_in_range_lowest() 
> so that we search bottom up and not rely on a specific upper limit.
> 

No, it's just another crappy hack which is broken in the same way.  It's
better than open-coding, but it's still a hack.

The Right Thing[TM] to do is for kexec to communicate the topmost
address it wants to this code, so it has both the upper and the lower
boundaries available to it instead of just one.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-28  3:46                   ` H. Peter Anvin
@ 2010-09-28  7:14                     ` Yinghai Lu
  2010-09-28 14:01                       ` Vivek Goyal
  2010-09-28 13:54                     ` Vivek Goyal
  1 sibling, 1 reply; 16+ messages in thread
From: Yinghai Lu @ 2010-09-28  7:14 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Vivek Goyal, Ingo Molnar, kexec, caiqian, linux-kernel

On 09/27/2010 08:46 PM, H. Peter Anvin wrote:
> On 09/27/2010 05:53 PM, Vivek Goyal wrote:
>>
>> Actually, hardcoding the upper limit to 4G is probably not the best idea.
>> Kexec loads the the relocatable binary (purgatory) and I remember that
>> one of the generated relocation type was signed 32 bit and allowed max value
>> to be 2G only. So IIRC, purgatory code always needed to be loaded below 2G.
>>
>> I liked HPA's other idea better of introducing memblock_find_in_range_lowest() 
>> so that we search bottom up and not rely on a specific upper limit.
>>
> 
> No, it's just another crappy hack which is broken in the same way.  It's
> better than open-coding, but it's still a hack.
> 
> The Right Thing[TM] to do is for kexec to communicate the topmost
> address it wants to this code, so it has both the upper and the lower
> boundaries available to it instead of just one.

Hope you are happy with this one.

[PATCH -v5] x86, memblock: Fix crashkernel allocation

Cai Qian found that crashkernel is broken with the x86 memblock changes:
1. crashkernel=128M@32M always reported that the range was in use, even when
   the first kernel was small and nothing was using that range.
2. "kexec -p" always printed the following:
	Could not find a free area of memory of a000 bytes...
	locate_hole failed

The root cause is that the generic memblock_find_in_range() searches for a
range top-down, but crashkernel needs a range near the bottom of memory,
within a specified window.

Limit the target range to crash_base + crash_size to make sure that we get
the range from the bottom.

-v5: use DEFAULT_BZIMAGE_ADDR_MAX to limit the area that can be used by the
     bzImage; also add a second try without that limit, for vmlinux or for
     newer kexec-tools that will use the bzImage 64-bit entry point.

Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/kernel/setup.c |   24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -501,6 +501,7 @@ static inline unsigned long long get_tot
 	return total << PAGE_SHIFT;
 }
 
+#define DEFAULT_BZIMAGE_ADDR_MAX 0x37FFFFFF
 static void __init reserve_crashkernel(void)
 {
 	unsigned long long total_mem;
@@ -518,17 +519,28 @@ static void __init reserve_crashkernel(v
 	if (crash_base <= 0) {
 		const unsigned long long alignment = 16<<20;	/* 16M */
 
-		crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
-				 alignment);
+		/*
+		 * Assume half crash_size is for bzImage
+		 *  kexec want bzImage is below DEFAULT_BZIMAGE_ADDR_MAX
+		 */
+		crash_base = memblock_find_in_range(alignment,
+				DEFAULT_BZIMAGE_ADDR_MAX + crash_size/2,
+				crash_size, alignment);
+
 		if (crash_base == MEMBLOCK_ERROR) {
-			pr_info("crashkernel reservation failed - No suitable area found.\n");
-			return;
+			crash_base = memblock_find_in_range(alignment,
+					 ULONG_MAX, crash_size, alignment);
+
+			if (crash_base == MEMBLOCK_ERROR) {
+				pr_info("crashkernel reservation failed - No suitable area found.\n");
+				return;
+			}
 		}
 	} else {
 		unsigned long long start;
 
-		start = memblock_find_in_range(crash_base, ULONG_MAX, crash_size,
-				 1<<20);
+		start = memblock_find_in_range(crash_base,
+				 crash_base + crash_size, crash_size, 1<<20);
 		if (start != crash_base) {
 			pr_info("crashkernel reservation failed - memory is in use.\n");
 			return;
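The -v5 window is easy to check by hand: for crash_size = 128M the first pass caps the search at 0x37FFFFFF + 0x4000000 = 0x3BFFFFFF, and only if that fails does it retry with no cap. Below is a sketch of this two-pass pattern. `finder_fn`, `first_pass_end()`, `place_crashkernel()`, and the single-window `toy_find()` are hypothetical; the toy finder is top-down like the generic memblock one, and the real code calls memblock_find_in_range() directly.

```c
#include <stdint.h>

#define DEFAULT_BZIMAGE_ADDR_MAX 0x37FFFFFFULL	/* value from the -v5 patch */
#define FIND_ERROR UINT64_MAX			/* stands in for MEMBLOCK_ERROR */

/* Hypothetical finder: (start, end, size, align) -> addr or FIND_ERROR. */
typedef uint64_t (*finder_fn)(uint64_t start, uint64_t end,
			      uint64_t size, uint64_t align);

/* Toy finder for illustration: one free window [toy_lo, toy_hi), top-down. */
static uint64_t toy_lo, toy_hi;

static uint64_t toy_find(uint64_t start, uint64_t end,
			 uint64_t size, uint64_t align)
{
	uint64_t lo = start > toy_lo ? start : toy_lo;
	uint64_t hi = end < toy_hi ? end : toy_hi;
	uint64_t addr;

	if (hi < size || hi - size < lo)
		return FIND_ERROR;
	addr = (hi - size) & ~(align - 1);	/* align is a power of two */
	return addr >= lo ? addr : FIND_ERROR;
}

/* First-pass upper bound: assume half of crash_size is the bzImage. */
static uint64_t first_pass_end(uint64_t crash_size)
{
	return DEFAULT_BZIMAGE_ADDR_MAX + crash_size / 2;
}

/* Prefer the bzImage-friendly window; otherwise accept any address. */
static uint64_t place_crashkernel(finder_fn find, uint64_t crash_size)
{
	const uint64_t alignment = 16ULL << 20;	/* 16M, as in the patch */
	uint64_t base = find(alignment, first_pass_end(crash_size),
			     crash_size, alignment);

	if (base == FIND_ERROR)
		base = find(alignment, UINT64_MAX, crash_size, alignment);
	return base;
}
```

With free memory from 16M to 4G, a 128M reservation lands at 0x33000000, just under the first-pass cap; with free memory only between 4G and 8G, the first pass fails and the fallback picks 0x1F8000000. Note that the first pass is still top-down, which is exactly the hardcoded-limit assumption Vivek objects to below.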

* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-28  3:46                   ` H. Peter Anvin
  2010-09-28  7:14                     ` Yinghai Lu
@ 2010-09-28 13:54                     ` Vivek Goyal
  1 sibling, 0 replies; 16+ messages in thread
From: Vivek Goyal @ 2010-09-28 13:54 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Yinghai Lu, Ingo Molnar, kexec, caiqian, linux-kernel

On Mon, Sep 27, 2010 at 08:46:42PM -0700, H. Peter Anvin wrote:
> On 09/27/2010 05:53 PM, Vivek Goyal wrote:
> > 
> > Actually, hardcoding the upper limit to 4G is probably not the best idea.
> > Kexec loads the the relocatable binary (purgatory) and I remember that
> > one of the generated relocation type was signed 32 bit and allowed max value
> > to be 2G only. So IIRC, purgatory code always needed to be loaded below 2G.
> > 
> > I liked HPA's other idea better of introducing memblock_find_in_range_lowest() 
> > so that we search bottom up and not rely on a specific upper limit.
> > 
> 
> No, it's just another crappy hack which is broken in the same way.  It's
> better than open-coding, but it's still a hack.
> 
> The Right Thing[TM] to do is for kexec to communicate the topmost
> address it wants to this code, so it has both the upper and the lower
> boundaries available to it instead of just one.
> 

Being able to specify the upper limit would be the best thing, so that the
kernel does not have to make any assumptions or hardcode anything.

The question is how to determine the upper limit.

- The upper limit will depend on what is being loaded into the reserved
  region. Reserving memory using crashkernel= is a boot-time option, and at
  that point in time kexec has not even run, so we don't know what the upper
  limit is.

  Now, we can do an extra reboot to make it happen: boot the first kernel
  without reserving any memory, introduce an option in kexec which tells the
  user which segments kexec would like to load (for a given binary) and what
  their upper memory limits are, and then the user modifies the command line
  and reboots the kernel.

  This does not sound very clean, especially since the upper limit might
  change based on the binary being loaded, and a user might have to reboot
  again.

So to me, trying to get the lowest available memory for crashkernel
reservations is not that bad an idea. It is certainly better than making
hardcoded assumptions about the upper limit.

Thanks
Vivek

* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-28  7:14                     ` Yinghai Lu
@ 2010-09-28 14:01                       ` Vivek Goyal
  0 siblings, 0 replies; 16+ messages in thread
From: Vivek Goyal @ 2010-09-28 14:01 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: H. Peter Anvin, Ingo Molnar, kexec, caiqian, linux-kernel

On Tue, Sep 28, 2010 at 12:14:31AM -0700, Yinghai Lu wrote:
> On 09/27/2010 08:46 PM, H. Peter Anvin wrote:
> > On 09/27/2010 05:53 PM, Vivek Goyal wrote:
> >>
> >> Actually, hardcoding the upper limit to 4G is probably not the best idea.
> >> Kexec loads the the relocatable binary (purgatory) and I remember that
> >> one of the generated relocation type was signed 32 bit and allowed max value
> >> to be 2G only. So IIRC, purgatory code always needed to be loaded below 2G.
> >>
> >> I liked HPA's other idea better of introducing memblock_find_in_range_lowest() 
> >> so that we search bottom up and not rely on a specific upper limit.
> >>
> > 
> > No, it's just another crappy hack which is broken in the same way.  It's
> > better than open-coding, but it's still a hack.
> > 
> > The Right Thing[TM] to do is for kexec to communicate the topmost
> > address it wants to this code, so it has both the upper and the lower
> > boundaries available to it instead of just one.
> 
> hope you are happy with this one.
> 
> [PATCH -v5] x86, memblock: Fix crashkernel allocation
> 
> Cai Qian found crashkernel is broken with x86 memblock changes
> 1. crashkernel=128M@32M always reported that range is used, even first kernel is small
>    no one use that range
> 2. always get following report when using "kexec -p"
> 	Could not find a free area of memory of a000 bytes...
> 	locate_hole failed
> 
> The root cause is that generic memblock_find_in_range() will try to get range from top_down.
> But crashkernel do need from low and specified range.
> 
> Let's limit the target range with rash_base + crash_size to make sure that
> We get range from bottom.
> 
> -v5: use DEFAULT_BZIMAGE_ADDR_MAX to limit area that could be used by bzImge.
>      also second try for vmlinux or new kexec tools will use bzImage 64bit entry
> 
> Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> 
> ---
>  arch/x86/kernel/setup.c |   24 ++++++++++++++++++------
>  1 file changed, 18 insertions(+), 6 deletions(-)
> 
> Index: linux-2.6/arch/x86/kernel/setup.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/setup.c
> +++ linux-2.6/arch/x86/kernel/setup.c
> @@ -501,6 +501,7 @@ static inline unsigned long long get_tot
>  	return total << PAGE_SHIFT;
>  }
>  
> +#define DEFAULT_BZIMAGE_ADDR_MAX 0x37FFFFFF
>  static void __init reserve_crashkernel(void)
>  {
>  	unsigned long long total_mem;
> @@ -518,17 +519,28 @@ static void __init reserve_crashkernel(v
>  	if (crash_base <= 0) {
>  		const unsigned long long alignment = 16<<20;	/* 16M */
>  
> -		crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
> -				 alignment);
> +		/*
> +		 * Assume half crash_size is for bzImage
> +		 *  kexec want bzImage is below DEFAULT_BZIMAGE_ADDR_MAX
> +		 */
> +		crash_base = memblock_find_in_range(alignment,
> +				DEFAULT_BZIMAGE_ADDR_MAX + crash_size/2,
> +				crash_size, alignment);
> +

IMHO, this kind of hardcoding is worse than finding the lowest possible
address. It assumes that kexec is going to load a bzImage.

So we have the following three options, sorted from best to worst:

- Specify the upper limit in the "crashkernel=" command line syntax
- Find the lowest possible address for crashkernel reservations
- Hardcode the upper limit based on certain factors

Because the upper limit depends on the image being loaded and can also vary
as kexec-tools changes, knowing it for sure will require an extra reboot. It
also makes the command line syntax more complicated, as we need to introduce
another field to specify the upper limit, especially for the following case:

crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset]

So personally I think we can stick to the second-best option, that is,
finding the lowest possible memory area.
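For context on the range syntax cited above: each <range> is written as start-[end], and the size chosen depends on how much system RAM the machine has. A rough userspace sketch of that selection, loosely modeled on the kernel's range matching (start inclusive, end exclusive, open-ended end allowed). `parse_size()` and `pick_crash_size()` are hypothetical names, not the kernel's parser, and a trailing @offset is simply ignored.

```c
#include <stdint.h>
#include <stdlib.h>

/* Parse a number with an optional K/M/G suffix (like the kernel's memparse). */
static uint64_t parse_size(const char *s, char **endp)
{
	uint64_t v = strtoull(s, endp, 0);

	switch (**endp) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*endp)++;
		break;
	default:
		break;
	}
	return v;
}

/*
 * Return the size picked by "<start>-[<end>]:<size>[,...]" for a machine with
 * ram bytes of memory, or 0 if no range matches or the string is malformed.
 */
static uint64_t pick_crash_size(const char *spec, uint64_t ram)
{
	const char *p = spec;
	char *q;

	for (;;) {
		uint64_t start = parse_size(p, &q), end = UINT64_MAX, size;

		if (*q != '-')
			return 0;
		p = q + 1;
		if (*p != ':') {		/* explicit end of the range */
			end = parse_size(p, &q);
			p = q;
		}
		if (*p != ':')
			return 0;
		size = parse_size(p + 1, &q);
		if (ram >= start && ram < end)
			return size;
		if (*q != ',')			/* no more ranges (or "@offset") */
			return 0;
		p = q + 1;
	}
}
```

For "512M-2G:64M,2G-:128M", a 1G machine gets 64M and a 4G machine gets 128M. Any new upper-limit field would have to be threaded through every entry of such a list, which is the complication described above.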

Thanks
Vivek

end of thread, other threads:[~2010-09-28 14:01 UTC | newest]

Thread overview: 16+ messages
     [not found] <1909915255.2046011285586388234.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
2010-09-27 11:21 ` kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_" caiqian
2010-09-27 22:22   ` Yinghai Lu
2010-09-27 22:50     ` H. Peter Anvin
2010-09-27 23:20       ` Yinghai Lu
2010-09-27 23:26         ` H. Peter Anvin
2010-09-27 23:32           ` Yinghai Lu
2010-09-27 23:34             ` H. Peter Anvin
2010-09-27 23:41               ` Yinghai Lu
2010-09-28  0:53                 ` Vivek Goyal
2010-09-28  2:41                   ` Yinghai Lu
2010-09-28  3:46                   ` H. Peter Anvin
2010-09-28  7:14                     ` Yinghai Lu
2010-09-28 14:01                       ` Vivek Goyal
2010-09-28 13:54                     ` Vivek Goyal
     [not found] <870873343.2003871285555329846.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
2010-09-27  6:31 ` Yinghai Lu
2010-09-27  9:16   ` CAI Qian
