* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
From: Yinghai Lu @ 2010-09-27 6:31 UTC
To: caiqian; +Cc: kexec, H. Peter Anvin, Ingo Molnar, linux-kernel@vger.kernel.org

Please check this one on top of tip or next.

Thanks

Yinghai

[PATCH] x86, memblock: Fix crashkernel allocation

Cai Qian found that crashkernel is broken with the x86 memblock changes:
1. crashkernel=128M@32M always reports that the range is in use, even
   when the first kernel is small and nothing uses that range.
2. "kexec -p" always reports:
	Could not find a free area of memory of a000 bytes...
	locate_hole failed

The root cause is that the generic memblock_find_in_range() searches
top-down, but crashkernel needs a range from the bottom, inside the
specified window.

Limit the target range to crash_base + crash_size to make sure that
we get a range from the bottom.

Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/kernel/setup.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -516,19 +516,28 @@ static void __init reserve_crashkernel(v
 
 	/* 0 means: find the address automatically */
 	if (crash_base <= 0) {
+		unsigned long long start = 0;
 		const unsigned long long alignment = 16<<20;	/* 16M */
 
-		crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
-				 alignment);
-		if (crash_base == MEMBLOCK_ERROR) {
+		crash_base = alignment;
+		while (crash_base < 0xffffffff) {
+			start = memblock_find_in_range(crash_base,
+				crash_base + crash_size, crash_size, alignment);
+
+			if (start == crash_base)
+				break;
+
+			crash_base += alignment;
+		}
+		if (start != crash_base) {
 			pr_info("crashkernel reservation failed - No suitable area found.\n");
 			return;
 		}
 	} else {
 		unsigned long long start;
 
-		start = memblock_find_in_range(crash_base, ULONG_MAX, crash_size,
-				 1<<20);
+		start = memblock_find_in_range(crash_base,
+				crash_base + crash_size, crash_size, 1<<20);
 		if (start != crash_base) {
 			pr_info("crashkernel reservation failed - memory is in use.\n");
 			return;
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
From: CAI Qian @ 2010-09-27 9:16 UTC
To: Yinghai Lu; +Cc: Ingo Molnar, kexec, linux-kernel, H. Peter Anvin

----- "Yinghai Lu" <yinghai@kernel.org> wrote:

> Please check this one on top of tip or next.
This failed for both trees.

[root@localhost linux-next]# patch -Np1 <memblock.patch
patching file arch/x86/kernel/setup.c
Hunk #1 FAILED at 516.
1 out of 1 hunk FAILED -- saving rejects to file arch/x86/kernel/setup.c.rej

> ...
>
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
From: caiqian @ 2010-09-27 11:21 UTC
To: Yinghai Lu; +Cc: Ingo Molnar, kexec, linux-kernel, H. Peter Anvin

----- "CAI Qian" <caiqian@redhat.com> wrote:

> ----- "Yinghai Lu" <yinghai@kernel.org> wrote:
>
> > Please check this one on top of tip or next.
> This failed for both trees.
> [root@localhost linux-next]# patch -Np1 <memblock.patch
> patching file arch/x86/kernel/setup.c
> Hunk #1 FAILED at 516.
> 1 out of 1 hunk FAILED -- saving rejects to file
> arch/x86/kernel/setup.c.rej
After manually applying the patch on top of the latest mmotm tree, /proc/vmcore is no longer exported in the second kernel. It could be the result of other recent commits in mmotm, though. It said:

Warning: Core image elf header is notsane
Kdump: vmcore not initialized

Here is the dmesg from the second kernel:

Initializing cgroup subsys cpuset
Linux version 2.6.36-rc5-mm1+ (root@localhost.localdomain) (gcc version 4.4.4 20100726 (Red Hat 4.4.4-13) (GCC) ) #6 SMP Mon Sep 27 07:00:15 EDT 2010
Command line: ro root=/dev/mapper/VolGroup-lv_root rd_LVM_LV=VolGroup/lv_root rd_LVM_LV=VolGroup/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb quiet console=tty0 console=ttyS0,115200 crashkernel=128M irqpoll maxcpus=1 reset_devices cgroup_disable=memory memmap=exactmap memmap=640K@0K memmap=130408K@32768K elfcorehdr=163176K kexec_jump_back_entry=0x000000000232f063
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000100 - 000000000009f400 (usable)
BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000dfffb000 (usable)
BIOS-e820: 00000000dfffb000 - 00000000e0000000 (reserved)
BIOS-e820: 00000000fffbc000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 0000000ca0000000 (usable)
last_pfn = 0xca0000 max_arch_pfn = 0x400000000
NX (Execute Disable) protection: active
user-defined physical RAM map:
user: 0000000000000000 - 00000000000a0000 (usable)
user: 0000000002000000 - 0000000009f5a000 (usable)
DMI 2.4 present.
e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
No AGP bridge found
last_pfn = 0x9f5a max_arch_pfn = 0x400000000
MTRR default type: write-back
MTRR fixed ranges enabled:
00000-9FFFF write-back
A0000-BFFFF uncachable
C0000-FFFFF write-protect
MTRR variable ranges enabled:
0 base 00E0000000 mask FFE0000000 uncachable
1 disabled
2 disabled
3 disabled
4 disabled
5 disabled
6 disabled
7 disabled
PAT not supported by CPU.
found SMP MP-table at [ffff8800000f7fb0] f7fb0
initial memory mapped : 0 - 20000000
init_memory_mapping: 0000000000000000-0000000009f5a000
0000000000 - 0009e00000 page 2M
0009e00000 - 0009f5a000 page 4k
kernel direct mapping tables up to 9f5a000 @ 9f57000-9f5a000
RAMDISK: 09ae5000 - 09f49000
crashkernel reservation failed - No suitable area found.
ACPI: RSDP 00000000000f7f60 00014 (v00 BOCHS )
ACPI: RSDT 00000000dfffd890 00030 (v01 BOCHS BXPCRSDT 00000001 BXPC 00000001)
ACPI: FACP 00000000dffffa30 00074 (v01 BOCHS BXPCFACP 00000001 BXPC 00000001)
ACPI: DSDT 00000000dfffdb70 01E4B (v01 BXPC BXDSDT 00000001 INTL 20090123)
ACPI: FACS 00000000dffff9c0 00040
ACPI: SSDT 00000000dfffda40 0012F (v01 BOCHS BXPCSSDT 00000001 BXPC 00000001)
ACPI: APIC 00000000dfffd8c0 0010A (v01 BOCHS BXPCAPIC 00000001 BXPC 00000001)
ACPI: Local APIC address 0xfee00000
No NUMA configuration found
Faking a node at 0000000000000000-0000000009f5a000
Initmem setup node 0 0000000000000000-0000000009f5a000
NODE_DATA [0000000009abe000 - 0000000009ae4fff]
kvm-clock: Using msrs 12 and 11
kvm-clock: cpu 0, msr 0:28c3741, boot clock
[ffffea0000000000-ffffea00003fffff] PMD -> [ffff880008e00000-ffff8800091fffff] on node 0
sizeof(struct page) = 56
Zone PFN ranges:
DMA 0x00000010 -> 0x00001000
DMA32 0x00001000 -> 0x00100000
Normal empty
Movable zone start PFN for each node
early_node_map[2] active PFN ranges
0: 0x00000010 -> 0x000000a0
0: 0x00002000 -> 0x00009f5a
On node 0 totalpages: 32746
DMA zone: 56 pages used for memmap
DMA zone: 7 pages reserved
DMA zone: 81 pages, LIFO batch:0
DMA32 zone: 502 pages used for memmap
DMA32 zone: 32100 pages, LIFO batch:7
ACPI: PM-Timer IO Port: 0xb008
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] enabled)
ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] enabled)
ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] enabled)
ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] enabled)
ACPI: LAPIC (acpi_id[0x08] lapic_id[0x08] enabled)
ACPI: LAPIC (acpi_id[0x09] lapic_id[0x09] enabled)
ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x0a] enabled)
ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x0b] enabled)
ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x0c] enabled)
ACPI: LAPIC (acpi_id[0x0d] lapic_id[0x0d] enabled)
ACPI: LAPIC (acpi_id[0x0e] lapic_id[0x0e] enabled)
ACPI: LAPIC (acpi_id[0x0f] lapic_id[0x0f] enabled)
ACPI: LAPIC (acpi_id[0x10] lapic_id[0x10] enabled)
ACPI: LAPIC (acpi_id[0x11] lapic_id[0x11] enabled)
ACPI: LAPIC (acpi_id[0x12] lapic_id[0x12] enabled)
ACPI: LAPIC (acpi_id[0x13] lapic_id[0x13] enabled)
ACPI: IOAPIC (id[0x14] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 20, version 17, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ5 used by override.
ACPI: IRQ9 used by override.
ACPI: IRQ10 used by override.
ACPI: IRQ11 used by override.
Using ACPI (MADT) for SMP configuration information
SMP: Allowing 20 CPUs, 0 hotplug CPUs
nr_irqs_gsi: 40
PM: Registered nosave memory: 00000000000a0000 - 0000000002000000
Allocating PCI resources starting at 9f5a000 (gap: 9f5a000:f60a6000)
Booting paravirtualized kernel on KVM
setup_percpu: NR_CPUS:4096 nr_cpumask_bits:20 nr_cpu_ids:20 nr_node_ids:1
PERCPU: Embedded 29 pages/cpu @ffff880009400000 s86912 r8192 d23680 u262144
pcpu-alloc: s86912 r8192 d23680 u262144 alloc=1*2097152
pcpu-alloc: [0] 00 01 02 03 04 05 06 07 [0] 08 09 10 11 12 13 14 15
pcpu-alloc: [0] 16 17 18 19 -- -- -- --
kvm-clock: cpu 0, msr 0:9414741, primary cpu clock
Built 1 zonelists in Node order, mobility grouping on. Total pages: 32181
Policy zone: DMA32
Kernel command line: ro root=/dev/mapper/VolGroup-lv_root rd_LVM_LV=VolGroup/lv_root rd_LVM_LV=VolGroup/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb quiet console=tty0 console=ttyS0,115200 crashkernel=128M irqpoll maxcpus=1 reset_devices cgroup_disable=memory memmap=exactmap memmap=640K@0K memmap=130408K@32768K elfcorehdr=163176K kexec_jump_back_entry=0x000000000232f063
Misrouted IRQ fixup and polling support enabled
This may significantly impact system performance
Disabling memory control group subsystem
PID hash table entries: 512 (order: 0, 4096 bytes)
Checking aperture...
No AGP bridge found
Memory: 103484k/163176k available (4267k kernel code, 32192k absent, 27500k reserved, 4617k data, 2484k init)
Hierarchical RCU implementation.
RCU-based detection of stalled CPUs is disabled.
Verbose stalled-CPUs detection is disabled.
NR_IRQS:262400 nr_irqs:840
Spurious LAPIC timer interrupt on cpu 0
Console: colour VGA+ 80x25
console [tty0] enabled
console [ttyS0] enabled
Detected 1995.358 MHz processor.
Calibrating delay loop (skipped) preset value.. 3990.71 BogoMIPS (lpj=1995358)
pid_max: default: 32768 minimum: 301
Security Framework initialized
SELinux: Initializing.
SELinux: Starting in permissive mode
Dentry cache hash table entries: 16384 (order: 5, 131072 bytes)
Inode-cache hash table entries: 8192 (order: 4, 65536 bytes)
Mount-cache hash table entries: 256
Initializing cgroup subsys ns
Initializing cgroup subsys cpuacct
Initializing cgroup subsys memory
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
Initializing cgroup subsys net_cls
mce: CPU supports 10 MCE banks
Performance Events: p6 PMU driver.
... version: 0
... bit width: 32
... generic registers: 2
... value mask: 00000000ffffffff
... max period: 000000007fffffff
... fixed-purpose events: 0
... event mask: 0000000000000003
SMP alternatives: switching to UP code
ACPI: Core revision 20100702
Setting APIC routing to physical flat
..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
CPU0: Intel QEMU Virtual CPU version (cpu64-rhel6) stepping 03
Brought up 1 CPUs
Total of 1 processors activated (3990.71 BogoMIPS).
devtmpfs: initialized
regulator: core version 0.5
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using configuration type 1 for base access
bio: create slab <bio-0> at 0
IRQ 9: starting IRQFIXUP_POLL
ACPI: EC: Look up EC in DSDT
ACPI: Interpreter enabled
ACPI: (supports S0 S3 S4 S5)
ACPI: Using IOAPIC for interrupt routing
ACPI: No dock devices found.
PCI: Ignoring host bridge windows from ACPI; if necessary, use "pci=use_crs" and report a bug
ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
pci_root PNP0A03:00: host bridge window [io 0x0000-0x0cf7] (ignored)
pci_root PNP0A03:00: host bridge window [io 0x0d00-0xffff] (ignored)
pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff] (ignored)
pci_root PNP0A03:00: host bridge window [mem 0xe0000000-0xfebfffff] (ignored)
pci 0000:00:01.1: reg 20: [io 0xc000-0xc00f]
pci 0000:00:01.2: reg 20: [io 0xc020-0xc03f]
pci 0000:00:01.3: quirk: [io 0xb000-0xb03f] claimed by PIIX4 ACPI
pci 0000:00:01.3: quirk: [io 0xb100-0xb10f] claimed by PIIX4 SMB
pci 0000:00:02.0: reg 10: [mem 0xf0000000-0xf1ffffff pref]
pci 0000:00:02.0: reg 14: [mem 0xf2000000-0xf2000fff]
pci 0000:00:02.0: reg 30: [mem 0xf2010000-0xf201ffff pref]
pci 0000:00:03.0: reg 10: [io 0xc100-0xc1ff]
pci 0000:00:03.0: reg 14: [mem 0xf2020000-0xf20200ff]
pci 0000:00:03.0: reg 30: [mem 0xf2030000-0xf203ffff pref]
pci 0000:00:04.0: reg 10: [io 0xc400-0xc7ff]
pci 0000:00:04.0: reg 14: [io 0xc800-0xc8ff]
pci 0000:00:05.0: reg 10: [io 0xc900-0xc91f]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
ACPI: PCI Interrupt Link [LNKB] (IRQs 5 10 11) *0, disabled.
ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
vgaarb: loaded
SCSI subsystem initialized
libata version 3.00 loaded.
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
PCI: Using ACPI for IRQ routing
PCI: pci_cache_line_size set to 64 bytes
reserve RAM buffer: 0000000009f5a000 - 000000000bffffff
NetLabel: Initializing
NetLabel: domain hash size = 128
NetLabel: protocols = UNLABELED CIPSOv4
NetLabel: unlabeled traffic allowed by default
Switching to clocksource kvm-clock
pnp: PnP ACPI init
ACPI: bus type pnp registered
pnp: PnP ACPI: found 6 devices
ACPI: ACPI bus type pnp unregistered
pci_bus 0000:00: resource 0 [io 0x0000-0xffff]
pci_bus 0000:00: resource 1 [mem 0x00000000-0xffffffffffffffff]
NET: Registered protocol family 2
IP route cache hash table entries: 1024 (order: 1, 8192 bytes)
TCP established hash table entries: 4096 (order: 4, 65536 bytes)
TCP bind hash table entries: 4096 (order: 4, 65536 bytes)
TCP: Hash tables configured (established 4096 bind 4096)
TCP reno registered
UDP hash table entries: 128 (order: 0, 4096 bytes)
UDP-Lite hash table entries: 128 (order: 0, 4096 bytes)
NET: Registered protocol family 1
pci 0000:00:00.0: Limiting direct PCI/PCI transfers
pci 0000:00:01.0: Activating ISA DMA hang workarounds
pci 0000:00:02.0: Boot video device
PCI: CLS 64 bytes, default 64
Trying to unpack rootfs image as initramfs...
Freeing initrd memory: 4496k freed
audit: initializing netlink socket (disabled)
type=2000 audit(1285586109.207:1): initialized
HugeTLB registered 2 MB page size, pre-allocated 0 pages
VFS: Disk quotas dquot_6.5.2
Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
Warning: Core image elf header is notsane
Kdump: vmcore not initialized

> ...
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
From: Yinghai Lu @ 2010-09-27 22:22 UTC
To: caiqian; +Cc: Ingo Molnar, kexec, linux-kernel, H. Peter Anvin

[-- Attachment #1: Type: text/plain, Size: 2632 bytes --]

On 09/27/2010 04:21 AM, caiqian@redhat.com wrote:
>
> ----- "CAI Qian" <caiqian@redhat.com> wrote:
>
>> ----- "Yinghai Lu" <yinghai@kernel.org> wrote:
>>
>>> Please check this one on top of tip or next.
>> This failed for both trees.
>> [root@localhost linux-next]# patch -Np1 <memblock.patch
>> patching file arch/x86/kernel/setup.c
>> Hunk #1 FAILED at 516.
>> 1 out of 1 hunk FAILED -- saving rejects to file
>> arch/x86/kernel/setup.c.rej
> After manually applying the patch on top of the latest mmotm tree, /proc/vmcore is no longer exported in the second kernel. It could be the result of other recent commits in mmotm, though. It said:
>
> Warning: Core image elf header is notsane
> Kdump: vmcore not initialized
>
> Here is the dmesg from the second kernel:
>
> Initializing cgroup subsys cpuset
> Linux version 2.6.36-rc5-mm1+ (root@localhost.localdomain) (gcc version 4.4.4 20100726 (Red Hat 4.4.4-13) (GCC) ) #6 SMP Mon Sep 27 07:00:15 EDT 2010
> Command line: ro root=/dev/mapper/VolGroup-lv_root rd_LVM_LV=VolGroup/lv_root rd_LVM_LV=VolGroup/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb quiet console=tty0 console=ttyS0,115200 crashkernel=128M irqpoll maxcpus=1 reset_devices cgroup_disable=memory memmap=exactmap memmap=640K@0K memmap=130408K@32768K elfcorehdr=163176K kexec_jump_back_entry=0x000000000232f063
> BIOS-provided physical RAM map:
> BIOS-e820: 0000000000000100 - 000000000009f400 (usable)
> BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
> BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
> BIOS-e820: 0000000000100000 - 00000000dfffb000 (usable)
> BIOS-e820: 00000000dfffb000 - 00000000e0000000 (reserved)
> BIOS-e820: 00000000fffbc000 - 0000000100000000 (reserved)
> BIOS-e820: 0000000100000000 - 0000000ca0000000 (usable)
> last_pfn = 0xca0000 max_arch_pfn = 0x400000000
> NX (Execute Disable) protection: active
> user-defined physical RAM map:
> user: 0000000000000000 - 00000000000a0000 (usable)
> user: 0000000002000000 - 0000000009f5a000 (usable)
...
> Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
> Warning: Core image elf header is notsane
> Kdump: vmcore not initialized
>

It should work on tip. I tested it on RHEL 6.0 beta with /etc/init.d/kdump restart.

BTW, the second kernel is not supposed to take crashkernel=128M again; the /etc/init.d/kdump script removes it when reusing /proc/cmdline.
Please refer to http://people.redhat.com/mingo/tip.git/readme.txt to get tip/master, then apply the attached patch:

cat crashkernel_limit.patch | patch -p1

Thanks

Yinghai

[-- Attachment #2: crashkernel_limit.patch --]
[-- Type: text/x-patch, Size: 2230 bytes --]

[PATCH -v2] x86, memblock: Fix crashkernel allocation

Cai Qian found that crashkernel is broken with the x86 memblock changes:
1. crashkernel=128M@32M always reports that the range is in use, even
   when the first kernel is small and nothing uses that range.
2. "kexec -p" always reports:
	Could not find a free area of memory of a000 bytes...
	locate_hole failed

The root cause is that the generic memblock_find_in_range() searches
top-down, but crashkernel needs a range from the bottom, inside the
specified window.

Limit the target range to crash_base + crash_size to make sure that
we get a range from the bottom.

-v2: don't limit it with 0xffffffff, in case kexec uses the bzImage
     64-bit entry or vmlinux and tries to allocate a huge area for
     crashkernel.

Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/kernel/setup.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -516,19 +516,28 @@ static void __init reserve_crashkernel(v
 
 	/* 0 means: find the address automatically */
 	if (crash_base <= 0) {
+		unsigned long long start = 0;
 		const unsigned long long alignment = 16<<20;	/* 16M */
 
-		crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
-				 alignment);
-		if (crash_base == MEMBLOCK_ERROR) {
+		crash_base = alignment;
+		while ((crash_base + crash_size) <= total_mem) {
+			start = memblock_find_in_range(crash_base,
+				crash_base + crash_size, crash_size, alignment);
+
+			if (start == crash_base)
+				break;
+
+			crash_base += alignment;
+		}
+		if (start != crash_base) {
 			pr_info("crashkernel reservation failed - No suitable area found.\n");
 			return;
 		}
 	} else {
 		unsigned long long start;
 
-		start = memblock_find_in_range(crash_base, ULONG_MAX, crash_size,
-				 1<<20);
+		start = memblock_find_in_range(crash_base,
+				crash_base + crash_size, crash_size, 1<<20);
 		if (start != crash_base) {
 			pr_info("crashkernel reservation failed - memory is in use.\n");
 			return;
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
From: H. Peter Anvin @ 2010-09-27 22:50 UTC
To: Yinghai Lu; +Cc: caiqian, Ingo Molnar, kexec, linux-kernel

+		crash_base = alignment;
+		while ((crash_base + crash_size) <= total_mem) {
+			start = memblock_find_in_range(crash_base,
+				crash_base + crash_size, crash_size, alignment);
+
+			if (start == crash_base)
+				break;
+
+			crash_base += alignment;
+		}
+		if (start != crash_base) {

Open-coded crap violation error!

Seriously, these kinds of open-coded loops are *never* acceptable, since
they are really "let's violate the interface by making it do something
it wasn't intended to do" -- it means we need a new interface.

Alternatively, if we really need the lowest possible address, why do we
need to search?

	-hpa
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
From: Yinghai Lu @ 2010-09-27 23:20 UTC
To: H. Peter Anvin; +Cc: caiqian, Ingo Molnar, kexec, linux-kernel

On 09/27/2010 03:50 PM, H. Peter Anvin wrote:
> +		crash_base = alignment;
> +		while ((crash_base + crash_size) <= total_mem) {
> +			start = memblock_find_in_range(crash_base,
> +				crash_base + crash_size, crash_size, alignment);
> +
> +			if (start == crash_base)
> +				break;
> +
> +			crash_base += alignment;
> +		}
> +		if (start != crash_base) {
>
> Open-coded crap violation error!
>
> Seriously, these kinds of open-coded loops are *never* acceptable, since
> they are really "let's violate the interface by making it do something
> it wasn't intended to do" -- it means we need a new interface.
>
> Alternatively, if we really need the lowest possible address, why do we
> need to search?

x86 own version for find_area?

Subject: [PATCH] x86, memblock: Add x86 version of memblock_find_in_range()

The generic version searches from high to low, and it seems it cannot
find an area that is compact enough.

The x86 version searches from goal up to limit, just like the way we
used for early_res.

Use ARCH_MEMBLOCK_FIND_AREA to select between them.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/Kconfig | 8 +++++++
 arch/x86/mm/memblock.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++
 mm/memblock.c | 2 -
 3 files changed, 63 insertions(+), 1 deletion(-)

Index: linux-2.6/arch/x86/mm/memblock.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/memblock.c
+++ linux-2.6/arch/x86/mm/memblock.c
@@ -352,3 +352,57 @@ u64 __init memblock_x86_hole_size(u64 st
 
 	return end - start - ((u64)ram << PAGE_SHIFT);
 }
+
+#ifdef CONFIG_ARCH_MEMBLOCK_FIND_AREA
+/* Check for already reserved areas */
+static inline bool __init check_with_memblock_reserved(u64 *addrp, u64 size, u64 align)
+{
+	u64 addr = *addrp;
+	bool changed = false;
+	struct memblock_region *r;
+again:
+	for_each_memblock(reserved, r) {
+		if ((addr + size) > r->base && addr < (r->base + r->size)) {
+			addr = round_up(r->base + r->size, align);
+			changed = true;
+			goto again;
+		}
+	}
+
+	if (changed)
+		*addrp = addr;
+
+	return changed;
+}
+
+/*
+ * Find a free area with specified alignment in a specific range.
+ */
+u64 __init memblock_find_in_range(u64 start, u64 end, u64 size, u64 align)
+{
+	struct memblock_region *r;
+
+	for_each_memblock(memory, r) {
+		u64 ei_start = r->base;
+		u64 ei_last = ei_start + r->size;
+		u64 addr, last;
+
+		addr = round_up(ei_start, align);
+		if (addr < start)
+			addr = round_up(start, align);
+		if (addr >= ei_last)
+			continue;
+		while (check_with_memblock_reserved(&addr, size, align) && addr+size <= ei_last)
+			;
+		last = addr + size;
+		if (last > ei_last)
+			continue;
+		if (last > end)
+			continue;
+
+		return addr;
+	}
+
+	return MEMBLOCK_ERROR;
+}
+#endif
Index: linux-2.6/arch/x86/Kconfig
===================================================================
--- linux-2.6.orig/arch/x86/Kconfig
+++ linux-2.6/arch/x86/Kconfig
@@ -569,6 +569,14 @@ config PARAVIRT_DEBUG
 	  Enable to debug paravirt_ops internals. Specifically, BUG if
 	  a paravirt_op is missing when it is called.
 
+config ARCH_MEMBLOCK_FIND_AREA
+	default y
+	bool "Use x86 own memblock_find_in_range()"
+	---help---
+	  Use memblock_find_in_range() version instead of generic version, it get free
+	  area up from low.
+	  Generic one try to get free area down from limit.
+
 config NO_BOOTMEM
 	def_bool y
 
Index: linux-2.6/mm/memblock.c
===================================================================
--- linux-2.6.orig/mm/memblock.c
+++ linux-2.6/mm/memblock.c
@@ -165,7 +165,7 @@ static phys_addr_t __init_memblock membl
 /*
  * Find a free area with specified alignment in a specific range.
  */
-u64 __init_memblock memblock_find_in_range(u64 start, u64 end, u64 size, u64 align)
+u64 __init_memblock __weak memblock_find_in_range(u64 start, u64 end, u64 size, u64 align)
 {
 	return memblock_find_base(size, align, start, end);
 }
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
From: H. Peter Anvin @ 2010-09-27 23:26 UTC
To: Yinghai Lu; +Cc: caiqian, Ingo Molnar, kexec, linux-kernel

On 09/27/2010 04:20 PM, Yinghai Lu wrote:
>
> x86 own version for find_area?
>

No, double no.

Same kind of crap: overloading an interface with semantics it shouldn't
have. The right thing is to introduce a new interface which carries the
explicitly needed policy with it... e.g. memblock_find_in_range_lowest().

That interface would have the explicit semantics of returning the lowest
possible address, as opposed to any suitable address (which may change
if policy requirements change.)

The other question is why does kexec need this in the first place? Is
this due to a design bug in kexec or is there some fundamental reason
for this?

	-hpa
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-27 23:26 ` H. Peter Anvin
@ 2010-09-27 23:32 ` Yinghai Lu
  2010-09-27 23:34 ` H. Peter Anvin
  0 siblings, 1 reply; 16+ messages in thread
From: Yinghai Lu @ 2010-09-27 23:32 UTC
  To: H. Peter Anvin; +Cc: caiqian, Ingo Molnar, kexec, linux-kernel

On 09/27/2010 04:26 PM, H. Peter Anvin wrote:
> On 09/27/2010 04:20 PM, Yinghai Lu wrote:
>>
>> x86 own version for find_area?
>>
>
> No, double no.
>
> Same kind of crap: overloading an interface with semantics it shouldn't
> have. The right thing is to introduce a new interface which carries the
> explicitly needed policy with it... e.g. memblock_find_in_range_lowest().
>
> That interface would have the explicit semantics of returning the lowest
> possible address, as opposed to any suitable address (which may change
> if policy requirements change.)
>
> The other question is why does kexec need this in the first place? Is
> this due to a design bug in kexec or is there some fundamental reason
> for this?

bzImage is used here. so need range below 4g.

Yinghai
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-27 23:32 ` Yinghai Lu
@ 2010-09-27 23:34 ` H. Peter Anvin
  2010-09-27 23:41 ` Yinghai Lu
  0 siblings, 1 reply; 16+ messages in thread
From: H. Peter Anvin @ 2010-09-27 23:34 UTC
  To: Yinghai Lu; +Cc: caiqian, Ingo Molnar, kexec, linux-kernel

On 09/27/2010 04:32 PM, Yinghai Lu wrote:
> On 09/27/2010 04:26 PM, H. Peter Anvin wrote:
>> On 09/27/2010 04:20 PM, Yinghai Lu wrote:
>>>
>>> x86 own version for find_area?
>>>
>>
>> No, double no.
>>
>> Same kind of crap: overloading an interface with semantics it shouldn't
>> have. The right thing is to introduce a new interface which carries the
>> explicitly needed policy with it... e.g. memblock_find_in_range_lowest().
>>
>> That interface would have the explicit semantics of returning the lowest
>> possible address, as opposed to any suitable address (which may change
>> if policy requirements change.)
>>
>> The other question is why does kexec need this in the first place? Is
>> this due to a design bug in kexec or is there some fundamental reason
>> for this?
>
> bzImage is used here. so need range below 4g.
>

OK, so why don't you cap the range to 4 GiB and then pass that down to
the existing interface? That's different from "lowest possible address".

	-hpa
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-27 23:34 ` H. Peter Anvin
@ 2010-09-27 23:41 ` Yinghai Lu
  2010-09-28 0:53 ` Vivek Goyal
  0 siblings, 1 reply; 16+ messages in thread
From: Yinghai Lu @ 2010-09-27 23:41 UTC
  To: H. Peter Anvin; +Cc: caiqian, Ingo Molnar, kexec, linux-kernel

On 09/27/2010 04:34 PM, H. Peter Anvin wrote:
> On 09/27/2010 04:32 PM, Yinghai Lu wrote:
>> On 09/27/2010 04:26 PM, H. Peter Anvin wrote:
>>> On 09/27/2010 04:20 PM, Yinghai Lu wrote:
>>>>
>>>> x86 own version for find_area?
>>>>
>>>
>>> No, double no.
>>>
>>> Same kind of crap: overloading an interface with semantics it shouldn't
>>> have. The right thing is to introduce a new interface which carries the
>>> explicitly needed policy with it... e.g. memblock_find_in_range_lowest().
>>>
>>> That interface would have the explicit semantics of returning the lowest
>>> possible address, as opposed to any suitable address (which may change
>>> if policy requirements change.)
>>>
>>> The other question is why does kexec need this in the first place? Is
>>> this due to a design bug in kexec or is there some fundamental reason
>>> for this?
>>
>> bzImage is used here. so need range below 4g.
>>
>
> OK, so why don't you cap the range to 4 GiB and then pass that down to
> the existing interface? That's different from "lowest possible address".

but if later bzImage will use 64 entry and kexec honor it, or use 64bit vmlinux directly.
and crashkernel=4096M, we could get failure again.

maybe something like this, will give it a try, hope kexec doesn't have other limitation.

[PATCH -v3] x86, memblock: Fix crashkernel allocation

Cai Qian found crashkernel is broken with x86 memblock changes
1. crashkernel=128M@32M always reported that range is used, even first kernel is small
   no one use that range
2. always get following report when using "kexec -p"
	Could not find a free area of memory of a000 bytes...
	locate_hole failed

The root cause is that generic memblock_find_in_range() will try to get range from top_down.
But crashkernel do need from low and specified range.

Let's limit the target range with crash_base + crash_size to make sure that
we get range from bottom.

-v3: don't use loop for find low one

Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/kernel/setup.c |   19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -518,17 +518,23 @@ static void __init reserve_crashkernel(v
 	if (crash_base <= 0) {
 		const unsigned long long alignment = 16<<20;	/* 16M */
 
-		crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
-				 alignment);
+		crash_base = memblock_find_in_range(alignment, 0xffffffff,
+				 crash_size, alignment);
+
 		if (crash_base == MEMBLOCK_ERROR) {
-			pr_info("crashkernel reservation failed - No suitable area found.\n");
-			return;
+			crash_base = memblock_find_in_range(alignment,
+					 ULONG_MAX, crash_size, alignment);
+
+			if (crash_base == MEMBLOCK_ERROR) {
+				pr_info("crashkernel reservation failed - No suitable area found.\n");
+				return;
+			}
 		}
 	} else {
 		unsigned long long start;
 
-		start = memblock_find_in_range(crash_base, ULONG_MAX, crash_size,
-				 1<<20);
+		start = memblock_find_in_range(crash_base,
+				 crash_base + crash_size, crash_size, 1<<20);
 		if (start != crash_base) {
 			pr_info("crashkernel reservation failed - memory is in use.\n");
 			return;
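The -v3 policy amounts to running the generic top-down search twice: first capped at 0xffffffff, then uncapped. The sketch below models that under stated assumptions; `struct range`, `find_top_down()` and `pick_crash_base_v3()` are hypothetical names, the free list is assumed to be already stripped of reserved areas and sorted by base, and alignment must be a power of two. It is an illustration of the policy, not the kernel's `memblock_find_in_range()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical free range [base, base + size); list sorted by base. */
struct range {
	uint64_t base;
	uint64_t size;
};

#define FIND_ERROR UINT64_MAX /* plays the role of MEMBLOCK_ERROR */

/*
 * Top-down search: highest align-aligned A with start <= A and A + size <= end
 * such that [A, A + size) fits inside one free range.
 */
static uint64_t find_top_down(uint64_t start, uint64_t end, uint64_t size,
			      uint64_t align, const struct range *fr, size_t n)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	for (size_t i = n; i-- > 0;) {
		uint64_t lo = fr[i].base > start ? fr[i].base : start;
		uint64_t hi = fr[i].base + fr[i].size;

		if (hi > end)
			hi = end;
		if (hi <= lo || hi - lo < size)
			continue;
		uint64_t addr = (hi - size) & ~(align - 1); /* round down */
		if (addr >= lo)
			return addr;
	}
	return FIND_ERROR;
}

/* The -v3 policy for crashkernel=SIZE without an explicit base. */
static uint64_t pick_crash_base_v3(uint64_t crash_size,
				   const struct range *fr, size_t n)
{
	const uint64_t alignment = 16ULL << 20; /* 16M, as in the patch */
	uint64_t base = find_top_down(alignment, 0xffffffffULL, crash_size,
				      alignment, fr, n);

	if (base == FIND_ERROR) /* nothing fits below 4G: retry uncapped */
		base = find_top_down(alignment, UINT64_MAX, crash_size,
				     alignment, fr, n);
	return base;
}
```

With free memory at 16M..3G and 4G..8G, a 128M request lands at 0xB8000000: below 4 GiB, but still the highest fit under the cap and above 2 GiB, so a 4 GiB cap alone does not imply a low placement. With only 4G..8G free, the fallback returns 0x1F8000000.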
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-27 23:41 ` Yinghai Lu
@ 2010-09-28 0:53 ` Vivek Goyal
  2010-09-28 2:41 ` Yinghai Lu
  2010-09-28 3:46 ` H. Peter Anvin
  0 siblings, 2 replies; 16+ messages in thread
From: Vivek Goyal @ 2010-09-28 0:53 UTC
  To: Yinghai Lu; +Cc: H. Peter Anvin, Ingo Molnar, kexec, caiqian, linux-kernel

On Mon, Sep 27, 2010 at 04:41:31PM -0700, Yinghai Lu wrote:
> On 09/27/2010 04:34 PM, H. Peter Anvin wrote:
> > On 09/27/2010 04:32 PM, Yinghai Lu wrote:
> >> On 09/27/2010 04:26 PM, H. Peter Anvin wrote:
> >>> On 09/27/2010 04:20 PM, Yinghai Lu wrote:
> >>>>
> >>>> x86 own version for find_area?
> >>>>
> >>>
> >>> No, double no.
> >>>
> >>> Same kind of crap: overloading an interface with semantics it shouldn't
> >>> have. The right thing is to introduce a new interface which carries the
> >>> explicitly needed policy with it... e.g. memblock_find_in_range_lowest().
> >>>
> >>> That interface would have the explicit semantics of returning the lowest
> >>> possible address, as opposed to any suitable address (which may change
> >>> if policy requirements change.)
> >>>
> >>> The other question is why does kexec need this in the first place? Is
> >>> this due to a design bug in kexec or is there some fundamental reason
> >>> for this?
> >>
> >> bzImage is used here. so need range below 4g.
> >>
> >
> > OK, so why don't you cap the range to 4 GiB and then pass that down to
> > the existing interface? That's different from "lowest possible address".
>
> but if later bzImage will use 64 entry and kexec honor it, or use 64bit vmlinux directly.
> and crashkernel=4096M, we could get failure again.
>
> maybe something like this, will give it a try, hope kexec doesn't have other limitation.
>
> [PATCH -v3] x86, memblock: Fix crashkernel allocation
>
> Cai Qian found crashkernel is broken with x86 memblock changes
> 1. crashkernel=128M@32M always reported that range is used, even first kernel is small
>    no one use that range
> 2. always get following report when using "kexec -p"
> 	Could not find a free area of memory of a000 bytes...
> 	locate_hole failed
>
> The root cause is that generic memblock_find_in_range() will try to get range from top_down.
> But crashkernel do need from low and specified range.
>
> Let's limit the target range with crash_base + crash_size to make sure that
> we get range from bottom.
>
> -v3: don't use loop for find low one
>
> Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>
> ---
>  arch/x86/kernel/setup.c |   19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
>
> Index: linux-2.6/arch/x86/kernel/setup.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/setup.c
> +++ linux-2.6/arch/x86/kernel/setup.c
> @@ -518,17 +518,23 @@ static void __init reserve_crashkernel(v
>  	if (crash_base <= 0) {
>  		const unsigned long long alignment = 16<<20;	/* 16M */
>  
> -		crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
> -				 alignment);
> +		crash_base = memblock_find_in_range(alignment, 0xffffffff,
> +				 crash_size, alignment);
> +

Actually, hardcoding the upper limit to 4G is probably not the best idea.
Kexec loads the relocatable binary (purgatory) and I remember that
one of the generated relocation type was signed 32 bit and allowed max value
to be 2G only. So IIRC, purgatory code always needed to be loaded below 2G.

I liked HPA's other idea better of introducing memblock_find_in_range_lowest()
so that we search bottom up and not rely on a specific upper limit.

Thanks
Vivek
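Vivek's point, which he hedges as IIRC, is that purgatory carries signed 32-bit relocations. On x86-64 that relocation type is R_X86_64_32S: the computed value is stored in 32 bits and must sign-extend back to the original 64-bit value, so a non-negative address has to stay at or below 0x7fffffff (under 2 GiB). The sketch below only encodes that range check; `fits_reloc_32s()` and `purgatory_fits()` are hypothetical helpers, and it assumes every absolute 32S reference points inside [load_addr, load_addr + len).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * R_X86_64_32S stores S + A in 32 bits and requires that sign extension
 * reproduces the 64-bit value. For non-negative physical addresses this
 * reduces to value <= INT32_MAX (0x7fffffff).
 */
static bool fits_reloc_32s(uint64_t value)
{
	return value <= (uint64_t)INT32_MAX;
}

/*
 * Hypothetical check: can a purgatory image of len bytes loaded at load_addr
 * be relocated, assuming all its absolute 32S references point inside it?
 */
static bool purgatory_fits(uint64_t load_addr, uint64_t len)
{
	return len > 0 && fits_reloc_32s(load_addr + len - 1);
}
```

Under these assumptions a reservation at 0x3000000 (48M) works, while one at 0xB8000000 fails even though it is below 4 GiB, which is why a 4 GiB cap is not sufficient on its own.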
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-28 0:53 ` Vivek Goyal
@ 2010-09-28 2:41 ` Yinghai Lu
  2010-09-28 3:46 ` H. Peter Anvin
  1 sibling, 0 replies; 16+ messages in thread
From: Yinghai Lu @ 2010-09-28 2:41 UTC
  To: Vivek Goyal; +Cc: H. Peter Anvin, Ingo Molnar, kexec, caiqian, linux-kernel

On 09/27/2010 05:53 PM, Vivek Goyal wrote:
> Actually, hardcoding the upper limit to 4G is probably not the best idea.
> Kexec loads the relocatable binary (purgatory) and I remember that
> one of the generated relocation type was signed 32 bit and allowed max value
> to be 2G only. So IIRC, purgatory code always needed to be loaded below 2G.

also kexec want bzImage under 37ffffff.

>
> I liked HPA's other idea better of introducing memblock_find_in_range_lowest()
> so that we search bottom up and not rely on a specific upper limit.
>

Please check.

[PATCH -v4] x86, memblock: Fix crashkernel allocation

Cai Qian found crashkernel is broken with x86 memblock changes
1. crashkernel=128M@32M always reported that range is used, even first kernel is small
   no one use that range
2. always get following report when using "kexec -p"
	Could not find a free area of memory of a000 bytes...
	locate_hole failed

The root cause is that generic memblock_find_in_range() will try to get range from top_down.
But crashkernel do need from low and specified range.

Let's limit the target range with crash_base + crash_size to make sure that
we get range from bottom.

-v4: add memblock_find_in_range_lowest() according to hpa and Vivek.

Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/include/asm/memblock.h |    2 +
 arch/x86/kernel/setup.c         |    8 +++---
 arch/x86/mm/memblock.c          |   52 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 58 insertions(+), 4 deletions(-)

Index: linux-2.6/arch/x86/mm/memblock.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/memblock.c
+++ linux-2.6/arch/x86/mm/memblock.c
@@ -352,3 +352,55 @@ u64 __init memblock_x86_hole_size(u64 st
 
 	return end - start - ((u64)ram << PAGE_SHIFT);
 }
+
+/* Check for already reserved areas */
+static inline bool __init check_with_memblock_reserved(u64 *addrp, u64 size, u64 align)
+{
+	u64 addr = *addrp;
+	bool changed = false;
+	struct memblock_region *r;
+again:
+	for_each_memblock(reserved, r) {
+		if ((addr + size) > r->base && addr < (r->base + r->size)) {
+			addr = round_up(r->base + r->size, align);
+			changed = true;
+			goto again;
+		}
+	}
+
+	if (changed)
+		*addrp = addr;
+
+	return changed;
+}
+
+/*
+ * Find a free area with specified alignment in a specific range from bottom up
+ */
+u64 __init memblock_find_in_range_lowest(u64 start, u64 end, u64 size, u64 align)
+{
+	struct memblock_region *r;
+
+	for_each_memblock(memory, r) {
+		u64 ei_start = r->base;
+		u64 ei_last = ei_start + r->size;
+		u64 addr, last;
+
+		addr = round_up(ei_start, align);
+		if (addr < start)
+			addr = round_up(start, align);
+		if (addr >= ei_last)
+			continue;
+		while (check_with_memblock_reserved(&addr, size, align) && addr+size <= ei_last)
+			;
+		last = addr + size;
+		if (last > ei_last)
+			continue;
+		if (last > end)
+			continue;
+
+		return addr;
+	}
+
+	return MEMBLOCK_ERROR;
+}
Index: linux-2.6/arch/x86/include/asm/memblock.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/memblock.h
+++ linux-2.6/arch/x86/include/asm/memblock.h
@@ -18,4 +18,6 @@ u64 memblock_x86_find_in_range_node(int
 u64 memblock_x86_free_memory_in_range(u64 addr, u64 limit);
 u64 memblock_x86_memory_in_range(u64 addr, u64 limit);
 
+u64 memblock_find_in_range_lowest(u64 start, u64 end, u64 size, u64 align);
+
 #endif
Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -518,8 +518,8 @@ static void __init reserve_crashkernel(v
 	if (crash_base <= 0) {
 		const unsigned long long alignment = 16<<20;	/* 16M */
 
-		crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
-				 alignment);
+		crash_base = memblock_find_in_range_lowest(alignment,
+				 ULONG_MAX, crash_size, alignment);
 		if (crash_base == MEMBLOCK_ERROR) {
 			pr_info("crashkernel reservation failed - No suitable area found.\n");
 			return;
@@ -527,8 +527,8 @@ static void __init reserve_crashkernel(v
 	} else {
 		unsigned long long start;
 
-		start = memblock_find_in_range(crash_base, ULONG_MAX, crash_size,
-				 1<<20);
+		start = memblock_find_in_range(crash_base,
+				 crash_base + crash_size, crash_size, 1<<20);
 		if (start != crash_base) {
 			pr_info("crashkernel reservation failed - memory is in use.\n");
 			return;
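The `crashkernel=SIZE@BASE` half of the fix (unchanged since -v3) narrows the search window to exactly [crash_base, crash_base + crash_size). With a top-down search the only candidate inside that window is crash_base itself, so `start != crash_base` now means "the requested range is not free" rather than "some higher free range exists". The sketch below checks this with a hypothetical top-down model; `struct range`, `find_top_down()` and `fixed_base_ok()` are illustrative names, the free list is assumed sorted and already stripped of reserved areas, and the 1 MiB alignment matches the `1<<20` in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical free range [base, base + size); list sorted by base. */
struct range {
	uint64_t base;
	uint64_t size;
};

#define FIND_ERROR UINT64_MAX /* plays the role of MEMBLOCK_ERROR */

/* Highest align-aligned A in [start, end - size] fitting in one free range. */
static uint64_t find_top_down(uint64_t start, uint64_t end, uint64_t size,
			      uint64_t align, const struct range *fr, size_t n)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	for (size_t i = n; i-- > 0;) {
		uint64_t lo = fr[i].base > start ? fr[i].base : start;
		uint64_t hi = fr[i].base + fr[i].size;

		if (hi > end)
			hi = end;
		if (hi <= lo || hi - lo < size)
			continue;
		uint64_t addr = (hi - size) & ~(align - 1); /* round down */
		if (addr >= lo)
			return addr;
	}
	return FIND_ERROR;
}

/* Mirrors the fixed-base branch of reserve_crashkernel() after the fix. */
static bool fixed_base_ok(uint64_t crash_base, uint64_t crash_size,
			  const struct range *fr, size_t n)
{
	return find_top_down(crash_base, crash_base + crash_size, crash_size,
			     1ULL << 20, fr, n) == crash_base;
}
```

In this model, the old call with `end = ULONG_MAX` and free memory at 16M..3G returns 0xB8000000 for `crashkernel=128M@32M`, so the comparison with 32M fails and the range is wrongly reported as in use (symptom 1 in the changelog). The capped call returns exactly 32M when 32M..160M is free, and fails only when part of that window is actually occupied.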
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-28 0:53 ` Vivek Goyal
@ 2010-09-28 3:46 ` H. Peter Anvin
  2010-09-28 7:14 ` Yinghai Lu
  2010-09-28 13:54 ` Vivek Goyal
  1 sibling, 2 replies; 16+ messages in thread
From: H. Peter Anvin @ 2010-09-28 3:46 UTC
  To: Vivek Goyal; +Cc: Yinghai Lu, Ingo Molnar, kexec, caiqian, linux-kernel

On 09/27/2010 05:53 PM, Vivek Goyal wrote:
>
> Actually, hardcoding the upper limit to 4G is probably not the best idea.
> Kexec loads the relocatable binary (purgatory) and I remember that
> one of the generated relocation type was signed 32 bit and allowed max value
> to be 2G only. So IIRC, purgatory code always needed to be loaded below 2G.
>
> I liked HPA's other idea better of introducing memblock_find_in_range_lowest()
> so that we search bottom up and not rely on a specific upper limit.
>

No, it's just another crappy hack which is broken in the same way. It's
better than open-coding, but it's still a hack.

The Right Thing[TM] to do is for kexec to communicate the topmost
address it wants to this code, so it has both the upper and the lower
boundaries available to it instead of just one.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-28 3:46 ` H. Peter Anvin
@ 2010-09-28 7:14 ` Yinghai Lu
  2010-09-28 14:01 ` Vivek Goyal
  2010-09-28 13:54 ` Vivek Goyal
  1 sibling, 1 reply; 16+ messages in thread
From: Yinghai Lu @ 2010-09-28 7:14 UTC
  To: H. Peter Anvin; +Cc: Vivek Goyal, Ingo Molnar, kexec, caiqian, linux-kernel

On 09/27/2010 08:46 PM, H. Peter Anvin wrote:
> On 09/27/2010 05:53 PM, Vivek Goyal wrote:
>>
>> Actually, hardcoding the upper limit to 4G is probably not the best idea.
>> Kexec loads the relocatable binary (purgatory) and I remember that
>> one of the generated relocation type was signed 32 bit and allowed max value
>> to be 2G only. So IIRC, purgatory code always needed to be loaded below 2G.
>>
>> I liked HPA's other idea better of introducing memblock_find_in_range_lowest()
>> so that we search bottom up and not rely on a specific upper limit.
>>
>
> No, it's just another crappy hack which is broken in the same way. It's
> better than open-coding, but it's still a hack.
>
> The Right Thing[TM] to do is for kexec to communicate the topmost
> address it wants to this code, so it has both the upper and the lower
> boundaries available to it instead of just one.

hope you are happy with this one.

[PATCH -v5] x86, memblock: Fix crashkernel allocation

Cai Qian found crashkernel is broken with x86 memblock changes
1. crashkernel=128M@32M always reported that range is used, even first kernel is small
   no one use that range
2. always get following report when using "kexec -p"
	Could not find a free area of memory of a000 bytes...
	locate_hole failed

The root cause is that generic memblock_find_in_range() will try to get range from top_down.
But crashkernel do need from low and specified range.

Let's limit the target range with crash_base + crash_size to make sure that
we get range from bottom.

-v5: use DEFAULT_BZIMAGE_ADDR_MAX to limit area that could be used by bzImage.
     also second try for vmlinux or new kexec tools will use bzImage 64bit entry

Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/kernel/setup.c |   24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -501,6 +501,7 @@ static inline unsigned long long get_tot
 	return total << PAGE_SHIFT;
 }
 
+#define DEFAULT_BZIMAGE_ADDR_MAX 0x37FFFFFF
 static void __init reserve_crashkernel(void)
 {
 	unsigned long long total_mem;
@@ -518,17 +519,28 @@ static void __init reserve_crashkernel(v
 	if (crash_base <= 0) {
 		const unsigned long long alignment = 16<<20;	/* 16M */
 
-		crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
-				 alignment);
+		/*
+		 *  Assume half crash_size is for bzImage
+		 *  kexec want bzImage is below DEFAULT_BZIMAGE_ADDR_MAX
+		 */
+		crash_base = memblock_find_in_range(alignment,
+				DEFAULT_BZIMAGE_ADDR_MAX + crash_size/2,
+				crash_size, alignment);
+
 		if (crash_base == MEMBLOCK_ERROR) {
-			pr_info("crashkernel reservation failed - No suitable area found.\n");
-			return;
+			crash_base = memblock_find_in_range(alignment,
+					 ULONG_MAX, crash_size, alignment);
+
+			if (crash_base == MEMBLOCK_ERROR) {
+				pr_info("crashkernel reservation failed - No suitable area found.\n");
+				return;
+			}
 		}
 	} else {
 		unsigned long long start;
 
-		start = memblock_find_in_range(crash_base, ULONG_MAX, crash_size,
-				 1<<20);
+		start = memblock_find_in_range(crash_base,
+				 crash_base + crash_size, crash_size, 1<<20);
 		if (start != crash_base) {
 			pr_info("crashkernel reservation failed - memory is in use.\n");
 			return;
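The -v5 cap can be checked with a little arithmetic. A top-down search bounded by `end = DEFAULT_BZIMAGE_ADDR_MAX + crash_size/2` returns a base with base + crash_size <= end, so the lower half of the reservation (which the patch assumes holds the bzImage) ends no higher than DEFAULT_BZIMAGE_ADDR_MAX. The sketch below computes the highest base such a search could return if all memory below `end` were free; `v5_first_pass_end()` and `highest_base()` are hypothetical helpers, and the 16 MiB alignment is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define DEFAULT_BZIMAGE_ADDR_MAX 0x37FFFFFFULL /* value from the -v5 patch */

/* Upper bound of the first (capped) memblock_find_in_range() call in -v5. */
static uint64_t v5_first_pass_end(uint64_t crash_size)
{
	return DEFAULT_BZIMAGE_ADDR_MAX + crash_size / 2;
}

/*
 * Highest align-aligned base a top-down search bounded by end could return
 * if all memory below end were free: base + crash_size <= end.
 */
static uint64_t highest_base(uint64_t end, uint64_t crash_size, uint64_t align)
{
	assert(end >= crash_size && align != 0 && (align & (align - 1)) == 0);
	return (end - crash_size) & ~(align - 1);
}
```

For crashkernel=128M the cap is 0x3BFFFFFF and the highest possible base is 0x33000000, so the assumed bzImage half ends at 0x37000000. The guarantee holds only under the half-and-half assumption written into the patch comment, which is the kind of hardcoding under debate in this thread.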
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-28 7:14 ` Yinghai Lu
@ 2010-09-28 14:01 ` Vivek Goyal
  0 siblings, 0 replies; 16+ messages in thread
From: Vivek Goyal @ 2010-09-28 14:01 UTC
  To: Yinghai Lu; +Cc: H. Peter Anvin, Ingo Molnar, kexec, caiqian, linux-kernel

On Tue, Sep 28, 2010 at 12:14:31AM -0700, Yinghai Lu wrote:
> On 09/27/2010 08:46 PM, H. Peter Anvin wrote:
> > On 09/27/2010 05:53 PM, Vivek Goyal wrote:
> >>
> >> Actually, hardcoding the upper limit to 4G is probably not the best idea.
> >> Kexec loads the relocatable binary (purgatory) and I remember that
> >> one of the generated relocation type was signed 32 bit and allowed max value
> >> to be 2G only. So IIRC, purgatory code always needed to be loaded below 2G.
> >>
> >> I liked HPA's other idea better of introducing memblock_find_in_range_lowest()
> >> so that we search bottom up and not rely on a specific upper limit.
> >>
> >
> > No, it's just another crappy hack which is broken in the same way. It's
> > better than open-coding, but it's still a hack.
> >
> > The Right Thing[TM] to do is for kexec to communicate the topmost
> > address it wants to this code, so it has both the upper and the lower
> > boundaries available to it instead of just one.
>
> hope you are happy with this one.
>
> [PATCH -v5] x86, memblock: Fix crashkernel allocation
>
> Cai Qian found crashkernel is broken with x86 memblock changes
> 1. crashkernel=128M@32M always reported that range is used, even first kernel is small
>    no one use that range
> 2. always get following report when using "kexec -p"
> 	Could not find a free area of memory of a000 bytes...
> 	locate_hole failed
>
> The root cause is that generic memblock_find_in_range() will try to get range from top_down.
> But crashkernel do need from low and specified range.
>
> Let's limit the target range with crash_base + crash_size to make sure that
> we get range from bottom.
>
> -v5: use DEFAULT_BZIMAGE_ADDR_MAX to limit area that could be used by bzImage.
>      also second try for vmlinux or new kexec tools will use bzImage 64bit entry
>
> Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>
> ---
>  arch/x86/kernel/setup.c |   24 ++++++++++++++++++------
>  1 file changed, 18 insertions(+), 6 deletions(-)
>
> Index: linux-2.6/arch/x86/kernel/setup.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/setup.c
> +++ linux-2.6/arch/x86/kernel/setup.c
> @@ -501,6 +501,7 @@ static inline unsigned long long get_tot
>  	return total << PAGE_SHIFT;
>  }
>  
> +#define DEFAULT_BZIMAGE_ADDR_MAX 0x37FFFFFF
>  static void __init reserve_crashkernel(void)
>  {
>  	unsigned long long total_mem;
> @@ -518,17 +519,28 @@ static void __init reserve_crashkernel(v
>  	if (crash_base <= 0) {
>  		const unsigned long long alignment = 16<<20;	/* 16M */
>  
> -		crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
> -				 alignment);
> +		/*
> +		 *  Assume half crash_size is for bzImage
> +		 *  kexec want bzImage is below DEFAULT_BZIMAGE_ADDR_MAX
> +		 */
> +		crash_base = memblock_find_in_range(alignment,
> +				DEFAULT_BZIMAGE_ADDR_MAX + crash_size/2,
> +				crash_size, alignment);
> +

IMHO, these kind of hardcodings are worse than finding the lowest possible
address. It is assuming that kexec is going to load a bzImage.

So we have following three options sorted from best to worst.

- Specify upper limit in "crashkernel=" command line syntax
- Find the lowest possible address for crashkernel reservations
- Hardcode upper limit based on certain factors.

Because upper limit depends on image being loaded and can also vary as
kexec-tools changes, knowing it for sure will require extra reboot. It also
make command line syntax more complicated as we need to introduce another
field to specify upper limit. Especially for the following case.

crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset]

So personally I think we can stick to second best option and that is
finding the lowest possible memory area.

Thanks
Vivek
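For reference, the extended syntax Vivek cites picks a reservation size from the amount of system RAM; each range is written start-[end], with start inclusive and end exclusive, and an omitted end means "no upper bound", e.g. `crashkernel=512M-2G:64M,2G-:128M`. The sketch below is a simplified userspace parser written for illustration, not the kernel's own parsing code: `parse_size()` mimics only the K/M/G suffixes of `memparse()`, the first matching range is assumed to win, and malformed input simply yields 0.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Number with optional K/M/G suffix (a simplified memparse()). */
static uint64_t parse_size(const char *s, char **endp)
{
	uint64_t v = strtoull(s, endp, 0);

	if (*endp == s)
		return 0; /* no digits: leave *endp at s */
	switch (**endp) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*endp)++;
		break;
	default:
		break;
	}
	return v;
}

/*
 * Size selected for system_ram by "start-[end]:size[,...][@offset]", or 0 if
 * no range matches or the string is malformed. *offset receives the value
 * after '@' (0 if absent).
 */
static uint64_t crashkernel_size_for(const char *spec, uint64_t system_ram,
				     uint64_t *offset)
{
	const char *p = spec;
	uint64_t chosen = 0;
	int found = 0;
	char *end;

	*offset = 0;
	for (;;) {
		uint64_t start = parse_size(p, &end), limit = UINT64_MAX, size;

		if (end == p || *end != '-')
			return 0;
		p = end + 1;
		if (*p != ':') { /* "start-end"; "start-" is open-ended */
			limit = parse_size(p, &end);
			if (end == p)
				return 0;
			p = end;
		}
		if (*p != ':')
			return 0;
		size = parse_size(++p, &end);
		if (end == p)
			return 0;
		p = end;
		if (!found && system_ram >= start && system_ram < limit) {
			chosen = size; /* first matching range wins */
			found = 1;
		}
		if (*p != ',')
			break;
		p++;
	}
	if (*p == '@') {
		*offset = parse_size(++p, &end);
		if (end == p)
			return 0;
		p = end;
	}
	return *p == '\0' ? chosen : 0;
}
```

With `512M-2G:64M,2G-:128M`, a 1 GiB machine gets 64M, a machine with exactly 2 GiB or more gets 128M, and a 256 MiB machine gets nothing. Each entry already carries a range and a size, which illustrates why adding a per-entry upper limit would make the syntax heavier still.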
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-28 3:46 ` H. Peter Anvin
  2010-09-28 7:14 ` Yinghai Lu
@ 2010-09-28 13:54 ` Vivek Goyal
  1 sibling, 0 replies; 16+ messages in thread
From: Vivek Goyal @ 2010-09-28 13:54 UTC
  To: H. Peter Anvin; +Cc: Yinghai Lu, Ingo Molnar, kexec, caiqian, linux-kernel

On Mon, Sep 27, 2010 at 08:46:42PM -0700, H. Peter Anvin wrote:
> On 09/27/2010 05:53 PM, Vivek Goyal wrote:
> >
> > Actually, hardcoding the upper limit to 4G is probably not the best idea.
> > Kexec loads the relocatable binary (purgatory) and I remember that
> > one of the generated relocation type was signed 32 bit and allowed max value
> > to be 2G only. So IIRC, purgatory code always needed to be loaded below 2G.
> >
> > I liked HPA's other idea better of introducing memblock_find_in_range_lowest()
> > so that we search bottom up and not rely on a specific upper limit.
> >
>
> No, it's just another crappy hack which is broken in the same way. It's
> better than open-coding, but it's still a hack.
>
> The Right Thing[TM] to do is for kexec to communicate the topmost
> address it wants to this code, so it has both the upper and the lower
> boundaries available to it instead of just one.
>

Being able to specify the upper limit would be the best thing so that
kernel does not have to make any assumptions or hardcode anything.

Question is how to determine the upper limit.

- The upper limit will depend on what is being loaded in reserved region.
  Reserving memory using crashkernel= is a boot time option and at that
  point of time kexec has not even run. So we don't know what is the upper
  limit.

Now we can do extra reboot to make it happen. Boot first kernel without
reserving any memory. Introduce an option in kexec which tells user what
are the segments kexec would like to load (for a given binary) and what
are their upper memory limits and then user goes ahead modifies the
command line and reboots the kernel back.

This all sounds not so clean. Especially upper limit might change based
on binary being loaded and a user might have to perform a reboot again.

So to me trying to get lowest memory available possible for crashkernel
reservations is not that a bad idea. It is certainly better than making
hardcoded assumptions about the upper limit.

Thanks
Vivek
Thread overview: 16+ messages
[not found] <870873343.2003871285555329846.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
2010-09-27 6:31 ` kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_" Yinghai Lu
2010-09-27 9:16 ` CAI Qian
[not found] <1909915255.2046011285586388234.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
2010-09-27 11:21 ` caiqian
2010-09-27 22:22 ` Yinghai Lu
2010-09-27 22:50 ` H. Peter Anvin
2010-09-27 23:20 ` Yinghai Lu
2010-09-27 23:26 ` H. Peter Anvin
2010-09-27 23:32 ` Yinghai Lu
2010-09-27 23:34 ` H. Peter Anvin
2010-09-27 23:41 ` Yinghai Lu
2010-09-28 0:53 ` Vivek Goyal
2010-09-28 2:41 ` Yinghai Lu
2010-09-28 3:46 ` H. Peter Anvin
2010-09-28 7:14 ` Yinghai Lu
2010-09-28 14:01 ` Vivek Goyal
2010-09-28 13:54 ` Vivek Goyal