* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_" [not found] <1909915255.2046011285586388234.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> @ 2010-09-27 11:21 ` caiqian 2010-09-27 22:22 ` Yinghai Lu 0 siblings, 1 reply; 25+ messages in thread From: caiqian @ 2010-09-27 11:21 UTC (permalink / raw) To: Yinghai Lu; +Cc: Ingo Molnar, kexec, linux-kernel, H. Peter Anvin ----- "CAI Qian" <caiqian@redhat.com> wrote: > ----- "Yinghai Lu" <yinghai@kernel.org> wrote: > > > Please check this one on top of tip or next. > This failed for both trees. > [root@localhost linux-next]# patch -Np1 <memblock.patch > patching file arch/x86/kernel/setup.c > Hunk #1 FAILED at 516. > 1 out of 1 hunk FAILED -- saving rejects to file > arch/x86/kernel/setup.c.rej After manually applied the patch on the top of the latest mmotm tree, now there was no /proc/vmcore exported to the second kernel anymore. It could be the results of other recent commits in mmotm though. 
It said, Warning: Core image elf header is notsane Kdump: vmcore not initialized Here is the dmesg from the second kernel, Initializing cgroup subsys cpuset Linux version 2.6.36-rc5-mm1+ (root@localhost.localdomain) (gcc version 4.4.4 20100726 (Red Hat 4.4.4-13) (GCC) ) #6 SMP Mon Sep 27 07:00:15 EDT 2010 Command line: ro root=/dev/mapper/VolGroup-lv_root rd_LVM_LV=VolGroup/lv_root rd_LVM_LV=VolGroup/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb quiet console=tty0 console=ttyS0,115200 crashkernel=128M irqpoll maxcpus=1 reset_devices cgroup_disable=memory memmap=exactmap memmap=640K@0K memmap=130408K@32768K elfcorehdr=163176K kexec_jump_back_entry=0x000000000232f063 BIOS-provided physical RAM map: BIOS-e820: 0000000000000100 - 000000000009f400 (usable) BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved) BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved) BIOS-e820: 0000000000100000 - 00000000dfffb000 (usable) BIOS-e820: 00000000dfffb000 - 00000000e0000000 (reserved) BIOS-e820: 00000000fffbc000 - 0000000100000000 (reserved) BIOS-e820: 0000000100000000 - 0000000ca0000000 (usable) last_pfn = 0xca0000 max_arch_pfn = 0x400000000 NX (Execute Disable) protection: active user-defined physical RAM map: user: 0000000000000000 - 00000000000a0000 (usable) user: 0000000002000000 - 0000000009f5a000 (usable) DMI 2.4 present. e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved) e820 remove range: 00000000000a0000 - 0000000000100000 (usable) No AGP bridge found last_pfn = 0x9f5a max_arch_pfn = 0x400000000 MTRR default type: write-back MTRR fixed ranges enabled: 00000-9FFFF write-back A0000-BFFFF uncachable C0000-FFFFF write-protect MTRR variable ranges enabled: 0 base 00E0000000 mask FFE0000000 uncachable 1 disabled 2 disabled 3 disabled 4 disabled 5 disabled 6 disabled 7 disabled PAT not supported by CPU. 
found SMP MP-table at [ffff8800000f7fb0] f7fb0 initial memory mapped : 0 - 20000000 init_memory_mapping: 0000000000000000-0000000009f5a000 0000000000 - 0009e00000 page 2M 0009e00000 - 0009f5a000 page 4k kernel direct mapping tables up to 9f5a000 @ 9f57000-9f5a000 RAMDISK: 09ae5000 - 09f49000 crashkernel reservation failed - No suitable area found. ACPI: RSDP 00000000000f7f60 00014 (v00 BOCHS ) ACPI: RSDT 00000000dfffd890 00030 (v01 BOCHS BXPCRSDT 00000001 BXPC 00000001) ACPI: FACP 00000000dffffa30 00074 (v01 BOCHS BXPCFACP 00000001 BXPC 00000001) ACPI: DSDT 00000000dfffdb70 01E4B (v01 BXPC BXDSDT 00000001 INTL 20090123) ACPI: FACS 00000000dffff9c0 00040 ACPI: SSDT 00000000dfffda40 0012F (v01 BOCHS BXPCSSDT 00000001 BXPC 00000001) ACPI: APIC 00000000dfffd8c0 0010A (v01 BOCHS BXPCAPIC 00000001 BXPC 00000001) ACPI: Local APIC address 0xfee00000 No NUMA configuration found Faking a node at 0000000000000000-0000000009f5a000 Initmem setup node 0 0000000000000000-0000000009f5a000 NODE_DATA [0000000009abe000 - 0000000009ae4fff] kvm-clock: Using msrs 12 and 11 kvm-clock: cpu 0, msr 0:28c3741, boot clock [ffffea0000000000-ffffea00003fffff] PMD -> [ffff880008e00000-ffff8800091fffff] on node 0 sizeof(struct page) = 56 Zone PFN ranges: DMA 0x00000010 -> 0x00001000 DMA32 0x00001000 -> 0x00100000 Normal empty Movable zone start PFN for each node early_node_map[2] active PFN ranges 0: 0x00000010 -> 0x000000a0 0: 0x00002000 -> 0x00009f5a On node 0 totalpages: 32746 DMA zone: 56 pages used for memmap DMA zone: 7 pages reserved DMA zone: 81 pages, LIFO batch:0 DMA32 zone: 502 pages used for memmap DMA32 zone: 32100 pages, LIFO batch:7 ACPI: PM-Timer IO Port: 0xb008 ACPI: Local APIC address 0xfee00000 ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled) ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] enabled) ACPI: LAPIC (acpi_id[0x05] 
lapic_id[0x05] enabled) ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] enabled) ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] enabled) ACPI: LAPIC (acpi_id[0x08] lapic_id[0x08] enabled) ACPI: LAPIC (acpi_id[0x09] lapic_id[0x09] enabled) ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x0a] enabled) ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x0b] enabled) ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x0c] enabled) ACPI: LAPIC (acpi_id[0x0d] lapic_id[0x0d] enabled) ACPI: LAPIC (acpi_id[0x0e] lapic_id[0x0e] enabled) ACPI: LAPIC (acpi_id[0x0f] lapic_id[0x0f] enabled) ACPI: LAPIC (acpi_id[0x10] lapic_id[0x10] enabled) ACPI: LAPIC (acpi_id[0x11] lapic_id[0x11] enabled) ACPI: LAPIC (acpi_id[0x12] lapic_id[0x12] enabled) ACPI: LAPIC (acpi_id[0x13] lapic_id[0x13] enabled) ACPI: IOAPIC (id[0x14] address[0xfec00000] gsi_base[0]) IOAPIC[0]: apic_id 20, version 17, address 0xfec00000, GSI 0-23 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level) ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level) ACPI: IRQ0 used by override. ACPI: IRQ2 used by override. ACPI: IRQ5 used by override. ACPI: IRQ9 used by override. ACPI: IRQ10 used by override. ACPI: IRQ11 used by override. 
Using ACPI (MADT) for SMP configuration information SMP: Allowing 20 CPUs, 0 hotplug CPUs nr_irqs_gsi: 40 PM: Registered nosave memory: 00000000000a0000 - 0000000002000000 Allocating PCI resources starting at 9f5a000 (gap: 9f5a000:f60a6000) Booting paravirtualized kernel on KVM setup_percpu: NR_CPUS:4096 nr_cpumask_bits:20 nr_cpu_ids:20 nr_node_ids:1 PERCPU: Embedded 29 pages/cpu @ffff880009400000 s86912 r8192 d23680 u262144 pcpu-alloc: s86912 r8192 d23680 u262144 alloc=1*2097152 pcpu-alloc: [0] 00 01 02 03 04 05 06 07 [0] 08 09 10 11 12 13 14 15 pcpu-alloc: [0] 16 17 18 19 -- -- -- -- kvm-clock: cpu 0, msr 0:9414741, primary cpu clock Built 1 zonelists in Node order, mobility grouping on. Total pages: 32181 Policy zone: DMA32 Kernel command line: ro root=/dev/mapper/VolGroup-lv_root rd_LVM_LV=VolGroup/lv_root rd_LVM_LV=VolGroup/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb quiet console=tty0 console=ttyS0,115200 crashkernel=128M irqpoll maxcpus=1 reset_devices cgroup_disable=memory memmap=exactmap memmap=640K@0K memmap=130408K@32768K elfcorehdr=163176K kexec_jump_back_entry=0x000000000232f063 Misrouted IRQ fixup and polling support enabled This may significantly impact system performance Disabling memory control group subsystem PID hash table entries: 512 (order: 0, 4096 bytes) Checking aperture... No AGP bridge found Memory: 103484k/163176k available (4267k kernel code, 32192k absent, 27500k reserved, 4617k data, 2484k init) Hierarchical RCU implementation. RCU-based detection of stalled CPUs is disabled. Verbose stalled-CPUs detection is disabled. NR_IRQS:262400 nr_irqs:840 Spurious LAPIC timer interrupt on cpu 0 Console: colour VGA+ 80x25 console [tty0] enabled console [ttyS0] enabled Detected 1995.358 MHz processor. Calibrating delay loop (skipped) preset value.. 3990.71 BogoMIPS (lpj=1995358) pid_max: default: 32768 minimum: 301 Security Framework initialized SELinux: Initializing. 
SELinux: Starting in permissive mode Dentry cache hash table entries: 16384 (order: 5, 131072 bytes) Inode-cache hash table entries: 8192 (order: 4, 65536 bytes) Mount-cache hash table entries: 256 Initializing cgroup subsys ns Initializing cgroup subsys cpuacct Initializing cgroup subsys memory Initializing cgroup subsys devices Initializing cgroup subsys freezer Initializing cgroup subsys net_cls mce: CPU supports 10 MCE banks Performance Events: p6 PMU driver. ... version: 0 ... bit width: 32 ... generic registers: 2 ... value mask: 00000000ffffffff ... max period: 000000007fffffff ... fixed-purpose events: 0 ... event mask: 0000000000000003 SMP alternatives: switching to UP code ACPI: Core revision 20100702 Setting APIC routing to physical flat ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1 CPU0: Intel QEMU Virtual CPU version (cpu64-rhel6) stepping 03 Brought up 1 CPUs Total of 1 processors activated (3990.71 BogoMIPS). devtmpfs: initialized regulator: core version 0.5 NET: Registered protocol family 16 ACPI: bus type pci registered PCI: Using configuration type 1 for base access bio: create slab <bio-0> at 0 IRQ 9: starting IRQFIXUP_POLL ACPI: EC: Look up EC in DSDT ACPI: Interpreter enabled ACPI: (supports S0 S3 S4 S5) ACPI: Using IOAPIC for interrupt routing ACPI: No dock devices found. 
PCI: Ignoring host bridge windows from ACPI; if necessary, use "pci=use_crs" and report a bug ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff]) pci_root PNP0A03:00: host bridge window [io 0x0000-0x0cf7] (ignored) pci_root PNP0A03:00: host bridge window [io 0x0d00-0xffff] (ignored) pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff] (ignored) pci_root PNP0A03:00: host bridge window [mem 0xe0000000-0xfebfffff] (ignored) pci 0000:00:01.1: reg 20: [io 0xc000-0xc00f] pci 0000:00:01.2: reg 20: [io 0xc020-0xc03f] pci 0000:00:01.3: quirk: [io 0xb000-0xb03f] claimed by PIIX4 ACPI pci 0000:00:01.3: quirk: [io 0xb100-0xb10f] claimed by PIIX4 SMB pci 0000:00:02.0: reg 10: [mem 0xf0000000-0xf1ffffff pref] pci 0000:00:02.0: reg 14: [mem 0xf2000000-0xf2000fff] pci 0000:00:02.0: reg 30: [mem 0xf2010000-0xf201ffff pref] pci 0000:00:03.0: reg 10: [io 0xc100-0xc1ff] pci 0000:00:03.0: reg 14: [mem 0xf2020000-0xf20200ff] pci 0000:00:03.0: reg 30: [mem 0xf2030000-0xf203ffff pref] pci 0000:00:04.0: reg 10: [io 0xc400-0xc7ff] pci 0000:00:04.0: reg 14: [io 0xc800-0xc8ff] pci 0000:00:05.0: reg 10: [io 0xc900-0xc91f] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT] ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11) ACPI: PCI Interrupt Link [LNKB] (IRQs 5 10 11) *0, disabled. ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11) ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11) vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none vgaarb: loaded SCSI subsystem initialized libata version 3.00 loaded. 
usbcore: registered new interface driver usbfs usbcore: registered new interface driver hub usbcore: registered new device driver usb PCI: Using ACPI for IRQ routing PCI: pci_cache_line_size set to 64 bytes reserve RAM buffer: 0000000009f5a000 - 000000000bffffff NetLabel: Initializing NetLabel: domain hash size = 128 NetLabel: protocols = UNLABELED CIPSOv4 NetLabel: unlabeled traffic allowed by default Switching to clocksource kvm-clock pnp: PnP ACPI init ACPI: bus type pnp registered pnp: PnP ACPI: found 6 devices ACPI: ACPI bus type pnp unregistered pci_bus 0000:00: resource 0 [io 0x0000-0xffff] pci_bus 0000:00: resource 1 [mem 0x00000000-0xffffffffffffffff] NET: Registered protocol family 2 IP route cache hash table entries: 1024 (order: 1, 8192 bytes) TCP established hash table entries: 4096 (order: 4, 65536 bytes) TCP bind hash table entries: 4096 (order: 4, 65536 bytes) TCP: Hash tables configured (established 4096 bind 4096) TCP reno registered UDP hash table entries: 128 (order: 0, 4096 bytes) UDP-Lite hash table entries: 128 (order: 0, 4096 bytes) NET: Registered protocol family 1 pci 0000:00:00.0: Limiting direct PCI/PCI transfers pci 0000:00:01.0: Activating ISA DMA hang workarounds pci 0000:00:02.0: Boot video device PCI: CLS 64 bytes, default 64 Trying to unpack rootfs image as initramfs... Freeing initrd memory: 4496k freed audit: initializing netlink socket (disabled) type=2000 audit(1285586109.207:1): initialized HugeTLB registered 2 MB page size, pre-allocated 0 pages VFS: Disk quotas dquot_6.5.2 Dquot-cache hash table entries: 512 (order 0, 4096 bytes) Warning: Core image elf header is notsane Kdump: vmcore not initialized > > > > > Thanks > > > > Yinghai > > > > [PATCH] x86, memblock: Fix crashkernel allocation > > > > Cai Qian found that crashkernel is broken with x86 memblock changes > > 1. crashkernel=128M@32M always reported that range is used, even > first > > kernel is small > > no one use that range > > 2. 
always get following report when using "kexec -p" > > Could not find a free area of memory of a000 bytes... > > locate_hole failed > > > > The root cause is that generic memblock_find_in_range() will try to > > get range from top_down. > > But crashkernel do need from low and specified range. > > > > Let's limit the target range with rash_base + crash_size to make > sure > > that > > We get range from bottom. > > > > Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com> > > Signed-off-by: Yinghai Lu <yinghai@kernel.org> > > > > --- > > arch/x86/kernel/setup.c | 19 ++++++++++++++----- > > 1 file changed, 14 insertions(+), 5 deletions(-) > > > > Index: linux-2.6/arch/x86/kernel/setup.c > > =================================================================== > > --- linux-2.6.orig/arch/x86/kernel/setup.c > > +++ linux-2.6/arch/x86/kernel/setup.c > > @@ -516,19 +516,28 @@ static void __init reserve_crashkernel(v > > > > /* 0 means: find the address automatically */ > > if (crash_base <= 0) { > > + unsigned long long start = 0; > > const unsigned long long alignment = 16<<20; /* 16M */ > > > > - crash_base = memblock_find_in_range(alignment, ULONG_MAX, > > crash_size, > > - alignment); > > - if (crash_base == MEMBLOCK_ERROR) { > > + crash_base = alignment; > > + while (crash_base < 0xffffffff) { > > + start = memblock_find_in_range(crash_base, > > + crash_base + crash_size, crash_size, alignment); > > + > > + if (start == crash_base) > > + break; > > + > > + crash_base += alignment; > > + } > > + if (start != crash_base) { > > pr_info("crashkernel reservation failed - No suitable area > > found.\n"); > > return; > > } > > } else { > > unsigned long long start; > > > > - start = memblock_find_in_range(crash_base, ULONG_MAX, crash_size, > > - 1<<20); > > + start = memblock_find_in_range(crash_base, > > + crash_base + crash_size, crash_size, 1<<20); > > if (start != crash_base) { > > pr_info("crashkernel reservation failed - memory is in use.\n"); > > return; > > > > 
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-27 11:21 ` kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_" caiqian
@ 2010-09-27 22:22 ` Yinghai Lu
  2010-09-27 22:50   ` H. Peter Anvin
  0 siblings, 1 reply; 25+ messages in thread
From: Yinghai Lu @ 2010-09-27 22:22 UTC
To: caiqian; +Cc: Ingo Molnar, kexec, linux-kernel, H. Peter Anvin

On 09/27/2010 04:21 AM, caiqian@redhat.com wrote:
>
> ----- "CAI Qian" <caiqian@redhat.com> wrote:
>
>> ----- "Yinghai Lu" <yinghai@kernel.org> wrote:
>>
>>> Please check this one on top of tip or next.
>> This failed for both trees.
>> [root@localhost linux-next]# patch -Np1 <memblock.patch
>> patching file arch/x86/kernel/setup.c
>> Hunk #1 FAILED at 516.
>> 1 out of 1 hunk FAILED -- saving rejects to file
>> arch/x86/kernel/setup.c.rej
> After manually applied the patch on the top of the latest mmotm tree, now there was no /proc/vmcore exported to the second kernel anymore. It could be the results of other recent commits in mmotm though. It said,
>
> Warning: Core image elf header is notsane
> Kdump: vmcore not initialized
>
> Here is the dmesg from the second kernel,
>
> Initializing cgroup subsys cpuset
> Linux version 2.6.36-rc5-mm1+ (root@localhost.localdomain) (gcc version 4.4.4 20100726 (Red Hat 4.4.4-13) (GCC) ) #6 SMP Mon Sep 27 07:00:15 EDT 2010
> Command line: ro root=/dev/mapper/VolGroup-lv_root rd_LVM_LV=VolGroup/lv_root rd_LVM_LV=VolGroup/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb quiet console=tty0 console=ttyS0,115200 crashkernel=128M irqpoll maxcpus=1 reset_devices cgroup_disable=memory memmap=exactmap memmap=640K@0K memmap=130408K@32768K elfcorehdr=163176K kexec_jump_back_entry=0x000000000232f063
> BIOS-provided physical RAM map:
> BIOS-e820: 0000000000000100 - 000000000009f400 (usable)
> BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
> BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
> BIOS-e820: 0000000000100000 - 00000000dfffb000 (usable)
> BIOS-e820: 00000000dfffb000 - 00000000e0000000 (reserved)
> BIOS-e820: 00000000fffbc000 - 0000000100000000 (reserved)
> BIOS-e820: 0000000100000000 - 0000000ca0000000 (usable)
> last_pfn = 0xca0000 max_arch_pfn = 0x400000000
> NX (Execute Disable) protection: active
> user-defined physical RAM map:
> user: 0000000000000000 - 00000000000a0000 (usable)
> user: 0000000002000000 - 0000000009f5a000 (usable)
...
> Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
> Warning: Core image elf header is notsane
> Kdump: vmcore not initialized
>
>>

It should work on tip; I tested it on RHEL 6.0 beta with

	/etc/init.d/kdump restart

BTW, the second kernel is not supposed to take crashkernel=128M again. The
/etc/init.d/kdump script removes it when it builds the command line from
/proc/cmdline.

Please refer to
http://people.redhat.com/mingo/tip.git/readme.txt
to get tip/master, and apply the attached patch:

	cat crashkernel_limit.patch | patch -p1

Thanks

Yinghai

[-- Attachment #2: crashkernel_limit.patch --]
[-- Type: text/x-patch, Size: 2230 bytes --]

[PATCH -v2] x86, memblock: Fix crashkernel allocation

Cai Qian found that crashkernel is broken with the x86 memblock changes:
1. crashkernel=128M@32M always reports that the range is in use, even when
   the first kernel is small and no one uses that range.
2. "kexec -p" always reports:
	Could not find a free area of memory of a000 bytes...
	locate_hole failed

The root cause is that the generic memblock_find_in_range() tries to get
the range top-down, but crashkernel needs a low and specified range.

Let's limit the target range to crash_base + crash_size to make sure that
we get the range from the bottom.

-v2: don't limit it with 0xffffffff, in case kexec will use bzImage 64bit
     entry or vmlinux, and try to allocate huge area for crashkernel.

Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/kernel/setup.c |   19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -516,19 +516,28 @@ static void __init reserve_crashkernel(v
 
 	/* 0 means: find the address automatically */
 	if (crash_base <= 0) {
+		unsigned long long start = 0;
 		const unsigned long long alignment = 16<<20;	/* 16M */
 
-		crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
-				 alignment);
-		if (crash_base == MEMBLOCK_ERROR) {
+		crash_base = alignment;
+		while ((crash_base + crash_size) <= total_mem) {
+			start = memblock_find_in_range(crash_base,
+				crash_base + crash_size, crash_size, alignment);
+
+			if (start == crash_base)
+				break;
+
+			crash_base += alignment;
+		}
+		if (start != crash_base) {
 			pr_info("crashkernel reservation failed - No suitable area found.\n");
 			return;
 		}
 	} else {
 		unsigned long long start;
 
-		start = memblock_find_in_range(crash_base, ULONG_MAX, crash_size,
-				 1<<20);
+		start = memblock_find_in_range(crash_base,
+				 crash_base + crash_size, crash_size, 1<<20);
 		if (start != crash_base) {
 			pr_info("crashkernel reservation failed - memory is in use.\n");
 			return;
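The first symptom in the patch description (crashkernel=128M@32M reported as in use) can be illustrated with a toy model. This is only a sketch under stated assumptions: RAM is a single free range with nothing reserved, and the finder returns the highest fitting address, as the generic memblock_find_in_range() is described as doing in this thread. toy_find_top_down(), toy_base_is_free() and TOY_ERROR are made-up names, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for MEMBLOCK_ERROR; not the kernel's value. */
#define TOY_ERROR UINT64_MAX

/*
 * Toy top-down finder: RAM is modelled as one free range [0, ram_end).
 * Returns the highest align-aligned address such that
 * [addr, addr + size) lies inside [start, end) and inside RAM.
 */
static uint64_t toy_find_top_down(uint64_t ram_end, uint64_t start,
				  uint64_t end, uint64_t size, uint64_t align)
{
	uint64_t addr;

	assert(align != 0);
	if (end > ram_end)
		end = ram_end;			/* clip the window to RAM */
	if (end < size || end - size < start)
		return TOY_ERROR;		/* window is too small */
	addr = (end - size) / align * align;	/* round down to alignment */
	return addr >= start ? addr : TOY_ERROR;
}

/* Mirrors the "start != crash_base" test used for crashkernel=size@base. */
static int toy_base_is_free(uint64_t ram_end, uint64_t base,
			    uint64_t size, uint64_t end)
{
	return toy_find_top_down(ram_end, base, end, size, 1 << 20) == base;
}
```

With 4 GiB of RAM, base = 32M and size = 128M, toy_base_is_free() with end = UINT64_MAX reports the base as not free, because the finder answers with an address near the top of RAM. With end = base + size the only candidate is the base itself, so the same test passes.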
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-27 22:22 ` Yinghai Lu
@ 2010-09-27 22:50 ` H. Peter Anvin
  2010-09-27 23:20   ` Yinghai Lu
  0 siblings, 1 reply; 25+ messages in thread
From: H. Peter Anvin @ 2010-09-27 22:50 UTC
To: Yinghai Lu; +Cc: Ingo Molnar, kexec, caiqian, linux-kernel

+		crash_base = alignment;
+		while ((crash_base + crash_size) <= total_mem) {
+			start = memblock_find_in_range(crash_base,
+				crash_base + crash_size, crash_size, alignment);
+
+			if (start == crash_base)
+				break;
+
+			crash_base += alignment;
+		}
+		if (start != crash_base) {

Open-coded crap violation error!

Seriously, these kinds of open-coded loops are *never* acceptable, since
they are really "let's violate the interface by making it do something
it wasn't intended to do" -- it means we need a new interface.

Alternatively, if we really need the lowest possible address, why do we
need to search?

	-hpa
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-27 22:50 ` H. Peter Anvin
@ 2010-09-27 23:20 ` Yinghai Lu
  2010-09-27 23:26   ` H. Peter Anvin
  0 siblings, 1 reply; 25+ messages in thread
From: Yinghai Lu @ 2010-09-27 23:20 UTC
To: H. Peter Anvin; +Cc: Ingo Molnar, kexec, caiqian, linux-kernel

On 09/27/2010 03:50 PM, H. Peter Anvin wrote:
> +		crash_base = alignment;
> +		while ((crash_base + crash_size) <= total_mem) {
> +			start = memblock_find_in_range(crash_base,
> +				crash_base + crash_size, crash_size, alignment);
> +
> +			if (start == crash_base)
> +				break;
> +
> +			crash_base += alignment;
> +		}
> +		if (start != crash_base) {
>
> Open-coded crap violation error!
>
> Seriously, these kinds of open-coded loops are *never* acceptable, since
> they are really "let's violate the interface by making it do something
> it wasn't intended to do" -- it means we need a new interface.
>
> Alternatively, if we really need the lowest possible address, why do we
> need to search?

x86 own version for find_area?

Subject: [PATCH] x86, memblock: Add x86 version of memblock_find_in_range()

The generic version goes from high to low, and it seems it cannot find
a suitable area that is compact enough.

The x86 version goes from goal to limit, just like the way we used for
early_res.

Use CONFIG_ARCH_MEMBLOCK_FIND_AREA to select between them.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/Kconfig       |    8 +++++++
 arch/x86/mm/memblock.c |   54 +++++++++++++++++++++++++++++++++++++++++++++++++
 mm/memblock.c          |    2 -
 3 files changed, 63 insertions(+), 1 deletion(-)

Index: linux-2.6/arch/x86/mm/memblock.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/memblock.c
+++ linux-2.6/arch/x86/mm/memblock.c
@@ -352,3 +352,57 @@ u64 __init memblock_x86_hole_size(u64 st
 
 	return end - start - ((u64)ram << PAGE_SHIFT);
 }
+
+#ifdef CONFIG_ARCH_MEMBLOCK_FIND_AREA
+/* Check for already reserved areas */
+static inline bool __init check_with_memblock_reserved(u64 *addrp, u64 size, u64 align)
+{
+	u64 addr = *addrp;
+	bool changed = false;
+	struct memblock_region *r;
+again:
+	for_each_memblock(reserved, r) {
+		if ((addr + size) > r->base && addr < (r->base + r->size)) {
+			addr = round_up(r->base + r->size, align);
+			changed = true;
+			goto again;
+		}
+	}
+
+	if (changed)
+		*addrp = addr;
+
+	return changed;
+}
+
+/*
+ * Find a free area with specified alignment in a specific range.
+ */
+u64 __init memblock_find_in_range(u64 start, u64 end, u64 size, u64 align)
+{
+	struct memblock_region *r;
+
+	for_each_memblock(memory, r) {
+		u64 ei_start = r->base;
+		u64 ei_last = ei_start + r->size;
+		u64 addr, last;
+
+		addr = round_up(ei_start, align);
+		if (addr < start)
+			addr = round_up(start, align);
+		if (addr >= ei_last)
+			continue;
+		while (check_with_memblock_reserved(&addr, size, align) && addr+size <= ei_last)
+			;
+		last = addr + size;
+		if (last > ei_last)
+			continue;
+		if (last > end)
+			continue;
+
+		return addr;
+	}
+
+	return MEMBLOCK_ERROR;
+}
+#endif
Index: linux-2.6/arch/x86/Kconfig
===================================================================
--- linux-2.6.orig/arch/x86/Kconfig
+++ linux-2.6/arch/x86/Kconfig
@@ -569,6 +569,14 @@ config PARAVIRT_DEBUG
 	  Enable to debug paravirt_ops internals. Specifically, BUG if
 	  a paravirt_op is missing when it is called.
 
+config ARCH_MEMBLOCK_FIND_AREA
+	default y
+	bool "Use x86 own memblock_find_in_range()"
+	---help---
+	  Use memblock_find_in_range() version instead of generic version, it get free
+	  area up from low.
+	  Generic one try to get free area down from limit.
+
 config NO_BOOTMEM
 	def_bool y
 
Index: linux-2.6/mm/memblock.c
===================================================================
--- linux-2.6.orig/mm/memblock.c
+++ linux-2.6/mm/memblock.c
@@ -165,7 +165,7 @@ static phys_addr_t __init_memblock membl
 /*
  * Find a free area with specified alignment in a specific range.
  */
-u64 __init_memblock memblock_find_in_range(u64 start, u64 end, u64 size, u64 align)
+u64 __init_memblock __weak memblock_find_in_range(u64 start, u64 end, u64 size, u64 align)
 {
 	return memblock_find_base(size, align, start, end);
 }
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-27 23:20 ` Yinghai Lu
@ 2010-09-27 23:26 ` H. Peter Anvin
  2010-09-27 23:32   ` Yinghai Lu
  0 siblings, 1 reply; 25+ messages in thread
From: H. Peter Anvin @ 2010-09-27 23:26 UTC
To: Yinghai Lu; +Cc: Ingo Molnar, kexec, caiqian, linux-kernel

On 09/27/2010 04:20 PM, Yinghai Lu wrote:
>
> x86 own version for find_area?
>

No, double no.

Same kind of crap: overloading an interface with semantics it shouldn't
have. The right thing is to introduce a new interface which carries the
explicitly needed policy with it... e.g. memblock_find_in_range_lowest().

That interface would have the explicit semantics of returning the lowest
possible address, as opposed to any suitable address (which may change
if policy requirements change.)

The other question is why does kexec need this in the first place? Is
this due to a design bug in kexec or is there some fundamental reason
for this?

	-hpa
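To make the "lowest possible address" semantics concrete, here is a minimal user-space sketch of what such an interface could compute. It is an illustration only, not the kernel implementation: memblock_find_in_range_lowest() is merely a name proposed in the message above, and the toy_region type, toy_find_in_range_lowest() and TOY_ERROR are hypothetical. The sketch assumes memory regions sorted by ascending base and values small enough that u64 arithmetic does not overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ERROR UINT64_MAX	/* hypothetical failure value */

struct toy_region {
	uint64_t base;
	uint64_t size;
};

static uint64_t toy_round_up(uint64_t x, uint64_t align)
{
	return (x + align - 1) / align * align;
}

/*
 * Bottom-up first fit. mem[] must be sorted by ascending base; rsv[] lists
 * reserved regions in any order. Returns the lowest align-aligned address
 * whose [addr, addr + size) lies in [start, end), inside one mem[] entry,
 * and overlaps no rsv[] entry, or TOY_ERROR if there is none.
 */
static uint64_t toy_find_in_range_lowest(const struct toy_region *mem, size_t nmem,
					 const struct toy_region *rsv, size_t nrsv,
					 uint64_t start, uint64_t end,
					 uint64_t size, uint64_t align)
{
	assert(align != 0);
	for (size_t i = 0; i < nmem; i++) {
		uint64_t lo = mem[i].base;
		uint64_t hi = mem[i].base + mem[i].size;
		uint64_t addr = toy_round_up(lo > start ? lo : start, align);
		int moved;

		if (hi > end)
			hi = end;	/* clip the region to the search window */
		do {	/* step past reserved regions until nothing overlaps */
			moved = 0;
			for (size_t j = 0; j < nrsv; j++) {
				uint64_t rlo = rsv[j].base;
				uint64_t rhi = rlo + rsv[j].size;

				if (addr < rhi && addr + size > rlo) {
					addr = toy_round_up(rhi, align);
					moved = 1;
				}
			}
		} while (moved && addr < hi);

		if (addr < hi && hi - addr >= size)
			return addr;	/* first fit in ascending order */
	}
	return TOY_ERROR;
}
```

For example, with memory at [0, 640K) and [1M, 4G), a reservation at [16M, 48M), and a request for 128M aligned to 16M starting from 16M, the sketch returns 48M, the first aligned address past the reservation. The policy (lowest address) lives in the function itself, so the caller does not need a loop.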
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-27 23:26 ` H. Peter Anvin
@ 2010-09-27 23:32 ` Yinghai Lu
  2010-09-27 23:34   ` H. Peter Anvin
  0 siblings, 1 reply; 25+ messages in thread
From: Yinghai Lu @ 2010-09-27 23:32 UTC
To: H. Peter Anvin; +Cc: Ingo Molnar, kexec, caiqian, linux-kernel

On 09/27/2010 04:26 PM, H. Peter Anvin wrote:
> On 09/27/2010 04:20 PM, Yinghai Lu wrote:
>>
>> x86 own version for find_area?
>>
>
> No, double no.
>
> Same kind of crap: overloading an interface with semantics it shouldn't
> have. The right thing is to introduce a new interface which carries the
> explicitly needed policy with it... e.g. memblock_find_in_range_lowest().
>
> That interface would have the explicit semantics of returning the lowest
> possible address, as opposed to any suitable address (which may change
> if policy requirements change.)
>
> The other question is why does kexec need this in the first place? Is
> this due to a design bug in kexec or is there some fundamental reason
> for this?

bzImage is used here, so we need a range below 4G.

Yinghai
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-27 23:32 ` Yinghai Lu
@ 2010-09-27 23:34 ` H. Peter Anvin
  2010-09-27 23:41   ` Yinghai Lu
  0 siblings, 1 reply; 25+ messages in thread
From: H. Peter Anvin @ 2010-09-27 23:34 UTC
To: Yinghai Lu; +Cc: Ingo Molnar, kexec, caiqian, linux-kernel

On 09/27/2010 04:32 PM, Yinghai Lu wrote:
> On 09/27/2010 04:26 PM, H. Peter Anvin wrote:
>> On 09/27/2010 04:20 PM, Yinghai Lu wrote:
>>>
>>> x86 own version for find_area?
>>>
>>
>> No, double no.
>>
>> Same kind of crap: overloading an interface with semantics it shouldn't
>> have. The right thing is to introduce a new interface which carries the
>> explicitly needed policy with it... e.g. memblock_find_in_range_lowest().
>>
>> That interface would have the explicit semantics of returning the lowest
>> possible address, as opposed to any suitable address (which may change
>> if policy requirements change.)
>>
>> The other question is why does kexec need this in the first place? Is
>> this due to a design bug in kexec or is there some fundamental reason
>> for this?
>
> bzImage is used here, so we need a range below 4G.
>

OK, so why don't you cap the range to 4 GiB and then pass that down to
the existing interface? That's different from "lowest possible address".

	-hpa
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_" 2010-09-27 23:34 ` H. Peter Anvin @ 2010-09-27 23:41 ` Yinghai Lu 2010-09-28 0:53 ` Vivek Goyal 0 siblings, 1 reply; 25+ messages in thread From: Yinghai Lu @ 2010-09-27 23:41 UTC (permalink / raw) To: H. Peter Anvin; +Cc: Ingo Molnar, kexec, caiqian, linux-kernel On 09/27/2010 04:34 PM, H. Peter Anvin wrote: > On 09/27/2010 04:32 PM, Yinghai Lu wrote: >> On 09/27/2010 04:26 PM, H. Peter Anvin wrote: >>> On 09/27/2010 04:20 PM, Yinghai Lu wrote: >>>> >>>> x86 own version for find_area? >>>> >>> >>> No, double no. >>> >>> Same kind of crap: overloading an interface with semantics it shouldn't >>> have. The right thing is to introduce a new interface with carries the >>> explicitly needed policy with it... e.g. memblock_find_in_range_lowest(). >>> >>> That interface would have the explicit semantics of returning the lowest >>> possible address, as opposed to any suitable address (which may change >>> if policy requirements change.) >>> >>> The other question is why does kexec need this in the first place? Is >>> this due to a design bug in kexec or is there some fundamental reason >>> for this? >> >> bzImage is used here. so need range below 4g. >> > > OK, so why don't you cap the range to 4 GiB and then pass that down to > the existing interface? That's different from "lowest possible address". but if later bzImage will use 64 entry and kexec honor it, or use 64bit vmlinux directly. and crashkernel=4096M, we could get failure again. maybe something like this, will give it a try, hope kexec doesn't have other limitation. [PATCH -v3] x86, memblock: Fix crashkernel allocation Cai Qian found crashkernel is broken with x86 memblock changes 1. crashkernel=128M@32M always reported that range is used, even first kernel is small no one use that range 2. always get following report when using "kexec -p" Could not find a free area of memory of a000 bytes... 
locate_hole failed

The root cause is that generic memblock_find_in_range() will try to get range from top_down.
But crashkernel do need from low and specified range.

Let's limit the target range with crash_base + crash_size to make sure that
We get range from bottom.

-v3: don't use loop for find low one

Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/kernel/setup.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -518,17 +518,23 @@ static void __init reserve_crashkernel(v
 	if (crash_base <= 0) {
 		const unsigned long long alignment = 16<<20;	/* 16M */
 
-		crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
-				 alignment);
+		crash_base = memblock_find_in_range(alignment, 0xffffffff,
+				 crash_size, alignment);
+
 		if (crash_base == MEMBLOCK_ERROR) {
-			pr_info("crashkernel reservation failed - No suitable area found.\n");
-			return;
+			crash_base = memblock_find_in_range(alignment,
+					 ULONG_MAX, crash_size, alignment);
+
+			if (crash_base == MEMBLOCK_ERROR) {
+				pr_info("crashkernel reservation failed - No suitable area found.\n");
+				return;
+			}
 		}
 	} else {
 		unsigned long long start;
 
-		start = memblock_find_in_range(crash_base, ULONG_MAX, crash_size,
-				 1<<20);
+		start = memblock_find_in_range(crash_base,
+				 crash_base + crash_size, crash_size, 1<<20);
 		if (start != crash_base) {
 			pr_info("crashkernel reservation failed - memory is in use.\n");
 			return;

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply [flat|nested] 25+ messages in thread
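
The fixed-offset failure described in the patch can be reproduced outside the kernel. The following is a minimal user-space sketch, not kernel code: `struct range`, `find_top_down()` and `crash_base_is_free()` are hypothetical stand-ins for the memblock tables and memblock_find_in_range(), and the only behaviour assumed is what the thread states, namely that the generic finder hands out the highest suitable address in the window. The RAM map is the /sys/firmware/memmap listing posted in this thread; the two reserved ranges are the TEXT DATA BSS and RAMDISK entries from the early reservation dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a memblock region, as a half-open [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

#define FIND_ERROR UINT64_MAX	/* plays the role of MEMBLOCK_ERROR */

/* System RAM as listed under /sys/firmware/memmap in the thread. */
static const struct range ram[] = {
	{ 0x0000000000ULL, 0x000009f400ULL },
	{ 0x0000100000ULL, 0x00dfffb000ULL },
	{ 0x0100000000ULL, 0x0ca0000000ULL },
};

/* TEXT DATA BSS and RAMDISK from the early reservation dump. */
static const struct range reserved[] = {
	{ 0x0001000000ULL, 0x0001ff7c08ULL },
	{ 0x00375c9000ULL, 0x0037ff0000ULL },
};

static uint64_t round_down_pow2(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return x & ~(align - 1);
}

/* Return the reserved range overlapping [addr, addr + size), or NULL. */
static const struct range *overlapping_reserved(uint64_t addr, uint64_t size)
{
	for (size_t i = 0; i < sizeof(reserved) / sizeof(reserved[0]); i++)
		if (addr + size > reserved[i].start && addr < reserved[i].end)
			return &reserved[i];
	return NULL;
}

/* Toy top-down finder: highest aligned free block inside [start, end). */
uint64_t find_top_down(uint64_t start, uint64_t end, uint64_t size,
		       uint64_t align)
{
	for (size_t i = sizeof(ram) / sizeof(ram[0]); i-- > 0; ) {
		uint64_t lo = ram[i].start > start ? ram[i].start : start;
		uint64_t hi = ram[i].end < end ? ram[i].end : end;
		uint64_t addr;

		if (hi < lo || hi - lo < size)
			continue;

		addr = round_down_pow2(hi - size, align);
		while (addr >= lo) {
			const struct range *r = overlapping_reserved(addr, size);

			if (!r)
				return addr;
			if (r->start < size)
				break;	/* no room left below this reservation */
			addr = round_down_pow2(r->start - size, align);
		}
	}
	return FIND_ERROR;
}

/* The "start != crash_base" test from reserve_crashkernel(), 1M aligned. */
int crash_base_is_free(uint64_t base, uint64_t size, uint64_t end)
{
	return find_top_down(base, end, size, 1ULL << 20) == base;
}
```

With an open-ended window, `find_top_down(32M, UINT64_MAX, 128M, 1M)` lands at 0xc98000000, the same address the broken kernel reported for "Crash kernel" in /proc/iomem, so a check against crash_base = 32M fails even though that range is free in this table. Passing `base + size` as the end of the window leaves the finder no other block to return than `base` itself.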
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_" 2010-09-27 23:41 ` Yinghai Lu @ 2010-09-28 0:53 ` Vivek Goyal 2010-09-28 2:41 ` Yinghai Lu 2010-09-28 3:46 ` H. Peter Anvin 0 siblings, 2 replies; 25+ messages in thread From: Vivek Goyal @ 2010-09-28 0:53 UTC (permalink / raw) To: Yinghai Lu; +Cc: linux-kernel, Ingo Molnar, kexec, caiqian, H. Peter Anvin On Mon, Sep 27, 2010 at 04:41:31PM -0700, Yinghai Lu wrote: > On 09/27/2010 04:34 PM, H. Peter Anvin wrote: > > On 09/27/2010 04:32 PM, Yinghai Lu wrote: > >> On 09/27/2010 04:26 PM, H. Peter Anvin wrote: > >>> On 09/27/2010 04:20 PM, Yinghai Lu wrote: > >>>> > >>>> x86 own version for find_area? > >>>> > >>> > >>> No, double no. > >>> > >>> Same kind of crap: overloading an interface with semantics it shouldn't > >>> have. The right thing is to introduce a new interface with carries the > >>> explicitly needed policy with it... e.g. memblock_find_in_range_lowest(). > >>> > >>> That interface would have the explicit semantics of returning the lowest > >>> possible address, as opposed to any suitable address (which may change > >>> if policy requirements change.) > >>> > >>> The other question is why does kexec need this in the first place? Is > >>> this due to a design bug in kexec or is there some fundamental reason > >>> for this? > >> > >> bzImage is used here. so need range below 4g. > >> > > > > OK, so why don't you cap the range to 4 GiB and then pass that down to > > the existing interface? That's different from "lowest possible address". > > but if later bzImage will use 64 entry and kexec honor it, or use 64bit vmlinux directly. > and crashkernel=4096M, we could get failure again. > > maybe something like this, will give it a try, hope kexec doesn't have other limitation. > > [PATCH -v3] x86, memblock: Fix crashkernel allocation > > Cai Qian found crashkernel is broken with x86 memblock changes > 1. 
crashkernel=128M@32M always reported that range is used, even first kernel is small > no one use that range > 2. always get following report when using "kexec -p" > Could not find a free area of memory of a000 bytes... > locate_hole failed > > The root cause is that generic memblock_find_in_range() will try to get range from top_down. > But crashkernel do need from low and specified range. > > Let's limit the target range with rash_base + crash_size to make sure that > We get range from bottom. > > -v3: don't use loop for find low one > > Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com> > Signed-off-by: Yinghai Lu <yinghai@kernel.org> > > --- > arch/x86/kernel/setup.c | 19 ++++++++++++++----- > 1 file changed, 14 insertions(+), 5 deletions(-) > > Index: linux-2.6/arch/x86/kernel/setup.c > =================================================================== > --- linux-2.6.orig/arch/x86/kernel/setup.c > +++ linux-2.6/arch/x86/kernel/setup.c > @@ -518,17 +518,23 @@ static void __init reserve_crashkernel(v > if (crash_base <= 0) { > const unsigned long long alignment = 16<<20; /* 16M */ > > - crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size, > - alignment); > + crash_base = memblock_find_in_range(alignment, 0xffffffff, > + crash_size, alignment); > + Actually, hardcoding the upper limit to 4G is probably not the best idea. Kexec loads the the relocatable binary (purgatory) and I remember that one of the generated relocation type was signed 32 bit and allowed max value to be 2G only. So IIRC, purgatory code always needed to be loaded below 2G. I liked HPA's other idea better of introducing memblock_find_in_range_lowest() so that we search bottom up and not rely on a specific upper limit. Thanks Vivek _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_" 2010-09-28 0:53 ` Vivek Goyal @ 2010-09-28 2:41 ` Yinghai Lu 2010-09-28 3:46 ` H. Peter Anvin 1 sibling, 0 replies; 25+ messages in thread From: Yinghai Lu @ 2010-09-28 2:41 UTC (permalink / raw) To: Vivek Goyal; +Cc: linux-kernel, Ingo Molnar, kexec, caiqian, H. Peter Anvin On 09/27/2010 05:53 PM, Vivek Goyal wrote: > Actually, hardcoding the upper limit to 4G is probably not the best idea. > Kexec loads the the relocatable binary (purgatory) and I remember that > one of the generated relocation type was signed 32 bit and allowed max value > to be 2G only. So IIRC, purgatory code always needed to be loaded below 2G. also kexec want bzImage under 37ffffff. > > I liked HPA's other idea better of introducing memblock_find_in_range_lowest() > so that we search bottom up and not rely on a specific upper limit. > Please check. [PATCH -v4] x86, memblock: Fix crashkernel allocation Cai Qian found crashkernel is broken with x86 memblock changes 1. crashkernel=128M@32M always reported that range is used, even first kernel is small no one use that range 2. always get following report when using "kexec -p" Could not find a free area of memory of a000 bytes... locate_hole failed The root cause is that generic memblock_find_in_range() will try to get range from top_down. But crashkernel do need from low and specified range. Let's limit the target range with rash_base + crash_size to make sure that We get range from bottom. -v4: add find_memblock_find_in_range_lowest() according to hpa and vivik. 
Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/include/asm/memblock.h |  2 +
 arch/x86/kernel/setup.c         |  8 +++---
 arch/x86/mm/memblock.c          | 52 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 58 insertions(+), 4 deletions(-)

Index: linux-2.6/arch/x86/mm/memblock.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/memblock.c
+++ linux-2.6/arch/x86/mm/memblock.c
@@ -352,3 +352,55 @@ u64 __init memblock_x86_hole_size(u64 st
 
 	return end - start - ((u64)ram << PAGE_SHIFT);
 }
+
+/* Check for already reserved areas */
+static inline bool __init check_with_memblock_reserved(u64 *addrp, u64 size, u64 align)
+{
+	u64 addr = *addrp;
+	bool changed = false;
+	struct memblock_region *r;
+again:
+	for_each_memblock(reserved, r) {
+		if ((addr + size) > r->base && addr < (r->base + r->size)) {
+			addr = round_up(r->base + r->size, align);
+			changed = true;
+			goto again;
+		}
+	}
+
+	if (changed)
+		*addrp = addr;
+
+	return changed;
+}
+
+/*
+ * Find a free area with specified alignment in a specific range from bottom up
+ */
+u64 __init memblock_find_in_range_lowest(u64 start, u64 end, u64 size, u64 align)
+{
+	struct memblock_region *r;
+
+	for_each_memblock(memory, r) {
+		u64 ei_start = r->base;
+		u64 ei_last = ei_start + r->size;
+		u64 addr, last;
+
+		addr = round_up(ei_start, align);
+		if (addr < start)
+			addr = round_up(start, align);
+		if (addr >= ei_last)
+			continue;
+		while (check_with_memblock_reserved(&addr, size, align) && addr+size <= ei_last)
+			;
+		last = addr + size;
+		if (last > ei_last)
+			continue;
+		if (last > end)
+			continue;
+
+		return addr;
+	}
+
+	return MEMBLOCK_ERROR;
+}
Index: linux-2.6/arch/x86/include/asm/memblock.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/memblock.h
+++ linux-2.6/arch/x86/include/asm/memblock.h
@@ -18,4 +18,6 @@ u64 memblock_x86_find_in_range_node(int
 u64 memblock_x86_free_memory_in_range(u64 addr, u64 limit);
 u64 memblock_x86_memory_in_range(u64 addr, u64 limit);
 
+u64 memblock_find_in_range_lowest(u64 start, u64 end, u64 size, u64 align);
+
 #endif
Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -518,8 +518,8 @@ static void __init reserve_crashkernel(v
 	if (crash_base <= 0) {
 		const unsigned long long alignment = 16<<20;	/* 16M */
 
-		crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
-				 alignment);
+		crash_base = memblock_find_in_range_lowest(alignment,
+				 ULONG_MAX, crash_size, alignment);
 		if (crash_base == MEMBLOCK_ERROR) {
 			pr_info("crashkernel reservation failed - No suitable area found.\n");
 			return;
@@ -527,8 +527,8 @@ static void __init reserve_crashkernel(v
 	} else {
 		unsigned long long start;
 
-		start = memblock_find_in_range(crash_base, ULONG_MAX, crash_size,
-				 1<<20);
+		start = memblock_find_in_range(crash_base,
+				 crash_base + crash_size, crash_size, 1<<20);
 		if (start != crash_base) {
 			pr_info("crashkernel reservation failed - memory is in use.\n");
 			return;

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply [flat|nested] 25+ messages in thread
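
The bottom-up walk in the -v4 patch can be tried in user space. This is a sketch under stated assumptions rather than the kernel implementation: `struct range`, `skip_reserved()` and `find_lowest()` are hypothetical names that mirror check_with_memblock_reserved() and memblock_find_in_range_lowest(), and the tables are copied from the /sys/firmware/memmap listing and the early reservation dump (TEXT DATA BSS, RAMDISK) posted in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a memblock region, as a half-open [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

#define FIND_ERROR UINT64_MAX	/* plays the role of MEMBLOCK_ERROR */

/* System RAM as listed under /sys/firmware/memmap in the thread. */
static const struct range ram[] = {
	{ 0x0000000000ULL, 0x000009f400ULL },
	{ 0x0000100000ULL, 0x00dfffb000ULL },
	{ 0x0100000000ULL, 0x0ca0000000ULL },
};

/* TEXT DATA BSS and RAMDISK from the early reservation dump. */
static const struct range reserved[] = {
	{ 0x0001000000ULL, 0x0001ff7c08ULL },
	{ 0x00375c9000ULL, 0x0037ff0000ULL },
};

static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/* Push addr upwards until [addr, addr + size) overlaps no reserved range. */
static uint64_t skip_reserved(uint64_t addr, uint64_t size, uint64_t align)
{
	size_t i = 0;

	while (i < sizeof(reserved) / sizeof(reserved[0])) {
		if (addr + size > reserved[i].start && addr < reserved[i].end) {
			addr = round_up_pow2(reserved[i].end, align);
			i = 0;	/* addr moved, rescan from the first entry */
		} else {
			i++;
		}
	}
	return addr;
}

/* Lowest aligned free block of `size` bytes that fits in [start, end). */
uint64_t find_lowest(uint64_t start, uint64_t end, uint64_t size,
		     uint64_t align)
{
	for (size_t i = 0; i < sizeof(ram) / sizeof(ram[0]); i++) {
		uint64_t addr = round_up_pow2(ram[i].start, align);

		if (addr < start)
			addr = round_up_pow2(start, align);
		if (addr >= ram[i].end)
			continue;

		addr = skip_reserved(addr, size, align);
		if (addr + size > ram[i].end || addr + size > end)
			continue;

		return addr;
	}
	return FIND_ERROR;
}
```

For a 128M request with 16M alignment this returns 0x2000000, which agrees with the "Reserving 128MB of memory at 32MB" line from the kernel with the memblock commits reverted. The `end` argument is only an upper bound here: a caller that also knows a real ceiling (the point hpa raises about kexec) can pass it instead of ULONG_MAX and get FIND_ERROR when nothing fits below it.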
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-28 0:53 ` Vivek Goyal
  2010-09-28 2:41 ` Yinghai Lu
@ 2010-09-28 3:46 ` H. Peter Anvin
  2010-09-28 7:14 ` Yinghai Lu
  2010-09-28 13:54 ` Vivek Goyal
  1 sibling, 2 replies; 25+ messages in thread
From: H. Peter Anvin @ 2010-09-28 3:46 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: kexec, Ingo Molnar, Yinghai Lu, caiqian, linux-kernel

On 09/27/2010 05:53 PM, Vivek Goyal wrote:
>
> Actually, hardcoding the upper limit to 4G is probably not the best idea.
> Kexec loads the the relocatable binary (purgatory) and I remember that
> one of the generated relocation type was signed 32 bit and allowed max value
> to be 2G only. So IIRC, purgatory code always needed to be loaded below 2G.
>
> I liked HPA's other idea better of introducing memblock_find_in_range_lowest()
> so that we search bottom up and not rely on a specific upper limit.
>

No, it's just another crappy hack which is broken in the same way. It's
better than open-coding, but it's still a hack.

The Right Thing[TM] to do is for kexec to communicate the topmost
address it wants to this code, so it has both the upper and the lower
boundaries available to it instead of just one.

	-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_" 2010-09-28 3:46 ` H. Peter Anvin @ 2010-09-28 7:14 ` Yinghai Lu 2010-09-28 14:01 ` Vivek Goyal 2010-09-28 13:54 ` Vivek Goyal 1 sibling, 1 reply; 25+ messages in thread From: Yinghai Lu @ 2010-09-28 7:14 UTC (permalink / raw) To: H. Peter Anvin; +Cc: Ingo Molnar, kexec, caiqian, Vivek Goyal, linux-kernel On 09/27/2010 08:46 PM, H. Peter Anvin wrote: > On 09/27/2010 05:53 PM, Vivek Goyal wrote: >> >> Actually, hardcoding the upper limit to 4G is probably not the best idea. >> Kexec loads the the relocatable binary (purgatory) and I remember that >> one of the generated relocation type was signed 32 bit and allowed max value >> to be 2G only. So IIRC, purgatory code always needed to be loaded below 2G. >> >> I liked HPA's other idea better of introducing memblock_find_in_range_lowest() >> so that we search bottom up and not rely on a specific upper limit. >> > > No, it's just another crappy hack which is broken in the same way. It's > better than open-coding, but it's still a hack. > > The Right Thing[TM] to do is for kexec to communicate the topmost > address it wants to this code, so it has both the upper and the lower > boundaries available to it instead of just one. hope you are happy with this one. [PATCH -v5] x86, memblock: Fix crashkernel allocation Cai Qian found crashkernel is broken with x86 memblock changes 1. crashkernel=128M@32M always reported that range is used, even first kernel is small no one use that range 2. always get following report when using "kexec -p" Could not find a free area of memory of a000 bytes... locate_hole failed The root cause is that generic memblock_find_in_range() will try to get range from top_down. But crashkernel do need from low and specified range. Let's limit the target range with rash_base + crash_size to make sure that We get range from bottom. -v5: use DEFAULT_BZIMAGE_ADDR_MAX to limit area that could be used by bzImge. 
     also second try for vmlinux or new kexec tools will use bzImage 64bit entry

Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/kernel/setup.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -501,6 +501,7 @@ static inline unsigned long long get_tot
 	return total << PAGE_SHIFT;
 }
 
+#define DEFAULT_BZIMAGE_ADDR_MAX 0x37FFFFFF
 static void __init reserve_crashkernel(void)
 {
 	unsigned long long total_mem;
@@ -518,17 +519,28 @@ static void __init reserve_crashkernel(v
 	if (crash_base <= 0) {
 		const unsigned long long alignment = 16<<20;	/* 16M */
 
-		crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
-				 alignment);
+		/*
+		 * Assume half crash_size is for bzImage
+		 * kexec want bzImage is below DEFAULT_BZIMAGE_ADDR_MAX
+		 */
+		crash_base = memblock_find_in_range(alignment,
+				 DEFAULT_BZIMAGE_ADDR_MAX + crash_size/2,
+				 crash_size, alignment);
+
 		if (crash_base == MEMBLOCK_ERROR) {
-			pr_info("crashkernel reservation failed - No suitable area found.\n");
-			return;
+			crash_base = memblock_find_in_range(alignment,
+					 ULONG_MAX, crash_size, alignment);
+
+			if (crash_base == MEMBLOCK_ERROR) {
+				pr_info("crashkernel reservation failed - No suitable area found.\n");
+				return;
+			}
 		}
 	} else {
 		unsigned long long start;
 
-		start = memblock_find_in_range(crash_base, ULONG_MAX, crash_size,
-				 1<<20);
+		start = memblock_find_in_range(crash_base,
+				 crash_base + crash_size, crash_size, 1<<20);
 		if (start != crash_base) {
 			pr_info("crashkernel reservation failed - memory is in use.\n");
 			return;

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_" 2010-09-28 7:14 ` Yinghai Lu @ 2010-09-28 14:01 ` Vivek Goyal 0 siblings, 0 replies; 25+ messages in thread From: Vivek Goyal @ 2010-09-28 14:01 UTC (permalink / raw) To: Yinghai Lu; +Cc: linux-kernel, Ingo Molnar, kexec, caiqian, H. Peter Anvin On Tue, Sep 28, 2010 at 12:14:31AM -0700, Yinghai Lu wrote: > On 09/27/2010 08:46 PM, H. Peter Anvin wrote: > > On 09/27/2010 05:53 PM, Vivek Goyal wrote: > >> > >> Actually, hardcoding the upper limit to 4G is probably not the best idea. > >> Kexec loads the the relocatable binary (purgatory) and I remember that > >> one of the generated relocation type was signed 32 bit and allowed max value > >> to be 2G only. So IIRC, purgatory code always needed to be loaded below 2G. > >> > >> I liked HPA's other idea better of introducing memblock_find_in_range_lowest() > >> so that we search bottom up and not rely on a specific upper limit. > >> > > > > No, it's just another crappy hack which is broken in the same way. It's > > better than open-coding, but it's still a hack. > > > > The Right Thing[TM] to do is for kexec to communicate the topmost > > address it wants to this code, so it has both the upper and the lower > > boundaries available to it instead of just one. > > hope you are happy with this one. > > [PATCH -v5] x86, memblock: Fix crashkernel allocation > > Cai Qian found crashkernel is broken with x86 memblock changes > 1. crashkernel=128M@32M always reported that range is used, even first kernel is small > no one use that range > 2. always get following report when using "kexec -p" > Could not find a free area of memory of a000 bytes... > locate_hole failed > > The root cause is that generic memblock_find_in_range() will try to get range from top_down. > But crashkernel do need from low and specified range. > > Let's limit the target range with rash_base + crash_size to make sure that > We get range from bottom. 
> > -v5: use DEFAULT_BZIMAGE_ADDR_MAX to limit area that could be used by bzImge. > also second try for vmlinux or new kexec tools will use bzImage 64bit entry > > Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com> > Signed-off-by: Yinghai Lu <yinghai@kernel.org> > > --- > arch/x86/kernel/setup.c | 24 ++++++++++++++++++------ > 1 file changed, 18 insertions(+), 6 deletions(-) > > Index: linux-2.6/arch/x86/kernel/setup.c > =================================================================== > --- linux-2.6.orig/arch/x86/kernel/setup.c > +++ linux-2.6/arch/x86/kernel/setup.c > @@ -501,6 +501,7 @@ static inline unsigned long long get_tot > return total << PAGE_SHIFT; > } > > +#define DEFAULT_BZIMAGE_ADDR_MAX 0x37FFFFFF > static void __init reserve_crashkernel(void) > { > unsigned long long total_mem; > @@ -518,17 +519,28 @@ static void __init reserve_crashkernel(v > if (crash_base <= 0) { > const unsigned long long alignment = 16<<20; /* 16M */ > > - crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size, > - alignment); > + /* > + * Assume half crash_size is for bzImage > + * kexec want bzImage is below DEFAULT_BZIMAGE_ADDR_MAX > + */ > + crash_base = memblock_find_in_range(alignment, > + DEFAULT_BZIMAGE_ADDR_MAX + crash_size/2, > + crash_size, alignment); > + IMHO, these kind of hardcodings are worse than finding the lowest possible address. It is assuming that kexec is going to load a bzImage. So we have following three options sorted from best to worst. - Specify upper limit in "crashkernel=" command line syntax - Find the lowest possible address for crashkernel reservations - Hardcode upper limit based on certain factors. Because upper limit depends on image being loaded and can also vary as kexec-tools changes, knowing it for sure will require extra reboot. It also make command line syntax more complicated as we need to introduce another field to speciy upper limit. Especially for the following case. 
crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset] So personally I think we can stick to second best option and that is finding the lowest possible memory area. Thanks Vivek _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-28 3:46 ` H. Peter Anvin
  2010-09-28 7:14 ` Yinghai Lu
@ 2010-09-28 13:54 ` Vivek Goyal
  1 sibling, 0 replies; 25+ messages in thread
From: Vivek Goyal @ 2010-09-28 13:54 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: kexec, Ingo Molnar, Yinghai Lu, caiqian, linux-kernel

On Mon, Sep 27, 2010 at 08:46:42PM -0700, H. Peter Anvin wrote:
> On 09/27/2010 05:53 PM, Vivek Goyal wrote:
> >
> > Actually, hardcoding the upper limit to 4G is probably not the best idea.
> > Kexec loads the the relocatable binary (purgatory) and I remember that
> > one of the generated relocation type was signed 32 bit and allowed max value
> > to be 2G only. So IIRC, purgatory code always needed to be loaded below 2G.
> >
> > I liked HPA's other idea better of introducing memblock_find_in_range_lowest()
> > so that we search bottom up and not rely on a specific upper limit.
> >
>
> No, it's just another crappy hack which is broken in the same way. It's
> better than open-coding, but it's still a hack.
>
> The Right Thing[TM] to do is for kexec to communicate the topmost
> address it wants to this code, so it has both the upper and the lower
> boundaries available to it instead of just one.
>

Being able to specify the upper limit would be the best thing so that
kernel does not have to make any assumptions or hardcode anything.

Question is how to determine the upper limit.

- The upper limit will depend on what is being loaded in reserved region.
  Reserving memory using crashkernel= is a boot time option and at that
  point of time kexec has not even run. So we don't know what is the upper
  limit.

Now we can do extra reboot to make it happen. Boot first kernel without
reserving any memory. Introduce an option in kexec which tells user what
are the segments kexec would like to load (for a given binary) and what
are their upper memory limits and then user goes ahead modifies the
command line and reboots the kernel back.

This all sounds not so clean. Especially upper limit might change based on
binary being loaded and a user might have to perform a reboot again.

So to me trying to get lowest memory available possible for crashkernel
reservations is not that a bad idea. It is certainly better than making
hardcoded assumptions about the upper limit.

Thanks
Vivek

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <1346740216.2003261285553562018.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>]
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_" [not found] <1346740216.2003261285553562018.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> @ 2010-09-27 2:42 ` caiqian 2010-09-27 5:58 ` Yinghai Lu 2010-09-27 6:31 ` Yinghai Lu 0 siblings, 2 replies; 25+ messages in thread From: caiqian @ 2010-09-27 2:42 UTC (permalink / raw) To: Yinghai Lu; +Cc: linux-next, kexec, H. Peter Anvin ----- "Yinghai Lu" <yinghai@kernel.org> wrote: > On 09/26/2010 07:47 AM, caiqian@redhat.com wrote: > > > > ----- "Yinghai Lu" <yinghai@kernel.org> wrote: > > > >> On 09/25/2010 11:55 PM, CAI Qian wrote: > >>>> > >>>> are you kexec from 2.6.35+ to 2.6.36-rc3+? > >>> No, both kernels were the same version. I am sorry the above logs > >> were misleading that were copy-and-pasted from different kernel > >> versions. > >> > >> can you check tip instead of next tree? > > No dice, > > # /sbin/kexec -p '--command-line=ro > root=/dev/mapper/VolGroup-lv_root rd_LVM_LV=VolGroup/lv_root > rd_LVM_LV=VolGroup/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM > LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us > rhgb quiet console=tty0 console=ttyS0,115200 crashkernel=128M irqpoll > maxcpus=1 reset_devices cgroup_disable=memory ' > --initrd=/boot/initrd-2.6.36-rc5-tip+kdump.img > /boot/vmlinuz-2.6.36-rc5-tip+ > > Could not find a free area of memory of a000 bytes... > > locate_hole failed > > looks like you need to update your kexec-tools package. Same results using the latest kexec-tools git version. > > please run following scripts in first kernel. 
> > cd /sys/firmware/memmap > for dir in * ; do > start=$(cat $dir/start) > end=$(cat $dir/end) > type=$(cat $dir/type) > printf "%016x-%016x (%s)\n" $start $[ $end +1] "$type" > done 0000000000000000-000000000009f400 (System RAM) 000000000009f400-00000000000a0000 (reserved) 00000000000f0000-0000000000100000 (reserved) 0000000000100000-00000000dfffb000 (System RAM) 00000000dfffb000-00000000e0000000 (reserved) 00000000fffbc000-0000000100000000 (reserved) 0000000100000000-0000000ca0000000 (System RAM) > > also enable kexec debug to see what memmap kexec parse. -d did not help here. # /sbin/kexec -p -d '--command-line=ro root=/dev/mapper/VolGroup-lv_root rd_LVM_LV=VolGroup/lv_root rd_LVM_LV=VolGroup/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb quiet console=tty0 console=ttyS0,115200 crashkernel=128M irqpoll maxcpus=1 reset_devices cgroup_disable=memory ' --initrd=/boot/initrd-2.6.36-rc5-tip+kdump.img /boot/vmlinuz-2.6.36-rc5-tip+ Could not find a free area of memory of a000 bytes... locate_hole failed > > > > > After reverted the whole memblock commits, it was working again, > > 7950c407c0288b223a200c1bba8198941599ca37 > > fb74fb6db91abc3c1ceeb9d2c17b44866a12c63e > > f88eff74aa848e58b1ea49768c0bbb874b31357f > > 27de794365786b4cdc3461ed4e23af2a33f40612 > > 9dc5d569c133819c1ce069ebb1d771c62de32580 > > 4d5cf86ce187c0d3a4cdf233ab0cc6526ccbe01f > > 88ba088c18457caaf8d2e5f8d36becc731a3d4f6 > > edbe7d23b4482e7f33179290bcff3b1feae1c5f3 > > 6bcc8176d07f108da3b1af17fb2c0e82c80e948e > > b52c17ce854125700c4e19d4427d39bf2504ff63 > > e82d42be24bd5d75bf6f81045636e6ca95ab55f2 > > 301ff3e88ef9ff4bdb92f36a3e6170fce4c9dd34 > > 72d7c3b33c980843e756681fb4867dc1efd62a76 > > a9ce6bc15100023b411f8117e53a016d61889800 > > a587d2daebcd2bc159d4348b6a7b028950a6d803 > > 6f2a75369e7561e800d86927ecd83c970996b21f > > > > If used crashkernel=128M, the /proc/iomem looks like this. It used a > huge offset. 
> > 00000000-00000fff : reserved > > 00001000-0009f3ff : System RAM > > 0009f400-0009ffff : reserved > > 000f0000-000fffff : reserved > > 00100000-dfffafff : System RAM > > 01000000-0149a733 : Kernel code > > 0149a734-01afc46f : Kernel data > > 01d9c000-022b18f7 : Kernel bss > > dfffb000-dfffffff : reserved > > f0000000-f1ffffff : 0000:00:02.0 > > f2000000-f2000fff : 0000:00:02.0 > > f2010000-f201ffff : 0000:00:02.0 > > f2020000-f20200ff : 0000:00:03.0 > > f2020000-f20200ff : 8139cp > > f2030000-f203ffff : 0000:00:03.0 > > fec00000-fec003ff : IOAPIC 0 > > fee00000-fee00fff : Local APIC > > fffbc000-ffffffff : reserved > > 100000000-c9fffffff : System RAM > > c98000000-c9fffffff : Crash kernel > > > > On kernels that are working, it automatically found the offset at > 32M. > > 00000000-0000ffff : reserved > > 00010000-0009f3ff : System RAM > > 0009f400-0009ffff : reserved > > 000f0000-000fffff : reserved > > 00100000-dfffafff : System RAM > > 01000000-014250bf : Kernel code > > 014250c0-018aca8f : Kernel data > > 01b1f000-01ff7c07 : Kernel bss > > 02000000-09ffffff : Crash kernel > > dfffb000-dfffffff : reserved > > f0000000-f1ffffff : 0000:00:02.0 > > f2000000-f2000fff : 0000:00:02.0 > > f2010000-f201ffff : 0000:00:02.0 > > f2020000-f20200ff : 0000:00:03.0 > > f2020000-f20200ff : 8139cp > > f2030000-f203ffff : 0000:00:03.0 > > fec00000-fec003ff : IOAPIC 0 > > fee00000-fee00fff : Local APIC > > fffbc000-ffffffff : reserved > > 100000000-c9fffffff : System RAM > > > > If specified a fixed offset like crashkernel=128M@32M, it failed > reservation. 
> > initial memory mapped : 0 - 20000000 > > init_memory_mapping: 0000000000000000-00000000dfffb000 > > 0000000000 - 00dfe00000 page 2M > > 00dfe00000 - 00dfffb000 page 4k > > kernel direct mapping tables up to dfffb000 @ 1fffa000-20000000 > > init_memory_mapping: 0000000100000000-0000000ca0000000 > > 0100000000 - 0ca0000000 page 2M > > kernel direct mapping tables up to ca0000000 @ dffc7000-dfffb000 > > RAMDISK: 37599000 - 37ff0000 > > crashkernel reservation failed - memory is in use. > > > > After reverted those commits, it looks like this, > > init_memory_mapping: 0000000000000000-00000000dfffb000 > > 0000000000 - 00dfe00000 page 2M > > 00dfe00000 - 00dfffb000 page 4k > > kernel direct mapping tables up to dfffb000 @ 16000-1c000 > > init_memory_mapping: 0000000100000000-0000000ca0000000 > > 0100000000 - 0ca0000000 page 2M > > kernel direct mapping tables up to ca0000000 @ 1a000-4e000 > > RAMDISK: 375c9000 - 37ff0000 > > Reserving 128MB of memory at 32MB for crashkernel (System RAM: > 51712MB) > > yes, default memblock find_range is top_down. > > old early_res is from bottom_up. > > during the convecting, we do have one x86 find_range from bottom_up, > but later > it seems top_down was working on all test cases. ( 32bit etc) > > Subject: [PATCH] x86, memblock: Add x86 version of > memblock_find_in_range() Yes, this patch did help. Reserving 128MB of memory at 32MB for crashkernel (System RAM: 51712MB) > > Generic version is going from high to low, and it seems it can not > find > right area compact enough. > > the x86 version will go from goal to limit and just like the way We > used > for early_res > > use ARCH_FIND_MEMBLOCK_AREA to select from them. 
> > Signed-off-by: Yinghai Lu <yinghai@kernel.org> > --- > arch/x86/Kconfig | 8 +++++++ > arch/x86/mm/memblock.c | 54 > +++++++++++++++++++++++++++++++++++++++++++++++++ > mm/memblock.c | 2 - > 3 files changed, 63 insertions(+), 1 deletion(-) > > Index: linux-2.6/arch/x86/mm/memblock.c > =================================================================== > --- linux-2.6.orig/arch/x86/mm/memblock.c > +++ linux-2.6/arch/x86/mm/memblock.c > @@ -352,3 +352,57 @@ u64 __init memblock_x86_hole_size(u64 st > > return end - start - ((u64)ram << PAGE_SHIFT); > } > + > +#ifdef CONFIG_ARCH_MEMBLOCK_FIND_AREA > +/* Check for already reserved areas */ > +static inline bool __init check_with_memblock_reserved(u64 *addrp, > u64 size, u64 align) > +{ > + u64 addr = *addrp; > + bool changed = false; > + struct memblock_region *r; > +again: > + for_each_memblock(reserved, r) { > + if ((addr + size) > r->base && addr < (r->base + r->size)) { > + addr = round_up(r->base + r->size, align); > + changed = true; > + goto again; > + } > + } > + > + if (changed) > + *addrp = addr; > + > + return changed; > +} > + > +/* > + * Find a free area with specified alignment in a specific range. 
> + */ > +u64 __init memblock_find_in_range(u64 start, u64 end, u64 size, u64 > align) > +{ > + struct memblock_region *r; > + > + for_each_memblock(memory, r) { > + u64 ei_start = r->base; > + u64 ei_last = ei_start + r->size; > + u64 addr, last; > + > + addr = round_up(ei_start, align); > + if (addr < start) > + addr = round_up(start, align); > + if (addr >= ei_last) > + continue; > + while (check_with_memblock_reserved(&addr, size, align) && > addr+size <= ei_last) > + ; > + last = addr + size; > + if (last > ei_last) > + continue; > + if (last > end) > + continue; > + > + return addr; > + } > + > + return MEMBLOCK_ERROR; > +} > +#endif > Index: linux-2.6/arch/x86/Kconfig > =================================================================== > --- linux-2.6.orig/arch/x86/Kconfig > +++ linux-2.6/arch/x86/Kconfig > @@ -569,6 +569,14 @@ config PARAVIRT_DEBUG > Enable to debug paravirt_ops internals. Specifically, BUG if > a paravirt_op is missing when it is called. > > +config ARCH_MEMBLOCK_FIND_AREA > + default y > + bool "Use x86 own memblock_find_in_range()" > + ---help--- > + Use memblock_find_in_range() version instead of generic version, > it get free > + area up from low. > + Generic one try to get free area down from limit. > + > config NO_BOOTMEM > def_bool y > > Index: linux-2.6/mm/memblock.c > =================================================================== > --- linux-2.6.orig/mm/memblock.c > +++ linux-2.6/mm/memblock.c > @@ -165,7 +165,7 @@ static phys_addr_t __init_memblock membl > /* > * Find a free area with specified alignment in a specific range. 
> */ > -u64 __init_memblock memblock_find_in_range(u64 start, u64 end, u64 > size, u64 align) > +u64 __init_memblock __weak memblock_find_in_range(u64 start, u64 end, > u64 size, u64 align) > { > return memblock_find_base(size, align, start, end); > } > > > > > > I can't tell where the memory at 32MB was used, but after reverted > those commits I can see those early reservations information, > > Subtract (76 early reservations) > > #1 [0001000000 - 0001ff7c08] TEXT DATA BSS > > #2 [00375c9000 - 0037ff0000] RAMDISK > > #3 [0001ff8000 - 0001ff8079] BRK > > #4 [000009f400 - 00000f7fb0] BIOS reserved > > #5 [00000f7fb0 - 00000f7fc0] MP-table mpf > > #6 [00000f822c - 0000100000] BIOS reserved > > #7 [00000f7fc0 - 00000f822c] MP-table mpc > > #8 [0000010000 - 0000012000] TRAMPOLINE > > #9 [0000012000 - 0000016000] ACPI WAKEUP > > #10 [0000016000 - 000001a000] PGTABLE > > #11 [000001a000 - 0000049000] PGTABLE > > #12 [0002000000 - 000a000000] CRASH KERNEL > > > > But after those commits, those information was gone. > > memblock could merge reserved area, so can not keep tags with it. > > I have local patchset that could print those name tags... > please check Looks like so. > > > git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-2.6-yinghai.git > memblock > > Yinghai > > _______________________________________________ > kexec mailing list > kexec@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/kexec _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-27 2:42 ` caiqian
@ 2010-09-27 5:58   ` Yinghai Lu
  2010-09-27 6:31   ` Yinghai Lu
  1 sibling, 0 replies; 25+ messages in thread
From: Yinghai Lu @ 2010-09-27 5:58 UTC
  To: caiqian; +Cc: linux-next, kexec, H. Peter Anvin

Please check this one on top of tip or next.

Thanks

Yinghai

[PATCH] x86, memblock: Fix crashkernel allocation

Cai Qian found that crashkernel is broken with the x86 memblock changes:
1. crashkernel=128M@32M always reports that the range is in use, even when
   the first kernel is small and nothing uses that range
2. "kexec -p" always fails with the following report:
	Could not find a free area of memory of a000 bytes...
	locate_hole failed

The root cause is that the generic memblock_find_in_range() searches top-down,
but crashkernel needs a low address, or exactly the specified range.

Let's limit the target range to crash_base + crash_size to make sure that
we get the range from the bottom.

Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/kernel/setup.c |   19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -516,19 +516,28 @@ static void __init reserve_crashkernel(v
 
 	/* 0 means: find the address automatically */
 	if (crash_base <= 0) {
+		unsigned long long start = 0;
 		const unsigned long long alignment = 16<<20;	/* 16M */
 
-		crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
-				 alignment);
-		if (crash_base == MEMBLOCK_ERROR) {
+		crash_base = alignment;
+		while (crash_base < 0xffffffff) {
+			start = memblock_find_in_range(crash_base,
+				 crash_base + crash_size, crash_size, alignment);
+
+			if (start == crash_base)
+				break;
+
+			crash_base += alignment;
+		}
+		if (start != crash_base) {
 			pr_info("crashkernel reservation failed - No suitable area found.\n");
 			return;
 		}
 	} else {
 		unsigned long long start;
 
-		start = memblock_find_in_range(crash_base, ULONG_MAX, crash_size,
-				 1<<20);
+		start = memblock_find_in_range(crash_base,
+				 crash_base + crash_size, crash_size, 1<<20);
 		if (start != crash_base) {
 			pr_info("crashkernel reservation failed - memory is in use.\n");
 			return;
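The effect of the window-limiting trick in the patch above can be illustrated outside the kernel. What follows is only a user-space sketch, not the kernel code: `struct range`, `find_top_down()`, `pick_crash_base()` and `FIND_ERROR` are hypothetical names made up for this illustration, ranges are end-exclusive, and the top-down finder is a simplified stand-in for the generic memblock_find_in_range() (it steps down one alignment unit at a time and ignores address overflow). The point is that a top-down finder asked for exactly [base, base + size) can only answer `base` or fail, so walking `base` upward in alignment steps yields the lowest free slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIND_ERROR UINT64_MAX   /* hypothetical stand-in for MEMBLOCK_ERROR */

/* End-exclusive physical range [start, end); illustration only. */
struct range { uint64_t start, end; };

static int overlaps_reserved(uint64_t addr, uint64_t size,
                             const struct range *rsv, size_t nrsv)
{
    for (size_t i = 0; i < nrsv; i++)
        if (addr < rsv[i].end && addr + size > rsv[i].start)
            return 1;
    return 0;
}

/*
 * Simplified top-down search: highest aligned address inside [start, end)
 * that lies in one memory range and overlaps no reserved range.
 * Assumes mem[] is sorted by ascending address.
 */
uint64_t find_top_down(uint64_t start, uint64_t end, uint64_t size, uint64_t align,
                       const struct range *mem, size_t nmem,
                       const struct range *rsv, size_t nrsv)
{
    assert(size > 0 && align > 0 && (align & (align - 1)) == 0);
    for (size_t i = nmem; i-- > 0; ) {
        uint64_t lo = mem[i].start > start ? mem[i].start : start;
        uint64_t hi = mem[i].end < end ? mem[i].end : end;
        if (hi <= lo || hi - lo < size)
            continue;
        /* Highest aligned candidate that still fits below hi. */
        uint64_t addr = (hi - size) & ~(align - 1);
        while (addr >= lo) {
            if (!overlaps_reserved(addr, size, rsv, nrsv))
                return addr;
            if (addr < align)
                break;          /* avoid wrapping below zero */
            addr -= align;
        }
    }
    return FIND_ERROR;
}

/*
 * Mirror of the loop in the patch: probe windows of exactly `size` bytes,
 * starting at `align` and moving up, until the finder returns the window start.
 */
uint64_t pick_crash_base(uint64_t size, uint64_t align, uint64_t limit,
                         const struct range *mem, size_t nmem,
                         const struct range *rsv, size_t nrsv)
{
    for (uint64_t base = align; base < limit; base += align)
        if (find_top_down(base, base + size, size, align,
                          mem, nmem, rsv, nrsv) == base)
            return base;
    return FIND_ERROR;
}
```

Fed with the System RAM ranges and the kernel, BRK and RAMDISK reservations quoted elsewhere in this thread, the unrestricted top-down call lands at the top of the last RAM range, while the windowed loop settles on the first 16M-aligned slot above the kernel image.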
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-27 2:42 ` caiqian
  2010-09-27 5:58   ` Yinghai Lu
@ 2010-09-27 6:31   ` Yinghai Lu
  2010-09-27 9:16     ` CAI Qian
  1 sibling, 1 reply; 25+ messages in thread
From: Yinghai Lu @ 2010-09-27 6:31 UTC
  To: caiqian; +Cc: Ingo Molnar, kexec, linux-kernel@vger.kernel.org, H. Peter Anvin

Please check this one on top of tip or next.

Thanks

Yinghai

[PATCH] x86, memblock: Fix crashkernel allocation

Cai Qian found that crashkernel is broken with the x86 memblock changes:
1. crashkernel=128M@32M always reports that the range is in use, even when
   the first kernel is small and nothing uses that range
2. "kexec -p" always fails with the following report:
	Could not find a free area of memory of a000 bytes...
	locate_hole failed

The root cause is that the generic memblock_find_in_range() searches top-down,
but crashkernel needs a low address, or exactly the specified range.

Let's limit the target range to crash_base + crash_size to make sure that
we get the range from the bottom.

Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/kernel/setup.c |   19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -516,19 +516,28 @@ static void __init reserve_crashkernel(v
 
 	/* 0 means: find the address automatically */
 	if (crash_base <= 0) {
+		unsigned long long start = 0;
 		const unsigned long long alignment = 16<<20;	/* 16M */
 
-		crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
-				 alignment);
-		if (crash_base == MEMBLOCK_ERROR) {
+		crash_base = alignment;
+		while (crash_base < 0xffffffff) {
+			start = memblock_find_in_range(crash_base,
+				 crash_base + crash_size, crash_size, alignment);
+
+			if (start == crash_base)
+				break;
+
+			crash_base += alignment;
+		}
+		if (start != crash_base) {
 			pr_info("crashkernel reservation failed - No suitable area found.\n");
 			return;
 		}
 	} else {
 		unsigned long long start;
 
-		start = memblock_find_in_range(crash_base, ULONG_MAX, crash_size,
-				 1<<20);
+		start = memblock_find_in_range(crash_base,
+				 crash_base + crash_size, crash_size, 1<<20);
 		if (start != crash_base) {
 			pr_info("crashkernel reservation failed - memory is in use.\n");
 			return;
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-27 6:31   ` Yinghai Lu
@ 2010-09-27 9:16     ` CAI Qian
  0 siblings, 0 replies; 25+ messages in thread
From: CAI Qian @ 2010-09-27 9:16 UTC
  To: Yinghai Lu; +Cc: Ingo Molnar, kexec, linux-kernel, H. Peter Anvin

----- "Yinghai Lu" <yinghai@kernel.org> wrote:

> Please check this one on top of tip or next.
This failed for both trees.
[root@localhost linux-next]# patch -Np1 <memblock.patch
patching file arch/x86/kernel/setup.c
Hunk #1 FAILED at 516.
1 out of 1 hunk FAILED -- saving rejects to file arch/x86/kernel/setup.c.rej
>
> Thanks
>
> Yinghai
>
> [PATCH] x86, memblock: Fix crashkernel allocation
>
> Cai Qian found that crashkernel is broken with the x86 memblock changes:
> 1. crashkernel=128M@32M always reports that the range is in use, even when
>    the first kernel is small and nothing uses that range
> 2. "kexec -p" always fails with the following report:
> 	Could not find a free area of memory of a000 bytes...
> 	locate_hole failed
>
> The root cause is that the generic memblock_find_in_range() searches top-down,
> but crashkernel needs a low address, or exactly the specified range.
>
> Let's limit the target range to crash_base + crash_size to make sure that
> we get the range from the bottom.
>
> Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>
> ---
>  arch/x86/kernel/setup.c |   19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
>
> Index: linux-2.6/arch/x86/kernel/setup.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/setup.c
> +++ linux-2.6/arch/x86/kernel/setup.c
> @@ -516,19 +516,28 @@ static void __init reserve_crashkernel(v
>
>  	/* 0 means: find the address automatically */
>  	if (crash_base <= 0) {
> +		unsigned long long start = 0;
>  		const unsigned long long alignment = 16<<20;	/* 16M */
>
> -		crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
> -				 alignment);
> -		if (crash_base == MEMBLOCK_ERROR) {
> +		crash_base = alignment;
> +		while (crash_base < 0xffffffff) {
> +			start = memblock_find_in_range(crash_base,
> +				 crash_base + crash_size, crash_size, alignment);
> +
> +			if (start == crash_base)
> +				break;
> +
> +			crash_base += alignment;
> +		}
> +		if (start != crash_base) {
>  			pr_info("crashkernel reservation failed - No suitable area found.\n");
>  			return;
>  		}
>  	} else {
>  		unsigned long long start;
>
> -		start = memblock_find_in_range(crash_base, ULONG_MAX, crash_size,
> -				 1<<20);
> +		start = memblock_find_in_range(crash_base,
> +				 crash_base + crash_size, crash_size, 1<<20);
>  		if (start != crash_base) {
>  			pr_info("crashkernel reservation failed - memory is in use.\n");
>  			return;
[parent not found: <1834151968.1996101285512089968.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>]
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
       [not found] <1834151968.1996101285512089968.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
@ 2010-09-26 14:47 ` caiqian
  2010-09-26 19:42   ` Yinghai Lu
  0 siblings, 1 reply; 25+ messages in thread
From: caiqian @ 2010-09-26 14:47 UTC
  To: Yinghai Lu; +Cc: linux-next, kexec, H. Peter Anvin

----- "Yinghai Lu" <yinghai@kernel.org> wrote:

> On 09/25/2010 11:55 PM, CAI Qian wrote:
> >>
> >> are you kexec from 2.6.35+ to 2.6.36-rc3+?
> > No, both kernels were the same version. I am sorry, the above logs
> > were misleading; they were copy-and-pasted from different kernel
> > versions.
>
> can you check tip instead of next tree?
No dice,
# /sbin/kexec -p '--command-line=ro root=/dev/mapper/VolGroup-lv_root rd_LVM_LV=VolGroup/lv_root rd_LVM_LV=VolGroup/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb quiet console=tty0 console=ttyS0,115200 crashkernel=128M irqpoll maxcpus=1 reset_devices cgroup_disable=memory ' --initrd=/boot/initrd-2.6.36-rc5-tip+kdump.img /boot/vmlinuz-2.6.36-rc5-tip+
Could not find a free area of memory of a000 bytes...
locate_hole failed

After reverting the whole set of memblock commits, it was working again:
7950c407c0288b223a200c1bba8198941599ca37
fb74fb6db91abc3c1ceeb9d2c17b44866a12c63e
f88eff74aa848e58b1ea49768c0bbb874b31357f
27de794365786b4cdc3461ed4e23af2a33f40612
9dc5d569c133819c1ce069ebb1d771c62de32580
4d5cf86ce187c0d3a4cdf233ab0cc6526ccbe01f
88ba088c18457caaf8d2e5f8d36becc731a3d4f6
edbe7d23b4482e7f33179290bcff3b1feae1c5f3
6bcc8176d07f108da3b1af17fb2c0e82c80e948e
b52c17ce854125700c4e19d4427d39bf2504ff63
e82d42be24bd5d75bf6f81045636e6ca95ab55f2
301ff3e88ef9ff4bdb92f36a3e6170fce4c9dd34
72d7c3b33c980843e756681fb4867dc1efd62a76
a9ce6bc15100023b411f8117e53a016d61889800
a587d2daebcd2bc159d4348b6a7b028950a6d803
6f2a75369e7561e800d86927ecd83c970996b21f

With crashkernel=128M, /proc/iomem looks like this. It used a huge offset.
00000000-00000fff : reserved
00001000-0009f3ff : System RAM
0009f400-0009ffff : reserved
000f0000-000fffff : reserved
00100000-dfffafff : System RAM
  01000000-0149a733 : Kernel code
  0149a734-01afc46f : Kernel data
  01d9c000-022b18f7 : Kernel bss
dfffb000-dfffffff : reserved
f0000000-f1ffffff : 0000:00:02.0
f2000000-f2000fff : 0000:00:02.0
f2010000-f201ffff : 0000:00:02.0
f2020000-f20200ff : 0000:00:03.0
  f2020000-f20200ff : 8139cp
f2030000-f203ffff : 0000:00:03.0
fec00000-fec003ff : IOAPIC 0
fee00000-fee00fff : Local APIC
fffbc000-ffffffff : reserved
100000000-c9fffffff : System RAM
  c98000000-c9fffffff : Crash kernel

On kernels that are working, it automatically found the offset at 32M.
00000000-0000ffff : reserved
00010000-0009f3ff : System RAM
0009f400-0009ffff : reserved
000f0000-000fffff : reserved
00100000-dfffafff : System RAM
  01000000-014250bf : Kernel code
  014250c0-018aca8f : Kernel data
  01b1f000-01ff7c07 : Kernel bss
  02000000-09ffffff : Crash kernel
dfffb000-dfffffff : reserved
f0000000-f1ffffff : 0000:00:02.0
f2000000-f2000fff : 0000:00:02.0
f2010000-f201ffff : 0000:00:02.0
f2020000-f20200ff : 0000:00:03.0
  f2020000-f20200ff : 8139cp
f2030000-f203ffff : 0000:00:03.0
fec00000-fec003ff : IOAPIC 0
fee00000-fee00fff : Local APIC
fffbc000-ffffffff : reserved
100000000-c9fffffff : System RAM

With a fixed offset like crashkernel=128M@32M, the reservation failed.
initial memory mapped : 0 - 20000000
init_memory_mapping: 0000000000000000-00000000dfffb000
 0000000000 - 00dfe00000 page 2M
 00dfe00000 - 00dfffb000 page 4k
kernel direct mapping tables up to dfffb000 @ 1fffa000-20000000
init_memory_mapping: 0000000100000000-0000000ca0000000
 0100000000 - 0ca0000000 page 2M
kernel direct mapping tables up to ca0000000 @ dffc7000-dfffb000
RAMDISK: 37599000 - 37ff0000
crashkernel reservation failed - memory is in use.

After reverting those commits, it looks like this:
init_memory_mapping: 0000000000000000-00000000dfffb000
 0000000000 - 00dfe00000 page 2M
 00dfe00000 - 00dfffb000 page 4k
kernel direct mapping tables up to dfffb000 @ 16000-1c000
init_memory_mapping: 0000000100000000-0000000ca0000000
 0100000000 - 0ca0000000 page 2M
kernel direct mapping tables up to ca0000000 @ 1a000-4e000
RAMDISK: 375c9000 - 37ff0000
Reserving 128MB of memory at 32MB for crashkernel (System RAM: 51712MB)

I can't tell what was using the memory at 32MB, but after reverting those commits I can see the early reservation information:
Subtract (76 early reservations)
  #1 [0001000000 - 0001ff7c08] TEXT DATA BSS
  #2 [00375c9000 - 0037ff0000] RAMDISK
  #3 [0001ff8000 - 0001ff8079] BRK
  #4 [000009f400 - 00000f7fb0] BIOS reserved
  #5 [00000f7fb0 - 00000f7fc0] MP-table mpf
  #6 [00000f822c - 0000100000] BIOS reserved
  #7 [00000f7fc0 - 00000f822c] MP-table mpc
  #8 [0000010000 - 0000012000] TRAMPOLINE
  #9 [0000012000 - 0000016000] ACPI WAKEUP
  #10 [0000016000 - 000001a000] PGTABLE
  #11 [000001a000 - 0000049000] PGTABLE
  #12 [0002000000 - 000a000000] CRASH KERNEL

But with those commits applied, that information is gone.
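The two /proc/iomem dumps in the message above can be compared mechanically. The following is a small user-space sketch under stated assumptions: `struct iomem_entry`, `parse_iomem_line()`, `iomem_size()` and `is_crash_kernel()` are hypothetical helpers written for this illustration (they are not part of kexec-tools or the kernel), and the only format assumed is the "start-end : name" layout visible in the dumps, with hexadecimal, inclusive bounds.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One parsed /proc/iomem entry; `end` is inclusive, as printed in the dumps. */
struct iomem_entry {
    uint64_t start;
    uint64_t end;
    char name[64];
};

/* Returns 1 on success, 0 if the line does not look like "start-end : name". */
int parse_iomem_line(const char *line, struct iomem_entry *out)
{
    assert(line != NULL && out != NULL);
    /* Leading blank in the format skips the indentation of nested entries. */
    if (sscanf(line, " %" SCNx64 "-%" SCNx64 " : %63[^\n]",
               &out->start, &out->end, out->name) != 3)
        return 0;
    return out->end >= out->start;
}

/* Size in bytes of an entry with inclusive bounds. */
uint64_t iomem_size(const struct iomem_entry *e)
{
    return e->end - e->start + 1;
}

/* True if the entry is the crashkernel reservation. */
int is_crash_kernel(const struct iomem_entry *e)
{
    return strcmp(e->name, "Crash kernel") == 0;
}
```

Applied to the two "Crash kernel" lines quoted above, both entries come out at 128MB; only the base address differs (0xc98000000 with the memblock commits, 0x02000000, i.e. 32MB, without them).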
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-26 14:47 ` caiqian
@ 2010-09-26 19:42   ` Yinghai Lu
  0 siblings, 0 replies; 25+ messages in thread
From: Yinghai Lu @ 2010-09-26 19:42 UTC
  To: caiqian; +Cc: linux-next, kexec, H. Peter Anvin

On 09/26/2010 07:47 AM, caiqian@redhat.com wrote:
>
> ----- "Yinghai Lu" <yinghai@kernel.org> wrote:
>
>> On 09/25/2010 11:55 PM, CAI Qian wrote:
>>>>
>>>> are you kexec from 2.6.35+ to 2.6.36-rc3+?
>>> No, both kernels were the same version. I am sorry, the above logs
>>> were misleading; they were copy-and-pasted from different kernel
>>> versions.
>>
>> can you check tip instead of next tree?
> No dice,
> # /sbin/kexec -p '--command-line=ro root=/dev/mapper/VolGroup-lv_root rd_LVM_LV=VolGroup/lv_root rd_LVM_LV=VolGroup/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb quiet console=tty0 console=ttyS0,115200 crashkernel=128M irqpoll maxcpus=1 reset_devices cgroup_disable=memory ' --initrd=/boot/initrd-2.6.36-rc5-tip+kdump.img /boot/vmlinuz-2.6.36-rc5-tip+
> Could not find a free area of memory of a000 bytes...
> locate_hole failed

Looks like you need to update your kexec-tools package.

Please run the following script in the first kernel.

cd /sys/firmware/memmap
for dir in * ; do
  start=$(cat "$dir/start")
  end=$(cat "$dir/end")
  type=$(cat "$dir/type")
  printf "%016x-%016x (%s)\n" $start $(( $end + 1 )) "$type"
done

Also enable kexec debug to see what memmap kexec parses.

>
> After reverting the whole set of memblock commits, it was working again:
> 7950c407c0288b223a200c1bba8198941599ca37
> fb74fb6db91abc3c1ceeb9d2c17b44866a12c63e
> f88eff74aa848e58b1ea49768c0bbb874b31357f
> 27de794365786b4cdc3461ed4e23af2a33f40612
> 9dc5d569c133819c1ce069ebb1d771c62de32580
> 4d5cf86ce187c0d3a4cdf233ab0cc6526ccbe01f
> 88ba088c18457caaf8d2e5f8d36becc731a3d4f6
> edbe7d23b4482e7f33179290bcff3b1feae1c5f3
> 6bcc8176d07f108da3b1af17fb2c0e82c80e948e
> b52c17ce854125700c4e19d4427d39bf2504ff63
> e82d42be24bd5d75bf6f81045636e6ca95ab55f2
> 301ff3e88ef9ff4bdb92f36a3e6170fce4c9dd34
> 72d7c3b33c980843e756681fb4867dc1efd62a76
> a9ce6bc15100023b411f8117e53a016d61889800
> a587d2daebcd2bc159d4348b6a7b028950a6d803
> 6f2a75369e7561e800d86927ecd83c970996b21f
>
> With crashkernel=128M, /proc/iomem looks like this. It used a huge offset.
> 00000000-00000fff : reserved
> 00001000-0009f3ff : System RAM
> 0009f400-0009ffff : reserved
> 000f0000-000fffff : reserved
> 00100000-dfffafff : System RAM
>   01000000-0149a733 : Kernel code
>   0149a734-01afc46f : Kernel data
>   01d9c000-022b18f7 : Kernel bss
> dfffb000-dfffffff : reserved
> f0000000-f1ffffff : 0000:00:02.0
> f2000000-f2000fff : 0000:00:02.0
> f2010000-f201ffff : 0000:00:02.0
> f2020000-f20200ff : 0000:00:03.0
>   f2020000-f20200ff : 8139cp
> f2030000-f203ffff : 0000:00:03.0
> fec00000-fec003ff : IOAPIC 0
> fee00000-fee00fff : Local APIC
> fffbc000-ffffffff : reserved
> 100000000-c9fffffff : System RAM
>   c98000000-c9fffffff : Crash kernel
>
> On kernels that are working, it automatically found the offset at 32M.
> 00000000-0000ffff : reserved
> 00010000-0009f3ff : System RAM
> 0009f400-0009ffff : reserved
> 000f0000-000fffff : reserved
> 00100000-dfffafff : System RAM
>   01000000-014250bf : Kernel code
>   014250c0-018aca8f : Kernel data
>   01b1f000-01ff7c07 : Kernel bss
>   02000000-09ffffff : Crash kernel
> dfffb000-dfffffff : reserved
> f0000000-f1ffffff : 0000:00:02.0
> f2000000-f2000fff : 0000:00:02.0
> f2010000-f201ffff : 0000:00:02.0
> f2020000-f20200ff : 0000:00:03.0
>   f2020000-f20200ff : 8139cp
> f2030000-f203ffff : 0000:00:03.0
> fec00000-fec003ff : IOAPIC 0
> fee00000-fee00fff : Local APIC
> fffbc000-ffffffff : reserved
> 100000000-c9fffffff : System RAM
>
> With a fixed offset like crashkernel=128M@32M, the reservation failed.
> initial memory mapped : 0 - 20000000
> init_memory_mapping: 0000000000000000-00000000dfffb000
>  0000000000 - 00dfe00000 page 2M
>  00dfe00000 - 00dfffb000 page 4k
> kernel direct mapping tables up to dfffb000 @ 1fffa000-20000000
> init_memory_mapping: 0000000100000000-0000000ca0000000
>  0100000000 - 0ca0000000 page 2M
> kernel direct mapping tables up to ca0000000 @ dffc7000-dfffb000
> RAMDISK: 37599000 - 37ff0000
> crashkernel reservation failed - memory is in use.
>
> After reverting those commits, it looks like this:
> init_memory_mapping: 0000000000000000-00000000dfffb000
>  0000000000 - 00dfe00000 page 2M
>  00dfe00000 - 00dfffb000 page 4k
> kernel direct mapping tables up to dfffb000 @ 16000-1c000
> init_memory_mapping: 0000000100000000-0000000ca0000000
>  0100000000 - 0ca0000000 page 2M
> kernel direct mapping tables up to ca0000000 @ 1a000-4e000
> RAMDISK: 375c9000 - 37ff0000
> Reserving 128MB of memory at 32MB for crashkernel (System RAM: 51712MB)

Yes, the default memblock find_range is top-down; the old early_res was bottom-up.

During the conversion we did have an x86 find_range that went bottom-up, but later
it seemed top-down was working on all test cases (32bit etc.).

Subject: [PATCH] x86, memblock: Add x86 version of memblock_find_in_range()

The generic version goes from high to low, and it seems it cannot find
the right area compactly enough.

The x86 version goes from goal to limit, just like the way we did
for early_res.

Use ARCH_MEMBLOCK_FIND_AREA to select between them.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/Kconfig       |    8 +++++++
 arch/x86/mm/memblock.c |   54 +++++++++++++++++++++++++++++++++++++++++++++++++
 mm/memblock.c          |    2 -
 3 files changed, 63 insertions(+), 1 deletion(-)

Index: linux-2.6/arch/x86/mm/memblock.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/memblock.c
+++ linux-2.6/arch/x86/mm/memblock.c
@@ -352,3 +352,57 @@ u64 __init memblock_x86_hole_size(u64 st
 
 	return end - start - ((u64)ram << PAGE_SHIFT);
 }
+
+#ifdef CONFIG_ARCH_MEMBLOCK_FIND_AREA
+/* Check for already reserved areas */
+static inline bool __init check_with_memblock_reserved(u64 *addrp, u64 size, u64 align)
+{
+	u64 addr = *addrp;
+	bool changed = false;
+	struct memblock_region *r;
+again:
+	for_each_memblock(reserved, r) {
+		if ((addr + size) > r->base && addr < (r->base + r->size)) {
+			addr = round_up(r->base + r->size, align);
+			changed = true;
+			goto again;
+		}
+	}
+
+	if (changed)
+		*addrp = addr;
+
+	return changed;
+}
+
+/*
+ * Find a free area with specified alignment in a specific range.
+ */
+u64 __init memblock_find_in_range(u64 start, u64 end, u64 size, u64 align)
+{
+	struct memblock_region *r;
+
+	for_each_memblock(memory, r) {
+		u64 ei_start = r->base;
+		u64 ei_last = ei_start + r->size;
+		u64 addr, last;
+
+		addr = round_up(ei_start, align);
+		if (addr < start)
+			addr = round_up(start, align);
+		if (addr >= ei_last)
+			continue;
+		while (check_with_memblock_reserved(&addr, size, align) && addr+size <= ei_last)
+			;
+		last = addr + size;
+		if (last > ei_last)
+			continue;
+		if (last > end)
+			continue;
+
+		return addr;
+	}
+
+	return MEMBLOCK_ERROR;
+}
+#endif
Index: linux-2.6/arch/x86/Kconfig
===================================================================
--- linux-2.6.orig/arch/x86/Kconfig
+++ linux-2.6/arch/x86/Kconfig
@@ -569,6 +569,14 @@ config PARAVIRT_DEBUG
 	  Enable to debug paravirt_ops internals.  Specifically, BUG if
 	  a paravirt_op is missing when it is called.
 
+config ARCH_MEMBLOCK_FIND_AREA
+	default y
+	bool "Use x86 own memblock_find_in_range()"
+	---help---
+	  Use memblock_find_in_range() version instead of generic version, it get free
+	  area up from low.
+	  Generic one try to get free area down from limit.
+
 config NO_BOOTMEM
 	def_bool y
 
Index: linux-2.6/mm/memblock.c
===================================================================
--- linux-2.6.orig/mm/memblock.c
+++ linux-2.6/mm/memblock.c
@@ -165,7 +165,7 @@ static phys_addr_t __init_memblock membl
 /*
  * Find a free area with specified alignment in a specific range.
  */
-u64 __init_memblock memblock_find_in_range(u64 start, u64 end, u64 size, u64 align)
+u64 __init_memblock __weak memblock_find_in_range(u64 start, u64 end, u64 size, u64 align)
 {
 	return memblock_find_base(size, align, start, end);
 }

>
> I can't tell what was using the memory at 32MB, but after reverting those commits I can see the early reservation information:
> Subtract (76 early reservations)
>   #1 [0001000000 - 0001ff7c08] TEXT DATA BSS
>   #2 [00375c9000 - 0037ff0000] RAMDISK
>   #3 [0001ff8000 - 0001ff8079] BRK
>   #4 [000009f400 - 00000f7fb0] BIOS reserved
>   #5 [00000f7fb0 - 00000f7fc0] MP-table mpf
>   #6 [00000f822c - 0000100000] BIOS reserved
>   #7 [00000f7fc0 - 00000f822c] MP-table mpc
>   #8 [0000010000 - 0000012000] TRAMPOLINE
>   #9 [0000012000 - 0000016000] ACPI WAKEUP
>   #10 [0000016000 - 000001a000] PGTABLE
>   #11 [000001a000 - 0000049000] PGTABLE
>   #12 [0002000000 - 000a000000] CRASH KERNEL
>
> But with those commits applied, that information is gone.

memblock can merge reserved areas, so it cannot keep tags with them.

I have a local patchset that can print those name tags...

please check

	git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-2.6-yinghai.git memblock

Yinghai
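To make the bottom-up behaviour of the x86 finder in the patch above easier to follow, here is a user-space sketch of the same idea. It is an approximation under stated assumptions, not the kernel implementation: `struct range`, `find_bottom_up()`, `skip_reserved()` and `FIND_ERROR` are hypothetical names for this illustration, ranges are end-exclusive rather than base/size, the memory array is assumed sorted by ascending address, and overflow near the top of the address space is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FIND_ERROR UINT64_MAX   /* hypothetical stand-in for MEMBLOCK_ERROR */

/* End-exclusive physical range [start, end); illustration only. */
struct range { uint64_t start, end; };

static uint64_t round_up_u64(uint64_t x, uint64_t align)
{
    assert(align > 0 && (align & (align - 1)) == 0);   /* power of two */
    return (x + align - 1) & ~(align - 1);
}

/* Push addr upward until [addr, addr + size) overlaps no reserved range. */
static uint64_t skip_reserved(uint64_t addr, uint64_t size, uint64_t align,
                              const struct range *rsv, size_t nrsv)
{
    bool moved;
    do {
        moved = false;
        for (size_t i = 0; i < nrsv; i++) {
            if (addr + size > rsv[i].start && addr < rsv[i].end) {
                addr = round_up_u64(rsv[i].end, align);
                moved = true;
            }
        }
    } while (moved);
    return addr;
}

/*
 * Bottom-up search: walk memory ranges from low to high and return the
 * lowest aligned address >= start whose [addr, addr + size) fits inside
 * one memory range, below `end`, and clear of every reserved range.
 */
uint64_t find_bottom_up(uint64_t start, uint64_t end, uint64_t size, uint64_t align,
                        const struct range *mem, size_t nmem,
                        const struct range *rsv, size_t nrsv)
{
    for (size_t i = 0; i < nmem; i++) {
        uint64_t lo = mem[i].start > start ? mem[i].start : start;
        uint64_t addr = round_up_u64(lo, align);

        if (addr >= mem[i].end)
            continue;
        addr = skip_reserved(addr, size, align, rsv, nrsv);
        if (addr + size > mem[i].end || addr + size > end)
            continue;
        return addr;
    }
    return FIND_ERROR;
}
```

With the System RAM ranges and the TEXT DATA BSS, BRK and RAMDISK reservations from the logs quoted in this thread, a 128MB request aligned to 16MB and starting at 16MB is pushed past the kernel image and lands at 0x2000000 (32MB), which matches the placement reported for the working kernels.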
[parent not found: <1614106428.1991831285470588200.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>]
* kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_" [not found] <1614106428.1991831285470588200.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> @ 2010-09-26 3:11 ` caiqian 2010-09-26 6:44 ` Yinghai Lu 0 siblings, 1 reply; 25+ messages in thread From: caiqian @ 2010-09-26 3:11 UTC (permalink / raw) To: Yinghai Lu, H. Peter Anvin; +Cc: linux-next, kexec # /sbin/kexec -p '--command-line=ro root=/dev/mapper/VolGroup-lv_root rd_LVM_LV=VolGroup/lv_root rd_LVM_LV=VolGroup/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb quiet console=tty0 console=ttyS0,115200 crashkernel=128M irqpoll maxcpus=1 reset_devices cgroup_disable=memory ' --initrd=/boot/initrd-2.6.36-rc3+kdump.img /boot/vmlinuz-2.6.36-rc3+ BUG: unable to handle kernel paging request at ffff8800dfffe400 IP: [<ffffffff8113376b>] per_cpu_ptr_to_phys+0x3b/0x120 PGD 1a26063 PUD 1fffc067 PMD 1fffd067 PTE 0 Oops: 0000 [#1] SMP last sysfs file: /sys/devices/system/cpu/cpu0/crash_notes CPU 3 Modules linked in: ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 virtio_balloon pcspkr 8139too 8139cp mii snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc sg i2c_piix4 i2c_core ext4 mbcache jbd2 floppy sd_mod crc_t10dif virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mod [last unloaded: scsi_wait_scan] Pid: 5671, comm: kexec Not tainted 2.6.35+ #11 /KVM RIP: 0010:[<ffffffff8113376b>] [<ffffffff8113376b>] per_cpu_ptr_to_phys+0x3b/0x120 RSP: 0018:ffff88064567fe38 EFLAGS: 00010286 RAX: ffff8800df440000 RBX: ffff8800df41d990 RCX: ffff8800df400000 RDX: ffff8800dfff6400 RSI: 0000000000001000 RDI: ffff8800df41d990 RBP: ffff88064567fe58 R08: ffffffff81651f20 R09: ffff8800df40cb38 R10: 0000000000000001 R11: 0000000000000000 R12: ffff88083dcd18e0 R13: 
ffff88064567ff48 R14: 0000000000001000 R15: 00007f969401b000 FS: 00007f96952e4700(0000) GS:ffff8800df4c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff8800dfffe400 CR3: 0000000818130000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process kexec (pid: 5671, threadinfo ffff88064567e000, task ffff8808ddba0180) Stack: ffff88064567fe68 ffff88083d4f8000 ffff88083dcd18e0 ffff88064567ff48 <0> ffff88064567fe78 ffffffff812ea28b ffff88064567fe78 ffff88083dcd18c0 <0> ffff88064567fe88 ffffffff812e4f0f ffff88064567fee8 ffffffff811a5d11 Call Trace: [<ffffffff812ea28b>] show_crash_notes+0x2b/0x50 [<ffffffff812e4f0f>] sysdev_show+0x1f/0x30 [<ffffffff811a5d11>] sysfs_read_file+0x111/0x1f0 [<ffffffff8113e7e5>] vfs_read+0xb5/0x1a0 [<ffffffff810b5952>] ? audit_syscall_entry+0x252/0x280 [<ffffffff8113e921>] sys_read+0x51/0x90 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b Code: 00 00 48 8b 05 bf 81 e2 00 8b 35 dd 46 9c 00 48 8b 15 0a 47 9c 00 48 89 fb 48 8b 48 18 8b 05 a5 46 9c 00 c1 e0 0c 48 98 48 01 c8 <48> 03 04 f2 48 39 c7 0f 83 a0 00 00 00 8b 05 aa 46 9c 00 48 03 RIP [<ffffffff8113376b>] per_cpu_ptr_to_phys+0x3b/0x120 RSP <ffff88064567fe38> CR2: ffff8800dfffe400 ---[ end trace 1f847047fea7430c ]--- It was discovered that this commit introduced the regression, commit a9ce6bc15100023b411f8117e53a016d61889800 Author: Yinghai Lu <yinghai@kernel.org> Date: Wed Aug 25 13:39:17 2010 -0700 x86, memblock: Replace e820_/_early string with memblock_ 1.include linux/memblock.h directly. so later could reduce e820.h reference. 2 this patch is done by sed scripts mainly -v2: use MEMBLOCK_ERROR instead of -1ULL or -1UL Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. 
Peter Anvin <hpa@zytor.com>

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 8406ed7..8e4a165 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -90,7 +90,7 @@ extern void __iomem *efi_ioremap(unsigned long addr, unsigned long size,
 #endif /* CONFIG_X86_32 */
 
 extern int add_efi_memmap;
-extern void efi_reserve_early(void);
+extern void efi_memblock_x86_reserve_range(void);
 extern void efi_call_phys_prelog(void);
 extern void efi_call_phys_epilog(void);
 
diff --git a/arch/x86/kernel/acpi/sleep.c b/arch/x86/kernel/acpi/sleep.c
index fcc3c61..d829e75 100644
--- a/arch/x86/kernel/acpi/sleep.c
+++ b/arch/x86/kernel/acpi/sleep.c
@@ -7,6 +7,7 @@
 
 #include <linux/acpi.h>
 #include <linux/bootmem.h>
+#include <linux/memblock.h>
 #include <linux/dmi.h>
 #include <linux/cpumask.h>
 #include <asm/segment.h>
@@ -125,7 +126,7 @@ void acpi_restore_state_mem(void)
  */
 void __init acpi_reserve_wakeup_memory(void)
 {
-	unsigned long mem;
+	phys_addr_t mem;
 
 	if ((&wakeup_code_end - &wakeup_code_start) > WAKEUP_SIZE) {
 		printk(KERN_ERR
@@ -133,15 +134,15 @@ void __init acpi_reserve_wakeup_memory(void)
 		return;
 	}
 
-	mem = find_e820_area(0, 1<<20, WAKEUP_SIZE, PAGE_SIZE);
+	mem = memblock_find_in_range(0, 1<<20, WAKEUP_SIZE, PAGE_SIZE);
 
-	if (mem == -1L) {
+	if (mem == MEMBLOCK_ERROR) {
 		printk(KERN_ERR "ACPI: Cannot allocate lowmem, S3 disabled.\n");
 		return;
 	}
 	acpi_realmode = (unsigned long) phys_to_virt(mem);
 	acpi_wakeup_address = mem;
-	reserve_early(mem, mem + WAKEUP_SIZE, "ACPI WAKEUP");
+	memblock_x86_reserve_range(mem, mem + WAKEUP_SIZE, "ACPI WAKEUP");
 }
 
diff --git a/arch/x86/kernel/apic/numaq_32.c b/arch/x86/kernel/apic/numaq_32.c
index 3e28401..960f26a 100644
--- a/arch/x86/kernel/apic/numaq_32.c
+++ b/arch/x86/kernel/apic/numaq_32.c
@@ -26,6 +26,7 @@
 #include <linux/nodemask.h>
 #include <linux/topology.h>
 #include <linux/bootmem.h>
+#include <linux/memblock.h>
 #include <linux/threads.h>
 #include <linux/cpumask.h>
 #include <linux/kernel.h>
@@ -88,7 +89,7 @@ static inline void numaq_register_node(int node, struct sys_cfg_data *scd)
 	node_end_pfn[node] =
 		 MB_TO_PAGES(eq->hi_shrd_mem_start + eq->hi_shrd_mem_size);
 
-	e820_register_active_regions(node, node_start_pfn[node],
+	memblock_x86_register_active_regions(node, node_start_pfn[node],
 						node_end_pfn[node]);
 
 	memory_present(node, node_start_pfn[node], node_end_pfn[node]);
diff --git a/arch/x86/kernel/efi.c b/arch/x86/kernel/efi.c
index c2fa9b8..0fe27d7 100644
--- a/arch/x86/kernel/efi.c
+++ b/arch/x86/kernel/efi.c
@@ -30,6 +30,7 @@
 #include <linux/init.h>
 #include <linux/efi.h>
 #include <linux/bootmem.h>
+#include <linux/memblock.h>
 #include <linux/spinlock.h>
 #include <linux/uaccess.h>
 #include <linux/time.h>
@@ -275,7 +276,7 @@ static void __init do_add_efi_memmap(void)
 	sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map);
 }
 
-void __init efi_reserve_early(void)
+void __init efi_memblock_x86_reserve_range(void)
 {
 	unsigned long pmap;
 
@@ -290,7 +291,7 @@ void __init efi_reserve_early(void)
 		boot_params.efi_info.efi_memdesc_size;
 	memmap.desc_version = boot_params.efi_info.efi_memdesc_version;
 	memmap.desc_size = boot_params.efi_info.efi_memdesc_size;
-	reserve_early(pmap, pmap + memmap.nr_map * memmap.desc_size,
+	memblock_x86_reserve_range(pmap, pmap + memmap.nr_map * memmap.desc_size,
 		      "EFI memmap");
 }
 
diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c
index da60aa8..74e4cf6 100644
--- a/arch/x86/kernel/head32.c
+++ b/arch/x86/kernel/head32.c
@@ -42,7 +42,7 @@ void __init i386_start_kernel(void)
 	memblock_x86_reserve_range(PAGE_SIZE, PAGE_SIZE + PAGE_SIZE, "EX TRAMPOLINE");
 #endif
 
-	reserve_early(__pa_symbol(&_text), __pa_symbol(&__bss_stop), "TEXT DATA BSS");
+	memblock_x86_reserve_range(__pa_symbol(&_text), __pa_symbol(&__bss_stop), "TEXT DATA BSS");
 
 #ifdef CONFIG_BLK_DEV_INITRD
 	/* Reserve INITRD */
@@ -51,7 +51,7 @@ void __init i386_start_kernel(void)
 		u64 ramdisk_image = boot_params.hdr.ramdisk_image;
 		u64 ramdisk_size = boot_params.hdr.ramdisk_size;
 		u64 ramdisk_end = PAGE_ALIGN(ramdisk_image + ramdisk_size);
-		reserve_early(ramdisk_image, ramdisk_end, "RAMDISK");
+		memblock_x86_reserve_range(ramdisk_image, ramdisk_end, "RAMDISK");
 	}
 #endif
 
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 8ee930f..97adf98 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -101,7 +101,7 @@ void __init x86_64_start_reservations(char *real_mode_data)
 
 	memblock_init();
 
-	reserve_early(__pa_symbol(&_text), __pa_symbol(&__bss_stop), "TEXT DATA BSS");
+	memblock_x86_reserve_range(__pa_symbol(&_text), __pa_symbol(&__bss_stop), "TEXT DATA BSS");
 
 #ifdef CONFIG_BLK_DEV_INITRD
 	/* Reserve INITRD */
@@ -110,7 +110,7 @@ void __init x86_64_start_reservations(char *real_mode_data)
 		unsigned long ramdisk_image = boot_params.hdr.ramdisk_image;
 		unsigned long ramdisk_size = boot_params.hdr.ramdisk_size;
 		unsigned long ramdisk_end = PAGE_ALIGN(ramdisk_image + ramdisk_size);
-		reserve_early(ramdisk_image, ramdisk_end, "RAMDISK");
+		memblock_x86_reserve_range(ramdisk_image, ramdisk_end, "RAMDISK");
 	}
 #endif
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index bbe0aaf..a4f0173 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -302,7 +302,7 @@ static inline void init_gbpages(void)
 static void __init reserve_brk(void)
 {
 	if (_brk_end > _brk_start)
-		reserve_early(__pa(_brk_start), __pa(_brk_end), "BRK");
+		memblock_x86_reserve_range(__pa(_brk_start), __pa(_brk_end), "BRK");
 
 	/* Mark brk area as locked down and no longer taking any
 	   new allocations */
@@ -324,17 +324,16 @@ static void __init relocate_initrd(void)
 	char *p, *q;
 
 	/* We need to move the initrd down into lowmem */
-	ramdisk_here = find_e820_area(0, end_of_lowmem, area_size,
+	ramdisk_here = memblock_find_in_range(0, end_of_lowmem, area_size,
 					 PAGE_SIZE);
 
-	if (ramdisk_here == -1ULL)
+	if (ramdisk_here == MEMBLOCK_ERROR)
 		panic("Cannot find place for new RAMDISK of size %lld\n",
 			 ramdisk_size);
 
 	/* Note: this includes all the lowmem currently occupied by
 	   the initrd, we rely on that fact to keep the data intact. */
-	reserve_early(ramdisk_here, ramdisk_here + area_size,
-			 "NEW RAMDISK");
+	memblock_x86_reserve_range(ramdisk_here, ramdisk_here + area_size, "NEW RAMDISK");
 	initrd_start = ramdisk_here + PAGE_OFFSET;
 	initrd_end = initrd_start + ramdisk_size;
 	printk(KERN_INFO "Allocated new RAMDISK: %08llx - %08llx\n",
@@ -390,7 +389,7 @@ static void __init reserve_initrd(void)
 	initrd_start = 0;
 
 	if (ramdisk_size >= (end_of_lowmem>>1)) {
-		free_early(ramdisk_image, ramdisk_end);
+		memblock_x86_free_range(ramdisk_image, ramdisk_end);
 		printk(KERN_ERR "initrd too large to handle, "
 		       "disabling initrd\n");
 		return;
@@ -413,7 +412,7 @@ static void __init reserve_initrd(void)
 
 	relocate_initrd();
 
-	free_early(ramdisk_image, ramdisk_end);
+	memblock_x86_free_range(ramdisk_image, ramdisk_end);
 }
 #else
 static void __init reserve_initrd(void)
@@ -469,7 +468,7 @@ static void __init e820_reserve_setup_data(void)
 	e820_print_map("reserve setup_data");
 }
 
-static void __init reserve_early_setup_data(void)
+static void __init memblock_x86_reserve_range_setup_data(void)
 {
 	struct setup_data *data;
 	u64 pa_data;
@@ -481,7 +480,7 @@ static void __init reserve_early_setup_data(void)
 	while (pa_data) {
 		data = early_memremap(pa_data, sizeof(*data));
 		sprintf(buf, "setup data %x", data->type);
-		reserve_early(pa_data, pa_data+sizeof(*data)+data->len, buf);
+		memblock_x86_reserve_range(pa_data, pa_data+sizeof(*data)+data->len, buf);
 		pa_data = data->next;
 		early_iounmap(data, sizeof(*data));
 	}
@@ -519,23 +518,23 @@ static void __init reserve_crashkernel(void)
 	if (crash_base <= 0) {
 		const unsigned long long alignment = 16<<20; /* 16M */
 
-		crash_base = find_e820_area(alignment, ULONG_MAX, crash_size,
+		crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
 				 alignment);
-		if (crash_base == -1ULL) {
+		if (crash_base == MEMBLOCK_ERROR) {
 			pr_info("crashkernel reservation failed - No suitable area found.\n");
 			return;
 		}
 	} else {
 		unsigned long long start;
 
-		start = find_e820_area(crash_base, ULONG_MAX, crash_size,
+		start = memblock_find_in_range(crash_base, ULONG_MAX, crash_size,
 				 1<<20);
 		if (start != crash_base) {
 			pr_info("crashkernel reservation failed - memory is in use.\n");
 			return;
 		}
 	}
-	reserve_early(crash_base, crash_base + crash_size, "CRASH KERNEL");
+	memblock_x86_reserve_range(crash_base, crash_base + crash_size, "CRASH KERNEL");
 
 	printk(KERN_INFO "Reserving %ldMB of memory at %ldMB "
 			"for crashkernel (System RAM: %ldMB)\n",
@@ -786,7 +785,7 @@ void __init setup_arch(char **cmdline_p)
 #endif
 		     4)) {
 		efi_enabled = 1;
-		efi_reserve_early();
+		efi_memblock_x86_reserve_range();
 	}
 #endif
 
@@ -846,7 +845,7 @@ void __init setup_arch(char **cmdline_p)
 	vmi_activate();
 
 	/* after early param, so could get panic from serial */
-	reserve_early_setup_data();
+	memblock_x86_reserve_range_setup_data();
 
 	if (acpi_mps_check()) {
 #ifdef CONFIG_X86_LOCAL_APIC
diff --git a/arch/x86/kernel/trampoline.c b/arch/x86/kernel/trampoline.c
index c652ef6..7c2102c 100644
--- a/arch/x86/kernel/trampoline.c
+++ b/arch/x86/kernel/trampoline.c
@@ -1,7 +1,7 @@
 #include <linux/io.h>
+#include <linux/memblock.h>
 
 #include <asm/trampoline.h>
-#include <asm/e820.h>
 
 #if defined(CONFIG_X86_64) && defined(CONFIG_ACPI_SLEEP)
 #define __trampinit
@@ -16,15 +16,15 @@ unsigned char *__trampinitdata trampoline_base;
 
 void __init reserve_trampoline_memory(void)
 {
-	unsigned long mem;
+	phys_addr_t mem;
 
 	/* Has to be in very low memory so we can execute real-mode AP code. */
-	mem = find_e820_area(0, 1<<20, TRAMPOLINE_SIZE, PAGE_SIZE);
-	if (mem == -1L)
+	mem = memblock_find_in_range(0, 1<<20, TRAMPOLINE_SIZE, PAGE_SIZE);
+	if (mem == MEMBLOCK_ERROR)
 		panic("Cannot allocate trampoline\n");
 
 	trampoline_base = __va(mem);
-	reserve_early(mem, mem + TRAMPOLINE_SIZE, "TRAMPOLINE");
+	memblock_x86_reserve_range(mem, mem + TRAMPOLINE_SIZE, "TRAMPOLINE");
 }
 
 /*
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index b278535..c0e28a1 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -2,6 +2,7 @@
 #include <linux/initrd.h>
 #include <linux/ioport.h>
 #include <linux/swap.h>
+#include <linux/memblock.h>
 
 #include <asm/cacheflush.h>
 #include <asm/e820.h>
@@ -33,6 +34,7 @@ static void __init find_early_table_space(unsigned long end, int use_pse,
 					  int use_gbpages)
 {
 	unsigned long puds, pmds, ptes, tables, start;
+	phys_addr_t base;
 
 	puds = (end + PUD_SIZE - 1) >> PUD_SHIFT;
 	tables = roundup(puds * sizeof(pud_t), PAGE_SIZE);
@@ -75,12 +77,12 @@ static void __init find_early_table_space(unsigned long end, int use_pse,
 #else
 	start = 0x8000;
 #endif
-	e820_table_start = find_e820_area(start, max_pfn_mapped<<PAGE_SHIFT,
+	base = memblock_find_in_range(start, max_pfn_mapped<<PAGE_SHIFT,
 					tables, PAGE_SIZE);
-	if (e820_table_start == -1UL)
+	if (base == MEMBLOCK_ERROR)
 		panic("Cannot find space for the kernel page tables");
 
-	e820_table_start >>= PAGE_SHIFT;
+	e820_table_start = base >> PAGE_SHIFT;
 	e820_table_end = e820_table_start;
 	e820_table_top = e820_table_start + (tables >> PAGE_SHIFT);
 
@@ -299,7 +301,7 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
 	__flush_tlb_all();
 
 	if (!after_bootmem && e820_table_end > e820_table_start)
-		reserve_early(e820_table_start << PAGE_SHIFT,
+		memblock_x86_reserve_range(e820_table_start << PAGE_SHIFT,
 				 e820_table_end << PAGE_SHIFT, "PGTABLE");
 
 	if (!after_bootmem)
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 90e0545..63b09ba 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -25,6 +25,7 @@
 #include <linux/pfn.h>
 #include <linux/poison.h>
 #include <linux/bootmem.h>
+#include <linux/memblock.h>
 #include <linux/proc_fs.h>
 #include <linux/memory_hotplug.h>
 #include <linux/initrd.h>
@@ -712,14 +713,14 @@ void __init initmem_init(unsigned long start_pfn, unsigned long end_pfn,
 	highstart_pfn = highend_pfn = max_pfn;
 	if (max_pfn > max_low_pfn)
 		highstart_pfn = max_low_pfn;
-	e820_register_active_regions(0, 0, highend_pfn);
+	memblock_x86_register_active_regions(0, 0, highend_pfn);
 	sparse_memory_present_with_active_regions(0);
 	printk(KERN_NOTICE "%ldMB HIGHMEM available.\n",
 		pages_to_mb(highend_pfn - highstart_pfn));
 	num_physpages = highend_pfn;
 	high_memory = (void *) __va(highstart_pfn * PAGE_SIZE - 1) + 1;
 #else
-	e820_register_active_regions(0, 0, max_low_pfn);
+	memblock_x86_register_active_regions(0, 0, max_low_pfn);
 	sparse_memory_present_with_active_regions(0);
 	num_physpages = max_low_pfn;
 	high_memory = (void *) __va(max_low_pfn * PAGE_SIZE - 1) + 1;
@@ -776,16 +777,16 @@ void __init setup_bootmem_allocator(void)
 {
 #ifndef CONFIG_NO_BOOTMEM
 	int nodeid;
-	unsigned long bootmap_size, bootmap;
+	phys_addr_t bootmap_size, bootmap;
 	/*
 	 * Initialize the boot-time allocator (with low memory only):
 	 */
 	bootmap_size = bootmem_bootmap_pages(max_low_pfn)<<PAGE_SHIFT;
-	bootmap = find_e820_area(0, max_pfn_mapped<<PAGE_SHIFT, bootmap_size,
+	bootmap = memblock_find_in_range(0, max_pfn_mapped<<PAGE_SHIFT, bootmap_size,
 				 PAGE_SIZE);
-	if (bootmap == -1L)
+	if (bootmap == MEMBLOCK_ERROR)
 		panic("Cannot find bootmem map of size %ld\n", bootmap_size);
-	reserve_early(bootmap, bootmap + bootmap_size, "BOOTMAP");
+	memblock_x86_reserve_range(bootmap, bootmap + bootmap_size, "BOOTMAP");
 #endif
 
 	printk(KERN_INFO " mapped low ram: 0 - %08lx\n",
@@ -1069,3 +1070,4 @@ void mark_rodata_ro(void)
 #endif
 }
 #endif
+
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 634fa08..592b236 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -21,6 +21,7 @@
 #include <linux/initrd.h>
 #include <linux/pagemap.h>
 #include <linux/bootmem.h>
+#include <linux/memblock.h>
 #include <linux/proc_fs.h>
 #include <linux/pci.h>
 #include <linux/pfn.h>
@@ -577,18 +578,18 @@ void __init initmem_init(unsigned long start_pfn, unsigned long end_pfn,
 	unsigned long bootmap_size, bootmap;
 
 	bootmap_size = bootmem_bootmap_pages(end_pfn)<<PAGE_SHIFT;
-	bootmap = find_e820_area(0, end_pfn<<PAGE_SHIFT, bootmap_size,
+	bootmap = memblock_find_in_range(0, end_pfn<<PAGE_SHIFT, bootmap_size,
 				 PAGE_SIZE);
-	if (bootmap == -1L)
+	if (bootmap == MEMBLOCK_ERROR)
 		panic("Cannot find bootmem map of size %ld\n", bootmap_size);
-	reserve_early(bootmap, bootmap + bootmap_size, "BOOTMAP");
+	memblock_x86_reserve_range(bootmap, bootmap + bootmap_size, "BOOTMAP");
 	/* don't touch min_low_pfn */
 	bootmap_size = init_bootmem_node(NODE_DATA(0), bootmap >> PAGE_SHIFT,
 					 0, end_pfn);
-	e820_register_active_regions(0, start_pfn, end_pfn);
+	memblock_x86_register_active_regions(0, start_pfn, end_pfn);
 	free_bootmem_with_active_regions(0, end_pfn);
 #else
-	e820_register_active_regions(0, start_pfn, end_pfn);
+	memblock_x86_register_active_regions(0, start_pfn, end_pfn);
 #endif
 }
 #endif
diff --git a/arch/x86/mm/k8topology_64.c b/arch/x86/mm/k8topology_64.c
index 970ed57..966de93 100644
--- a/arch/x86/mm/k8topology_64.c
+++ b/arch/x86/mm/k8topology_64.c
@@ -11,6 +11,8 @@
 #include <linux/string.h>
 #include <linux/module.h>
 #include <linux/nodemask.h>
+#include <linux/memblock.h>
+
 #include <asm/io.h>
 #include <linux/pci_ids.h>
 #include <linux/acpi.h>
@@ -222,7 +224,7 @@ int __init k8_scan_nodes(void)
 	for_each_node_mask(i, node_possible_map) {
 		int j;
 
-		e820_register_active_regions(i,
+		memblock_x86_register_active_regions(i,
 				nodes[i].start >> PAGE_SHIFT,
 				nodes[i].end >> PAGE_SHIFT);
 		for (j = apicid_base; j < cores + apicid_base; j++)
diff --git a/arch/x86/mm/memtest.c b/arch/x86/mm/memtest.c
index 18d244f..92faf3a 100644
--- a/arch/x86/mm/memtest.c
+++ b/arch/x86/mm/memtest.c
@@ -6,8 +6,7 @@
 #include <linux/smp.h>
 #include <linux/init.h>
 #include <linux/pfn.h>
-
-#include <asm/e820.h>
+#include <linux/memblock.h>
 
 static u64 patterns[] __initdata = {
 	0,
@@ -35,7 +34,7 @@ static void __init reserve_bad_mem(u64 pattern, u64 start_bad, u64 end_bad)
 	       (unsigned long long) pattern,
 	       (unsigned long long) start_bad,
 	       (unsigned long long) end_bad);
-	reserve_early(start_bad, end_bad, "BAD RAM");
+	memblock_x86_reserve_range(start_bad, end_bad, "BAD RAM");
 }
 
 static void __init memtest(u64 pattern, u64 start_phys, u64 size)
@@ -74,7 +73,7 @@ static void __init do_one_pass(u64 pattern, u64 start, u64 end)
 	u64 size = 0;
 
 	while (start < end) {
-		start = find_e820_area_size(start, &size, 1);
+		start = memblock_x86_find_in_range_size(start, &size, 1);
 
 		/* done ? */
 		if (start >= end)
diff --git a/arch/x86/mm/numa_32.c b/arch/x86/mm/numa_32.c
index 809baaa..ddf9730 100644
--- a/arch/x86/mm/numa_32.c
+++ b/arch/x86/mm/numa_32.c
@@ -24,6 +24,7 @@
 
 #include <linux/mm.h>
 #include <linux/bootmem.h>
+#include <linux/memblock.h>
 #include <linux/mmzone.h>
 #include <linux/highmem.h>
 #include <linux/initrd.h>
@@ -120,7 +121,7 @@ int __init get_memcfg_numa_flat(void)
 
 	node_start_pfn[0] = 0;
 	node_end_pfn[0] = max_pfn;
-	e820_register_active_regions(0, 0, max_pfn);
+	memblock_x86_register_active_regions(0, 0, max_pfn);
 	memory_present(0, 0, max_pfn);
 	node_remap_size[0] = node_memmap_size_bytes(0, 0, max_pfn);
 
@@ -161,14 +162,14 @@ static void __init allocate_pgdat(int nid)
 		NODE_DATA(nid) = (pg_data_t *)node_remap_start_vaddr[nid];
 	else {
 		unsigned long pgdat_phys;
-		pgdat_phys = find_e820_area(min_low_pfn<<PAGE_SHIFT,
+		pgdat_phys = memblock_find_in_range(min_low_pfn<<PAGE_SHIFT,
 				 max_pfn_mapped<<PAGE_SHIFT,
 				 sizeof(pg_data_t),
 				 PAGE_SIZE);
 		NODE_DATA(nid) = (pg_data_t *)(pfn_to_kaddr(pgdat_phys>>PAGE_SHIFT));
 		memset(buf, 0, sizeof(buf));
 		sprintf(buf, "NODE_DATA %d", nid);
-		reserve_early(pgdat_phys, pgdat_phys + sizeof(pg_data_t), buf);
+		memblock_x86_reserve_range(pgdat_phys, pgdat_phys + sizeof(pg_data_t), buf);
 	}
 	printk(KERN_DEBUG "allocate_pgdat: node %d NODE_DATA %08lx\n",
 		nid, (unsigned long)NODE_DATA(nid));
@@ -291,15 +292,15 @@ static __init unsigned long calculate_numa_remap_pages(void)
 						 PTRS_PER_PTE);
 		node_kva_target <<= PAGE_SHIFT;
 		do {
-			node_kva_final = find_e820_area(node_kva_target,
+			node_kva_final = memblock_find_in_range(node_kva_target,
 					((u64)node_end_pfn[nid])<<PAGE_SHIFT,
 						((u64)size)<<PAGE_SHIFT,
 						LARGE_PAGE_BYTES);
 			node_kva_target -= LARGE_PAGE_BYTES;
-		} while (node_kva_final == -1ULL &&
+		} while (node_kva_final == MEMBLOCK_ERROR &&
 			 (node_kva_target>>PAGE_SHIFT) > (node_start_pfn[nid]));
 
-		if (node_kva_final == -1ULL)
+		if (node_kva_final == MEMBLOCK_ERROR)
 			panic("Can not get kva ram\n");
 
 		node_remap_size[nid] = size;
@@ -318,9 +319,9 @@ static __init unsigned long calculate_numa_remap_pages(void)
 		 * but we could have some hole in high memory, and it will only
 		 * check page_is_ram(pfn) && !page_is_reserved_early(pfn) to decide
 		 * to use it as free.
-		 * So reserve_early here, hope we don't run out of that array
+		 * So memblock_x86_reserve_range here, hope we don't run out of that array
 		 */
-		reserve_early(node_kva_final,
+		memblock_x86_reserve_range(node_kva_final,
 			      node_kva_final+(((u64)size)<<PAGE_SHIFT),
 			      "KVA RAM");
 
@@ -367,14 +368,14 @@ void __init initmem_init(unsigned long start_pfn, unsigned long end_pfn,
 
 	kva_target_pfn = round_down(max_low_pfn - kva_pages, PTRS_PER_PTE);
 	do {
-		kva_start_pfn = find_e820_area(kva_target_pfn<<PAGE_SHIFT,
+		kva_start_pfn = memblock_find_in_range(kva_target_pfn<<PAGE_SHIFT,
 					max_low_pfn<<PAGE_SHIFT,
 					kva_pages<<PAGE_SHIFT,
 					PTRS_PER_PTE<<PAGE_SHIFT) >> PAGE_SHIFT;
 		kva_target_pfn -= PTRS_PER_PTE;
-	} while (kva_start_pfn == -1UL && kva_target_pfn > min_low_pfn);
+	} while (kva_start_pfn == MEMBLOCK_ERROR && kva_target_pfn > min_low_pfn);
 
-	if (kva_start_pfn == -1UL)
+	if (kva_start_pfn == MEMBLOCK_ERROR)
 		panic("Can not get kva space\n");
 
 	printk(KERN_INFO "kva_start_pfn ~ %lx max_low_pfn ~ %lx\n",
@@ -382,7 +383,7 @@ void __init initmem_init(unsigned long start_pfn, unsigned long end_pfn,
 	printk(KERN_INFO "max_pfn = %lx\n", max_pfn);
 
 	/* avoid clash with initrd */
-	reserve_early(kva_start_pfn<<PAGE_SHIFT,
+	memblock_x86_reserve_range(kva_start_pfn<<PAGE_SHIFT,
 		      (kva_start_pfn + kva_pages)<<PAGE_SHIFT,
 		     "KVA PG");
 #ifdef CONFIG_HIGHMEM
diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index 3d54f9f..984b1ff 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -87,16 +87,16 @@ static int __init allocate_cachealigned_memnodemap(void)
 
 	addr = 0x8000;
 	nodemap_size = roundup(sizeof(s16) * memnodemapsize, L1_CACHE_BYTES);
-	nodemap_addr = find_e820_area(addr, max_pfn<<PAGE_SHIFT,
+	nodemap_addr = memblock_find_in_range(addr, max_pfn<<PAGE_SHIFT,
 				      nodemap_size, L1_CACHE_BYTES);
-	if (nodemap_addr == -1UL) {
+	if (nodemap_addr == MEMBLOCK_ERROR) {
 		printk(KERN_ERR
 		       "NUMA: Unable to allocate Memory to Node hash map\n");
 		nodemap_addr = nodemap_size = 0;
 		return -1;
 	}
 	memnodemap = phys_to_virt(nodemap_addr);
-	reserve_early(nodemap_addr, nodemap_addr + nodemap_size, "MEMNODEMAP");
+	memblock_x86_reserve_range(nodemap_addr, nodemap_addr + nodemap_size, "MEMNODEMAP");
 
 	printk(KERN_DEBUG "NUMA: Allocated memnodemap from %lx - %lx\n",
 	       nodemap_addr, nodemap_addr + nodemap_size);
@@ -227,7 +227,7 @@ setup_node_bootmem(int nodeid, unsigned long start, unsigned long end)
 	if (node_data[nodeid] == NULL)
 		return;
 	nodedata_phys = __pa(node_data[nodeid]);
-	reserve_early(nodedata_phys, nodedata_phys + pgdat_size, "NODE_DATA");
+	memblock_x86_reserve_range(nodedata_phys, nodedata_phys + pgdat_size, "NODE_DATA");
 	printk(KERN_INFO " NODE_DATA [%016lx - %016lx]\n", nodedata_phys,
 		nodedata_phys + pgdat_size - 1);
 	nid = phys_to_nid(nodedata_phys);
@@ -246,7 +246,7 @@ setup_node_bootmem(int nodeid, unsigned long start, unsigned long end)
 	 * Find a place for the bootmem map
 	 * nodedata_phys could be on other nodes by alloc_bootmem,
 	 * so need to sure bootmap_start not to be small, otherwise
-	 * early_node_mem will get that with find_e820_area instead
+	 * early_node_mem will get that with memblock_find_in_range instead
 	 * of alloc_bootmem, that could clash with reserved range
 	 */
 	bootmap_pages = bootmem_bootmap_pages(last_pfn - start_pfn);
@@ -258,12 +258,12 @@ setup_node_bootmem(int nodeid, unsigned long start, unsigned long end)
 	bootmap = early_node_mem(nodeid, bootmap_start, end,
 				 bootmap_pages<<PAGE_SHIFT, PAGE_SIZE);
 	if (bootmap == NULL) {
-		free_early(nodedata_phys, nodedata_phys + pgdat_size);
+		memblock_x86_free_range(nodedata_phys, nodedata_phys + pgdat_size);
 		node_data[nodeid] = NULL;
 		return;
 	}
 	bootmap_start = __pa(bootmap);
-	reserve_early(bootmap_start, bootmap_start+(bootmap_pages<<PAGE_SHIFT),
+	memblock_x86_reserve_range(bootmap_start, bootmap_start+(bootmap_pages<<PAGE_SHIFT),
 			"BOOTMAP");
 
 	bootmap_size = init_bootmem_node(NODE_DATA(nodeid),
@@ -417,7 +417,7 @@ static int __init split_nodes_interleave(u64 addr, u64 max_addr,
 		nr_nodes = MAX_NUMNODES;
 	}
 
-	size = (max_addr - addr - e820_hole_size(addr, max_addr)) / nr_nodes;
+	size = (max_addr - addr - memblock_x86_hole_size(addr, max_addr)) / nr_nodes;
 	/*
 	 * Calculate the number of big nodes that can be allocated as a result
 	 * of consolidating the remainder.
@@ -453,7 +453,7 @@ static int __init split_nodes_interleave(u64 addr, u64 max_addr,
 			 * non-reserved memory is less than the per-node size.
 			 */
 			while (end - physnodes[i].start -
-				e820_hole_size(physnodes[i].start, end) < size) {
+				memblock_x86_hole_size(physnodes[i].start, end) < size) {
 				end += FAKE_NODE_MIN_SIZE;
 				if (end > physnodes[i].end) {
 					end = physnodes[i].end;
@@ -467,7 +467,7 @@ static int __init split_nodes_interleave(u64 addr, u64 max_addr,
 			 * this one must extend to the boundary.
 			 */
 			if (end < dma32_end && dma32_end - end -
-			    e820_hole_size(end, dma32_end) < FAKE_NODE_MIN_SIZE)
+			    memblock_x86_hole_size(end, dma32_end) < FAKE_NODE_MIN_SIZE)
 				end = dma32_end;
 
 			/*
@@ -476,7 +476,7 @@ static int __init split_nodes_interleave(u64 addr, u64 max_addr,
 			 * physical node.
 			 */
 			if (physnodes[i].end - end -
-			    e820_hole_size(end, physnodes[i].end) < size)
+			    memblock_x86_hole_size(end, physnodes[i].end) < size)
 				end = physnodes[i].end;
 
 			/*
@@ -504,7 +504,7 @@ static u64 __init find_end_of_node(u64 start, u64 max_addr, u64 size)
 {
 	u64 end = start + size;
 
-	while (end - start - e820_hole_size(start, end) < size) {
+	while (end - start - memblock_x86_hole_size(start, end) < size) {
 		end += FAKE_NODE_MIN_SIZE;
 		if (end > max_addr) {
 			end = max_addr;
@@ -533,7 +533,7 @@ static int __init split_nodes_size_interleave(u64 addr, u64 max_addr, u64 size)
 	 * creates a uniform distribution of node sizes across the entire
 	 * machine (but not necessarily over physical nodes).
 	 */
-	min_size = (max_addr - addr - e820_hole_size(addr, max_addr)) /
+	min_size = (max_addr - addr - memblock_x86_hole_size(addr, max_addr)) /
 						MAX_NUMNODES;
 	min_size = max(min_size, FAKE_NODE_MIN_SIZE);
 	if ((min_size & FAKE_NODE_MIN_HASH_MASK) < min_size)
@@ -566,7 +566,7 @@ static int __init split_nodes_size_interleave(u64 addr, u64 max_addr, u64 size)
 			 * this one must extend to the boundary.
 			 */
 			if (end < dma32_end && dma32_end - end -
-			    e820_hole_size(end, dma32_end) < FAKE_NODE_MIN_SIZE)
+			    memblock_x86_hole_size(end, dma32_end) < FAKE_NODE_MIN_SIZE)
 				end = dma32_end;
 
 			/*
@@ -575,7 +575,7 @@ static int __init split_nodes_size_interleave(u64 addr, u64 max_addr, u64 size)
 			 * physical node.
 			 */
 			if (physnodes[i].end - end -
-			    e820_hole_size(end, physnodes[i].end) < size)
+			    memblock_x86_hole_size(end, physnodes[i].end) < size)
 				end = physnodes[i].end;
 
 			/*
@@ -639,7 +639,7 @@ static int __init numa_emulation(unsigned long start_pfn,
 	 */
 	remove_all_active_ranges();
 	for_each_node_mask(i, node_possible_map) {
-		e820_register_active_regions(i, nodes[i].start >> PAGE_SHIFT,
+		memblock_x86_register_active_regions(i, nodes[i].start >> PAGE_SHIFT,
 						nodes[i].end >> PAGE_SHIFT);
 		setup_node_bootmem(i, nodes[i].start, nodes[i].end);
 	}
@@ -692,7 +692,7 @@ void __init initmem_init(unsigned long start_pfn, unsigned long last_pfn,
 	node_set(0, node_possible_map);
 	for (i = 0; i < nr_cpu_ids; i++)
 		numa_set_node(i, 0);
-	e820_register_active_regions(0, start_pfn, last_pfn);
+	memblock_x86_register_active_regions(0, start_pfn, last_pfn);
 	setup_node_bootmem(0, start_pfn << PAGE_SHIFT, last_pfn << PAGE_SHIFT);
 }
 
diff --git a/arch/x86/mm/srat_32.c b/arch/x86/mm/srat_32.c
index 9324f13..a17dffd 100644
--- a/arch/x86/mm/srat_32.c
+++ b/arch/x86/mm/srat_32.c
@@ -25,6 +25,7 @@
  */
 #include <linux/mm.h>
 #include <linux/bootmem.h>
+#include <linux/memblock.h>
 #include <linux/mmzone.h>
 #include <linux/acpi.h>
 #include <linux/nodemask.h>
@@ -264,7 +265,7 @@ int __init get_memcfg_from_srat(void)
 		if (node_read_chunk(chunk->nid, chunk))
 			continue;
 
-		e820_register_active_regions(chunk->nid, chunk->start_pfn,
+		memblock_x86_register_active_regions(chunk->nid, chunk->start_pfn,
 					     min(chunk->end_pfn, max_pfn));
 	}
 	/* for out of order entries in SRAT */
diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index f9897f7..7f44eb6 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -16,6 +16,7 @@
 #include <linux/module.h>
 #include <linux/topology.h>
 #include <linux/bootmem.h>
+#include <linux/memblock.h>
 #include <linux/mm.h>
 #include <asm/proto.h>
 #include <asm/numa.h>
@@ -98,15 +99,15 @@ void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
 	unsigned long phys;
 
 	length = slit->header.length;
-	phys = find_e820_area(0, max_pfn_mapped<<PAGE_SHIFT, length,
+	phys = memblock_find_in_range(0, max_pfn_mapped<<PAGE_SHIFT, length,
 		 PAGE_SIZE);
 
-	if (phys == -1L)
+	if (phys == MEMBLOCK_ERROR)
 		panic(" Can not save slit!\n");
 
 	acpi_slit = __va(phys);
 	memcpy(acpi_slit, slit, length);
-	reserve_early(phys, phys + length, "ACPI SLIT");
+	memblock_x86_reserve_range(phys, phys + length, "ACPI SLIT");
 }
 
 /* Callback for Proximity Domain -> x2APIC mapping */
@@ -324,7 +325,7 @@ static int __init nodes_cover_memory(const struct bootnode *nodes)
 		pxmram = 0;
 	}
 
-	e820ram = max_pfn - (e820_hole_size(0, max_pfn<<PAGE_SHIFT)>>PAGE_SHIFT);
+	e820ram = max_pfn - (memblock_x86_hole_size(0, max_pfn<<PAGE_SHIFT)>>PAGE_SHIFT);
 	/* We seem to lose 3 pages somewhere. Allow 1M of slack. */
 	if ((long)(e820ram - pxmram) >= (1<<(20 - PAGE_SHIFT))) {
 		printk(KERN_ERR
@@ -421,7 +422,7 @@ int __init acpi_scan_nodes(unsigned long start, unsigned long end)
 	}
 
 	for_each_node_mask(i, nodes_parsed)
-		e820_register_active_regions(i, nodes[i].start >> PAGE_SHIFT,
+		memblock_x86_register_active_regions(i, nodes[i].start >> PAGE_SHIFT,
 						nodes[i].end >> PAGE_SHIFT);
 	/* for out of order entries in SRAT */
 	sort_node_map();
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 914f046..b511f19 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -44,6 +44,7 @@
 #include <linux/bug.h>
 #include <linux/module.h>
 #include <linux/gfp.h>
+#include <linux/memblock.h>
 
 #include <asm/pgtable.h>
 #include <asm/tlbflush.h>
@@ -1735,7 +1736,7 @@ __init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd,
 	__xen_write_cr3(true, __pa(pgd));
 	xen_mc_issue(PARAVIRT_LAZY_CPU);
 
-	reserve_early(__pa(xen_start_info->pt_base),
+	memblock_x86_reserve_range(__pa(xen_start_info->pt_base),
 		      __pa(xen_start_info->pt_base +
 			   xen_start_info->nr_pt_frames * PAGE_SIZE),
 		      "XEN PAGETABLES");
@@ -1773,7 +1774,7 @@ __init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd,
 	pin_pagetable_pfn(MMUEXT_PIN_L3_TABLE,
 			  PFN_DOWN(__pa(swapper_pg_dir)));
 
-	reserve_early(__pa(xen_start_info->pt_base),
+	memblock_x86_reserve_range(__pa(xen_start_info->pt_base),
 		      __pa(xen_start_info->pt_base +
 			   xen_start_info->nr_pt_frames * PAGE_SIZE),
 		      "XEN PAGETABLES");
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index ad0047f..2ac8f29 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -8,6 +8,7 @@
 #include <linux/sched.h>
 #include <linux/mm.h>
 #include <linux/pm.h>
+#include <linux/memblock.h>
 
 #include <asm/elf.h>
 #include <asm/vdso.h>
@@ -61,7 +62,7 @@ char * __init xen_memory_setup(void)
 	 * - xen_start_info
 	 * See comment above "struct start_info" in <xen/interface/xen.h>
 	 */
-	reserve_early(__pa(xen_start_info->mfn_list),
+	memblock_x86_reserve_range(__pa(xen_start_info->mfn_list),
 		      __pa(xen_start_info->pt_base),
 			"XEN START INFO");
 
diff --git a/mm/bootmem.c b/mm/bootmem.c
index fda01a2..13b0caa 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -436,7 +436,7 @@ void __init free_bootmem_node(pg_data_t *pgdat, unsigned long physaddr,
 {
 #ifdef CONFIG_NO_BOOTMEM
 	kmemleak_free_part(__va(physaddr), size);
-	free_early(physaddr, physaddr + size);
+	memblock_x86_free_range(physaddr, physaddr + size);
 #else
 	unsigned long start, end;
 
@@ -462,7 +462,7 @@ void __init free_bootmem(unsigned long addr, unsigned long size)
 {
 #ifdef CONFIG_NO_BOOTMEM
 	kmemleak_free_part(__va(addr), size);
-	free_early(addr, addr + size);
+	memblock_x86_free_range(addr, addr + size);
 #else
 	unsigned long start, end;
 

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
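[Editorial illustration, not part of the original message.] Most hunks in the patch above repeat one pattern: ask the early allocator for a free range, compare the result against an error sentinel, and only then reserve the range. The following is a minimal user-space sketch of that pattern, assuming a toy allocator. Every name in it (toy_addr_t, TOY_ERROR, toy_find_in_range, toy_reserve_range, toy_alloc) is hypothetical and is not a kernel symbol; the sentinel value and the bottom-up search order are assumptions made for the sketch and may differ from what memblock actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; none are kernel symbols. */
typedef uint64_t toy_addr_t;
#define TOY_ERROR ((toy_addr_t)~0ULL)	/* assumed "nothing found" sentinel */
#define TOY_MAX_RESERVED 16

struct toy_range {
	toy_addr_t start, end;		/* half-open interval [start, end) */
};

static struct toy_range toy_reserved[TOY_MAX_RESERVED];
static size_t toy_nr_reserved;

/* Record [start, end) as reserved; 0 on success, -1 if the table is full. */
static int toy_reserve_range(toy_addr_t start, toy_addr_t end)
{
	if (toy_nr_reserved == TOY_MAX_RESERVED)
		return -1;
	toy_reserved[toy_nr_reserved].start = start;
	toy_reserved[toy_nr_reserved].end = end;
	toy_nr_reserved++;
	return 0;
}

/* Return a reserved range overlapping [start, end), or NULL if none does. */
static const struct toy_range *toy_overlap(toy_addr_t start, toy_addr_t end)
{
	for (size_t i = 0; i < toy_nr_reserved; i++)
		if (start < toy_reserved[i].end && toy_reserved[i].start < end)
			return &toy_reserved[i];
	return NULL;
}

/*
 * Lowest align-aligned address in [start, end) with size unreserved bytes,
 * or TOY_ERROR. align must be a power of two.
 */
static toy_addr_t toy_find_in_range(toy_addr_t start, toy_addr_t end,
				    toy_addr_t size, toy_addr_t align)
{
	toy_addr_t addr;

	assert(align && (align & (align - 1)) == 0);
	addr = (start + align - 1) & ~(align - 1);
	while (addr < end && size <= end - addr) {
		const struct toy_range *r = toy_overlap(addr, addr + size);

		if (!r)
			return addr;
		/* Skip past the conflicting reservation and realign. */
		addr = (r->end + align - 1) & ~(align - 1);
	}
	return TOY_ERROR;
}

/* The find / check sentinel / reserve sequence that the hunks repeat. */
static toy_addr_t toy_alloc(toy_addr_t start, toy_addr_t end,
			    toy_addr_t size, toy_addr_t align)
{
	toy_addr_t mem = toy_find_in_range(start, end, size, align);

	if (mem == TOY_ERROR)
		return TOY_ERROR;
	if (toy_reserve_range(mem, mem + size) != 0)
		return TOY_ERROR;
	return mem;
}
```

A caller would reserve known-busy ranges first and then call toy_alloc() with a search window, size, and alignment. The sketch compares against one named sentinel of the same type as the address, which is the shape the patch moves to when it replaces the scattered -1L, -1UL, and -1ULL comparisons with MEMBLOCK_ERROR.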
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-26 3:11 ` caiqian
@ 2010-09-26 6:44 ` Yinghai Lu
  2010-09-26 6:55 ` CAI Qian
  0 siblings, 1 reply; 25+ messages in thread
From: Yinghai Lu @ 2010-09-26 6:44 UTC
To: caiqian; +Cc: linux-next, kexec, H. Peter Anvin

On 09/25/2010 08:11 PM, caiqian@redhat.com wrote:
> # /sbin/kexec -p '--command-line=ro root=/dev/mapper/VolGroup-lv_root rd_LVM_LV=VolGroup/lv_root rd_LVM_LV=VolGroup/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb quiet console=tty0 console=ttyS0,115200 crashkernel=128M irqpoll maxcpus=1 reset_devices cgroup_disable=memory ' --initrd=/boot/initrd-2.6.36-rc3+kdump.img /boot/vmlinuz-2.6.36-rc3+
>
> BUG: unable to handle kernel paging request at ffff8800dfffe400
> IP: [<ffffffff8113376b>] per_cpu_ptr_to_phys+0x3b/0x120
> PGD 1a26063 PUD 1fffc067 PMD 1fffd067 PTE 0
> Oops: 0000 [#1] SMP
> last sysfs file: /sys/devices/system/cpu/cpu0/crash_notes
> CPU 3
> Modules linked in: ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 virtio_balloon pcspkr 8139too 8139cp mii snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc sg i2c_piix4 i2c_core ext4 mbcache jbd2 floppy sd_mod crc_t10dif virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mod [last unloaded: scsi_wait_scan]
>
> Pid: 5671, comm: kexec Not tainted 2.6.35+ #11 /KVM
> RIP: 0010:[<ffffffff8113376b>] [<ffffffff8113376b>] per_cpu_ptr_to_phys+0x3b/0x120

Are you kexec'ing from 2.6.35+ to 2.6.36-rc3+?

Yinghai
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-26 6:44 ` Yinghai Lu
@ 2010-09-26 6:55 ` CAI Qian
  2010-09-26 6:56 ` Yinghai Lu
  0 siblings, 1 reply; 25+ messages in thread
From: CAI Qian @ 2010-09-26 6:55 UTC
To: Yinghai Lu; +Cc: linux-next, kexec, H. Peter Anvin

----- "Yinghai Lu" <yinghai@kernel.org> wrote:

> On 09/25/2010 08:11 PM, caiqian@redhat.com wrote:
> > # /sbin/kexec -p '--command-line=ro root=/dev/mapper/VolGroup-lv_root rd_LVM_LV=VolGroup/lv_root rd_LVM_LV=VolGroup/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb quiet console=tty0 console=ttyS0,115200 crashkernel=128M irqpoll maxcpus=1 reset_devices cgroup_disable=memory ' --initrd=/boot/initrd-2.6.36-rc3+kdump.img /boot/vmlinuz-2.6.36-rc3+
> >
> > BUG: unable to handle kernel paging request at ffff8800dfffe400
> > IP: [<ffffffff8113376b>] per_cpu_ptr_to_phys+0x3b/0x120
> > PGD 1a26063 PUD 1fffc067 PMD 1fffd067 PTE 0
> > Oops: 0000 [#1] SMP
> > last sysfs file: /sys/devices/system/cpu/cpu0/crash_notes
> > CPU 3
> > Modules linked in: ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 virtio_balloon pcspkr 8139too 8139cp mii snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc sg i2c_piix4 i2c_core ext4 mbcache jbd2 floppy sd_mod crc_t10dif virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mod [last unloaded: scsi_wait_scan]
> >
> > Pid: 5671, comm: kexec Not tainted 2.6.35+ #11 /KVM
> > RIP: 0010:[<ffffffff8113376b>] [<ffffffff8113376b>] per_cpu_ptr_to_phys+0x3b/0x120
>
> Are you kexec'ing from 2.6.35+ to 2.6.36-rc3+?

No, both kernels were the same version. Sorry, the logs above were misleading: they were copied and pasted from different kernel versions.
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-26 6:55 ` CAI Qian
@ 2010-09-26 6:56 ` Yinghai Lu
  2010-09-26 10:37 ` CAI Qian
  0 siblings, 1 reply; 25+ messages in thread
From: Yinghai Lu @ 2010-09-26 6:56 UTC
To: CAI Qian; +Cc: linux-next, kexec, H. Peter Anvin

On 09/25/2010 11:55 PM, CAI Qian wrote:
>>
>> Are you kexec'ing from 2.6.35+ to 2.6.36-rc3+?
> No, both kernels were the same version. Sorry, the logs above were misleading: they were copied and pasted from different kernel versions.

Can you check the tip tree instead of the next tree?

Yinghai
* Re: kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_"
  2010-09-26 6:56 ` Yinghai Lu
@ 2010-09-26 10:37 ` CAI Qian
  0 siblings, 0 replies; 25+ messages in thread
From: CAI Qian @ 2010-09-26 10:37 UTC
To: Yinghai Lu; +Cc: linux-next, kexec, H. Peter Anvin

----- "Yinghai Lu" <yinghai@kernel.org> wrote:

> On 09/25/2010 11:55 PM, CAI Qian wrote:
> >>
> >> Are you kexec'ing from 2.6.35+ to 2.6.36-rc3+?
> > No, both kernels were the same version. Sorry, the logs above were misleading: they were copied and pasted from different kernel versions.
>
> Can you check the tip tree instead of the next tree?

Which patches in tip do you think would make the regression go away?
end of thread, other threads:[~2010-09-28 14:01 UTC | newest]
Thread overview: 25+ messages
[not found] <1909915255.2046011285586388234.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
2010-09-27 11:21 ` kexec load failure introduced by "x86, memblock: Replace e820_/_early string with memblock_" caiqian
2010-09-27 22:22 ` Yinghai Lu
2010-09-27 22:50 ` H. Peter Anvin
2010-09-27 23:20 ` Yinghai Lu
2010-09-27 23:26 ` H. Peter Anvin
2010-09-27 23:32 ` Yinghai Lu
2010-09-27 23:34 ` H. Peter Anvin
2010-09-27 23:41 ` Yinghai Lu
2010-09-28 0:53 ` Vivek Goyal
2010-09-28 2:41 ` Yinghai Lu
2010-09-28 3:46 ` H. Peter Anvin
2010-09-28 7:14 ` Yinghai Lu
2010-09-28 14:01 ` Vivek Goyal
2010-09-28 13:54 ` Vivek Goyal
[not found] <1346740216.2003261285553562018.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
2010-09-27 2:42 ` caiqian
2010-09-27 5:58 ` Yinghai Lu
2010-09-27 6:31 ` Yinghai Lu
2010-09-27 9:16 ` CAI Qian
[not found] <1834151968.1996101285512089968.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
2010-09-26 14:47 ` caiqian
2010-09-26 19:42 ` Yinghai Lu
[not found] <1614106428.1991831285470588200.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
2010-09-26 3:11 ` caiqian
2010-09-26 6:44 ` Yinghai Lu
2010-09-26 6:55 ` CAI Qian
2010-09-26 6:56 ` Yinghai Lu
2010-09-26 10:37 ` CAI Qian