xen-devel.lists.xenproject.org archive mirror
* [PATCH 0/2] x86: cleanup highmap after brk is concluded
@ 2011-02-28 18:24 Stefano Stabellini
  2011-02-28 18:25 ` [PATCH 1/2] x86: Cleanup " stefano.stabellini
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Stefano Stabellini @ 2011-02-28 18:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: xen-devel, Jeremy Fitzhardinge, H. Peter Anvin,
	Konrad Rzeszutek Wilk, Yinghai Lu, Stefano Stabellini

[-- Attachment #1: Type: text/plain, Size: 3226 bytes --]

Hi all,
a little while ago I sent a patch titled "x86/mm/init: respect memblock
reserved regions when destroying mappings"
(https://lkml.org/lkml/2011/1/31/232) to fix a serious boot crash
problem on Xen (full logs attached):

Pid: 0, comm: swapper Not tainted 2.6.38-rc6+ #1270 Hewlett-Packard HP xw8600 Workstation/0A98h
RIP: e030:[<ffffffff81008314>]  [<ffffffff81008314>] get_phys_to_machine+0x44/0x50
RSP: e02b:ffffffff82001ca0  EFLAGS: 00010002
RAX: ffffffff824ce000 RBX: 0000000126004067 RCX: 0000000000000010
RDX: 0000000000000000 RSI: 00000001cfdc2000 RDI: 0000000000000004
RBP: ffffffff82001ca0 R08: 0000000000000020 R09: 0000000000000000
R10: 0000000000000007 R11: 00000000ffffffff R12: 0000000000126004
R13: 0000000000002004 R14: ffff880100000000 R15: ffff8801cfdc2000
FS:  0000000000000000(0000) GS:ffffffff82162000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000002003000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
Process swapper (pid: 0, threadinfo ffffffff82000000, task ffffffff8200b020)
Stack:
 ffffffff82001cd0 ffffffff8100582c ffffffff81dce1bc ffffffff82001e10
 00000001cfdc2000 ffffffff82003880 ffffffff82001ce0 ffffffff8100587e
 ffffffff82001d98 ffffffff8100498f 00000000ffffffff 0000000000000007
Call Trace:
 [<ffffffff8100582c>] pte_mfn_to_pfn+0x8c/0xb0
 [<ffffffff8100587e>] xen_pgd_val+0xe/0x10
 [<ffffffff8100498f>] __raw_callee_save_xen_pgd_val+0x11/0x1e
 [<ffffffff813ba570>] ? xenboot_write_console+0x0/0xd0
 [<ffffffff821c24b8>] ? kernel_physical_mapping_init+0x83/0x1db
 [<ffffffff8195469f>] init_memory_mapping+0x31f/0x6d0
 [<ffffffff821989fd>] ? memblock_reserve+0x1b/0x21
 [<ffffffff8217de95>] setup_arch+0xa59/0xd89
 [<ffffffff819b9c90>] ? _raw_spin_unlock_irqrestore+0x20/0x30
 [<ffffffff810074bd>] ? __raw_callee_save_xen_irq_disable+0x11/0x1e
 [<ffffffff82177b35>] start_kernel+0xc6/0x4df
 [<ffffffff821772c5>] x86_64_start_reservations+0xa5/0xc9
 [<ffffffff8217b6fa>] xen_start_kernel+0x5d3/0x6a9
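
A note for anyone decoding the oops: the Code: bytes before the faulting
instruction shift the argument right by 18 and by 9 and mask it with
0x1ff, which looks like a three-level table lookup with 512 entries per
level. The sketch below is only an illustration of that index split;
p2m_split() and struct p2m_index are hypothetical names, not kernel
symbols, and it assumes that R13 (0x2004) holds the pfn being looked up.

```c
#include <stdint.h>

/* Assumption: 512 entries per level (4 KiB page / 8-byte entry), matching
 * the shifts by 9 and 18 and the 0x1ff masks seen in the Code: bytes. */
#define P2M_ENTRIES_PER_LEVEL 512u

/* Hypothetical helper type: one index per level of the lookup. */
struct p2m_index {
	unsigned top;   /* index into the top-level table  */
	unsigned mid;   /* index into the mid-level table  */
	unsigned leaf;  /* index into the leaf page        */
};

/* Split a pfn into the three indices used by the lookup. */
static struct p2m_index p2m_split(uint64_t pfn)
{
	struct p2m_index ix;

	ix.top  = (unsigned)(pfn >> 18);
	ix.mid  = (unsigned)((pfn >> 9) & (P2M_ENTRIES_PER_LEVEL - 1));
	ix.leaf = (unsigned)(pfn & (P2M_ENTRIES_PER_LEVEL - 1));
	return ix;
}
```

With pfn 0x2004 this gives top 0, mid 0x10 and leaf 4, the same values
the dump shows in RDX, RCX and RDI.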


Even though the discussion that followed did not reach a clear solution,
Yinghai Lu sent a patch to move cleanup_highmap() after reserve_brk() so
that we don't have to clear the initial mappings in two steps.
The patch is a nice cleanup and, with a few small changes to honour the
variable max_pfn_mapped, can be used to fix the boot issue on Xen: all we
have to do is set max_pfn_mapped to the last valid pfn mapped on Xen,
that is, the page backing _end.


The list of patches with diffstat follows; comments and suggestions are
very welcome:

Stefano Stabellini (1):
      xen: set max_pfn_mapped to the last pfn mapped

Yinghai Lu (1):
      x86: Cleanup highmap after brk is concluded

 arch/x86/kernel/head64.c |    3 ---
 arch/x86/kernel/setup.c  |    6 ++++++
 arch/x86/mm/init.c       |   19 -------------------
 arch/x86/mm/init_64.c    |   11 ++++++-----
 arch/x86/xen/mmu.c       |   13 +++++++------
 5 files changed, 19 insertions(+), 33 deletions(-)


A git branch based on 2.6.38-rc6 is available here:

git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git 2.6.38-rc6-mm-fix

Cheers,

Stefano

[-- Attachment #2: Type: text/plain, Size: 16800 bytes --]

__  __            _  _    _   ___              _  _                     
 \ \/ /___ _ __   | || |  / | / _ \    _ __ ___| || |     _ __  _ __ ___ 
  \  // _ \ '_ \  | || |_ | || | | |__| '__/ __| || |_ __| '_ \| '__/ _ \
  /  \  __/ | | | |__   _|| || |_| |__| | | (__|__   _|__| |_) | | |  __/
 /_/\_\___|_| |_|    |_|(_)_(_)___/   |_|  \___|  |_|    | .__/|_|  \___|
                                                         |_|             
(XEN) Xen version 4.1.0-rc4-pre (sstabellini@localdomain) (gcc version 4.3.2 (Debian 4.3.2-1.1) ) Thu Feb 10 18:45:30 GMT 2011
(XEN) Latest ChangeSet: Thu Feb 10 09:02:50 2011 +0000 22895:19b2424be183
(XEN) Bootloader: GNU GRUB 0.97
(XEN) Command line: iommu=1 com1=115200,8n1 console=com1,tty dom0_mem=770M
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN)  VBE/DDC methods: none; EDID transfer time: 0 seconds
(XEN)  EDID info not retrieved because no DDC retrieval method detected
(XEN) Disc information:
(XEN)  Found 2 MBR signatures
(XEN)  Found 2 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN)  0000000000000000 - 0000000000096400 (usable)
(XEN)  0000000000096400 - 00000000000a0000 (reserved)
(XEN)  00000000000e8000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 00000000cffc2840 (usable)
(XEN)  00000000cffc2840 - 00000000d0000000 (reserved)
(XEN)  00000000e0000000 - 00000000f0000000 (reserved)
(XEN)  00000000fec00000 - 0000000100000000 (reserved)
(XEN)  0000000100000000 - 0000000130000000 (usable)
(XEN) ACPI: RSDP 000E9810, 0024 (r2 HPQOEM)
(XEN) ACPI: XSDT CFFC52EC, 006C (r1 HPQOEM SLIC-WKS 20080808             0)
(XEN) ACPI: FACP CFFC5494, 00F4 (r3 HPQOEM SEABURG         1             0)
(XEN) ACPI Error (tbfadt-0455): 32/64X address mismatch in "Gpe0Block": [0000F828] [000000000001F030], using 64X [20070126]
(XEN) ACPI: DSDT CFFC5A7A, 26A4 (r1 HPQOEM     DSDT        1 MSFT  100000E)
(XEN) ACPI: FACS CFFC5200, 0040
(XEN) ACPI: SSDT CFFC811E, 6DC0 (r1 HPQOEM  PROJECT        1 MSFT  100000E)
(XEN) ACPI: APIC CFFC5588, 00D4 (r1 HPQOEM SEABURG         1             0)
(XEN) ACPI: ASF! CFFC565C, 006A (r32 HPQOEM SEABURG         1             0)
(XEN) ACPI: MCFG CFFC585E, 003C (r1 HPQOEM SEABURG         1             0)
(XEN) ACPI: SLIC CFFC589A, 0176 (r1 HPQOEM SLIC-WKS        1             0)
(XEN) ACPI: HPET CFFC5A10, 0038 (r1 HPQOEM SEABURG         1             0)
(XEN) ACPI: TCPA CFFC5A48, 0032 (r1 HPQOEM SEABURG         1             0)
(XEN) ACPI: DMAR CFFC56C6, 0120 (r1 HPQOEM SEABURG         1             0)
(XEN) System RAM: 4095MB (4193632kB)
(XEN) No NUMA configuration found
(XEN) Faking a node at 0000000000000000-0000000130000000
(XEN) Domain heap initialised
(XEN) found SMP MP-table at 000fe700
(XEN) DMI 2.5 present.
(XEN) Using APIC driver default
(XEN) ACPI: PM-Timer IO Port: 0xf808
(XEN) ACPI: ACPI SLEEP INFO: pm1x_cnt[f804,460], pm1x_evt[f800,0]
(XEN) ACPI:                  wakeup_vec[cffc520c], vec_size[20]
(XEN) ACPI: Local APIC address 0xfee00000
(XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
(XEN) Processor #0 7:7 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
(XEN) Processor #2 7:7 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x01] enabled)
(XEN) Processor #1 7:7 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x04] lapic_id[0x03] enabled)
(XEN) Processor #3 7:7 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x05] lapic_id[0x08] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x06] lapic_id[0x09] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x07] lapic_id[0x0a] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x08] lapic_id[0x0b] disabled)
(XEN) ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x04] high edge lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x05] high edge lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x06] high edge lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x07] high edge lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x08] high edge lint[0x1])
(XEN) ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 1, version 32, address 0xfec00000, GSI 0-23
(XEN) ACPI: IOAPIC (id[0x02] address[0xfec89000] gsi_base[24])
(XEN) IOAPIC[1]: apic_id 2, version 32, address 0xfec89000, GSI 24-47
(XEN) ACPI: IOAPIC (id[0x03] address[0xfec88000] gsi_base[48])
(XEN) IOAPIC[2]: apic_id 3, version 32, address 0xfec88000, GSI 48-71
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
(XEN) ACPI: IRQ0 used by override.
(XEN) ACPI: IRQ2 used by override.
(XEN) ACPI: IRQ9 used by override.
(XEN) Enabling APIC mode:  Flat.  Using 3 I/O APICs
(XEN) ACPI: HPET id: 0x8086a201 base: 0xfed00000
(XEN) PCI: MCFG configuration 0: base e0000000 segment 0 buses 0 - 255
(XEN) PCI: MCFG area at e0000000 reserved in E820
(XEN) Table is not found!
(XEN) Using ACPI (MADT) for SMP configuration information
(XEN) IRQ limits: 72 GSI, 712 MSI/MSI-X
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 2000.131 MHz processor.
(XEN) Initing memory sharing.
(XEN) mce_intel.c:1162: MCA Capability: BCAST 1 SER 0 CMCI 0 firstbank 1 extended MCE MSR 0
(XEN) Intel machine check reporting enabled
(XEN) Intel VT-d Snoop Control not enabled.
(XEN) Intel VT-d Dom0 DMA Passthrough not enabled.
(XEN) Intel VT-d Queued Invalidation not enabled.
(XEN) Intel VT-d Interrupt Remapping not enabled.
(XEN) I/O virtualisation enabled
(XEN)  - Dom0 mode: Relaxed
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using new ACK method
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) Platform timer is 14.318MHz HPET
(XEN) Allocated console ring of 32 KiB.
(XEN) VMX: Supported advanced features:
(XEN)  - APIC MMIO access virtualisation
(XEN)  - APIC TPR shadow
(XEN)  - Virtual NMI
(XEN)  - MSR direct-access bitmap
(XEN) HVM: ASIDs disabled.
(XEN) HVM: VMX enabled
(XEN) Brought up 4 CPUs
(XEN) HPET: 3 timers in total, 0 timers will be used for broadcast
(XEN) ACPI sleep modes: S3
(XEN) mcheck_poll: Machine check polling timer started.
(XEN) *** LOADING DOMAIN 0 ***
(XEN) elf_parse_binary: phdr: paddr=0x1000000 memsz=0xecf000
(XEN) elf_parse_binary: phdr: paddr=0x2000000 memsz=0x1606c0
(XEN) elf_parse_binary: phdr: paddr=0x2161000 memsz=0x8c8
(XEN) elf_parse_binary: phdr: paddr=0x2162000 memsz=0x14580
(XEN) elf_parse_binary: phdr: paddr=0x2177000 memsz=0x347000
(XEN) elf_parse_binary: memory: 0x1000000 -> 0x24be000
(XEN) elf_xen_parse_note: GUEST_OS = "linux"
(XEN) elf_xen_parse_note: GUEST_VERSION = "2.6"
(XEN) elf_xen_parse_note: XEN_VERSION = "xen-3.0"
(XEN) elf_xen_parse_note: VIRT_BASE = 0xffffffff80000000
(XEN) elf_xen_parse_note: ENTRY = 0xffffffff82177200
(XEN) elf_xen_parse_note: HYPERCALL_PAGE = 0xffffffff81001000
(XEN) elf_xen_parse_note: FEATURES = "!writable_page_tables|pae_pgdir_above_4gb"
(XEN) elf_xen_parse_note: PAE_MODE = "yes"
(XEN) elf_xen_parse_note: LOADER = "generic"
(XEN) elf_xen_parse_note: unknown xen elf note (0xd)
(XEN) elf_xen_parse_note: SUSPEND_CANCEL = 0x1
(XEN) elf_xen_parse_note: HV_START_LOW = 0xffff800000000000
(XEN) elf_xen_parse_note: PADDR_OFFSET = 0x0
(XEN) elf_xen_addr_calc_check: addresses:
(XEN)     virt_base        = 0xffffffff80000000
(XEN)     elf_paddr_offset = 0x0
(XEN)     virt_offset      = 0xffffffff80000000
(XEN)     virt_kstart      = 0xffffffff81000000
(XEN)     virt_kend        = 0xffffffff824be000
(XEN)     virt_entry       = 0xffffffff82177200
(XEN)     p2m_base         = 0xffffffffffffffff
(XEN)  Xen  kernel: 64-bit, lsb, compat32
(XEN)  Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x24be000
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN)  Dom0 alloc.:   0000000124000000->0000000128000000 (180736 pages to be allocated)
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN)  Loaded kernel: ffffffff81000000->ffffffff824be000
(XEN)  Init. ramdisk: ffffffff824be000->ffffffff824be000
(XEN)  Phys-Mach map: ffffffff824be000->ffffffff8263f000
(XEN)  Start info:    ffffffff8263f000->ffffffff8263f4b4
(XEN)  Page tables:   ffffffff82640000->ffffffff82657000
(XEN)  Boot stack:    ffffffff82657000->ffffffff82658000
(XEN)  TOTAL:         ffffffff80000000->ffffffff82800000
(XEN)  ENTRY ADDRESS: ffffffff82177200
(XEN) Dom0 has maximum 4 VCPUs
(XEN) elf_load_binary: phdr 0 at 0xffffffff81000000 -> 0xffffffff81ecf000
(XEN) elf_load_binary: phdr 1 at 0xffffffff82000000 -> 0xffffffff821606c0
(XEN) elf_load_binary: phdr 2 at 0xffffffff82161000 -> 0xffffffff821618c8
(XEN) elf_load_binary: phdr 3 at 0xffffffff82162000 -> 0xffffffff82176580
(XEN) elf_load_binary: phdr 4 at 0xffffffff82177000 -> 0xffffffff82252000
(XEN) Scrubbing Free RAM: ................................done.
(XEN) Xen trace buffers: disabled
(XEN) Std. Loglevel: All
(XEN) Guest Loglevel: All
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 216kB init memory.
mapping kernel into physical memory
Xen: setup ISA identity maps
about to get started...
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Linux version 2.6.38-rc6+ (sstabellini@cosworth) (gcc version 4.3.2 (Debian 4.3.2-1.1) ) #1270 SMP Mon Feb 28 18:04:13 GMT 2011
[    0.000000] Command line: earlyprintk=xenboot debug root=/dev/sda1 console=hvc0 loglevel=9
[    0.000000] released 0 pages of unused memory
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  Xen: 0000000000000000 - 0000000000096000 (usable)
[    0.000000]  Xen: 0000000000096400 - 0000000000100000 (reserved)
[    0.000000]  Xen: 0000000000100000 - 0000000030200000 (usable)
[    0.000000]  Xen: 00000000cffc2840 - 00000000d0000000 (reserved)
[    0.000000]  Xen: 00000000e0000000 - 00000000f0000000 (reserved)
[    0.000000]  Xen: 00000000fec00000 - 0000000100000000 (reserved)
[    0.000000]  Xen: 0000000100000000 - 00000001cfdc2000 (usable)
[    0.000000] bootconsole [xenboot0] enabled
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] DMI 2.5 present.
[    0.000000] DMI: Hewlett-Packard HP xw8600 Workstation/0A98h, BIOS 786F5 v01.27 08/08/2008
[    0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
[    0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
[    0.000000] No AGP bridge found
[    0.000000] last_pfn = 0x1cfdc2 max_arch_pfn = 0x400000000
[    0.000000] last_pfn = 0x30200 max_arch_pfn = 0x400000000
[    0.000000] found SMP MP-table at [ffff8800000fe700] fe700
[    0.000000] Scanning 0 areas for low memory corruption
[    0.000000] initial memory mapped : 0 - 02fff000
[    0.000000] init_memory_mapping: 0000000000000000-0000000030200000
[    0.000000]  0000000000 - 0030200000 page 4k
[    0.000000] kernel direct mapping tables up to 30200000 @ 2e7c000-2fff000
[    0.000000] init_memory_mapping: 0000000100000000-00000001cfdc2000
[    0.000000]  0100000000 - 01cfdc2000 page 4k
[    0.000000] kernel direct mapping tables up to 1cfdc2000 @ 2f378000-30200000
[    0.000000] BUG: unable to handle kernel NULL pointer dereference at           (null)
[    0.000000] IP: [<ffffffff81008314>] get_phys_to_machine+0x44/0x50
[    0.000000] PGD 0 
[    0.000000] Oops: 0000 [#1] SMP 
[    0.000000] last sysfs file: 
[    0.000000] CPU 0 
[    0.000000] Modules linked in:
[    0.000000] 
[    0.000000] Pid: 0, comm: swapper Not tainted 2.6.38-rc6+ #1270 Hewlett-Packard HP xw8600 Workstation/0A98h
[    0.000000] RIP: e030:[<ffffffff81008314>]  [<ffffffff81008314>] get_phys_to_machine+0x44/0x50
[    0.000000] RSP: e02b:ffffffff82001ca0  EFLAGS: 00010002
[    0.000000] RAX: ffffffff824ce000 RBX: 0000000126004067 RCX: 0000000000000010
[    0.000000] RDX: 0000000000000000 RSI: 00000001cfdc2000 RDI: 0000000000000004
[    0.000000] RBP: ffffffff82001ca0 R08: 0000000000000020 R09: 0000000000000000
[    0.000000] R10: 0000000000000007 R11: 00000000ffffffff R12: 0000000000126004
[    0.000000] R13: 0000000000002004 R14: ffff880100000000 R15: ffff8801cfdc2000
[    0.000000] FS:  0000000000000000(0000) GS:ffffffff82162000(0000) knlGS:0000000000000000
[    0.000000] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.000000] CR2: 0000000000000000 CR3: 0000000002003000 CR4: 0000000000002660
[    0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.000000] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
[    0.000000] Process swapper (pid: 0, threadinfo ffffffff82000000, task ffffffff8200b020)
[    0.000000] Stack:
[    0.000000]  ffffffff82001cd0 ffffffff8100582c ffffffff81dce1bc ffffffff82001e10
[    0.000000]  00000001cfdc2000 ffffffff82003880 ffffffff82001ce0 ffffffff8100587e
[    0.000000]  ffffffff82001d98 ffffffff8100498f 00000000ffffffff 0000000000000007
[    0.000000] Call Trace:
[    0.000000]  [<ffffffff8100582c>] pte_mfn_to_pfn+0x8c/0xb0
[    0.000000]  [<ffffffff8100587e>] xen_pgd_val+0xe/0x10
[    0.000000]  [<ffffffff8100498f>] __raw_callee_save_xen_pgd_val+0x11/0x1e
[    0.000000]  [<ffffffff813ba570>] ? xenboot_write_console+0x0/0xd0
[    0.000000]  [<ffffffff821c24b8>] ? kernel_physical_mapping_init+0x83/0x1db
[    0.000000]  [<ffffffff8195469f>] init_memory_mapping+0x31f/0x6d0
[    0.000000]  [<ffffffff821989fd>] ? memblock_reserve+0x1b/0x21
[    0.000000]  [<ffffffff8217de95>] setup_arch+0xa59/0xd89
[    0.000000]  [<ffffffff819b9c90>] ? _raw_spin_unlock_irqrestore+0x20/0x30
[    0.000000]  [<ffffffff810074bd>] ? __raw_callee_save_xen_irq_disable+0x11/0x1e
[    0.000000]  [<ffffffff82177b35>] start_kernel+0xc6/0x4df
[    0.000000]  [<ffffffff821772c5>] x86_64_start_reservations+0xa5/0xc9
[    0.000000]  [<ffffffff8217b6fa>] xen_start_kernel+0x5d3/0x6a9
[    0.000000] Code: 48 89 fa 48 8b 05 ed 1a 25 01 48 89 f9 48 c1 ea 12 48 c1 e9 09 81 e7 ff 01 00 00 89 d2 81 e1 ff 01 00 00 48 8b 04 d0 48 8b 04 c8 <48> 8b 04 f8 c9 c3 66 0f 1f 44 00 00 55 48 89 e5 e8 97 3a 00 00 
[    0.000000] RIP  [<ffffffff81008314>] get_phys_to_machine+0x44/0x50
[    0.000000]  RSP <ffffffff82001ca0>
[    0.000000] CR2: 0000000000000000
[    0.000000] ---[ end trace a7919e7f17c0a725 ]---
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.000000] Pid: 0, comm: swapper Tainted: G      D     2.6.38-rc6+ #1270
[    0.000000] Call Trace:
[    0.000000]  [<ffffffff819b68ca>] ? panic+0xbf/0x1f0
[    0.000000]  [<ffffffff8107ff21>] ? __blocking_notifier_call_chain+0x21/0x90
[    0.000000]  [<ffffffff8105cdd1>] ? do_exit+0x791/0x860
[    0.000000]  [<ffffffff81058990>] ? kmsg_dump+0x50/0x110
[    0.000000]  [<ffffffff819bb0c4>] ? oops_end+0xe4/0xf0
[    0.000000]  [<ffffffff810383a9>] ? no_context+0xf9/0x270
[    0.000000]  [<ffffffff81038675>] ? __bad_area_nosemaphore+0x155/0x200
[    0.000000]  [<ffffffff810082d9>] ? get_phys_to_machine+0x9/0x50
[    0.000000]  [<ffffffff8100749f>] ? __raw_callee_save_xen_restore_fl+0x11/0x1e
[    0.000000]  [<ffffffff81038733>] ? bad_area_nosemaphore+0x13/0x20
[    0.000000]  [<ffffffff819bd97e>] ? do_page_fault+0x3ae/0x4d0
[    0.000000]  [<ffffffff813ba570>] ? xenboot_write_console+0x0/0xd0
[    0.000000]  [<ffffffff8100749f>] ? __raw_callee_save_xen_restore_fl+0x11/0x1e
[    0.000000]  [<ffffffff8100749f>] ? __raw_callee_save_xen_restore_fl+0x11/0x1e
[    0.000000]  [<ffffffff813ba570>] ? xenboot_write_console+0x0/0xd0
[    0.000000]  [<ffffffff819b9c90>] ? _raw_spin_unlock_irqrestore+0x20/0x30
[    0.000000]  [<ffffffff819ba3d5>] ? page_fault+0x25/0x30
[    0.000000]  [<ffffffff81008314>] ? get_phys_to_machine+0x44/0x50
[    0.000000]  [<ffffffff8100582c>] ? pte_mfn_to_pfn+0x8c/0xb0
[    0.000000]  [<ffffffff8100587e>] ? xen_pgd_val+0xe/0x10
[    0.000000]  [<ffffffff8100498f>] ? __raw_callee_save_xen_pgd_val+0x11/0x1e
[    0.000000]  [<ffffffff813ba570>] ? xenboot_write_console+0x0/0xd0
[    0.000000]  [<ffffffff821c24b8>] ? kernel_physical_mapping_init+0x83/0x1db
[    0.000000]  [<ffffffff8195469f>] ? init_memory_mapping+0x31f/0x6d0
[    0.000000]  [<ffffffff821989fd>] ? memblock_reserve+0x1b/0x21
[    0.000000]  [<ffffffff8217de95>] ? setup_arch+0xa59/0xd89
[    0.000000]  [<ffffffff819b9c90>] ? _raw_spin_unlock_irqrestore+0x20/0x30
[    0.000000]  [<ffffffff810074bd>] ? __raw_callee_save_xen_irq_disable+0x11/0x1e
[    0.000000]  [<ffffffff82177b35>] ? start_kernel+0xc6/0x4df
[    0.000000]  [<ffffffff821772c5>] ? x86_64_start_reservations+0xa5/0xc9
[    0.000000]  [<ffffffff8217b6fa>] ? xen_start_kernel+0x5d3/0x6a9
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.




* [PATCH 1/2] x86: Cleanup highmap after brk is concluded
  2011-02-28 18:24 [PATCH 0/2] x86: cleanup highmap after brk is concluded Stefano Stabellini
@ 2011-02-28 18:25 ` stefano.stabellini
  2011-02-28 18:25 ` [PATCH 2/2] xen: set max_pfn_mapped to the last pfn mapped stefano.stabellini
  2011-02-28 18:42 ` [PATCH 0/2] x86: cleanup highmap after brk is concluded Yinghai Lu
  2 siblings, 0 replies; 6+ messages in thread
From: stefano.stabellini @ 2011-02-28 18:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: xen-devel, hpa, konrad.wilk, yinghai, jeremy, Stefano.Stabellini,
	Stefano Stabellini

From: Yinghai Lu <yinghai@kernel.org>

Currently cleanup_highmap() is effectively done in two steps: one early
in head64.c that only clears the mappings above _end; a second one in
init_memory_mapping() that tries to clean from _brk_end to _end.
The second step should check whether those boundaries are PMD_SIZE
aligned, but currently does not.
Also init_memory_mapping() is called several times for NUMA or memory
hotplug, so we really should not handle initial kernel mappings there.

This patch moves cleanup_highmap() down after _brk_end is settled so
we can do everything in one step.
We also honor max_pfn_mapped in the implementation of cleanup_highmap().
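
To make the effect of the new loop bounds easier to see, here is a small
user-space model of the walk. It is a sketch, not the kernel code:
count_cleared() and pmd_roundup() are hypothetical helpers, every visited
2 MiB slot is assumed to be populated, and the _text and _brk_end values
used with it are illustrative ones taken from the attached boot log.

```c
#include <stdint.h>

#define PAGE_SHIFT        12
#define PMD_SIZE          (1ULL << 21)            /* 2 MiB */
#define START_KERNEL_MAP  0xffffffff80000000ULL   /* __START_KERNEL_map */

/* Round x up to the next multiple of PMD_SIZE. */
static uint64_t pmd_roundup(uint64_t x)
{
	return (x + PMD_SIZE - 1) & ~(PMD_SIZE - 1);
}

/*
 * Model of the loop: walk 2 MiB slots from START_KERNEL_MAP while the
 * whole slot lies below vaddr_end, and count the slots that would be
 * cleared, i.e. those below text or above the rounded-up brk_end.
 * Assumes every visited slot is populated (pmd_none() is false).
 */
static unsigned count_cleared(uint64_t max_pfn_mapped, uint64_t text,
			      uint64_t brk_end)
{
	uint64_t vaddr = START_KERNEL_MAP;
	uint64_t vaddr_end = START_KERNEL_MAP + (max_pfn_mapped << PAGE_SHIFT);
	uint64_t end = pmd_roundup(brk_end) - 1;
	unsigned cleared = 0;

	for (; vaddr + PMD_SIZE - 1 < vaddr_end; vaddr += PMD_SIZE)
		if (vaddr < text || vaddr > end)
			cleared++;
	return cleared;
}
```

With _text at 0xffffffff81000000 and _brk_end assumed to be
0xffffffff824be000, a max_pfn_mapped of 0x20000 (512 MiB) visits 256
slots and clears 245 of them, while a max_pfn_mapped of 0x24be stops the
walk after 18 slots and clears only the 8 slots below _text.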

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
 arch/x86/kernel/head64.c |    3 ---
 arch/x86/kernel/setup.c  |    6 ++++++
 arch/x86/mm/init.c       |   19 -------------------
 arch/x86/mm/init_64.c    |   11 ++++++-----
 4 files changed, 12 insertions(+), 27 deletions(-)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 2d2673c..5655c22 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -77,9 +77,6 @@ void __init x86_64_start_kernel(char * real_mode_data)
 	/* Make NULL pointers segfault */
 	zap_identity_mappings();
 
-	/* Cleanup the over mapped high alias */
-	cleanup_highmap();
-
 	max_pfn_mapped = KERNEL_IMAGE_SIZE >> PAGE_SHIFT;
 
 	for (i = 0; i < NUM_EXCEPTION_VECTORS; i++) {
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d3cfe26..f03e6e0 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -297,6 +297,9 @@ static void __init init_gbpages(void)
 static inline void init_gbpages(void)
 {
 }
+static void __init cleanup_highmap(void)
+{
+}
 #endif
 
 static void __init reserve_brk(void)
@@ -922,6 +925,9 @@ void __init setup_arch(char **cmdline_p)
 	 */
 	reserve_brk();
 
+	/* Cleanup the over mapped high alias after _brk_end*/
+	cleanup_highmap();
+
 	memblock.current_limit = get_max_mapped();
 	memblock_x86_fill();
 
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 947f42a..f13ff3a 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -279,25 +279,6 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
 	load_cr3(swapper_pg_dir);
 #endif
 
-#ifdef CONFIG_X86_64
-	if (!after_bootmem && !start) {
-		pud_t *pud;
-		pmd_t *pmd;
-
-		mmu_cr4_features = read_cr4();
-
-		/*
-		 * _brk_end cannot change anymore, but it and _end may be
-		 * located on different 2M pages. cleanup_highmap(), however,
-		 * can only consider _end when it runs, so destroy any
-		 * mappings beyond _brk_end here.
-		 */
-		pud = pud_offset(pgd_offset_k(_brk_end), _brk_end);
-		pmd = pmd_offset(pud, _brk_end - 1);
-		while (++pmd <= pmd_offset(pud, (unsigned long)_end - 1))
-			pmd_clear(pmd);
-	}
-#endif
 	__flush_tlb_all();
 
 	if (!after_bootmem && e820_table_end > e820_table_start)
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 71a5929..a8d08c2 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -51,6 +51,7 @@
 #include <asm/numa.h>
 #include <asm/cacheflush.h>
 #include <asm/init.h>
+#include <asm/setup.h>
 
 static int __init parse_direct_gbpages_off(char *arg)
 {
@@ -293,18 +294,18 @@ void __init init_extra_mapping_uc(unsigned long phys, unsigned long size)
  * to the compile time generated pmds. This results in invalid pmds up
  * to the point where we hit the physaddr 0 mapping.
  *
- * We limit the mappings to the region from _text to _end.  _end is
- * rounded up to the 2MB boundary. This catches the invalid pmds as
+ * We limit the mappings to the region from _text to _brk_end.  _brk_end
+ * is rounded up to the 2MB boundary. This catches the invalid pmds as
  * well, as they are located before _text:
  */
 void __init cleanup_highmap(void)
 {
 	unsigned long vaddr = __START_KERNEL_map;
-	unsigned long end = roundup((unsigned long)_end, PMD_SIZE) - 1;
+	unsigned long vaddr_end = __START_KERNEL_map + (max_pfn_mapped << PAGE_SHIFT);
+	unsigned long end = roundup((unsigned long)_brk_end, PMD_SIZE) - 1;
 	pmd_t *pmd = level2_kernel_pgt;
-	pmd_t *last_pmd = pmd + PTRS_PER_PMD;
 
-	for (; pmd < last_pmd; pmd++, vaddr += PMD_SIZE) {
+	for (; vaddr + PMD_SIZE - 1 < vaddr_end; pmd++, vaddr += PMD_SIZE) {
 		if (pmd_none(*pmd))
 			continue;
 		if (vaddr < (unsigned long) _text || vaddr > end)
-- 
1.5.6.5


* [PATCH 2/2] xen: set max_pfn_mapped to the last pfn mapped
  2011-02-28 18:24 [PATCH 0/2] x86: cleanup highmap after brk is concluded Stefano Stabellini
  2011-02-28 18:25 ` [PATCH 1/2] x86: Cleanup " stefano.stabellini
@ 2011-02-28 18:25 ` stefano.stabellini
  2011-02-28 18:42 ` [PATCH 0/2] x86: cleanup highmap after brk is concluded Yinghai Lu
  2 siblings, 0 replies; 6+ messages in thread
From: stefano.stabellini @ 2011-02-28 18:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: xen-devel, hpa, konrad.wilk, yinghai, jeremy, Stefano.Stabellini,
	Stefano Stabellini

From: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

Do not set max_pfn_mapped to the end of the initial memory mappings,
which also contain pages that don't belong in pfn space (like the mfn
list).

Set max_pfn_mapped to the last real pfn mapped in the initial memory
mappings, that is, the pfn backing _end.
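
As a worked example of the difference, the sketch below recomputes the
old and the new value from the addresses in the boot log attached to the
cover letter. The helpers are hypothetical stand-ins: kimg_pa() is a
simplified __pa() that assumes phys_base is 0, and the 23 page-table
frames are inferred from the "Page tables" range in the log rather than
read from start_info.

```c
#include <stdint.h>

#define PAGE_SHIFT        12
#define PAGE_SIZE         (1ULL << PAGE_SHIFT)
#define START_KERNEL_MAP  0xffffffff80000000ULL   /* __START_KERNEL_map */

/* Simplified __pa() for kernel-image addresses; assumes phys_base == 0. */
static uint64_t kimg_pa(uint64_t vaddr)
{
	return vaddr - START_KERNEL_MAP;
}

/* Same arithmetic as PFN_DOWN(): physical address to page frame number. */
static uint64_t pfn_down(uint64_t paddr)
{
	return paddr >> PAGE_SHIFT;
}

/* Old value: end of the boot page tables plus 512 KiB of slack. */
static uint64_t old_max_pfn_mapped(uint64_t pt_base, uint64_t nr_pt_frames)
{
	return pfn_down(kimg_pa(pt_base) + nr_pt_frames * PAGE_SIZE +
			512 * 1024);
}

/* New value: the pfn at which the mfn list starts. */
static uint64_t new_max_pfn_mapped(uint64_t mfn_list)
{
	return pfn_down(kimg_pa(mfn_list));
}
```

With pt_base at 0xffffffff82640000 and the mfn list at
0xffffffff824be000, the old formula yields pfn 0x26d7 and the new one
yields pfn 0x24be, so the new limit no longer covers the mfn list, the
start info page or the boot page tables.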

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
 arch/x86/xen/mmu.c |   13 +++++++------
 1 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 5e92b61..6092f73 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1653,9 +1653,6 @@ static __init void xen_map_identity_early(pmd_t *pmd, unsigned long max_pfn)
 		for (pteidx = 0; pteidx < PTRS_PER_PTE; pteidx++, pfn++) {
 			pte_t pte;
 
-			if (pfn > max_pfn_mapped)
-				max_pfn_mapped = pfn;
-
 			if (!pte_none(pte_page[pteidx]))
 				continue;
 
@@ -1713,6 +1710,12 @@ __init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd,
 	pud_t *l3;
 	pmd_t *l2;
 
+	/* max_pfn_mapped is the last pfn mapped in the initial memory
+	 * mappings. Considering that on Xen after the kernel mappings we
+	 * have the mappings of some pages that don't exist in pfn space, we
+	 * set max_pfn_mapped to the last real pfn mapped. */
+	max_pfn_mapped = PFN_DOWN(__pa(xen_start_info->mfn_list));
+
 	/* Zap identity mapping */
 	init_level4_pgt[0] = __pgd(0);
 
@@ -1817,9 +1820,7 @@ __init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd,
 	initial_kernel_pmd =
 		extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE);
 
-	max_pfn_mapped = PFN_DOWN(__pa(xen_start_info->pt_base) +
-				  xen_start_info->nr_pt_frames * PAGE_SIZE +
-				  512*1024);
+	max_pfn_mapped = PFN_DOWN(__pa(xen_start_info->mfn_list));
 
 	kernel_pmd = m2v(pgd[KERNEL_PGD_BOUNDARY].pgd);
 	memcpy(initial_kernel_pmd, kernel_pmd, sizeof(pmd_t) * PTRS_PER_PMD);
-- 
1.5.6.5


* Re: [PATCH 0/2] x86: cleanup highmap after brk is concluded
  2011-02-28 18:24 [PATCH 0/2] x86: cleanup highmap after brk is concluded Stefano Stabellini
  2011-02-28 18:25 ` [PATCH 1/2] x86: Cleanup " stefano.stabellini
  2011-02-28 18:25 ` [PATCH 2/2] xen: set max_pfn_mapped to the last pfn mapped stefano.stabellini
@ 2011-02-28 18:42 ` Yinghai Lu
  2011-03-01 15:13   ` Stefano Stabellini
  2 siblings, 1 reply; 6+ messages in thread
From: Yinghai Lu @ 2011-02-28 18:42 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Jeremy Fitzhardinge, xen-devel, linux-kernel,
	Konrad Rzeszutek Wilk, H. Peter Anvin

On 02/28/2011 10:24 AM, Stefano Stabellini wrote:
> Hi all,
> a little while ago I sent a patch titled "x86/mm/init: respect memblock
> reserved regions when destroying mappings"
> (https://lkml.org/lkml/2011/1/31/232) to fix a serious boot crash
> problem on Xen (full logs attached):
> 
> Pid: 0, comm: swapper Not tainted 2.6.38-rc6+ #1270 Hewlett-Packard HP xw8600 Workstation/0A98h
> RIP: e030:[<ffffffff81008314>]  [<ffffffff81008314>] get_phys_to_machine+0x44/0x50
> RSP: e02b:ffffffff82001ca0  EFLAGS: 00010002
> RAX: ffffffff824ce000 RBX: 0000000126004067 RCX: 0000000000000010
> RDX: 0000000000000000 RSI: 00000001cfdc2000 RDI: 0000000000000004
> RBP: ffffffff82001ca0 R08: 0000000000000020 R09: 0000000000000000
> R10: 0000000000000007 R11: 00000000ffffffff R12: 0000000000126004
> R13: 0000000000002004 R14: ffff880100000000 R15: ffff8801cfdc2000
> FS:  0000000000000000(0000) GS:ffffffff82162000(0000) knlGS:0000000000000000
> CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 0000000002003000 CR4: 0000000000002660
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
> Process swapper (pid: 0, threadinfo ffffffff82000000, task ffffffff8200b020)
> Stack:
>  ffffffff82001cd0 ffffffff8100582c ffffffff81dce1bc ffffffff82001e10
>  00000001cfdc2000 ffffffff82003880 ffffffff82001ce0 ffffffff8100587e
>  ffffffff82001d98 ffffffff8100498f 00000000ffffffff 0000000000000007
> Call Trace:
>  [<ffffffff8100582c>] pte_mfn_to_pfn+0x8c/0xb0
>  [<ffffffff8100587e>] xen_pgd_val+0xe/0x10
>  [<ffffffff8100498f>] __raw_callee_save_xen_pgd_val+0x11/0x1e
>  [<ffffffff813ba570>] ? xenboot_write_console+0x0/0xd0
>  [<ffffffff821c24b8>] ? kernel_physical_mapping_init+0x83/0x1db
>  [<ffffffff8195469f>] init_memory_mapping+0x31f/0x6d0
>  [<ffffffff821989fd>] ? memblock_reserve+0x1b/0x21
>  [<ffffffff8217de95>] setup_arch+0xa59/0xd89
>  [<ffffffff819b9c90>] ? _raw_spin_unlock_irqrestore+0x20/0x30
>  [<ffffffff810074bd>] ? __raw_callee_save_xen_irq_disable+0x11/0x1e
>  [<ffffffff82177b35>] start_kernel+0xc6/0x4df
>  [<ffffffff821772c5>] x86_64_start_reservations+0xa5/0xc9
>  [<ffffffff8217b6fa>] xen_start_kernel+0x5d3/0x6a9
> 
> 
> Even though a clear solution wasn't reached in the following discussion,
> Yinghai Lu sent a patch to move cleanup_highmap() after reserve_brk() so
> that we don't have to clear the initial mappings in two steps.
> The patch is a nice cleanup and, with a few small changes to honour the
> variable max_pfn_mapped, can be used to fix the boot issue on Xen: all we
> have to do is set max_pfn_mapped to the last valid pfn mapped on Xen,
> that is, the page backing _end.
> 
> 
> The list of patches with diffstat follows, comments and suggestions are
> very welcome:
> 
> Stefano Stabellini (1):
>       xen: set max_pfn_mapped to the last pfn mapped
> 
> Yinghai Lu (1):
>       x86: Cleanup highmap after brk is concluded
> 
>  arch/x86/kernel/head64.c |    3 ---
>  arch/x86/kernel/setup.c  |    6 ++++++
>  arch/x86/mm/init.c       |   19 -------------------
>  arch/x86/mm/init_64.c    |   11 ++++++-----
>  arch/x86/xen/mmu.c       |   13 +++++++------
>  5 files changed, 19 insertions(+), 33 deletions(-)
> 
> 
> A git branch based on 2.6.38-rc6 is available here:
> 
Can you please rebase them on top of tip/x86/mm?

http://people.redhat.com/mingo/tip.git/readme.txt

Thanks

Yinghai Lu


* Re: [PATCH 0/2] x86: cleanup highmap after brk is concluded
  2011-02-28 18:42 ` [PATCH 0/2] x86: cleanup highmap after brk is concluded Yinghai Lu
@ 2011-03-01 15:13   ` Stefano Stabellini
  2011-03-08 14:27     ` Stefano Stabellini
  0 siblings, 1 reply; 6+ messages in thread
From: Stefano Stabellini @ 2011-03-01 15:13 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Stefano Stabellini, linux-kernel@vger.kernel.org,
	xen-devel@lists.xensource.com, Jeremy Fitzhardinge,
	H. Peter Anvin, Konrad Rzeszutek Wilk

On Mon, 28 Feb 2011, Yinghai Lu wrote:
> Can you please rebase them on top of tip/x86/mm?
> 
> http://people.redhat.com/mingo/tip.git/readme.txt
> 
 
Sure, I rebased the two patches on the very latest tip/x86/mm:

git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git 2.6.38-tip-mm-fix


* Re: [PATCH 0/2] x86: cleanup highmap after brk is concluded
  2011-03-01 15:13   ` Stefano Stabellini
@ 2011-03-08 14:27     ` Stefano Stabellini
  0 siblings, 0 replies; 6+ messages in thread
From: Stefano Stabellini @ 2011-03-08 14:27 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Yinghai Lu, linux-kernel@vger.kernel.org,
	xen-devel@lists.xensource.com, Jeremy Fitzhardinge,
	H. Peter Anvin, Konrad Rzeszutek Wilk

On Tue, 1 Mar 2011, Stefano Stabellini wrote:
> On Mon, 28 Feb 2011, Yinghai Lu wrote:
> > Can you please rebase them on top of tip/x86/mm?
> > 
> > http://people.redhat.com/mingo/tip.git/readme.txt
> > 
>  
> Sure, I rebased the two patches on the very latest tip/x86/mm:
> 
> git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git 2.6.38-tip-mm-fix
> 

ping?
