From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeremy Fitzhardinge
Subject: Re: current xen/stable 2.6.32.9 failed upgrade from 2.6.31.6
Date: Tue, 23 Mar 2010 22:10:48 -0700
Message-ID: <4BA99ED8.8030209@goop.org>
References: <20100306115833.GA28039@orion.carnet.hr>
 <20100306132711.GK2580@reaktio.net>
 <20100307233147.GA20068@orion.carnet.hr>
 <20100311150823.GA9011@orion.carnet.hr>
 <20100311192456.GY1878@reaktio.net>
 <20100312114139.GA4067@orion.carnet.hr>
 <20100312120914.GA15561@orion.carnet.hr>
 <20100323231853.GA21109@orion.carnet.hr>
 <20100323232223.GA22681@orion.carnet.hr>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
In-Reply-To: <20100323232223.GA22681@orion.carnet.hr>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Josip Rodin
Cc: Xen-devel
List-Id: xen-devel@lists.xenproject.org

On 03/23/2010 04:22 PM, Josip Rodin wrote:
> On Wed, Mar 24, 2010 at 12:18:53AM +0100, Josip Rodin wrote:
>
>> On Fri, Mar 12, 2010 at 01:09:14PM +0100, Josip Rodin wrote:
>>
>>> On Fri, Mar 12, 2010 at 12:41:39PM +0100, Josip Rodin wrote:
>>>
>>>> And now here goes the whole output preceding the 2.6.32 crash:
>>>>
>>> [...]
>>>
>>>> In the meantime there was another update to the stable branch, I'll go
>>>> compile that...
>>>>
>>> The symptoms remained the same, only the CPU MHz calculation and some memory
>>> offsets are different.
>>>
>>> (XEN) mm.c:720:d0 Bad L1 flags 800000
>>> (XEN) mm.c:4221:d0 ptwr_emulate: could not get_page_from_l1e()
>>> (XEN) d0:v0: unhandled page fault (ec=0003)
>>> (XEN) Pagetable walk from ffff8800014fdfd8:
>>> (XEN) L4[0x110] = 0000000115002067 0000000000001002
>>> (XEN) L3[0x000] = 0000000115006067 0000000000001006
>>> (XEN) L2[0x00a] = 0000000116c8a067 0000000000002c8a
>>> (XEN) L1[0x0fd] = 00100001154fd065 00000000000014fd
>>> (XEN) domain_crash_sync called from entry.S
>>> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
>>> (XEN) ----[ Xen-3.4 x86_64 debug=n Not tainted ]----
>>>
>> FWIW I tried to git bisect this in the last couple of days, but the result
>> turned out to be fairly obvious and useless as after 14 bisections
>> I only came to this:
>>
>> commit 18ecfad3aaeead019b0e07078f643deaa7d10d44
>> x86: make /dev/mem mappings _PAGE_IOMAP
>> commit 56f27a6d47275f6dc94adf3ecc5fe958cdcdebee
>> xen/dom0: add XEN_DOM0 config option
>>
>> I didn't follow through with the last bisection, it had seemed increasingly
>> futile for a while now... :)
>>
>> I saw a peculiar side effect at one point, when I went back to a random
>> working 2.6.31.1 dom0, all userland processes started crashing with Illegal
>> instruction. One iLO reset later, it's all good again. I'm guessing it was
>> a transient broken state.
>>
>> And then when I gave up and updated to latest xen/stable for one last try,
>> that was the biggest d'oh moment - it's fixed :) Was it de67ec8b?
>>

Yes.
> BTW with the working .32 kernel, the log says:
>
> [ 0.000000] ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
> [ 0.000000] IOAPIC[0]: apic_id 8, version 0, address 0xfec00000, GSI 0-0
> [ 0.000000] ACPI: IOAPIC (id[0x09] address[0xfec80000] gsi_base[24])
> [ 0.000000] IOAPIC[1]: apic_id 9, version 0, address 0xfec80000, GSI 24-24
> [ 0.000000] ACPI: IOAPIC (id[0x0a] address[0xfec80400] gsi_base[48])
> [ 0.000000] IOAPIC[2]: apic_id 10, version 0, address 0xfec80400, GSI 48-48
> [ 0.000000] ACPI: IOAPIC (id[0x0b] address[0xfec84000] gsi_base[72])
> [ 0.000000] IOAPIC[3]: apic_id 11, version 0, address 0xfec84000, GSI 72-72
> [ 0.000000] ACPI: IOAPIC (id[0x0c] address[0xfec84400] gsi_base[96])
> [ 0.000000] IOAPIC[4]: apic_id 12, version 0, address 0xfec84400, GSI 96-96
> [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
> [ 0.000000] ERROR: Unable to locate IOAPIC for GSI 2
> [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
> [ 0.000000] ERROR: Unable to locate IOAPIC for GSI 9
>
> [...]
>
> [ 0.023694] ACPI: bus type pci registered
> [ 0.023915] PCI: Found Intel Corporation E7520 Memory Controller Hub with MMCONFIG support.
> [ 0.023935] PCI: MCFG configuration 0: base e0000000 segment 0 buses 0 - 255
> [ 0.023942] PCI: Not using MMCONFIG.
> [ 0.023948] PCI: Using configuration type 1 for base access
> [ 0.023959] PCI: HP ProLiant DL380 detected, enabling pci=bfsort.
> [ 0.028634] bio: create slab at 0
> [ 0.030115] ERROR: Unable to locate IOAPIC for GSI 9
>
> Is there anything I can do to avoid these?
>

These are just noise; the kernel thinks it can poke at the IO APICs, but
they're owned by Xen and so don't exist for the kernel; instead some
alternate mechanisms come into play to keep the interrupts flowing.

At some point I hope we can completely remove all trace of the APICs
from the kernel's sight, so it won't even try to access them and print
these confused messages.

J
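
For anyone reading the archive later, here is a rough sketch (not part of
the original exchange) of why the two "Unable to locate IOAPIC" errors
appear. It assumes the kernel finds the IOAPIC for a GSI by checking which
reported GSI range contains it. In the quoted log every IOAPIC reports a
one-entry range (0-0, 24-24, ...), likely because the registers Xen owns
read back as zero, so the overrides for GSI 2 and GSI 9 match nothing. The
function names below are hypothetical and the lookup rule is an assumption
modelled on the log, not the kernel's actual code.

```python
import re

# Excerpt of the boot log quoted above (only the lines this sketch needs).
LOG = """\
[ 0.000000] IOAPIC[0]: apic_id 8, version 0, address 0xfec00000, GSI 0-0
[ 0.000000] IOAPIC[1]: apic_id 9, version 0, address 0xfec80000, GSI 24-24
[ 0.000000] IOAPIC[2]: apic_id 10, version 0, address 0xfec80400, GSI 48-48
[ 0.000000] IOAPIC[3]: apic_id 11, version 0, address 0xfec84000, GSI 72-72
[ 0.000000] IOAPIC[4]: apic_id 12, version 0, address 0xfec84400, GSI 96-96
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
"""

IOAPIC_RE = re.compile(r"IOAPIC\[(\d+)\]: .*GSI (\d+)-(\d+)")
OVERRIDE_RE = re.compile(r"INT_SRC_OVR \(.*global_irq (\d+) ")


def parse_ioapic_ranges(log):
    """Return (index, first_gsi, last_gsi) for each IOAPIC line in the log."""
    return [(int(i), int(lo), int(hi)) for i, lo, hi in IOAPIC_RE.findall(log)]


def parse_override_gsis(log):
    """Return the global_irq number of each interrupt source override."""
    return [int(gsi) for gsi in OVERRIDE_RE.findall(log)]


def find_ioapic(gsi, ranges):
    """Assumed lookup rule: the IOAPIC whose reported range contains the GSI."""
    for index, lo, hi in ranges:
        if lo <= gsi <= hi:
            return index
    return None


def unlocatable_gsis(log):
    """Override GSIs that fall outside every reported IOAPIC range."""
    ranges = parse_ioapic_ranges(log)
    return [g for g in parse_override_gsis(log) if find_ioapic(g, ranges) is None]


if __name__ == "__main__":
    for gsi in unlocatable_gsis(LOG):
        print(f"Unable to locate IOAPIC for GSI {gsi}")
```

Under that assumption the sketch flags the same two GSIs the kernel
complained about, which fits the explanation that the messages come from
the kernel's view of the APICs rather than from any real interrupt routing
problem.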