From mboxrd@z Thu Jan 1 00:00:00 1970
From: Konrad Rzeszutek Wilk
Subject: Re: L1[0x1fb] = 0000000000000000 which faults on one type of machine but on another works?
Date: Thu, 17 Mar 2011 11:52:12 -0400
Message-ID: <20110317155211.GA29603@dumpdata.com>
References: <20110316221912.GA13035@dumpdata.com> <4D81EF97020000780003706E@vpn.id2.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4D81EF97020000780003706E@vpn.id2.novell.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Jan Beulich
Cc: Jeremy Fitzhardinge , xen-devel@lists.xensource.com, andrew.thomas@oracle.com, Ian Campbell , keir.xen@gmail.com, swente@infinitumb.de, gianni.tedesco@citrix.com
List-Id: xen-devel@lists.xenproject.org

On Thu, Mar 17, 2011 at 10:25:11AM +0000, Jan Beulich wrote:
> >>> On 16.03.11 at 23:19, Konrad Rzeszutek Wilk wrote:
> > But one thing I can't understand is why on one machine (IBM x3850)
> > I get this crash, while another one with the same pagetable contents
> > (L1 has nothing for 0x1fb) it works just fine? I added a panic and used
> > the Xen hypervisor kdb to manually inspect the pagetable, and it has
> > the same contents as the IBM x3850 -but it boots fine with this invalid
> > value.
> > Any ideas?
>
> Without seeing the full stack trace it's hard to tell. To me, it looks
> like a mistake for native_apic_read() to be called at all under Xen,
> and perhaps there's one lurking somewhere that gets hit only on
> those IBM (Summit?) machines.

That was it.

When we boot up we call 'set_xen_basic_apic_ops', which sets apic->read
to xen_apic_read. The default 'apic' is apic_flat, so in essence we
change apic_flat->read from native_apic_read to xen_apic_read.

During boot, default_acpi_madt_oem_check runs through the whole
apic_probe[] array, whose last entry is apic_physflat. And
apic_physflat->probe() returns true on this IBM Summit box (and on
ES7000 boxes, and on anything else that has ACPI_FADT_APIC_PHYSICAL set
in the FADT), so 'apic' now points to apic_physflat. Only apic_flat had
its ->read replaced, so apic->read ends up being native_apic_read.

2.6.38 fixes this by allowing the set_fixmap_nocache(FIX_APIC_BASE,
address) call in acpi_register_lapic_address to go through, so we can
provide it with a dummy page and native_apic_read can happily read from
that fake page.
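
To make the sequence above easier to follow, here is a rough user-space
sketch of it. This is not the kernel code: the struct layout, the
'faulted' flag, 'fadt_apic_physical', 'dummy_page' and simulate_boot()
are all made up for illustration, and the ordering of the steps is
simplified. It only models the idea that patching ->read on one driver
does not help once the probe loop switches 'apic' to another driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's struct apic (assumption). */
struct apic {
    const char *name;
    int (*probe)(void);
    uint32_t (*read)(uint32_t reg);
};

/* Models the FIX_APIC_BASE mapping; NULL means nothing is mapped. */
static const uint32_t *apic_base;
static int faulted;              /* set instead of taking a real fault */
static int fadt_apic_physical;   /* models ACPI_FADT_APIC_PHYSICAL */
static uint32_t dummy_page[1024];

static uint32_t native_apic_read(uint32_t reg)
{
    if (!apic_base) {            /* unmapped: this is the crash case */
        faulted = 1;
        return 0;
    }
    return apic_base[reg / 4];
}

static uint32_t xen_apic_read(uint32_t reg)
{
    (void)reg;                   /* never touches the fixmap */
    return 0;
}

static int probe_never(void)    { return 0; }
static int probe_physflat(void) { return fadt_apic_physical; }

static struct apic apic_flat     = { "flat", probe_never, native_apic_read };
static struct apic apic_physflat = { "physical flat", probe_physflat,
                                     native_apic_read };
static struct apic *apic = &apic_flat;
static struct apic *apic_probe[] = { &apic_physflat, NULL };

/* Returns 1 if the modelled boot ends in a faulting APIC read. */
static int simulate_boot(int physical, int map_dummy_page)
{
    /* Reset the model so the function can be called repeatedly. */
    apic_flat.read = native_apic_read;
    apic_physflat.read = native_apic_read;
    apic = &apic_flat;
    apic_base = NULL;
    faulted = 0;
    fadt_apic_physical = physical;

    /* Like set_xen_basic_apic_ops: patches only the current driver. */
    apic->read = xen_apic_read;

    /* Like the 2.6.38 behaviour: a dummy page backs the fixmap. */
    if (map_dummy_page)
        apic_base = dummy_page;

    /* Like the probe loop: the first driver that matches wins. */
    for (size_t i = 0; apic_probe[i]; i++) {
        if (apic_probe[i]->probe()) {
            apic = apic_probe[i];
            break;
        }
    }

    assert(apic != NULL);
    (void)apic->read(0x20);      /* 0x20 is the APIC ID register offset */
    return faulted;
}
```

In this model simulate_boot(0, 0) stays on the patched apic_flat and does
not fault, simulate_boot(1, 0) switches to apic_physflat and hits the
unmapped read, and simulate_boot(1, 1) still calls native_apic_read but
reads zeros from the dummy page instead.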