From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Jan Beulich"
Subject: Re: [PATCH] to fix ACPI slit table access at runtime
Date: Thu, 25 Feb 2010 13:03:21 +0000
Message-ID: <4B868329020000780003144E@vpn.id2.novell.com>
References: <8EA2C2C4116BF44AB370468FBF85A7770124F1698B@orsmsx504.amr.corp.intel.com>
To: Keir Fraser
Cc: "xen-devel@lists.xensource.com", Ian Jackson
List-Id: xen-devel@lists.xenproject.org

>>> Keir Fraser 25.02.10 13:13 >>>
>By the way, a separate question for Jan: I notice you added an
>alloc_boot_pages() call in arch/x86/numa.c, and get a virtual address from
>mfn_to_virt(). This is bogus for x86_32, so can we just stub out that
>function for the 32-bit build: looks like the caller would then gracefully
>fail? I wonder whether this could explain the crash that Ian Jackson
>reported on one system with PAE Xen the other day.

Yes, quite possible. Although - specifically after his report - I ran a
32-bit image on my only somewhat NUMA-ish system, and did not see it
die:

(XEN) SRAT: PXM 0 -> APIC 0 -> Node 0
(XEN) SRAT: PXM 0 -> APIC 1 -> Node 0
(XEN) SRAT: PXM 0 -> APIC 2 -> Node 0
(XEN) SRAT: PXM 0 -> APIC 3 -> Node 0
(XEN) SRAT: PXM 1 -> APIC 4 -> Node 1
(XEN) SRAT: PXM 1 -> APIC 5 -> Node 1
(XEN) SRAT: PXM 1 -> APIC 6 -> Node 1
(XEN) SRAT: PXM 1 -> APIC 7 -> Node 1
(XEN) SRAT: Node 0 PXM 0 0-a0000
(XEN) SRAT: Node 0 PXM 0 100000-80000000
(XEN) SRAT: Node 1 PXM 1 80000000-d0000000
(XEN) SRAT: Node 1 PXM 1 100000000-130000000
(XEN) NUMA: Allocated memnodemap from 2e37e000 - 2e380000
(XEN) NUMA: Using 8 for the hash shift.

All it took was the patch that's now c/s 20978.
But indeed the addresses printed suggest this cannot work properly, even
if it didn't crash. Whether this indeed causes Ian's problems could
easily be clarified by him adding loglvl=all to see those two NUMA:
messages. What makes me think it may not is that CR2 was zero in his log.

Removing the code for 32-bit altogether is certainly one option (in that
case I'd want to see _memnodemap reasonably increased though, plus
we should probably make an attempt to reduce memnodemapsize again - the
hash shift currently is unduly small - and I have a patch for the original
Linux code to do so). All other options are likely indeed not worth it
just to make 32-bit happy.

Jan