From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andre Przywara
Subject: Re: [PATCH] Fix hypervisor crash with unpopulated NUMA nodes
Date: Wed, 7 Oct 2009 14:13:06 +0200
Message-ID: <4ACC85D2.9090100@amd.com>
References: <4ACC6346.5080309@amd.com> <4ACC9D73020000780001879A@vpn.id2.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit
In-Reply-To: <4ACC9D73020000780001879A@vpn.id2.novell.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Jan Beulich
Cc: xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

Jan Beulich wrote:
>>>> Andre Przywara 07.10.09 11:45 >>>
>> on NUMA systems with memory-less nodes Xen crashes quite early in the
>> hypervisor (while initializing the heaps). This is not an issue if this
>> happens to be the last node, but "inner" nodes trigger this reliably.
>> On multi-node processors it is much more likely to leave a node unequipped.
>> The attached patch fixes this by enumerating the node via the
>> node_online_map instead of counting from 0 to num_nodes.
>
> While I do not see anything wrong with the patch, I still wonder why it
> would be needed: It seems to indicate that node_online_map represents
> only nodes with memory, but imo should be representing nodes with
> memory or processors (leaving aside pure I/O nodes for the moment).
> So perhaps there's rather a problem with the setup of node_online_map
> somewhere?

Yes, because the map creation is callback-driven by the ACPI code. The
BIOS of my machine omits the memory entries for memory-less nodes, so
no callback is triggered for these nodes. Nevertheless, Xen uses the
SRAT-provided node numbers, which creates the hole. (My setup:
2 + 0 + 2 + 0 GB per node, Xen sees two nodes named 0 and 2).
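To illustrate the hole described above, here is a minimal standalone sketch in C. It is not the Xen code and not the attached patch: the type `nodemask_t` as a plain bitmask, `MAX_NODES`, and all function names are assumptions made up for this example. It only shows why counting from 0 to the number of online nodes goes wrong when nodes 0 and 2 are online, and why walking the online map does not.

```c
/* Hypothetical stand-ins for illustration only, not Xen's definitions. */
#define MAX_NODES 8

typedef unsigned int nodemask_t;   /* bit n set => node n is online */

/* Returns 1 if the given node is marked online in the map, else 0. */
static int node_is_online(nodemask_t map, int node)
{
    return (int)((map >> node) & 1u);
}

/* Number of nodes marked online in the map. */
static int count_online(nodemask_t map)
{
    int n = 0;
    for (int node = 0; node < MAX_NODES; node++)
        n += node_is_online(map, node);
    return n;
}

/*
 * Fragile variant: assumes the online nodes are numbered 0..count-1.
 * With nodes 0 and 2 online it visits 0 and 1, so it touches the
 * unpopulated node 1 and never reaches node 2.
 */
static int visit_by_count(nodemask_t map, int *out)
{
    int n = count_online(map);
    for (int node = 0; node < n; node++)
        out[node] = node;
    return n;
}

/*
 * Robust variant: walks the online map itself, so holes in the
 * numbering are skipped and every online node is visited exactly once.
 */
static int visit_by_mask(nodemask_t map, int *out)
{
    int n = 0;
    for (int node = 0; node < MAX_NODES; node++)
        if (node_is_online(map, node))
            out[n++] = node;
    return n;
}
```

With a map of `(1u << 0) | (1u << 2)`, matching the 2 + 0 + 2 + 0 GB setup, `visit_by_count` fills its output with nodes 0 and 1, while `visit_by_mask` yields nodes 0 and 2.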
I agree that this should be changed (that is what I meant by "will
rework later"), not only because the "lonely" cores will simply be added
to another node. But since I will not be in the office for the next two
weeks, I would like to get this patch applied for the time being.

Regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448 3567 12
----to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Andrew Bowd; Thomas M. McCoy; Giuliano Meroni
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632