From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andre Przywara
Subject: Re: Dom0 crash with old style AMD NUMA detection
Date: Sat, 22 Sep 2012 01:46:57 +0200
Message-ID: <505CFC71.5090702@amd.com>
References: <501BC20F.3040205@amd.com> <20120803123628.GB10670@andromeda.dapyr.net> <20120817142237.GA8467@phenom.dumpdata.com> <505CA8AB.6000808@amd.com> <20120921174833.GC6821@phenom.dumpdata.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
In-Reply-To: <20120921174833.GC6821@phenom.dumpdata.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Konrad Rzeszutek Wilk
Cc: Konrad Rzeszutek Wilk, Jeremy Fitzhardinge, xen-devel, Konrad Rzeszutek Wilk
List-Id: xen-devel@lists.xenproject.org

On 09/21/2012 07:48 PM, Konrad Rzeszutek Wilk wrote:
>> Acked-by: Andre Przywara
>>
>> I compiled and boot-tested this on my (single node ;-) test box.
>> First bare-metal, dmesg: No NUMA configuration found
>> Then again, but with numa=off on the cmd-line: NUMA turned off
>> Then under Xen as Dom0 kernel: NUMA turned off
>>
>> So the code behaves under Xen as one would have explicitly specified
>> numa=off, which is what we want.
>
> Right.
>> I couldn't get hold of the test machine (old K8 server) that the bug
>> was once triggered, that's why I'm reluctant to give my Tested-by.
>> Will try this ASAP.
>
> OK, will wait with this - it would be a bit silly if the patch did not
> fix the issue :-)

Thanks for your patience. I tried some machines: the crash affects not
only K8s, but also Barcelonas and Magny-Cours. Boot those with a Xen HV
and restrict Dom0's memory to something well below the first node's size
(say dom0_mem=512M). If the 3.x Dom0 kernel has CONFIG_AMD_NUMA compiled
in, the box will crash, because the hardware's NUMA info read from the
northbridge does not match Dom0's understanding of its memory.
With your fix the box booted fine, NUMA is turned off and everyone is
happy. Double-checked by commenting out the numa_off=1 line in your
patch: crash again. So this line definitely fixes the crash.

Tested-by: Andre Przywara

Regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany