From: Chao Peng
Subject: Re: Crash in set_cpu_sibling_map() booting Xen 4.6.0 on Fusion
Date: Mon, 23 Nov 2015 13:39:02 +0800
Message-ID: <20151123053902.GB9174@pengc-linux.bj.intel.com>
References: <564F030C02000078000B70A6@prv-mh.provo.novell.com> <20151123011008.GA9174@pengc-linux.bj.intel.com>
In-Reply-To: <20151123011008.GA9174@pengc-linux.bj.intel.com>
To: Ed Swierk
Cc: Andrew Cooper, Jan Beulich, xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On Mon, Nov 23, 2015 at 09:10:08AM +0800, Chao Peng wrote:
> On Fri, Nov 20, 2015 at 05:21:11PM -0800, Ed Swierk wrote:
> > The problem is that the index of the socket_cpumask array is derived via
> > cpu_to_socket() from the APIC ID of the processor in a given socket, but
> > the size of the array is computed based on nr_sockets, which is not
> > necessarily equal to the maximum APIC ID.
> >
> > Sizing the socket_cpumask to MAX_APICS rather than nr_sockets seems safer,
> > though a bit wasteful. I verified that this change fixes the boot crash
> > with 4 or 8 CPUs on VMware Fusion.
> >
> > --- a/xen/arch/x86/smpboot.c
> > +++ b/xen/arch/x86/smpboot.c
> > @@ -819,7 +819,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
> >
> >      set_nr_sockets();
> >
> > -    socket_cpumask = xzalloc_array(cpumask_t *, nr_sockets);
> > +    socket_cpumask = xzalloc_array(cpumask_t *, MAX_APICS);
>
> Just replacing nr_sockets with MAX_APICS can not really solve problem.
> socket_cpumask should always be synchronized with nr_sockets, otherwise
> at least some function will be missing, if not cause panic in another
> place.
>
> If possible, I'd suggest you can debug set_nr_sockets(), especially you
> can inspect the following two values for panic case:
>     boot_cpu_data.x86_max_cores
>     boot_cpu_data.x86_num_siblings

After carefully checking the log, it looks like nr_sockets is computed
correctly for your case; it is phys_proc_id that is wrong. This could
again be caused by bad CPUID information, so you need to debug the CPU
detection code that sets phys_proc_id.

Thanks,
Chao
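
To make the mismatch discussed above concrete, here is a minimal
standalone sketch in C. It is not Xen code: struct socket_table,
count_sockets(), apicid_to_socket() and table_add_cpu() are hypothetical
names, and both the sizing rule and the shift-based socket ID are
assumptions chosen only to show how a table sized from one set of CPUID
values can be indexed out of range by a socket ID derived from another.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-socket table; not Xen's data structure. */
struct socket_table {
    unsigned int nr_sockets;   /* number of slots allocated */
    unsigned long *masks;      /* one small CPU bitmask per socket */
};

/*
 * Assumed sizing rule: highest APIC ID divided by the number of logical
 * CPUs per socket (cores * siblings), plus one.
 */
static unsigned int count_sockets(unsigned int max_apicid,
                                  unsigned int max_cores,
                                  unsigned int num_siblings)
{
    assert(max_cores > 0 && num_siblings > 0);
    return max_apicid / max_cores / num_siblings + 1;
}

/*
 * Assumed indexing rule: the socket ID is the APIC ID with the low
 * core/thread bits shifted out.
 */
static unsigned int apicid_to_socket(unsigned int apicid, unsigned int shift)
{
    return apicid >> shift;
}

/* Allocate a zeroed table; returns 0 on success, -1 on allocation failure. */
static int table_init(struct socket_table *t, unsigned int nr_sockets)
{
    t->masks = calloc(nr_sockets, sizeof(*t->masks));
    t->nr_sockets = t->masks ? nr_sockets : 0;
    return t->masks ? 0 : -1;
}

/*
 * Record a CPU in its socket's mask. Returns -1 instead of writing out of
 * bounds when the socket ID does not fit the table (or the CPU number does
 * not fit the bitmask), which is the situation described in this thread.
 */
static int table_add_cpu(struct socket_table *t, unsigned int socket,
                         unsigned int cpu)
{
    if (socket >= t->nr_sockets)
        return -1;
    if (cpu >= sizeof(unsigned long) * CHAR_BIT)
        return -1;
    t->masks[socket] |= 1UL << cpu;
    return 0;
}
```

With consistent values (APIC IDs 0..7, 4 cores, 1 sibling, shift of 2)
the table has two slots and every socket ID fits. If the reported
topology says 4 cores and 2 siblings while the socket ID is derived with
a shift of 0, the table has one slot but APIC ID 7 maps to socket 7, and
table_add_cpu() rejects it. A check of this kind only reports the
inconsistency; the socket ID derivation would still need to be fixed.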