From mboxrd@z Thu Jan 1 00:00:00 1970 From: Chao Peng Subject: Re: [PATCH v10 01/13] x86: add socket_cpumask Date: Wed, 8 Jul 2015 10:43:48 +0800 Message-ID: <20150708024348.GB3333@pengc-linux.bj.intel.com> References: <1435308227-30586-1-git-send-email-chao.p.peng@linux.intel.com> <1435308227-30586-2-git-send-email-chao.p.peng@linux.intel.com> <559C5397.4090606@oracle.com> Reply-To: Chao Peng Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: Content-Disposition: inline In-Reply-To: <559C5397.4090606@oracle.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Boris Ostrovsky Cc: keir@xen.org, Ian.Campbell@citrix.com, stefano.stabellini@eu.citrix.com, andrew.cooper3@citrix.com, dario.faggioli@citrix.com, Ian.Jackson@eu.citrix.com, xen-devel@lists.xen.org, will.auld@intel.com, JBeulich@suse.com, wei.liu2@citrix.com, dgdegra@tycho.nsa.gov List-Id: xen-devel@lists.xenproject.org On Tue, Jul 07, 2015 at 06:32:55PM -0400, Boris Ostrovsky wrote: > On 06/26/2015 04:43 AM, Chao Peng wrote: > >Maintain socket_cpumask which contains all the HT and core siblings > >in the same socket. > > > >Signed-off-by: Chao Peng > >Acked-by: Jan Beulich > >--- > >Changes in v9: > >* Add comments for set_nr_sockets. > >* Move set_nr_sockets() invocation from __start_xen() to smp_prepare_cpus(). > >Changes in v8: > >* Remove total_cpus and retrofit the algorithm for calculating nr_sockets. > >* Change per-socket cpumask allocation as on demand. > >* socket_to_cpumask => socket_cpumask. > >Changes in v7: > >* Introduce total_cpus to calculate nr_sockets. > >* Minor code sequence improvement in set_cpu_sibling_map. > >* Improve comments for nr_sockets. > >--- > > xen/arch/x86/mpparse.c | 17 +++++++++++++++++ > > xen/arch/x86/smpboot.c | 26 +++++++++++++++++++++++++- > > xen/include/asm-x86/smp.h | 11 +++++++++++ > > 3 files changed, 53 insertions(+), 1 deletion(-) > > > >diff --git a/xen/arch/x86/mpparse.c b/xen/arch/x86/mpparse.c > >index 003c56e..8609f4a 100644 > >--- a/xen/arch/x86/mpparse.c > >+++ b/xen/arch/x86/mpparse.c > >@@ -87,6 +87,23 @@ void __init set_nr_cpu_ids(unsigned int max_cpus) > > #endif > > } > >+void __init set_nr_sockets(void) > >+{ > >+ /* > >+ * Count the actual cpus in the socket 0 and use it to calculate nr_sockets > >+ * so that the latter will be always >= the actual socket number in the > >+ * system even when APIC IDs from MP table are too sparse. > >+ */ > >+ unsigned int cpus = bitmap_weight(phys_cpu_present_map.mask, > >+ boot_cpu_data.x86_max_cores * > >+ boot_cpu_data.x86_num_siblings); > >+ > >+ if ( cpus == 0 ) > >+ cpus = 1; > >+ > >+ nr_sockets = DIV_ROUND_UP(num_processors + disabled_cpus, cpus); > >+} > >+ > > /* > > * Intel MP BIOS table parsing routines: > > */ > >diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c > >index 2289284..e75bbd3 100644 > >--- a/xen/arch/x86/smpboot.c > >+++ b/xen/arch/x86/smpboot.c > >@@ -60,6 +60,9 @@ DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_mask); > > cpumask_t cpu_online_map __read_mostly; > > EXPORT_SYMBOL(cpu_online_map); > >+unsigned int __read_mostly nr_sockets; > >+cpumask_var_t *__read_mostly socket_cpumask; > >+ > > struct cpuinfo_x86 cpu_data[NR_CPUS]; > > u32 x86_cpu_to_apicid[NR_CPUS] __read_mostly = > >@@ -245,6 +248,8 @@ static void set_cpu_sibling_map(int cpu) > > cpumask_set_cpu(cpu, &cpu_sibling_setup_map); > >+ cpumask_set_cpu(cpu, socket_cpumask[cpu_to_socket(cpu)]); > > This patch crashes Xen on my 32-cpu Intel box here for cpu 16, which is the > first CPU on the second socket (i.e. on socket 1). > > The reason appears to be that cpu_to_socket(16) is (correctly) 1 here, but > ... > > >+ > > if ( c[cpu].x86_num_siblings > 1 ) > > { > > for_each_cpu ( i, &cpu_sibling_setup_map ) > >@@ -649,7 +654,13 @@ void cpu_exit_clear(unsigned int cpu) > > static void cpu_smpboot_free(unsigned int cpu) > > { > >- unsigned int order; > >+ unsigned int order, socket = cpu_to_socket(cpu); > >+ > >+ if ( cpumask_empty(socket_cpumask[socket]) ) > >+ { > >+ free_cpumask_var(socket_cpumask[socket]); > >+ socket_cpumask[socket] = NULL; > >+ } > > free_cpumask_var(per_cpu(cpu_sibling_mask, cpu)); > > free_cpumask_var(per_cpu(cpu_core_mask, cpu)); > >@@ -694,6 +705,7 @@ static int cpu_smpboot_alloc(unsigned int cpu) > > nodeid_t node = cpu_to_node(cpu); > > struct desc_struct *gdt; > > unsigned long stub_page; > >+ unsigned int socket = cpu_to_socket(cpu); > > ... is zero here, meaning that socket_cpumask[1] is NULL. I suspect that > phys_proc_id is probably not set at this point but is by the time we get to > set_cpu_sibling_map(). I haven't looked any further yet. I might do this > tomorrow unless Chao does it before me. Thanks for testing. I think I have found the reason. For AP, phys_proc_id is set in: start_secondary()=>smp_callin()=>smp_store_cpu_info()=>identify_cpu() which is behind cpu_smpboot_alloc() called from CPU_PREPARE. One way would move 'zalloc_cpumask_var(socket_cpumask + socket)' to set_cpu_sibling_map() to fix it if Jan agrees that, otherwise other solution needs to be found. Chao