From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andrew Cooper Subject: Re: [PATCH 2/4] xen: x86 / cpupool: clear the proper cpu_valid bit on pCPU teardown Date: Thu, 25 Jun 2015 15:20:01 +0100 Message-ID: <558C0E11.6050009@citrix.com> References: <20150625103457.3353.39292.stgit@Solace.station> <20150625121520.3353.30808.stgit@Solace.station> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: Received: from mail6.bemta3.messagelabs.com ([195.245.230.39]) by lists.xen.org with esmtp (Exim 4.72) (envelope-from ) id 1Z880d-0007xf-6w for xen-devel@lists.xenproject.org; Thu, 25 Jun 2015 14:20:07 +0000 In-Reply-To: <20150625121520.3353.30808.stgit@Solace.station> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Dario Faggioli , xen-devel@lists.xenproject.org Cc: Juergen Gross , Jan Beulich List-Id: xen-devel@lists.xenproject.org On 25/06/15 13:15, Dario Faggioli wrote: > In fact, if a pCPU belonging to some other pool than > cpupool0 goes down, we want to clear the relevant bit > from its actual pool, rather than always from cpupool0. This sentence is a little hard to parse. I presume you mean "use the correct cpupools valid mask, rather than cpupool0's". For the change itself, I definitely agree that it needs fixing. > > Before this commit, all the pCPUs in the non-default > pool(s) will be considered immediately valid, during > system resume, even the one that have not been brought > up yet. As a result, the (Credit1) scheduler will attempt > to run its load balancing logic on them, causing the > following Oops: > > # xl cpupool-cpu-remove Pool-0 8-15 > # xl cpupool-create name=\"Pool-1\" > # xl cpupool-cpu-add Pool-1 8-15 > --> suspend > --> resume > (XEN) ----[ Xen-4.6-unstable x86_64 debug=y Tainted: C ]---- > (XEN) CPU: 8 > (XEN) RIP: e008:[] csched_schedule+0x4be/0xb97 > (XEN) RFLAGS: 0000000000010087 CONTEXT: hypervisor > (XEN) rax: 80007d2f7fccb780 rbx: 0000000000000009 rcx: 0000000000000000 > (XEN) rdx: ffff82d08031ed40 rsi: ffff82d080334980 rdi: 0000000000000000 > (XEN) rbp: ffff83010000fe20 rsp: ffff83010000fd40 r8: 0000000000000004 > (XEN) r9: 0000ffff0000ffff r10: 00ff00ff00ff00ff r11: 0f0f0f0f0f0f0f0f > (XEN) r12: ffff8303191ea870 r13: ffff8303226aadf0 r14: 0000000000000009 > (XEN) r15: 0000000000000008 cr0: 000000008005003b cr4: 00000000000026f0 > (XEN) cr3: 00000000dba9d000 cr2: 0000000000000000 > (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008 > (XEN) ... ... ... > (XEN) Xen call trace: > (XEN) [] csched_schedule+0x4be/0xb97 > (XEN) [] schedule+0x12a/0x63c > (XEN) [] __do_softirq+0x82/0x8d > (XEN) [] do_softirq+0x13/0x15 > (XEN) [] idle_loop+0x5b/0x6b > (XEN) > (XEN) **************************************** > (XEN) Panic on CPU 8: > (XEN) GENERAL PROTECTION FAULT > (XEN) [error_code=0000] > (XEN) **************************************** What is the actual cause of the #GP fault? There are no obviously poised registers. Is it something we should modify to be a BUG or ASSERT? > > Signed-off-by: Dario Faggioli > --- > Cc: Juergen Gross > Cc: Jan Beulich > Cc: Andrew Cooper > --- > xen/arch/x86/smpboot.c | 1 - > xen/common/cpupool.c | 2 ++ > 2 files changed, 2 insertions(+), 1 deletion(-) > > diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c > index 2289284..a4ec396 100644 > --- a/xen/arch/x86/smpboot.c > +++ b/xen/arch/x86/smpboot.c > @@ -887,7 +887,6 @@ void __cpu_disable(void) > remove_siblinginfo(cpu); > > /* It's now safe to remove this processor from the online map */ > - cpumask_clear_cpu(cpu, cpupool0->cpu_valid); > cpumask_clear_cpu(cpu, &cpu_online_map); > fixup_irqs(); > > diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c > index 5471f93..b48ae17 100644 > --- a/xen/common/cpupool.c > +++ b/xen/common/cpupool.c > @@ -530,6 +530,7 @@ static int cpupool_cpu_remove(unsigned int cpu) > if ( cpumask_test_cpu(cpu, (*c)->cpu_valid ) ) > { > cpumask_set_cpu(cpu, (*c)->cpu_suspended); > + cpumask_clear_cpu(cpu, (*c)->cpu_valid); > break; > } > } > @@ -552,6 +553,7 @@ static int cpupool_cpu_remove(unsigned int cpu) > * If we are not suspending, we are hot-unplugging cpu, and that is > * allowed only for CPUs in pool0. > */ > + cpumask_clear_cpu(cpu, cpupool0->cpu_valid); > ret = 0; > } > >