From mboxrd@z Thu Jan 1 00:00:00 1970 From: Juergen Gross Subject: Re: [PATCH 3/3] xen: cpupools: avoid crashing if shutting down with free CPUs Date: Fri, 08 May 2015 12:47:44 +0200 Message-ID: <554C9450.6080405@suse.com> References: <20150506143312.9537.1503.stgit@Solace.station> <20150506151041.9537.62969.stgit@Solace.station> <554C8DF0.6010304@suse.com> <554CAD480200007800078288@suse.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <554CAD480200007800078288@suse.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Jan Beulich , Dario Faggioli Cc: George Dunlap , Andrew Cooper , Keir Fraser , xen-devel@lists.xen.org List-Id: xen-devel@lists.xenproject.org On 05/08/2015 12:34 PM, Jan Beulich wrote: >>>> On 08.05.15 at 12:20, wrote: >> On 05/06/2015 05:10 PM, Dario Faggioli wrote: >>> in fact, before this change, shutting down or suspending the >>> system with some CPUs not assigned to any cpupool, would >>> crash as follows: >>> >>> (XEN) Xen call trace: >>> (XEN) [] disable_nonboot_cpus+0xb5/0x138 >>> (XEN) [] enter_state_helper+0xbd/0x369 >>> (XEN) [] continue_hypercall_tasklet_handler+0x4a/0xb1 >>> (XEN) [] do_tasklet_work+0x78/0xab >>> (XEN) [] do_tasklet+0x5e/0x8a >>> (XEN) [] idle_loop+0x56/0x6b >>> (XEN) >>> (XEN) >>> (XEN) **************************************** >>> (XEN) Panic on CPU 0: >>> (XEN) Xen BUG at cpu.c:191 >>> (XEN) **************************************** >>> >>> This is because, for free CPUs, -EBUSY were being returned >>> when trying to tear them down, making cpu_down() unhappy. >>> >>> It is certainly unpractical to forbid shutting down or >>> suspenging if there are unassigned CPUs, so this change >>> fixes the above by just avoiding returning -EBUSY for those >>> CPUs. If shutting off, that does not matter much anyway. If >>> suspending, we make sure that the CPUs remain unassigned >>> when resuming. >>> >>> While there, take the chance to: >>> - fix the doc comment of cpupool_cpu_remove() (it was >>> wrong); >>> - improve comments in general around and in cpupool_cpu_remove() >>> and cpupool_cpu_add(); >>> - add a couple of ASSERT()-s for checking consistency. >> >> I did a test with the patches applied. >> >> # xl cpupool-cpu-remove Pool-0 2 >> # echo mem >/sys/power/state >> >> When resuming this resulted in: >> >> (XEN) mce_intel.c:735: MCA Capability: BCAST 1 SER 0 CMCI 1 firstbank 0 >> extended MCE MSR 0 >> (XEN) CPU0 CMCI LVT vector (0xf2) already installed >> (XEN) Finishing wakeup from ACPI S3 state. >> (XEN) Enabling non-boot CPUs ... >> (XEN) Xen BUG at cpu.c:149 >> (XEN) ----[ Xen-4.6-unstable x86_64 debug=y Tainted: C ]---- >> (XEN) CPU: 0 >> (XEN) RIP: e008:[] cpu_up+0xaf/0xfe >> (XEN) RFLAGS: 0000000000010202 CONTEXT: hypervisor >> (XEN) rax: 0000000000008016 rbx: 0000000000000000 rcx: 0000000000000000 > [...] >> (XEN) Xen call trace: >> (XEN) [] cpu_up+0xaf/0xfe >> (XEN) [] enable_nonboot_cpus+0x4f/0xfc >> (XEN) [] enter_state_helper+0x2cb/0x370 >> (XEN) [] continue_hypercall_tasklet_handler+0x4a/0xb1 >> (XEN) [] do_tasklet_work+0x78/0xab >> (XEN) [] do_tasklet+0x5e/0x8a >> (XEN) [] idle_loop+0x56/0x70 >> (XEN) >> (XEN) >> (XEN) **************************************** >> (XEN) Panic on CPU 0: >> (XEN) Xen BUG at cpu.c:149 >> (XEN) **************************************** > > Which would seem to more likely be a result of patch 2. Having > taken a closer look - is setting ret to -EINVAL at the top of > cpupool_cpu_add() really correct? I.e. it is guaranteed that > at least one of the two places altering ret will always be run > into? If it is, then I'd still suspect one of the two > cpupool_assign_cpu_locked() invocations to be failing. Indeed. Setting ret to 0 initially does the trick. With this modification suspend/resume and power off are working with cpus not allocated to any cpupool. Dario, I suggest you write another patch to correct patch 2. For patch 3 with patch 2 corrected: Reviewed-by: Juergen Gross Tested-by: Juergen Gross Juergen