From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1030959AbbD1VdP (ORCPT ); Tue, 28 Apr 2015 17:33:15 -0400
Received: from aserp1040.oracle.com ([141.146.126.69]:43177 "EHLO
        aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S966026AbbD1VdN (ORCPT );
        Tue, 28 Apr 2015 17:33:13 -0400
Message-ID: <553FFC19.6040301@oracle.com>
Date: Tue, 28 Apr 2015 17:31:05 -0400
From: Boris Ostrovsky
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0
MIME-Version: 1.0
To: "Ouyang Zhaowei (Charles)" , Konrad Rzeszutek Wilk , David Vrabel
CC: linux-kernel@vger.kernel.org, Dingweiping , Yanqiangjun , jinjian@huawei.com
Subject: Re: [PATCH] xen: vcpu_info reinit error after 'xl save -c' & 'xl restore' on PVOPS VM which has multi-cpu
References: <553A0D49.2020300@huawei.com> <553C23EE.9090101@oracle.com> <553F7D80.3010409@huawei.com>
In-Reply-To: <553F7D80.3010409@huawei.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Source-IP: aserv0022.oracle.com [141.146.126.234]
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 04/28/2015 08:30 AM, Ouyang Zhaowei (Charles) wrote:
>
> On 2015.4.26 7:31, Boris Ostrovsky wrote:
>> On 04/24/2015 05:30 AM, Ouyang Zhaowei (Charles) wrote:
>>> If a PVOPS VM has multi-cpu the vcpu_info of cpu0 is the member of the structure HYPERVISOR_shared_info,
>>> and the others is not, but after 'xl save -c/restore' the vcpu_info will be reinitialized,
>>> the vcpu_info of all the vcpus will be considered as the member of HYPERVISOR_shared_info.
>>> This will cause the cpu1 and other cpu keep receiving interrupts, and the cpu0 is waiting them to
>>> finish the job.
>>> So we do not reinit the vcpu_info when PVOPS vm is doing 'xl save -c/restore'.
>>>
>>> Signed-off-by: Charles Ouyang
>>> ---
>>>  arch/x86/xen/suspend.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
>>> index d949769..b2bed45 100644
>>> --- a/arch/x86/xen/suspend.c
>>> +++ b/arch/x86/xen/suspend.c
>>> @@ -32,7 +32,8 @@ static void xen_hvm_post_suspend(int suspend_cancelled)
>>>  {
>>>  #ifdef CONFIG_XEN_PVHVM
>>>          int cpu;
>>> -        xen_hvm_init_shared_info();
>>> +        if (!suspend_cancelled)
>>> +                xen_hvm_init_shared_info();
>>>          xen_callback_vector();
>>>          xen_unplug_emulated_devices();
>>>          if (xen_feature(XENFEAT_hvm_safe_pvclock)) {
>> Do we need to call other routines if suspend is canceled?
>>
>> Also, if suspend is canceled then we don't do xen_irq_resume() if that's what you meant by "vcpu_info will be reinitialized". Were you referring some other re-initialization?
>>
> Hi Boris,
>
> Sorry I didn't make myself clear.
>
> About the "vcpu_info reinitialize", I mean in the function "xen_hvm_init_shared_info()" the pointer "xen_vcpu" will be reset and all
> point to HYPERVISOR_shared_info->vcpu_info[cpu].
>
> void __ref xen_hvm_init_shared_info(void)
> ----
>          * When xen_hvm_init_shared_info is run at boot time only vcpu 0 is
>          * online but xen_hvm_init_shared_info is run at resume time too and
>          * in that case multiple vcpus might be online. */
>         for_each_online_cpu(cpu) {
>                 /* Leave it to be NULL. */
>                 if (cpu >= MAX_VIRT_CPUS)
>                         continue;
>                 per_cpu(xen_vcpu, cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu];
>         }
> }
>
>
> But on Xen boot the init function "xen_start_kernel" only set the cpu0 to point to HYPERVISOR_shared_info->vcpu_info[0]
>
> asmlinkage __visible void __init xen_start_kernel(void)

We are talking about HVM guests here and xen_start_kernel is only called for PV. But even if it was, xen_vcpu pointers for other VCPUs are set in xen_vcpu_setup(), which is called when non-boot VCPUs are coming up.
And I wonder whether the actual problem is that we don't call xen_vcpu_setup() on canceled suspend (as we don't need to, really) and therefore if we call xen_hvm_init_shared_info() then per_cpu(xen_vcpu,cpu) for *non-boot* cpus will become wrong.

-boris

> ----
>         /* Don't do the full vcpu_info placement stuff until we have a
>            possible map and a non-dummy shared_info. */
>         per_cpu(xen_vcpu, 0) = &HYPERVISOR_shared_info->vcpu_info[0];
>
>         local_irq_disable();
>
> Other cpus are set to point to "xen_vcpu_info" in function xen_vcpu_setup().
>
> So after xl save -c/restore, the pointer xen_vcpu will be reset in function "xen_hvm_init_shared_info" and point to a wrong place.
> This may cause all the cpus cannot handle irqs except cpu0, so IMHO it's not necessary to call xen_hvm_init_shared_info again if
> suspend is canceled.
>
>> (The patch itself looks like the right thing to do though).
>>
>> -boris
>>
>> .
>>
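
To make the suspicion above concrete (that re-running xen_hvm_init_shared_info() on a canceled suspend overwrites per_cpu(xen_vcpu, cpu) for non-boot CPUs that xen_vcpu_setup() had pointed elsewhere), here is a rough userspace model. It is not kernel code and only mirrors the sequence as described in this thread. Every name in it (the model_* functions, MODEL_NR_CPUS, the two arrays, the skip_on_cancel flag) is a made-up stand-in for illustration, and whether the real boot CPU ends up in the shared area or in per-cpu storage is not modeled.

```c
#include <assert.h>

#define MODEL_NR_CPUS 4	/* hypothetical CPU count for the model */

struct vcpu_info_model { int pending; };

/* Stand-in for HYPERVISOR_shared_info->vcpu_info[] */
static struct vcpu_info_model shared_vcpu_info[MODEL_NR_CPUS];
/* Stand-in for the per-cpu xen_vcpu_info storage */
static struct vcpu_info_model percpu_vcpu_info[MODEL_NR_CPUS];
/* Stand-in for per_cpu(xen_vcpu, cpu) */
static struct vcpu_info_model *xen_vcpu_model[MODEL_NR_CPUS];

/* Models the loop in xen_hvm_init_shared_info(): every CPU is pointed
 * at its slot in the shared area. */
void model_init_shared_info(void)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		xen_vcpu_model[cpu] = &shared_vcpu_info[cpu];
}

/* Models xen_vcpu_setup() as described in the thread: the CPU is
 * re-pointed at its per-cpu vcpu_info. */
void model_vcpu_setup(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	xen_vcpu_model[cpu] = &percpu_vcpu_info[cpu];
}

/* Boot: shared-info init first, then each non-boot CPU comes up. */
void model_boot(void)
{
	model_init_shared_info();
	for (int cpu = 1; cpu < MODEL_NR_CPUS; cpu++)
		model_vcpu_setup(cpu);
}

/* Post-suspend hook. skip_on_cancel != 0 models the proposed patch;
 * note that nothing here re-runs model_vcpu_setup(). */
void model_post_suspend(int suspend_cancelled, int skip_on_cancel)
{
	if (skip_on_cancel && suspend_cancelled)
		return;
	model_init_shared_info();
}

/* Number of non-boot CPUs whose pointer no longer targets the
 * per-cpu area set up at boot. */
int model_count_clobbered(void)
{
	int n = 0;

	for (int cpu = 1; cpu < MODEL_NR_CPUS; cpu++)
		if (xen_vcpu_model[cpu] != &percpu_vcpu_info[cpu])
			n++;
	return n;
}
```

In this model, model_boot() followed by model_post_suspend(1, 0) leaves model_count_clobbered() at MODEL_NR_CPUS - 1, while model_post_suspend(1, 1) leaves it at 0. That matches the behaviour the patch is after, but it says nothing about whether the other calls in xen_hvm_post_suspend() are needed on a canceled suspend.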