From mboxrd@z Thu Jan 1 00:00:00 1970
From: Yang Hongyang
Subject: Re: [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
Date: Tue, 9 Jun 2015 08:59:59 +0800
Message-ID: <55763A8F.6040608@cn.fujitsu.com>
References: <1433734997-26570-1-git-send-email-yanghy@cn.fujitsu.com> <1433734997-26570-4-git-send-email-yanghy@cn.fujitsu.com> <55756468.4090500@citrix.com> <55756757.7020900@cn.fujitsu.com> <55756B45.8020708@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
In-Reply-To: <55756B45.8020708@citrix.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Andrew Cooper, xen-devel@lists.xen.org
Cc: wei.liu2@citrix.com, ian.campbell@citrix.com, wency@cn.fujitsu.com, guijianfeng@cn.fujitsu.com, yunhong.jiang@intel.com, eddie.dong@intel.com, rshriram@cs.ubc.ca, ian.jackson@eu.citrix.com
List-Id: xen-devel@lists.xenproject.org

On 06/08/2015 06:15 PM, Andrew Cooper wrote:
> On 08/06/15 10:58, Yang Hongyang wrote:
>>
>>
>> On 06/08/2015 05:46 PM, Andrew Cooper wrote:
>>> On 08/06/15 04:43, Yang Hongyang wrote:
>>>> ioreq page contains evtchn which will be set when we resume the
>>>> secondary vm the first time. The hypervisor will check if the
>>>> evtchn is corrupted, so we cannot zero the ioreq page more
>>>> than one time.
>>>>
>>>> The ioreq->state is always STATE_IOREQ_NONE after the vm is
>>>> suspended, so it is OK if we only zero it one time.
>>>>
>>>> Signed-off-by: Yang Hongyang
>>>> Signed-off-by: Wen congyang
>>>> CC: Andrew Cooper
>>>
>>> The issue here is that we are running the restore algorithm over a
>>> domain which has already been running in Xen for a while. This is a
>>> brand new usecase, as far as I am aware.
>>
>> Exactly.
>>
>>>
>>> Does the qemu process associated with this domain get frozen while the
>>> secondary is being reset, or does the process get destroyed and
>>> recreated?
>>
>> What do you mean by reset? Do you mean the secondary is suspended at
>> a checkpoint?
>
> Well - at the point that the buffered records are being processed, we
> are in the process of resetting the state of the secondary to match the
> primary.

Yes, at this point the qemu process associated with this domain is
frozen. The suspend callback calls libxl__qmp_stop() (vm_stop() in qemu)
to pause qemu. After we have processed all records, qemu is restored
with the received state; that is why we add a libxl__qmp_restore() API
(qemu_load_vmstate() in qemu) to restore qemu with the received state.
Currently in libxl, qemu can only start with the received state; there
is no API to load received state after qemu has been running for a
while.

>
> ~Andrew
>
>>
>>>
>>> I have a gut feeling that it would be safer to clear all of the page
>>> other than the event channel, but that depends on exactly what else is
>>> going on. What we absolutely don't want to do is have an update to
>>> this page from the primary with an in-progress IOREQ.
>>>
>>> ~Andrew
>>>
>>>> ---
>>>>  tools/libxc/xc_sr_restore_x86_hvm.c | 3 ++-
>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/tools/libxc/xc_sr_restore_x86_hvm.c b/tools/libxc/xc_sr_restore_x86_hvm.c
>>>> index 6f5af0e..06177e0 100644
>>>> --- a/tools/libxc/xc_sr_restore_x86_hvm.c
>>>> +++ b/tools/libxc/xc_sr_restore_x86_hvm.c
>>>> @@ -78,7 +78,8 @@ static int handle_hvm_params(struct xc_sr_context *ctx,
>>>>              break;
>>>>          case HVM_PARAM_IOREQ_PFN:
>>>>          case HVM_PARAM_BUFIOREQ_PFN:
>>>> -            xc_clear_domain_page(xch, ctx->domid, entry->value);
>>>> +            if ( !ctx->restore.buffer_all_records )
>>>> +                xc_clear_domain_page(xch, ctx->domid, entry->value);
>>>>              break;
>>>>          }
>>>>
>>>
>>> .
>>>
>>
>
> .
>

-- 
Thanks,
Yang.
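
For readers of the archive, the guard in the quoted hunk can be modelled outside of Xen. The following is only a minimal sketch, not libxc code: struct mock_ioreq_page, struct mock_ctx, mock_handle_ioreq_pfn() and mock_run_two_passes() are hypothetical names, the page layout is invented, and memset() stands in for xc_clear_domain_page(). It also assumes, as the patch subject suggests, that buffer_all_records is false during the initial restore and true for later checkpoints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MOCK_PAGE_SIZE 4096

/* Hypothetical stand-in for the ioreq page; NOT the real Xen layout. */
struct mock_ioreq_page {
    uint32_t evtchn;   /* set when the secondary is resumed the first time */
    uint8_t  rest[MOCK_PAGE_SIZE - sizeof(uint32_t)];
};

/* Hypothetical stand-in for the restore context used in the patch. */
struct mock_ctx {
    bool buffer_all_records;   /* assumed: false on first pass, true afterwards */
    struct mock_ioreq_page page;
};

/* Mirrors the guard from the hunk: zero the page only when not buffering. */
static void mock_handle_ioreq_pfn(struct mock_ctx *ctx)
{
    if ( !ctx->buffer_all_records )
        memset(&ctx->page, 0, sizeof(ctx->page));   /* stands in for xc_clear_domain_page() */
}

/* Runs an initial restore followed by one checkpoint; returns the final evtchn. */
static uint32_t mock_run_two_passes(void)
{
    struct mock_ctx ctx;

    memset(&ctx, 0xff, sizeof(ctx));   /* start with a dirty page */
    ctx.buffer_all_records = false;
    mock_handle_ioreq_pfn(&ctx);       /* initial restore: page is zeroed */

    ctx.page.evtchn = 7;               /* secondary resumes, evtchn gets set */
    ctx.buffer_all_records = true;
    mock_handle_ioreq_pfn(&ctx);       /* checkpoint: page is left alone */

    return ctx.page.evtchn;
}
```

In this model the event channel written after the first pass survives the checkpoint pass, which is the property the commit message relies on. Clearing everything except the event channel, as suggested in the review, would need a different routine and is not shown here.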