From mboxrd@z Thu Jan 1 00:00:00 1970
From: Yang Hongyang
Subject: Re: [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
Date: Wed, 10 Jun 2015 13:26:01 +0800
Message-ID: <5577CA69.6090103@cn.fujitsu.com>
References: <1433734997-26570-1-git-send-email-yanghy@cn.fujitsu.com>
 <1433734997-26570-4-git-send-email-yanghy@cn.fujitsu.com>
 <55756468.4090500@citrix.com>
 <55756757.7020900@cn.fujitsu.com>
 <55756B45.8020708@citrix.com>
 <55763A8F.6040608@cn.fujitsu.com>
 <5576960D.5090407@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
In-Reply-To: <5576960D.5090407@citrix.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Andrew Cooper, xen-devel@lists.xen.org
Cc: wei.liu2@citrix.com, ian.campbell@citrix.com, wency@cn.fujitsu.com,
 guijianfeng@cn.fujitsu.com, yunhong.jiang@intel.com, eddie.dong@intel.com,
 rshriram@cs.ubc.ca, ian.jackson@eu.citrix.com
List-Id: xen-devel@lists.xenproject.org

On 06/09/2015 03:30 PM, Andrew Cooper wrote:
> On 09/06/2015 01:59, Yang Hongyang wrote:
>>
>>
>> On 06/08/2015 06:15 PM, Andrew Cooper wrote:
>>> On 08/06/15 10:58, Yang Hongyang wrote:
>>>>
>>>>
>>>> On 06/08/2015 05:46 PM, Andrew Cooper wrote:
>>>>> On 08/06/15 04:43, Yang Hongyang wrote:
>>>>>> ioreq page contains evtchn which will be set when we resume the
>>>>>> secondary vm the first time. The hypervisor will check if the
>>>>>> evtchn is corrupted, so we cannot zero the ioreq page more
>>>>>> than one time.
>>>>>>
>>>>>> The ioreq->state is always STATE_IOREQ_NONE after the vm is
>>>>>> suspended, so it is OK if we only zero it one time.
>>>>>>
>>>>>> Signed-off-by: Yang Hongyang
>>>>>> Signed-off-by: Wen congyang
>>>>>> CC: Andrew Cooper
>>>>>
>>>>> The issue here is that we are running the restore algorithm over a
>>>>> domain which has already been running in Xen for a while. This is a
>>>>> brand new use case, as far as I am aware.
>>>>
>>>> Exactly.
>>>>
>>>>>
>>>>> Does the qemu process associated with this domain get frozen while the
>>>>> secondary is being reset, or does the process get destroyed and
>>>>> recreated?
>>>>
>>>> What do you mean by reset? Do you mean the secondary is suspended at
>>>> a checkpoint?
>>>
>>> Well - at the point that the buffered records are being processed, we
>>> are in the process of resetting the state of the secondary to match the
>>> primary.
>>
>> Yes, at this point, the qemu process associated with this domain is
>> frozen. The suspend callback will call libxl__qmp_stop() (vm_stop() in
>> qemu) to pause qemu. After we have processed all records, qemu will be
>> restored with the received state. That's why we add a
>> libxl__qmp_restore() (qemu_load_vmstate() in qemu) API to restore qemu
>> with the received state. Currently in libxl, qemu only starts with the
>> received state; there's no API to load received state after qemu has
>> been running for a while.
>
> Now I consider this more, it is absolutely wrong to not zero the page
> here. The event channel in the page is not guaranteed to be the same
> between the primary and secondary,

That's why we don't zero it on the secondary.

> and we don't want to unexpectedly
> find a pending/in-flight ioreq.

ioreq->state is always STATE_IOREQ_NONE after the vm is suspended, so
there should be no pending/in-flight ioreq at a checkpoint.

>
> Either qemu needs to take care of re-initialising the event channels
> back to appropriate values, or Xen should tolerate the channels
> disappearing.
>
> ~Andrew
> .
>

-- 
Thanks,
Yang.
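
For illustration only, the "zero the ioreq page only one time" behaviour argued for in this reply could be sketched roughly as below. This is a minimal, self-contained sketch and not the libxc patch itself: the struct layouts, the `ioreq_zeroed` flag, and every `sketch_`/`SKETCH_` name are hypothetical stand-ins, and the page is modelled as an ordinary struct rather than a mapped guest frame.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical constant mirroring the "no request pending" state. */
#define SKETCH_STATE_IOREQ_NONE 0

/* Hypothetical, simplified stand-in for the contents of the ioreq page. */
struct sketch_ioreq_page {
    uint8_t  state;   /* request state; NONE means nothing in flight */
    uint32_t evtchn;  /* event channel set when the secondary first resumes */
};

/* Hypothetical per-restore context that survives across checkpoints. */
struct sketch_restore_ctx {
    bool ioreq_zeroed; /* true once the page has been zeroed */
};

/*
 * Zero the page on the first call only. Later calls (later checkpoints)
 * leave the page alone so the event channel stored in it is preserved.
 * Returns true if the page was zeroed by this call.
 */
static bool sketch_zero_ioreq_once(struct sketch_restore_ctx *ctx,
                                   struct sketch_ioreq_page *page)
{
    if (ctx->ioreq_zeroed)
        return false;

    memset(page, 0, sizeof(*page));
    ctx->ioreq_zeroed = true;
    return true;
}
```

The sketch relies on the claim made above that the state is already STATE_IOREQ_NONE whenever the vm is suspended; if that assumption did not hold, skipping the zeroing at later checkpoints could leave a stale request in the page.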