From: Wei Liu
Subject: Re: [PATCH v8 05/13] tools/libxc: support to resume uncooperative HVM guests
Date: Fri, 19 Feb 2016 15:16:27 +0000
Message-ID: <20160219151627.GU3723@citrix.com>
References: <1455763403-18641-1-git-send-email-wency@cn.fujitsu.com>
 <1455763403-18641-6-git-send-email-wency@cn.fujitsu.com>
 <20160218121336.GG3723@citrix.com>
 <20160219141537.GD31079@localhost.localdomain>
 <20160219144350.GT3723@citrix.com>
 <1455893531.6225.106.camel@citrix.com>
In-Reply-To: <1455893531.6225.106.camel@citrix.com>
To: Ian Campbell
Cc: Lars Kurth, Changlong Xie, Wei Liu, Dong Eddie, Wen Congyang,
 Andrew Cooper, Jiang Yunhong, Ian Jackson, xen devel, Gui Jianfeng,
 Shriram Rajagopalan, Yang Hongyang
List-Id: xen-devel@lists.xenproject.org

On Fri, Feb 19, 2016 at 02:52:11PM +0000, Ian Campbell wrote:
> On Fri, 2016-02-19 at 14:43 +0000, Wei Liu wrote:
> > On Fri, Feb 19, 2016 at 09:15:38AM -0500, Konrad Rzeszutek Wilk wrote:
> > > On Thu, Feb 18, 2016 at 12:13:36PM +0000, Wei Liu wrote:
> > > > On Thu, Feb 18, 2016 at 10:43:15AM +0800, Wen Congyang wrote:
> > > > > Before this patch:
> > > > > 1. suspend
> > > > > a. PVHVM and PV: we use the same way to suspend the guest (send the
> > > > >    suspend request to the guest). If the guest doesn't support
> > > > >    evtchn, the xenstore variant will be used, suspending the guest
> > > > >    via XenBus control node.
> > > > > b. pure HVM: we call xc_domain_shutdown(..., SHUTDOWN_suspend) to
> > > > >    suspend the guest
> > > > >
> > > > > 2. Resume:
> > > > > a. fast path(fast=1)
> > > > >    Do not change the guest state.
> > > > >    We call libxl__domain_resume(.., 1) which
> > > > >    calls xc_domain_resume(..., 1 /* fast=1*/) to resume the guest.
> > > > >    PV:       modify the return code to 1, and then call the domctl:
> > > > >              XEN_DOMCTL_resumedomain
> > > > >    PVHVM:    same as PV
> > > > >    pure HVM: do nothing in modify_returncode, and then call the
> > > > >              domctl: XEN_DOMCTL_resumedomain
> > > > > b. slow
> > > > >    Used when the guest's state has been changed. Will call
> > > > >    libxl__domain_resume(..., 0) to resume the guest.
> > > > >    PV:       update start info, and reset all secondary CPU states.
> > > > >              Then call the domctl: XEN_DOMCTL_resumedomain
> > > > >    PVHVM:    can not be resumed. You will get the following error
> > > > >              message:
> > > > >                  "Cannot resume uncooperative HVM guests"
> > > > >    pure HVM: same as PVHVM
> > > > >
> > > > > After this patch:
> > > > > 1. suspend
> > > > >    unchanged
> > > > >
> > > > > 2. Resume
> > > > > a. fast path:
> > > > >    unchanged
> > > > > b. slow
> > > > >    PV:       unchanged
> > > > >    PVHVM:    call XEN_DOMCTL_resumedomain to resume the guest.
> > > > >              Because we don't modify the return code, the PV driver
> > > > >              will disconnect and reconnect.
> > > > >              The guest ends up doing the XENMAPSPACE_shared_info
> > > > >              XENMEM_add_to_physmap hypercall and resetting all of
> > > > >              its CPU states to point to the shared_info(well except
> > > > >              the ones past 32).
> > > > >              That is the Linux kernel does that - regardless whether
> > > > >              the SCHEDOP_shutdown:SHUTDOWN_suspend returns 1 or not.
> > > > >    Pure HVM: call XEN_DOMCTL_resumedomain to resume the guest.
> > > > >
> > > > > Under COLO, we will update the guest's state(modify memory, cpu's
> > > > > registers, device status...). In this case, we cannot use the fast
> > > > > path to resume it. Keep the return code 0, and use a slow path to
> > > > > resume the guest. While resuming HVM using slow path is not
> > > > > supported currently, this patch is to make the resume call to not
> > > > > fail.
> > > > >
> > > > > Signed-off-by: Wen Congyang
> > > > > Signed-off-by: Yang Hongyang
> > > > > Reviewed-by: Konrad Rzeszutek Wilk
> > > >
> > > > I proposed an alternative commit log in a previous reply:
> > > >
> > > > ===
> > > > Use XEN_DOMCTL_resumedomain to resume (PV)HVM guest in slow path
> > > >
> > > > Previously it was not possible to resume PVHVM or pure HVM guest in
> > > > slow path because libxc didn't support that.
> > > >
> > > > Using XEN_DOMCTL_resumedomain without modifying guest return code to
> > > > resume a guest is considered to be always safe.  Introduce a function
> > > > to do that for (PV)HVM guests in slow path resume.
> > > >
> > > > This patch fixes a bug that denies (PV)HVM slow path resume.  This
> > > > will enable COLO to work properly:  COLO requires HVM guest to start
> > > > in the new context that has been set up by COLO, hence slow path
> > > > resume is required.
> > > > ===
> > > >
> > > > Note that I fix one place in this version from "guest state" to
> > > > "guest return code" in the second paragraph. And that sentence is a
> > > > big big assumption that I don't know whether it is true or not --
> > > > reverse-engineer from comment before xc_domain_resume and what Linux
> > > > does.
> > > >
> > > > But the more I think the more I'm not sure if I'm writing the right
> > > > thing. I also can't judge what is the right behaviour on the Linux
> > > > side.
> > > >
> > > > Konrad, can you fact-check the commit message a bit? And maybe you
> > > > can help answer the following questions?
> > > >
> > > > 1. If we use fast=0 on PVHVM guest, will it work?
> > >
> > > Yes.
> > > > 2. If we use fast=0 on HVM guest, will it work?
> > >
> > > Yes.
> > >
> > > >
> > > > What is worse, when I say "work" I actually have no clear definition
> > > > of it. There doesn't seem to be a defined state that the guest needs
> > > > to be.
> > >
> > > For PVHVM guests, fast = 0, requires that the guest makes an hypercall
> > > to SCHEDOP_shutdown(SHUTDOWN_suspend). After the hypercall has
> > > completed (so Xen has suspended the guest then later resumed it), it
> > > would be the guest responsibility to setup Xen infrastructure. As in
> > > retrieve the shared_info (XENMAPSPACE_shared_info), setup XenBus, etc.
> > >
> > > For HVM guests, fast = 0, suspends the guests without the guest making
> > > any hypercalls. It is in effect the hypervisor injecting an S3 suspend.
> > > Afterwards the guest is resumed and continues as usual.
> > > No PV drivers - hence no need to re-establish Xen PV infrastructure.
> > >
> >
> > Wait, isn't this function about resuming a guest? I'm confused because
> > you talk about HV injecting S3 suspend. I guess you wrote the wrong
> > thing?
> >
> > My guess is below, from the perspective of resuming a guest
> >
> >   PVHVM guest would have used SCHEDOP_shutdown(SHUTDOWN_suspend) to
> >   suspend. So when toolstack uses fast=0, the guest resumes from the
> >   hypercall with return code unmodified. Guest then re-setup Xen
> >   infrastructure.
>
> Who or what has torn down the existing infrastructure from the guest's life
> before the suspend in this case? AFAI Remember a guest expects to return
> from SCHEDOP_shutdown(SHUTDOWN_suspend) with return code == 0 in a freshly
> minted new domain, but in the resume case it is actually resuming in the
> original domain, complete with any evtchn's and grant tables mappings etc
> still intact from before it slept.
>
> Perhaps I'm misremembering and the guest is expected to deal with the
> possibility of resources already being in place when it re-sets up the
> infra?
>

Sigh, this is the sort of thing that gets on my nerves. I should try to
write something down when we come to a conclusion.

I would be happy to have a definite answer on the expected behaviour of
the guest. Extrapolation is not very helpful in the face of so many
different versions of Linux and the BSDs.

But, if the confusion is only about PVHVM guests with fast=0, we can
forbid that specific combination for now. That should be enough to move
COLO forward.

Wei.

> Ian.
>
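
For reference, the resume behaviour described in this thread, plus the
option of forbidding PVHVM with fast=0, can be modelled roughly as below.
This is only a toy Python model, not libxc code: GuestType, resume_steps
and forbid_pvhvm_slow are made-up names, and the step strings merely
paraphrase the quoted commit message.

```python
from enum import Enum


class GuestType(Enum):
    PV = "PV"
    PVHVM = "PVHVM"
    HVM = "pure HVM"


def resume_steps(guest, fast, forbid_pvhvm_slow=False):
    """Return the ordered resume steps for a guest, as described in the thread.

    Hypothetical model only; it does not call into Xen or libxc.
    """
    if fast:
        steps = []
        # Fast path: PV and PVHVM get the suspend return code set to 1;
        # pure HVM has nothing to modify.
        if guest is not GuestType.HVM:
            steps.append("set suspend return code to 1")
        steps.append("XEN_DOMCTL_resumedomain")
        return steps

    # Slow path (fast=0): the return code is left unmodified.
    if guest is GuestType.PV:
        return [
            "update start info",
            "reset secondary CPU states",
            "XEN_DOMCTL_resumedomain",
        ]
    if guest is GuestType.PVHVM and forbid_pvhvm_slow:
        # The combination whose guest-side behaviour is still unclear.
        raise ValueError("slow-path resume of PVHVM guests is not allowed")
    # After the patch: (PV)HVM guests are simply resumed via the domctl.
    return ["XEN_DOMCTL_resumedomain"]
```

With forbid_pvhvm_slow left at its default the model follows the "after
this patch" table; setting it rejects only the PVHVM slow path, leaving
pure HVM (the COLO case) untouched.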