From mboxrd@z Thu Jan 1 00:00:00 1970
From: Konrad Rzeszutek Wilk
Subject: Re: [PATCH v8 05/13] tools/libxc: support to resume uncooperative HVM guests
Date: Fri, 19 Feb 2016 11:20:08 -0500
Message-ID: <20160219162007.GD31685@char.us.oracle.com>
References: <1455763403-18641-1-git-send-email-wency@cn.fujitsu.com>
 <1455763403-18641-6-git-send-email-wency@cn.fujitsu.com>
 <20160218121336.GG3723@citrix.com>
 <20160219141537.GD31079@localhost.localdomain>
 <20160219144350.GT3723@citrix.com>
 <1455893531.6225.106.camel@citrix.com>
 <20160219151627.GU3723@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline
In-Reply-To: <20160219151627.GU3723@citrix.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Wei Liu
Cc: Lars Kurth, Changlong Xie, Ian Campbell, Wen Congyang, Andrew Cooper,
 Jiang Yunhong, Dong Eddie, xen devel, Gui Jianfeng, Shriram Rajagopalan,
 Ian Jackson, Yang Hongyang
List-Id: xen-devel@lists.xenproject.org

On Fri, Feb 19, 2016 at 03:16:27PM +0000, Wei Liu wrote:
> On Fri, Feb 19, 2016 at 02:52:11PM +0000, Ian Campbell wrote:
> > On Fri, 2016-02-19 at 14:43 +0000, Wei Liu wrote:
> > > On Fri, Feb 19, 2016 at 09:15:38AM -0500, Konrad Rzeszutek Wilk wrote:
> > > > On Thu, Feb 18, 2016 at 12:13:36PM +0000, Wei Liu wrote:
> > > > > On Thu, Feb 18, 2016 at 10:43:15AM +0800, Wen Congyang wrote:
> > > > > > Before this patch:
> > > > > > 1. suspend
> > > > > > a. PVHVM and PV: we use the same way to suspend the guest (send the suspend
> > > > > >    request to the guest). If the guest doesn't support evtchn, the xenstore
> > > > > >    variant will be used, suspending the guest via XenBus control node.
> > > > > > b. pure HVM: we call xc_domain_shutdown(..., SHUTDOWN_suspend) to suspend
> > > > > >    the guest
> > > > > >
> > > > > > 2. Resume:
> > > > > > a. fast path(fast=1)
> > > > > >    Do not change the guest state. We call libxl__domain_resume(.., 1) which
> > > > > >    calls xc_domain_resume(..., 1 /* fast=1*/) to resume the guest.
> > > > > >    PV:       modify the return code to 1, and then call the domctl:
> > > > > >              XEN_DOMCTL_resumedomain
> > > > > >    PVHVM:    same as PV
> > > > > >    pure HVM: do nothing in modify_returncode, and then call the domctl:
> > > > > >              XEN_DOMCTL_resumedomain
> > > > > > b. slow
> > > > > >    Used when the guest's state has been changed. Will call
> > > > > >    libxl__domain_resume(..., 0) to resume the guest.
> > > > > >    PV:       update start info, and reset all secondary CPU states. Then call
> > > > > >              the domctl: XEN_DOMCTL_resumedomain
> > > > > >    PVHVM:    cannot be resumed. You will get the following error message:
> > > > > >                  "Cannot resume uncooperative HVM guests"
> > > > > >    pure HVM: same as PVHVM
> > > > > >
> > > > > > After this patch:
> > > > > > 1. suspend
> > > > > >    unchanged
> > > > > >
> > > > > > 2. Resume
> > > > > > a. fast path:
> > > > > >    unchanged
> > > > > > b. slow
> > > > > >    PV:       unchanged
> > > > > >    PVHVM:    call XEN_DOMCTL_resumedomain to resume the guest.
> > > > > >              Because we don't modify the return code, the PV driver will
> > > > > >              disconnect and reconnect.
> > > > > >              The guest ends up doing the XENMAPSPACE_shared_info
> > > > > >              XENMEM_add_to_physmap hypercall and resetting all of its CPU
> > > > > >              states to point to the shared_info(well except the ones past 32).
> > > > > >              That is the Linux kernel does that - regardless whether the
> > > > > >              SCHEDOP_shutdown:SHUTDOWN_suspend returns 1 or not.
> > > > > >    Pure HVM: call XEN_DOMCTL_resumedomain to resume the guest.
> > > > > >
> > > > > > Under COLO, we will update the guest's state(modify memory, cpu's registers,
> > > > > > device status...). In this case, we cannot use the fast path to resume it.
> > > > > > Keep the return code 0, and use a slow path to resume the guest. While
> > > > > > resuming HVM using slow path is not supported currently, this patch is to
> > > > > > make the resume call to not fail.
> > > > > >
> > > > > > Signed-off-by: Wen Congyang
> > > > > > Signed-off-by: Yang Hongyang
> > > > > > Reviewed-by: Konrad Rzeszutek Wilk
> > > > >
> > > > > I proposed an alternative commit log in a previous reply:
> > > > >
> > > > > ===
> > > > > Use XEN_DOMCTL_resumedomain to resume (PV)HVM guest in slow path
> > > > >
> > > > > Previously it was not possible to resume PVHVM or pure HVM guest in slow
> > > > > path because libxc didn't support that.
> > > > >
> > > > > Using XEN_DOMCTL_resumedomain without modifying guest return code to resume a
> > > > > guest is considered to be always safe.  Introduce a function to do that for
> > > > > (PV)HVM guests in slow path resume.
> > > > >
> > > > > This patch fixes a bug that denies (PV)HVM slow path resume.  This will
> > > > > enable COLO to work properly:  COLO requires HVM guest to start in the
> > > > > new context that has been set up by COLO, hence slow path resume is
> > > > > required.
> > > > > ===
> > > > >
> > > > > Note that I fix one place in this version from "guest state" to "guest
> > > > > return code" in the second paragraph. And that sentence is a big big
> > > > > assumption that I don't know whether it is true or not --
> > > > > reverse-engineer from comment before xc_domain_resume and what Linux
> > > > > does.
> > > > >
> > > > > But the more I think the more I'm not sure if I'm writing the right
> > > > > thing. I also can't judge what is the right behaviour on the Linux
> > > > > side.
> > > > >
> > > > > Konrad, can you fact-check the commit message a bit? And maybe you can
> > > > > help answer the following questions?
> > > > >
> > > > > 1. If we use fast=0 on PVHVM guest, will it work?
> > > >
> > > > Yes.
> > > > > 2. If we use fast=0 on HVM guest, will it work?
> > > >
> > > > Yes.
> > > >
> > > > >
> > > > > What is worse, when I say "work" I actually have no clear definition of
> > > > > it. There doesn't seem to be a defined state that the guest needs to
> > > > > be.
> > > >
> > > > For PVHVM guests, fast = 0, requires that the guest makes a hypercall
> > > > to SCHEDOP_shutdown(SHUTDOWN_suspend). After the hypercall has
> > > > completed (so Xen has suspended the guest then later resumed it), it
> > > > would be the guest's responsibility to set up Xen infrastructure. As in
> > > > retrieve the shared_info (XENMAPSPACE_shared_info), set up XenBus, etc.
> > > >
> > > > For HVM guests, fast = 0, suspends the guests without the guest making
> > > > any hypercalls. It is in effect the hypervisor injecting an S3 suspend.
> > > > Afterwards the guest is resumed and continues as usual. No PV drivers -
> > > > hence no need to re-establish Xen PV infrastructure.
> > > >
> > >
> > > Wait, isn't this function about resuming a guest? I'm confused because
> > > you talk about HV injecting S3 suspend. I guess you wrote the wrong
> > > thing?

I was writing the whole chain - suspend, and then resume. This patch
is about resume - but to get to resume you need to suspend first.

> > >
> > > My guess is below, from the perspective of resuming a guest
> > >
> > >   PVHVM guest would have used SCHEDOP_shutdown(SHUTDOWN_suspend) to
> > >   suspend. So when toolstack uses fast=0, the guest resumes from the
> > >   hypercall with return code unmodified. Guest then re-setup Xen
> > >   infrastructure.
> >
> > Who or what has torn down the existing infrastructure from the guest's life
> > before the suspend in this case? AFAI Remember a guest expects to return

The guest. Or it can ignore it and just re-init all its settings.

> > from SCHEDOP_shutdown(SHUTDOWN_suspend) with return code == 0 in a freshly
> > minted new domain, but in the resume case it is actually resuming in the
> > original domain, complete with any evtchn's and grant tables mappings etc
> > still intact from before it slept.
> >
> > Perhaps I'm misremembering and the guest is expected to deal with the
> > possibility of resources already being in place when it re-sets up the
> > infra?

Correct - albeit all of them are stale. Though on some off-chance they
may be set correctly.

> >
>
> Sigh, this is that sort of things that get to my nerves. I should try to
> write something down when we come to a conclusion. I would be happy to
> have any definite answer to the expected behaviour of guest.
> Extrapolation is not very helpful in the face of so many different
> versions of Linux'es and BSDs.
>
> But, if the confusion is only about PVHVM guest with fast=0, we can
> forbid that specific combination for now. That should be enough to move
> COLO forward.

.. forbid what? PVHVM resuming with fast=0? Why? Because the guest may fall
on its face?

>
> Wei.
>
> > Ian.
> >
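
For reference, the resume behaviour described in the quoted commit message
can be condensed into a small decision function. The C sketch below only
restates that table and is not libxc code: the names guest_kind,
resume_action and pick_resume_action are hypothetical and exist nowhere in
libxc or libxl. Only XEN_DOMCTL_resumedomain and the error string come from
the thread.

```c
#include <stdbool.h>

/* Hypothetical types for illustration only; not part of libxc/libxl. */
enum guest_kind { GUEST_PV, GUEST_PVHVM, GUEST_HVM };

enum resume_action {
    /* Set the suspend return code to 1, then XEN_DOMCTL_resumedomain. */
    ACT_MODIFY_RC_THEN_DOMCTL,
    /* XEN_DOMCTL_resumedomain only; return code left untouched. */
    ACT_DOMCTL_ONLY,
    /* Update start info, reset secondary CPUs, then the domctl. */
    ACT_PV_SLOW,
    /* "Cannot resume uncooperative HVM guests" */
    ACT_FAIL
};

/*
 * Map (guest kind, fast flag) to the action described in the commit
 * message. 'patched' selects the behaviour after the patch; this flag
 * is an assumption made so that both columns fit in one function.
 */
enum resume_action pick_resume_action(enum guest_kind kind, bool fast,
                                      bool patched)
{
    if (fast) {
        /* Fast path is unchanged by the patch. */
        return kind == GUEST_HVM ? ACT_DOMCTL_ONLY
                                 : ACT_MODIFY_RC_THEN_DOMCTL;
    }

    /* Slow path: PV is unchanged by the patch. */
    if (kind == GUEST_PV)
        return ACT_PV_SLOW;

    /* Slow path for PVHVM and pure HVM is what the patch changes. */
    return patched ? ACT_DOMCTL_ONLY : ACT_FAIL;
}
```

Under these assumptions, the only rows that differ between the unpatched
and patched cases are the slow-path ones for PVHVM and pure HVM, which is
the combination being questioned above.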