xen-devel.lists.xenproject.org archive mirror
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Wei Liu <wei.liu2@citrix.com>
Cc: Lars Kurth <lars.kurth@citrix.com>,
	Changlong Xie <xiecl.fnst@cn.fujitsu.com>,
	Ian Campbell <ian.campbell@citrix.com>,
	Wen Congyang <wency@cn.fujitsu.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Jiang Yunhong <yunhong.jiang@intel.com>,
	Dong Eddie <eddie.dong@intel.com>,
	xen devel <xen-devel@lists.xen.org>,
	Gui Jianfeng <guijianfeng@cn.fujitsu.com>,
	Shriram Rajagopalan <rshriram@cs.ubc.ca>,
	Ian Jackson <ian.jackson@eu.citrix.com>,
	Yang Hongyang <hongyang.yang@easystack.cn>
Subject: Re: [PATCH v8 05/13] tools/libxc: support to resume uncooperative HVM guests
Date: Fri, 19 Feb 2016 11:20:08 -0500
Message-ID: <20160219162007.GD31685@char.us.oracle.com>
In-Reply-To: <20160219151627.GU3723@citrix.com>

On Fri, Feb 19, 2016 at 03:16:27PM +0000, Wei Liu wrote:
> On Fri, Feb 19, 2016 at 02:52:11PM +0000, Ian Campbell wrote:
> > On Fri, 2016-02-19 at 14:43 +0000, Wei Liu wrote:
> > > On Fri, Feb 19, 2016 at 09:15:38AM -0500, Konrad Rzeszutek Wilk wrote:
> > > > On Thu, Feb 18, 2016 at 12:13:36PM +0000, Wei Liu wrote:
> > > > > On Thu, Feb 18, 2016 at 10:43:15AM +0800, Wen Congyang wrote:
> > > > > > > Before this patch:
> > > > > > > 1. Suspend
> > > > > > > a. PVHVM and PV: we use the same way to suspend the guest (send the
> > > > > > >    suspend request to the guest). If the guest doesn't support evtchn,
> > > > > > >    the xenstore variant will be used, suspending the guest via the
> > > > > > >    XenBus control node.
> > > > > > > b. pure HVM: we call xc_domain_shutdown(..., SHUTDOWN_suspend) to
> > > > > > >    suspend the guest.
> > > > > > > 
> > > > > > > 2. Resume
> > > > > > > a. fast path (fast=1)
> > > > > > >    Do not change the guest state. We call libxl__domain_resume(.., 1),
> > > > > > >    which calls xc_domain_resume(..., 1 /* fast=1 */) to resume the guest.
> > > > > > >    PV:       modify the return code to 1, and then call the domctl
> > > > > > >              XEN_DOMCTL_resumedomain
> > > > > > >    PVHVM:    same as PV
> > > > > > >    pure HVM: do nothing in modify_returncode, and then call the domctl
> > > > > > >              XEN_DOMCTL_resumedomain
> > > > > > > b. slow path (fast=0)
> > > > > > >    Used when the guest's state has been changed. Will call
> > > > > > >    libxl__domain_resume(..., 0) to resume the guest.
> > > > > > >    PV:       update start info, and reset all secondary CPU states. Then
> > > > > > >              call the domctl XEN_DOMCTL_resumedomain
> > > > > > >    PVHVM:    cannot be resumed. You will get the following error message:
> > > > > > >                  "Cannot resume uncooperative HVM guests"
> > > > > > >    pure HVM: same as PVHVM
> > > > > > 
> > > > > > > After this patch:
> > > > > > > 1. Suspend
> > > > > > >    unchanged
> > > > > > > 
> > > > > > > 2. Resume
> > > > > > > a. fast path:
> > > > > > >    unchanged
> > > > > > > b. slow path:
> > > > > > >    PV:       unchanged
> > > > > > >    PVHVM:    call XEN_DOMCTL_resumedomain to resume the guest. Because we
> > > > > > >              don't modify the return code, the PV driver will disconnect
> > > > > > >              and reconnect.
> > > > > > >              The guest ends up doing the XENMAPSPACE_shared_info
> > > > > > >              XENMEM_add_to_physmap hypercall and resetting all of its CPU
> > > > > > >              states to point to the shared_info (well, except the ones
> > > > > > >              past 32). That is, the Linux kernel does that regardless of
> > > > > > >              whether SCHEDOP_shutdown:SHUTDOWN_suspend returns 1 or not.
> > > > > > >    pure HVM: call XEN_DOMCTL_resumedomain to resume the guest.
> > > > > > > 
> > > > > > > Under COLO, we will update the guest's state (modify memory, CPU
> > > > > > > registers, device status...). In this case, we cannot use the fast
> > > > > > > path to resume it. Keep the return code 0, and use the slow path to
> > > > > > > resume the guest. While resuming HVM using the slow path is not
> > > > > > > supported currently, this patch makes the resume call not fail.
> > > > > > 
> > > > > > Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> > > > > > Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
> > > > > > Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > > > > 
> > > > > > I proposed an alternative commit log in a previous reply:
> > > > > > 
> > > > > > ===
> > > > > > Use XEN_DOMCTL_resumedomain to resume (PV)HVM guest in slow path
> > > > > > 
> > > > > > Previously it was not possible to resume a PVHVM or pure HVM guest in
> > > > > > the slow path because libxc didn't support that.
> > > > > > 
> > > > > > Using XEN_DOMCTL_resumedomain without modifying the guest return code
> > > > > > to resume a guest is considered to be always safe.  Introduce a
> > > > > > function to do that for (PV)HVM guests in slow path resume.
> > > > > > 
> > > > > > This patch fixes a bug that denies (PV)HVM slow path resume.  This
> > > > > > will enable COLO to work properly: COLO requires the HVM guest to
> > > > > > start in the new context that has been set up by COLO, hence slow
> > > > > > path resume is required.
> > > > > > ===
> > > > > > 
> > > > > > Note that I fixed one place in this version, from "guest state" to
> > > > > > "guest return code" in the second paragraph. And that sentence is a
> > > > > > big big assumption that I don't know whether it is true or not; it is
> > > > > > reverse-engineered from the comment before xc_domain_resume and what
> > > > > > Linux does.
> > > > > > 
> > > > > > But the more I think, the more I'm not sure if I'm writing the right
> > > > > > thing. I also can't judge what is the right behaviour on the Linux
> > > > > > side.
> > > > > > 
> > > > > > Konrad, can you fact-check the commit message a bit? And maybe you
> > > > > > can help answer the following questions?
> > > > > 
> > > > > 1. If we use fast=0 on PVHVM guest, will it work?
> > > > 
> > > > Yes.
> > > > > 2. If we use fast=0 on HVM guest, will it work?
> > > > 
> > > > Yes.
> > > > 
> > > > > 
> > > > > > What is worse, when I say "work" I actually have no clear definition
> > > > > > of it. There doesn't seem to be a defined state that the guest needs
> > > > > > to be in.
> > > > 
> > > > > For PVHVM guests, fast = 0 requires that the guest makes a hypercall
> > > > > to SCHEDOP_shutdown(SHUTDOWN_suspend). After the hypercall has
> > > > > completed (so Xen has suspended the guest, then later resumed it), it
> > > > > is the guest's responsibility to set up the Xen infrastructure. As in:
> > > > > retrieve the shared_info (XENMAPSPACE_shared_info), set up XenBus, etc.
> > > > > 
> > > > > For HVM guests, fast = 0 suspends the guest without the guest making
> > > > > any hypercalls. It is in effect the hypervisor injecting an S3 suspend.
> > > > > Afterwards the guest is resumed and continues as usual. No PV drivers,
> > > > > hence no need to re-establish Xen PV infrastructure.
> > > > 
> > > 
> > > Wait, isn't this function about resuming a guest? I'm confused because
> > > you talk about HV injecting S3 suspend. I guess you wrote the wrong
> > > thing?

I was writing the whole chain - suspend, and then resume. This patch is
about resume - but to get to resume you need to suspend first.
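
For reference, the resume half of that chain, as laid out in the commit
message quoted above, can be summarised as a small decision function. This is
only a sketch of the table in that commit message: the enum names and
pick_resume_action() are made up for illustration and are not libxc
identifiers; the "patched" flag just selects the before/after behaviour.

```c
#include <stdbool.h>

/* Hypothetical labels for this sketch; none of these are libxc types. */
enum guest_kind { GUEST_PV, GUEST_PVHVM, GUEST_HVM };

enum resume_action {
    /* Set the suspend hypercall return code to 1, then XEN_DOMCTL_resumedomain. */
    ACT_MODIFY_RETCODE_THEN_RESUME,
    /* XEN_DOMCTL_resumedomain only; the return code is left untouched. */
    ACT_RESUME_ONLY,
    /* PV slow path: update start info, reset secondary CPU states, then resume. */
    ACT_RESET_STATE_THEN_RESUME,
    /* "Cannot resume uncooperative HVM guests" */
    ACT_FAIL
};

/*
 * Mirror of the before/after table in the commit message.
 * fast    - true for the fast path (fast=1), false for the slow path (fast=0)
 * patched - true to model the behaviour after this patch
 */
enum resume_action pick_resume_action(enum guest_kind kind, bool fast, bool patched)
{
    if (fast) {
        /* For pure HVM, modify_returncode does nothing. */
        if (kind == GUEST_HVM)
            return ACT_RESUME_ONLY;
        return ACT_MODIFY_RETCODE_THEN_RESUME;
    }

    /* Slow path: PV is unchanged by the patch. */
    if (kind == GUEST_PV)
        return ACT_RESET_STATE_THEN_RESUME;

    /* Slow path for PVHVM and pure HVM: refused before, plain resume after. */
    return patched ? ACT_RESUME_ONLY : ACT_FAIL;
}
```

So pick_resume_action(GUEST_PVHVM, false, false) gives ACT_FAIL, while the
same call with patched=true gives ACT_RESUME_ONLY, which is the only
behavioural change the patch is meant to make.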

> > > 
> > > My guess is below, from the perspective of resuming a guest
> > > 
> > >   A PVHVM guest would have used SCHEDOP_shutdown(SHUTDOWN_suspend) to
> > >   suspend. So when the toolstack uses fast=0, the guest resumes from the
> > >   hypercall with the return code unmodified. The guest then sets up the
> > >   Xen infrastructure again.
> > 
> > Who or what has torn down the existing infrastructure from the guest's life
> > before the suspend in this case? AFAIR a guest expects to return

The guest. Or it can ignore it and just re-init all its settings.

> > from SCHEDOP_shutdown(SHUTDOWN_suspend) with return code == 0 in a freshly
> > minted new domain, but in the resume case it is actually resuming in the
> > original domain, complete with any evtchn's and grant tables mappings etc
> > still intact from before it slept.
> > 
> > Perhaps I'm misremembering and the guest is expected to deal with the
> > possibility of resources already being in place when it re-sets up the
> > infra?

Correct - albeit all of them are stale. Though on the off-chance they may
be set correctly.
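
Roughly, and only as a mock of the guest side (every name below is invented
for illustration, this is not Linux or Xen code), the behaviour described in
this thread looks like the following: the shared_info remap happens whatever
the suspend hypercall returned, and the PV connections are dropped and
re-established only when the return code was left at 0.

```c
#include <stdbool.h>

/* Mock of the guest-visible Xen state; all names here are hypothetical. */
struct mock_guest {
    bool shared_info_mapped;
    bool pv_connected;
    int shared_info_remaps;   /* how many times shared_info was (re)mapped */
    int pv_reconnects;        /* how many times the PV drivers reconnected */
};

/* Stand-in for the XENMEM_add_to_physmap / XENMAPSPACE_shared_info hypercall. */
static void mock_map_shared_info(struct mock_guest *g)
{
    g->shared_info_mapped = true;
    g->shared_info_remaps++;
}

/* Stand-in for the PV drivers disconnecting and reconnecting. */
static void mock_reconnect_pv(struct mock_guest *g)
{
    g->pv_connected = false;  /* whatever was there is treated as stale */
    g->pv_connected = true;
    g->pv_reconnects++;
}

/*
 * Called once the mocked SCHEDOP_shutdown(SHUTDOWN_suspend) has returned.
 * suspend_ret == 1: the toolstack modified the return code (fast path).
 * suspend_ret == 0: return code unmodified (slow path).
 */
void guest_after_suspend(struct mock_guest *g, int suspend_ret)
{
    /* Done regardless of the return code, as described for Linux above. */
    mock_map_shared_info(g);

    if (suspend_ret == 1)
        return;               /* fast path: keep existing PV connections */

    /* Slow path: do not trust what is already in place, set it up again. */
    mock_reconnect_pv(g);
}
```

The point of the sketch is that the guest never needs to know whether the
old event channels or grant mappings happen to be valid; with a return code
of 0 it simply re-initialises them.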

> > 
> 
> Sigh, this is the sort of thing that gets on my nerves. I should try to
> write something down when we come to a conclusion.  I would be happy to
> have any definite answer to the expected behaviour of the guest.
> Extrapolation is not very helpful in the face of so many different
> versions of Linux and the BSDs.
> 
> But, if the confusion is only about PVHVM guest with fast=0, we can
> forbid that specific combination for now. That should be enough to move
> COLO forward.

.. forbid what? PVHVM resuming with fast=0? Why?  Because the guest may
fall on its face?
> 
> Wei.
> 
> > Ian.
> > 
