Re: [RFC v2] xSplice design - Konrad Rzeszutek Wilk

From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Martin Pohlack <mpohlack@amazon.com>
Cc: Elena Ufimtseva <elena.ufimtseva@oracle.com>,
	jeremy@goop.org, hanweidong@huawei.com, jbeulich@suse.com,
	john.liuqiming@huawei.com,
	Paul Voccio <paul.voccio@rackspace.com>,
	Daniel Kiper <daniel.kiper@oracle.com>,
	Major Hayden <major.hayden@rackspace.com>,
	liuyingdong@huawei.com, aliguori@amazon.com,
	xiantao.zxt@alibaba-inc.com, lars.kurth@citrix.com,
	Steven Wilson <steven.wilson@rackspace.com>,
	peter.huangpeng@huawei.com, msw@amazon.com,
	xen-devel@lists.xenproject.org,
	Rick Harris <rick.harris@rackspace.com>,
	boris.ostrovsky@oracle.com,
	Josh Kearney <josh.kearney@rackspace.com>,
	jinsong.liu@alibaba-inc.com,
	Antony Messerli <amesserl@rackspace.com>,
	konrad@darnok.org, fanhenglong@huawei.com,
	andrew.cooper3@citrix.com
Subject: Re: [RFC v2] xSplice design
Date: Fri, 12 Jun 2015 12:09:24 -0400
Message-ID: <20150612160924.GC20667@l.oracle.com>
In-Reply-To: <557AED30.4070703@amazon.com>

On Fri, Jun 12, 2015 at 04:31:12PM +0200, Martin Pohlack wrote:
> On 12.06.2015 16:03, Konrad Rzeszutek Wilk wrote:
> > On Fri, Jun 12, 2015 at 01:39:05PM +0200, Martin Pohlack wrote:
> >> On 15.05.2015 21:44, Konrad Rzeszutek Wilk wrote:
> >> [...]
> >>> ## Hypercalls
> >>>
> >>> We will employ the sub operations of the system management hypercall (sysctl).
> >>> There are to be four sub-operations:
> >>>
> >>>  * upload the payloads.
> >>>  * list the summaries of the uploaded payloads and their state.
> >>>  * get a particular payload's summary and its state.
> >>>  * command to apply, delete, or revert the payload.
> >>>
> >>> The patching is asynchronous therefore the caller is responsible
> >>> to verify that it has been applied properly by retrieving the summary of it
> >>> and verifying that there are no error codes associated with the payload.
> >>>
> >>> We **MUST** make it asynchronous due to the nature of patching: it requires
> >>> every physical CPU to be in lock-step with each other. The patching mechanism,
> >>> while an implementation detail, is not a short operation and as such
> >>> the design **MUST** assume it will be a long-running operation.
> >>
> >> I am not convinced yet, that you need an asynchronous approach here.
> >>
> >> The experience from our prototype suggests that hotpatching itself is
> >> not an expensive operation.  It can usually be completed well below 1ms
> >> with the most expensive part being getting the hypervisor to a quiet state.
> >>
> >> If we go for a barrier at hypervisor exit, combined with forcing all
> >> other CPUs through the hypervisor with IPIs, the typical case is very quick.
> >>
> >> The only reason why that would take some time is, if another CPU is
> >> executing a lengthy operation in the hypervisor already.  In that case,
> >> you probably don't want to block the whole machine waiting for the
> >> joining of that single CPU anyway and instead re-try later, for example,
> >> using a timeout on the barrier.  That could be signaled to the user-land
> >> process (EAGAIN) so that he could re-attempt hotpatching after some seconds.
> > 
> > Which is also an asynchronous operation.
> 
> Right, but in userland.  My main aim is to have as little complicated
> code as possible in the hypervisor for obvious reasons.  This approach
> would not require any further tracking of state in the hypervisor.

True.
> 
> > The experience with previous preemption XSAs has left me quite afraid of
> > long-running operations - which is why I was thinking to have this
> > baked in at the start.
> > 
> > Both ways - EAGAIN or doing a _GET_STATUS - would provide a mechanism for
> > the VCPU to do other work instead of being tied up.
> 
> If I understood your proposal correctly, there is a difference.  With
> EAGAIN, all activity is dropped and the machine remains fully available
> to whatever guests are running at the time.

Correct.
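
To make the EAGAIN behaviour concrete, here is a minimal sketch in Python
(illustrative only, not hypervisor code) of a barrier with a timeout. The
function name, the microsecond units and the per-CPU arrival times are all
assumptions made up for this example. It shows that the worst-case stall is
bounded by the timeout and that nothing is kept around after a failure:

```python
import errno


def rendezvous(arrival_us, timeout_us):
    """Model a barrier with a timeout.

    arrival_us: time each CPU needs to reach the barrier (made-up numbers).
    Returns (rc, stall_us), where stall_us is how long the machine was held.
    """
    slowest = max(arrival_us)
    if slowest > timeout_us:
        # The straggler did not show up in time: release every captured
        # CPU, keep no state, and let the caller decide when to retry.
        return -errno.EAGAIN, timeout_us
    # Everybody arrived: the patching can happen now.
    return 0, slowest
```

With a 1000us timeout, `rendezvous([10, 40, 200], 1000)` succeeds after a
200us stall, while `rendezvous([10, 40, 5000], 1000)` gives up after 1000us
with -EAGAIN.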
> 
> With _GET_STATUS, you would continue to try to bring the hypervisor to a
> quiet state in the background but return to userland to let this one
> thread continue.  Behind the scenes though, you would still need to

<nods>
> capture all CPUs at one point and all captured CPUs would have to wait
> for the last straggler.  That would lead to noticeable dead-time for
> guests running on-top.

Potentially. Using the time calibration routine to do the patching guarantees
that we will have a sync-up every second on the machine - so that possibility
will always be there.
> 
> I might have misunderstood your proposal though.

You got it right.
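
For comparison, a rough sketch of the _GET_STATUS model, again in Python and
only as an illustration. The state names, the `Payload` class and the
`busy_syncs` input are hypothetical and not taken from the design; the point
is just that the hypervisor has to remember per-payload state between the
apply request and the moment the CPUs are actually captured:

```python
from enum import Enum


class State(Enum):
    # Hypothetical state names, not taken from the design document.
    LOADED = 1
    APPLY_PENDING = 2
    APPLIED = 3


class Payload:
    """State the hypervisor would have to track for one payload."""

    def __init__(self, busy_syncs):
        # busy_syncs: how many periodic sync-ups find some CPU still busy
        # before all of them can be captured (made-up input).
        self.state = State.LOADED
        self.busy_syncs = busy_syncs

    def apply(self):
        # Returns right away; the patching is only scheduled.
        self.state = State.APPLY_PENDING
        return 0

    def sync_up(self):
        # Stand-in for the periodic rendezvous (e.g. time calibration).
        if self.state is not State.APPLY_PENDING:
            return
        if self.busy_syncs > 0:
            self.busy_syncs -= 1
        else:
            self.state = State.APPLIED

    def get_status(self):
        # What the caller would poll for after apply() returned.
        return self.state
```

The caller issues `apply()`, gets control back immediately, and then polls
`get_status()` until it no longer reports the pending state.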
> 
> > The EAGAIN mandates that the 'bringing the CPUs together' must be done
> > under 1ms and that there must be code to enforce a timeout on the barrier.
> 
> The 1ms is just a random number.  I would actually suggest to allow a
> sysadmin or hotpatch management tooling to specify how long one is
> willing to potentially block the whole machine when waiting for a
> stop_machine-like barrier as part of a relevant hypercall.  You could
> imagine userland to start out with 1ms and slowly work its way up
> whenever it retries.
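
The escalation described above could look roughly like this in the userland
tooling. This is a sketch under assumptions: `try_apply` stands in for
whatever wrapper ends up issuing the hypercall, and the start value, cap and
growth factor are arbitrary. A real tool would presumably also sleep between
attempts; that is left out here.

```python
import errno


def apply_with_backoff(try_apply, start_us=1000, max_us=16000, factor=2):
    """Retry try_apply(timeout_us) with a growing barrier timeout.

    try_apply is a hypothetical callable returning 0 on success or a
    negative errno value. Returns (rc, list of timeouts that were tried).
    """
    timeout_us = start_us
    attempts = []
    while timeout_us <= max_us:
        attempts.append(timeout_us)
        rc = try_apply(timeout_us)
        if rc != -errno.EAGAIN:
            # Success or a real error: either way stop retrying.
            return rc, attempts
        # The barrier timed out: allow a longer stall on the next attempt.
        timeout_us *= factor
    return -errno.EAGAIN, attempts


def fake_apply(timeout_us):
    # Pretend the slowest CPU needs 3000us to reach the quiet state.
    return 0 if timeout_us >= 3000 else -errno.EAGAIN
```

With `fake_apply`, the loop tries 1000us and 2000us, gets -EAGAIN both
times, and succeeds with 4000us.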
> 
> > The _GET_STATUS does not enforce this and can take longer giving us
> > more breathing room - and also unbounded time - which means if
> > we were to try to cancel it (say it had run for an hour and still
> > could not patch it)- we have to add some hairy code to
> > deal with cancelling asynchronous code.
> > 
> > Your way is simpler - but I would advocate expanding the -EAGAIN to _all_
> > the xSplice hypercalls. Thoughts?
> 
> In my experience, you only need the EAGAIN for hypercalls that use the
> quiet state.  Depending on the design, that would be the operations that
> do hotpatch activation and deactivation (i.e., the actual splicing).

Uploading the patch could be slow too, because of the checking that has to be
done, and on a big patch (2MB or more?) it would be good to be able to try again.

> 
> Martin
>
