From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Martin Pohlack <mpohlack@amazon.com>
Cc: Elena Ufimtseva <elena.ufimtseva@oracle.com>,
jeremy@goop.org, hanweidong@huawei.com, jbeulich@suse.com,
john.liuqiming@huawei.com,
Paul Voccio <paul.voccio@rackspace.com>,
Daniel Kiper <daniel.kiper@oracle.com>,
Major Hayden <major.hayden@rackspace.com>,
liuyingdong@huawei.com, aliguori@amazon.com,
xiantao.zxt@alibaba-inc.com, lars.kurth@citrix.com,
Steven Wilson <steven.wilson@rackspace.com>,
peter.huangpeng@huawei.com, msw@amazon.com,
xen-devel@lists.xenproject.org,
Rick Harris <rick.harris@rackspace.com>,
boris.ostrovsky@oracle.com,
Josh Kearney <josh.kearney@rackspace.com>,
jinsong.liu@alibaba-inc.com,
Antony Messerli <amesserl@rackspace.com>,
konrad@darnok.org, fanhenglong@huawei.com,
andrew.cooper3@citrix.com
Subject: Re: [RFC v2] xSplice design
Date: Fri, 12 Jun 2015 12:09:24 -0400
Message-ID: <20150612160924.GC20667@l.oracle.com>
In-Reply-To: <557AED30.4070703@amazon.com>
On Fri, Jun 12, 2015 at 04:31:12PM +0200, Martin Pohlack wrote:
> On 12.06.2015 16:03, Konrad Rzeszutek Wilk wrote:
> > On Fri, Jun 12, 2015 at 01:39:05PM +0200, Martin Pohlack wrote:
> >> On 15.05.2015 21:44, Konrad Rzeszutek Wilk wrote:
> >> [...]
> >>> ## Hypercalls
> >>>
> >>> We will employ the sub operations of the system management hypercall (sysctl).
> >>> There are to be four sub-operations:
> >>>
> >>> * upload the payloads.
> >>> * listing of payloads summary uploaded and their state.
> >>> * getting a particular payload summary and its state.
> >>> * command to apply, delete, or revert the payload.
> >>>
> >>> The patching is asynchronous therefore the caller is responsible
> >>> to verify that it has been applied properly by retrieving the summary of it
> >>> and verifying that there are no error codes associated with the payload.
> >>>
> >>> We **MUST** make it asynchronous due to the nature of patching: it requires
> >>> every physical CPU to be lock-step with each other. The patching mechanism
> >>> while an implementation detail, is not a short operation and as such
> >>> the design **MUST** assume it will be a long-running operation.
> >>
> >> I am not convinced yet, that you need an asynchronous approach here.
> >>
> >> The experience from our prototype suggests that hotpatching itself is
> >> not an expensive operation. It can usually be completed well below 1ms
> >> with the most expensive part being getting the hypervisor to a quiet state.
> >>
> >> If we go for a barrier at hypervisor exit, combined with forcing all
> >> other CPUs through the hypervisor with IPIs, the typical case is very quick.
> >>
> >> The only reason why that would take some time is, if another CPU is
> >> executing a lengthy operation in the hypervisor already. In that case,
> >> you probably don't want to block the whole machine waiting for the
> >> joining of that single CPU anyway and instead re-try later, for example,
> >> using a timeout on the barrier. That could be signaled to the user-land
> >> process (EAGAIN) so that he could re-attempt hotpatching after some seconds.
> >
> > Which is also an asynchronous operation.
>
> Right, but in userland. My main aim is to have as little complicated
> code as possible in the hypervisor for obvious reasons. This approach
> would not require any further tracking of state in the hypervisor.
True.
>
> > The experience with previous preemption XSAs has left me quite afraid of
> > long-running operations - which is why I was thinking of having this
> > baked in from the start.
> >
> > Both ways - EAGAIN or doing a _GET_STATUS - would provide a mechanism for
> > the VCPU to do other work instead of being tied up.
>
> If I understood your proposal correctly, there is a difference. With
> EAGAIN, all activity is dropped and the machine remains fully available
> to whatever guests are running at the time.
Correct.
>
> With _GET_STATUS, you would continue to try to bring the hypervisor to a
> quiet state in the background but return to userland to let this one
> thread continue. Behind the scenes though, you would still need to
<nods>
> capture all CPUs at one point and all captured CPUs would have to wait
> for the last straggler. That would lead to noticeable dead-time for
> guests running on-top.
Potentially. Using the time calibration routine to do the patching guarantees
that we will have a sync-up every second on the machine - so there will always
be that possibility.
>
> I might have misunderstood your proposal though.
You got it right.
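
To make the EAGAIN variant concrete, here is a rough userland simulation of a
barrier with a timeout. It is only a sketch: threads stand in for physical
CPUs, and rendezvous(), straggler_delay_s and the use of Python's
threading.Barrier are my own illustrative assumptions, not anything from the
design or from the prototype. If one "CPU" is late, everybody gives up and the
caller sees -EAGAIN instead of the whole machine waiting for the straggler.

```python
import errno
import threading
import time


def rendezvous(n_cpus, timeout_s, straggler_delay_s=0.0):
    """Simulate bringing n_cpus together at a barrier with a timeout.

    Hypothetical model only: each thread plays one CPU. Returns 0 if all
    CPUs met at the barrier in time, -errno.EAGAIN otherwise.
    """
    barrier = threading.Barrier(n_cpus)
    results = [None] * n_cpus

    def cpu(idx):
        # The last CPU optionally models a lengthy operation already
        # running in the hypervisor before it can join the barrier.
        if idx == n_cpus - 1 and straggler_delay_s:
            time.sleep(straggler_delay_s)
        try:
            barrier.wait(timeout=timeout_s)
            results[idx] = 0
        except threading.BrokenBarrierError:
            # A waiter timed out (or the barrier was already broken when
            # we arrived): drop all activity and report "try again".
            results[idx] = -errno.EAGAIN

    threads = [threading.Thread(target=cpu, args=(i,)) for i in range(n_cpus)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return 0 if all(r == 0 for r in results) else -errno.EAGAIN
```

With no straggler, rendezvous(4, timeout_s=5.0) completes quickly and returns
0. With straggler_delay_s larger than timeout_s, the waiting CPUs are released
after timeout_s and nothing is left running in the background, which is the
difference from _GET_STATUS discussed above.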
>
> > The EAGAIN mandates that the 'bringing the CPUs together' must be done
> > under 1ms and that there must be code to enforce a timeout on the barrier.
>
> The 1ms is just a random number. I would actually suggest to allow a
> sysadmin or hotpatch management tooling to specify how long one is
> willing to potentially block the whole machine when waiting for a
> stop_machine-like barrier as part of a relevant hypercall. You could
> imagine userland to start out with 1ms and slowly work its way up
> whenever it retries.
>
> > The _GET_STATUS does not enforce this and can take longer giving us
> > more breathing room - and also unbounded time - which means if
> > we were to try to cancel it (say it had run for an hour and still
> > could not patch it)- we have to add some hairy code to
> > deal with cancelling asynchronous code.
> >
> > Your way is simpler - but I would advocate expanding the -EAGAIN to _all_
> > the xSplice hypercalls. Thoughts?
>
> In my experience, you only need the EAGAIN for hypercalls that use the
> quiet state. Depending on the design, that would be the operations that
> do hotpatch activation and deactivation (i.e., the actual splicing).
The uploading of the patch could also be slow - as in the checking that has to
be done - and on a big patch (2MB or more?) it would be good to be able to try
again.
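
For the userland side, the "start out with 1ms and slowly work its way up"
idea could look roughly like the sketch below. Everything named here is
hypothetical: try_apply is a stand-in for whatever wrapper ends up issuing the
hypercall (assumed to return 0 on success or -EAGAIN when the barrier timed
out), and the doubling factor, the 64ms cap and the one second pause between
attempts are arbitrary numbers picked for illustration.

```python
import errno
import time


def apply_with_backoff(try_apply, start_ms=1.0, max_ms=64.0, factor=2.0,
                       retry_delay_s=1.0, pause=time.sleep):
    """Retry a hotpatch operation with a growing barrier timeout.

    try_apply(timeout_ms) is a hypothetical callable returning 0 on
    success or a negative errno value. Returns (rc, attempts), where
    attempts lists every timeout (in ms) that was tried.
    """
    timeout_ms = start_ms
    attempts = []
    while True:
        attempts.append(timeout_ms)
        rc = try_apply(timeout_ms)
        # Stop on success, on any error other than EAGAIN, or once the
        # largest timeout the admin is willing to tolerate has been tried.
        if rc != -errno.EAGAIN or timeout_ms >= max_ms:
            return rc, attempts
        pause(retry_delay_s)  # let the guests run before retrying
        timeout_ms = min(timeout_ms * factor, max_ms)
```

The cap is what bounds how long the admin is willing to potentially block the
whole machine; no state about the retries has to be kept in the hypervisor.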
>
> Martin
>