From: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Subject: Re: RFC: making the PVH 64bit ABI as stable
Date: Sat, 06 Jun 2015 11:50:52 -0400
Message-ID: <557316DC.8060006@oracle.com>
References: <556DEB9A020000780008079A@mail.emea.novell.com>
	<alpine.DEB.2.02.1506021648160.19838@kaball.uk.xensource.com>
	<556DE352.3030703@citrix.com>
	<alpine.DEB.2.02.1506031058090.19838@kaball.uk.xensource.com>
	<556F0A410200007800080E4B@mail.emea.novell.com>
	<5571CB4D.5010403@citrix.com> <5571D1BD.1090108@oracle.com>
	<5571D501.80305@citrix.com>
	<alpine.DEB.2.02.1506051813140.19838@kaball.uk.xensource.com>
	<5571DAB5.9030507@citrix.com>
	<20150605215228.GA28885@deinos.phlegethon.org>
	<5572C40B.40500@citrix.com> <5573068D.4010702@citrix.com>
In-Reply-To: <5573068D.4010702@citrix.com>
To: Andrew Cooper <andrew.cooper3@citrix.com>, Roger Pau Monné <roger.pau@citrix.com>, Tim Deegan <tim@xen.org>
Cc: Elena Ufimtseva <elena.ufimtseva@oracle.com>, Lars Kurth <lars.kurth@citrix.com>, Stefano Stabellini <stefano.stabellini@eu.citrix.com>, David Vrabel <david.vrabel@citrix.com>, Jan Beulich <JBeulich@suse.com>, xen-devel <xen-devel@lists.xenproject.org>
List-Id: xen-devel@lists.xenproject.org


On 06/06/2015 10:41 AM, Andrew Cooper wrote:
> On 06/06/15 10:57, Roger Pau Monné wrote:
>> On 05/06/15 at 23.52, Tim Deegan wrote:
>>> At 18:21 +0100 on 05 Jun (1433528517), Andrew Cooper wrote:
>>>> On 05/06/15 18:16, Stefano Stabellini wrote:
>>>>> On Fri, 5 Jun 2015, Andrew Cooper wrote:
>>>>>> On 05/06/15 17:43, Boris Ostrovsky wrote:
>>>>>>> On 06/05/2015 12:16 PM, Roger Pau Monné wrote:
>>>>>>>> On 03/06/15 at 14.08, Jan Beulich wrote:
>>>>>>>>>>>> On 03.06.15 at 12:02, <stefano.stabellini@eu.citrix.com> wrote:
>>>>>>>>>> On Tue, 2 Jun 2015, Andrew Cooper wrote:
>>>>>>>>>>> With my x86 maintainer hat on, the following is an absolute
>>>>>>>>>>> minimum set
>>>>>>>>>>> of prerequisites for PVH.
>>>>>>>>>>>
>>>>>>>>>>> * 32bit support
>>>>>>>>>> Could you please explain why 32bit is important to get PVH out of tech
>>>>>>>>>> preview? I don't see 32 bit OSes as an important use case. Maybe there
>>>>>>>>>> is more behind it that I cannot see.
>>>>>>>>> The primary reason was named before: 32-bit support will likely
>>>>>>>>> end up changing the way 64-bit guests get launched.
>>>>>>>> I can work on the new boot ABI, even if it's just a design document now,
>>>>>>>> but the actual implementation needs to be done on top of the 32-bit
>>>>>>>> support series.
>>>>>>>>
>>>>>>>> Boris, do you think you could send an early RFC of your 32-bit support
>>>>>>>> series in a couple of weeks at most?
>>>>>>> That's highly unlikely. For one, I am still unable to boot MP guests.
>>>>>>> In addition, it is all held together by rubber bands and matchsticks
>>>>>>> so calling it an RFC would be an insult to RFCs. (for example, I
>>>>>>> apparently broke HVM somewhere along the way).
>>>>>> How about working it the other way around.
>>>>>>
>>>>>> Start with an HVM guest and start with a sane method of booting.  I
>>>>>> highly suggest multiboot1 as it is very easy and we have most of the
>>>>>> code already.  Whoever actually gets around to doing this gets leeway,
>>>>>> subject to it being sane (which the current method very certainly is not).
>> I agree that using a boot ABI similar to multiboot1 is going to solve
>> some of the issues that we currently have, while probably simplifying
>> the code to build a domain. There are also several multiboot1
>> implementations around which can be used as a basis for this for guest
>> OSes that don't have native multiboot support.
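
For reference, the multiboot1 contract is small: the kernel image carries a 32-bit-aligned header, contained within its first 8192 bytes, whose magic (0x1BADB002), flags and checksum fields sum to zero modulo 2^32; the loader then enters the kernel in 32-bit protected mode with paging disabled and EAX set to 0x2BADB002. The sketch below shows only the header search a domain builder might perform. The function and struct names are hypothetical (not taken from libxc or any existing builder), and it assumes a little-endian host, as on x86.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values from the Multiboot 0.6.96 ("multiboot1") specification. */
#define MB1_HEADER_MAGIC 0x1BADB002u /* found in the kernel image header */
#define MB1_SEARCH_LIMIT 8192u       /* header must lie in the first 8 KiB */

/* The three mandatory header fields; optional address fields follow. */
struct mb1_header {
    uint32_t magic;
    uint32_t flags;
    uint32_t checksum; /* magic + flags + checksum == 0 (mod 2^32) */
};

/* Hypothetical helper: return the byte offset of the first valid multiboot1
 * header in `image`, or -1 if there is none. Assumes a little-endian host. */
long mb1_find_header(const uint8_t *image, size_t len)
{
    size_t limit = len < MB1_SEARCH_LIMIT ? len : MB1_SEARCH_LIMIT;

    /* The header must be 32-bit aligned, so step by 4 bytes. */
    for (size_t off = 0; off + sizeof(struct mb1_header) <= limit; off += 4) {
        struct mb1_header h;
        memcpy(&h, image + off, sizeof(h)); /* avoids unaligned pointer casts */
        if (h.magic == MB1_HEADER_MAGIC &&
            (uint32_t)(h.magic + h.flags + h.checksum) == 0)
            return (long)off;
    }
    return -1;
}
```

A builder would call this on the loaded kernel image and refuse to boot it if the result is -1; the flags field then tells it which optional fields and boot information the kernel expects.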
>>
>>>>>> Start the domain without qemu, and expose some of the PV hypercalls to
>>>>>> HVM guests, and see how far it gets.  One will find suddenly that all
>>>>>> 32bit and AMD problems have already been solved.  All the PV(h) kernel
>>>>>> needs to know is that there is no real hardware, and not to touch it.
>>>>> This seems like a clean and nice way forward, but rather than PVH, it is
>>>>> actually something else.  Am I the only one to think that making this
>>>>> drastic change in the design at this stage (3 years in) is too late?
>> I don't think the ABI is going to change much; most of this plumbing is
>> going to be in Xen internals, so I wouldn't call it a drastic change.
> To my mind, the ABI covers the boot method, but I would agree that the
> runtime environment is unlikely to need changing.  Most of what is
> involved along these lines is to lift the current restrictions.
>
>>>> There was no design in the slightest, which is why we have got 3 years
>>>> in and are in this position.
>>> Please try to keep things friendly and constructive on this list.  Yes,
>>> there was design; it was discussed on this list and at the Xen summit.
>>> With hindsight, it turned out that "PV guest that uses a lightweight
>>> HVM container" took a lot more work/code than was originally expected.
>>>
>>> I suspect that an implementation of "HVM without qemu and some
>>> hypercalls" will also turn out more complex than it sounds.  I believe
>>> I've made my opinion clear that that's where PVH ought to end up, but
>>> I'm unconvinced that starting from scratch will be the fastest way.
>> I believe the right way to move forward is to start implementing this
>> new boot ABI on top of HVM, without axing out the PVH code. I think most
>> of the current PVH code would still be needed for the HVM-without-dm
>> kind of guest, and that at some point both will meet.


This is a good thing to do in the medium-to-long term, but it seems to me
that we should still get the current PVH implementation to work first (i.e.
AMD/32-bit). If past experience is any indication, this rework will take
a while, whereas we have something working right now (warts and all).

For example, the diffstat of what I have now for 32-bit (i.e. UP only) is
     20 files changed, 503 insertions(+), 95 deletions(-)
and that is probably 75% debugging/crud.

Linux is even less:
     17 files changed, 221 insertions(+), 28 deletions(-)
(although I did some fairly awful things there).

So adding that part appears to be doable without major disruption on the
Xen side. And I don't think AMD support will be any worse.


-boris


> +1.  I agree, and think this is a very sensible way forwards.
>
>> I will send a design document for this boot ABI next week, but the plan
>> is as follows:
>>
>>   - Start the guest in protected mode without paging.
> A step in the middle here is finding the Xen cpuid leaves.
>
> An elfnote signifying that the kernel is PVH-capable and a cpuid leaf
> indicating that the environment is PVH seems like a good combination.
>
> ~Andrew
>
>>   - Fill the hypercall page using wrmsr (HVM).
>>   - Map the shared info page using XENMEM_add_to_physmap (HVM).
>>
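
The discovery and setup steps in this plan could look roughly like the following. This is a minimal, hypothetical illustration, not code from Xen or Linux: CPUID is passed in as a function pointer so the logic can run without a hypervisor, `fake_xen_cpuid` is a made-up stub, and the MSR index it reports is illustrative only. The signature scan follows what Linux's `xen_cpuid_base()` does (step 0x100 through 0x40000000-0x4000ffff looking for "XenVMMXenVMM"); leaf base+2 reports the hypercall-page MSR in EBX; and the shared info request uses the `XENMEM_add_to_physmap` argument layout with `XENMAPSPACE_shared_info`, assuming a 64-bit x86 guest where `xen_ulong_t` and `xen_pfn_t` are both `unsigned long`.

```c
#include <stdint.h>
#include <string.h>

/* Registers returned by one CPUID invocation. */
struct cpuid_regs { uint32_t eax, ebx, ecx, edx; };

/* CPUID is injected as a function pointer (an assumption of this sketch)
 * so the logic can be exercised without running under Xen. */
typedef struct cpuid_regs (*cpuid_fn)(uint32_t leaf);

/* Step 1: find the Xen CPUID base. Hypervisor leaves live in
 * 0x40000000-0x4000ffff and Xen may be offset (e.g. to 0x40000100 when
 * Viridian leaves come first), so scan in steps of 0x100. */
uint32_t xen_cpuid_base(cpuid_fn cpuid)
{
    for (uint32_t base = 0x40000000u; base < 0x40010000u; base += 0x100u) {
        struct cpuid_regs r = cpuid(base);
        char sig[12];
        memcpy(sig + 0, &r.ebx, 4);
        memcpy(sig + 4, &r.ecx, 4);
        memcpy(sig + 8, &r.edx, 4);
        /* EAX holds the highest leaf; require base+1 and base+2 to exist. */
        if (memcmp(sig, "XenVMMXenVMM", 12) == 0 && r.eax - base >= 2)
            return base;
    }
    return 0; /* not running on Xen */
}

/* Step 2: leaf base+2 returns the number of hypercall pages in EAX and the
 * MSR index in EBX. Writing the page's guest-physical address (page index 0
 * in bits 0-11) to that MSR asks Xen to fill the page. */
struct msr_write { uint32_t msr; uint64_t value; };

struct msr_write xen_hypercall_page_msr(cpuid_fn cpuid, uint32_t base,
                                        uint64_t page_gpa)
{
    struct cpuid_regs r = cpuid(base + 2);
    struct msr_write w = { r.ebx, page_gpa & ~(uint64_t)0xfff };
    return w; /* a real guest would now execute wrmsr(w.msr, w.value) */
}

/* Step 3: argument for XENMEM_add_to_physmap (64-bit x86 layout assumed). */
typedef uint16_t domid_t;
#define DOMID_SELF              ((domid_t)0x7FF0u)
#define XENMAPSPACE_shared_info 0u

struct xen_add_to_physmap {
    domid_t       domid;  /* DOMID_SELF for the calling guest */
    uint16_t      size;   /* only used by the gmfn_range space */
    unsigned int  space;  /* which kind of page to map */
    unsigned long idx;    /* index within that space */
    unsigned long gpfn;   /* guest frame where it should appear */
};

struct xen_add_to_physmap xen_shared_info_req(unsigned long gpfn)
{
    struct xen_add_to_physmap xatp = {
        .domid = DOMID_SELF,
        .space = XENMAPSPACE_shared_info,
        .idx   = 0,
        .gpfn  = gpfn,
    };
    return xatp; /* passed to HYPERVISOR_memory_op(XENMEM_add_to_physmap, ...) */
}

/* Made-up stub standing in for CPUID on a Xen host with Xen at 0x40000100. */
struct cpuid_regs fake_xen_cpuid(uint32_t leaf)
{
    struct cpuid_regs r = { 0, 0, 0, 0 };
    if (leaf == 0x40000100u) {
        r.eax = 0x40000105u;              /* highest Xen leaf */
        memcpy(&r.ebx, "XenV", 4);
        memcpy(&r.ecx, "MMXe", 4);
        memcpy(&r.edx, "nVMM", 4);
    } else if (leaf == 0x40000102u) {
        r.eax = 1;                        /* one hypercall page */
        r.ebx = 0x40000000u;              /* illustrative MSR index */
    }
    return r;
}
```

Note the ordering constraint this implies: the hypercall page (step 2) must be in place before the `XENMEM_add_to_physmap` hypercall in step 3 can be issued through it.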
>> That means we can get rid of some of the ELFNOTES, the ones that come to
>> mind right now are:
>>
>>   - XEN_ELFNOTE_VIRT_BASE
>>   - XEN_ELFNOTE_HYPERCALL_PAGE
>>   - XEN_ELFNOTE_HV_START_LOW
>>   - XEN_ELFNOTE_PAE_MODE
>>   - XEN_ELFNOTE_L1_MFN_VALID
>>
>> And probably some more.
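
To see which of these notes an existing kernel actually carries, one could walk its note section. The following is a hypothetical audit helper, not part of libelf or the Xen tools: the note type values are those defined in Xen's public/elfnote.h, the note layout (namesz, descsz, type, then name and descriptor each padded to 4 bytes) follows the ELF specification, and it assumes a little-endian host and a 4-byte-aligned note payload.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Note types from Xen's public/elfnote.h that the list above would drop. */
#define XEN_ELFNOTE_HYPERCALL_PAGE  2
#define XEN_ELFNOTE_VIRT_BASE       3
#define XEN_ELFNOTE_PAE_MODE        9
#define XEN_ELFNOTE_HV_START_LOW   12
#define XEN_ELFNOTE_L1_MFN_VALID   13

/* ELF note header: same layout for ELF32 and ELF64 notes. */
struct elf_nhdr { uint32_t namesz, descsz, type; };

#define ALIGN4(x) (((size_t)(x) + 3u) & ~(size_t)3u)

static int is_pv_only(uint32_t type)
{
    switch (type) {
    case XEN_ELFNOTE_HYPERCALL_PAGE: case XEN_ELFNOTE_VIRT_BASE:
    case XEN_ELFNOTE_PAE_MODE: case XEN_ELFNOTE_HV_START_LOW:
    case XEN_ELFNOTE_L1_MFN_VALID:
        return 1;
    default:
        return 0;
    }
}

/* Count "Xen" notes in a note section payload whose types appear in the
 * list above. Returns -1 if the section is truncated or malformed. */
int count_pv_only_notes(const uint8_t *buf, size_t len)
{
    size_t off = 0;
    int count = 0;

    while (off + sizeof(struct elf_nhdr) <= len) {
        struct elf_nhdr n;
        memcpy(&n, buf + off, sizeof(n));
        size_t name_off = off + sizeof(n);
        size_t desc_off = name_off + ALIGN4(n.namesz);
        size_t next = desc_off + ALIGN4(n.descsz);
        if (next > len || next <= off)
            return -1;
        /* namesz includes the terminating NUL, so "Xen" has namesz 4. */
        if (n.namesz == 4 && memcmp(buf + name_off, "Xen", 4) == 0 &&
            is_pv_only(n.type))
            count++;
        off = next;
    }
    return count;
}
```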