From: Andrew Cooper
Subject: Re: RFC: making the PVH 64bit ABI as stable
Date: Sat, 6 Jun 2015 15:41:17 +0100
Message-ID: <5573068D.4010702@citrix.com>
References: <556DEB9A020000780008079A@mail.emea.novell.com>
 <556DE352.3030703@citrix.com>
 <556F0A410200007800080E4B@mail.emea.novell.com>
 <5571CB4D.5010403@citrix.com>
 <5571D1BD.1090108@oracle.com>
 <5571D501.80305@citrix.com>
 <5571DAB5.9030507@citrix.com>
 <20150605215228.GA28885@deinos.phlegethon.org>
 <5572C40B.40500@citrix.com>
In-Reply-To: <5572C40B.40500@citrix.com>
To: Roger Pau Monné, Tim Deegan
Cc: Elena Ufimtseva, Lars Kurth, Stefano Stabellini, David Vrabel,
 Jan Beulich, xen-devel, Boris Ostrovsky
List-Id: xen-devel@lists.xenproject.org

On 06/06/15 10:57, Roger Pau Monné wrote:
> On 05/06/15 at 23.52, Tim Deegan wrote:
>> At 18:21 +0100 on 05 Jun (1433528517), Andrew Cooper wrote:
>>> On 05/06/15 18:16, Stefano Stabellini wrote:
>>>> On Fri, 5 Jun 2015, Andrew Cooper wrote:
>>>>> On 05/06/15 17:43, Boris Ostrovsky wrote:
>>>>>> On 06/05/2015 12:16 PM, Roger Pau Monné wrote:
>>>>>>> On 03/06/15 at 14.08, Jan Beulich wrote:
>>>>>>>>>>> On 03.06.15 at 12:02, wrote:
>>>>>>>>> On Tue, 2 Jun 2015, Andrew Cooper wrote:
>>>>>>>>>> With my x86 maintainer hat on, the following is an absolute minimum set
>>>>>>>>>> of prerequisites for PVH.
>>>>>>>>>>
>>>>>>>>>> * 32bit support
>>>>>>>>> Could you please explain why 32bit is important to get PVH out of tech
>>>>>>>>> preview? I don't see 32 bit OSes as an important use case. Maybe there
>>>>>>>>> is more behind it that I cannot see.
>>>>>>>> The primary reason was named before: 32-bit support will likely
>>>>>>>> end up changing the way 64-bit guests get launched.
>>>>>>> I can work on the new boot ABI, even if it's just a design document now,
>>>>>>> but the actual implementation needs to be done on top of the 32-bit
>>>>>>> support series.
>>>>>>>
>>>>>>> Boris, do you think you could send an early RFC of your 32-bit support
>>>>>>> series in a couple of weeks at most?
>>>>>> That's highly unlikely. For one, I am still unable to boot MP guests.
>>>>>> In addition, it is all held together by rubber bands and matchsticks
>>>>>> so calling it an RFC would be an insult to RFCs. (for example, I
>>>>>> apparently broke HVM somewhere along the way).
>>>>> How about working it the other way around.
>>>>>
>>>>> Start with an HVM guest and start with a sane method of booting. I
>>>>> highly suggest multiboot1 as it is very easy and we have most of the
>>>>> code already. Whoever actually gets around to doing this gets leeway,
>>>>> subject to it being sane (which the current method very certainly is not).
> I agree that using a boot ABI similar to multiboot1 is going to solve
> some of the issues that we currently have, while probably simplifying
> the code to build a domain. There are also several multiboot1
> implementations around which can be used as a basis for this for guest
> OSes that don't have native multiboot support.
>
>>>>> Start the domain without qemu, and expose some of the PV hypercalls to
>>>>> HVM guests, and see how far it gets. One will find suddenly that all
>>>>> 32bit and AMD problems have already been solved. All the PV(h) kernel
>>>>> needs to know is that there is no real hardware, and not to touch it.
>>>> This seems like a clean and nice way forward, but rather than PVH it is
>>>> actually something else. Am I the only one to think that making this
>>>> drastic change in the design at this stage (3 years in) is too late?
> I don't think the ABI is going to change much, most of this plumbing is
> going to be in Xen internals, so I wouldn't call it a drastic change.

To my mind, the ABI covers the boot method, but I would agree that the
runtime environment is unlikely to need changing. Most of what is
involved along these lines is to lift the current restrictions.

>
>>> There was no design in the slightest, which is why we have got 3 years
>>> in and are in this position.
>> Please try to keep things friendly and constructive on this list. Yes,
>> there was design; it was discussed on this list and at the Xen summit.
>> With hindsight, it turned out that "PV guest that uses a lightweight
>> HVM container" took a lot more work/code than was originally expected.
>>
>> I suspect that an implementation of "HVM without qemu and some
>> hypercalls" will also turn out more complex than it sounds. I believe
>> I've made my opinion clear that that's where PVH ought to end up, but
>> I'm unconvinced that starting from scratch will be the fastest way.
> I believe the right way to move forward is to start implementing this
> new boot ABI on top of HVM, without axing out the PVH code. I think most
> of the current PVH code would still be needed for the HVM-without-dm
> kind of guest, and that at some point both will meet.

+1. I agree, and think this is a very sensible way forwards.

>
> I will send a design document for this boot ABI next week, but the plan
> is as follows:
>
> - Start the guest in protected mode without paging.

A step in the middle here is finding the Xen cpuid leaves.

An elfnote signifying that the kernel is PVH-capable and a cpuid leaf
indicating that the environment is PVH seems like a good combination.

~Andrew

> - Fill the hypercall page using wrmsr (HVM).
> - Map the shared info page using XENMEM_add_to_physmap (HVM).
>
> That means we can get rid of some of the ELFNOTES, the ones that come to
> mind right now are:
>
> - XEN_ELFNOTE_VIRT_BASE
> - XEN_ELFNOTE_HYPERCALL_PAGE
> - XEN_ELFNOTE_HV_START_LOW
> - XEN_ELFNOTE_PAE_MODE
> - XEN_ELFNOTE_L1_MFN_VALID
>
> And probably some more.
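
For illustration only, the "finding the Xen cpuid leaves" step mentioned
above, followed by looking up the MSR used to fill the hypercall page,
could look roughly like the sketch below. It assumes the usual layout of
the Xen CPUID leaves as I understand it: the "XenVMMXenVMM" signature in
ebx/ecx/edx of a base leaf found at a multiple of 0x100 from 0x40000000,
and the hypercall-page MSR index in ebx of leaf base + 2. The names
find_xen_cpuid_base, xen_hypercall_msr, cpuid_fn and fake_cpuid are
hypothetical and exist only for this example; a real kernel would execute
CPUID directly and then wrmsr the guest physical address of its hypercall
page, which is not shown here.

```c
#include <stdint.h>
#include <string.h>

/* Result registers of one CPUID invocation. */
struct cpuid_regs {
    uint32_t eax, ebx, ecx, edx;
};

/* Hypothetical indirection so the scan can be exercised without real
 * hardware; a kernel would execute the CPUID instruction directly. */
typedef struct cpuid_regs (*cpuid_fn)(uint32_t leaf);

#define HYPERVISOR_LEAF_FIRST 0x40000000u
#define HYPERVISOR_LEAF_LAST  0x40010000u
#define HYPERVISOR_LEAF_STEP  0x100u

/* Return the base of the Xen CPUID leaves, or 0 if none is found. */
uint32_t find_xen_cpuid_base(cpuid_fn cpuid)
{
    for (uint32_t base = HYPERVISOR_LEAF_FIRST;
         base < HYPERVISOR_LEAF_LAST;
         base += HYPERVISOR_LEAF_STEP) {
        struct cpuid_regs r = cpuid(base);
        char sig[12];

        /* The signature is spread over ebx, ecx and edx, in that order. */
        memcpy(sig + 0, &r.ebx, 4);
        memcpy(sig + 4, &r.ecx, 4);
        memcpy(sig + 8, &r.edx, 4);

        /* eax reports the highest leaf; base + 2 must exist because it
         * is the leaf that describes the hypercall page. */
        if (memcmp(sig, "XenVMMXenVMM", 12) == 0 && r.eax >= base + 2)
            return base;
    }
    return 0;
}

/* Return the MSR index through which the hypercall page is installed. */
uint32_t xen_hypercall_msr(cpuid_fn cpuid, uint32_t base)
{
    return cpuid(base + 2).ebx;
}

/* Simulated hypervisor for demonstration: the Xen leaves sit at
 * 0x40000100, as they might when other leaves occupy 0x40000000.
 * All values here are made up for the example. */
struct cpuid_regs fake_cpuid(uint32_t leaf)
{
    struct cpuid_regs r = {0, 0, 0, 0};

    if (leaf == 0x40000100u) {
        r.eax = 0x40000104u;            /* highest available leaf */
        memcpy(&r.ebx, "XenV", 4);
        memcpy(&r.ecx, "MMXe", 4);
        memcpy(&r.edx, "nVMM", 4);
    } else if (leaf == 0x40000102u) {
        r.eax = 1;                      /* number of hypercall pages */
        r.ebx = 0x40000000u;            /* MSR index (example value) */
    }
    return r;
}
```

Passing the CPUID accessor in as a function pointer is purely a
convenience so the scan can be tried against fake_cpuid in userspace; it
is not meant to suggest how the guest side should be structured.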