From: Boris Ostrovsky
Subject: Re: RFC: making the PVH 64bit ABI as stable
Date: Sat, 06 Jun 2015 11:50:52 -0400
Message-ID: <557316DC.8060006@oracle.com>
To: Andrew Cooper, Roger Pau Monné, Tim Deegan
Cc: Elena Ufimtseva, Lars Kurth, Stefano Stabellini, David Vrabel, Jan Beulich, xen-devel
List-Id: xen-devel@lists.xenproject.org

On 06/06/2015 10:41 AM, Andrew Cooper wrote:
> On 06/06/15 10:57, Roger Pau Monné wrote:
>> On 05/06/15 at 23:52, Tim Deegan wrote:
>>> At 18:21 +0100 on 05 Jun (1433528517), Andrew Cooper wrote:
>>>> On 05/06/15 18:16, Stefano Stabellini wrote:
>>>>> On Fri, 5 Jun 2015, Andrew Cooper wrote:
>>>>>> On 05/06/15 17:43, Boris Ostrovsky wrote:
>>>>>>> On 06/05/2015 12:16 PM, Roger Pau Monné wrote:
>>>>>>>> On 03/06/15 at 14:08, Jan Beulich wrote:
>>>>>>>>>>>> On 03.06.15 at 12:02, wrote:
>>>>>>>>>> On Tue, 2 Jun 2015, Andrew Cooper wrote:
>>>>>>>>>>> With my x86 maintainer hat on, the following is an absolute
>>>>>>>>>>> minimum set of prerequisites for PVH.
>>>>>>>>>>>
>>>>>>>>>>> * 32bit support
>>>>>>>>>> Could you please explain why 32bit is important to get PVH out of tech
>>>>>>>>>> preview? I don't see 32 bit OSes as an important use case. Maybe there
>>>>>>>>>> is more behind it that I cannot see.
>>>>>>>>> The primary reason was named before: 32-bit support will likely
>>>>>>>>> end up changing the way 64-bit guests get launched.
>>>>>>>> I can work on the new boot ABI, even if it's just a design document now,
>>>>>>>> but the actual implementation needs to be done on top of the 32-bit
>>>>>>>> support series.
>>>>>>>>
>>>>>>>> Boris, do you think you could send an early RFC of your 32-bit support
>>>>>>>> series in a couple of weeks at most?
>>>>>>> That's highly unlikely. For one, I am still unable to boot MP guests.
>>>>>>> In addition, it is all held together by rubber bands and matchsticks
>>>>>>> so calling it an RFC would be an insult to RFCs. (For example, I
>>>>>>> apparently broke HVM somewhere along the way.)
>>>>>> How about working it the other way around?
>>>>>>
>>>>>> Start with an HVM guest and start with a sane method of booting. I
>>>>>> highly suggest multiboot1 as it is very easy and we have most of the
>>>>>> code already. Whoever actually gets around to doing this gets leeway,
>>>>>> subject to it being sane (which the current method very certainly is not).
>> I agree that using a boot ABI similar to multiboot1 is going to solve
>> some of the issues that we currently have, while probably simplifying
>> the code to build a domain. There are also several multiboot1
>> implementations around which can be used as a basis for this for guest
>> OSes that don't have native multiboot support.
>>
>>>>>> Start the domain without qemu, and expose some of the PV hypercalls to
>>>>>> HVM guests, and see how far it gets. One will find suddenly that all
>>>>>> 32bit and AMD problems have already been solved.
>>>>>> All the PV(h) kernel needs to know is that there is no real hardware,
>>>>>> and not to touch it.
>>>>> This seems like a clean and nice way forward, but rather than PVH, it is
>>>>> actually something else. Am I the only one to think that making this
>>>>> drastic change in the design at this stage (3 years in) is too late?
>> I don't think the ABI is going to change much; most of this plumbing is
>> going to be in Xen internals, so I wouldn't call it a drastic change.
> To my mind, the ABI covers the boot method, but I would agree that the
> runtime environment is unlikely to need changing. Most of what is
> involved along these lines is to lift the current restrictions.
>
>>>> There was no design in the slightest, which is why we have got 3 years
>>>> in and are in this position.
>>> Please try to keep things friendly and constructive on this list. Yes,
>>> there was design; it was discussed on this list and at the Xen summit.
>>> With hindsight, it turned out that "PV guest that uses a lightweight
>>> HVM container" took a lot more work/code than was originally expected.
>>>
>>> I suspect that an implementation of "HVM without qemu and some
>>> hypercalls" will also turn out more complex than it sounds. I believe
>>> I've made my opinion clear that that's where PVH ought to end up, but
>>> I'm unconvinced that starting from scratch will be the fastest way.
>> I believe the right way to move forward is to start implementing this
>> new boot ABI on top of HVM, without axing out the PVH code. I think most
>> of the current PVH code would still be needed for the HVM-without-dm
>> kind of guest, and that at some point both will meet.

This is a good thing to do in the medium-to-long term, but it seems to me
that we should still get the current PVH implementation working first
(i.e. AMD/32-bit). If past experience is any indication, this rework will
take a while, whereas we have something working right now (warts and all).
For example, the diffstat of what I have now for 32-bit (i.e. UP only) is

  20 files changed, 503 insertions(+), 95 deletions(-)

and that is probably 75% debugging/crud. Linux is even less:

  17 files changed, 221 insertions(+), 28 deletions(-)

(although I did some fairly awful things there).

So adding that part appears to be doable without major disruption on the
Xen side. And I don't think AMD support will be any worse.

-boris

> +1. I agree, and think this is a very sensible way forwards.
>
>> I will send a design document for this boot ABI next week, but the plan
>> is as follows:
>>
>> - Start the guest in protected mode without paging.
> A step in the middle here is finding the Xen cpuid leaves.
>
> An elfnote signifying that the kernel is PVH-capable and a cpuid leaf
> indicating that the environment is PVH seem like a good combination.
>
> ~Andrew
>
>> - Fill the hypercall page using wrmsr (HVM).
>> - Map the shared info page using XENMEM_add_to_physmap (HVM).
>>
>> That means we can get rid of some of the ELFNOTES; the ones that come to
>> mind right now are:
>>
>> - XEN_ELFNOTE_VIRT_BASE
>> - XEN_ELFNOTE_HYPERCALL_PAGE
>> - XEN_ELFNOTE_HV_START_LOW
>> - XEN_ELFNOTE_PAE_MODE
>> - XEN_ELFNOTE_L1_MFN_VALID
>>
>> And probably some more.