From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Mukesh Rathor <mukesh.rathor@oracle.com>
Cc: xen-devel <xen-devel@lists.xen.org>,
"David Vrabel" <david.vrabel@citrix.com>,
"Jan Beulich" <JBeulich@suse.com>,
"Roger Pau Monné" <roger.pau@citrix.com>
Subject: Re: RFC: very initial PVH design document
Date: Fri, 29 Aug 2014 11:09:07 -0400
Message-ID: <20140829150906.GB7706@laptop.dumpdata.com>
In-Reply-To: <20140827153842.767743de@mantra.us.oracle.com>
On Wed, Aug 27, 2014 at 03:38:42PM -0700, Mukesh Rathor wrote:
> On Wed, 27 Aug 2014 16:45:37 -0400
> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>
> > On Tue, Aug 26, 2014 at 05:33:21PM -0700, Mukesh Rathor wrote:
> > > On Fri, 22 Aug 2014 16:55:08 +0200
> > > Roger Pau Monné <roger.pau@citrix.com> wrote:
> > >
> > > > Hello,
> > > >
> > > > I've started writing a document in order to describe the
> > > > interface exposed by Xen to PVH guests, and how it should be used
> > > > (by guest OSes). The document is far from complete (see the
> > > > amount of TODOs scattered around), but given the lack of
> > > > documentation regarding PVH I think it's a good starting point.
> > > > The aim of this is that it should be committed to the Xen
> > > > repository once it's ready. Given that this is still a *very*
> > > > early version I'm not even posting it as a patch.
> > > >
> > > > Please comment, and try to fill the holes if possible ;).
> > > >
> > > > Roger.
> > > >
> > > > ---
> > > > # PVH Specification #
> > > >
> > > > ## Rationale ##
> > > >
> > > > > PVH is a new kind of guest that has been introduced in Xen 4.4 as
> > > > > a DomU, and in Xen 4.5 as a Dom0. The aim of PVH is to make use
> > > > of the hardware virtualization extensions present in modern x86
> > > > CPUs in order to improve performance.
> > > >
> > > > PVH is considered a mix between PV and HVM, and can be seen as a
> > > > PV guest that runs inside of an HVM container, or as a PVHVM guest
> > > > without any emulated devices. The design goal of PVH is to provide
> > > > the best performance possible and to reduce the amount of
> > > > modifications needed for a guest OS to run in this mode (compared
> > > > to pure PV).
> > > >
> > > > This document tries to describe the interfaces used by PVH guests,
> > > > focusing on how an OS should make use of them in order to support
> > > > PVH.
> > > >
> > > > ## Early boot ##
> > > >
> > > > > PVH guests use the PV boot mechanism, which means that the kernel
> > > > is loaded and directly launched by Xen (by jumping into the entry
> > > > point). In order to do this Xen ELF Notes need to be added to the
> > > > guest kernel, so that they contain the information needed by Xen.
> > > > Here is an example of the ELF Notes added to the FreeBSD amd64
> > > > kernel in order to boot as PVH:
> > > >
> > > > > ELFNOTE(Xen, XEN_ELFNOTE_GUEST_OS, .asciz, "FreeBSD")
> > > > > ELFNOTE(Xen, XEN_ELFNOTE_GUEST_VERSION, .asciz, __XSTRING(__FreeBSD_version))
> > > > > ELFNOTE(Xen, XEN_ELFNOTE_XEN_VERSION, .asciz, "xen-3.0")
> > > > > ELFNOTE(Xen, XEN_ELFNOTE_VIRT_BASE, .quad, KERNBASE)
> > > > > ELFNOTE(Xen, XEN_ELFNOTE_PADDR_OFFSET, .quad, KERNBASE)
> > > > > ELFNOTE(Xen, XEN_ELFNOTE_ENTRY, .quad, xen_start)
> > > > > ELFNOTE(Xen, XEN_ELFNOTE_HYPERCALL_PAGE, .quad, hypercall_page)
> > > > > ELFNOTE(Xen, XEN_ELFNOTE_HV_START_LOW, .quad, HYPERVISOR_VIRT_START)
> > > > > ELFNOTE(Xen, XEN_ELFNOTE_FEATURES, .asciz,
> > > > > "writable_descriptor_tables|auto_translated_physmap|supervisor_mode_kernel|hvm_callback_vector")
> > > > > ELFNOTE(Xen, XEN_ELFNOTE_PAE_MODE, .asciz, "yes")
> > > > > ELFNOTE(Xen, XEN_ELFNOTE_L1_MFN_VALID, .long, PG_V, PG_V)
> > > > > ELFNOTE(Xen, XEN_ELFNOTE_LOADER, .asciz, "generic")
> > > > > ELFNOTE(Xen, XEN_ELFNOTE_SUSPEND_CANCEL, .long, 0)
> > > > > ELFNOTE(Xen, XEN_ELFNOTE_BSD_SYMTAB, .asciz, "yes")
> > >
> > > It will be helpful to add:
> > >
> > > On the linux side, the above can be found in
> > > arch/x86/xen/xen-head.S.
> > >
> > >
> > > > It is important to highlight the following notes:
> > > >
> > > > * XEN_ELFNOTE_ENTRY: contains the memory address of the kernel
> > > > entry point.
> > > > * XEN_ELFNOTE_HYPERCALL_PAGE: contains the memory address of the
> > > > hypercall page inside of the guest kernel (this memory region
> > > > will be filled by Xen prior to booting).
> > > > * XEN_ELFNOTE_FEATURES: contains the list of features supported
> > > > by the kernel. In this case the kernel is only able to boot as a
> > > > PVH guest, but those options can be mixed with the ones used by
> > > > pure PV guests in order to have a kernel that supports both PV
> > > > and PVH (like Linux). The list of options available can be found
> > > > in the `features.h` public header.
> > >
> > > Hmm... for linux I'd word that as follows:
> > >
> > > A PVH guest is started by specifying pvh=1 in the config file.
> > > However, for the guest to be launched as a PVH guest, it must
> > > minimally advertise certain features which are:
> > > auto_translated_physmap, hvm_callback_vector,
> > > writable_descriptor_tables, and supervisor_mode_kernel. This is
> > > done via XEN_ELFNOTE_FEATURES and XEN_ELFNOTE_SUPPORTED_FEATURES.
> > > See linux:arch/x86/xen/xen-head.S for more info. A list of all xen
> > > features can be found in xen:include/public/features.h. However, at
> > > present the absence of these features does not make it
> > > automatically boot in PV mode, but that may change in future. The
> > > ultimate goal is, if a guest supports these features, then boot it
> > > automatically in PVH mode, otherwise boot it in PV mode.
> > >
> > > [You can leave out the last part if you want, or just take whatever
> > > from above].
> > >
> > > > Xen will jump into the kernel entry point defined in
> > > > `XEN_ELFNOTE_ENTRY` with paging enabled (either long or protected
> > > > > mode depending on the kernel bitness) and some basic page tables
> > > > > set up.
> > >
> > > If I may rephrase:
> > >
> > > Guest is launched at the entry point specified in XEN_ELFNOTE_ENTRY
> > > with paging, PAE, and long mode enabled. At present only 64bit mode
> > > is supported, however, in future compat mode support will be added.
> > > An important distinction for a 64bit PVH is that it is launched at
> > > privilege level 0 as opposed to a 64bit PV guest which is launched
> > > at privilege level 3.
> > >
> > > > > Also, the `rsi` (`esi` on 32bits) register is going to contain the
> > > > > virtual memory address where Xen has placed the start_info
> > > > > structure. The `rsp` (`esp` on 32bits) will contain a stack that
> > > > > can be used by the guest kernel. The start_info structure
> > > > > contains all the info the guest needs in order to initialize.
> > > > > More information about the contents can be found in the `xen.h`
> > > > > public header.
> > >
> > > Since the above is all true for PV guest, you could begin it with:
> > >
> > > Just like a PV guest, the rsi ....
> > >
> > > >
> > > > ### Initial amd64 control registers values ###
> > > >
> > > > Initial values for the control registers are set up by Xen before
> > > > booting the guest kernel. The guest kernel can expect to find the
> > > > following features enabled by Xen.
> > > >
> > > > On `CR0` the following bits are set by Xen:
> > > >
> > > > * PE (bit 0): protected mode enable.
> > > > * ET (bit 4): 80387 external math coprocessor.
> > > > * PG (bit 31): paging enabled.
> > > >
> > > > On `CR4` the following bits are set by Xen:
> > > >
> > > > * PAE (bit 5): PAE enabled.
> > > >
> > > > And finally on `EFER` the following features are enabled:
> > > >
> > > > * LME (bit 8): Long mode enable.
> > > > * LMA (bit 10): Long mode active.
> > > >
> > > > > *TODO*: do we expect these flags to change? Are there other flags
> > > > that might be enabled depending on the hardware we are running on?
> > >
> > > Can't think of anything...
> >
> > What about the initial segments (ES, DS, FS, GS)? We boot with Xen
> > provided ones and need to swap over from them - so that means
> > the DS and CS are initially set to Xen ones. And we should probably
> > > mention that when the OS switches from Xen ones it MUST jump to a
> > > CS with CS.L = 1 set, otherwise bad things happen.
>
> CS.L is already covered above:
> with paging, PAE, and long mode enabled. At present only 64bit mode
> is supported, however, in future compat mode support will be added.
>
> that is the CS.L bit. CS.L==1 ==> 64bit mode, CS.L==0 ==> compat mode.
I mean that we should document what the segments actually look like,
that is, what the initial segments the guest boots with contain.
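
To make the CS.L point concrete, here is a minimal sketch in C of how the
attribute bits of a segment descriptor decode, and of the check that a
descriptor is usable as 64-bit code (L=1, D=0). The helper names are made up
for illustration, and the descriptor values in the usage note are the
conventional flat ones, not necessarily what Xen loads for a PVH guest.

```c
#include <stdbool.h>
#include <stdint.h>

/* Attribute fields of an x86 code/data segment descriptor (8 bytes). */
struct seg_attrs {
    unsigned type;  /* bits 40-43: bit 3 of the type set means code */
    bool s;         /* bit 44: 1 = code/data, 0 = system descriptor */
    unsigned dpl;   /* bits 45-46: descriptor privilege level */
    bool present;   /* bit 47 */
    bool l;         /* bit 53: 64-bit code segment (CS.L) */
    bool db;        /* bit 54: default operand size (CS.D) */
    bool g;         /* bit 55: granularity */
};

/* Hypothetical helper: split a raw descriptor into its attribute bits. */
static struct seg_attrs decode_descriptor(uint64_t d)
{
    struct seg_attrs a = {
        .type    = (unsigned)((d >> 40) & 0xf),
        .s       = ((d >> 44) & 1) != 0,
        .dpl     = (unsigned)((d >> 45) & 0x3),
        .present = ((d >> 47) & 1) != 0,
        .l       = ((d >> 53) & 1) != 0,
        .db      = ((d >> 54) & 1) != 0,
        .g       = ((d >> 55) & 1) != 0,
    };
    return a;
}

/*
 * Hypothetical helper: true if the descriptor is a present code segment
 * with L=1 and D=0, i.e. one a guest in long mode can load into CS and
 * keep running 64-bit code.
 */
static bool is_long_mode_code(uint64_t d)
{
    struct seg_attrs a = decode_descriptor(d);

    return a.present && a.s && (a.type & 0x8) && a.l && !a.db;
}
```

With this, is_long_mode_code(0x00af9a000000ffff) is true (flat 64-bit code),
while 0x00cf9a000000ffff (flat 32-bit code, L=0/D=1) and 0x00cf92000000ffff
(flat data) are rejected.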
>
>
> > We should probably mention that MSR_FS_BASE, MSR_KERNEL_GS_BASE
> > and MSR_GS_BASE are zeroed out. Not sure about any other MSR?
>
> Could.
Perhaps say that any other MSRs are treated the same as they are
under an HVM guest.
>
> > Should we have a blurb about IDT and GDT, and that the PV hypercalls
> > for them will be ignored?
>
> and that they are native and guest managed.
Right. Which means that during early bootup one has to be extra
careful not to trigger a #GP, as there are no fault handlers set up yet.
>
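
For reference, the initial control register state quoted earlier in this mail
(CR0.PE/ET/PG, CR4.PAE, EFER.LME/LMA) could be written down as a check like
the sketch below. The bit positions are the architectural ones listed in the
draft; the struct and function names are hypothetical, and how a guest would
actually sample the registers is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits the draft says Xen sets before jumping to the guest entry point. */
#define PVH_CR0_PE   (UINT64_C(1) << 0)   /* protected mode enable */
#define PVH_CR0_ET   (UINT64_C(1) << 4)   /* 80387 external math coprocessor */
#define PVH_CR0_PG   (UINT64_C(1) << 31)  /* paging enabled */
#define PVH_CR4_PAE  (UINT64_C(1) << 5)   /* PAE enabled */
#define PVH_EFER_LME (UINT64_C(1) << 8)   /* long mode enable */
#define PVH_EFER_LMA (UINT64_C(1) << 10)  /* long mode active */

/* Hypothetical container for register values sampled at guest entry. */
struct pvh_boot_regs {
    uint64_t cr0;
    uint64_t cr4;
    uint64_t efer;
};

/*
 * Returns true if every bit listed in the draft is set. Other bits are
 * deliberately ignored, since the draft leaves open whether more flags
 * may be enabled depending on the hardware.
 */
static bool pvh_initial_state_ok(const struct pvh_boot_regs *r)
{
    const uint64_t cr0_need  = PVH_CR0_PE | PVH_CR0_ET | PVH_CR0_PG;
    const uint64_t cr4_need  = PVH_CR4_PAE;
    const uint64_t efer_need = PVH_EFER_LME | PVH_EFER_LMA;

    return (r->cr0 & cr0_need) == cr0_need &&
           (r->cr4 & cr4_need) == cr4_need &&
           (r->efer & efer_need) == efer_need;
}
```

For example, cr0 = 0x80000011, cr4 = 0x20, efer = 0x500 passes, while the same
values with efer = 0x100 (LME set but LMA clear) do not.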