From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Kai Huang <kaih.linux@gmail.com>, xen-devel@lists.xen.org
Cc: kevin.tian@intel.com, sstabellini@kernel.org,
	wei.liu2@citrix.com, George.Dunlap@eu.citrix.com, tim@xen.org,
	ian.jackson@eu.citrix.com, jbeulich@suse.com
Subject: Re: [RFC PATCH 00/15] RFC: SGX virtualization design and draft patches
Date: Tue, 11 Jul 2017 15:13:57 +0100
Message-ID: <d6a7f070-687b-24fe-9d3c-2c3f74baa0f7@citrix.com>
In-Reply-To: <cover.1499586046.git.kai.huang@linux.intel.com>

On 09/07/17 09:03, Kai Huang wrote:
> Hi all,
>
> This series is the RFC design for Xen SGX virtualization support, plus RFC draft patches.

Thank you very much for this design doc.

> 2. SGX Virtualization Design
>
> 2.1 High Level Toolstack Changes:
>
> 2.1.1 New 'epc' parameter
>
> EPC is a limited resource. In order to use EPC efficiently among all domains,
> the administrator should be able to specify a domain's virtual EPC size when
> creating the guest, and should also be able to query the virtual EPC size of
> all domains.
>
> For this purpose, a new 'epc = <size>' parameter is added to the XL configuration
> file. This parameter specifies the guest's virtual EPC size. The EPC base address
> will be calculated by the toolstack internally, according to the guest's memory
> size, MMIO size, etc. 'epc' is in units of MB, and any 1MB-aligned value will be
> accepted.
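A minimal sketch of the arithmetic implied by 'epc' being given in MB: the size in bytes, the number of 4K EPC pages, and a base address. The placement rule (first 1MB boundary above everything already laid out in guest physical space), the "0 means no EPC" convention and all names here are assumptions for illustration, not libxl's actual algorithm.

```c
#include <stdint.h>
#include <stdbool.h>

#define MB(x)          ((uint64_t)(x) << 20)
#define EPC_PAGE_SHIFT 12   /* EPC is managed in 4K pages */

struct epc_layout {
    uint64_t base;      /* guest physical base address, in bytes */
    uint64_t size;      /* in bytes */
    uint64_t nr_pages;  /* number of 4K EPC pages */
};

/*
 * Hypothetical layout rule: place EPC at the first 1MB boundary above
 * top_of_used_gpa (end of RAM, MMIO holes, ... as already decided by
 * the toolstack). Returns false for epc_mb == 0 (assumed: no EPC).
 */
static bool epc_layout_compute(uint64_t epc_mb, uint64_t top_of_used_gpa,
                               struct epc_layout *out)
{
    if (epc_mb == 0)
        return false;
    out->size = MB(epc_mb);          /* a whole number of MB is 1MB aligned */
    out->base = (top_of_used_gpa + MB(1) - 1) & ~(MB(1) - 1);
    out->nr_pages = out->size >> EPC_PAGE_SHIFT;
    return true;
}
```

For example, 'epc = 64' corresponds to 16384 EPC pages under this sketch.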

How will this interact with multi-package servers?  Even though it's fine
to implement the single-package support first, the design should be 
extensible to the multi-package case.

First of all, what are the implications of multi-package SGX?

(Somewhere) you mention changes to scheduling.  I presume this is 
because a guest with EPC mappings in EPT must be scheduled on the same 
package, or ENCLU[EENTER] will fail.  I presume also that each package 
will have separate, unrelated private keys?

I presume there is no sensible way (even on native) for a single logical 
process to use multiple different enclaves?  By extension, does it make 
sense to try and offer parts of multiple enclaves to a single VM?

> 2.1.3 Notify domain's virtual EPC base and size to Xen
>
> Xen needs to know the guest's EPC base and size in order to populate EPC pages
> for it. The toolstack notifies Xen of the EPC base and size via XEN_DOMCTL_set_cpuid.

I am currently in the process of reworking the Xen/Toolstack interface 
when it comes to CPUID handling.  The latest design is available here: 
https://lists.xenproject.org/archives/html/xen-devel/2017-07/msg00378.html 
but the end result will be the toolstack expressing its CPUID policy in 
terms of the architectural layout.

Therefore, I would expect that, however the setting is represented in 
the configuration file, xl/libxl would configure it with the hypervisor 
by setting CPUID.0x12[2] with the appropriate base and size.
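To make the CPUID.0x12[2] approach concrete: a minimal sketch of packing a toolstack-chosen EPC base and size into one EPC-section sub-leaf, following the register layout described in the Intel SDM. The struct, macro and function names are hypothetical, not Xen's or libxl's actual interfaces.

```c
#include <stdint.h>

/* One CPUID result (EAX, EBX, ECX, EDX). */
struct cpuid_leaf { uint32_t a, b, c, d; };

#define SGX_SUBLEAF_EPC_SECTION 0x1u /* EAX[3:0]: valid EPC section */
#define SGX_EPC_PROTECTED       0x1u /* ECX[3:0]: confidentiality, integrity
                                        and replay protected */

/*
 * Pack one EPC section into CPUID.(EAX=0x12, ECX=2):
 *   EAX[31:12] = base[31:12], EBX[19:0] = base[51:32]
 *   ECX[31:12] = size[31:12], EDX[19:0] = size[51:32]
 * base and size are assumed to be at least 4K aligned.
 */
static struct cpuid_leaf sgx_epc_section_encode(uint64_t base, uint64_t size)
{
    struct cpuid_leaf l;

    l.a = (uint32_t)(base & 0xfffff000u) | SGX_SUBLEAF_EPC_SECTION;
    l.b = (uint32_t)(base >> 32) & 0xfffffu;
    l.c = (uint32_t)(size & 0xfffff000u) | SGX_EPC_PROTECTED;
    l.d = (uint32_t)(size >> 32) & 0xfffffu;
    return l;
}
```

Because base and size split across two registers each, a 96MB section placed above 4GB, for instance, sets a non-zero EBX.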

> 2.1.4 Launch Control Support (?)
>
> Xen Launch Control support is about running multiple domains, each running
> its own LE signed by a different owner (if HW allows, explained below). As
> explained in 1.4 SGX Launch Control, EINIT for the LE (Launch Enclave) only
> succeeds when SHA256(SIGSTRUCT.modulus) matches IA32_SGXLEPUBKEYHASHn, and
> EINIT for other enclaves derives the EINITTOKEN key according to
> IA32_SGXLEPUBKEYHASHn. Therefore, to support this, the guest's virtual
> IA32_SGXLEPUBKEYHASHn values must be written to the physical MSRs before EINIT
> (which also means the physical IA32_SGXLEPUBKEYHASHn MSRs need to be left
> *unlocked* by the BIOS before booting the OS).
>
> On a physical machine, it is the BIOS writer's decision whether the BIOS
> provides an interface for the user to specify a customized IA32_SGXLEPUBKEYHASHn
> value (it defaults to the digest of Intel's signing key after reset). In
> reality, the OS's SGX driver may require the BIOS to leave the MSRs *unlocked*,
> and actively write the hash value to the MSRs in order to run EINIT
> successfully; in this case, the driver does not depend on the BIOS's capability
> (whether it allows the user to customize the IA32_SGXLEPUBKEYHASHn value).
>
> The question for Xen is: do we need a new parameter, such as 'lehash=<SHA256>',
> to specify the default value of the guest's virtual IA32_SGXLEPUBKEYHASHn? And
> do we need a new parameter, such as 'lewr', to specify whether the guest's
> virtual MSRs are locked or not before handing over to the guest's OS?
>
> I tend not to introduce 'lehash', as it seems the SGX driver would actively
> update the MSRs, and a new parameter would require additional changes in
> upper-layer software (such as OpenStack). 'lewr' is not needed either, as Xen
> can always expose the MSRs to the guest as *unlocked*.
>
> Comments welcome.
>
> Currently neither of the above parameters is implemented in my RFC patches.
> The Xen hypervisor always *unlocks* the MSRs. Whether or not there is a
> 'lehash' parameter doesn't affect the hypervisor's emulation of
> IA32_SGXLEPUBKEYHASHn. See the Xen hypervisor changes below for details.
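A minimal sketch of the per-domain hash handling described above, assuming the physical IA32_SGXLEPUBKEYHASH0..3 MSRs (0x8C..0x8F) were left unlocked by the BIOS: before a vCPU may run EINIT, Xen loads that vCPU's virtual hash values, skipping MSRs that already hold the right value. The structures, the per-physical-CPU cache and wrmsr_stub() are hypothetical stand-ins; real code would use Xen's own WRMSR wrapper.

```c
#include <stdint.h>

/* IA32_SGXLEPUBKEYHASH0..3 together hold a SHA256 digest (4 x 64 bits). */
#define MSR_IA32_SGXLEPUBKEYHASH0 0x8cu
#define NR_LEPUBKEYHASH           4

struct vcpu_sgx {                 /* hypothetical: guest's virtual values */
    uint64_t lepubkeyhash[NR_LEPUBKEYHASH];
};

struct pcpu_sgx {                 /* hypothetical: values currently loaded */
    uint64_t lepubkeyhash[NR_LEPUBKEYHASH];
};

/* Stand-in for a real WRMSR: records the write and counts it. */
static unsigned int nr_wrmsr;
static uint64_t fake_msrs[NR_LEPUBKEYHASH];

static void wrmsr_stub(uint32_t msr, uint64_t val)
{
    fake_msrs[msr - MSR_IA32_SGXLEPUBKEYHASH0] = val;
    nr_wrmsr++;
}

/*
 * Make the physical MSRs match the vCPU's virtual ones, e.g. on
 * context switch in or when handling an EINIT exit. Only MSRs whose
 * loaded value differs are written.
 */
static void sgx_load_lepubkeyhash(const struct vcpu_sgx *v, struct pcpu_sgx *p)
{
    for (unsigned int i = 0; i < NR_LEPUBKEYHASH; i++) {
        if (p->lepubkeyhash[i] != v->lepubkeyhash[i]) {
            wrmsr_stub(MSR_IA32_SGXLEPUBKEYHASH0 + i, v->lepubkeyhash[i]);
            p->lepubkeyhash[i] = v->lepubkeyhash[i];
        }
    }
}
```

Caching the loaded values per physical CPU avoids redundant WRMSRs when consecutive vCPUs share a hash, e.g. when every domain keeps Intel's default.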

Reading around, am I correct with the following?

1) Some processors have no launch control.  There is no restriction on 
which enclaves can boot.

2) Some Skylake client processors claim to have launch control, but the 
MSRs are unavailable (is this an erratum?).  These are limited to 
booting enclaves matching the Intel public key.

3) Launch control may be locked by the BIOS.  There may be a custom 
hash, or it might be the Intel default.  Xen can't adjust it at all, but 
can support running any number of VMs with matching enclaves.

4) Launch control may be unlocked by the BIOS.  In this case, Xen can 
context switch a hash per domain, and run all enclaves.

The eventual plans for CPUID and MSR levelling should allow all of these
to be expressed in sensible ways, and I don't foresee any issues with
supporting all of these scenarios.
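One way the four scenarios listed above might be told apart in code, as a sketch only. The inputs are assumptions: SGX_LC is CPUID.(EAX=7,ECX=0):ECX[30], bit 17 of IA32_FEATURE_CONTROL is the SGX Launch Control Enable bit that makes the hash MSRs writable, and lc_msrs_usable is a hypothetical flag standing in for however the Skylake-client case would be detected.

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_FEATURE_CONTROL bit 17: SGX Launch Control Enable. */
#define FEAT_CTL_SGX_LC_ENABLED (1ULL << 17)

enum sgx_lc_mode {
    SGX_LC_NONE,        /* 1) no launch control */
    SGX_LC_INTEL_ONLY,  /* 2) LC claimed, MSRs unavailable: Intel key only */
    SGX_LC_LOCKED,      /* 3) hash fixed by BIOS: all VMs share it */
    SGX_LC_PER_DOMAIN,  /* 4) unlocked: Xen context-switches a hash */
};

static enum sgx_lc_mode sgx_lc_classify(bool cpuid_sgx_lc,
                                        bool lc_msrs_usable,
                                        uint64_t feat_ctl)
{
    if (!cpuid_sgx_lc)
        return SGX_LC_NONE;
    if (!lc_msrs_usable)
        return SGX_LC_INTEL_ONLY;
    if (!(feat_ctl & FEAT_CTL_SGX_LC_ENABLED))
        return SGX_LC_LOCKED;
    return SGX_LC_PER_DOMAIN;
}
```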



> 2.2 High Level Xen Hypervisor Changes:
>
> 2.2.1 EPC Management (?)
>
> The Xen hypervisor needs to detect SGX, discover EPC, and manage EPC before
> exposing SGX to guests. EPC is detected via SGX CPUID 0x12.0x2. It's possible
> that there are multiple EPC sections (enumerated via sub-leaves 0x3 and so on,
> until an invalid EPC section is reported), but this is only true on multi-socket
> server machines. For server machines there are additional things that also need
> to be done, such as NUMA EPC, scheduling, etc. We will support server machines in
> the future, but currently we only support one EPC section.
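The enumeration described above, as a sketch: walk CPUID.0x12 from sub-leaf 2 until a sub-leaf reports an invalid type, decoding each section's base and size per the SDM register layout. cpuid_12() is a hypothetical stand-in that reads from a made-up table so the example runs without SGX hardware; real code would execute CPUID.

```c
#include <stdint.h>

struct cpuid_leaf { uint32_t a, b, c, d; };

/* Made-up CPUID.0x12 results, indexed by sub-leaf. */
static const struct cpuid_leaf mock_leaf12[] = {
    [2] = { 0x80000001u, 0, 0x05e00001u, 0 },  /* one ~94MB section */
    [3] = { 0, 0, 0, 0 },                      /* invalid: end of list */
};

static struct cpuid_leaf cpuid_12(unsigned int subleaf)
{
    if (subleaf < sizeof(mock_leaf12) / sizeof(mock_leaf12[0]))
        return mock_leaf12[subleaf];
    return (struct cpuid_leaf){ 0, 0, 0, 0 };
}

struct epc_section { uint64_t base, size; };

/*
 * Walk sub-leaves 2, 3, ... until EAX[3:0] != 0001b (not a valid EPC
 * section), storing at most max sections. Returns how many were found.
 */
static unsigned int sgx_enumerate_epc(struct epc_section *out, unsigned int max)
{
    unsigned int n = 0;

    for (unsigned int sub = 2; n < max; sub++) {
        struct cpuid_leaf l = cpuid_12(sub);

        if ((l.a & 0xfu) != 0x1u)
            break;
        out[n].base = ((uint64_t)(l.b & 0xfffffu) << 32) | (l.a & 0xfffff000u);
        out[n].size = ((uint64_t)(l.d & 0xfffffu) << 32) | (l.c & 0xfffff000u);
        n++;
    }
    return n;
}
```

With single-section support, only out[0] would be used; the loop already accommodates the multi-socket case.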
>
> EPC is reported as reserved memory (so it is not reported as normal memory).
> EPC must be managed in 4K pages. CPU hardware uses the EPCM to track the status
> of each EPC page. Xen needs to manage EPC and provide functions to, e.g.,
> allocate and free EPC pages for guests.
>
> There are two ways to manage EPC: manage EPC separately, or integrate it into the
> existing memory management framework.
>
> It is easy to manage EPC separately, as EPC is currently pretty small (~100MB),
> and we can even put all the pages in a single list. However, it is not flexible:
> for example, you will have to write new algorithms when EPC becomes larger (e.g.
> GBs), and you have to write new code to support NUMA EPC (although this will not
> come any time soon).
>
> Integrating EPC into the existing memory management framework seems more
> reasonable, as this way we can reuse memory management data structures and
> algorithms, and it will be more flexible to support larger EPC and potentially
> NUMA EPC. But modifying the MM framework has a higher risk of breaking existing
> memory management code (potentially more bugs).
>
> In my RFC patches we currently choose to manage EPC separately. A new
> structure, epc_page, is added to represent a single 4K EPC page. A whole array
> of struct epc_page will be allocated during EPC initialization, so that either
> the PFN of an EPC page or its 'struct epc_page' can be obtained from the other
> by adding an offset.
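A sketch of the separate-management scheme described above: a single array of struct epc_page covering one EPC section, offset-based PFN conversion in both directions, and a simple free list. Field and function names are illustrative and need not match the RFC patches.

```c
#include <stdint.h>
#include <stdlib.h>

struct epc_page {
    struct epc_page *next_free;  /* free-list linkage */
    uint16_t owner_domid;        /* hypothetical: owning domain */
};

static struct epc_page *epc_pages;   /* nr_epc_pages entries */
static unsigned long epc_base_pfn;
static unsigned long nr_epc_pages;
static struct epc_page *epc_free_list;

/* PFN -> struct epc_page: subtract the base PFN to get the index. */
static struct epc_page *epc_pfn_to_page(unsigned long pfn)
{
    return &epc_pages[pfn - epc_base_pfn];
}

/* struct epc_page -> PFN: array index plus the base PFN. */
static unsigned long epc_page_to_pfn(const struct epc_page *pg)
{
    return epc_base_pfn + (unsigned long)(pg - epc_pages);
}

static int epc_init(unsigned long base_pfn, unsigned long npages)
{
    epc_pages = calloc(npages, sizeof(*epc_pages));
    if (!epc_pages)
        return -1;
    epc_base_pfn = base_pfn;
    nr_epc_pages = npages;

    /* Push pages highest-first so the list hands out lowest PFN first. */
    epc_free_list = NULL;
    for (unsigned long i = npages; i-- > 0; ) {
        epc_pages[i].next_free = epc_free_list;
        epc_free_list = &epc_pages[i];
    }
    return 0;
}

static struct epc_page *epc_alloc_page(void)
{
    struct epc_page *pg = epc_free_list;

    if (pg)
        epc_free_list = pg->next_free;
    return pg;
}

static void epc_free_page(struct epc_page *pg)
{
    pg->next_free = epc_free_list;
    epc_free_list = pg;
}
```

A single list is adequate at ~100MB; the NUMA and multi-GB concerns above are exactly where this layout stops scaling.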
>
> But maybe integrating EPC to MM framework is more reasonable. Comments?
>
> 2.2.2 EPC Virtualization (?)

It looks like managing the EPC is very similar to managing the NVDIMM 
ranges.  We have a (set of) physical address ranges which need 4k 
ownership granularity to different domains.

I think integrating this into struct page_info is the better way to go.

>
> This part is how to populate EPC for guests. We have 3 choices:
>      - Static Partitioning
>      - Oversubscription
>      - Ballooning
>
> Static Partitioning means all EPC pages are allocated and mapped to the guest
> when it is created, and there is no runtime change to the page table mappings
> for EPC pages. Oversubscription means the Xen hypervisor supports EPC page
> swapping between domains, i.e. Xen is able to evict an EPC page from another
> domain and assign it to the domain that needs it. With oversubscription, EPC can
> be assigned to a domain on demand, when an EPT violation happens. Ballooning is
> similar to memory ballooning: it is basically "Static Partitioning" + "Balloon
> driver" in the guest.
>
> Static Partitioning is the easiest to implement, and there will be no
> hypervisor overhead (except EPT overhead, of course), because with "Static
> Partitioning" there are no EPT violations for EPC, and Xen doesn't need to turn
> on ENCLS VMEXIT for the guest, as ENCLS runs perfectly well in non-root mode.
>
> Ballooning is "Static Partitioning" + "Balloon driver" in the guest. Like "Static
> Partitioning", ballooning doesn't need to turn on ENCLS VMEXIT, and doesn't
> have EPT violations for EPC either. To support ballooning, we need a balloon
> driver in the guest that issues hypercalls to give up or reclaim EPC pages. For
> the hypercall, we have two choices: 1) add a new hypercall for EPC ballooning; 2)
> use the existing XENMEM_{increase/decrease}_reservation with a new memory flag,
> e.g. XENMEMF_epc. I'll discuss whether to add a dedicated hypercall later.
>
> Oversubscription looks nice, but it requires a more complicated implementation.
> Firstly, as explained in 1.3.3 EPC Eviction & Reload, we need to follow specific
> steps to evict EPC pages, and in order to do that, Xen basically needs to trap
> ENCLS from the guest and keep track of EPC page status and enclave info for all
> guests. This is because:
>      - To evict a regular EPC page, Xen needs to know the SECS location.
>      - Xen needs to know the EPC page type: evicting a regular EPC page and
>        evicting a SECS or VA page involve different steps.
>      - Xen needs to know the EPC page status: whether the page is blocked or not.
>
> This information can only be obtained by trapping ENCLS from the guest and
> parsing its parameters (to identify the SECS page, etc). Parsing ENCLS parameters
> means we need to know which ENCLS leaf is being trapped, and we need to translate
> the guest's virtual address to a physical address in order to locate the EPC
> page. And once ENCLS is trapped, we have to emulate ENCLS in Xen, which means we
> need to reconstruct the ENCLS parameters by remapping all the guest's virtual
> addresses to Xen virtual addresses (gva->gpa->pa->xen_va), as ENCLS always uses
> *effective addresses*, which must be translatable by the processor when it runs
> ENCLS.
>
>      --------------------------------------------------------------
>                  |   ENCLS   |
>      --------------------------------------------------------------
>                  |          /|\
>      ENCLS VMEXIT|           | VMENTRY
>                  |           |
>                 \|/          |
>
> 		1) parse ENCLS parameters
> 		2) reconstruct(remap) guest's ENCLS parameters
> 		3) run ENCLS on behalf of guest (and skip ENCLS)
> 		4) on success, update EPC/enclave info, or inject error
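Step 2 above (remapping) chains three translations. The sketch below shows how an ENCLS operand's page offset is preserved through gva->gpa->pa->xen_va, assuming toy single-entry lookup tables in place of a real guest page-table walk and p2m lookup, and a hypothetical XEN_DIRECTMAP_BASE for the final mapping.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SHIFT       12
#define PAGE_OFFSET_MASK ((1ULL << PAGE_SHIFT) - 1)
#define ARRAY_SIZE(a)    (sizeof(a) / sizeof((a)[0]))

/* Hypothetical virtual base at which Xen maps machine memory. */
#define XEN_DIRECTMAP_BASE 0xffff830000000000ULL

struct pfn_map { uint64_t from, to; };

/* Toy tables: gva page -> gfn (guest page tables), gfn -> mfn (p2m). */
static const struct pfn_map guest_pt[] = { { 0x7f000, 0x1234 } };
static const struct pfn_map p2m[]      = { { 0x1234, 0x80010 } };

static bool lookup(const struct pfn_map *m, size_t n, uint64_t pfn, uint64_t *out)
{
    for (size_t i = 0; i < n; i++) {
        if (m[i].from == pfn) {
            *out = m[i].to;
            return true;
        }
    }
    return false;
}

/*
 * gva -> gpa -> pa -> xen_va for one ENCLS operand. Returns false if
 * any step is unmapped (the caller would then inject a fault).
 */
static bool encls_remap_operand(uint64_t gva, uint64_t *xen_va)
{
    uint64_t gfn, mfn;

    if (!lookup(guest_pt, ARRAY_SIZE(guest_pt), gva >> PAGE_SHIFT, &gfn) ||
        !lookup(p2m, ARRAY_SIZE(p2m), gfn, &mfn))
        return false;
    *xen_va = XEN_DIRECTMAP_BASE + (mfn << PAGE_SHIFT) + (gva & PAGE_OFFSET_MASK);
    return true;
}
```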
>
> And Xen needs to maintain each EPC page's status (type, blocked or not, in
> enclave or not, etc). Xen also needs to maintain the info of all enclaves from
> all guests, in order to find the correct SECS for a regular EPC page, as well as
> the enclave's linear address.
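A sketch of the per-page bookkeeping that oversubscription would require, and of how it gates eviction, following the three points listed earlier. The types and verdicts are hypothetical; for brevity, 'blocked' is assumed to also cover the ETRACK/thread-exit step that must complete before EWB, and VA-page eviction is not modelled in detail.

```c
#include <stdint.h>
#include <stdbool.h>

enum epc_type { EPC_FREE, EPC_SECS, EPC_REG, EPC_VA };

/* Hypothetical state Xen would learn by trapping and parsing ENCLS. */
struct epc_track {
    enum epc_type type;
    bool blocked;               /* EBLOCK (and, here, ETRACK) done? */
    struct epc_track *secs;     /* owning SECS, for EPC_REG */
    unsigned int nr_children;   /* live child pages, for EPC_SECS */
    uint64_t enclave_linaddr;   /* enclave linear address */
};

enum evict_verdict {
    EVICT_OK,                /* ready for EWB */
    EVICT_NEED_SECS_INFO,    /* regular page whose SECS is unknown */
    EVICT_NEED_EBLOCK,       /* regular page not blocked yet */
    EVICT_CHILDREN_PRESENT,  /* SECS still has child pages */
    EVICT_NOT_ALLOCATED,
};

static enum evict_verdict epc_evict_precheck(const struct epc_track *pg)
{
    switch (pg->type) {
    case EPC_REG:
        if (!pg->secs)
            return EVICT_NEED_SECS_INFO;
        if (!pg->blocked)
            return EVICT_NEED_EBLOCK;
        return EVICT_OK;
    case EPC_SECS:
        /* A SECS can only be written back once all children are gone. */
        return pg->nr_children ? EVICT_CHILDREN_PRESENT : EVICT_OK;
    case EPC_VA:
        return EVICT_OK;     /* separate flow, not modelled here */
    default:
        return EVICT_NOT_ALLOCATED;
    }
}
```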
>
> So in general, "Static Partitioning" has the simplest implementation, but is
> obviously not the most efficient use of EPC; "Ballooning" has all the pros of
> Static Partitioning but requires a guest balloon driver; "Oversubscription" is
> best in terms of flexibility but requires a complicated hypervisor implementation.
>
> We have implemented "Static Partitioning" in the RFC patches, but need your
> feedback on whether it is enough. If not, which one should we do at the next
> stage: Ballooning or Oversubscription? IMO Ballooning may be good enough, given
> the fact that memory is currently also "Static Partitioning" + "Ballooning".
>
> Comments?

Definitely go for static partitioning to begin with.  This is far 
simpler to implement.

I can't see a pressing usecase for oversubscription or ballooning. Any 
datacenter work will be using static partitioning exclusively, and I expect
static will be fine for all (or at least, most) client usecases.

>
> 2.2.3 Populate EPC for Guest
>
> The toolstack notifies Xen of the domain's EPC base and size via
> XEN_DOMCTL_set_cpuid, so currently Xen populates all EPC pages for the guest
> while handling XEN_DOMCTL_set_cpuid, specifically for CPUID.0x12.0x2. Once Xen
> has checked that the values passed from the toolstack are valid, it allocates all
> EPC pages and sets up the EPT mappings for the guest.
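A sketch of static partitioning at set_cpuid time: validate the base and size, then allocate and map every EPC page up front, rolling back this call's work on failure. The toy allocator, map_epc() and the -1/-2 return values are assumptions standing in for Xen's EPC allocator, the p2m_epc mapping code and -EINVAL/-ENOMEM.

```c
#include <stdint.h>
#include <stdbool.h>

#define EPC_PAGE_SIZE 4096ULL
#define MAX_PAGES     64

/* Toy EPC allocator: a stack of free machine frame numbers. */
static unsigned long free_mfns[MAX_PAGES];
static unsigned int nr_free;

/* Toy p2m: parallel arrays of gfn -> mfn EPC mappings. */
static unsigned long p2m_gfn[MAX_PAGES], p2m_mfn[MAX_PAGES];
static unsigned int nr_mapped;

static bool alloc_epc_mfn(unsigned long *mfn)
{
    if (!nr_free)
        return false;
    *mfn = free_mfns[--nr_free];
    return true;
}

static void free_epc_mfn(unsigned long mfn)
{
    free_mfns[nr_free++] = mfn;
}

static void map_epc(unsigned long gfn, unsigned long mfn)
{
    p2m_gfn[nr_mapped] = gfn;
    p2m_mfn[nr_mapped] = mfn;
    nr_mapped++;
}

/* Back the whole [base, base + size) range immediately. */
static int populate_epc(uint64_t base, uint64_t size)
{
    unsigned int start = nr_mapped;

    if (!size || ((base | size) & (EPC_PAGE_SIZE - 1)))
        return -1;                          /* stands in for -EINVAL */

    for (uint64_t off = 0; off < size; off += EPC_PAGE_SIZE) {
        unsigned long mfn;

        if (!alloc_epc_mfn(&mfn)) {
            /* Unmap only this call's pages and return them to the pool. */
            while (nr_mapped > start) {
                nr_mapped--;
                free_epc_mfn(p2m_mfn[nr_mapped]);
            }
            return -2;                      /* stands in for -ENOMEM */
        }
        map_epc((unsigned long)((base + off) / EPC_PAGE_SIZE), mfn);
    }
    return 0;
}
```

Since every page is backed at creation, there are no EPT violations for EPC later, which is what makes static partitioning free of runtime hypervisor overhead.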
>
> 2.2.4 New Dedicated Hypercall (?)

All this information should (eventually) be available via the 
appropriate SYSCTL_get_{cpuid,msr}_policy hypercalls.  I don't see any 
need for dedicated hypercalls.

> 2.2.9 Guest Suspend & Resume
>
> On hardware, EPC is destroyed when power goes to S3-S5, so Xen will destroy the
> guest's EPC when the guest's power goes into S3-S5. Currently Xen is notified of
> S-state changes by Qemu via HVM_PARAM_ACPI_S_STATE, at which point Xen will
> destroy the EPC if the S-state is S3-S5.
>
> Specifically, Xen will run EREMOVE on each of the guest's EPC pages. The guest
> may not handle EPC suspend & resume correctly, in which case the guest's EPC
> pages may physically still be valid, so Xen needs to run EREMOVE to make sure
> all EPC pages become invalid. Otherwise further EPC operations in the guest may
> fault, as the guest assumes all EPC pages are invalid after it is resumed.
>
> For a SECS page, EREMOVE may fail with SGX_CHILD_PRESENT, in which case Xen will
> put the SECS page on a list, and call EREMOVE on it again after EREMOVE has been
> called on all other EPC pages. This time EREMOVE on the SECS will succeed, as all
> its children (regular EPC pages) have already been removed.
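The two-pass EREMOVE scheme above, as a sketch. eremove() is a mock that only models the architectural rule that a SECS cannot be removed while it still has children; the structure and the EREMOVE_* codes are hypothetical stand-ins for the real SGX error codes such as SGX_CHILD_PRESENT.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the real EREMOVE status codes. */
enum { EREMOVE_OK, EREMOVE_CHILD_PRESENT };

struct epc_pg {
    bool valid;                    /* EPCM entry still valid? */
    bool is_secs;
    unsigned int nr_children;      /* SECS only: live child pages */
    struct epc_pg *secs;           /* regular pages only: parent SECS */
    struct epc_pg *next_deferred;  /* linkage for the retry list */
};

/* Mock EREMOVE: a SECS with children cannot be removed. */
static int eremove(struct epc_pg *pg)
{
    if (!pg->valid)
        return EREMOVE_OK;
    if (pg->is_secs && pg->nr_children)
        return EREMOVE_CHILD_PRESENT;
    if (!pg->is_secs && pg->secs)
        pg->secs->nr_children--;
    pg->valid = false;
    return EREMOVE_OK;
}

/* Invalidate all of a domain's EPC; returns pages left valid (0 = ok). */
static unsigned int epc_reset_domain(struct epc_pg *pages, unsigned int n)
{
    struct epc_pg *deferred = NULL;
    unsigned int left = 0;

    /* Pass 1: remove everything; park SECS pages that still have children. */
    for (unsigned int i = 0; i < n; i++) {
        if (eremove(&pages[i]) == EREMOVE_CHILD_PRESENT) {
            pages[i].next_deferred = deferred;
            deferred = &pages[i];
        }
    }

    /* Pass 2: all regular pages are gone, so the parked SECS pages can go. */
    for (struct epc_pg *pg = deferred; pg; pg = pg->next_deferred)
        eremove(pg);

    for (unsigned int i = 0; i < n; i++)
        left += pages[i].valid;
    return left;
}
```

The same routine would serve domain destruction, where pages must be invalidated before being freed.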
>
> 2.2.10 Destroying Domain
>
> Normally Xen just frees all EPC pages of a domain when it is destroyed. But Xen
> will also run EREMOVE on all of the guest's EPC pages (described in 2.2.9 above)
> before freeing them, as the guest may shut down unexpectedly (e.g. the user kills
> the guest), in which case the guest's EPC may still be valid.
>
> 2.3 Additional Point: Live Migration, Snapshot Support (?)

How big is the EPC?  If we are talking MB rather than GB, movement of 
the EPC could be after the pause, which would add some latency to live 
migration but should work.  I expect that people would prefer to have 
the flexibility of migration even at the cost of extra latency.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
