From: Boqun Feng <boqun.feng@intel.com>
To: xen-devel@lists.xen.org
Cc: Kevin Tian <kevin.tian@intel.com>,
Stefano Stabellini <sstabellini@kernel.org>,
Wei Liu <wei.liu2@citrix.com>,
George Dunlap <George.Dunlap@eu.citrix.com>,
Andrew Cooper <andrew.cooper3@citrix.com>,
Ian Jackson <ian.jackson@eu.citrix.com>, Tim Deegan <tim@xen.org>,
kai.huang@linux.intel.com, Jan Beulich <jbeulich@suse.com>
Subject: Re: [RFC PATCH v2 00/17] RFC: SGX Virtualization design and draft patches
Date: Mon, 25 Dec 2017 13:01:19 +0800
Message-ID: <20171225050119.GA748@winterfell.sh.intel.com>
In-Reply-To: <20171204001528.1342-1-boqun.feng@intel.com>
On Mon, Dec 04, 2017 at 08:15:11AM +0800, Boqun Feng wrote:
> Hi all,
>
> This is the v2 of RFC SGX Virtualization design and draft patches, you
Ping ;-)
Any comments?
Regards,
Boqun
> can find v1 at:
>
> https://lists.gt.net/xen/devel/483404
>
> In the new version, I fixed a few things according to the feedback on the
> previous version (mostly cleanups and code movement).
>
> Besides, Kai and I redesigned the SGX MSR setup part and introduced the
> new XL parameters 'lehash' and 'lewr'.
>
> Another big change is that I modified the EPC management to fit EPC pages
> in 'struct page_info'. In patches #6 and #7, unscrubbable pages,
> 'PGC_epc', 'MEMF_epc' and 'XENZONE_EPC' are introduced, so that EPC
> management is fully integrated into the existing memory management of Xen.
> This might be the controversial bit, so patches 6~8 are simply meant to show
> the idea and drive deeper discussion.
>
> Detailed changes since v1 (modifications tagged "[New]" are entirely
> new in this series; reviews and comments are highly welcome for those
> parts):
>
> * Make the SGX-related code mostly common for x86 by: 1) moving sgx.[ch] to
> arch/x86/ and include/asm-x86/ and 2) renaming EPC-related functions
> with a domain_* prefix.
>
> * Rename ioremap_cache() to ioremap_wb() and make it x86-specific as
> suggested by Jan Beulich.
>
> * Remove percpu sgx_cpudata; during bootup, secondary CPUs now check
> whether they read different values than the boot CPU, and if so SGX is
> disabled.
>
> * Remove domain_has_sgx_{,launch_control}, and make sure we can
> rely on domain's arch.cpuid->feat.sgx{_lc} for setting checks.
>
> * Cleanup the code for CPUID handling as suggested by Andrew Cooper.
>
> * Adjust to msr_policy framework for SGX MSRs handling, and remove
> unnecessary fields like 'readable' and 'writable'
>
> * Use 'page_info' to maintain EPC pages, and [New] add a draft
> implementation for employing xenheap for EPC page management. Please
> see patches 6~8.
>
> * [New] Modify the XL parameter for SGX, please see section 2.1.1 in
> the updated design doc.
>
> * [New] Use _set_vcpu_msrs hypercall in the toolstack to set the SGX
> related MSRs. Please see patch #17.
>
> * ACPI related tool changes are temporarily dropped in this patchset,
> as I need more time to resolve the comments and do related tests.
>
> The updated design doc follows. As in the previous version, there are some
> particular points in the design where we don't know which implementation is
> better. For those, a question mark (?) is added at the right of the menu
> entry. For SGX live migration, thanks to Wei Liu for commenting in the
> previous review that it would be nice to support if we can, but we'd like to
> hear more from you, so we still put a question mark on this item. Your
> comments on those "question mark (?)" parts (and other comments as well, of
> course) are highly appreciated.
>
> ===================================================================
> 1. SGX Introduction
> 1.1 Overview
> 1.1.1 Enclave
> 1.1.2 EPC (Enclave Page Cache)
> 1.1.3 ENCLS and ENCLU
> 1.2 Discovering SGX Capability
> 1.2.1 Enumerate SGX via CPUID
> 1.2.2 Intel SGX Opt-in Configuration
> 1.3 Enclave Life Cycle
> 1.3.1 Constructing & Destroying Enclave
> 1.3.2 Enclave Entry and Exit
> 1.3.2.1 Synchronous Entry and Exit
> 1.3.2.2 Asynchronous Enclave Exit
> 1.3.3 EPC Eviction and Reload
> 1.4 SGX Launch Control
> 1.5 SGX Interaction with IA32 and Intel 64 Architecture
> 2. SGX Virtualization Design
> 2.1 High Level Toolstack Changes
> 2.1.1 New 'sgx' XL configure file parameter
> 2.1.2 New XL commands (?)
> 2.1.3 Notify domain's virtual EPC base and size to Xen
> 2.2 High Level Hypervisor Changes
> 2.2.1 EPC Management
> 2.2.2 EPC Virtualization
> 2.2.3 Populate EPC for Guest
> 2.2.4 Launch Control Support
> 2.2.5 CPUID Emulation
> 2.2.6 EPT Violation & ENCLS Trapping Handling
> 2.2.7 Guest Suspend & Resume
> 2.2.8 Destroying Domain
> 2.3 Additional Point: Live Migration, Snapshot Support (?)
> 3. Reference
>
> 1. SGX Introduction
>
> 1.1 Overview
>
> 1.1.1 Enclave
>
> Intel Software Guard Extensions (SGX) is a set of instructions and mechanisms
> for memory accesses that provide secure access for sensitive applications and
> data. SGX allows an application to use a particular part of its address space
> as an *enclave*, which is a protected area that provides confidentiality and
> integrity even in the presence of privileged malware. Accesses to the enclave
> memory area from any software not resident in the enclave are prevented,
> including those from privileged software. The diagram below illustrates the
> presence of enclaves in an application.
>
> |-----------------------|
> | |
> | |---------------| |
> | | OS kernel | | |-----------------------|
> | |---------------| | | |
> | | | | | |---------------| |
> | |---------------| | | | Entry table | |
> | | Enclave |---|-----> | |---------------| |
> | |---------------| | | | Enclave stack | |
> | | App code | | | |---------------| |
> | |---------------| | | | Enclave heap | |
> | | Enclave | | | |---------------| |
> | |---------------| | | | Enclave code | |
> | | App code | | | |---------------| |
> | |---------------| | | |
> | | | |-----------------------|
> |-----------------------|
>
> SGX supports SGX1 and SGX2 extensions. SGX1 provides basic enclave support,
> and SGX2 allows additional flexibility in runtime management of enclave
> resources and thread execution within an enclave.
>
> 1.1.2 EPC (Enclave Page Cache)
>
> Just like normal application memory management, enclave memory management can be
> divided into two parts: address space allocation and memory commitment. Address
> space allocation is allocating a particular range of linear address space for
> the enclave. Memory commitment is assigning actual resources to the enclave.
>
> Enclave Page Cache (EPC) is the physical resource that is committed to
> enclaves. EPC is divided into 4K pages. An EPC page is 4K in size and always
> aligned to a 4K boundary. Hardware performs additional access control checks to
> restrict access to EPC pages. The Enclave Page Cache Map (EPCM) is a secure
> structure which holds one entry for each EPC page, and is used by hardware to
> track the status of each EPC page (invisible to software). Typically EPC and
> EPCM are reserved by BIOS as Processor Reserved Memory, but the actual amount,
> size, and layout of EPC are model-specific and dependent on BIOS settings. EPC
> is enumerated via the new SGX CPUID leaf, and is reported as reserved memory.
>
> EPC pages can either be invalid or valid. There are 4 valid EPC types in SGX1:
> regular EPC page, SGX Enclave Control Structure (SECS) page, Thread Control
> Structure (TCS) page, and Version Array (VA) page. SGX2 adds Trimmed EPC page.
> Each enclave is associated with one SECS page. Each thread in enclave is
> associated with one TCS page. VA page is used in EPC page eviction and reload.
> Trimmed EPC page is introduced in SGX2 when particular 4K page in enclave is
> going to be freed (trimmed) at runtime after enclave is initialized.
>
> 1.1.3 ENCLS and ENCLU
>
> Two new instructions ENCLS and ENCLU are introduced to manage enclave and EPC.
> ENCLS can only run in ring 0, while ENCLU can only run in ring 3. Both ENCLS and
> ENCLU have multiple leaf functions, with EAX indicating the specific leaf
> function.
>
> SGX1 supports below ENCLS and ENCLU leaves:
>
> ENCLS:
> - ECREATE, EADD, EEXTEND, EINIT, EREMOVE (Enclave build and destroy)
> - EPA, EBLOCK, ETRACK, EWB, ELDU/ELDB (EPC eviction & reload)
>
> ENCLU:
> - EENTER, EEXIT, ERESUME (Enclave entry, exit, re-enter)
> - EGETKEY, EREPORT (SGX key derivation, attestation)
>
> Additionally, SGX2 supports the ENCLS and ENCLU leaves below for adding or
> removing EPC pages at runtime after the enclave is initialized, along with
> permission changes.
>
> ENCLS:
> - EAUG, EMODT, EMODPR
>
> ENCLU:
> - EACCEPT, EACCEPTCOPY, EMODPE
>
> VMM is able to interfere with ENCLS running in guest (see 1.5.1 VMX Changes
> for Supporting SGX Virtualization) but is unable to interfere with ENCLU.
>
> 1.2 Discovering SGX Capability
>
> 1.2.1 Enumerate SGX via CPUID
>
> If CPUID.0x7.0:EBX.SGX (bit 2) is 1, then processor supports SGX and SGX
> capability and resource can be enumerated via new SGX CPUID (0x12).
> CPUID.0x12.0x0 reports SGX capability, such as the presence of SGX1, SGX2,
> enclave's maximum size for both 32-bit and 64-bit application. CPUID.0x12.0x1
> reports the availability of bits that can be set for SECS.ATTRIBUTES.
> CPUID.0x12.0x2 reports the EPC resource's base and size. Platform may support
> multiple EPC sections, and CPUID.0x12.0x3 and further sub-leaves can be used
> to detect the existence of multiple EPC sections (until CPUID reports invalid
> EPC).
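
To make the enumeration above concrete, here is a minimal Python sketch (an illustration only, not code from the patches). It assumes the register layout the SDM documents for CPUID.0x12 sub-leaves 2 and up: EAX[3:0] is the sub-leaf type (1 means a valid EPC section), EAX[31:12]/EBX[19:0] hold the physical base, and ECX[31:12]/EDX[19:0] hold the size. `read_subleaf` is a hypothetical callback standing in for the real CPUID instruction.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple


class EpcSection(NamedTuple):
    base: int  # physical base address, in bytes
    size: int  # section size, in bytes


def decode_epc_subleaf(eax: int, ebx: int, ecx: int, edx: int) -> Optional[EpcSection]:
    """Decode one CPUID.0x12 sub-leaf (index >= 2); None if it is not a valid EPC section."""
    if (eax & 0xF) != 1:  # type 0 = invalid sub-leaf, type 1 = EPC section
        return None
    base = (eax & 0xFFFFF000) | ((ebx & 0xFFFFF) << 32)
    size = (ecx & 0xFFFFF000) | ((edx & 0xFFFFF) << 32)
    return EpcSection(base, size)


def enumerate_epc(read_subleaf: Callable[[int], Tuple[int, int, int, int]]) -> List[EpcSection]:
    """Walk sub-leaves 2, 3, ... until CPUID reports an invalid EPC section."""
    sections = []
    index = 2
    while True:
        section = decode_epc_subleaf(*read_subleaf(index))
        if section is None:
            return sections
        sections.append(section)
        index += 1
```
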
>
> Refer to 37.7.2 Intel SGX Resource Enumeration Leaves for full description of
> SGX CPUID 0x12.
>
> 1.2.2 Intel SGX Opt-in Configuration
>
> On processors that support Intel SGX, IA32_FEATURE_CONTROL also provides the
> SGX_ENABLE bit (bit 18) to turn on/off SGX. Before system software can enable
> and use SGX, BIOS is required to set IA32_FEATURE_CONTROL.SGX_ENABLE = 1 to
> opt-in SGX.
>
> Setting SGX_ENABLE follows the rules of IA32_FEATURE_CONTROL.LOCK (bit 0).
> Software is considered to have opted into Intel SGX if and only if
> IA32_FEATURE_CONTROL.SGX_ENABLE and IA32_FEATURE_CONTROL.LOCK are set to 1.
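
The opt-in rule above boils down to a two-bit test. A minimal Python sketch (illustrative only; the constant and function names are made up here, the bit positions are the ones given in the text):

```python
FEATURE_CONTROL_LOCK = 1 << 0         # IA32_FEATURE_CONTROL.LOCK (bit 0)
FEATURE_CONTROL_SGX_ENABLE = 1 << 18  # IA32_FEATURE_CONTROL.SGX_ENABLE (bit 18)


def sgx_opted_in(feature_control: int) -> bool:
    """True only if both LOCK and SGX_ENABLE are set in the MSR value."""
    needed = FEATURE_CONTROL_LOCK | FEATURE_CONTROL_SGX_ENABLE
    return (feature_control & needed) == needed
```
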
>
> The setting of IA32_FEATURE_CONTROL.SGX_ENABLE (bit 18) is not reflected by
> SGX CPUID. Enclave instructions will behave differently according to the value
> of CPUID.0x7.0x0:EBX.SGX and whether BIOS has opted in to SGX.
>
> Refer to 37.7.1 Intel SGX Opt-in Configuration for more information.
>
> 1.3 Enclave Life Cycle
>
> 1.3.1 Constructing & Destroying Enclave
>
> Enclave is created via ENCLS[ECREATE] leaf by privileged software. Basically
> ECREATE converts an invalid EPC page into SECS page, according to a source SECS
> structure that resides in normal memory. The source SECS contains enclave's info
> such as base (linear) address, size, enclave attributes, enclave's measurement,
> etc.
>
> After ECREATE, for each 4K linear address space page, privileged software uses
> EADD and EEXTEND to add one EPC page to it. Enclave code/data (residing in normal
> memory) is loaded to enclave during EADD for each of the enclave's 4K pages.
> After all EPC pages are added to enclave, privileged software calls EINIT to
> initialize the enclave, and then enclave is ready to run.
>
> While the enclave is being constructed, the enclave measurement, which is a
> SHA256 hash value, is also built according to enclave's size, code/data itself
> and its location in enclave, etc. The measurement can be used to uniquely
> identify the enclave. SIGSTRUCT in EINIT leaf also contains the measurement
> specified by untrusted software, via MRENCLAVE. EINIT will check the two
> measurements and will only succeed when the two match.
>
> Enclave is destroyed by running EREMOVE for all of the enclave's EPC pages, and
> then for enclave's SECS. EREMOVE will report SGX_CHILD_PRESENT error if it is
> called for SECS when there are still regular EPC pages that haven't been removed
> from enclave.
>
> Please refer to SDM chapter 39.1 Constructing an Enclave for more information.
>
> 1.3.2 Enclave Entry and Exit
>
> 1.3.2.1 Synchronous Entry and Exit
>
> After enclave is constructed, non-privileged software uses ENCLU[EENTER] to
> enter enclave to run. While process runs in enclave, non-privileged software
> can use ENCLU[EEXIT] to exit from enclave and return to normal mode.
>
> 1.3.2.2 Asynchronous Enclave Exit
>
> Asynchronous and synchronous events, such as exceptions, interrupts, traps,
> SMIs, and VM exits may occur while executing inside an enclave. These events
> are referred to as Enclave Exiting Events (EEE). Upon an EEE, the processor
> state is securely saved inside the enclave and then replaced by a synthetic
> state to prevent leakage of secrets. The process of securely saving state and
> establishing the synthetic state is called an Asynchronous Enclave Exit (AEX).
>
> After AEX, non-privileged software uses ENCLU[ERESUME] to re-enter enclave.
> The SGX userspace software maintains a small piece of code (residing in normal
> memory) which basically calls ERESUME to re-enter enclave. The address of this
> piece of code is called Asynchronous Exit Pointer (AEP). AEP is specified as
> parameter in EENTER and will be kept internally in enclave. Upon AEX, AEP will
> be pushed to stack and upon returning from EEE handling, such as IRET, AEP will
> be loaded to RIP and ERESUME will be called subsequently to re-enter enclave.
>
> During AEX the processor saves and restores context automatically, therefore
> no change to the interrupt handling of the OS kernel and VMM is required. It
> is the SGX userspace software's responsibility to set up AEP correctly.
>
> Please refer to SDM chapter 39.2 Enclave Entry and Exit for more information.
>
> 1.3.3 EPC Eviction and Reload
>
> SGX also allows privileged software to evict any EPC pages that are used by
> an enclave. The idea is the same as normal memory swapping. The details of
> how to evict EPC pages are given below.
>
> Below is the sequence to evict regular EPC page:
>
> 1) Select one or multiple regular EPC pages from one enclave
> 2) Remove EPT/PT mapping for selected EPC pages
> 3) Send IPIs to remote CPUs to flush TLB of selected EPC pages
> 4) EBLOCK on selected EPC pages
> 5) ETRACK on enclave's SECS page
> 6) allocate one available slot (8-byte) in VA page
> 7) EWB on selected EPC pages
>
> With EWB taking:
>
> - VA slot, to store eviction version info.
> - one normal 4K page in memory, to store encrypted content of EPC page.
> - one struct PCMD in memory, to store meta data.
>
> (VA slot is an 8-byte slot in VA page, which is a particular EPC page.)
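
Since a VA page is a 4K EPC page carved into 8-byte slots, it holds 512 slots. The slot bookkeeping from step 6) can be sketched as follows (a toy model in Python, not the data structure used by any driver or by the patches; `VaPage` and its methods are hypothetical names):

```python
PAGE_SIZE = 4096
VA_SLOT_SIZE = 8
VA_SLOTS_PER_PAGE = PAGE_SIZE // VA_SLOT_SIZE  # 512 slots per VA page


class VaPage:
    """Tracks which 8-byte slots of one VA page are in use."""

    def __init__(self):
        self.used = [False] * VA_SLOTS_PER_PAGE

    def alloc_slot(self):
        """Return the byte offset of a free slot, or None if the page is full."""
        for index, in_use in enumerate(self.used):
            if not in_use:
                self.used[index] = True
                return index * VA_SLOT_SIZE
        return None

    def free_slot(self, offset):
        """Release a slot, e.g. after the evicted page has been reloaded."""
        self.used[offset // VA_SLOT_SIZE] = False
```
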
>
> And below is the sequence to evict an SECS page or VA page:
>
> 1) locate SECS (or VA) page
> 2) remove EPT/PT mapping for SECS (or VA) page
> 3) Send IPIs to remote CPUs
> 4) allocate one available slot (8-byte) in VA page
> 5) EWB on SECS (or VA) page
>
> When evicting an SECS page, all regular EPC pages that belong to that SECS
> must be evicted first, otherwise EWB returns SGX_CHILD_PRESENT error.
>
> And to reload an EPC page:
>
> 1) ELDU/ELDB on EPC page
> 2) setup EPT/PT mapping
>
> With ELDU/ELDB taking:
>
> - location of SECS page
> - linear address of enclave's 4K page (that we are going to reload to)
> - VA slot (used in EWB)
> - 4K page in memory (used in EWB)
> - struct PCMD in memory (used in EWB)
>
> Please refer to SDM chapter 39.5 EPC and Management of EPC pages for more
> information.
>
> 1.4 SGX Launch Control
>
> SGX requires running "Launch Enclave" (LE) before running any other enclaves.
> This is because LE is the only enclave that does not require EINITTOKEN in
> EINIT. Running any other enclave requires a valid EINITTOKEN, which contains
> a MAC of the (first 192 bytes of the) EINITTOKEN calculated with the EINITTOKEN
> key. EINIT will verify the MAC by internally deriving the EINITTOKEN key, and
> only an EINITTOKEN with a matching MAC will be accepted by EINIT. The
> EINITTOKEN key derivation depends on some info from LE. The typical process is
> that LE generates an EINITTOKEN for another enclave according to LE itself and
> the target enclave, and calculates the MAC by using ENCLU[EGETKEY] to get the
> EINITTOKEN key. Only LE is able to get the EINITTOKEN key.
>
> Running LE requires the SHA256 hash of LE signer's RSA public key (SHA256 of
> sigstruct->modulus) to be equal to the IA32_SGXLEPUBKEYHASH[0-3] MSRs (the 4
> MSRs together make up the 256-bit SHA256 hash value).
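
As an illustration of how a 256-bit hash maps onto four 64-bit MSR values, here is a small Python sketch. Note the assumptions: the modulus is hashed as the raw bytes stored in SIGSTRUCT, and digest bytes 8n..8n+7 are taken as a little-endian value for MSR n. The byte order is an assumption made for this sketch and should be checked against the SDM; `lepubkeyhash_msrs` is a hypothetical helper name.

```python
import hashlib
from typing import List


def lepubkeyhash_msrs(modulus: bytes) -> List[int]:
    """Split SHA256(modulus) into four 64-bit values, one per IA32_SGXLEPUBKEYHASHn."""
    digest = hashlib.sha256(modulus).digest()  # 32 bytes
    # Assumed layout: digest bytes 8n..8n+7 form MSR n, little-endian.
    return [int.from_bytes(digest[i * 8:(i + 1) * 8], "little") for i in range(4)]
```
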
>
> If CPUID.0x7.0x0:EBX.SGX and CPUID.0x7.0x0:ECX.SGX_LAUNCH_CONTROL[bit 30] are
> set, then IA32_FEATURE_CONTROL is available, and IA32_FEATURE_CONTROL MSR has
> SGX_LAUNCH_CONTROL_ENABLE bit (bit 17) available. Setting the
> SGX_LAUNCH_CONTROL_ENABLE bit to 1 enables runtime change of
> IA32_SGXLEPUBKEYHASHn after IA32_FEATURE_CONTROL is locked. Otherwise,
> IA32_SGXLEPUBKEYHASHn are read-only after IA32_FEATURE_CONTROL is locked. After
> reset, IA32_SGXLEPUBKEYHASHn will be set to the hash of Intel's default key. On
> a system that has only CPUID.0x7.0x0:EBX.SGX set, IA32_SGXLEPUBKEYHASHn are not
> available. On such a system EINIT will always treat IA32_SGXLEPUBKEYHASHn as
> Intel's default value, thus only Intel's LE is able to run.
>
> On a system with IA32_SGXLEPUBKEYHASHn available, it is up to the BIOS
> implementation to decide whether to provide configuration options for the user
> to set IA32_SGXLEPUBKEYHASHn in *locked* mode (IA32_SGXLEPUBKEYHASHn are
> read-only after IA32_FEATURE_CONTROL is locked) or *unlocked* mode
> (IA32_SGXLEPUBKEYHASHn are writable by the kernel at runtime). Also, BIOS may
> or may not provide configuration options to allow the user to set a custom
> value of IA32_SGXLEPUBKEYHASHn.
>
> 1.5 SGX Interaction with IA32 and Intel 64 Architecture
>
> SDM Chapter 42 describes SGX interaction with various features in IA32 and
> Intel 64 architecture. Below outlines the major ones. Refer to Chapter 42 for
> full description of SGX interaction with various IA32 and Intel 64 features.
>
> 1.5.1 VMX Changes for Supporting SGX Virtualization
>
> A new 64-bit ENCLS-exiting bitmap control field is added to VMCS (encoding
> 0202EH) to control VMEXIT on ENCLS leaf functions. And a new "Enable ENCLS
> exiting" control bit (bit 15) is defined in the secondary processor-based
> VM-execution controls. Setting "Enable ENCLS exiting" to 1 enables the
> ENCLS-exiting bitmap control. The ENCLS-exiting bitmap controls which ENCLS
> leaves will trigger VMEXIT.
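
A minimal Python sketch of how such a bitmap could be built and evaluated (illustrative only). It assumes the ENCLS leaf numbers from the SDM (the EAX values) and the SDM rule that leaves with EAX >= 63 are all governed by bit 63; the helper names are made up for this example.

```python
# ENCLS leaf numbers (the value of EAX when ENCLS executes), per the SDM.
ENCLS_LEAF = {
    "ECREATE": 0x0, "EADD": 0x1, "EINIT": 0x2, "EREMOVE": 0x3,
    "EDBGRD": 0x4, "EDBGWR": 0x5, "EEXTEND": 0x6, "ELDB": 0x7,
    "ELDU": 0x8, "EBLOCK": 0x9, "EPA": 0xA, "EWB": 0xB,
    "ETRACK": 0xC, "EAUG": 0xD, "EMODPR": 0xE, "EMODT": 0xF,
}


def encls_exiting_bitmap(leaves):
    """Build the 64-bit bitmap that traps the named ENCLS leaves."""
    bitmap = 0
    for name in leaves:
        bitmap |= 1 << ENCLS_LEAF[name]
    return bitmap


def encls_causes_vmexit(bitmap, eax):
    """Bit EAX decides for EAX < 63; bit 63 decides for every other leaf."""
    bit = eax if eax < 63 else 63
    return bool((bitmap >> bit) & 1)
```

For example, a hypervisor that only wants to see eviction-related leaves could trap EBLOCK, ETRACK and EWB and let everything else run natively.
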
>
> Additionally two new bits are added to indicate whether a VMEXIT (of any kind)
> is from an enclave. The two bits below will be set if the VMEXIT is from an
> enclave:
> - Bit 27 in the Exit reason field of Basic VM-exit information.
> - Bit 4 in the Interruptibility State of Guest Non-Register State of VMCS.
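
A short Python sketch of decoding those two bits (illustrative; the constant and function names are hypothetical, and the 16-bit basic exit reason in bits 15:0 is taken from the SDM's exit-reason layout):

```python
EXIT_REASON_FROM_ENCLAVE = 1 << 27  # bit 27 of the exit reason field
INTR_STATE_ENCLAVE = 1 << 4         # bit 4 of the interruptibility state


def decode_exit_reason(exit_reason):
    """Split the raw exit reason into (basic reason, exit-from-enclave flag)."""
    basic = exit_reason & 0xFFFF
    from_enclave = bool(exit_reason & EXIT_REASON_FROM_ENCLAVE)
    return basic, from_enclave


def interrupted_enclave(interruptibility_state):
    """True if the guest interruptibility state marks an enclave interruption."""
    return bool(interruptibility_state & INTR_STATE_ENCLAVE)
```
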
>
> Refer to 42.5 Interactions with VMX, 27.2.1 Basic VM-Exit Information, and
> 27.3.4 Saving Non-Register State.
>
> 1.5.2 Interaction with XSAVE
>
> SGX defines a sub-field called X-Feature Request Mask (XFRM) in the attributes
> field of SECS. On enclave entry, SGX HW verifies that the features in
> SECS.ATTRIBUTES.XFRM are already enabled in XCR0.
>
> Upon AEX, SGX saves the processor extended state and miscellaneous state to
> enclave's state-save area (SSA), and clears the secrets from the processor
> extended state that is used by enclave (to prevent leaking secrets).
>
> Refer to 42.7 Interaction with Processor Extended State and Miscellaneous State
>
> 1.5.3 Interaction with S state
>
> When processor goes into S3-S5 state, EPC is destroyed, and consequently all
> enclaves are destroyed as well.
>
> Refer to 42.14 Interaction with S States.
>
> 2. SGX Virtualization Design
>
> 2.1 High Level Toolstack Changes:
>
> 2.1.1 New 'sgx' XL configure file parameter
>
> EPC is a limited resource. In order to use EPC efficiently among all domains,
> when creating a guest, the administrator should be able to specify the domain's
> virtual EPC size. The admin should also be able to query every domain's virtual
> EPC size.
>
> For SGX Launch Control virtualization, we should allow admin to create VM with
> either VM's virtual IA32_SGXLEPUBKEYHASHn locked or unlocked, and we should
> also allow admin to create VM with custom IA32_SGXLEPUBKEYHASHn value.
>
> For above purposes, below new 'sgx' XL configure file parameter is added:
>
> sgx = 'epc=<size>,lehash=<sha256-hash>,lewr=<0|1>'
>
> In which 'epc' specifies VM's EPC size in MB and it's mandatory.
>
> When physical machine is in *locked* mode, neither 'lehash' nor 'lewr'
> can be specified, as the physical machine is unable to change
> IA32_SGXLEPUBKEYHASHn at runtime. Adding either 'lehash' or 'lewr' will
> cause failure to create VM in that case. And VM's initial
> IA32_SGXLEPUBKEYHASHn value will be set to value of physical MSRs.
>
> When physical machine is in *unlocked* mode, then VM's initial
> IA32_SGXLEPUBKEYHASHn value will be set to 'lehash' if specified, or
> Intel's default value. VM's SGX_LAUNCH_CONTROL_ENABLE bit in
> IA32_FEATURE_CONTROL will be set or cleared, depending on whether 'lewr'
> is specified (or explicitly set to true or false).
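
The validation rules above can be sketched in a few lines of Python (an illustration of the rules only, not the libxl parser from the patches). Assumptions made here: 'lehash' is written as 64 hex digits, 'epc' is a positive integer number of MB, and `host_unlocked` is a hypothetical flag telling whether the physical machine is in *unlocked* mode.

```python
import re


def parse_sgx_param(value, host_unlocked):
    """Parse 'epc=<size>,lehash=<sha256-hash>,lewr=<0|1>' and apply the locked-mode rule."""
    opts = {}
    for item in value.split(","):
        key, sep, val = item.partition("=")
        if not sep or key not in ("epc", "lehash", "lewr") or key in opts:
            raise ValueError("bad sgx option: %r" % item)
        opts[key] = val
    if "epc" not in opts:
        raise ValueError("'epc' is mandatory")
    cfg = {"epc_mb": int(opts["epc"]), "lehash": None, "lewr": None}
    if cfg["epc_mb"] <= 0:
        raise ValueError("'epc' must be a positive size in MB")
    if "lehash" in opts:
        # Assumed textual form of the hash: 64 hex digits (256 bits).
        if not re.fullmatch(r"[0-9a-fA-F]{64}", opts["lehash"]):
            raise ValueError("'lehash' must be 64 hex digits")
        cfg["lehash"] = bytes.fromhex(opts["lehash"])
    if "lewr" in opts:
        if opts["lewr"] not in ("0", "1"):
            raise ValueError("'lewr' must be 0 or 1")
        cfg["lewr"] = opts["lewr"] == "1"
    # In locked mode the host cannot change the hash MSRs, so both are rejected.
    if not host_unlocked and (cfg["lehash"] is not None or cfg["lewr"] is not None):
        raise ValueError("'lehash'/'lewr' not allowed when host is in locked mode")
    return cfg
```
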
>
> Please also refer to 2.2.4 Launch Control Support.
>
> 2.1.2 New XL commands (?)
>
> Administrator should be able to get physical EPC size, and all domain's virtual
> EPC size. For this purpose, we can introduce 2 additional commands:
>
> # xl sgxinfo
>
> Which will print out physical EPC size, and other SGX info (such as SGX1, SGX2,
> etc) if necessary.
>
> # xl sgxlist <did>
>
> Which will print out particular domain's virtual EPC size, or list all virtual
> EPC sizes for all supported domains.
>
> Alternatively, we can also extend existing XL commands by adding new option
>
> # xl info -sgx
>
> Which will print out physical EPC size along with other physinfo. And
>
> # xl list <did> -sgx
>
> Which will print out domain's virtual EPC size.
>
> Comments?
>
> In this RFC the two new commands are not implemented yet.
>
> 2.1.3 Notify domain's virtual EPC base and size to Xen
>
> Xen needs to know guest's EPC base and size in order to populate EPC pages for
> it. Toolstack notifies EPC base and size to Xen via XEN_DOMCTL_set_cpuid.
>
> 2.2 High Level Xen Hypervisor Changes:
>
> 2.2.1 EPC Management
>
> Xen hypervisor needs to detect SGX, discover EPC, and manage EPC before
> supporting SGX to guest. EPC is detected via SGX CPUID 0x12.0x2. It's possible
> that there are multiple EPC sections (enumerated via sub-leaves 0x3 and so on,
> until invalid EPC is reported), but this is typically on multi-socket servers
> on which each package would have its own EPC.
>
> EPC is reported as reserved memory (so it is not reported as normal memory).
> EPC must be managed in 4K pages. CPU hardware uses EPCM to track status of each
> EPC page. Xen needs to manage EPC and provide functions to, e.g., alloc and free
> EPC pages for guest.
>
> Although typically on physical machines (at least existing machines) EPC is
> ~100M in size at maximum, we cannot assume EPC size, thus in terms of EPC
> management, it's better to integrate EPC management into Xen's memory management
> framework to take advantage of Xen's existing memory management algorithms.
>
> Specifically, one 'struct page_info' will be created for each EPC page, just
> like normal memory, and a new flag will be defined to identify whether a
> 'struct page_info' is EPC or normal memory. The existing memory allocation API
> alloc_domheap_pages will be reused to allocate EPC pages, by adding a new
> memflag 'MEMF_epc' to indicate EPC allocation rather than memory allocation.
> The new 'MEMF_epc' can also be used for EPC ballooning (if required in the
> future), as with the new flag, the existing
> XENMEM_{increase,decrease}_reservation and XENMEM_populate_physmap can be
> reused for EPC as well.
>
> 2.2.2 EPC Virtualization
>
> This part is how to populate EPC for guests. We have 3 choices:
> - Static Partitioning
> - Oversubscription
> - Ballooning
>
> Static Partitioning means all EPC pages will be allocated and mapped to guest
> when it is created, and there's no runtime change of page table mappings for EPC
> pages. Oversubscription means Xen hypervisor supports EPC page swapping between
> domains, meaning Xen is able to evict EPC page from another domain and assign it
> to the domain that needs the EPC. With oversubscription, EPC can be assigned to
> domain on demand, when EPT violation happens. Ballooning is similar to memory
> ballooning. It is basically "Static Partitioning" + "Balloon driver" in guest.
>
> Static Partitioning is the easiest way in terms of implementation, and there
> will be no hypervisor overhead (except EPT overhead of course), because in
> "Static partitioning", there is no EPT violation for EPC, and Xen doesn't need
> to turn on ENCLS VMEXIT for guest as ENCLS runs perfectly in non-root mode.
>
> Ballooning is "Static Partitioning" + "Balloon driver" in guest. Like "Static
> Partitioning", ballooning doesn't need to turn on ENCLS VMEXIT, and doesn't
> have EPT violation for EPC either. To support ballooning, we need ballooning
> driver in guest to issue hypercall to give up or reclaim EPC pages. In terms of
> hypercall, we have two choices: 1) Add new hypercall for EPC ballooning; 2)
> Use existing XENMEM_{increase/decrease}_reservation with new memory flag, i.e.,
> XENMEMF_epc. I'll discuss more regarding adding a dedicated hypercall or not
> later.
>
> Oversubscription looks nice but it requires a more complicated implementation.
> Firstly, as explained in 1.3.3 EPC Eviction and Reload, we need to follow specific
> steps to evict EPC pages, and in order to do that, basically Xen needs to trap
> ENCLS from guest and keep track of EPC page status and enclave info from all
> guests. This is because:
> - To evict regular EPC page, Xen needs to know SECS location
> - Xen needs to know EPC page type: evicting regular EPC and evicting SECS,
> VA page have different steps.
> - Xen needs to know EPC page status: whether the page is blocked or not.
>
> This info can only be obtained by trapping ENCLS from guest, and parsing its
> parameters (to identify SECS page, etc). Parsing ENCLS parameters means we need
> to know which ENCLS leaf is being trapped, and we need to translate guest's
> virtual address to get physical address in order to locate EPC page. And once
> ENCLS is trapped, we have to emulate ENCLS in Xen, which means we need to
> reconstruct ENCLS parameters by remapping all guest's virtual address to Xen's
> virtual address (gva->gpa->pa->xen_va), as ENCLS always uses an *effective
> address*, which must be translatable by the processor when running ENCLS.
>
> --------------------------------------------------------------
> | ENCLS |
> --------------------------------------------------------------
> | /|\
> ENCLS VMEXIT| | VMENTRY
> | |
> \|/ |
>
> 1) parse ENCLS parameters
> 2) reconstruct(remap) guest's ENCLS parameters
> 3) run ENCLS on behalf of guest (and skip ENCLS)
> 4) on success, update EPC/enclave info, or inject error
>
> And Xen needs to maintain each EPC page's status (type, blocked or not, in
> enclave or not, etc). Xen also needs to maintain all Enclave's info from all
> guests, in order to find the correct SECS for regular EPC page, and enclave's
> linear address as well.
>
> So in general, "Static Partitioning" has the simplest implementation, but is
> obviously not the best way to use EPC efficiently; "Ballooning" has all the pros
> of Static Partitioning but requires a guest balloon driver; "Oversubscription"
> is best in terms of flexibility but requires a complicated hypervisor
> implementation.
>
> We will start with "Static Partitioning". If "Ballooning" is required in the
> future, we will support it. "Oversubscription" should not be needed in the
> foreseeable future.
>
> 2.2.3 Populate EPC for Guest
>
> Toolstack notifies Xen about domain's EPC base and size by XEN_DOMCTL_set_cpuid,
> so currently Xen populates all EPC pages for guest in XEN_DOMCTL_set_cpuid,
> particularly, in handling XEN_DOMCTL_set_cpuid for CPUID.0x12.0x2. Once Xen
> has checked that the values passed from toolstack are valid, Xen will allocate
> all EPC pages and set up EPT mappings for guest.
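
As a rough illustration of the validity check and the amount of work involved, the following Python sketch turns a guest EPC base/size into the range of guest frame numbers that would need an EPC page and an EPT mapping. The checks shown (4K alignment, non-zero size) are assumptions for this sketch; the patches may check more. `epc_gfns` is a hypothetical name.

```python
PAGE_SHIFT = 12
PAGE_SIZE = 1 << PAGE_SHIFT


def epc_gfns(base, size):
    """Return the guest frame numbers covered by a virtual EPC region."""
    if size <= 0 or base % PAGE_SIZE or size % PAGE_SIZE:
        raise ValueError("EPC base/size must be non-zero and 4K aligned")
    return range(base >> PAGE_SHIFT, (base + size) >> PAGE_SHIFT)
```

With 'epc=64' (64 MB), this yields 16384 frames, each of which gets one EPC page under static partitioning.
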
>
> 2.2.4 Launch Control Support
>
> To support running multiple domains with each running its own LE signed by
> different owners, physical machine's BIOS must leave IA32_SGXLEPUBKEYHASHn
> *unlocked* before handing to Xen. Xen will trap domain's write to
> IA32_SGXLEPUBKEYHASHn and keep the value in vcpu internally, and update the
> value to physical MSRs when vcpu is scheduled in. This can guarantee that
> when EINIT runs in guest, guest's virtual IA32_SGXLEPUBKEYHASHn have been
> written to physical MSRs.
>
> SGX_LAUNCH_CONTROL_ENABLE bit in guest's IA32_FEATURE_CONTROL is controlled
> by new added 'lewr' XL parameter (see 2.1.1 New 'sgx' XL configure file
> parameter).
>
> If physical IA32_SGXLEPUBKEYHASHn are *locked* in machine's BIOS, then only MSR
> read is allowed from guest, and Xen will inject error for guest's MSR writes.
>
> In addition, if physical IA32_SGXLEPUBKEYHASHn are *locked*, then creating guest
> with 'lehash' parameter or 'lewr' will fail, as in such case Xen is not able to
> update guest's virtual IA32_SGXLEPUBKEYHASHn to physical MSRs.
>
> If physical IA32_SGXLEPUBKEYHASHn are not available
> (CPUID.0x7.0x0:ECX.SGX_LAUNCH_CONTROL is not present), then creating VM with
> 'lehash' or 'lewr' will also fail. In addition, any MSR read/write for
> IA32_SGXLEPUBKEYHASHn from guest is invalid and Xen will inject error in such
> case.
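
The trap-and-restore scheme in this section can be modelled as below (a toy Python model, not the Xen implementation). All class and function names are hypothetical, `GuestFault` stands in for "Xen injects an error", and rejecting writes when the guest's 'lewr' is clear is an assumption drawn from the meaning of SGX_LAUNCH_CONTROL_ENABLE.

```python
class GuestFault(Exception):
    """Stand-in for Xen injecting an error into the guest."""


class PhysicalCpu:
    def __init__(self, default_hash):
        self.lepubkeyhash = list(default_hash)  # models the 4 physical MSRs


class Vcpu:
    def __init__(self, initial_hash, lewr):
        self.lepubkeyhash = list(initial_hash)  # guest's virtual MSR values
        self.lewr = lewr                        # virtual SGX_LAUNCH_CONTROL_ENABLE


def guest_wrmsr_lepubkeyhash(vcpu, n, value, host_unlocked):
    """Handle a trapped guest write to IA32_SGXLEPUBKEYHASHn."""
    if not host_unlocked or not vcpu.lewr:
        raise GuestFault("IA32_SGXLEPUBKEYHASH%d is read-only" % n)
    vcpu.lepubkeyhash[n] = value & 0xFFFFFFFFFFFFFFFF  # only kept in the vcpu for now


def schedule_in(pcpu, vcpu, host_unlocked):
    """Load the vcpu's virtual hash into the physical MSRs before it runs."""
    if host_unlocked:
        pcpu.lepubkeyhash[:] = vcpu.lepubkeyhash
```
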
>
> 2.2.5 CPUID Emulation
>
> Most of native SGX CPUID info can be exposed to guest, except the two parts below:
> - Sub-leaf 0x2 needs to report domain's virtual EPC base and size, instead
> of physical EPC info.
> - Sub-leaf 0x1 needs to be consistent with guest's XCR0. For the reason of
> this part please refer to 1.5.2 Interaction with XSAVE.
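
Both adjustments can be sketched in Python (illustrative only, with hypothetical helper names). The sub-leaf 0x2 encoding assumes the SDM register layout (type 1 in EAX[3:0], property 1 in ECX[3:0], base and size split across the register pairs). For sub-leaf 0x1, the sketch assumes ECX:EDX carry the XFRM bits and that `guest_xcr0_mask` is the set of XCR0 bits the guest is allowed to enable.

```python
def encode_epc_subleaf(base, size):
    """Encode a virtual EPC region as (eax, ebx, ecx, edx) for CPUID.0x12.0x2."""
    if size <= 0 or base % 4096 or size % 4096 or (base + size) >> 52:
        raise ValueError("EPC region must be 4K aligned and fit in 52 bits")
    eax = (base & 0xFFFFF000) | 0x1  # EAX[3:0] = 1: valid EPC section
    ebx = (base >> 32) & 0xFFFFF
    ecx = (size & 0xFFFFF000) | 0x1  # ECX[3:0] = 1: protected EPC section
    edx = (size >> 32) & 0xFFFFF
    return eax, ebx, ecx, edx


def mask_xfrm(host_ecx, host_edx, guest_xcr0_mask):
    """Limit the XFRM bits of CPUID.0x12.0x1 (ECX:EDX) to the guest's XCR0 bits."""
    xfrm = ((host_edx << 32) | host_ecx) & guest_xcr0_mask
    return xfrm & 0xFFFFFFFF, xfrm >> 32
```
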
>
> 2.2.6 EPT Violation & ENCLS Trapping Handling
>
> Only needed when Xen supports EPC Oversubscription, as explained above.
>
> 2.2.7 Guest Suspend & Resume
>
> On hardware, EPC is destroyed when power goes to S3-S5. So Xen will destroy
> guest's EPC when guest's power goes into S3-S5. Currently Xen is notified by
> Qemu in terms of S State change via HVM_PARAM_ACPI_S_STATE, where Xen will
> destroy EPC if S State is S3-S5.
>
> Specifically, Xen will run EREMOVE for each of the guest's EPC pages, as guest
> may not handle EPC suspend & resume correctly, in which case physically guest's
> EPC pages may still be valid, so Xen needs to run EREMOVE to make sure all EPC
> pages become invalid. Otherwise further operation in guest on EPC may
> fault as it assumes all EPC pages are invalid after guest is resumed.
>
> For an SECS page, EREMOVE may fail with SGX_CHILD_PRESENT, in which case Xen
> will put this SECS page on a list, and call EREMOVE on the listed pages again
> after EREMOVE has been called on all EPC pages. This time the EREMOVE on SECS
> will succeed as all children (regular EPC pages) have already been removed.
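
The two-pass scheme can be simulated in Python as follows (a toy model only: `eremove` here imitates just the SGX_CHILD_PRESENT behaviour described above, the page records are made-up dictionaries, and the return codes are symbolic strings rather than the hardware error values).

```python
OK = "OK"
CHILD_PRESENT = "SGX_CHILD_PRESENT"  # symbolic stand-in for the hardware error code


def eremove(epc, idx):
    """Toy EREMOVE: an SECS page cannot be removed while a child page is valid."""
    page = epc[idx]
    if page["type"] == "SECS" and any(
        p["valid"] and p.get("secs") == idx for p in epc.values()
    ):
        return CHILD_PRESENT
    page["valid"] = False
    return OK


def destroy_guest_epc(epc):
    """First pass over all pages, second pass over the SECS pages that were deferred."""
    deferred = []
    for idx in epc:
        if eremove(epc, idx) == CHILD_PRESENT:
            deferred.append(idx)
    for idx in deferred:
        if eremove(epc, idx) != OK:
            raise RuntimeError("SECS page %r still has children" % idx)
    return deferred
```
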
>
> 2.2.8 Destroying Domain
>
> Normally Xen just frees all EPC pages for domain when it is destroyed. But Xen
> will also do EREMOVE on all guest's EPC pages (described in 2.2.7 above) before
> freeing them, as guest may shut down unexpectedly (e.g., user kills guest), and
> in this case, guest's EPC may still be valid.
>
> 2.3 Additional Point: Live Migration, Snapshot Support (?)
>
> Actually from hardware's point of view, SGX is not migratable. There are two
> reasons:
>
> - SGX key architecture cannot be virtualized.
>
> Some keys are bound to the CPU, for example the Sealing key, EREPORT
> key, etc. If VM is migrated to another machine, the same enclave will derive
> different keys. Taking Sealing key as an example, Sealing key is
> typically used by enclave (enclave can get sealing key by EGETKEY) to *seal*
> its secrets to outside (e.g., persistent storage) for further use. If Sealing
> key changes after VM migration, then the enclave can never get the sealed
> secrets back by using sealing key, as it has changed, and old sealing key
> cannot be recovered.
>
> - There's no ENCLS leaf to evict an EPC page to normal memory while at the same
> time still keeping its content in EPC. Currently once EPC page is evicted, the
> EPC page becomes invalid. So technically, we are unable to implement live
> migration (or checkpointing, or snapshot) for enclave.
>
> But, with some workarounds, and given some facts about the existing SGX
> drivers, technically we are able to support live migration (or even
> checkpointing and snapshot). This is because:
>
> - A changing key (which is bound to the CPU) is not a problem in practice
>
> Take the Sealing key as an example. Losing sealed data is not a problem,
> because the sealing key is only supposed to encrypt secrets that can be
> provisioned again. The typical work model is: the enclave gets secrets
> provisioned from a remote party (the service provider), and uses the sealing
> key to store them for further use. When the enclave tries to *unseal* with
> the sealing key and the key has changed, the enclave will find the data
> corrupted (integrity check failure), so it will ask for the secrets to be
> provisioned again from the remote party. Another reason is that, in a data
> center, VMs typically share lots of data, and as the sealing key is bound to
> the CPU, data encrypted by one enclave on one machine cannot be shared with
> another enclave on another machine. So from the SGX app writer's point of
> view, the developer should treat the Sealing key as a changeable key, and
> should handle loss of sealed data anyway. The Sealing key should only be used
> to seal secrets that can be easily provisioned again.
>
> For other keys such as the EREPORT key and the provisioning key, which are
> used for local attestation and remote attestation, losing them is not a
> problem either, due to the second reason below.
>
> - Sudden loss of EPC is not a problem.
>
> On hardware, EPC will be lost if the system goes to S3-S5, resets, or shuts
> down, and the SGX driver needs to handle loss of EPC due to power
> transitions. This is done by cooperation between the SGX driver and the
> userspace SGX SDK/apps. However during live migration, there may not be a
> power transition in the guest, so there may not be a power-transition EPC
> loss during live migration. And technically we cannot *really* live migrate
> an enclave (explained above), so it looks infeasible. But the fact is that
> both the Linux SGX driver and the Windows SGX driver already support
> *sudden* loss of EPC (not only EPC loss during a power transition), which
> means both drivers are able to recover if EPC is lost at any point at
> runtime. With this, technically we are able to support live migration by
> simply ignoring EPC. After the VM is migrated, the destination VM will only
> suffer a *sudden* loss of EPC, which both the Windows SGX driver and the
> Linux SGX driver are already able to handle.
>
> But we must point out that such *sudden* loss of EPC is not hardware
> behavior, and SGX drivers for other OSes (such as FreeBSD) may not implement
> this, so for those guests, the destination VM will behave in an unexpected
> manner. But I am not sure we need to care about other OSes.
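
To make the application-side expectation above concrete, here is a minimal
sketch of "unseal, and re-provision on integrity failure". Everything in it is
an assumption for illustration: the key argument stands in for whatever
EGETKEY would return on the current CPU, toy_seal()/toy_unseal() are toy
XOR-and-checksum helpers with no cryptographic value, and
provision_from_remote() is a hypothetical placeholder for the real
provisioning protocol. It is not the SGX SDK API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECRET_LEN 16

struct sealed_blob {
    uint8_t  data[SECRET_LEN];  /* "encrypted" secret */
    uint32_t tag;               /* integrity tag over the plaintext */
};

/* Toy keyed checksum for illustration only; NOT a real MAC. */
static uint32_t toy_tag(uint64_t key, const uint8_t *buf, size_t len)
{
    uint32_t t = (uint32_t)key;
    for (size_t i = 0; i < len; i++)
        t += buf[i];
    return t;
}

static void toy_seal(uint64_t key, const uint8_t *secret,
                     struct sealed_blob *out)
{
    for (size_t i = 0; i < SECRET_LEN; i++)
        out->data[i] = secret[i] ^ (uint8_t)key;
    out->tag = toy_tag(key, secret, SECRET_LEN);
}

/* Returns false when the integrity check fails (e.g. the key changed). */
static bool toy_unseal(uint64_t key, const struct sealed_blob *in,
                       uint8_t *secret)
{
    for (size_t i = 0; i < SECRET_LEN; i++)
        secret[i] = in->data[i] ^ (uint8_t)key;
    return toy_tag(key, secret, SECRET_LEN) == in->tag;
}

/* Hypothetical placeholder for fetching the secret from the remote party. */
static void provision_from_remote(uint8_t *secret)
{
    memset(secret, 0x5a, SECRET_LEN);
}

/*
 * Load the secret with the current key. If unsealing fails, fall back to
 * remote provisioning and reseal under the current key.
 * Returns true if re-provisioning was needed.
 */
static bool load_secret(uint64_t key, struct sealed_blob *blob,
                        uint8_t *secret)
{
    if (toy_unseal(key, blob, secret))
        return false;
    provision_from_remote(secret);
    toy_seal(key, secret, blob);
    return true;
}
```

An enclave written this way sees a migration to another machine as just one
more case of the sealed blob failing its integrity check.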
>
> For the same reason, we are able to support checkpointing for SGX guests (only
> Linux and Windows).
>
> For snapshot, we can support snapshotting an SGX guest by either:
>
> - Suspending the guest before the snapshot (S3-S5). This works for all guests
> but requires the user to manually suspend the guest.
> - Issuing a hypercall to destroy the guest's EPC in save_vm. This only works
> for Linux and Windows but doesn't require user intervention.
>
> What are your comments?
>
> 3. Reference
>
> - Intel SGX Homepage
> https://software.intel.com/en-us/sgx
>
> - Linux SGX SDK
> https://01.org/intel-software-guard-extensions
>
> - Linux SGX driver for upstreaming
> https://github.com/01org/linux-sgx
>
> - Intel SGX Specification (SDM Vol 3D)
> https://software.intel.com/sites/default/files/managed/7c/f1/332831-sdm-vol-3d.pdf
>
> - Paper: Intel SGX Explained
> https://eprint.iacr.org/2016/086.pdf
>
> - ISCA 2015 tutorial slides for Intel® SGX - Intel® Software
> https://software.intel.com/sites/default/files/332680-002.pdf
>
> Boqun Feng (5):
> xen: mm: introduce non-scrubbable pages
> xen: mm: manage EPC pages in Xen heaps
> xen: x86/mm: add SGX EPC management
> xen: x86: add functions to populate and destroy EPC for domain
> xen: tools: add SGX to applying MSR policy
>
> Kai Huang (12):
> xen: x86: expose SGX to HVM domain in CPU featureset
> xen: x86: add early stage SGX feature detection
> xen: vmx: detect ENCLS VMEXIT
> xen: x86/mm: introduce ioremap_wb()
> xen: p2m: new 'p2m_epc' type for EPC mapping
> xen: x86: add SGX cpuid handling support.
> xen: vmx: handle SGX related MSRs
> xen: vmx: handle ENCLS VMEXIT
> xen: vmx: handle VMEXIT from SGX enclave
> xen: x86: reset EPC when guest got suspended.
> xen: tools: add new 'sgx' parameter support
> xen: tools: add SGX to applying CPUID policy
>
> docs/misc/xen-command-line.markdown | 8 +
> tools/libxc/Makefile | 1 +
> tools/libxc/include/xc_dom.h | 4 +
> tools/libxc/include/xenctrl.h | 16 +
> tools/libxc/xc_cpuid_x86.c | 68 ++-
> tools/libxc/xc_msr_x86.h | 10 +
> tools/libxc/xc_sgx.c | 82 +++
> tools/libxl/libxl.h | 3 +-
> tools/libxl/libxl_cpuid.c | 15 +-
> tools/libxl/libxl_create.c | 10 +
> tools/libxl/libxl_dom.c | 65 ++-
> tools/libxl/libxl_internal.h | 2 +
> tools/libxl/libxl_nocpuid.c | 4 +-
> tools/libxl/libxl_types.idl | 11 +
> tools/libxl/libxl_x86.c | 12 +
> tools/ocaml/libs/xc/xenctrl_stubs.c | 11 +-
> tools/python/xen/lowlevel/xc/xc.c | 11 +-
> tools/xl/xl_parse.c | 86 +++
> tools/xl/xl_parse.h | 1 +
> xen/arch/x86/Makefile | 1 +
> xen/arch/x86/cpu/common.c | 15 +
> xen/arch/x86/cpuid.c | 62 ++-
> xen/arch/x86/domctl.c | 87 ++-
> xen/arch/x86/hvm/hvm.c | 3 +
> xen/arch/x86/hvm/vmx/vmcs.c | 16 +-
> xen/arch/x86/hvm/vmx/vmx.c | 68 +++
> xen/arch/x86/hvm/vmx/vvmx.c | 11 +
> xen/arch/x86/mm.c | 9 +-
> xen/arch/x86/mm/p2m-ept.c | 3 +
> xen/arch/x86/mm/p2m.c | 41 ++
> xen/arch/x86/msr.c | 6 +-
> xen/arch/x86/sgx.c | 815 ++++++++++++++++++++++++++++
> xen/common/page_alloc.c | 39 +-
> xen/include/asm-arm/mm.h | 9 +
> xen/include/asm-x86/cpufeature.h | 4 +
> xen/include/asm-x86/cpuid.h | 29 +-
> xen/include/asm-x86/hvm/hvm.h | 3 +
> xen/include/asm-x86/hvm/vmx/vmcs.h | 8 +
> xen/include/asm-x86/hvm/vmx/vmx.h | 3 +
> xen/include/asm-x86/mm.h | 19 +-
> xen/include/asm-x86/msr-index.h | 6 +
> xen/include/asm-x86/msr.h | 5 +
> xen/include/asm-x86/p2m.h | 12 +-
> xen/include/asm-x86/sgx.h | 86 +++
> xen/include/public/arch-x86/cpufeatureset.h | 3 +-
> xen/include/xen/mm.h | 2 +
> xen/tools/gen-cpuid.py | 3 +
> 47 files changed, 1757 insertions(+), 31 deletions(-)
> create mode 100644 tools/libxc/xc_sgx.c
> create mode 100644 xen/arch/x86/sgx.c
> create mode 100644 xen/include/asm-x86/sgx.h
>
> --
> 2.15.0
>
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel