From: Boqun Feng <boqun.feng@intel.com>
To: xen-devel@lists.xen.org
Cc: Kevin Tian <kevin.tian@intel.com>,
Stefano Stabellini <sstabellini@kernel.org>,
Wei Liu <wei.liu2@citrix.com>,
George Dunlap <George.Dunlap@eu.citrix.com>,
Andrew Cooper <andrew.cooper3@citrix.com>,
Ian Jackson <ian.jackson@eu.citrix.com>, Tim Deegan <tim@xen.org>,
kai.huang@linux.intel.com, Jan Beulich <jbeulich@suse.com>
Subject: Re: [RFC PATCH v2 00/17] RFC: SGX Virtualization design and draft patches
Date: Mon, 25 Dec 2017 13:01:19 +0800
Message-ID: <20171225050119.GA748@winterfell.sh.intel.com>
In-Reply-To: <20171204001528.1342-1-boqun.feng@intel.com>
On Mon, Dec 04, 2017 at 08:15:11AM +0800, Boqun Feng wrote:
> Hi all,
>
> This is the v2 of RFC SGX Virtualization design and draft patches, you
Ping ;-)
Any comments?
Regards,
Boqun
> can find v1 at:
>
> https://lists.gt.net/xen/devel/483404
>
> In the new version, I fixed a few things according to the feedback on the
> previous version (mostly cleanups and code movement).
>
> Besides, Kai and I redesigned the SGX MSR setup part and introduced the
> new XL parameters 'lehash' and 'lewr'.
>
> Another big change is that I modified the EPC management to fit EPC pages
> into 'struct page_info'. Patches #6 and #7 introduce unscrubbable pages,
> 'PGC_epc', 'MEMF_epc' and 'XENZONE_EPC', so that EPC management is fully
> integrated into Xen's existing memory management. This might be the
> controversial bit, so patches 6~8 are simply meant to show the idea and
> drive deeper discussion.
>
> Detailed changes since v1 (modifications tagged "[New]" are entirely new
> in this series; reviews and comments on those parts are highly welcome):
>
> * Make the SGX-related code mostly common for x86 by: 1) moving sgx.[ch]
> to arch/x86/ and include/asm-x86/ and 2) renaming EPC-related functions
> with a domain_* prefix.
>
> * Rename ioremap_cache() to ioremap_wb() and make it x86-specific, as
> suggested by Jan Beulich.
>
> * Remove percpu sgx_cpudata. During boot, secondary CPUs now check
> whether they read different values than the boot CPU, and if so, SGX is
> disabled.
>
> * Remove domain_has_sgx_{,launch_control}, and make sure we can
> rely on the domain's arch.cpuid->feat.sgx{_lc} for setting checks.
>
> * Cleanup the code for CPUID handling as suggested by Andrew Cooper.
>
> * Adjust to the msr_policy framework for SGX MSR handling, and remove
> unnecessary fields such as 'readable' and 'writable'.
>
> * Use 'page_info' to maintain EPC pages, and [New] add a draft
> implementation employing xenheap for EPC page management. Please
> see patches 6~8.
>
> * [New] Modify the XL parameter for SGX; please see section 2.1.1 in
> the updated design doc.
>
> * [New] Use the _set_vcpu_msrs hypercall in the toolstack to set the
> SGX-related MSRs. Please see patch #17.
>
> * ACPI related tool changes are temporarily dropped in this patchset,
> as I need more time to resolve the comments and do related tests.
>
> The updated design doc follows. As in the previous version, there are
> some particular points in the design where we don't know which
> implementation is better. For those, a question mark (?) is added to the
> right of the menu entry. Thanks to Wei Liu for commenting in the previous
> version's review that SGX live migration would be nice to support if we
> can. We'd still like to hear more from you, so this item keeps its
> question mark. Your comments on those "question mark (?)" parts (and
> other comments as well, of course) are highly appreciated.
>
> ===================================================================
> 1. SGX Introduction
> 1.1 Overview
> 1.1.1 Enclave
> 1.1.2 EPC (Enclave Page Cache)
> 1.1.3 ENCLS and ENCLU
> 1.2 Discovering SGX Capability
> 1.2.1 Enumerate SGX via CPUID
> 1.2.2 Intel SGX Opt-in Configuration
> 1.3 Enclave Life Cycle
> 1.3.1 Constructing & Destroying Enclave
> 1.3.2 Enclave Entry and Exit
> 1.3.2.1 Synchronous Entry and Exit
> 1.3.2.2 Asynchronous Enclave Exit
> 1.3.3 EPC Eviction and Reload
> 1.4 SGX Launch Control
> 1.5 SGX Interaction with IA32 and IA64 Architecture
> 2. SGX Virtualization Design
> 2.1 High Level Toolstack Changes
> 2.1.1 New 'sgx' XL configure file parameter
> 2.1.2 New XL commands (?)
> 2.1.3 Notify domain's virtual EPC base and size to Xen
> 2.2 High Level Hypervisor Changes
> 2.2.1 EPC Management
> 2.2.2 EPC Virtualization
> 2.2.3 Populate EPC for Guest
> 2.2.4 Launch Control Support
> 2.2.5 CPUID Emulation
> 2.2.6 EPT Violation & ENCLS Trapping Handling
> 2.2.7 Guest Suspend & Resume
> 2.2.8 Destroying Domain
> 2.3 Additional Point: Live Migration, Snapshot Support (?)
> 3. Reference
>
> 1. SGX Introduction
>
> 1.1 Overview
>
> 1.1.1 Enclave
>
> Intel Software Guard Extensions (SGX) is a set of instructions and memory
> access mechanisms that provide secure access for sensitive applications and
> data. SGX allows an application to use a particular region of its address
> space as an *enclave*. An enclave is a protected area that provides
> confidentiality and integrity even in the presence of privileged malware.
> Accesses to enclave memory from any software not resident in the enclave are
> prevented, including accesses from privileged software. The diagram below
> illustrates enclaves within an application's address space.
>
> |-----------------------|
> |                       |
> |  |---------------|    |
> |  |   OS kernel   |    |       |-----------------------|
> |  |---------------|    |       |                       |
> |  |               |    |       |  |---------------|    |
> |  |---------------|    |       |  | Entry table   |    |
> |  |   Enclave     |----|-----> |  |---------------|    |
> |  |---------------|    |       |  | Enclave stack |    |
> |  |   App code    |    |       |  |---------------|    |
> |  |---------------|    |       |  | Enclave heap  |    |
> |  |   Enclave     |    |       |  |---------------|    |
> |  |---------------|    |       |  | Enclave code  |    |
> |  |   App code    |    |       |  |---------------|    |
> |  |---------------|    |       |                       |
> |                       |       |-----------------------|
> |-----------------------|
>
> SGX supports SGX1 and SGX2 extensions. SGX1 provides basic enclave support,
> and SGX2 allows additional flexibility in runtime management of enclave
> resources and thread execution within an enclave.
>
> 1.1.2 EPC (Enclave Page Cache)
>
> Just like normal application memory management, enclave memory management can
> be divided into two parts: address space allocation and memory commitment.
> Address space allocation reserves a particular range of linear address space
> for the enclave. Memory commitment assigns actual physical resources to the
> enclave.
>
> Enclave Page Cache (EPC) is the physical resource committed to enclaves. EPC
> is divided into 4K pages: an EPC page is 4K in size and always aligned to a 4K
> boundary. Hardware performs additional access control checks to restrict access
> to EPC pages. The Enclave Page Cache Map (EPCM) is a secure structure that
> holds one entry for each EPC page. Hardware uses it to track the status of each
> EPC page, and it is invisible to software. Typically EPC and EPCM are
> reserved by BIOS as Processor Reserved Memory, but the actual amount, size,
> and layout of EPC are model-specific and dependent on BIOS settings. EPC is
> enumerated via the new SGX CPUID leaf and is reported as reserved memory.
>
> EPC pages can be either invalid or valid. There are 4 valid EPC page types in
> SGX1: regular EPC page, SGX Enclave Control Structure (SECS) page, Thread
> Control Structure (TCS) page, and Version Array (VA) page. SGX2 adds the Trimmed
> EPC page. Each enclave is associated with one SECS page. Each thread in an
> enclave is associated with one TCS page. VA pages are used in EPC page eviction
> and reload. SGX2 uses the Trimmed EPC page type when a particular 4K page of
> an initialized enclave is going to be freed (trimmed) at runtime.
>
> 1.1.3 ENCLS and ENCLU
>
> Two new instructions ENCLS and ENCLU are introduced to manage enclave and EPC.
> ENCLS can only run in ring 0, while ENCLU can only run in ring 3. Both ENCLS and
> ENCLU have multiple leaf functions, with EAX indicating the specific leaf
> function.
>
> SGX1 supports the following ENCLS and ENCLU leaves:
>
> ENCLS:
> - ECREATE, EADD, EEXTEND, EINIT, EREMOVE (Enclave build and destroy)
> - EPA, EBLOCK, ETRACK, EWB, ELDU/ELDB (EPC eviction & reload)
>
> ENCLU:
> - EENTER, EEXIT, ERESUME (Enclave entry, exit, re-enter)
> - EGETKEY, EREPORT (SGX key derivation, attestation)
>
> Additionally, SGX2 supports the following ENCLS and ENCLU leaves for adding
> EPC pages to, or removing them from, an enclave at runtime after the enclave
> is initialized, along with changing page permissions.
>
> ENCLS:
> - EAUG, EMODT, EMODPR
>
> ENCLU:
> - EACCEPT, EACCEPTCOPY, EMODPE
>
> The VMM is able to interfere with ENCLS running in a guest (see 1.5.1 VMX
> Changes for Supporting SGX Virtualization) but is unable to interfere with ENCLU.
>
> 1.2 Discovering SGX Capability
>
> 1.2.1 Enumerate SGX via CPUID
>
> If CPUID.0x7.0:EBX.SGX (bit 2) is 1, the processor supports SGX. SGX
> capabilities and resources can then be enumerated via the new SGX CPUID leaf
> (0x12). CPUID.0x12.0x0 reports SGX capabilities, such as the presence of SGX1
> and SGX2 and the enclave's maximum size for both 32-bit and 64-bit
> applications. CPUID.0x12.0x1 reports which bits can be set in SECS.ATTRIBUTES.
> CPUID.0x12.0x2 reports the EPC resource's base and size. A platform may support
> multiple EPC sections, and CPUID.0x12.0x3 and further sub-leaves can be used
> to detect them (until CPUID reports an invalid EPC section).
>
> Refer to 37.7.2 Intel SGX Resource Enumeration Leaves for full description of
> SGX CPUID 0x12.
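The following minimal C sketch decodes one EPC-section sub-leaf (n >= 2) of CPUID.0x12 from raw register values. The bit positions follow our reading of SDM 37.7.2 and should be double-checked:

- type in EAX[3:0]
- base in EAX[31:12] and EBX[19:0]
- size in ECX[31:12] and EDX[19:0]

The struct and function names are hypothetical. Executing CPUID itself is left out so that the decoding can be tested on its own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for one decoded EPC section. */
struct epc_section {
    uint64_t base;  /* physical base address */
    uint64_t size;  /* size in bytes */
};

/*
 * Decode CPUID.0x12 sub-leaf n (n >= 2). Returns false when the sub-leaf
 * type (EAX[3:0]) is not 0001b, i.e. there are no further EPC sections.
 */
static bool sgx_decode_epc_leaf(uint32_t eax, uint32_t ebx, uint32_t ecx,
                                uint32_t edx, struct epc_section *out)
{
    if ((eax & 0xf) != 0x1)
        return false;

    /* Base: bits 31:12 come from EAX[31:12], bits 51:32 from EBX[19:0]. */
    out->base = ((uint64_t)(ebx & 0xfffffu) << 32) | (eax & 0xfffff000u);
    /* Size: bits 31:12 come from ECX[31:12], bits 51:32 from EDX[19:0]. */
    out->size = ((uint64_t)(edx & 0xfffffu) << 32) | (ecx & 0xfffff000u);
    return true;
}
```

A caller would execute CPUID with EAX=0x12 and ECX=2, 3, ..., stopping at the first sub-leaf for which this returns false.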
>
> 1.2.2 Intel SGX Opt-in Configuration
>
> On processors that support Intel SGX, IA32_FEATURE_CONTROL also provides the
> SGX_ENABLE bit (bit 18) to turn SGX on or off. Before system software can
> enable and use SGX, BIOS is required to set IA32_FEATURE_CONTROL.SGX_ENABLE = 1
> to opt in to SGX.
>
> Setting SGX_ENABLE follows the rules of IA32_FEATURE_CONTROL.LOCK (bit 0).
> Software is considered to have opted into Intel SGX if and only if
> IA32_FEATURE_CONTROL.SGX_ENABLE and IA32_FEATURE_CONTROL.LOCK are set to 1.
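The opt-in rule above reduces to a two-bit check. This is a minimal sketch that assumes the caller passes the raw IA32_FEATURE_CONTROL value (MSR 0x3a) it has already read. The macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define FEATURE_CONTROL_LOCK        (1ull << 0)   /* bit 0 */
#define FEATURE_CONTROL_SGX_ENABLE  (1ull << 18)  /* bit 18 */

/* SGX counts as opted in only if both LOCK and SGX_ENABLE are set. */
static bool sgx_opted_in(uint64_t feature_control)
{
    const uint64_t need = FEATURE_CONTROL_LOCK | FEATURE_CONTROL_SGX_ENABLE;

    return (feature_control & need) == need;
}
```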
>
> The setting of IA32_FEATURE_CONTROL.SGX_ENABLE (bit 18) is not reflected by
> SGX CPUID. Enclave instructions behave differently depending on the value of
> CPUID.0x7.0x0:EBX.SGX and whether BIOS has opted in to SGX.
>
> Refer to 37.7.1 Intel SGX Opt-in Configuration for more information.
>
> 1.3 Enclave Life Cycle
>
> 1.3.1 Constructing & Destroying Enclave
>
> An enclave is created by privileged software via the ENCLS[ECREATE] leaf.
> ECREATE converts an invalid EPC page into a SECS page, based on a source
> SECS structure that resides in normal memory. The source SECS contains the
> enclave's info, such as base (linear) address, size, enclave attributes,
> enclave measurement, etc.
>
> After ECREATE, privileged software uses EADD and EEXTEND to add one EPC page
> to each 4K page of the enclave's linear address space. Enclave code/data
> (residing in normal memory) is loaded into the enclave by EADD for each of
> the enclave's 4K pages. After all EPC pages are added to the enclave,
> privileged software calls EINIT to initialize the enclave, and then the enclave
> is ready to run.
>
> While the enclave is being constructed, the enclave measurement is also built.
> The measurement is a SHA256 hash value computed from the enclave's size, its
> code/data and their locations within the enclave, etc. It can be used to
> uniquely identify the enclave. The SIGSTRUCT passed to the EINIT leaf also
> contains a measurement (MRENCLAVE), specified by untrusted software. EINIT
> checks the two measurements and only succeeds when they match.
>
> An enclave is destroyed by running EREMOVE on all of the enclave's EPC pages,
> and then on the enclave's SECS. EREMOVE reports an SGX_CHILD_PRESENT error if
> it is called on a SECS while regular EPC pages belonging to it have not yet
> been removed.
>
> Please refer to SDM chapter 39.1 Constructing an Enclave for more information.
>
> 1.3.2 Enclave Entry and Exit
>
> 1.3.2.1 Synchronous Entry and Exit
>
> After an enclave is constructed, non-privileged software uses ENCLU[EENTER] to
> enter the enclave. While running in the enclave, non-privileged software can
> use ENCLU[EEXIT] to exit from the enclave and return to normal mode.
>
> 1.3.2.2 Asynchronous Enclave Exit
>
> Asynchronous and synchronous events, such as exceptions, interrupts, traps,
> SMIs, and VM exits may occur while executing inside an enclave. These events
> are referred to as Enclave Exiting Events (EEE). Upon an EEE, the processor
> state is securely saved inside the enclave and then replaced by a synthetic
> state to prevent leakage of secrets. The process of securely saving state and
> establishing the synthetic state is called an Asynchronous Enclave Exit (AEX).
>
> After AEX, non-privileged software uses ENCLU[ERESUME] to re-enter the enclave.
> The SGX userspace software maintains a small piece of code (residing in normal
> memory) which basically calls ERESUME to re-enter the enclave. The address of
> this code is called the Asynchronous Exit Pointer (AEP). AEP is specified as a
> parameter to EENTER and is kept internally in the enclave. Upon AEX, AEP is
> pushed onto the stack. On return from EEE handling (e.g. via IRET), AEP is
> loaded into RIP and ERESUME is then called to re-enter the enclave.
>
> During AEX, the processor saves and restores context automatically, so no
> change to the interrupt handling of the OS kernel or VMM is required. It is the
> SGX userspace software's responsibility to set up AEP correctly.
>
> Please refer to SDM chapter 39.2 Enclave Entry and Exit for more information.
>
> 1.3.3 EPC Eviction and Reload
>
> SGX also allows privileged software to evict any EPC pages used by enclaves.
> The idea is the same as normal memory swapping. The details of how to evict
> EPC pages follow.
>
> Below is the sequence to evict regular EPC page:
>
> 1) Select one or multiple regular EPC pages from one enclave
> 2) Remove EPT/PT mapping for selected EPC pages
> 3) Send IPIs to remote CPUs to flush TLB of selected EPC pages
> 4) EBLOCK on selected EPC pages
> 5) ETRACK on enclave's SECS page
> 6) allocate one available slot (8-byte) in VA page
> 7) EWB on selected EPC pages
>
> With EWB taking:
>
> - a VA slot, to store eviction version info.
> - one normal 4K page in memory, to store the encrypted content of the EPC page.
> - one struct PCMD in memory, to store metadata.
>
> (A VA slot is an 8-byte slot in a VA page, which is a particular type of EPC page.)
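A VA page is a 4K EPC page made of 8-byte slots, so each VA page holds 4096 / 8 = 512 slots. The sketch below shows one possible way to track free slots with a bitmap. This approach is an assumption, not taken from the patches, and all names are hypothetical. The offset helper gives the slot's byte offset inside the VA page. The VA slot address used by EWB would be the VA page address plus this offset.

```c
#include <stdint.h>

#define VA_SLOTS_PER_PAGE (4096 / 8)   /* 512 eight-byte slots per VA page */

/* Hypothetical bookkeeping for one VA page: one bit per slot, 1 = in use. */
struct va_page {
    uint64_t used[VA_SLOTS_PER_PAGE / 64];
};

/* Returns a free slot index and marks it used, or -1 if the page is full. */
static int va_slot_alloc(struct va_page *va)
{
    for (int w = 0; w < VA_SLOTS_PER_PAGE / 64; w++) {
        if (va->used[w] == UINT64_MAX)
            continue;                   /* all 64 slots in this word taken */
        for (int b = 0; b < 64; b++) {
            if (!(va->used[w] & (1ull << b))) {
                va->used[w] |= 1ull << b;
                return w * 64 + b;
            }
        }
    }
    return -1;
}

/* Releases a slot, e.g. once the page has been reloaded with ELDU/ELDB. */
static void va_slot_free(struct va_page *va, int slot)
{
    va->used[slot / 64] &= ~(1ull << (slot % 64));
}

/* Byte offset of a slot within its VA page. */
static unsigned int va_slot_offset(int slot)
{
    return (unsigned int)slot * 8;
}
```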
>
> And below is the sequence to evict an SECS page or VA page:
>
> 1) locate the SECS (or VA) page
> 2) remove EPT/PT mapping for the SECS (or VA) page
> 3) Send IPIs to remote CPUs to flush TLB of the page
> 4) allocate one available slot (8-byte) in VA page
> 5) EWB on the SECS (or VA) page
>
> To evict a SECS page, all regular EPC pages belonging to that SECS must be
> evicted first, otherwise EWB returns an SGX_CHILD_PRESENT error.
>
> And to reload an EPC page:
>
> 1) ELDU/ELDB on EPC page
> 2) setup EPT/PT mapping
>
> With ELDU/ELDB taking:
>
> - location of SECS page
> - linear address of enclave's 4K page (that we are going to reload to)
> - VA slot (used in EWB)
> - 4K page in memory (used in EWB)
> - struct PCMD in memory (used in EWB)
>
> Please refer to SDM chapter 39.5 EPC and Management of EPC pages for more
> information.
>
> 1.4 SGX Launch Control
>
> SGX requires running a "Launch Enclave" (LE) before running any other enclave,
> because LE is the only enclave that does not require an EINITTOKEN in EINIT.
> Running any other enclave requires a valid EINITTOKEN. The token contains a
> MAC of its first 192 bytes, calculated with the EINITTOKEN key. EINIT verifies
> the MAC by internally deriving the EINITTOKEN key, and accepts only an
> EINITTOKEN with a matching MAC. The EINITTOKEN key derivation depends on some
> info from LE. Typically, LE generates the EINITTOKEN for another enclave
> according to LE itself and the target enclave. It calculates the MAC by using
> ENCLU[EGETKEY] to get the EINITTOKEN key. Only LE is able to get the
> EINITTOKEN key.
>
> Running LE requires the SHA256 hash of the LE signer's RSA public key (SHA256
> of sigstruct->modulus) to equal the IA32_SGXLEPUBKEYHASH[0-3] MSRs (the 4 MSRs
> together make up the 256-bit SHA256 hash value).
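The helper below shows how the 256-bit hash maps onto the four MSRs. It splits a SHA256 digest into the four 64-bit values for IA32_SGXLEPUBKEYHASH0..3 (MSRs 0x8c..0x8f). It assumes the digest bytes are laid out little-endian across the MSRs. That gives the same result as copying the 32-byte digest into an array of four uint64_t on x86. The function name is hypothetical.

```c
#include <stdint.h>

#define MSR_IA32_SGXLEPUBKEYHASH0 0x0000008c  /* HASH1..3 follow at 0x8d..0x8f */

/*
 * Split a 32-byte SHA256 digest into the values for IA32_SGXLEPUBKEYHASH0..3.
 * Digest byte (8 * m + b) becomes bits [8*b+7 : 8*b] of MSR m.
 */
static void sgx_lehash_to_msrs(const uint8_t digest[32], uint64_t msr_val[4])
{
    for (int m = 0; m < 4; m++) {
        uint64_t v = 0;

        /* Most significant byte first, so byte 0 ends up in bits 7:0. */
        for (int b = 7; b >= 0; b--)
            v = (v << 8) | digest[m * 8 + b];
        msr_val[m] = v;
    }
}
```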
>
> If CPUID.0x7.0x0:EBX.SGX and CPUID.0x7.0x0:ECX.SGX_LAUNCH_CONTROL (bit 30) are
> set, then the IA32_SGXLEPUBKEYHASHn MSRs are available, and the
> IA32_FEATURE_CONTROL MSR has the SGX_LAUNCH_CONTROL_ENABLE bit (bit 17)
> available. Setting SGX_LAUNCH_CONTROL_ENABLE to 1 allows IA32_SGXLEPUBKEYHASHn
> to be changed at runtime after IA32_FEATURE_CONTROL is locked. Otherwise,
> IA32_SGXLEPUBKEYHASHn are read-only after IA32_FEATURE_CONTROL is locked.
> After reset, IA32_SGXLEPUBKEYHASHn are set to the hash of Intel's default key.
> On systems that have only CPUID.0x7.0x0:EBX.SGX set, IA32_SGXLEPUBKEYHASHn are
> not available. On such systems EINIT always treats IA32_SGXLEPUBKEYHASHn as
> Intel's default value, so only Intel's LE is able to run.
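The rules above can be summarized in a small, hypothetical helper. It classifies IA32_SGXLEPUBKEYHASHn as unavailable, locked or unlocked, from two inputs:

- CPUID.0x7.0x0:ECX.SGX_LAUNCH_CONTROL
- the raw IA32_FEATURE_CONTROL value (bit 0 LOCK, bit 17 SGX_LAUNCH_CONTROL_ENABLE)

It assumes the caller has already checked CPUID.0x7.0x0:EBX.SGX.

```c
#include <stdbool.h>
#include <stdint.h>

#define FEATURE_CONTROL_LOCK    (1ull << 0)
#define FEATURE_CONTROL_SGX_LC  (1ull << 17)

enum sgx_lehash_mode {
    LEHASH_UNAVAILABLE, /* EINIT always uses the hash of Intel's default key */
    LEHASH_LOCKED,      /* MSRs read-only after IA32_FEATURE_CONTROL is locked */
    LEHASH_UNLOCKED,    /* MSRs writable at runtime */
};

static enum sgx_lehash_mode sgx_get_lehash_mode(bool cpuid_sgx_lc,
                                                uint64_t feature_control)
{
    if (!cpuid_sgx_lc)
        return LEHASH_UNAVAILABLE;
    /* Before LOCK is set the MSRs are still writable; after it, only if LC is set. */
    if (!(feature_control & FEATURE_CONTROL_LOCK) ||
        (feature_control & FEATURE_CONTROL_SGX_LC))
        return LEHASH_UNLOCKED;
    return LEHASH_LOCKED;
}
```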
>
> On systems with IA32_SGXLEPUBKEYHASHn available, it is up to the BIOS
> implementation to decide whether to let the user set IA32_SGXLEPUBKEYHASHn in
> *locked* mode (IA32_SGXLEPUBKEYHASHn are read-only after IA32_FEATURE_CONTROL
> is locked) or *unlocked* mode (IA32_SGXLEPUBKEYHASHn are writable by the
> kernel at runtime). BIOS may or may not also let the user set a custom value
> of IA32_SGXLEPUBKEYHASHn.
>
> 1.5 SGX Interaction with IA32 and IA64 Architecture
>
> SDM Chapter 42 describes SGX interaction with various features of the IA32
> and IA64 architectures. The major ones are outlined below. Refer to Chapter 42
> for the full description.
>
> 1.5.1 VMX Changes for Supporting SGX Virtualization
>
> A new 64-bit ENCLS-exiting bitmap control field is added to the VMCS (encoding
> 0202EH) to control VMEXIT on ENCLS leaf functions. A new "Enable ENCLS
> exiting" control bit (bit 15) is also defined in the secondary processor-based
> VM-execution controls. Setting "Enable ENCLS exiting" to 1 enables the
> ENCLS-exiting bitmap, which controls which ENCLS leaves trigger VMEXIT.
>
> Additionally, two new bits indicate whether a VMEXIT (for any reason) is from
> an enclave. Both of the following bits are set if the VMEXIT is from an enclave:
> - Bit 27 in the Exit Reason field of Basic VM-exit information.
> - Bit 4 in the Interruptibility State of Guest Non-Register State of VMCS.
>
> Refer to 42.5 Interactions with VMX, 27.2.1 Basic VM-Exit Information, and
> 27.3.4 Saving Non-Register State.
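The sketch below illustrates the ENCLS-exiting bitmap. It lists the ENCLS leaf numbers (the value in EAX) and shows how the bitmap selects which leaves cause a VMEXIT. It also shows the bit 27 check on the exit reason. The leaf numbers reflect our reading of the SDM, as does the rule that leaves with EAX >= 63 map to bitmap bit 63. All identifiers are hypothetical, and writing the bitmap into the VMCS is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* ENCLS leaf numbers, i.e. the value of EAX when ENCLS executes. */
enum encls_leaf {
    ECREATE = 0x0, EADD = 0x1, EINIT = 0x2, EREMOVE = 0x3,
    EDBGRD = 0x4, EDBGWR = 0x5, EEXTEND = 0x6, ELDB = 0x7,
    ELDU = 0x8, EBLOCK = 0x9, EPA = 0xa, EWB = 0xb, ETRACK = 0xc,
    EAUG = 0xd, EMODPR = 0xe, EMODT = 0xf,
};

#define SECONDARY_EXEC_ENABLE_ENCLS_EXITING (1u << 15)
#define VMCS_ENCLS_EXITING_BITMAP           0x202e
#define EXIT_REASON_FROM_ENCLAVE            (1u << 27)

/* Bitmap bit controlling a leaf: bit EAX if EAX < 63, otherwise bit 63. */
static uint64_t encls_exit_bit(uint32_t eax)
{
    return 1ull << (eax < 63 ? eax : 63);
}

/* Would ENCLS with this EAX cause a VMEXIT under the given bitmap? */
static bool encls_causes_exit(uint64_t bitmap, uint32_t eax)
{
    return (bitmap & encls_exit_bit(eax)) != 0;
}

/* Bit 27 of the exit reason is set when the VMEXIT came from an enclave. */
static bool vmexit_from_enclave(uint32_t exit_reason)
{
    return (exit_reason & EXIT_REASON_FROM_ENCLAVE) != 0;
}
```

With the Static Partitioning approach chosen in 2.2.2, no ENCLS exiting is needed. A bitmap like this would only matter for Oversubscription.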
>
> 1.5.2 Interaction with XSAVE
>
> SGX defines a sub-field called X-Feature Request Mask (XFRM) in the attributes
> field of SECS. On enclave entry, SGX hardware verifies that the features set in
> SECS.ATTRIBUTES.XFRM are already enabled in XCR0.
>
> Upon AEX, SGX saves the processor extended state and miscellaneous state to
> the enclave's state-save area (SSA). It then clears the enclave's secrets
> from the processor extended state so that they cannot leak.
>
> Refer to 42.7 Interaction with Processor Extended State and Miscellaneous State.
>
> 1.5.3 Interaction with S state
>
> When the processor enters an S3-S5 state, EPC is destroyed, and consequently
> all enclaves are destroyed as well.
>
> Refer to 42.14 Interaction with S States.
>
> 2. SGX Virtualization Design
>
> 2.1 High Level Toolstack Changes:
>
> 2.1.1 New 'sgx' XL configure file parameter
>
> EPC is a limited resource. To use EPC efficiently among all domains, the
> administrator should be able to specify a domain's virtual EPC size when
> creating the guest, and should also be able to get the virtual EPC sizes of
> all domains.
>
> For SGX Launch Control virtualization, we should allow admin to create VM with
> either VM's virtual IA32_SGXLEPUBKEYHASHn locked or unlocked, and we should
> also allow admin to create VM with custom IA32_SGXLEPUBKEYHASHn value.
>
> For the above purposes, the following new 'sgx' XL configure file parameter is added:
>
> sgx = 'epc=<size>,lehash=<sha256-hash>,lewr=<0|1>'
>
> Here 'epc' specifies the VM's EPC size in MB and is mandatory.
>
> When the physical machine is in *locked* mode, neither 'lehash' nor 'lewr'
> can be specified, as the physical machine is unable to change
> IA32_SGXLEPUBKEYHASHn at runtime. Adding either 'lehash' or 'lewr' will
> cause VM creation to fail in that case. The VM's initial
> IA32_SGXLEPUBKEYHASHn value will be set to the value of the physical MSRs.
>
> When the physical machine is in *unlocked* mode, the VM's initial
> IA32_SGXLEPUBKEYHASHn value will be set to 'lehash' if specified, or to
> Intel's default value otherwise. The VM's SGX_LAUNCH_CONTROL_ENABLE bit in
> IA32_FEATURE_CONTROL will be set or cleared depending on whether 'lewr'
> is specified (or explicitly set to true or false).
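Below is a minimal, standalone sketch of how the 'sgx' string could be parsed and checked against the locked-mode rule. This is not libxl's actual parsing code. The struct and function names are hypothetical, and reading 'lehash' as 64 hex digits in SHA256 digest byte order is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of sgx = 'epc=<size>,lehash=<sha256-hash>,lewr=<0|1>'. */
struct sgx_cfg {
    unsigned long epc_mb;   /* mandatory, in MB */
    bool has_lehash;
    uint8_t lehash[32];     /* assumed: hex digits in SHA256 digest byte order */
    bool has_lewr;
    bool lewr;
};

static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Returns 0 on success, -1 if malformed or if 'lehash'/'lewr' is used on a locked host. */
static int parse_sgx_cfg(const char *s, bool host_locked, struct sgx_cfg *cfg)
{
    bool has_epc = false;

    memset(cfg, 0, sizeof(*cfg));
    while (*s) {
        const char *comma = strchr(s, ',');
        size_t len = comma ? (size_t)(comma - s) : strlen(s);

        if (len > 4 && strncmp(s, "epc=", 4) == 0 && s[4] >= '0' && s[4] <= '9') {
            char *stop;

            cfg->epc_mb = strtoul(s + 4, &stop, 10);
            if (stop != s + len || cfg->epc_mb == 0)
                return -1;              /* trailing junk or zero size */
            has_epc = true;
        } else if (len == 7 + 64 && strncmp(s, "lehash=", 7) == 0) {
            for (int i = 0; i < 32; i++) {
                int hi = hexval(s[7 + 2 * i]), lo = hexval(s[8 + 2 * i]);

                if (hi < 0 || lo < 0)
                    return -1;
                cfg->lehash[i] = (uint8_t)((hi << 4) | lo);
            }
            cfg->has_lehash = true;
        } else if (len == 6 && strncmp(s, "lewr=", 5) == 0 &&
                   (s[5] == '0' || s[5] == '1')) {
            cfg->has_lewr = true;
            cfg->lewr = (s[5] == '1');
        } else {
            return -1;                  /* unknown or malformed key */
        }
        s += len;
        if (*s == ',')
            s++;
    }
    if (!has_epc)
        return -1;                      /* 'epc' is mandatory */
    if (host_locked && (cfg->has_lehash || cfg->has_lewr))
        return -1;                      /* host cannot change IA32_SGXLEPUBKEYHASHn */
    return 0;
}
```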
>
> Please also refer to 2.2.4 Launch Control Support.
>
> 2.1.2 New XL commands (?)
>
> The administrator should be able to get the physical EPC size and the virtual
> EPC sizes of all domains. For this purpose, we can introduce 2 additional
> commands:
>
> # xl sgxinfo
>
> Which will print out physical EPC size, and other SGX info (such as SGX1, SGX2,
> etc) if necessary.
>
> # xl sgxlist <did>
>
> Which will print out particular domain's virtual EPC size, or list all virtual
> EPC sizes for all supported domains.
>
> Alternatively, we can also extend existing XL commands by adding new options:
>
> # xl info -sgx
>
> Which will print out physical EPC size along with other physinfo. And
>
> # xl list <did> -sgx
>
> Which will print out domain's virtual EPC size.
>
> Comments?
>
> In this RFC the two new commands are not implemented yet.
>
> 2.1.3 Notify domain's virtual EPC base and size to Xen
>
> Xen needs to know guest's EPC base and size in order to populate EPC pages for
> it. Toolstack notifies EPC base and size to Xen via XEN_DOMCTL_set_cpuid.
>
> 2.2 High Level Xen Hypervisor Changes:
>
> 2.2.1 EPC Management
>
> The Xen hypervisor needs to detect SGX, discover EPC, and manage EPC before
> exposing SGX to guests. EPC is detected via SGX CPUID 0x12.0x2. There may be
> multiple EPC sections (enumerated via sub-leaves 0x3 and so on, until an
> invalid EPC section is reported). This is typical of multi-socket servers,
> where each package has its own EPC.
>
> EPC is reported as reserved memory (so it is not reported as normal memory).
> EPC must be managed in 4K pages. CPU hardware uses EPCM to track the status of
> each EPC page. Xen needs to manage EPC and provide functions to, e.g., allocate
> and free EPC pages for guests.
>
> On existing physical machines, EPC is typically at most ~100M in size, but
> we cannot make assumptions about EPC size. It is therefore better to integrate
> EPC management into Xen's memory management framework and take advantage of
> Xen's existing memory management algorithms.
>
> Specifically, one 'struct page_info' will be created for each EPC page, just
> as for normal memory, and a new flag will be defined to identify whether a
> 'struct page_info' is EPC or normal memory. The existing memory allocation API
> alloc_domheap_pages will be reused to allocate EPC pages, by adding a new
> memflag 'MEMF_epc' that requests EPC rather than normal memory. The new
> 'MEMF_epc' flag can also be used for EPC ballooning (if required in the
> future). With it, the existing XENMEM_{increase,decrease}_reservation and
> XENMEM_populate_physmap can be reused for EPC as well.
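The toy model below illustrates the idea. It is not Xen's real heap allocator, which is a buddy allocator with zones. In the model:

- each page has a page_info-like descriptor
- a flag marks EPC pages
- an allocation flag selects a separate EPC free list, so EPC and normal memory are never handed out in place of each other

The toy_ prefixed names are hypothetical stand-ins for 'struct page_info', 'PGC_epc', 'MEMF_epc' and the EPC zone.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PGC_epc   (1u << 0)   /* descriptor flag: this page is EPC */
#define TOY_MEMF_epc  (1u << 0)   /* allocation flag: request EPC, not RAM */

struct toy_page_info {
    unsigned int count_info;          /* flags, including TOY_PGC_epc */
    struct toy_page_info *next_free;  /* free-list link */
};

struct toy_heap {
    struct toy_page_info *free_ram;   /* normal memory free list */
    struct toy_page_info *free_epc;   /* separate EPC "zone" */
};

/* Free a page onto the list matching its type. */
static void toy_free_page(struct toy_heap *h, struct toy_page_info *pg)
{
    struct toy_page_info **list =
        (pg->count_info & TOY_PGC_epc) ? &h->free_epc : &h->free_ram;

    pg->next_free = *list;
    *list = pg;
}

/* Allocate from the EPC list iff TOY_MEMF_epc is requested; NULL if empty. */
static struct toy_page_info *toy_alloc_page(struct toy_heap *h,
                                            unsigned int memflags)
{
    struct toy_page_info **list =
        (memflags & TOY_MEMF_epc) ? &h->free_epc : &h->free_ram;
    struct toy_page_info *pg = *list;

    if (pg) {
        *list = pg->next_free;
        pg->next_free = NULL;
    }
    return pg;
}
```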
>
> 2.2.2 EPC Virtualization
>
> This part is how to populate EPC for guests. We have 3 choices:
> - Static Partitioning
> - Oversubscription
> - Ballooning
>
> Static Partitioning means all EPC pages will be allocated and mapped to guest
> when it is created, and there's no runtime change of page table mappings for EPC
> pages. Oversubscription means Xen hypervisor supports EPC page swapping between
> domains, meaning Xen is able to evict EPC page from another domain and assign it
> to the domain that needs the EPC. With oversubscription, EPC can be assigned to
> domain on demand, when EPT violation happens. Ballooning is similar to memory
> ballooning. It is basically "Static Partitioning" + "Balloon driver" in guest.
>
> Static Partitioning is the easiest way in terms of implementation, and there
> will be no hypervisor overhead (except EPT overhead of course), because in
> "Static partitioning", there is no EPT violation for EPC, and Xen doesn't need
> to turn on ENCLS VMEXIT for guest as ENCLS runs perfectly in non-root mode.
>
> Ballooning is "Static Partitioning" + "Balloon driver" in guest. Like "Static
> Partitioning", ballooning doesn't need to turn on ENCLS VMEXIT and doesn't
> have EPT violations for EPC either. To support ballooning, we need a balloon
> driver in the guest to issue hypercalls to give up or reclaim EPC pages. For
> the hypercall, we have two choices: 1) add a new hypercall for EPC
> ballooning; 2) use the existing XENMEM_{increase,decrease}_reservation with a
> new memory flag, i.e., XENMEMF_epc. I'll discuss later whether to add a
> dedicated hypercall.
>
> Oversubscription looks nice but requires a more complicated implementation.
> Firstly, as explained in 1.3.3 EPC Eviction and Reload, we need to follow specific
> steps to evict EPC pages. To do that, Xen basically needs to trap ENCLS from
> guests and keep track of EPC page status and enclave info for all guests.
> This is because:
> - To evict a regular EPC page, Xen needs to know the SECS location.
> - Xen needs to know the EPC page type: evicting a regular EPC page and evicting
> a SECS or VA page involve different steps.
> - Xen needs to know the EPC page status: whether the page is blocked or not.
>
> This info can only be obtained by trapping ENCLS from guests and parsing its
> parameters (to identify the SECS page, etc). Parsing ENCLS parameters means we
> need to know which ENCLS leaf is being trapped. We also need to translate the
> guest's virtual addresses to physical addresses in order to locate EPC pages.
> Once ENCLS is trapped, we have to emulate it in Xen. That means
> reconstructing the ENCLS parameters by remapping all of the guest's virtual
> addresses to Xen's virtual addresses (gva->gpa->pa->xen_va), because ENCLS
> always uses *effective addresses*, which the processor must be able to
> translate when running ENCLS.
>
> --------------------------------------------------------------
> |                           ENCLS                            |
> --------------------------------------------------------------
>                |                              /|\
>   ENCLS VMEXIT |                               | VMENTRY
>                |                               |
>               \|/                              |
>
> 1) parse ENCLS parameters
> 2) reconstruct(remap) guest's ENCLS parameters
> 3) run ENCLS on behalf of guest (and skip ENCLS)
> 4) on success, update EPC/enclave info, or inject error
>
> Xen also needs to maintain each EPC page's status (type, blocked or not, in an
> enclave or not, etc), as well as info on all enclaves from all guests. The
> enclave info is needed to find the correct SECS for a regular EPC page, and
> the enclave's linear address.
>
> So in general, "Static Partitioning" has the simplest implementation, but is
> obviously not the most efficient way to use EPC; "Ballooning" has all the pros
> of Static Partitioning but requires a guest balloon driver; "Oversubscription"
> is best in terms of flexibility but requires a complicated hypervisor
> implementation.
>
> We will start with "Static Partitioning". If "Ballooning" is required in the
> future, we will support it. "Oversubscription" should not be needed in the
> foreseeable future.
>
> 2.2.3 Populate EPC for Guest
>
> The toolstack notifies Xen about the domain's EPC base and size via
> XEN_DOMCTL_set_cpuid. Xen therefore currently populates all of the guest's
> EPC pages while handling XEN_DOMCTL_set_cpuid, specifically for
> CPUID.0x12.0x2. Once Xen has checked that the values passed from the toolstack
> are valid, it allocates all EPC pages and sets up EPT mappings for the guest.
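The sketch below shows the kind of validity checks Xen might apply to the EPC base and size from XEN_DOMCTL_set_cpuid before populating. It is only an illustration, and the checks in the patches may differ. The function name is hypothetical. The 52-bit limit comes from the width of the CPUID.0x12 base/size fields. The free-page comparison reflects Static Partitioning, where every page is allocated up front.

```c
#include <stdbool.h>
#include <stdint.h>

#define EPC_PAGE_SIZE 4096ull

/* Hypothetical sanity checks on a guest's virtual EPC section. */
static bool vepc_params_valid(uint64_t base, uint64_t size,
                              uint64_t free_epc_pages)
{
    if (size == 0 || ((base | size) & (EPC_PAGE_SIZE - 1)) != 0)
        return false;               /* must be non-empty and 4K aligned */
    if (base + size < base)
        return false;               /* end address wraps around */
    if (base + size > (1ull << 52))
        return false;               /* not encodable in CPUID.0x12 fields */
    /* Static partitioning: all pages must be available right now. */
    return size / EPC_PAGE_SIZE <= free_epc_pages;
}
```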
>
> 2.2.4 Launch Control Support
>
> To support running multiple domains, each running its own LE signed by a
> different owner, the physical machine's BIOS must leave IA32_SGXLEPUBKEYHASHn
> *unlocked* before handing over to Xen. Xen will trap the domain's writes to
> IA32_SGXLEPUBKEYHASHn, keep the values in the vcpu internally, and write them
> to the physical MSRs when the vcpu is scheduled in. This can guarantee that
> when EINIT runs in the guest, the guest's virtual IA32_SGXLEPUBKEYHASHn have
> been written to the physical MSRs.
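Below is a toy model of the trap-and-cache scheme. A trapped guest write updates the vCPU's copy, and on schedule-in Xen rewrites only those physical MSRs whose value differs from the vCPU's. Two choices here are assumptions of this sketch, not taken from the patches:

- skipping unchanged MSRs
- also propagating a trapped write immediately when the writing vCPU is the one running (otherwise an EINIT issued before the next reschedule could see stale values)

All names are hypothetical, and WRMSR is simulated.

```c
#include <stdint.h>

struct toy_vcpu {
    uint64_t lehash[4];        /* guest's virtual IA32_SGXLEPUBKEYHASH0..3 */
};

struct toy_pcpu {
    uint64_t lehash[4];        /* values currently in the physical MSRs */
    unsigned int msr_writes;   /* number of simulated WRMSRs */
};

/* Stand-in for WRMSR to IA32_SGXLEPUBKEYHASH0 + idx (MSR 0x8c + idx). */
static void toy_wrmsr_lehash(struct toy_pcpu *p, int idx, uint64_t val)
{
    p->lehash[idx] = val;
    p->msr_writes++;
}

/* Make the physical MSRs match v, writing only the ones that differ. */
static void lehash_sync(const struct toy_vcpu *v, struct toy_pcpu *p)
{
    for (int i = 0; i < 4; i++)
        if (p->lehash[i] != v->lehash[i])
            toy_wrmsr_lehash(p, i, v->lehash[i]);
}

/* Context switch: v is being scheduled in on p. */
static void vcpu_sched_in_lehash(const struct toy_vcpu *v, struct toy_pcpu *p)
{
    lehash_sync(v, p);
}

/* Trapped guest WRMSR while v is running on p. */
static void vcpu_write_lehash(struct toy_vcpu *v, struct toy_pcpu *p,
                              int idx, uint64_t val)
{
    v->lehash[idx] = val;
    lehash_sync(v, p);   /* assumption: propagate now, see lead-in */
}
```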
>
> The SGX_LAUNCH_CONTROL_ENABLE bit in the guest's IA32_FEATURE_CONTROL is
> controlled by the newly added 'lewr' XL parameter (see 2.1.1 New 'sgx' XL
> configure file parameter).
>
> If physical IA32_SGXLEPUBKEYHASHn are *locked* in machine's BIOS, then only MSR
> read is allowed from guest, and Xen will inject error for guest's MSR writes.
>
> In addition, if physical IA32_SGXLEPUBKEYHASHn are *locked*, then creating guest
> with 'lehash' parameter or 'lewr' will fail, as in such case Xen is not able to
> update guest's virtual IA32_SGXLEPUBKEYHASHn to physical MSRs.
>
> If physical IA32_SGXLEPUBKEYHASHn are not available
> (CPUID.0x7.0x0:ECX.SGX_LAUNCH_CONTROL is not present), then creating a VM with
> 'lehash' or 'lewr' will also fail. In addition, any guest MSR read or write of
> IA32_SGXLEPUBKEYHASHn is invalid, and Xen will inject an error in such
> cases.
>
> 2.2.5 CPUID Emulation
>
> Most native SGX CPUID info can be exposed to the guest, except the following two parts:
> - Sub-leaf 0x2 needs to report domain's virtual EPC base and size, instead
> of physical EPC info.
> - Sub-leaf 0x1 needs to be consistent with guest's XCR0. For the reason of
> this part please refer to 1.5.2 Interaction with XSAVE.
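The sketch below makes both adjustments on raw register values. It assumes 'res' initially holds the host's CPUID output for the sub-leaf in question:

- Sub-leaf 0x2 is rewritten with the virtual EPC base and size, keeping the host's EPC property bits in ECX[3:0].
- The XFRM part of sub-leaf 0x1 (ECX:EDX, which enumerate SECS.ATTRIBUTES[127:64]) is masked with the guest's XCR0.

The names are hypothetical.

```c
#include <stdint.h>

struct cpuid_leaf {
    uint32_t a, b, c, d;   /* EAX, EBX, ECX, EDX */
};

/* Sub-leaf 0x2: report the guest's virtual EPC section instead of the host's. */
static void vcpuid_sgx_epc(struct cpuid_leaf *res, uint64_t vbase, uint64_t vsize)
{
    uint32_t host_prop = res->c & 0xf;   /* keep the host's EPC property nibble */

    res->a = ((uint32_t)vbase & 0xfffff000u) | 0x1;   /* type 0001b: EPC section */
    res->b = (uint32_t)(vbase >> 32) & 0xfffffu;
    res->c = ((uint32_t)vsize & 0xfffff000u) | host_prop;
    res->d = (uint32_t)(vsize >> 32) & 0xfffffu;
}

/* Sub-leaf 0x1: only advertise XFRM bits that are enabled in guest XCR0. */
static void vcpuid_sgx_attrs(struct cpuid_leaf *res, uint64_t guest_xcr0)
{
    res->c &= (uint32_t)guest_xcr0;
    res->d &= (uint32_t)(guest_xcr0 >> 32);
}
```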
>
> 2.2.6 EPT Violation & ENCLS Trapping Handling
>
> Only needed when Xen supports EPC Oversubscription, as explained above.
>
> 2.2.7 Guest Suspend & Resume
>
> On hardware, EPC is destroyed when the system enters S3-S5, so Xen will
> destroy the guest's EPC when the guest enters S3-S5. Currently Qemu notifies
> Xen of S state changes via HVM_PARAM_ACPI_S_STATE, and Xen will destroy
> EPC there if the S state is S3-S5.
>
> Specifically, Xen will run EREMOVE on each of the guest's EPC pages. The guest
> may not handle EPC suspend & resume correctly, in which case its EPC pages may
> physically still be valid. Xen therefore needs to run EREMOVE to make sure all
> EPC pages become invalid. Otherwise, further EPC operations in the guest may
> fault, because the guest assumes all EPC pages are invalid after it resumes.
>
> For SECS pages, EREMOVE may fail with SGX_CHILD_PRESENT. In that case Xen
> will add the SECS page to a list and call EREMOVE on it again after EREMOVE
> has been called on all other EPC pages. This time EREMOVE on the SECS will
> succeed, as all its children (regular EPC pages) have already been removed.
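The two-pass EREMOVE described above can be modelled as follows. EREMOVE is simulated: a SECS "fails" while any regular page still points at it. Deferred SECS pages are chained into a list through an index field. The function returns how many deferred pages are still valid after the second pass. The toy_ names are hypothetical, and this is not the patch's code.

```c
#include <stddef.h>

enum toy_epc_type { EPC_FREE, EPC_REG, EPC_SECS };
enum toy_ret { TOY_OK, TOY_CHILD_PRESENT };

struct toy_epc_page {
    enum toy_epc_type type;
    int secs;   /* index of the owning SECS page, for EPC_REG pages */
    int next;   /* link in the deferred-SECS list, -1 terminates */
};

/* Simulated EREMOVE: a SECS fails while any regular child is still valid. */
static enum toy_ret toy_eremove(struct toy_epc_page *pg, size_t n, size_t i)
{
    if (pg[i].type == EPC_SECS)
        for (size_t j = 0; j < n; j++)
            if (pg[j].type == EPC_REG && pg[j].secs == (int)i)
                return TOY_CHILD_PRESENT;
    pg[i].type = EPC_FREE;   /* the page becomes invalid */
    return TOY_OK;
}

/* EREMOVE all pages, deferring SECS pages that still had children. */
static size_t toy_destroy_epc(struct toy_epc_page *pg, size_t n)
{
    int deferred = -1;       /* head of the deferred-SECS list */
    size_t left = 0;

    for (size_t i = 0; i < n; i++)
        if (toy_eremove(pg, n, i) == TOY_CHILD_PRESENT) {
            pg[i].next = deferred;
            deferred = (int)i;
        }

    /* All regular pages are gone now, so these should succeed. */
    for (int i = deferred; i != -1; i = pg[i].next)
        if (toy_eremove(pg, n, (size_t)i) != TOY_OK)
            left++;
    return left;
}
```

The same routine would serve 2.2.8 below, where EREMOVE also runs before a destroyed domain's EPC pages are freed.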
>
> 2.2.8 Destroying Domain
>
> Normally Xen just frees all of a domain's EPC pages when it is destroyed. But
> Xen will also run EREMOVE on all of the guest's EPC pages (as described in
> 2.2.7 above) before freeing them. The guest may have shut down unexpectedly
> (e.g. the user killed it), in which case its EPC may still be valid.
>
> 2.3 Additional Point: Live Migration, Snapshot Support (?)
>
> Actually from hardware's point of view, SGX is not migratable. There are two
> reasons:
>
> - SGX key architecture cannot be virtualized.
>
> Some keys are bound to the CPU, for example the Sealing key, EREPORT key,
> etc. If a VM is migrated to another machine, the same enclave will derive
> different keys. Take the Sealing key as an example. An enclave typically
> gets the sealing key via EGETKEY and uses it to *seal* its secrets to the
> outside (e.g. persistent storage) for later use. If the Sealing key changes
> after VM migration, the enclave can never get the sealed secrets back,
> because the old sealing key cannot be recovered.
>
> - There's no ENCLS leaf that evicts an EPC page to normal memory while also
> keeping its content in EPC. Currently, once an EPC page is evicted, it
> becomes invalid. So technically, we are unable to implement live
> migration (or checkpointing, or snapshots) for enclaves.
>
> But with some workarounds, and given some facts about the existing SGX
> drivers, we are technically able to support live migration (or even
> checkpointing and snapshots). This is because:
>
> - Changing keys (which are bound to the CPU) is not a problem in practice.
>
> Take the Sealing key as an example. Losing sealed data is not a problem,
> because the Sealing key is only supposed to encrypt secrets that can be
> provisioned again. The typical work model is: the enclave gets secrets
> provisioned from a remote party (the service provider) and uses the Sealing
> key to store them for later use. When the enclave tries to *unseal* the data
> with a Sealing key that has changed, it finds the data corrupted (the
> integrity check fails), so it asks for the secrets to be provisioned again
> from the remote party. Another reason is that in a data center, VMs
> typically share lots of data, and since the Sealing key is bound to the CPU,
> data encrypted by an enclave on one machine cannot be shared with an enclave
> on another machine anyway. So from the SGX application writer's point of
> view, the Sealing key should be treated as a changeable key, and loss of
> sealed data must be handled regardless. The Sealing key should only be used
> to seal secrets that can easily be provisioned again.
>
> Other keys, such as the EREPORT key and the Provisioning key, which are used
> for local and remote attestation, are not a problem to lose either, for the
> second reason below.
>
> - Sudden loss of EPC is not a problem.
>
> On hardware, EPC is lost if the system goes to S3-S5, is reset, or is shut
> down, and the SGX driver needs to handle loss of EPC due to such power
> transitions. This is done through cooperation between the SGX driver and the
> userspace SGX SDK/apps. However, during live migration there may be no power
> transition in the guest, and hence no EPC loss handled through that path.
> Since we cannot *really* live migrate an enclave (explained above), this
> looks infeasible at first. But in fact both the Linux and Windows SGX
> drivers already support *sudden* loss of EPC (not just EPC loss during power
> transitions), which means both drivers are able to recover if EPC is lost at
> any point at runtime. With this, we are technically able to support live
> migration by simply ignoring EPC. After the VM is migrated, the destination
> VM will only suffer a *sudden* loss of EPC, which both the Windows and Linux
> SGX drivers are already able to handle.
>
> But we must point out that such *sudden* loss of EPC is not behavior that
> real hardware exhibits, and SGX drivers for other OSes (such as FreeBSD) may
> not implement it, so for those guests the destination VM will behave in an
> unexpected manner. But I am not sure we need to care about other OSes.
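The application-side recovery that this argument relies on can be sketched as a simplified C simulation. The names `toy_ecall()`, `toy_enclave_create()`, `call_with_recovery()` and `CALL_ENCLAVE_LOST` are hypothetical, not the SGX SDK's API; the point is only that software which treats "enclave lost" as a recoverable error survives the sudden EPC loss a migrated VM would see.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical status codes; real SDKs define their own. */
enum call_status { CALL_OK, CALL_ENCLAVE_LOST };

struct toy_enclave {
    bool alive;     /* false once the (simulated) EPC has been lost */
    int generation; /* how many times the enclave has been (re)built */
};

/* Hypothetical: (re)create the enclave, e.g. after migration wiped EPC. */
static void toy_enclave_create(struct toy_enclave *e)
{
    e->alive = true;
    e->generation++;
}

/* Hypothetical ECALL: fails with CALL_ENCLAVE_LOST if the EPC is gone. */
static enum call_status toy_ecall(const struct toy_enclave *e, int in, int *out)
{
    if (!e->alive)
        return CALL_ENCLAVE_LOST;
    *out = in * 2;
    return CALL_OK;
}

/* On sudden EPC loss, rebuild the enclave and retry the call. A real
 * application would also re-provision secrets here (not modelled). */
static bool call_with_recovery(struct toy_enclave *e, int in, int *out,
                               int max_retries)
{
    assert(max_retries >= 0);
    for (int attempt = 0; attempt <= max_retries; attempt++) {
        if (toy_ecall(e, in, out) == CALL_OK)
            return true;
        toy_enclave_create(e);
    }
    return false;
}
```

For example, an enclave whose EPC vanished during migration (`alive == false`) is rebuilt once and the retried call then succeeds.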
>
> For the same reason, we are able to support checkpointing for SGX guests
> (Linux and Windows only).
>
> For snapshots, we can support snapshotting SGX guests by either:
>
> - Suspending the guest (S3-S5) before the snapshot. This works for all guests
> but requires the user to manually suspend the guest.
> - Issuing a hypercall to destroy the guest's EPC in save_vm. This only works
> for Linux and Windows guests but doesn't require user intervention.
>
> What are your comments?
>
> 3. Reference
>
> - Intel SGX Homepage
> https://software.intel.com/en-us/sgx
>
> - Linux SGX SDK
> https://01.org/intel-software-guard-extensions
>
> - Linux SGX driver for upstreaming
> https://github.com/01org/linux-sgx
>
> - Intel SGX Specification (SDM Vol 3D)
> https://software.intel.com/sites/default/files/managed/7c/f1/332831-sdm-vol-3d.pdf
>
> - Paper: Intel SGX Explained
> https://eprint.iacr.org/2016/086.pdf
>
> - ISCA 2015 tutorial slides for Intel® SGX - Intel® Software
> https://software.intel.com/sites/default/files/332680-002.pdf
>
> Boqun Feng (5):
> xen: mm: introduce non-scrubbable pages
> xen: mm: manage EPC pages in Xen heaps
> xen: x86/mm: add SGX EPC management
> xen: x86: add functions to populate and destroy EPC for domain
> xen: tools: add SGX to applying MSR policy
>
> Kai Huang (12):
> xen: x86: expose SGX to HVM domain in CPU featureset
> xen: x86: add early stage SGX feature detection
> xen: vmx: detect ENCLS VMEXIT
> xen: x86/mm: introduce ioremap_wb()
> xen: p2m: new 'p2m_epc' type for EPC mapping
> xen: x86: add SGX cpuid handling support.
> xen: vmx: handle SGX related MSRs
> xen: vmx: handle ENCLS VMEXIT
> xen: vmx: handle VMEXIT from SGX enclave
> xen: x86: reset EPC when guest got suspended.
> xen: tools: add new 'sgx' parameter support
> xen: tools: add SGX to applying CPUID policy
>
> docs/misc/xen-command-line.markdown | 8 +
> tools/libxc/Makefile | 1 +
> tools/libxc/include/xc_dom.h | 4 +
> tools/libxc/include/xenctrl.h | 16 +
> tools/libxc/xc_cpuid_x86.c | 68 ++-
> tools/libxc/xc_msr_x86.h | 10 +
> tools/libxc/xc_sgx.c | 82 +++
> tools/libxl/libxl.h | 3 +-
> tools/libxl/libxl_cpuid.c | 15 +-
> tools/libxl/libxl_create.c | 10 +
> tools/libxl/libxl_dom.c | 65 ++-
> tools/libxl/libxl_internal.h | 2 +
> tools/libxl/libxl_nocpuid.c | 4 +-
> tools/libxl/libxl_types.idl | 11 +
> tools/libxl/libxl_x86.c | 12 +
> tools/ocaml/libs/xc/xenctrl_stubs.c | 11 +-
> tools/python/xen/lowlevel/xc/xc.c | 11 +-
> tools/xl/xl_parse.c | 86 +++
> tools/xl/xl_parse.h | 1 +
> xen/arch/x86/Makefile | 1 +
> xen/arch/x86/cpu/common.c | 15 +
> xen/arch/x86/cpuid.c | 62 ++-
> xen/arch/x86/domctl.c | 87 ++-
> xen/arch/x86/hvm/hvm.c | 3 +
> xen/arch/x86/hvm/vmx/vmcs.c | 16 +-
> xen/arch/x86/hvm/vmx/vmx.c | 68 +++
> xen/arch/x86/hvm/vmx/vvmx.c | 11 +
> xen/arch/x86/mm.c | 9 +-
> xen/arch/x86/mm/p2m-ept.c | 3 +
> xen/arch/x86/mm/p2m.c | 41 ++
> xen/arch/x86/msr.c | 6 +-
> xen/arch/x86/sgx.c | 815 ++++++++++++++++++++++++++++
> xen/common/page_alloc.c | 39 +-
> xen/include/asm-arm/mm.h | 9 +
> xen/include/asm-x86/cpufeature.h | 4 +
> xen/include/asm-x86/cpuid.h | 29 +-
> xen/include/asm-x86/hvm/hvm.h | 3 +
> xen/include/asm-x86/hvm/vmx/vmcs.h | 8 +
> xen/include/asm-x86/hvm/vmx/vmx.h | 3 +
> xen/include/asm-x86/mm.h | 19 +-
> xen/include/asm-x86/msr-index.h | 6 +
> xen/include/asm-x86/msr.h | 5 +
> xen/include/asm-x86/p2m.h | 12 +-
> xen/include/asm-x86/sgx.h | 86 +++
> xen/include/public/arch-x86/cpufeatureset.h | 3 +-
> xen/include/xen/mm.h | 2 +
> xen/tools/gen-cpuid.py | 3 +
> 47 files changed, 1757 insertions(+), 31 deletions(-)
> create mode 100644 tools/libxc/xc_sgx.c
> create mode 100644 xen/arch/x86/sgx.c
> create mode 100644 xen/include/asm-x86/sgx.h
>
> --
> 2.15.0
>
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Thread overview: 23+ messages
2017-12-04 0:15 [RFC PATCH v2 00/17] RFC: SGX Virtualization design and draft patches Boqun Feng
2017-12-04 0:15 ` [PATCH v2 01/17] xen: x86: expose SGX to HVM domain in CPU featureset Boqun Feng
2017-12-04 11:13 ` Julien Grall
2017-12-04 13:10 ` Boqun Feng
2017-12-04 14:13 ` Jan Beulich
2017-12-05 0:22 ` Boqun Feng
2017-12-04 0:15 ` [PATCH v2 02/17] xen: x86: add early stage SGX feature detection Boqun Feng
2017-12-04 0:15 ` [PATCH v2 03/17] xen: vmx: detect ENCLS VMEXIT Boqun Feng
2017-12-04 0:15 ` [PATCH v2 04/17] xen: x86/mm: introduce ioremap_wb() Boqun Feng
2017-12-04 0:15 ` [PATCH v2 05/17] xen: p2m: new 'p2m_epc' type for EPC mapping Boqun Feng
2017-12-04 0:15 ` [PATCH v2 06/17] xen: mm: introduce non-scrubbable pages Boqun Feng
2017-12-04 0:15 ` [PATCH v2 07/17] xen: mm: manage EPC pages in Xen heaps Boqun Feng
2017-12-04 0:15 ` [PATCH v2 08/17] xen: x86/mm: add SGX EPC management Boqun Feng
2017-12-04 0:15 ` [PATCH v2 09/17] xen: x86: add functions to populate and destroy EPC for domain Boqun Feng
2017-12-04 0:15 ` [PATCH v2 10/17] xen: x86: add SGX cpuid handling support Boqun Feng
2017-12-04 0:15 ` [PATCH v2 11/17] xen: vmx: handle SGX related MSRs Boqun Feng
2017-12-04 0:15 ` [PATCH v2 12/17] xen: vmx: handle ENCLS VMEXIT Boqun Feng
2017-12-04 0:15 ` [PATCH v2 13/17] xen: vmx: handle VMEXIT from SGX enclave Boqun Feng
2017-12-04 0:15 ` [PATCH v2 14/17] xen: x86: reset EPC when guest got suspended Boqun Feng
2017-12-04 0:15 ` [PATCH v2 15/17] xen: tools: add new 'sgx' parameter support Boqun Feng
2017-12-04 0:15 ` [PATCH v2 16/17] xen: tools: add SGX to applying CPUID policy Boqun Feng
2017-12-04 0:15 ` [PATCH v2 17/17] xen: tools: add SGX to applying MSR policy Boqun Feng
2017-12-25 5:01 ` Boqun Feng [this message]