xen-devel.lists.xenproject.org archive mirror
From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Xen-devel <xen-devel@lists.xen.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	Wei Liu <wei.liu2@citrix.com>, Jan Beulich <JBeulich@suse.com>
Subject: [PATCH for-4.7] docs: Feature Levelling feature document
Date: Tue, 31 May 2016 18:05:45 +0100
Message-ID: <1464714345-26571-1-git-send-email-andrew.cooper3@citrix.com>

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 docs/features/feature-levelling.pandoc | 211 +++++++++++++++++++++++++++++++++
 1 file changed, 211 insertions(+)
 create mode 100644 docs/features/feature-levelling.pandoc

diff --git a/docs/features/feature-levelling.pandoc b/docs/features/feature-levelling.pandoc
new file mode 100644
index 0000000..50bf099
--- /dev/null
+++ b/docs/features/feature-levelling.pandoc
@@ -0,0 +1,211 @@
+% Feature Levelling
+% Draft 1
+
+\clearpage
+
+# Basics
+
+---------------- ----------------------------------------------------
+         Status: **Supported**
+
+   Architecture: x86
+
+      Component: Hypervisor, toolstack, guest
+---------------- ----------------------------------------------------
+
+
+# Overview
+
+On native hardware, a kernel will boot, detect features, typically optimise
+certain codepaths based on the available features, and expect the features to
+remain available until it shuts down.
+
+The same expectation exists for virtual machines, and it is up to the
+hypervisor/toolstack to fulfil this expectation for the lifetime of the
+virtual machine, including across migrate/suspend/resume.
+
+
+# User details
+
+Many factors affect the featureset which a VM may use:
+
+* The CPU itself
+* The BIOS/firmware/microcode version and settings
+* The hypervisor version and command line settings
+* Further restrictions the toolstack chooses to apply
+
+A firmware or software upgrade might reduce the available set of features
+(e.g. Intel disabling TSX in a microcode update for certain Haswell/Broadwell
+processors), as may editing the settings.
+
+It is unsafe to make any assumption about features remaining consistent across
+a host reboot.  Xen recalculates all information from scratch each boot, and
+provides the information for the toolstack to consume.
+
+N.B. `xl`, being inherently a single-host toolstack, doesn't make use of these
+levelling improvements.  These features are of interest to higher level
+toolstacks such as `libvirt` or `XAPI`.
+
+
+# Technical details
+
+The `CPUID` instruction is used by software to query for features.  In the
+virtualisation use case, guest software should query Xen rather than hardware
+directly.  However, `CPUID` is an unprivileged instruction which doesn't
+fault, complicating the task of hiding hardware features from guests.
+
+Important files:
+
+* Hypervisor
+    * `xen/arch/x86/cpu/*.c`
+    * `xen/arch/x86/cpuid.c`
+    * `xen/include/asm-x86/cpuid-autogen.h`
+    * `xen/include/public/arch-x86/cpufeatureset.h`
+    * `xen/tools/gen-cpuid.py`
+* `libxc`
+    * `tools/libxc/xc_cpuid_x86.c`
+
+## Ability to control CPUID
+
+### HVM
+
+HVM guests (using `Intel VT-x` or `AMD SVM`) will unconditionally exit to Xen
+on all `CPUID` instructions, allowing Xen full control over all information.
+
+### PV
+
+The `CPUID` instruction is unprivileged, so executing it in a PV guest will
+not trap, leaving Xen no direct ability to control the information returned.
+
+### Xen Forced Emulation Prefix
+
+Xen-aware PV software can make use of the 'Forced Emulation Prefix'
+
+> `ud2a; .ascii "xen"; cpuid`
+
+which Xen recognises as a deliberate attempt to get the fully-controlled
+`CPUID` information rather than the hardware-reported information.  This only
+works with cooperative software.
+
+### Masking and Override MSRs
+
+AMD CPUs from the `K8` onwards support _Feature Override_ MSRs, which allow
+direct control of the values returned for certain `CPUID` leaves.  These MSRs
+allow any result to be returned, including the ability to advertise features
+which are not actually supported.
+
+Intel CPUs between `Nehalem` and `SandyBridge` have differing numbers of
+_Feature Mask_ MSRs, which are a simple AND-mask applied to all `CPUID`
+instructions requesting specific feature bitmap sets.  The exact MSRs, and
+which feature bitmap sets they affect, are hardware specific.  These MSRs
+allow features to be hidden by clearing the appropriate bit in the mask, but
+do not allow unsupported features to be advertised.
+
+### CPUID Faulting
+
+Intel CPUs from `IvyBridge` onwards have _CPUID Faulting_, which allows Xen to
+cause `CPUID` instructions executed in PV guests to fault.  This allows Xen
+full control over all information, exactly like HVM guests.
+
+## Compile time
+
+As some features depend on other features, it is important that, when
+disabling a certain feature, we disable all features which depend on it.  This
+allows runtime logic to be simplified, by being able to rely on testing only
+the single appropriate feature, rather than the entire feature dependency
+chain.
+
+To speed up runtime calculation of feature dependencies, the dependency chain
+is calculated and flattened by `xen/tools/gen-cpuid.py` to create
+`xen/include/asm-x86/cpuid-autogen.h` from
+`xen/include/public/arch-x86/cpufeatureset.h`, allowing the runtime code to
+disable all dependent features of a specific disabled feature in constant
+time.
+
+## Host boot
+
+As Xen boots, it will enumerate the features it can see.  This is stored as
+the _raw\_featureset_.
+
+Errata checks and command line arguments are then taken into account to reduce
+the _raw\_featureset_ into the _host\_featureset_, which is the set of
+features Xen uses.  On hardware with masking/override MSRs, the default MSR
+values are picked from the _host\_featureset_.
+
+The _host\_featureset_ is then used to calculate the _pv\_featureset_ and
+_hvm\_featureset_, which are the maximum featuresets Xen is willing to offer
+to PV and HVM guests respectively.
+
+In addition, Xen will calculate how much control it has over non-cooperative
+PV `CPUID` instructions, storing this information as _levelling\_caps_.
+
+## Domain creation
+
+The toolstack can query each of the calculated featuresets via
+`XEN_SYSCTL_get_cpu_featureset`, and query for the levelling caps via
+`XEN_SYSCTL_get_cpu_levelling_caps`.
+
+These data should be used by the toolstack when choosing the eventual
+featureset to offer to the guest.
+
+Once a featureset has been chosen, it is set (implicitly or explicitly) via
+`XEN_DOMCTL_set_cpuid`.  Xen will clamp the toolstack's choice to the
+appropriate PV or HVM featureset.  On hardware with masking/override MSRs, the
+guest cpuid policy is reflected in the MSRs, which are context switched with
+other vcpu state.
+
+# Limitations
+
+A guest which ignores the provided feature information and manually probes for
+features will be able to find some of them.  E.g. there is no way of forcibly
+preventing a guest from using 1GB superpages if the hardware supports them.
+
+Some information simply cannot be hidden from guests.  There is no way to
+control certain behaviour such as the hardware MXCSR\_MASK or x87 FPU exception
+behaviour.
+
+
+# Testing
+
+Feature levelling is a very wide area, and is used all over the hypervisor.
+Please ask on xen-devel for help identifying more specific tests which could
+be of use.
+
+
+# Known issues / Areas for improvement
+
+Xen currently has no concept of per-{socket,core,thread} CPUID information.
+As a result, details such as APIC IDs, topology and cache information do not
+match real hardware, and do not match the documented expectations in the Intel
+and AMD system manuals.
+
+The CPU feature flags are the only information which the toolstack has a
+sensible interface for querying and levelling.  Other information in the CPUID
+policy is important and should be levelled (e.g. maxphysaddr).
+
+The CPUID policy is currently regenerated from scratch by the receiving side,
+once memory and vcpu content has been restored.  This means that the receiving
+Xen cannot verify the memory/vcpu content against the CPUID policy, and can
+end up running a guest which will subsequently crash.  The CPUID policy should
+be at the head of the migration stream.
+
+MSRs are another source of features for guests.  There is no general provision
+for controlling the available MSRs.  E.g. 64-bit versions of Windows notice
+changes in IA32\_MISC\_ENABLE, and suffer a BSOD 0x109 (Critical Structure
+Corruption).
+
+
+# References
+
+[Intel Flexmigration](http://www.intel.co.uk/content/dam/www/public/us/en/documents/application-notes/virtualization-technology-flexmigration-application-note.pdf)
+
+[AMD Extended Migration Technology](http://developer.amd.com/wordpress/media/2012/10/43781-3.00-PUB_Live-Virtual-Machine-Migration-on-AMD-processors.pdf)
+
+
+# History
+
+------------------------------------------------------------------------
+Date       Revision Version  Notes
+---------- -------- -------- -------------------------------------------
+2016-05-31 1        Xen 4.7  Document written
+---------- -------- -------- -------------------------------------------
-- 
2.1.4

