From: Haozhong Zhang <haozhong.zhang@intel.com>
To: xen-devel@lists.xen.org
Cc: Haozhong Zhang <haozhong.zhang@intel.com>,
Kevin Tian <kevin.tian@intel.com>, Wei Liu <wei.liu2@citrix.com>,
Jun Nakajima <jun.nakajima@intel.com>,
Liu Jinsong <jinsong.liu@alibaba-inc.com>,
Christoph Egger <chegger@amazon.de>,
Ian Jackson <ian.jackson@eu.citrix.com>,
Jan Beulich <jbeulich@suse.com>,
Andrew Cooper <andrew.cooper3@citrix.com>
Subject: [PATCH 00/19] MCE code cleanup and add LMCE support
Date: Fri, 17 Feb 2017 14:39:17 +0800
Message-ID: <20170217063936.13208-1-haozhong.zhang@intel.com>
This patch series adds LMCE support to Xen, although more than half
of the patches are code cleanup and bug fixes.
LMCE
--------------
Intel Local MCE (LMCE) is a feature of Intel Skylake Server CPUs that
can deliver an MCE to a single processor thread instead of broadcasting
it to all threads, which reduces the software overhead of MCE processing
on machines with a large number of processor threads.
The technical details of LMCE can be found in Intel SDM Vol 3, Chapter
"Machine-Check Architecture" (search for 'LMCE'). Basically,
* The capability of LMCE is indicated by bit 27 (MCG_LMCE_P) of
MSR_IA32_MCG_CAP.
* LMCE is enabled by setting bit 20 (MSR_IA32_FEATURE_CONTROL_LMCE)
of MSR_IA32_FEATURE_CONTROL and bit 0 (MCG_EXT_CTL_LMCE_EN) of
MSR_IA32_MCG_EXT_CTL.
* Software can determine if an MCE is local to the current processor
thread by checking bit 3 (MCG_STATUS_LMCE) of MSR_IA32_MCG_STATUS.
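As a rough illustration of the bits listed above, the following C sketch
decodes raw MSR values. The function names are made up for this example
and are not necessarily what the patches use. Note that the SDM places
the LMCE status flag at bit 3 of IA32_MCG_STATUS, which agrees with the
"MCGSTATUS d" (RIPV | MCIP | LMCE) value in the mcelog output shown in
the test steps below.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions per the Intel SDM; helper names are illustrative only. */
#define MCG_CAP_LMCE_P        (1ULL << 27) /* LMCE capability present        */
#define FEATURE_CONTROL_LMCE  (1ULL << 20) /* opt-in, IA32_FEATURE_CONTROL   */
#define MCG_EXT_CTL_LMCE_EN   (1ULL << 0)  /* enable, IA32_MCG_EXT_CTL       */
#define MCG_STATUS_RIPV       (1ULL << 0)  /* restart IP valid               */
#define MCG_STATUS_EIPV       (1ULL << 1)  /* error IP valid                 */
#define MCG_STATUS_MCIP       (1ULL << 2)  /* machine check in progress      */
#define MCG_STATUS_LMCE       (1ULL << 3)  /* this MCE was delivered locally */

/* Does IA32_MCG_CAP advertise LMCE? */
bool lmce_supported(uint64_t mcg_cap)
{
    return (mcg_cap & MCG_CAP_LMCE_P) != 0;
}

/* LMCE is active only if both the opt-in and the enable bit are set. */
bool lmce_enabled(uint64_t feature_control, uint64_t mcg_ext_ctl)
{
    return (feature_control & FEATURE_CONTROL_LMCE) &&
           (mcg_ext_ctl & MCG_EXT_CTL_LMCE_EN);
}

/* Was the MCE described by IA32_MCG_STATUS local to this thread? */
bool mce_is_local(uint64_t mcg_status)
{
    return (mcg_status & MCG_STATUS_LMCE) != 0;
}
```

For instance, lmce_supported(0x9000c02) and mce_is_local(0xd) both hold
for the MCGCAP and MCGSTATUS values in that mcelog output.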
Patch Overview
--------------
In this patch series,
* Xen enables LMCE by default if it's supported by host CPU unless Xen
boot parameter "mce_fb=1" is present.
* Xen handles LMCE only on the affected CPU and does not need all CPUs
to enter MCE handler.
* A new xl config "lmce=BOOLEAN" is added to control whether LMCE is
supported for the HVM domain. It's disabled by default. If the host
CPU does not support LMCE, this config will be ignored.
* For an HVM domain with LMCE support, if the vcpu affected by a host
LMCE is known, Xen will inject a vLMCE to that vcpu. If the affected
vcpu is unknown or LMCE support is disabled for the HVM domain, an MCE
will be broadcast to all vcpus of that domain as before.
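The policy described in the list above can be summarized by the
following sketch. It is only a model of the decisions, not the code in
the patches: all type and function names here are hypothetical, and a
negative vcpu id is an assumed way of saying "affected vcpu unknown".

```c
#include <stdbool.h>

/* Hypothetical names; this models the policy, not Xen's implementation. */
enum vmce_target {
    VMCE_BROADCAST, /* inject an MCE to every vcpu of the domain */
    VMCE_LOCAL      /* inject a vLMCE to the affected vcpu only  */
};

/* Host side: LMCE is used if the CPU has it and "mce_fb=1" is absent. */
bool host_uses_lmce(bool host_lmce_capable, bool mce_fb)
{
    return host_lmce_capable && !mce_fb;
}

/* Guest side: the xl "lmce" option is ignored without host support. */
bool guest_lmce_available(bool host_lmce_capable, bool cfg_lmce)
{
    return host_lmce_capable && cfg_lmce;
}

/*
 * Pick the injection mode for one domain. affected_vcpu < 0 means the
 * vcpu hit by the host LMCE could not be determined (an assumption of
 * this sketch).
 */
enum vmce_target choose_vmce_target(bool guest_lmce, int affected_vcpu)
{
    if (guest_lmce && affected_vcpu >= 0)
        return VMCE_LOCAL;
    return VMCE_BROADCAST;
}
```

The broadcast path is the fallback in every uncertain case, which keeps
the behaviour of domains without "lmce=1" unchanged.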
This patch series is organized as below:
* Patch 1 - 8 clean up existing MCE code and make one improvement to
debugging messages. No functional change is introduced.
* Patch 9 - 11 fix two bugs in vMCE injection and MCE handling.
* Patch 12 & 13 add host-side LMCE support, including detecting and
enabling the LMCE feature and handling LMCE.
* Patch 14 - 17 add guest-side LMCE support (HVM domains only so far),
including emulating the LMCE feature and injecting LMCE to HVM domains.
* Patch 18 & 19 add xen-mceinj support for injecting LMCE for testing
purposes.
How to Test
--------------
0. This patch series can be tested either on Intel CPU w/ LMCE support
(Skylake-EX), or in the nested virtualization environment on
KVM/QEMU (i.e. Xen as L1 hypervisor).
QEMU 2.7.0 and later with KVM in Linux kernel 4.8 and later can
emulate LMCE and does not require the host hardware to support LMCE.
You can start a nested virtualization environment with LMCE support
with the following command:
qemu-system-x86_64 -enable-kvm \
-smp 32 -cpu kvm64,lmce=on,+vmx \
-hda PATH-TO-DISK-IMG -m 2048
1. Build, install and boot Xen with this patch series. You can include
"mce_verbosity=verbose" in Xen boot parameters to get more detailed
debugging messages about MCE.
2. At boot time, if the Xen boot parameter 'mce_fb=1' is not
present, Xen hypervisor should be able to detect and enable LMCE,
and print the following message:
(XEN) mce_intel.c:737: MCA Capability: BCAST 1 SER 1 CMCI 1 firstbank 0 extended MCE MSR LMCE 1
If 'mce_fb=1' is specified, the last segment of the above message
will be "LMCE 0", which indicates that Xen has not enabled LMCE.
3. Start a HVM domain with the attached config file xl.cfg. In the
config,
* "lmce = 1" enables LMCE for the domain.
* "cpus = [ ... ]" is not strictly necessary, but it helps in the
following steps to figure out which CPU to inject to.
Run Linux kernel 4.2 or later (which has LMCE support) in the
domain.
Run the latest mcelog (https://www.mcelog.org/) in the domain as
well to log MCEs injected in later steps. Depending on the guest
Linux distro, the log can be in /var/log/mcelog, syslog or systemd
journal.
Compile and run the attached claim_page.c in the domain. claim_page.c
allocates a page of memory, prints its base (guest) physical address
and enters an infinite loop. For example, it may print a message like
Physical address of mmaped page = 0x36d4d000
4. Use "xl vcpu-list" to figure out the number of the CPU on which
claim_page is running. For example, xl vcpu-list may output
    Name      ID  VCPU   CPU  State   Time(s)  Affinity (Hard / Soft)
    lmce-l2    1     0     4   r--      546.5  4 / all
    lmce-l2    1     1     5   -b-        8.4  5 / all
    lmce-l2    1     2     6   -b-        6.4  6 / all
    lmce-l2    1     3     7   -b-        6.4  7 / all
As claim_page is the only workload that is actively running in
the domain, CPU 4 (VCPU 0) is very likely the one it's running on.
(You may even want to pin claim_page to a vcpu in guest Linux ... )
5. Use xen-mceinj to inject LMCE:
    ./xen-mceinj -c 4 -d 1 -p 0x36d4d000 -t 0 -l
                                              ^^
                                              inject LMCE
If the injection succeeds, mcelog in the domain should generate a
log like
    Hardware event. This is not a software error.
    MCE 0
    CPU 0 BANK 1 TSC 2218fdf1380
    ^^^^^
    vcpu0 receives MCE
    RIP !INEXACT! 10:ffffffff810591e7
    MISC 86 ADDR 36d4d000
            ^^^^^^^^^^^^^
            error address
    TIME 1487302866 Fri Feb 17 11:41:06 2017
    MCG status:RIPV MCIP LMCE
                         ^^^^
                         LMCE is injected
    MCi status:
    Uncorrected error
    Error enabled
    MCi_MISC register valid
    MCi_ADDR register valid
    SRAO
    MCA: Generic CACHE Level-2 Eviction Error
    STATUS bd2000000000017a MCGSTATUS d
    MCGCAP 9000c02 APICID 1 SOCKETID 0
    CPUID Vendor Intel Family 6 Model 79
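To cross-check a log like the one above by hand, the STATUS value can be
decoded with a few bit tests. The sketch below follows the MCi_STATUS
flag layout in the Intel SDM and assumes the CPU supports software error
recovery (MCG_SER_P), as in the MCGCAP value shown; the enum and
function names are invented for this example.

```c
#include <stdint.h>

/* IA32_MCi_STATUS flag bits per the Intel SDM. */
#define MCi_STATUS_VAL   (1ULL << 63) /* register contents valid       */
#define MCi_STATUS_OVER  (1ULL << 62) /* error overflow                */
#define MCi_STATUS_UC    (1ULL << 61) /* uncorrected error             */
#define MCi_STATUS_EN    (1ULL << 60) /* error reporting enabled       */
#define MCi_STATUS_MISCV (1ULL << 59) /* MCi_MISC valid                */
#define MCi_STATUS_ADDRV (1ULL << 58) /* MCi_ADDR valid                */
#define MCi_STATUS_PCC   (1ULL << 57) /* processor context corrupt     */
#define MCi_STATUS_S     (1ULL << 56) /* signaled via machine check    */
#define MCi_STATUS_AR    (1ULL << 55) /* recovery action required      */

/* Illustrative classification; names are not taken from Xen or mcelog. */
enum mce_class {
    MCE_INVALID, MCE_CORRECTED, MCE_FATAL, MCE_UCNA, MCE_SRAO, MCE_SRAR
};

enum mce_class classify_mci_status(uint64_t status)
{
    if (!(status & MCi_STATUS_VAL))
        return MCE_INVALID;
    if (!(status & MCi_STATUS_UC))
        return MCE_CORRECTED;
    if (status & MCi_STATUS_PCC)
        return MCE_FATAL;            /* context corrupt: not recoverable */
    if (!(status & MCi_STATUS_S))
        return MCE_UCNA;             /* uncorrected, no action required  */
    return (status & MCi_STATUS_AR) ? MCE_SRAR : MCE_SRAO;
}

/* The low 16 bits hold the architectural MCA error code. */
uint16_t mca_error_code(uint64_t status)
{
    return (uint16_t)(status & 0xffff);
}
```

With the STATUS value bd2000000000017a from the log, this yields
MCE_SRAO and the error code 0x017a, matching the "SRAO" line that
mcelog prints.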
Haozhong Zhang (19):
01/19 x86/mce: fix indentation style in xen-mca.h and mce.h
02/19 x86/mce: remove declarations of non-existing functions in mce.h
03/19 x86/mce: remove unnecessary braces around intel_get_extended_msrs()
04/19 xen/mce: remove unused x86_mcinfo_add()
05/19 x86/mce: merge loops to get Intel extended MC MSR
06/19 x86/mce: merge intel_default_mce_dhandler/uhandler()
07/19 x86/vmce: include domain/vcpu id in debug messages
08/19 x86/mce: set mcinfo_comm.type and .size in x86_mcinfo_reserve()
09/19 x86/vmce: fill MSR_IA32_MCG_STATUS on all vcpus in broadcast case
10/19 x86/mce: always write 0 to MSR_IA32_MCG_STATUS on Intel CPU
11/19 tools/xen-mceinj: fix the type of cpu number
12/19 x86/mce: handle LMCE locally
13/19 x86/mce_intel: detect and enable LMCE on Intel host
14/19 x86/vmx: expose LMCE feature via guest MSR_IA32_FEATURE_CONTROL
15/19 x86/vmce: emulate MSR_IA32_MCG_EXT_CTL
16/19 x86/vmce: enable injecting LMCE to guest on Intel host
17/19 x86/vmce, tools/libxl: expose LMCE capability in guest MSR_IA32_MCG_CAP
18/19 xen/mce: add support of vLMCE injection to XEN_MC_inject_v2
19/19 tools/xen-mceinj: support injecting LMCE
docs/man/xl.cfg.pod.5.in | 18 ++++
tools/libxc/include/xenctrl.h | 1 +
tools/libxc/xc_misc.c | 25 ++++++
tools/libxl/libxl_create.c | 1 +
tools/libxl/libxl_dom.c | 2 +
tools/libxl/libxl_types.idl | 1 +
tools/libxl/xl_cmdimpl.c | 3 +
tools/tests/mce-test/tools/xen-mceinj.c | 70 +++++++++++++--
xen/arch/x86/cpu/mcheck/barrier.c | 4 +-
xen/arch/x86/cpu/mcheck/mcaction.c | 20 +++--
xen/arch/x86/cpu/mcheck/mce.c | 87 +++++++++++-------
xen/arch/x86/cpu/mcheck/mce.h | 51 ++++++-----
xen/arch/x86/cpu/mcheck/mce_amd.c | 4 +-
xen/arch/x86/cpu/mcheck/mce_intel.c | 86 ++++++++++--------
xen/arch/x86/cpu/mcheck/vmce.c | 153 ++++++++++++++++++++++++--------
xen/arch/x86/cpu/mcheck/vmce.h | 2 +-
xen/arch/x86/cpu/mcheck/x86_mca.h | 9 +-
xen/arch/x86/hvm/hvm.c | 7 ++
xen/arch/x86/hvm/vmx/vmx.c | 10 +++
xen/arch/x86/hvm/vmx/vvmx.c | 4 -
xen/include/asm-x86/mce.h | 3 +
xen/include/asm-x86/msr-index.h | 2 +
xen/include/public/arch-x86/hvm/save.h | 2 +
xen/include/public/arch-x86/xen-mca.h | 25 +++---
xen/include/public/hvm/params.h | 5 +-
25 files changed, 420 insertions(+), 175 deletions(-)
--
2.10.1
[-- Attachment #2: claim_page.c --]
[-- Type: text/x-csrc, Size: 1998 bytes --]
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>

/* One 64-bit entry of /proc/self/pagemap. */
struct pagemaps {
    unsigned long long pfn:55;
    unsigned long long shift:6;
    unsigned long long rsvd:1;
    unsigned long long swapped:1;
    unsigned long long present:1;
};

/*
 * Translate a virtual address of this process to a (guest) physical
 * address via /proc/self/pagemap.
 * Returns 0 on success or an errno value on failure.
 */
static int translate_va2pa(uint64_t va, uint64_t pagesize, uint64_t *pa)
{
    int rc = 0;
    static const char *pagemap_file = "/proc/self/pagemap";
    struct pagemaps pinfo;
    uint64_t pinfo_size = sizeof(pinfo);
    uint64_t offset = va / pagesize * pinfo_size;
    ssize_t nread;
    int fd = open(pagemap_file, O_RDONLY);

    if (fd == -1) {
        rc = errno;
        fprintf(stderr, "Failed to open %s: %s\n", pagemap_file, strerror(rc));
        goto ret;
    }

    nread = pread(fd, &pinfo, pinfo_size, offset);
    if (nread != (ssize_t) pinfo_size) {
        /* errno is only meaningful if pread() itself failed. */
        rc = nread < 0 ? errno : EIO;
        fprintf(stderr, "Failed to read offset 0x%"PRIx64": %s\n",
                offset, strerror(rc));
        goto ret_close;
    }

    *pa = (pinfo.pfn * pagesize) | (va & (pagesize - 1));

 ret_close:
    close(fd);
 ret:
    return rc;
}

int main(int argc, char **argv)
{
    void *buf;
    uint64_t buf_pa;
    int pagesize = getpagesize();
    int rc = 0;
    volatile int i = 1;

    buf = mmap(NULL, pagesize,
               PROT_READ | PROT_WRITE | PROT_EXEC, MAP_PRIVATE | MAP_ANONYMOUS,
               -1, 0);
    if (buf == MAP_FAILED) {
        rc = errno;
        fprintf(stderr, "Failed to mmap a page: %s\n", strerror(rc));
        goto ret;
    }

    /* Touch the page so that it is actually backed by memory. */
    memset(buf, 0xcc, pagesize);

    /*
     * A zero address typically means the kernel hid the PFN from an
     * unprivileged caller, so this program should be run as root.
     */
    rc = translate_va2pa((uint64_t) buf, pagesize, &buf_pa);
    if (rc || !buf_pa) {
        fprintf(stderr, "Failed to get physical address of mmaped page\n");
        goto ret_unmap;
    }

    fprintf(stderr, "Physical address of mmaped page = 0x%"PRIx64"\n", buf_pa);

    /*
     * Spin forever so that the vcpu stays busy. The volatile read keeps
     * the compiler from optimizing the loop away.
     */
    while (i)
        ;

 ret_unmap:
    munmap(buf, pagesize);
 ret:
    return rc;
}
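For readers adapting claim_page.c, the same translation can be written
with explicit masks instead of a bitfield struct, which avoids relying
on compiler-specific bitfield layout. This is a minimal sketch based on
the pagemap layout documented by the Linux kernel (PFN in bits 0-54,
"present" in bit 63); the macro and function names are assumptions of
this example and are not part of the attachment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Layout of a /proc/<pid>/pagemap entry; names are illustrative. */
#define PAGEMAP_PFN_MASK  ((1ULL << 55) - 1) /* bits 0-54: page frame number */
#define PAGEMAP_PRESENT   (1ULL << 63)       /* bit 63: page present in RAM  */

/* File offset of the pagemap entry that describes virtual address va. */
uint64_t pagemap_offset(uint64_t va, uint64_t pagesize)
{
    return va / pagesize * sizeof(uint64_t);
}

/*
 * Convert a raw pagemap entry to a physical address.
 * Returns false if the page is not present or the PFN is zero, which is
 * what an unprivileged reader gets on recent kernels.
 */
bool pagemap_entry_to_pa(uint64_t entry, uint64_t va, uint64_t pagesize,
                         uint64_t *pa)
{
    uint64_t pfn = entry & PAGEMAP_PFN_MASK;

    if (!(entry & PAGEMAP_PRESENT) || pfn == 0)
        return false;

    *pa = pfn * pagesize + (va & (pagesize - 1));
    return true;
}
```

A caller would pread() eight bytes at pagemap_offset(va, pagesize) from
/proc/self/pagemap and pass the value to pagemap_entry_to_pa(). An entry
with PFN 0x36d4d and a page-aligned va gives 0x36d4d000 for 4 KiB pages,
the address used in the test steps.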
[-- Attachment #3: xl.cfg --]
[-- Type: text/plain, Size: 322 bytes --]
builder = "hvm"
name = "lmce-l2"
vcpus = 4
memory = 1024
disk = [ '/dev/vdb,raw,xvda,rw' ]
cpus = [ "4", "5", "6", "7" ]
lmce = 1
device_model_override = '/usr/local/lib/xen/bin/qemu-system-i386'
device_model_version = 'qemu-xen'
sdl = 0
vnc = 1
vnclisten='0.0.0.0'
stdvga = 1
serial = 'pty'