From: Keir Fraser <keir@xen.org>
To: "Li, Xin" <xin.li@intel.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Subject: Re: [PATCH v2] Enable SMEP CPU feature support for XEN hypervisor
Date: Fri, 03 Jun 2011 20:22:20 +0100 [thread overview]
Message-ID: <CA0EF2FD.2E477%keir@xen.org> (raw)
In-Reply-To: <FC2FB65B4D919844ADE4BE3C2BB739AD5AB9ED09@shsmsx501.ccr.corp.intel.com>
[-- Attachment #1: Type: text/plain, Size: 2896 bytes --]
Hi Xin,
New patch attached and comments in-line.
On 03/06/2011 19:49, "Li, Xin" <xin.li@intel.com> wrote:
>> 1. The initialisation in cpu/common.c is misguided. Features like
>> set_in_cr4(), setup_clear_cpu_cap(), and cpu_has_xxx, are designed to allow
>> the feature initialisation to happen close in code to where the feature is
>> used. Hence I have moved the initialisation into traps.c:trap_init().
>
> It's the right way to move.
>
> But trap_init() is called before Xen does leaf 7.0 detection, thus we also
> need to add leaf 7.0 detection in early_cpu_detect to get
> boot_cpu_data.x86_capability[7] initialized before trap_init().
Oooo good point. Fixed in the attached patch by moving SMEP setup into
setup.c, explicitly and immediately after identify_cpu(). I was actually in
two minds whether to fix this by extending early_cpu_detect(). Overall I
decided I don't want to grow e_c_d() if not absolutely necessary. If the CPU
setup stuff in setup.c grows too big and ugly I'd rather refactor it another
way.
>> 3. I have pushed interpretation of the pf_type enumeration out to the
>> callers of spurious_page_fault(). This is because a SMEP fault while Xen is
>> executing should *crash Xen*. So that's what I do. Further, when it is a
>
> I'm still wondering: is it overkill to kill Xen? Can an evil guest crash Xen?
An evil guest has to penetrate Xen before it can crash it in this way. If
Xen has been subverted, and is lucky enough to notice, what should it do?
The only sane answer is to shoot itself in the head. This kind of issue
would require immediate developer attention to fix whatever Xen bug had been
exploited to trigger SMEP.
> A 32-bit PV guest should be able to make use of SMEP. When the fault is from
> Xen, we crash Xen. When it is from the guest kernel executing user code, we
> can inject the fault into the guest to let it kill the current process. Of
> course, in such cases the guest should be able to do SMEP handling.
Haha, give over on this idea that unexplainable behaviour should make you
only crash the process/guest. If your behaviour is unexplainable, and you
have pretensions of security, you fail-stop.
> We can't consistently handle it for 64-bit and 32-bit guests.
Well yeah, but that ignores my actual question, which was...
"""I wonder whether SMEP should be enabled only for guests (even PV guests)
which detect it via CPUID and proactively enable it via their virtualised
CR4? I mean, it is off in real hardware by default for a reason: backward
compatibility. Furthermore, we only detect spurious page faults for buggy
old PV guests, the rest will get the SMEP fault punted up to them, which
seems odd if they don't understand SMEP."""
I mean, I know we may as well just hide the feature from PV 64b guests
totally. That's obvious. Let's stop talking about PV 64b guests already! The
question is: what to do about PV 32b guests?
-- Keir
> Thanks!
> -Xin
[-- Attachment #2: 00-xen-smep --]
[-- Type: application/octet-stream, Size: 9386 bytes --]
diff -r bcd2476c2e2d xen/arch/x86/setup.c
--- a/xen/arch/x86/setup.c Fri Jun 03 17:27:01 2011 +0100
+++ b/xen/arch/x86/setup.c Fri Jun 03 20:09:02 2011 +0100
@@ -57,6 +57,10 @@ integer_param("maxcpus", max_cpus);
static bool_t __initdata opt_watchdog;
boolean_param("watchdog", opt_watchdog);
+/* smep: Enable/disable Supervisor Mode Execution Protection (default on). */
+static bool_t __initdata disable_smep;
+invbool_param("smep", disable_smep);
+
/* **** Linux config option: propagated to domain0. */
/* "acpi=off": Disables both ACPI table parsing and interpreter. */
/* "acpi=force": Override the disable blacklist. */
@@ -1200,11 +1204,17 @@ void __init __start_xen(unsigned long mb
arch_init_memory();
identify_cpu(&boot_cpu_data);
+
if ( cpu_has_fxsr )
set_in_cr4(X86_CR4_OSFXSR);
if ( cpu_has_xmm )
set_in_cr4(X86_CR4_OSXMMEXCPT);
+ if ( disable_smep )
+ setup_clear_cpu_cap(X86_FEATURE_SMEP);
+ if ( cpu_has_smep )
+ set_in_cr4(X86_CR4_SMEP);
+
local_irq_enable();
#ifdef CONFIG_X86_64
diff -r bcd2476c2e2d xen/arch/x86/traps.c
--- a/xen/arch/x86/traps.c Fri Jun 03 17:27:01 2011 +0100
+++ b/xen/arch/x86/traps.c Fri Jun 03 20:09:02 2011 +0100
@@ -1139,7 +1139,13 @@ static int handle_gdt_ldt_mapping_fault(
(((va) >= HYPERVISOR_VIRT_START))
#endif
-static int __spurious_page_fault(
+enum pf_type {
+ real_fault,
+ smep_fault,
+ spurious_fault
+};
+
+static enum pf_type __page_fault_type(
unsigned long addr, unsigned int error_code)
{
unsigned long mfn, cr3 = read_cr3();
@@ -1151,7 +1157,7 @@ static int __spurious_page_fault(
#endif
l2_pgentry_t l2e, *l2t;
l1_pgentry_t l1e, *l1t;
- unsigned int required_flags, disallowed_flags;
+ unsigned int required_flags, disallowed_flags, page_user;
/*
* We do not take spurious page faults in IRQ handlers as we do not
@@ -1159,11 +1165,11 @@ static int __spurious_page_fault(
* map_domain_page() is not IRQ-safe.
*/
if ( in_irq() )
- return 0;
+ return real_fault;
/* Reserved bit violations are never spurious faults. */
if ( error_code & PFEC_reserved_bit )
- return 0;
+ return real_fault;
required_flags = _PAGE_PRESENT;
if ( error_code & PFEC_write_access )
@@ -1175,6 +1181,8 @@ static int __spurious_page_fault(
if ( error_code & PFEC_insn_fetch )
disallowed_flags |= _PAGE_NX_BIT;
+ page_user = _PAGE_USER;
+
mfn = cr3 >> PAGE_SHIFT;
#if CONFIG_PAGING_LEVELS >= 4
@@ -1184,7 +1192,8 @@ static int __spurious_page_fault(
unmap_domain_page(l4t);
if ( ((l4e_get_flags(l4e) & required_flags) != required_flags) ||
(l4e_get_flags(l4e) & disallowed_flags) )
- return 0;
+ return real_fault;
+ page_user &= l4e_get_flags(l4e);
#endif
#if CONFIG_PAGING_LEVELS >= 3
@@ -1197,13 +1206,14 @@ static int __spurious_page_fault(
unmap_domain_page(l3t);
#if CONFIG_PAGING_LEVELS == 3
if ( !(l3e_get_flags(l3e) & _PAGE_PRESENT) )
- return 0;
+ return real_fault;
#else
if ( ((l3e_get_flags(l3e) & required_flags) != required_flags) ||
(l3e_get_flags(l3e) & disallowed_flags) )
- return 0;
+ return real_fault;
+ page_user &= l3e_get_flags(l3e);
if ( l3e_get_flags(l3e) & _PAGE_PSE )
- return 1;
+ goto leaf;
#endif
#endif
@@ -1213,9 +1223,10 @@ static int __spurious_page_fault(
unmap_domain_page(l2t);
if ( ((l2e_get_flags(l2e) & required_flags) != required_flags) ||
(l2e_get_flags(l2e) & disallowed_flags) )
- return 0;
+ return real_fault;
+ page_user &= l2e_get_flags(l2e);
if ( l2e_get_flags(l2e) & _PAGE_PSE )
- return 1;
+ goto leaf;
l1t = map_domain_page(mfn);
l1e = l1e_read_atomic(&l1t[l1_table_offset(addr)]);
@@ -1223,26 +1234,36 @@ static int __spurious_page_fault(
unmap_domain_page(l1t);
if ( ((l1e_get_flags(l1e) & required_flags) != required_flags) ||
(l1e_get_flags(l1e) & disallowed_flags) )
- return 0;
-
- return 1;
+ return real_fault;
+ page_user &= l1e_get_flags(l1e);
+
+leaf:
+ /*
+ * Supervisor Mode Execution Protection (SMEP):
+ * Disallow supervisor execution from user-accessible mappings
+ */
+ if ( (read_cr4() & X86_CR4_SMEP) && page_user &&
+ ((error_code & (PFEC_insn_fetch|PFEC_user_mode)) == PFEC_insn_fetch) )
+ return smep_fault;
+
+ return spurious_fault;
}
-static int spurious_page_fault(
+static enum pf_type spurious_page_fault(
unsigned long addr, unsigned int error_code)
{
unsigned long flags;
- int is_spurious;
+ enum pf_type pf_type;
/*
* Disabling interrupts prevents TLB flushing, and hence prevents
* page tables from becoming invalid under our feet during the walk.
*/
local_irq_save(flags);
- is_spurious = __spurious_page_fault(addr, error_code);
+ pf_type = __page_fault_type(addr, error_code);
local_irq_restore(flags);
- return is_spurious;
+ return pf_type;
}
static int fixup_page_fault(unsigned long addr, struct cpu_user_regs *regs)
@@ -1317,6 +1338,7 @@ asmlinkage void do_page_fault(struct cpu
{
unsigned long addr, fixup;
unsigned int error_code;
+ enum pf_type pf_type;
addr = read_cr2();
@@ -1332,7 +1354,9 @@ asmlinkage void do_page_fault(struct cpu
if ( unlikely(!guest_mode(regs)) )
{
- if ( spurious_page_fault(addr, error_code) )
+ pf_type = spurious_page_fault(addr, error_code);
+ BUG_ON(pf_type == smep_fault);
+ if ( pf_type != real_fault )
return;
if ( likely((fixup = search_exception_table(regs->eip)) != 0) )
@@ -1354,9 +1378,17 @@ asmlinkage void do_page_fault(struct cpu
error_code, _p(addr));
}
- if ( unlikely(current->domain->arch.suppress_spurious_page_faults
- && spurious_page_fault(addr, error_code)) )
- return;
+ if ( unlikely(current->domain->arch.suppress_spurious_page_faults) )
+ {
+ pf_type = spurious_page_fault(addr, error_code);
+ if ( pf_type == smep_fault )
+ {
+ gdprintk(XENLOG_ERR, "Fatal SMEP fault\n");
+ domain_crash(current->domain);
+ }
+ if ( pf_type != real_fault )
+ return;
+ }
propagate_page_fault(addr, regs->error_code);
}
diff -r bcd2476c2e2d xen/include/asm-x86/cpufeature.h
--- a/xen/include/asm-x86/cpufeature.h Fri Jun 03 17:27:01 2011 +0100
+++ b/xen/include/asm-x86/cpufeature.h Fri Jun 03 20:09:02 2011 +0100
@@ -141,8 +141,9 @@
#define X86_FEATURE_TBM (6*32+21) /* trailing bit manipulations */
#define X86_FEATURE_TOPOEXT (6*32+22) /* topology extensions CPUID leafs */
-/* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word 9 */
+/* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word 7 */
#define X86_FEATURE_FSGSBASE (7*32+ 0) /* {RD,WR}{FS,GS}BASE instructions */
+#define X86_FEATURE_SMEP (7*32+ 7) /* Supervisor Mode Execution Protection */
#define cpu_has(c, bit) test_bit(bit, (c)->x86_capability)
#define boot_cpu_has(bit) test_bit(bit, boot_cpu_data.x86_capability)
@@ -202,6 +203,8 @@
#define cpu_has_fsgsbase boot_cpu_has(X86_FEATURE_FSGSBASE)
#endif
+#define cpu_has_smep boot_cpu_has(X86_FEATURE_SMEP)
+
#define cpu_has_ffxsr ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) \
&& boot_cpu_has(X86_FEATURE_FFXSR))
diff -r bcd2476c2e2d xen/include/asm-x86/domain.h
--- a/xen/include/asm-x86/domain.h Fri Jun 03 17:27:01 2011 +0100
+++ b/xen/include/asm-x86/domain.h Fri Jun 03 20:09:02 2011 +0100
@@ -527,12 +527,14 @@ unsigned long pv_guest_cr4_fixup(const s
/* Convert between guest-visible and real CR4 values. */
#define pv_guest_cr4_to_real_cr4(v) \
(((v)->arch.pv_vcpu.ctrlreg[4] \
- | (mmu_cr4_features & (X86_CR4_PGE | X86_CR4_PSE)) \
- | ((v)->domain->arch.vtsc ? X86_CR4_TSD : 0) \
- | ((xsave_enabled(v))? X86_CR4_OSXSAVE : 0)) \
- & ~X86_CR4_DE)
-#define real_cr4_to_pv_guest_cr4(c) \
- ((c) & ~(X86_CR4_PGE | X86_CR4_PSE | X86_CR4_TSD | X86_CR4_OSXSAVE))
+ | (mmu_cr4_features \
+ & (X86_CR4_PGE | X86_CR4_PSE | X86_CR4_SMEP)) \
+ | ((v)->domain->arch.vtsc ? X86_CR4_TSD : 0) \
+ | ((xsave_enabled(v))? X86_CR4_OSXSAVE : 0)) \
+ & ~X86_CR4_DE)
+#define real_cr4_to_pv_guest_cr4(c) \
+ ((c) & ~(X86_CR4_PGE | X86_CR4_PSE | X86_CR4_TSD \
+ | X86_CR4_OSXSAVE | X86_CR4_SMEP))
void domain_cpuid(struct domain *d,
unsigned int input,
diff -r bcd2476c2e2d xen/include/asm-x86/processor.h
--- a/xen/include/asm-x86/processor.h Fri Jun 03 17:27:01 2011 +0100
+++ b/xen/include/asm-x86/processor.h Fri Jun 03 20:09:02 2011 +0100
@@ -85,6 +85,7 @@
#define X86_CR4_SMXE 0x4000 /* enable SMX */
#define X86_CR4_FSGSBASE 0x10000 /* enable {rd,wr}{fs,gs}base */
#define X86_CR4_OSXSAVE 0x40000 /* enable XSAVE/XRSTOR */
+#define X86_CR4_SMEP 0x100000 /* enable SMEP */
/*
* Trap/fault mnemonics.
[-- Attachment #3: Type: text/plain, Size: 138 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel