xen-devel.lists.xenproject.org archive mirror
From: Juergen Gross <jgross@suse.com>
To: xen-devel@lists.xenproject.org
Cc: Juergen Gross <jgross@suse.com>,
	andrew.cooper3@citrix.com, tim@xen.org, jbeulich@suse.com
Subject: [PATCH v9 9/9] xen/x86: use PCID feature
Date: Thu, 26 Apr 2018 13:33:18 +0200
Message-ID: <20180426113318.21838-10-jgross@suse.com>
In-Reply-To: <20180426113318.21838-1-jgross@suse.com>

Avoid flushing the complete TLB when switching %cr3 for mitigation of
Meltdown by using the PCID feature if available.

We are using 4 PCID values for a 64 bit pv domain subject to XPTI and
2 values for the non-XPTI case:

- guest active and in kernel mode
- guest active and in user mode
- hypervisor active and guest in user mode (XPTI only)
- hypervisor active and guest in kernel mode (XPTI only)

We use PCID only if PCID _and_ INVPCID are supported. With PCID in use
we disable global pages in cr4. A command line parameter controls in
which cases PCID is being used.

As the non-XPTI case has been shown not to perform better with PCID on
at least some machines, the default is to use PCID only for domains
subject to XPTI.

With PCID enabled we always disable global pages. This avoids having to
either flush the complete TLB or do a cycle through all PCID values
when invalidating a single global page.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
V6.1:
- address some minor comments (Jan Beulich)

V6:
- split off pv_guest_cr4_to_real_cr4() conversion to function into new
  patch (Andrew Cooper)
- changed some comments (Jan Beulich, Andrew Cooper)

V5:
- use X86_CR3_ADDR_MASK instead of ~X86_CR3_PCID_MASK (Jan Beulich)
- add some const qualifiers (Jan Beulich)
- mask X86_CR3_ADDR_MASK with PADDR_MASK (Jan Beulich)
- add flushing the TLB from old PCID related entries in write_cr3_cr4()
  (Jan Beulich)

V4:
- add cr3 mask for page table address and use that in dbg_pv_va2mfn()
  (Jan Beulich)
- use invpcid_flush_all_nonglobals() instead of invpcid_flush_all()
  (Jan Beulich)
- use PCIDs 0/1 when running in Xen or without XPTI, 2/3 with XPTI in
  guest (Jan Beulich)
- ASSERT cr4.pge and cr4.pcide are never active at the same time
  (Jan Beulich)
- make pv_guest_cr4_to_real_cr4() a real function

V3:
- support PCID for non-XPTI case, too
- add command line parameter for controlling usage of PCID
- check PCID active by using cr4.pcide (Jan Beulich)
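
Illustration (not part of the patch): the PCID assignment described in
the commit message can be sketched as standalone C. The PCID_PV_*
values are copied from the patch. Everything prefixed sketch_/SKETCH_
is a hypothetical name for this example only. The cr3 layout (PCID in
bits 0-11, no-flush hint in bit 63) is assumed from the architecture,
and the address masking is simplified compared to the cr3 helpers
added earlier in the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the patch (xen/include/asm-x86/pv/domain.h). */
#define PCID_PV_PRIV 0x0000u
#define PCID_PV_USER 0x0001u
#define PCID_PV_XPTI 0x0002u    /* ORed onto the two values above. */

/* Assumed architectural cr3 layout; names are illustrative only. */
#define SKETCH_CR3_PCID_MASK 0x0fffull
#define SKETCH_CR3_NOFLUSH   (1ull << 63)

/* Mirrors the shape of get_pcid_bits(), with the vcpu state passed in. */
static inline uint64_t sketch_pcid_bits(bool kernel_mode, bool is_xpti)
{
    return SKETCH_CR3_NOFLUSH | (is_xpti ? PCID_PV_XPTI : 0) |
           (kernel_mode ? PCID_PV_PRIV : PCID_PV_USER);
}

/* Combine a page-aligned root table address with the PCID bits. */
static inline uint64_t sketch_make_cr3(uint64_t root_pa, bool kernel_mode,
                                       bool is_xpti)
{
    assert((root_pa & SKETCH_CR3_PCID_MASK) == 0);
    return root_pa | sketch_pcid_bits(kernel_mode, is_xpti);
}

/* Extract the PCID from a cr3 value. */
static inline unsigned int sketch_cr3_pcid(uint64_t cr3)
{
    return (unsigned int)(cr3 & SKETCH_CR3_PCID_MASK);
}

/* Strip the no-flush hint and the PCID to recover the table address. */
static inline uint64_t sketch_cr3_pa(uint64_t cr3)
{
    return cr3 & ~(SKETCH_CR3_NOFLUSH | SKETCH_CR3_PCID_MASK);
}
```

With these helpers, a root table at 0x12345000 for a guest in user mode
with the XPTI bit set yields PCID 3, and sketch_cr3_pa() returns
0x12345000 again. This is why consumers of v->arch.cr3 have to mask the
value before using it as a page table address.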
---
 docs/misc/xen-command-line.markdown | 14 +++++++
 xen/arch/x86/flushtlb.c             | 47 ++++++++++++++++++++-
 xen/arch/x86/mm.c                   | 16 +++++++-
 xen/arch/x86/pv/dom0_build.c        |  3 +-
 xen/arch/x86/pv/domain.c            | 81 ++++++++++++++++++++++++++++++++++++-
 xen/include/asm-x86/domain.h        |  4 +-
 xen/include/asm-x86/processor.h     |  3 ++
 xen/include/asm-x86/pv/domain.h     | 31 ++++++++++++++
 8 files changed, 192 insertions(+), 7 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index f8264d8abb..e38230d09f 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1451,6 +1451,20 @@ All numbers specified must be hexadecimal ones.
 
 This option can be specified more than once (up to 8 times at present).
 
+### pcid (x86)
+> `= <boolean> | xpti=<bool>`
+
+> Default: `xpti`
+
+> Can be modified at runtime (change takes effect only for domains created
+  afterwards)
+
+If available, control usage of the PCID feature of the processor for
+64-bit pv-domains. PCID can be used either for no domain at all (`false`),
+for all of them (`true`), only for those subject to XPTI (`xpti`) or for
+those not subject to XPTI (`no-xpti`). The feature is used only in case
+INVPCID is supported and not disabled via `invpcid=false`.
+
 ### ple\_gap
 > `= <integer>`
 
diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c
index 59884e7989..797c5d52cc 100644
--- a/xen/arch/x86/flushtlb.c
+++ b/xen/arch/x86/flushtlb.c
@@ -13,6 +13,7 @@
 #include <asm/flushtlb.h>
 #include <asm/invpcid.h>
 #include <asm/page.h>
+#include <asm/pv/domain.h>
 
 /* Debug builds: Wrap frequently to stress-test the wrap logic. */
 #ifdef NDEBUG
@@ -94,6 +95,7 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long cr4)
 {
     unsigned long flags, old_cr4;
     u32 t;
+    unsigned long old_pcid = cr3_pcid(read_cr3());
 
     /* This non-reentrant function is sometimes called in interrupt context. */
     local_irq_save(flags);
@@ -103,14 +105,34 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long cr4)
     old_cr4 = read_cr4();
     if ( old_cr4 & X86_CR4_PGE )
     {
+        /*
+         * X86_CR4_PGE set means PCID is inactive.
+         * We have to purge the TLB via flipping cr4.pge.
+         */
         old_cr4 = cr4 & ~X86_CR4_PGE;
         write_cr4(old_cr4);
     }
+    else if ( use_invpcid )
+        /*
+         * Flushing the TLB via INVPCID is necessary only in case PCIDs are
+         * in use, which is true only with INVPCID being available.
+         * Without PCID usage the following write_cr3() will purge the TLB
+         * (we are in the cr4.pge off path) of all entries.
+         * Using invpcid_flush_all_nonglobals() seems to be faster than
+         * invpcid_flush_all(), so use that.
+         */
+        invpcid_flush_all_nonglobals();
 
     write_cr3(cr3);
 
     if ( old_cr4 != cr4 )
         write_cr4(cr4);
+    else if ( old_pcid != cr3_pcid(cr3) )
+        /*
+         * Make sure no TLB entries related to the old PCID created between
+         * flushing the TLB and writing the new %cr3 value remain in the TLB.
+         */
+        invpcid_flush_single_context(old_pcid);
 
     post_flush(t);
 
@@ -140,8 +162,29 @@ unsigned int flush_area_local(const void *va, unsigned int flags)
              * are various errata surrounding INVLPG usage on superpages, and
              * a full flush is in any case not *that* expensive.
              */
-            asm volatile ( "invlpg %0"
-                           : : "m" (*(const char *)(va)) : "memory" );
+            if ( read_cr4() & X86_CR4_PCIDE )
+            {
+                unsigned long addr = (unsigned long)va;
+
+                /*
+                 * Flush the addresses for all potential address spaces.
+                 * We can't check the current domain for being subject to
+                 * XPTI as current might be the idle vcpu while we still have
+                 * some XPTI domain TLB entries.
+                 * Using invpcid is okay here, as with PCID enabled we always
+                 * have global pages disabled.
+                 */
+                invpcid_flush_one(PCID_PV_PRIV, addr);
+                invpcid_flush_one(PCID_PV_USER, addr);
+                if ( !cpu_has_no_xpti )
+                {
+                    invpcid_flush_one(PCID_PV_PRIV | PCID_PV_XPTI, addr);
+                    invpcid_flush_one(PCID_PV_USER | PCID_PV_XPTI, addr);
+                }
+            }
+            else
+                asm volatile ( "invlpg %0"
+                               : : "m" (*(const char *)(va)) : "memory" );
         }
         else
             do_tlb_flush();
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 4f878c8dd1..f73f43edc9 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -126,6 +126,7 @@
 #include <asm/hvm/ioreq.h>
 
 #include <asm/hvm/grant_table.h>
+#include <asm/pv/domain.h>
 #include <asm/pv/grant_table.h>
 #include <asm/pv/mm.h>
 
@@ -497,7 +498,11 @@ void free_shared_domheap_page(struct page_info *page)
 
 void make_cr3(struct vcpu *v, mfn_t mfn)
 {
+    struct domain *d = v->domain;
+
     v->arch.cr3 = mfn_x(mfn) << PAGE_SHIFT;
+    if ( is_pv_domain(d) && d->arch.pv_domain.pcid )
+        v->arch.cr3 |= get_pcid_bits(v, false);
 }
 
 unsigned long pv_guest_cr4_to_real_cr4(const struct vcpu *v)
@@ -508,7 +513,12 @@ unsigned long pv_guest_cr4_to_real_cr4(const struct vcpu *v)
     cr4 = v->arch.pv_vcpu.ctrlreg[4] & ~X86_CR4_DE;
     cr4 |= mmu_cr4_features & (X86_CR4_PSE | X86_CR4_SMEP | X86_CR4_SMAP |
                                X86_CR4_OSXSAVE | X86_CR4_FSGSBASE);
-    cr4 |= d->arch.pv_domain.xpti  ? 0 : X86_CR4_PGE;
+
+    if ( d->arch.pv_domain.pcid )
+        cr4 |= X86_CR4_PCIDE;
+    else if ( !d->arch.pv_domain.xpti )
+        cr4 |= X86_CR4_PGE;
+
     cr4 |= d->arch.vtsc ? X86_CR4_TSD : 0;
 
     return cr4;
@@ -521,12 +531,14 @@ void write_ptbase(struct vcpu *v)
 
     new_cr4 = (is_pv_vcpu(v) && !is_idle_vcpu(v))
               ? pv_guest_cr4_to_real_cr4(v)
-              : ((read_cr4() & ~X86_CR4_TSD) | X86_CR4_PGE);
+              : ((read_cr4() & ~(X86_CR4_PCIDE | X86_CR4_TSD)) | X86_CR4_PGE);
 
     if ( is_pv_vcpu(v) && v->domain->arch.pv_domain.xpti )
     {
         cpu_info->root_pgt_changed = true;
         cpu_info->pv_cr3 = __pa(this_cpu(root_pgt));
+        if ( new_cr4 & X86_CR4_PCIDE )
+            cpu_info->pv_cr3 |= get_pcid_bits(v, true);
         switch_cr3_cr4(v->arch.cr3, new_cr4);
     }
     else
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index 4465a059a8..34c77bcbe4 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -388,6 +388,7 @@ int __init dom0_construct_pv(struct domain *d,
     {
         d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 1;
         d->arch.pv_domain.xpti = false;
+        d->arch.pv_domain.pcid = false;
         v->vcpu_info = (void *)&d->shared_info->compat.vcpu_info[0];
         if ( setup_compat_arg_xlat(v) != 0 )
             BUG();
@@ -717,7 +718,7 @@ int __init dom0_construct_pv(struct domain *d,
         update_cr3(v);
 
     /* We run on dom0's page tables for the final part of the build process. */
-    switch_cr3_cr4(v->arch.cr3, read_cr4());
+    switch_cr3_cr4(cr3_pa(v->arch.cr3), read_cr4());
     mapcache_override_current(v);
 
     /* Copy the OS image and free temporary buffer. */
diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c
index ce1a1a9d35..a4f0bd239d 100644
--- a/xen/arch/x86/pv/domain.c
+++ b/xen/arch/x86/pv/domain.c
@@ -9,9 +9,54 @@
 #include <xen/lib.h>
 #include <xen/sched.h>
 
+#include <asm/cpufeature.h>
+#include <asm/invpcid.h>
 #include <asm/spec_ctrl.h>
 #include <asm/pv/domain.h>
 
+static __read_mostly enum {
+    PCID_OFF,
+    PCID_ALL,
+    PCID_XPTI,
+    PCID_NOXPTI
+} opt_pcid = PCID_XPTI;
+
+static __init int parse_pcid(const char *s)
+{
+    int rc = 0;
+
+    switch ( parse_bool(s, NULL) )
+    {
+    case 0:
+        opt_pcid = PCID_OFF;
+        break;
+
+    case 1:
+        opt_pcid = PCID_ALL;
+        break;
+
+    default:
+        switch ( parse_boolean("xpti", s, NULL) )
+        {
+        case 0:
+            opt_pcid = PCID_NOXPTI;
+            break;
+
+        case 1:
+            opt_pcid = PCID_XPTI;
+            break;
+
+        default:
+            rc = -EINVAL;
+            break;
+        }
+        break;
+    }
+
+    return rc;
+}
+custom_runtime_param("pcid", parse_pcid);
+
 static void noreturn continue_nonidle_domain(struct vcpu *v)
 {
     check_wakeup_from_wait();
@@ -77,6 +122,7 @@ int switch_compat(struct domain *d)
     d->arch.x87_fip_width = 4;
 
     d->arch.pv_domain.xpti = false;
+    d->arch.pv_domain.pcid = false;
 
     return 0;
 
@@ -211,6 +257,29 @@ int pv_domain_initialise(struct domain *d)
     d->arch.pv_domain.xpti = opt_xpti & (is_hardware_domain(d)
                                          ? OPT_XPTI_DOM0 : OPT_XPTI_DOMU);
 
+    if ( !is_pv_32bit_domain(d) && use_invpcid && cpu_has_pcid )
+        switch ( opt_pcid )
+        {
+        case PCID_OFF:
+            break;
+
+        case PCID_ALL:
+            d->arch.pv_domain.pcid = true;
+            break;
+
+        case PCID_XPTI:
+            d->arch.pv_domain.pcid = d->arch.pv_domain.xpti;
+            break;
+
+        case PCID_NOXPTI:
+            d->arch.pv_domain.pcid = !d->arch.pv_domain.xpti;
+            break;
+
+        default:
+            ASSERT_UNREACHABLE();
+            break;
+        }
+
     return 0;
 
   fail:
@@ -221,9 +290,19 @@ int pv_domain_initialise(struct domain *d)
 
 static void _toggle_guest_pt(struct vcpu *v)
 {
+    const struct domain *d = v->domain;
+
     v->arch.flags ^= TF_kernel_mode;
     update_cr3(v);
-    get_cpu_info()->root_pgt_changed = true;
+    if ( d->arch.pv_domain.xpti )
+    {
+        struct cpu_info *cpu_info = get_cpu_info();
+
+        cpu_info->root_pgt_changed = true;
+        cpu_info->pv_cr3 = __pa(this_cpu(root_pgt)) |
+                           (d->arch.pv_domain.pcid
+                            ? get_pcid_bits(v, true) : 0);
+    }
 
     /* Don't flush user global mappings from the TLB. Don't tick TLB clock. */
     write_cr3(v->arch.cr3);
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 9627058cd0..8b66096e7f 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -255,6 +255,8 @@ struct pv_domain
 
     /* XPTI active? */
     bool xpti;
+    /* Use PCID feature? */
+    bool pcid;
 
     /* map_domain_page() mapping cache. */
     struct mapcache_domain mapcache;
@@ -620,7 +622,7 @@ unsigned long pv_guest_cr4_to_real_cr4(const struct vcpu *v);
 #define real_cr4_to_pv_guest_cr4(c)                         \
     ((c) & ~(X86_CR4_PGE | X86_CR4_PSE | X86_CR4_TSD |      \
              X86_CR4_OSXSAVE | X86_CR4_SMEP |               \
-             X86_CR4_FSGSBASE | X86_CR4_SMAP))
+             X86_CR4_FSGSBASE | X86_CR4_SMAP | X86_CR4_PCIDE))
 
 #define domain_max_vcpus(d) (is_hvm_domain(d) ? HVM_MAX_VCPUS : MAX_VIRT_CPUS)
 
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index 36628459dc..c4aa385a6f 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -305,6 +305,9 @@ static inline unsigned long read_cr4(void)
 
 static inline void write_cr4(unsigned long val)
 {
+    /* No global pages in case of PCIDs enabled! */
+    ASSERT(!(val & X86_CR4_PGE) || !(val & X86_CR4_PCIDE));
+
     get_cpu_info()->cr4 = val;
     asm volatile ( "mov %0,%%cr4" : : "r" (val) );
 }
diff --git a/xen/include/asm-x86/pv/domain.h b/xen/include/asm-x86/pv/domain.h
index 5e34176939..4fea76444a 100644
--- a/xen/include/asm-x86/pv/domain.h
+++ b/xen/include/asm-x86/pv/domain.h
@@ -21,6 +21,37 @@
 #ifndef __X86_PV_DOMAIN_H__
 #define __X86_PV_DOMAIN_H__
 
+/*
+ * PCID values for the address spaces of 64-bit pv domains:
+ *
+ * We are using 4 PCID values for a 64 bit pv domain subject to XPTI:
+ * - hypervisor active and guest in kernel mode   PCID 0
+ * - hypervisor active and guest in user mode     PCID 1
+ * - guest active and in kernel mode              PCID 2
+ * - guest active and in user mode                PCID 3
+ *
+ * Without XPTI only 2 values are used:
+ * - guest in kernel mode                         PCID 0
+ * - guest in user mode                           PCID 1
+ */
+
+#define PCID_PV_PRIV      0x0000    /* Used for other domains, too. */
+#define PCID_PV_USER      0x0001
+#define PCID_PV_XPTI      0x0002    /* To be ORed to above values. */
+
+/*
+ * Return additional PCID specific cr3 bits.
+ *
+ * Note that X86_CR3_NOFLUSH will not be readable in cr3. Anyone consuming
+ * v->arch.cr3 should mask away X86_CR3_NOFLUSH and X86_CR3_PCID_MASK in case
+ * the value is used to address the root page table.
+ */
+static inline unsigned long get_pcid_bits(const struct vcpu *v, bool is_xpti)
+{
+    return X86_CR3_NOFLUSH | (is_xpti ? PCID_PV_XPTI : 0) |
+           ((v->arch.flags & TF_kernel_mode) ? PCID_PV_PRIV : PCID_PV_USER);
+}
+
 #ifdef CONFIG_PV
 
 void pv_vcpu_destroy(struct vcpu *v);
-- 
2.13.6
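
Illustration (not part of the patch): the per-domain decision made in
pv_domain_initialise() and the resulting cr4 bits chosen in
pv_guest_cr4_to_real_cr4() can be summarised in a small standalone
sketch. The enum values match the patch. struct sketch_pcid_inputs, the
sketch_ functions and the SKETCH_CR4_* names are hypothetical, and the
cr4 bit positions (PGE bit 7, PCIDE bit 17) are assumed from the
architecture rather than taken from the Xen headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Same modes as opt_pcid in the patch. */
enum pcid_mode { PCID_OFF, PCID_ALL, PCID_XPTI, PCID_NOXPTI };

/* Hypothetical bundle of the conditions the patch checks. */
struct sketch_pcid_inputs {
    bool is_32bit_pv;   /* 32-bit pv domains never use PCID */
    bool hw_pcid;       /* stands in for cpu_has_pcid */
    bool hw_invpcid;    /* stands in for use_invpcid */
    bool xpti;          /* domain is subject to XPTI */
};

/* Decide whether a domain gets PCID, following the switch in the patch. */
static bool sketch_domain_uses_pcid(enum pcid_mode mode,
                                    const struct sketch_pcid_inputs *in)
{
    if ( in->is_32bit_pv || !in->hw_pcid || !in->hw_invpcid )
        return false;

    switch ( mode )
    {
    case PCID_ALL:    return true;
    case PCID_XPTI:   return in->xpti;
    case PCID_NOXPTI: return !in->xpti;
    case PCID_OFF:
    default:          return false;
    }
}

/* Assumed architectural cr4 bit positions; names are illustrative. */
#define SKETCH_CR4_PGE   (1ul << 7)
#define SKETCH_CR4_PCIDE (1ul << 17)

/* PCID wins over global pages; the two are never enabled together. */
static unsigned long sketch_cr4_paging_bits(bool pcid, bool xpti)
{
    unsigned long cr4 = 0;

    if ( pcid )
        cr4 |= SKETCH_CR4_PCIDE;
    else if ( !xpti )
        cr4 |= SKETCH_CR4_PGE;

    /* Same invariant as the ASSERT() added to write_cr4(). */
    assert(!(cr4 & SKETCH_CR4_PGE) || !(cr4 & SKETCH_CR4_PCIDE));

    return cr4;
}
```

With the default mode (PCID_XPTI), a 64-bit pv domain subject to XPTI
on hardware with both PCID and INVPCID gets PCID and therefore runs
with cr4.pcide set and cr4.pge clear. A domain with XPTI but without
PCID gets neither bit.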


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
