xen-devel.lists.xenproject.org archive mirror
* [PATCHv3 0/2] x86/ept: reduce translation invalidation impact
@ 2015-12-03 16:42 David Vrabel
  2015-12-03 16:42 ` [PATCHv3 1/2] x86/ept: invalidate guest physical mappings on VMENTER David Vrabel
  2015-12-03 16:42 ` [PATCHv3 2/2] x86/ept: defer the invalidation until the p2m lock is released David Vrabel
  0 siblings, 2 replies; 9+ messages in thread
From: David Vrabel @ 2015-12-03 16:42 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Jun Nakajima, George Dunlap, Andrew Cooper,
	Tim Deegan, David Vrabel, Jan Beulich

This series improves the performance of EPT by further reducing the
impact of translation invalidations (ept_sync_domain()), by deferring
invalidations until the p2m write lock is released.

Prior to this change a 16 VCPU guest could not be successfully
migrated on an (admittedly slow) 160 PCPU box because the p2m write
lock was held for such extended periods of time.  This starved the
read lock needed (by the toolstack) to map the domain's memory,
triggering the watchdog.

After this change a 64 VCPU guest could be successfully migrated.

ept_sync_domain() is very expensive because:

a) it uses on_selected_cpus() and the IPI cost can be particularly
   high for a multi-socket machine.

b) on_selected_cpus() is serialized by its own spin lock.

On this particular box, ept_sync_domain() could take ~3-5 ms.

Simply using a fair rw lock was not sufficient to resolve this (but it
was an improvement) as the cost of the ept_sync_domain() calls was
still delaying the read locks enough for the watchdog to trigger (the
toolstack maps a batch of 1024 GFNs at a time, which means trying to
acquire the p2m read lock 1024 times).
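
In outline, the deferral added by patch 2 looks like the sketch below
(simplified for illustration; the authoritative version is the
mm-locks.h hunk in that patch, which also introduces the defer_flush,
need_flush and flush_and_unlock fields used here):

/* The p2m write lock now also counts how many writers want any
 * invalidation deferred... */
#define p2m_lock(p)                             \
    do {                                        \
        mm_write_lock(p2m, &(p)->lock);         \
        (p)->defer_flush++;                     \
    } while (0)

/* ...and an invalidation requested by ept_sync_domain() while the
 * lock was held is only issued once the last writer lets go;
 * flush_and_unlock() drops the lock, then does the INVEPTs and frees
 * any deferred page table pages. */
#define p2m_unlock(p)                                           \
    do {                                                        \
        if ( --(p)->defer_flush == 0 && (p)->need_flush )       \
            (p)->flush_and_unlock(p);                           \
        else                                                    \
            mm_write_unlock(&(p)->lock);                        \
    } while (0)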

Changes in v3:
- Drop already applied "x86/ept: remove unnecessary sync after
  resolving misconfigured entries".
- Replaced "mm: don't free pages until mm locks are released" with
  "x86/ept: invalidate guest physical mappings on VMENTER".

Changes in v2:

- Use a per-p2m (not per-CPU) list for page table pages to be freed.
- Hold the write lock while updating the synced_mask.

David


* [PATCHv3 1/2] x86/ept: invalidate guest physical mappings on VMENTER
  2015-12-03 16:42 [PATCHv3 0/2] x86/ept: reduce translation invalidation impact David Vrabel
@ 2015-12-03 16:42 ` David Vrabel
  2015-12-04 11:00   ` George Dunlap
  2015-12-03 16:42 ` [PATCHv3 2/2] x86/ept: defer the invalidation until the p2m lock is released David Vrabel
  1 sibling, 1 reply; 9+ messages in thread
From: David Vrabel @ 2015-12-03 16:42 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Jun Nakajima, George Dunlap, Andrew Cooper,
	Tim Deegan, David Vrabel, Jan Beulich

If a guest allocates a page and the tlbflush_timestamp on the page
indicates that a TLB flush of the previous owner is required, only the
linear and combined mappings are invalidated.  The guest-physical
mappings are not invalidated.

This is currently safe because the EPT code ensures that the
guest-physical and combined mappings are invalidated /before/ the page
is freed.  However, this prevents us from deferring the EPT invalidate
until after the page is freed (e.g., to defer the invalidate until the
p2m locks are released).

The TLB flush that may be done after allocating a page already causes
the original guest to VMEXIT, thus on VMENTER we can do an INVEPT if
one is still pending.

ept_sync_domain() now marks all PCPUs as needing to be invalidated,
including PCPUs that the domain has not run on.  We still only IPI
those PCPUs that are active so this does not result in any more[1]
INVEPT calls.

We do not attempt to track when PCPUs may have cached translations
because the only safe way to clear this per-CPU state is if,
immediately after an invalidate, the PCPU is not active (i.e., the PCPU
is not in d->domain_dirty_cpumask).  Since we only invalidate on
VMENTER or by IPIing active PCPUs, this can never happen.

[1] There is one unnecessary INVEPT when the domain runs on a PCPU for
    the very first time.  But this is: a) only 1 additional invalidate
    per PCPU for the lifetime of the domain; and b) we can avoid it
    with a subsequent change.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 xen/arch/x86/hvm/vmx/vmx.c         | 22 ++++++++++------------
 xen/arch/x86/mm/p2m-ept.c          | 20 ++++++++------------
 xen/include/asm-x86/hvm/vmx/vmcs.h |  3 +--
 3 files changed, 19 insertions(+), 26 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 9ad6d82..2f3a9f1 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -738,24 +738,12 @@ static void vmx_ctxt_switch_from(struct vcpu *v)
 
 static void vmx_ctxt_switch_to(struct vcpu *v)
 {
-    struct domain *d = v->domain;
     unsigned long old_cr4 = read_cr4(), new_cr4 = mmu_cr4_features;
-    struct ept_data *ept_data = &p2m_get_hostp2m(d)->ept;
 
     /* HOST_CR4 in VMCS is always mmu_cr4_features. Sync CR4 now. */
     if ( old_cr4 != new_cr4 )
         write_cr4(new_cr4);
 
-    if ( paging_mode_hap(d) )
-    {
-        unsigned int cpu = smp_processor_id();
-        /* Test-and-test-and-set this CPU in the EPT-is-synced mask. */
-        if ( !cpumask_test_cpu(cpu, ept_get_synced_mask(ept_data)) &&
-             !cpumask_test_and_set_cpu(cpu,
-                                       ept_get_synced_mask(ept_data)) )
-            __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept_data), 0);
-    }
-
     vmx_restore_guest_msrs(v);
     vmx_restore_dr(v);
 }
@@ -3497,6 +3485,16 @@ void vmx_vmenter_helper(const struct cpu_user_regs *regs)
     if ( unlikely(need_flush) )
         vpid_sync_all();
 
+    if ( paging_mode_hap(curr->domain) )
+    {
+        struct ept_data *ept = &p2m_get_hostp2m(curr->domain)->ept;
+        unsigned int cpu = smp_processor_id();
+
+        if ( cpumask_test_cpu(cpu, ept->invalidate)
+             && cpumask_test_and_clear_cpu(cpu, ept->invalidate) )
+            __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept), 0);
+    }
+
  out:
     HVMTRACE_ND(VMENTRY, 0, 1/*cycles*/, 0, 0, 0, 0, 0, 0, 0);
 
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index eef0372..014e2b2 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -1090,8 +1090,10 @@ static void ept_memory_type_changed(struct p2m_domain *p2m)
 static void __ept_sync_domain(void *info)
 {
     struct ept_data *ept = &((struct p2m_domain *)info)->ept;
+    unsigned int cpu = smp_processor_id();
 
-    __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept), 0);
+    if ( cpumask_test_and_clear_cpu(cpu, ept->invalidate) )
+        __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept), 0);
 }
 
 void ept_sync_domain(struct p2m_domain *p2m)
@@ -1107,16 +1109,10 @@ void ept_sync_domain(struct p2m_domain *p2m)
     if ( nestedhvm_enabled(d) && !p2m_is_nestedp2m(p2m) )
         p2m_flush_nestedp2m(d);
 
-    /*
-     * Flush active cpus synchronously. Flush others the next time this domain
-     * is scheduled onto them. We accept the race of other CPUs adding to
-     * the ept_synced mask before on_selected_cpus() reads it, resulting in
-     * unnecessary extra flushes, to avoid allocating a cpumask_t on the stack.
-     */
-    cpumask_and(ept_get_synced_mask(ept),
-                d->domain_dirty_cpumask, &cpu_online_map);
+    /* May need to invalidate on all PCPUs. */
+    cpumask_setall(ept->invalidate);
 
-    on_selected_cpus(ept_get_synced_mask(ept),
+    on_selected_cpus(d->domain_dirty_cpumask,
                      __ept_sync_domain, p2m, 1);
 }
 
@@ -1182,7 +1178,7 @@ int ept_p2m_init(struct p2m_domain *p2m)
         p2m->flush_hardware_cached_dirty = ept_flush_pml_buffers;
     }
 
-    if ( !zalloc_cpumask_var(&ept->synced_mask) )
+    if ( !zalloc_cpumask_var(&ept->invalidate) )
         return -ENOMEM;
 
     on_each_cpu(__ept_sync_domain, p2m, 1);
@@ -1193,7 +1189,7 @@ int ept_p2m_init(struct p2m_domain *p2m)
 void ept_p2m_uninit(struct p2m_domain *p2m)
 {
     struct ept_data *ept = &p2m->ept;
-    free_cpumask_var(ept->synced_mask);
+    free_cpumask_var(ept->invalidate);
 }
 
 static void ept_dump_p2m_table(unsigned char key)
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index b3b0946..fbcb811 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -67,7 +67,7 @@ struct ept_data {
         };
         u64 eptp;
     };
-    cpumask_var_t synced_mask;
+    cpumask_var_t invalidate;
 };
 
 #define _VMX_DOMAIN_PML_ENABLED    0
@@ -97,7 +97,6 @@ struct pi_desc {
 #define ept_get_wl(ept)   ((ept)->ept_wl)
 #define ept_get_asr(ept)  ((ept)->asr)
 #define ept_get_eptp(ept) ((ept)->eptp)
-#define ept_get_synced_mask(ept) ((ept)->synced_mask)
 
 #define NR_PML_ENTRIES   512
 
-- 
2.1.4


* [PATCHv3 2/2] x86/ept: defer the invalidation until the p2m lock is released
  2015-12-03 16:42 [PATCHv3 0/2] x86/ept: reduce translation invalidation impact David Vrabel
  2015-12-03 16:42 ` [PATCHv3 1/2] x86/ept: invalidate guest physical mappings on VMENTER David Vrabel
@ 2015-12-03 16:42 ` David Vrabel
  2015-12-04 11:51   ` George Dunlap
  1 sibling, 1 reply; 9+ messages in thread
From: David Vrabel @ 2015-12-03 16:42 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Jun Nakajima, George Dunlap, Andrew Cooper,
	Tim Deegan, David Vrabel, David Vrabel, Jan Beulich

From: David Vrabel <dvrabel@cantab.net>

Holding the p2m lock while calling ept_sync_domain() is very expensive
since it does an on_selected_cpus() call.  IPIs on many-socket
machines can be very slow and on_selected_cpus() is serialized.

Defer the invalidate until the p2m lock is released.  Since the processor
may cache partial translations, we also need to make sure any page table
pages to be freed are not freed until the invalidate is complete.  Such
pages are temporarily stored in a list.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
v2:
- use per-p2m list for deferred pages.
- update synced_mask while holding write lock.
---
 xen/arch/x86/mm/mm-locks.h | 23 ++++++++++++++--------
 xen/arch/x86/mm/p2m-ept.c  | 48 +++++++++++++++++++++++++++++++++++++---------
 xen/arch/x86/mm/p2m.c      | 18 +++++++++++++++++
 xen/include/asm-x86/p2m.h  |  7 +++++++
 4 files changed, 79 insertions(+), 17 deletions(-)

diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
index 76c7217..b5eb560 100644
--- a/xen/arch/x86/mm/mm-locks.h
+++ b/xen/arch/x86/mm/mm-locks.h
@@ -263,14 +263,21 @@ declare_mm_lock(altp2mlist)
  */
 
 declare_mm_rwlock(altp2m);
-#define p2m_lock(p)                         \
-{                                           \
-    if ( p2m_is_altp2m(p) )                 \
-        mm_write_lock(altp2m, &(p)->lock);  \
-    else                                    \
-        mm_write_lock(p2m, &(p)->lock);     \
-}
-#define p2m_unlock(p)         mm_write_unlock(&(p)->lock);
+#define p2m_lock(p)                             \
+    do {                                        \
+        if ( p2m_is_altp2m(p) )                 \
+            mm_write_lock(altp2m, &(p)->lock);  \
+        else                                    \
+            mm_write_lock(p2m, &(p)->lock);     \
+        (p)->defer_flush++;                     \
+    } while (0)
+#define p2m_unlock(p)                                                   \
+    do {                                                                \
+        if ( --(p)->defer_flush == 0 && (p)->need_flush )               \
+            (p)->flush_and_unlock(p);                                   \
+        else                                                            \
+            mm_write_unlock(&(p)->lock);                                \
+    } while (0)
 #define gfn_lock(p,g,o)       p2m_lock(p)
 #define gfn_unlock(p,g,o)     p2m_unlock(p)
 #define p2m_read_lock(p)      mm_read_lock(p2m, &(p)->lock)
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 014e2b2..bec27d7 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -263,7 +263,7 @@ static void ept_free_entry(struct p2m_domain *p2m, ept_entry_t *ept_entry, int l
         unmap_domain_page(epte);
     }
     
-    p2m_free_ptp(p2m, mfn_to_page(ept_entry->mfn));
+    p2m_free_ptp_defer(p2m, mfn_to_page(ept_entry->mfn));
 }
 
 static bool_t ept_split_super_page(struct p2m_domain *p2m,
@@ -1096,24 +1096,53 @@ static void __ept_sync_domain(void *info)
         __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept), 0);
 }
 
-void ept_sync_domain(struct p2m_domain *p2m)
+static void ept_sync_domain_prepare(struct p2m_domain *p2m)
 {
     struct domain *d = p2m->domain;
     struct ept_data *ept = &p2m->ept;
-    /* Only if using EPT and this domain has some VCPUs to dirty. */
-    if ( !paging_mode_hap(d) || !d->vcpu || !d->vcpu[0] )
-        return;
-
-    ASSERT(local_irq_is_enabled());
 
     if ( nestedhvm_enabled(d) && !p2m_is_nestedp2m(p2m) )
         p2m_flush_nestedp2m(d);
 
     /* May need to invalidate on all PCPUs. */
     cpumask_setall(ept->invalidate);
+}
+
+static void ept_sync_domain_mask(struct p2m_domain *p2m, const cpumask_t *mask)
+{
+    on_selected_cpus(mask, __ept_sync_domain, p2m, 1);
+}
+
+void ept_sync_domain(struct p2m_domain *p2m)
+{
+    struct domain *d = p2m->domain;
+
+    /* Only if using EPT and this domain has some VCPUs to dirty. */
+    if ( !paging_mode_hap(d) || !d->vcpu || !d->vcpu[0] )
+        return;
+
+    ept_sync_domain_prepare(p2m);
+
+    if ( p2m->defer_flush )
+    {
+        p2m->need_flush = 1;
+        return;
+    }
+    p2m->need_flush = 0;
+
+    ept_sync_domain_mask(p2m, d->domain_dirty_cpumask);
+}
+
+static void ept_flush_and_unlock(struct p2m_domain *p2m)
+{
+    PAGE_LIST_HEAD(deferred_pages);
+
+    page_list_move(&deferred_pages, &p2m->deferred_pages);
+
+    mm_write_unlock(&p2m->lock);
 
-    on_selected_cpus(d->domain_dirty_cpumask,
-                     __ept_sync_domain, p2m, 1);
+    ept_sync_domain_mask(p2m, p2m->domain->domain_dirty_cpumask);
+    p2m_free_ptp_list(p2m, &deferred_pages);
 }
 
 static void ept_enable_pml(struct p2m_domain *p2m)
@@ -1164,6 +1193,7 @@ int ept_p2m_init(struct p2m_domain *p2m)
     p2m->change_entry_type_range = ept_change_entry_type_range;
     p2m->memory_type_changed = ept_memory_type_changed;
     p2m->audit_p2m = NULL;
+    p2m->flush_and_unlock = ept_flush_and_unlock;
 
     /* Set the memory type used when accessing EPT paging structures. */
     ept->ept_mt = EPT_DEFAULT_MT;
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index ed0bbd7..502bdad 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -65,6 +65,7 @@ static int p2m_initialise(struct domain *d, struct p2m_domain *p2m)
     mm_lock_init(&p2m->pod.lock);
     INIT_LIST_HEAD(&p2m->np2m_list);
     INIT_PAGE_LIST_HEAD(&p2m->pages);
+    INIT_PAGE_LIST_HEAD(&p2m->deferred_pages);
     INIT_PAGE_LIST_HEAD(&p2m->pod.super);
     INIT_PAGE_LIST_HEAD(&p2m->pod.single);
 
@@ -504,6 +505,23 @@ void p2m_free_ptp(struct p2m_domain *p2m, struct page_info *pg)
     return;
 }
 
+void p2m_free_ptp_defer(struct p2m_domain *p2m, struct page_info *pg)
+{
+    page_list_del(pg, &p2m->pages);
+    page_list_add(pg, &p2m->deferred_pages);
+}
+
+void p2m_free_ptp_list(struct p2m_domain *p2m, struct page_list_head *list)
+{
+    struct page_info *pg, *tmp;
+
+    page_list_for_each_safe(pg, tmp, list)
+    {
+        page_list_del(pg, list);
+        p2m->domain->arch.paging.free_page(p2m->domain, pg);
+    }
+}
+
 /*
  * Allocate a new p2m table for a domain.
  *
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index fa46dd9..ee03997 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -261,6 +261,11 @@ struct p2m_domain {
                                           unsigned long gfn, l1_pgentry_t *p,
                                           l1_pgentry_t new, unsigned int level);
     long               (*audit_p2m)(struct p2m_domain *p2m);
+    void               (*flush_and_unlock)(struct p2m_domain *p2m);
+
+    struct page_list_head deferred_pages;
+    unsigned int defer_flush;
+    bool_t need_flush;
 
     /* Default P2M access type for each page in the the domain: new pages,
      * swapped in pages, cleared pages, and pages that are ambiguously
@@ -688,6 +693,8 @@ static inline bool_t p2m_mem_access_sanity_check(struct domain *d)
 
 struct page_info *p2m_alloc_ptp(struct p2m_domain *p2m, unsigned long type);
 void p2m_free_ptp(struct p2m_domain *p2m, struct page_info *pg);
+void p2m_free_ptp_defer(struct p2m_domain *p2m, struct page_info *pg);
+void p2m_free_ptp_list(struct p2m_domain *p2m, struct page_list_head *list);
 
 /* Directly set a p2m entry: only for use by p2m code. Does not need
  * a call to put_gfn afterwards/ */
-- 
2.1.4


* Re: [PATCHv3 1/2] x86/ept: invalidate guest physical mappings on VMENTER
  2015-12-03 16:42 ` [PATCHv3 1/2] x86/ept: invalidate guest physical mappings on VMENTER David Vrabel
@ 2015-12-04 11:00   ` George Dunlap
  2015-12-04 13:39     ` David Vrabel
  0 siblings, 1 reply; 9+ messages in thread
From: George Dunlap @ 2015-12-04 11:00 UTC (permalink / raw)
  To: David Vrabel, xen-devel
  Cc: Kevin Tian, Jun Nakajima, George Dunlap, Andrew Cooper,
	Tim Deegan, Jan Beulich

On 03/12/15 16:42, David Vrabel wrote:
> If a guest allocates a page and the tlbflush_timestamp on the page
> indicates that a TLB flush of the previous owner is required, only the
> linear and combined mappings are invalidated.  The guest-physical
> mappings are not invalidated.
> 
> This is currently safe because the EPT code ensures that the
> guest-physical and combined mappings are invalidated /before/ the page
> is freed.  However, this prevents us from deferring the EPT invalidate
> until after the page is freed (e.g., to defer the invalidate until the
> p2m locks are released).
> 
> The TLB flush that may be done after allocating a page already causes
> the original guest to VMEXIT, thus on VMENTER we can do an INVEPT if
> one is still pending.
> 
> ept_sync_domain() now marks all PCPUs as needing to be invalidated,
> including PCPUs that the domain has not run on.  We still only IPI
> those PCPUs that are active so this does not result in any more[1]
> INVEPT calls.
> 
> We do not attempt to track when PCPUs may have cached translations
> because the only safe way to clear this per-CPU state is if,
> immediately after an invalidate, the PCPU is not active (i.e., the
> PCPU is not in d->domain_dirty_cpumask).  Since we only invalidate on
> VMENTER or by IPIing active PCPUs, this can never happen.
> 
> [1] There is one unnecessary INVEPT when the domain runs on a PCPU for
>     the very first time.  But this is: a) only 1 additional invalidate
>     per PCPU for the lifetime of the domain; and b) we can avoid it
>     with a subsequent change.
> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>

This looks like a definite improvement.

One thing you missed is that in ept_p2m_init(), it calls
__ept_sync_domain() on all cpus; but because the "invalidate" mask is
clear at that point, nothing will actually happen.

Part of this, I think, comes from inverting what the mask
means.  Before this patch, "not synced" is the default, and therefore
you need to invalidate unless someone has set the bit saying you don't
have to.  After this, "don't invalidate" is the default and you do
nothing unless someone has set a bit saying you do have to.

I think I'd prefer it if you left the mask as "synced_mask"; then you
can actually take that on_each_cpu() out of ept_p2m_init() entirely.
(Probably wants a comment pointing that out.)

 -George

> ---
>  xen/arch/x86/hvm/vmx/vmx.c         | 22 ++++++++++------------
>  xen/arch/x86/mm/p2m-ept.c          | 20 ++++++++------------
>  xen/include/asm-x86/hvm/vmx/vmcs.h |  3 +--
>  3 files changed, 19 insertions(+), 26 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 9ad6d82..2f3a9f1 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -738,24 +738,12 @@ static void vmx_ctxt_switch_from(struct vcpu *v)
>  
>  static void vmx_ctxt_switch_to(struct vcpu *v)
>  {
> -    struct domain *d = v->domain;
>      unsigned long old_cr4 = read_cr4(), new_cr4 = mmu_cr4_features;
> -    struct ept_data *ept_data = &p2m_get_hostp2m(d)->ept;
>  
>      /* HOST_CR4 in VMCS is always mmu_cr4_features. Sync CR4 now. */
>      if ( old_cr4 != new_cr4 )
>          write_cr4(new_cr4);
>  
> -    if ( paging_mode_hap(d) )
> -    {
> -        unsigned int cpu = smp_processor_id();
> -        /* Test-and-test-and-set this CPU in the EPT-is-synced mask. */
> -        if ( !cpumask_test_cpu(cpu, ept_get_synced_mask(ept_data)) &&
> -             !cpumask_test_and_set_cpu(cpu,
> -                                       ept_get_synced_mask(ept_data)) )
> -            __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept_data), 0);
> -    }
> -
>      vmx_restore_guest_msrs(v);
>      vmx_restore_dr(v);
>  }
> @@ -3497,6 +3485,16 @@ void vmx_vmenter_helper(const struct cpu_user_regs *regs)
>      if ( unlikely(need_flush) )
>          vpid_sync_all();
>  
> +    if ( paging_mode_hap(curr->domain) )
> +    {
> +        struct ept_data *ept = &p2m_get_hostp2m(curr->domain)->ept;
> +        unsigned int cpu = smp_processor_id();
> +
> +        if ( cpumask_test_cpu(cpu, ept->invalidate)
> +             && cpumask_test_and_clear_cpu(cpu, ept->invalidate) )
> +            __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept), 0);
> +    }
> +
>   out:
>      HVMTRACE_ND(VMENTRY, 0, 1/*cycles*/, 0, 0, 0, 0, 0, 0, 0);
>  
> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> index eef0372..014e2b2 100644
> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -1090,8 +1090,10 @@ static void ept_memory_type_changed(struct p2m_domain *p2m)
>  static void __ept_sync_domain(void *info)
>  {
>      struct ept_data *ept = &((struct p2m_domain *)info)->ept;
> +    unsigned int cpu = smp_processor_id();
>  
> -    __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept), 0);
> +    if ( cpumask_test_and_clear_cpu(cpu, ept->invalidate) )
> +        __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept), 0);
>  }
>  
>  void ept_sync_domain(struct p2m_domain *p2m)
> @@ -1107,16 +1109,10 @@ void ept_sync_domain(struct p2m_domain *p2m)
>      if ( nestedhvm_enabled(d) && !p2m_is_nestedp2m(p2m) )
>          p2m_flush_nestedp2m(d);
>  
> -    /*
> -     * Flush active cpus synchronously. Flush others the next time this domain
> -     * is scheduled onto them. We accept the race of other CPUs adding to
> -     * the ept_synced mask before on_selected_cpus() reads it, resulting in
> -     * unnecessary extra flushes, to avoid allocating a cpumask_t on the stack.
> -     */
> -    cpumask_and(ept_get_synced_mask(ept),
> -                d->domain_dirty_cpumask, &cpu_online_map);
> +    /* May need to invalidate on all PCPUs. */
> +    cpumask_setall(ept->invalidate);
>  
> -    on_selected_cpus(ept_get_synced_mask(ept),
> +    on_selected_cpus(d->domain_dirty_cpumask,
>                       __ept_sync_domain, p2m, 1);
>  }
>  
> @@ -1182,7 +1178,7 @@ int ept_p2m_init(struct p2m_domain *p2m)
>          p2m->flush_hardware_cached_dirty = ept_flush_pml_buffers;
>      }
>  
> -    if ( !zalloc_cpumask_var(&ept->synced_mask) )
> +    if ( !zalloc_cpumask_var(&ept->invalidate) )
>          return -ENOMEM;
>  
>      on_each_cpu(__ept_sync_domain, p2m, 1);
> @@ -1193,7 +1189,7 @@ int ept_p2m_init(struct p2m_domain *p2m)
>  void ept_p2m_uninit(struct p2m_domain *p2m)
>  {
>      struct ept_data *ept = &p2m->ept;
> -    free_cpumask_var(ept->synced_mask);
> +    free_cpumask_var(ept->invalidate);
>  }
>  
>  static void ept_dump_p2m_table(unsigned char key)
> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
> index b3b0946..fbcb811 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -67,7 +67,7 @@ struct ept_data {
>          };
>          u64 eptp;
>      };
> -    cpumask_var_t synced_mask;
> +    cpumask_var_t invalidate;
>  };
>  
>  #define _VMX_DOMAIN_PML_ENABLED    0
> @@ -97,7 +97,6 @@ struct pi_desc {
>  #define ept_get_wl(ept)   ((ept)->ept_wl)
>  #define ept_get_asr(ept)  ((ept)->asr)
>  #define ept_get_eptp(ept) ((ept)->eptp)
> -#define ept_get_synced_mask(ept) ((ept)->synced_mask)
>  
>  #define NR_PML_ENTRIES   512
>  
> 


* Re: [PATCHv3 2/2] x86/ept: defer the invalidation until the p2m lock is released
  2015-12-03 16:42 ` [PATCHv3 2/2] x86/ept: defer the invalidation until the p2m lock is released David Vrabel
@ 2015-12-04 11:51   ` George Dunlap
  0 siblings, 0 replies; 9+ messages in thread
From: George Dunlap @ 2015-12-04 11:51 UTC (permalink / raw)
  To: David Vrabel, xen-devel
  Cc: Kevin Tian, Jun Nakajima, George Dunlap, Andrew Cooper,
	Tim Deegan, David Vrabel, Jan Beulich

On 03/12/15 16:42, David Vrabel wrote:
> From: David Vrabel <dvrabel@cantab.net>
> 
> Holding the p2m lock while calling ept_sync_domain() is very expensive
> since it does an on_selected_cpus() call.  IPIs on many-socket machines
> can be very slows and on_selected_cpus() is serialized.
> 
> Defer the invalidate until the p2m lock is released.  Since the processor
> may cache partial translations, we also need to make sure any page table
> pages to be freed are not freed until the invalidate is complete.  Such
> pages are temporarily stored in a list.
> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>

Looks good.

One thing is that now (actually since the previous patch) the flush in
__ept_sync_domain() itself is entirely redundant, as the check
will happen on the vmentry path again anyway.  But I think you do want
to wait until at least all vcpus have been interrupted, and
on_selected_cpus() looks like the simplest way to do that.  (Otherwise
you could just call smp_send_event_check_mask() and get rid of
__ept_sync_domain() entirely.)
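
For concreteness, that alternative would look roughly like the sketch
below (untested, and assuming smp_send_event_check_mask() is enough to
force each target PCPU through a VMEXIT/VMENTER so that
vmx_vmenter_helper() does the INVEPT):

    /* Instead of: */
    on_selected_cpus(d->domain_dirty_cpumask, __ept_sync_domain, p2m, 1);

    /* ...just kick the dirty PCPUs; ept->invalidate already has every
     * CPU's bit set, so the INVEPT happens on each one's next VMENTER.
     * The drawback, as noted above, is that this does not wait for the
     * IPIs to be delivered. */
    smp_send_event_check_mask(d->domain_dirty_cpumask);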

Jan / Andy, any thoughts?

 -George

> ---
> v2:
> - use per-p2m list for deferred pages.
> - update synced_mask while holding write lock.
> ---
>  xen/arch/x86/mm/mm-locks.h | 23 ++++++++++++++--------
>  xen/arch/x86/mm/p2m-ept.c  | 48 +++++++++++++++++++++++++++++++++++++---------
>  xen/arch/x86/mm/p2m.c      | 18 +++++++++++++++++
>  xen/include/asm-x86/p2m.h  |  7 +++++++
>  4 files changed, 79 insertions(+), 17 deletions(-)
> 
> diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
> index 76c7217..b5eb560 100644
> --- a/xen/arch/x86/mm/mm-locks.h
> +++ b/xen/arch/x86/mm/mm-locks.h
> @@ -263,14 +263,21 @@ declare_mm_lock(altp2mlist)
>   */
>  
>  declare_mm_rwlock(altp2m);
> -#define p2m_lock(p)                         \
> -{                                           \
> -    if ( p2m_is_altp2m(p) )                 \
> -        mm_write_lock(altp2m, &(p)->lock);  \
> -    else                                    \
> -        mm_write_lock(p2m, &(p)->lock);     \
> -}
> -#define p2m_unlock(p)         mm_write_unlock(&(p)->lock);
> +#define p2m_lock(p)                             \
> +    do {                                        \
> +        if ( p2m_is_altp2m(p) )                 \
> +            mm_write_lock(altp2m, &(p)->lock);  \
> +        else                                    \
> +            mm_write_lock(p2m, &(p)->lock);     \
> +        (p)->defer_flush++;                     \
> +    } while (0)
> +#define p2m_unlock(p)                                                   \
> +    do {                                                                \
> +        if ( --(p)->defer_flush == 0 && (p)->need_flush )               \
> +            (p)->flush_and_unlock(p);                                   \
> +        else                                                            \
> +            mm_write_unlock(&(p)->lock);                                \
> +    } while (0)
>  #define gfn_lock(p,g,o)       p2m_lock(p)
>  #define gfn_unlock(p,g,o)     p2m_unlock(p)
>  #define p2m_read_lock(p)      mm_read_lock(p2m, &(p)->lock)
> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> index 014e2b2..bec27d7 100644
> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -263,7 +263,7 @@ static void ept_free_entry(struct p2m_domain *p2m, ept_entry_t *ept_entry, int l
>          unmap_domain_page(epte);
>      }
>      
> -    p2m_free_ptp(p2m, mfn_to_page(ept_entry->mfn));
> +    p2m_free_ptp_defer(p2m, mfn_to_page(ept_entry->mfn));
>  }
>  
>  static bool_t ept_split_super_page(struct p2m_domain *p2m,
> @@ -1096,24 +1096,53 @@ static void __ept_sync_domain(void *info)
>          __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept), 0);
>  }
>  
> -void ept_sync_domain(struct p2m_domain *p2m)
> +static void ept_sync_domain_prepare(struct p2m_domain *p2m)
>  {
>      struct domain *d = p2m->domain;
>      struct ept_data *ept = &p2m->ept;
> -    /* Only if using EPT and this domain has some VCPUs to dirty. */
> -    if ( !paging_mode_hap(d) || !d->vcpu || !d->vcpu[0] )
> -        return;
> -
> -    ASSERT(local_irq_is_enabled());
>  
>      if ( nestedhvm_enabled(d) && !p2m_is_nestedp2m(p2m) )
>          p2m_flush_nestedp2m(d);
>  
>      /* May need to invalidate on all PCPUs. */
>      cpumask_setall(ept->invalidate);
> +}
> +
> +static void ept_sync_domain_mask(struct p2m_domain *p2m, const cpumask_t *mask)
> +{
> +    on_selected_cpus(mask, __ept_sync_domain, p2m, 1);
> +}
> +
> +void ept_sync_domain(struct p2m_domain *p2m)
> +{
> +    struct domain *d = p2m->domain;
> +
> +    /* Only if using EPT and this domain has some VCPUs to dirty. */
> +    if ( !paging_mode_hap(d) || !d->vcpu || !d->vcpu[0] )
> +        return;
> +
> +    ept_sync_domain_prepare(p2m);
> +
> +    if ( p2m->defer_flush )
> +    {
> +        p2m->need_flush = 1;
> +        return;
> +    }
> +    p2m->need_flush = 0;
> +
> +    ept_sync_domain_mask(p2m, d->domain_dirty_cpumask);
> +}
> +
> +static void ept_flush_and_unlock(struct p2m_domain *p2m)
> +{
> +    PAGE_LIST_HEAD(deferred_pages);
> +
> +    page_list_move(&deferred_pages, &p2m->deferred_pages);
> +
> +    mm_write_unlock(&p2m->lock);
>  
> -    on_selected_cpus(d->domain_dirty_cpumask,
> -                     __ept_sync_domain, p2m, 1);
> +    ept_sync_domain_mask(p2m, p2m->domain->domain_dirty_cpumask);
> +    p2m_free_ptp_list(p2m, &deferred_pages);
>  }
>  
>  static void ept_enable_pml(struct p2m_domain *p2m)
> @@ -1164,6 +1193,7 @@ int ept_p2m_init(struct p2m_domain *p2m)
>      p2m->change_entry_type_range = ept_change_entry_type_range;
>      p2m->memory_type_changed = ept_memory_type_changed;
>      p2m->audit_p2m = NULL;
> +    p2m->flush_and_unlock = ept_flush_and_unlock;
>  
>      /* Set the memory type used when accessing EPT paging structures. */
>      ept->ept_mt = EPT_DEFAULT_MT;
> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> index ed0bbd7..502bdad 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -65,6 +65,7 @@ static int p2m_initialise(struct domain *d, struct p2m_domain *p2m)
>      mm_lock_init(&p2m->pod.lock);
>      INIT_LIST_HEAD(&p2m->np2m_list);
>      INIT_PAGE_LIST_HEAD(&p2m->pages);
> +    INIT_PAGE_LIST_HEAD(&p2m->deferred_pages);
>      INIT_PAGE_LIST_HEAD(&p2m->pod.super);
>      INIT_PAGE_LIST_HEAD(&p2m->pod.single);
>  
> @@ -504,6 +505,23 @@ void p2m_free_ptp(struct p2m_domain *p2m, struct page_info *pg)
>      return;
>  }
>  
> +void p2m_free_ptp_defer(struct p2m_domain *p2m, struct page_info *pg)
> +{
> +    page_list_del(pg, &p2m->pages);
> +    page_list_add(pg, &p2m->deferred_pages);
> +}
> +
> +void p2m_free_ptp_list(struct p2m_domain *p2m, struct page_list_head *list)
> +{
> +    struct page_info *pg, *tmp;
> +
> +    page_list_for_each_safe(pg, tmp, list)
> +    {
> +        page_list_del(pg, list);
> +        p2m->domain->arch.paging.free_page(p2m->domain, pg);
> +    }
> +}
> +
>  /*
>   * Allocate a new p2m table for a domain.
>   *
> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
> index fa46dd9..ee03997 100644
> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -261,6 +261,11 @@ struct p2m_domain {
>                                            unsigned long gfn, l1_pgentry_t *p,
>                                            l1_pgentry_t new, unsigned int level);
>      long               (*audit_p2m)(struct p2m_domain *p2m);
> +    void               (*flush_and_unlock)(struct p2m_domain *p2m);
> +
> +    struct page_list_head deferred_pages;
> +    unsigned int defer_flush;
> +    bool_t need_flush;
>  
>      /* Default P2M access type for each page in the the domain: new pages,
>       * swapped in pages, cleared pages, and pages that are ambiguously
> @@ -688,6 +693,8 @@ static inline bool_t p2m_mem_access_sanity_check(struct domain *d)
>  
>  struct page_info *p2m_alloc_ptp(struct p2m_domain *p2m, unsigned long type);
>  void p2m_free_ptp(struct p2m_domain *p2m, struct page_info *pg);
> +void p2m_free_ptp_defer(struct p2m_domain *p2m, struct page_info *pg);
> +void p2m_free_ptp_list(struct p2m_domain *p2m, struct page_list_head *list);
>  
>  /* Directly set a p2m entry: only for use by p2m code. Does not need
>   * a call to put_gfn afterwards/ */
> 


* Re: [PATCHv3 1/2] x86/ept: invalidate guest physical mappings on VMENTER
  2015-12-04 11:00   ` George Dunlap
@ 2015-12-04 13:39     ` David Vrabel
  2015-12-07 10:25       ` George Dunlap
  0 siblings, 1 reply; 9+ messages in thread
From: David Vrabel @ 2015-12-04 13:39 UTC (permalink / raw)
  To: George Dunlap, David Vrabel, xen-devel
  Cc: Kevin Tian, Jun Nakajima, George Dunlap, Andrew Cooper,
	Tim Deegan, Jan Beulich

On 04/12/15 11:00, George Dunlap wrote:
> On 03/12/15 16:42, David Vrabel wrote:
>> If a guest allocates a page and the tlbflush_timestamp on the page
>> indicates that a TLB flush of the previous owner is required, only the
>> linear and combined mappings are invalidated.  The guest-physical
>> mappings are not invalidated.
>>
>> This is currently safe because the EPT code ensures that the
>> guest-physical and combined mappings are invalidated /before/ the page
>> is freed.  However, this prevents us from deferring the EPT invalidate
>> until after the page is freed (e.g., to defer the invalidate until the
>> p2m locks are released).
>>
>> The TLB flush that may be done after allocating a page already causes
>> the original guest to VMEXIT, thus on VMENTER we can do an INVEPT if
>> one is still pending.
>>
>> ept_sync_domain() now marks all PCPUs as needing to be invalidated,
>> including PCPUs that the domain has not run on.  We still only IPI
>> those PCPUs that are active so this does not result in any more[1]
>> INVEPT calls.
>>
>> We do not attempt to track when PCPUs may have cached translations
>> because the only safe way to clear this per-CPU state is if,
>> immediately after an invalidate, the PCPU is not active (i.e., the
>> PCPU is not in d->domain_dirty_cpumask).  Since we only invalidate on
>> VMENTER or by IPIing active PCPUs, this can never happen.
>>
>> [1] There is one unnecessary INVEPT when the domain runs on a PCPU for
>>     the very first time.  But this is: a) only 1 additional invalidate
>>     per PCPU for the lifetime of the domain; and b) we can avoid it
>>     with a subsequent change.
>>
>> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> 
> This looks like a definite improvement.
> 
> One thing you missed is that in ept_p2m_init(), it calls
> __ept_sync_domain() on all cpus; but because the "invalidate" mask is
> clear at that point, nothing will actually happen.

Good point.  I'd missed this because I'd planned to replace this initial
invalidate by initializing ept->invalidate to all ones (as I alluded to
in the [1] footnote).
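
Roughly speaking, that follow-up would amount to something like this in
ept_p2m_init() (a sketch, not something posted in this series):

    if ( !zalloc_cpumask_var(&ept->invalidate) )
        return -ENOMEM;

    /*
     * Assume a pending invalidate on every PCPU until it first runs
     * this domain; this would replace the
     * on_each_cpu(__ept_sync_domain, p2m, 1) call.
     */
    cpumask_setall(ept->invalidate);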

> Part of this, I think, comes from inverting what the mask
> means.  Before this patch, "not synced" is the default, and therefore
> you need to invalidate unless someone has set the bit saying you don't
> have to.  After this, "don't invalidate" is the default and you do
> nothing unless someone has set a bit saying you do have to.
> 
> I think I'd prefer it if you left the mask as "synced_mask"; then you
> can actually take that on_each_cpu() out of ept_p2m_init() entirely.
> (Probably wants a comment pointing that out.)

I changed its name because its old use as synced_mask (i.e., the set
of CPUs needing to be notified of a required invalidation) did not
match its name.  I'd rather not keep the name and invert its use.

David


* Re: [PATCHv3 1/2] x86/ept: invalidate guest physical mappings on VMENTER
  2015-12-04 13:39     ` David Vrabel
@ 2015-12-07 10:25       ` George Dunlap
  2015-12-14 14:41         ` David Vrabel
  0 siblings, 1 reply; 9+ messages in thread
From: George Dunlap @ 2015-12-07 10:25 UTC (permalink / raw)
  To: David Vrabel
  Cc: Kevin Tian, Jan Beulich, Andrew Cooper, Tim Deegan, George Dunlap,
	Jun Nakajima, xen-devel

On Fri, Dec 4, 2015 at 1:39 PM, David Vrabel <david.vrabel@citrix.com> wrote:
> On 04/12/15 11:00, George Dunlap wrote:
>> On 03/12/15 16:42, David Vrabel wrote:
>>> If a guest allocates a page and the tlbflush_timestamp on the page
>>> indicates that a TLB flush of the previous owner is required, only the
>>> linear and combined mappings are invalidated.  The guest-physical
>>> mappings are not invalidated.
>>>
>>> This is currently safe because the EPT code ensures that the
>>> guest-physical and combined mappings are invalidated /before/ the page
>>> is freed.  However, this prevents us from deferring the EPT invalidate
>>> until after the page is freed (e.g., to defer the invalidate until the
>>> p2m locks are released).
>>>
>>> The TLB flush that may be done after allocating a page already causes
>>> the original guest to VMEXIT, thus on VMENTER we can do an INVEPT if
>>> one is still pending.
>>>
>>> ept_sync_domain() now marks all PCPUs as needing to be invalidated,
>>> including PCPUs that the domain has not run on.  We still only IPI
>>> those PCPUs that are active so this does not result in any more[1]
>>> INVEPT calls.
>>>
>>> We do not attempt to track when PCPUs may have cached translations
>>> because the only safe way to clear this per-CPU state is if,
>>> immediately after an invalidate, the PCPU is not active (i.e., the
>>> PCPU is not in d->domain_dirty_cpumask).  Since we only invalidate on
>>> VMENTER or by IPIing active PCPUs, this can never happen.
>>>
>>> [1] There is one unnecessary INVEPT when the domain runs on a PCPU for
>>>     the very first time.  But this is: a) only 1 additional invalidate
>>>     per PCPU for the lifetime of the domain; and b) we can avoid it
>>>     with a subsequent change.
>>>
>>> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
>>
>> This looks like a definite improvement.
>>
>> One thing you missed is that in ept_p2m_init(), it calls
>> __ept_sync_domain() on all cpus; but because the "invalidate" mask is
>> clear at that point, nothing will actually happen.
>
> Good point.  I'd missed this because I'd planned to replace this initial
> invalidate by initializing ept->invalidate to all ones (as I alluded to
> in the [1] footnote).
>
>> Part of this, I think, comes from inverting what the mask
>> means.  Before this patch, "not synced" is the default, and therefore
>> you need to invalidate unless someone has set the bit saying you don't
>> have to.  After this, "don't invalidate" is the default and you do
>> nothing unless someone has set a bit saying you do have to.
>>
>> I think I'd prefer it if you left the mask as "synced_mask"; then you
>> can actually take that on_each_cpu() out of ept_p2m_init() entirely.
>> (Probably wants a comment pointing that out.)
>
> I changed its name because its old use as synced_mask (i.e., the set
> of CPUs needing to be notified of a required invalidation) did not
> match its name.  I'd rather not keep the name and invert its use.

I took the past tense ("synced") to mean, "These CPUs have been
brought into sync (or are no longer out of sync)".  So they start out
not-synced, so you initialize the bit to be clear; when an INVEPT is
executed, they become synced, so you set the bit; and when you change
the EPT tables, they are no longer synced so you clear the bit.

I still think defaulting to zero and "not-synced" will minimize the
risk of people making a mistake later (e.g., by forgetting to
initialize them to 1 instead of 0), but it's not a big deal either
way.

 -George


* Re: [PATCHv3 1/2] x86/ept: invalidate guest physical mappings on VMENTER
  2015-12-07 10:25       ` George Dunlap
@ 2015-12-14 14:41         ` David Vrabel
  2015-12-16 16:04           ` George Dunlap
  0 siblings, 1 reply; 9+ messages in thread
From: David Vrabel @ 2015-12-14 14:41 UTC (permalink / raw)
  To: George Dunlap, David Vrabel
  Cc: Kevin Tian, Jan Beulich, Andrew Cooper, Tim Deegan, George Dunlap,
	Jun Nakajima, xen-devel

On 07/12/15 10:25, George Dunlap wrote:
> 
> I took the past tense ("synced") to mean, "These CPUs have been
> brought into sync (or are no longer out of sync)".  So they start out
> not-synced, so you initialize the bit to be clear; when an INVEPT is
> executed, they become synced, so you set the bit; and when you change
> the EPT tables, they are no longer synced so you clear the bit.

It didn't work like that though.  I have retained the changed name and
meaning.

David


* Re: [PATCHv3 1/2] x86/ept: invalidate guest physical mappings on VMENTER
  2015-12-14 14:41         ` David Vrabel
@ 2015-12-16 16:04           ` George Dunlap
  0 siblings, 0 replies; 9+ messages in thread
From: George Dunlap @ 2015-12-16 16:04 UTC (permalink / raw)
  To: David Vrabel, George Dunlap
  Cc: Kevin Tian, Jan Beulich, Andrew Cooper, Tim Deegan, Jun Nakajima,
	xen-devel

On 14/12/15 14:41, David Vrabel wrote:
> On 07/12/15 10:25, George Dunlap wrote:
>>
>> I took the past tense ("synced") to mean, "These CPUs have been
>> brought into sync (or are no longer out of sync)".  So they start out
>> not-synced, so you initialize the bit to be clear; when an INVEPT is
>> executed, they become synced, so you set the bit; and when you change
>> the EPT tables, they are no longer synced so you clear the bit.
> 
> It didn't work like that though.  I have retained the changed name and
> meaning.

I'm not sure what you mean here.  vmx_ctxt_switch_to() would check to
see if the bit corresponding to the current cpu was set; if it wasn't,
it would set the bit and do an INVEPT flush.  The mask started out zero
(which would cause a sync on all cpus); and in the original version of
ept_sync_domain(), all the cpus which were about to be synced by the
on_selected_cpus() call had their bits set while the other ones were
clear.

 -George

