xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] x86/viridian: Add Partition Reference Time enlightenment
@ 2014-09-29 10:28 Paul Durrant
  2014-10-10 11:55 ` Egger, Christoph
  2014-10-10 16:36 ` Matt Wilson
  0 siblings, 2 replies; 14+ messages in thread
From: Paul Durrant @ 2014-09-29 10:28 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Christoph Egger,
	Ian Jackson, Paul Durrant, Jan Beulich

The presence of the partition reference time enlightenment persuades newer
versions of Windows to prefer the TSC as their primary time source. Hence,
if rdtsc is not being emulated and is invariant then many vmexits (for
alternative time sources such as the HPET or reference counter MSR) can
be avoided.

The implementation is not yet complete as no attempt is made to prevent
emulation of rdtsc if the enlightenment is active and guest and host
TSC frequencies differ. To do that requires invasive changes in the core
x86 time code and hence a lot more testing.

This patch avoids the issue by disabling the enlightenment if rdtsc is
being emulated, causing Windows to choose another time source. This is
safe, but may cause a big variation in performance of guests migrated
between hosts of differing TSC frequency. Thus the enlightenment is not
enabled in the default set, but may be enabled to improve guest performance
where such migrations are not a concern.

See section 15.4 of the Microsoft Hypervisor Top Level Functional
Specification v4.0a for details.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Christoph Egger <chegger@amazon.de>
---
v2: - Addressed comments from Jan:
      - Leave enlightenment out of default set and add a comment explaining
        why.
      - De-dup code surrounding adjustment of sequence number on resume.
      - Remove accidental code deletion.

 docs/man/xl.cfg.pod.5                  |    6 ++
 tools/libxl/libxl_dom.c                |    3 +
 tools/libxl/libxl_types.idl            |    1 +
 xen/arch/x86/hvm/viridian.c            |  113 +++++++++++++++++++++++++++++++-
 xen/include/asm-x86/hvm/viridian.h     |   35 ++++++++++
 xen/include/public/arch-x86/hvm/save.h |   11 ++++
 xen/include/public/hvm/params.h        |    7 +-
 7 files changed, 174 insertions(+), 2 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 8bba21c..925adbb 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -1163,6 +1163,12 @@ This group incorporates Partition Time Reference Counter MSR. This
 enlightenment can improve performance of Windows 8 and Windows
 Server 2012 onwards.
 
+=item B<reference_tsc>
+
+This set incorporates the Partition Reference TSC MSR. This
+enlightenment can improve performance of Windows 7 and Windows
+Server 2008 R2 onwards.
+
 =item B<defaults>
 
 This is a special value that enables the default set of groups, which
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index bd21841..d04fc0d 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -262,6 +262,9 @@ static int hvm_set_viridian_features(libxl__gc *gc, uint32_t domid,
     if (libxl_bitmap_test(&enlightenments, LIBXL_VIRIDIAN_ENLIGHTENMENT_TIME_REF_COUNT))
         mask |= HVMPV_time_ref_count;
 
+    if (libxl_bitmap_test(&enlightenments, LIBXL_VIRIDIAN_ENLIGHTENMENT_REFERENCE_TSC))
+        mask |= HVMPV_reference_tsc;
+
     if (mask != 0 &&
         xc_hvm_param_set(CTX->xch,
                          domid,
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 90d152f..da75a40 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -183,6 +183,7 @@ libxl_viridian_enlightenment = Enumeration("viridian_enlightenment", [
     (0, "base"),
     (1, "freq"),
     (2, "time_ref_count"),
+    (3, "reference_tsc"),
     ])
 
 #
diff --git a/xen/arch/x86/hvm/viridian.c b/xen/arch/x86/hvm/viridian.c
index 6726168..a7d3283 100644
--- a/xen/arch/x86/hvm/viridian.c
+++ b/xen/arch/x86/hvm/viridian.c
@@ -21,6 +21,7 @@
 #define VIRIDIAN_MSR_HYPERCALL                  0x40000001
 #define VIRIDIAN_MSR_VP_INDEX                   0x40000002
 #define VIRIDIAN_MSR_TIME_REF_COUNT             0x40000020
+#define VIRIDIAN_MSR_REFERENCE_TSC              0x40000021
 #define VIRIDIAN_MSR_TSC_FREQUENCY              0x40000022
 #define VIRIDIAN_MSR_APIC_FREQUENCY             0x40000023
 #define VIRIDIAN_MSR_EOI                        0x40000070
@@ -40,6 +41,7 @@
 #define CPUID3A_MSR_APIC_ACCESS    (1 << 4)
 #define CPUID3A_MSR_HYPERCALL      (1 << 5)
 #define CPUID3A_MSR_VP_INDEX       (1 << 6)
+#define CPUID3A_MSR_REFERENCE_TSC  (1 << 9)
 #define CPUID3A_MSR_FREQ           (1 << 11)
 
 /* Viridian CPUID 4000004, Implementation Recommendations. */
@@ -95,6 +97,8 @@ int cpuid_viridian_leaves(unsigned int leaf, unsigned int *eax,
             *eax |= CPUID3A_MSR_FREQ;
         if ( viridian_feature_mask(d) & HVMPV_time_ref_count )
             *eax |= CPUID3A_MSR_TIME_REF_COUNT;
+        if ( viridian_feature_mask(d) & HVMPV_reference_tsc )
+            *eax |= CPUID3A_MSR_REFERENCE_TSC;
         break;
     case 4:
         /* Recommended hypercall usage. */
@@ -155,6 +159,17 @@ static void dump_apic_assist(const struct vcpu *v)
            v, aa->fields.enabled, (unsigned long)aa->fields.pfn);
 }
 
+static void dump_reference_tsc(const struct domain *d)
+{
+    const union viridian_reference_tsc *rt;
+
+    rt = &d->arch.hvm_domain.viridian.reference_tsc;
+    
+    printk(XENLOG_G_INFO "d%d: VIRIDIAN REFERENCE_TSC: enabled: %x pfn: %lx\n",
+           d->domain_id,
+           rt->fields.enabled, (unsigned long)rt->fields.pfn);
+}
+
 static void enable_hypercall_page(struct domain *d)
 {
     unsigned long gmfn = d->arch.hvm_domain.viridian.hypercall_gpa.fields.pfn;
@@ -224,6 +239,78 @@ static void initialize_apic_assist(struct vcpu *v)
     put_page_and_type(page);
 }
 
+static void update_reference_tsc(struct domain *d, bool_t initialize)
+{
+    unsigned long gmfn = d->arch.hvm_domain.viridian.reference_tsc.fields.pfn;
+    struct page_info *page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
+    HV_REFERENCE_TSC_PAGE *p;
+
+    if ( !page || !get_page_type(page, PGT_writable_page) )
+    {
+        if ( page )
+            put_page(page);
+        gdprintk(XENLOG_WARNING, "Bad GMFN %lx (MFN %lx)\n", gmfn,
+                 page ? page_to_mfn(page) : INVALID_MFN);
+        return;
+    }
+
+    p = __map_domain_page(page);
+
+    if ( initialize )
+        clear_page(p);
+
+    /*
+     * This enlightenment must be disabled is the host TSC is not invariant.
+     * However it is also disabled if vtsc is true (which means rdtsc is being
+     * emulated). This generally happens when guest TSC freq and host TSC freq
+     * don't match. The TscScale value could be adjusted to cope with this,
+     * allowing vtsc to be turned off, but support for this is not yet present
+     * in the hypervisor. Thus is it is possible that migrating a Windows VM
+     * between hosts of differing TSC frequencies may result in large
+     * differences in guest performance.
+     */
+    if ( !host_tsc_is_safe() || d->arch.vtsc )
+    {
+        /*
+         * The specification states that valid values of TscSequence range
+         * from 0 to 0xFFFFFFFE. The value 0xFFFFFFFF is used to indicate
+         * this mechanism is no longer a reliable source of time and that
+         * the VM should fall back to a different source.
+         *
+         * Server 2012 (6.2 kernel) and 2012 R2 (6.3 kernel) actually violate
+         * the spec. and rely on a value of 0 to indicate that this
+         * enlightenment should no longer be used. These two kernel
+         * versions are currently the only ones to make use of this
+         * enlightenment, so just use 0 here.
+         */
+        p->TscSequence = 0;
+
+        printk(XENLOG_G_INFO "d%d: VIRIDIAN REFERENCE_TSC: invalidated\n",
+               d->domain_id);
+        return;
+    }
+
+    /*
+     * The guest will calculate reference time according to the following
+     * formula:
+     *
+     * ReferenceTime = ((RDTSC() * TscScale) >> 64) + TscOffset
+     *
+     * Windows uses a 100ns tick, so we need a scale which is cpu
+     * ticks per 100ns shifted left by 64.
+     */
+    p->TscScale = ((10000ul << 32) / d->arch.tsc_khz) << 32;
+
+    do {
+        p->TscSequence++;
+    } while ( p->TscSequence == 0xFFFFFFFF ||
+              p->TscSequence == 0 ); /* Avoid both 'invalid' values */
+
+    unmap_domain_page(p);
+
+    put_page_and_type(page);
+}
+
 int wrmsr_viridian_regs(uint32_t idx, uint64_t val)
 {
     struct vcpu *v = current;
@@ -282,6 +369,17 @@ int wrmsr_viridian_regs(uint32_t idx, uint64_t val)
             initialize_apic_assist(v);
         break;
 
+    case VIRIDIAN_MSR_REFERENCE_TSC:
+        if ( !(viridian_feature_mask(d) & HVMPV_reference_tsc) )
+            return 0;
+
+        perfc_incr(mshv_wrmsr_tsc_msr);
+        d->arch.hvm_domain.viridian.reference_tsc.raw = val;
+        dump_reference_tsc(d);
+        if ( d->arch.hvm_domain.viridian.reference_tsc.fields.enabled )
+            update_reference_tsc(d, 1);
+        break;
+
     default:
         return 0;
     }
@@ -346,6 +444,14 @@ int rdmsr_viridian_regs(uint32_t idx, uint64_t *val)
         *val = v->arch.hvm_vcpu.viridian.apic_assist.raw;
         break;
 
+    case VIRIDIAN_MSR_REFERENCE_TSC:
+        if ( !(viridian_feature_mask(d) & HVMPV_reference_tsc) )
+            return 0;
+
+        perfc_incr(mshv_rdmsr_tsc_msr);
+        *val = d->arch.hvm_domain.viridian.reference_tsc.raw;
+        break;
+
     case VIRIDIAN_MSR_TIME_REF_COUNT:
     {
         uint64_t tsc;
@@ -452,6 +558,7 @@ static int viridian_save_domain_ctxt(struct domain *d, hvm_domain_context_t *h)
 
     ctxt.hypercall_gpa = d->arch.hvm_domain.viridian.hypercall_gpa.raw;
     ctxt.guest_os_id   = d->arch.hvm_domain.viridian.guest_os_id.raw;
+    ctxt.reference_tsc = d->arch.hvm_domain.viridian.reference_tsc.raw;
 
     return (hvm_save_entry(VIRIDIAN_DOMAIN, 0, h, &ctxt) != 0);
 }
@@ -460,11 +567,15 @@ static int viridian_load_domain_ctxt(struct domain *d, hvm_domain_context_t *h)
 {
     struct hvm_viridian_domain_context ctxt;
 
-    if ( hvm_load_entry(VIRIDIAN_DOMAIN, h, &ctxt) != 0 )
+    if ( hvm_load_entry_zeroextend(VIRIDIAN_DOMAIN, h, &ctxt) != 0 )
         return -EINVAL;
 
     d->arch.hvm_domain.viridian.hypercall_gpa.raw = ctxt.hypercall_gpa;
     d->arch.hvm_domain.viridian.guest_os_id.raw   = ctxt.guest_os_id;
+    d->arch.hvm_domain.viridian.reference_tsc.raw = ctxt.reference_tsc;
+
+    if ( d->arch.hvm_domain.viridian.reference_tsc.fields.enabled )
+        update_reference_tsc(d, 0);
 
     return 0;
 }
diff --git a/xen/include/asm-x86/hvm/viridian.h b/xen/include/asm-x86/hvm/viridian.h
index 496da33..35a22f5 100644
--- a/xen/include/asm-x86/hvm/viridian.h
+++ b/xen/include/asm-x86/hvm/viridian.h
@@ -48,10 +48,35 @@ union viridian_hypercall_gpa
     } fields;
 };
 
+union viridian_reference_tsc
+{
+    uint64_t raw;
+    struct
+    {
+        uint64_t enabled:1;
+        uint64_t reserved_preserved:11;
+        uint64_t pfn:48;
+    } fields;
+};
+
+/*
+ * Type defintion as in Microsoft Hypervisor Top-Level Functional
+ * Specification v4.0a, section 15.4.2.
+ */
+typedef struct _HV_REFERENCE_TSC_PAGE
+{
+    uint32_t TscSequence;
+    uint32_t Reserved1;
+    uint64_t TscScale;
+    int64_t  TscOffset;
+    uint64_t Reserved2[509];
+} HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE;
+
 struct viridian_domain
 {
     union viridian_guest_os_id guest_os_id;
     union viridian_hypercall_gpa hypercall_gpa;
+    union viridian_reference_tsc reference_tsc;
 };
 
 int
@@ -76,3 +101,13 @@ int
 viridian_hypercall(struct cpu_user_regs *regs);
 
 #endif /* __ASM_X86_HVM_VIRIDIAN_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
index 16d85a3..006924b 100644
--- a/xen/include/public/arch-x86/hvm/save.h
+++ b/xen/include/public/arch-x86/hvm/save.h
@@ -568,6 +568,7 @@ struct hvm_hw_cpu_xsave {
 struct hvm_viridian_domain_context {
     uint64_t hypercall_gpa;
     uint64_t guest_os_id;
+    uint64_t reference_tsc;
 };
 
 DECLARE_HVM_SAVE_TYPE(VIRIDIAN_DOMAIN, 15, struct hvm_viridian_domain_context);
@@ -616,3 +617,13 @@ struct hvm_msr {
 #define HVM_SAVE_CODE_MAX 20
 
 #endif /* __XEN_PUBLIC_HVM_SAVE_X86_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/public/hvm/params.h b/xen/include/public/hvm/params.h
index 3c51072..a2d43bc 100644
--- a/xen/include/public/hvm/params.h
+++ b/xen/include/public/hvm/params.h
@@ -92,10 +92,15 @@
 #define _HVMPV_time_ref_count 2
 #define HVMPV_time_ref_count  (1 << _HVMPV_time_ref_count)
 
+/* Enable Reference TSC Page (HV_X64_MSR_REFERENCE_TSC) */
+#define _HVMPV_reference_tsc 3
+#define HVMPV_reference_tsc  (1 << _HVMPV_reference_tsc)
+
 #define HVMPV_feature_mask \
 	(HVMPV_base_freq | \
 	 HVMPV_no_freq | \
-	 HVMPV_time_ref_count)
+	 HVMPV_time_ref_count | \
+	 HVMPV_reference_tsc)
 
 #endif
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] x86/viridian: Add Partition Reference Time enlightenment
  2014-09-29 10:28 [PATCH v2] x86/viridian: Add Partition Reference Time enlightenment Paul Durrant
@ 2014-10-10 11:55 ` Egger, Christoph
  2014-10-13  8:53   ` Egger, Christoph
  2014-10-10 16:36 ` Matt Wilson
  1 sibling, 1 reply; 14+ messages in thread
From: Egger, Christoph @ 2014-10-10 11:55 UTC (permalink / raw)
  To: Paul Durrant, xen-devel
  Cc: Ian Jackson, Keir Fraser, Ian Campbell, Jan Beulich,
	Stefano Stabellini

On 29.09.14 12:28, Paul Durrant wrote:
> The presence of the partition reference time enlightenment persuades newer
> versions of Windows to prefer the TSC as their primary time source. Hence,
> if rdtsc is not being emulated and is invariant then many vmexits (for
> alternative time sources such as the HPET or reference counter MSR) can
> be avoided.
> 
> The implementation is not yet complete as no attempt is made to prevent
> emulation of rdtsc if the enlightenment is active and guest and host
> TSC frequencies differ. To do that requires invasive changes in the core
> x86 time code and hence a lot more testing.
> 
> This patch avoids the issue by disabling the enlightenment if rdtsc is
> being emulated, causing Windows to choose another time source. This is
> safe, but may cause a big variation in performance of guests migrated
> between hosts of differing TSC frequency. Thus the enlightenment is not
> enabled in the default set, but may be enabled to improve guest performance
> where such migrations are not a concern.
> 
> See section 15.4 of the Microsoft Hypervisor Top Level Functional
> Specification v4.0a for details.
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Cc: Keir Fraser <keir@xen.org>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Cc: Christoph Egger <chegger@amazon.de>

Reviewed-by: Christoph Egger <chegger@amazon.de>

Christoph

> ---
> v2: - Addressed comments from Jan:
>       - Leave enlightenment out of default set and add a comment explaining
>         why.
>       - De-dup code surrounding adjustment of sequence number on resume.
>       - Remove accidental code deletion.
> 
>  docs/man/xl.cfg.pod.5                  |    6 ++
>  tools/libxl/libxl_dom.c                |    3 +
>  tools/libxl/libxl_types.idl            |    1 +
>  xen/arch/x86/hvm/viridian.c            |  113 +++++++++++++++++++++++++++++++-
>  xen/include/asm-x86/hvm/viridian.h     |   35 ++++++++++
>  xen/include/public/arch-x86/hvm/save.h |   11 ++++
>  xen/include/public/hvm/params.h        |    7 +-
>  7 files changed, 174 insertions(+), 2 deletions(-)
> 
> diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
> index 8bba21c..925adbb 100644
> --- a/docs/man/xl.cfg.pod.5
> +++ b/docs/man/xl.cfg.pod.5
> @@ -1163,6 +1163,12 @@ This group incorporates Partition Time Reference Counter MSR. This
>  enlightenment can improve performance of Windows 8 and Windows
>  Server 2012 onwards.
>  
> +=item B<reference_tsc>
> +
> +This set incorporates the Partition Reference TSC MSR. This
> +enlightenment can improve performance of Windows 7 and Windows
> +Server 2008 R2 onwards.
> +
>  =item B<defaults>
>  
>  This is a special value that enables the default set of groups, which
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index bd21841..d04fc0d 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -262,6 +262,9 @@ static int hvm_set_viridian_features(libxl__gc *gc, uint32_t domid,
>      if (libxl_bitmap_test(&enlightenments, LIBXL_VIRIDIAN_ENLIGHTENMENT_TIME_REF_COUNT))
>          mask |= HVMPV_time_ref_count;
>  
> +    if (libxl_bitmap_test(&enlightenments, LIBXL_VIRIDIAN_ENLIGHTENMENT_REFERENCE_TSC))
> +        mask |= HVMPV_reference_tsc;
> +
>      if (mask != 0 &&
>          xc_hvm_param_set(CTX->xch,
>                           domid,
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index 90d152f..da75a40 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -183,6 +183,7 @@ libxl_viridian_enlightenment = Enumeration("viridian_enlightenment", [
>      (0, "base"),
>      (1, "freq"),
>      (2, "time_ref_count"),
> +    (3, "reference_tsc"),
>      ])
>  
>  #
> diff --git a/xen/arch/x86/hvm/viridian.c b/xen/arch/x86/hvm/viridian.c
> index 6726168..a7d3283 100644
> --- a/xen/arch/x86/hvm/viridian.c
> +++ b/xen/arch/x86/hvm/viridian.c
> @@ -21,6 +21,7 @@
>  #define VIRIDIAN_MSR_HYPERCALL                  0x40000001
>  #define VIRIDIAN_MSR_VP_INDEX                   0x40000002
>  #define VIRIDIAN_MSR_TIME_REF_COUNT             0x40000020
> +#define VIRIDIAN_MSR_REFERENCE_TSC              0x40000021
>  #define VIRIDIAN_MSR_TSC_FREQUENCY              0x40000022
>  #define VIRIDIAN_MSR_APIC_FREQUENCY             0x40000023
>  #define VIRIDIAN_MSR_EOI                        0x40000070
> @@ -40,6 +41,7 @@
>  #define CPUID3A_MSR_APIC_ACCESS    (1 << 4)
>  #define CPUID3A_MSR_HYPERCALL      (1 << 5)
>  #define CPUID3A_MSR_VP_INDEX       (1 << 6)
> +#define CPUID3A_MSR_REFERENCE_TSC  (1 << 9)
>  #define CPUID3A_MSR_FREQ           (1 << 11)
>  
>  /* Viridian CPUID 4000004, Implementation Recommendations. */
> @@ -95,6 +97,8 @@ int cpuid_viridian_leaves(unsigned int leaf, unsigned int *eax,
>              *eax |= CPUID3A_MSR_FREQ;
>          if ( viridian_feature_mask(d) & HVMPV_time_ref_count )
>              *eax |= CPUID3A_MSR_TIME_REF_COUNT;
> +        if ( viridian_feature_mask(d) & HVMPV_reference_tsc )
> +            *eax |= CPUID3A_MSR_REFERENCE_TSC;
>          break;
>      case 4:
>          /* Recommended hypercall usage. */
> @@ -155,6 +159,17 @@ static void dump_apic_assist(const struct vcpu *v)
>             v, aa->fields.enabled, (unsigned long)aa->fields.pfn);
>  }
>  
> +static void dump_reference_tsc(const struct domain *d)
> +{
> +    const union viridian_reference_tsc *rt;
> +
> +    rt = &d->arch.hvm_domain.viridian.reference_tsc;
> +    
> +    printk(XENLOG_G_INFO "d%d: VIRIDIAN REFERENCE_TSC: enabled: %x pfn: %lx\n",
> +           d->domain_id,
> +           rt->fields.enabled, (unsigned long)rt->fields.pfn);
> +}
> +
>  static void enable_hypercall_page(struct domain *d)
>  {
>      unsigned long gmfn = d->arch.hvm_domain.viridian.hypercall_gpa.fields.pfn;
> @@ -224,6 +239,78 @@ static void initialize_apic_assist(struct vcpu *v)
>      put_page_and_type(page);
>  }
>  
> +static void update_reference_tsc(struct domain *d, bool_t initialize)
> +{
> +    unsigned long gmfn = d->arch.hvm_domain.viridian.reference_tsc.fields.pfn;
> +    struct page_info *page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
> +    HV_REFERENCE_TSC_PAGE *p;
> +
> +    if ( !page || !get_page_type(page, PGT_writable_page) )
> +    {
> +        if ( page )
> +            put_page(page);
> +        gdprintk(XENLOG_WARNING, "Bad GMFN %lx (MFN %lx)\n", gmfn,
> +                 page ? page_to_mfn(page) : INVALID_MFN);
> +        return;
> +    }
> +
> +    p = __map_domain_page(page);
> +
> +    if ( initialize )
> +        clear_page(p);
> +
> +    /*
> +     * This enlightenment must be disabled is the host TSC is not invariant.
> +     * However it is also disabled if vtsc is true (which means rdtsc is being
> +     * emulated). This generally happens when guest TSC freq and host TSC freq
> +     * don't match. The TscScale value could be adjusted to cope with this,
> +     * allowing vtsc to be turned off, but support for this is not yet present
> +     * in the hypervisor. Thus is it is possible that migrating a Windows VM
> +     * between hosts of differing TSC frequencies may result in large
> +     * differences in guest performance.
> +     */
> +    if ( !host_tsc_is_safe() || d->arch.vtsc )
> +    {
> +        /*
> +         * The specification states that valid values of TscSequence range
> +         * from 0 to 0xFFFFFFFE. The value 0xFFFFFFFF is used to indicate
> +         * this mechanism is no longer a reliable source of time and that
> +         * the VM should fall back to a different source.
> +         *
> +         * Server 2012 (6.2 kernel) and 2012 R2 (6.3 kernel) actually violate
> +         * the spec. and rely on a value of 0 to indicate that this
> +         * enlightenment should no longer be used. These two kernel
> +         * versions are currently the only ones to make use of this
> +         * enlightenment, so just use 0 here.
> +         */
> +        p->TscSequence = 0;
> +
> +        printk(XENLOG_G_INFO "d%d: VIRIDIAN REFERENCE_TSC: invalidated\n",
> +               d->domain_id);
> +        return;
> +    }
> +
> +    /*
> +     * The guest will calculate reference time according to the following
> +     * formula:
> +     *
> +     * ReferenceTime = ((RDTSC() * TscScale) >> 64) + TscOffset
> +     *
> +     * Windows uses a 100ns tick, so we need a scale which is cpu
> +     * ticks per 100ns shifted left by 64.
> +     */
> +    p->TscScale = ((10000ul << 32) / d->arch.tsc_khz) << 32;
> +
> +    do {
> +        p->TscSequence++;
> +    } while ( p->TscSequence == 0xFFFFFFFF ||
> +              p->TscSequence == 0 ); /* Avoid both 'invalid' values */
> +
> +    unmap_domain_page(p);
> +
> +    put_page_and_type(page);
> +}
> +
>  int wrmsr_viridian_regs(uint32_t idx, uint64_t val)
>  {
>      struct vcpu *v = current;
> @@ -282,6 +369,17 @@ int wrmsr_viridian_regs(uint32_t idx, uint64_t val)
>              initialize_apic_assist(v);
>          break;
>  
> +    case VIRIDIAN_MSR_REFERENCE_TSC:
> +        if ( !(viridian_feature_mask(d) & HVMPV_reference_tsc) )
> +            return 0;
> +
> +        perfc_incr(mshv_wrmsr_tsc_msr);
> +        d->arch.hvm_domain.viridian.reference_tsc.raw = val;
> +        dump_reference_tsc(d);
> +        if ( d->arch.hvm_domain.viridian.reference_tsc.fields.enabled )
> +            update_reference_tsc(d, 1);
> +        break;
> +
>      default:
>          return 0;
>      }
> @@ -346,6 +444,14 @@ int rdmsr_viridian_regs(uint32_t idx, uint64_t *val)
>          *val = v->arch.hvm_vcpu.viridian.apic_assist.raw;
>          break;
>  
> +    case VIRIDIAN_MSR_REFERENCE_TSC:
> +        if ( !(viridian_feature_mask(d) & HVMPV_reference_tsc) )
> +            return 0;
> +
> +        perfc_incr(mshv_rdmsr_tsc_msr);
> +        *val = d->arch.hvm_domain.viridian.reference_tsc.raw;
> +        break;
> +
>      case VIRIDIAN_MSR_TIME_REF_COUNT:
>      {
>          uint64_t tsc;
> @@ -452,6 +558,7 @@ static int viridian_save_domain_ctxt(struct domain *d, hvm_domain_context_t *h)
>  
>      ctxt.hypercall_gpa = d->arch.hvm_domain.viridian.hypercall_gpa.raw;
>      ctxt.guest_os_id   = d->arch.hvm_domain.viridian.guest_os_id.raw;
> +    ctxt.reference_tsc = d->arch.hvm_domain.viridian.reference_tsc.raw;
>  
>      return (hvm_save_entry(VIRIDIAN_DOMAIN, 0, h, &ctxt) != 0);
>  }
> @@ -460,11 +567,15 @@ static int viridian_load_domain_ctxt(struct domain *d, hvm_domain_context_t *h)
>  {
>      struct hvm_viridian_domain_context ctxt;
>  
> -    if ( hvm_load_entry(VIRIDIAN_DOMAIN, h, &ctxt) != 0 )
> +    if ( hvm_load_entry_zeroextend(VIRIDIAN_DOMAIN, h, &ctxt) != 0 )
>          return -EINVAL;
>  
>      d->arch.hvm_domain.viridian.hypercall_gpa.raw = ctxt.hypercall_gpa;
>      d->arch.hvm_domain.viridian.guest_os_id.raw   = ctxt.guest_os_id;
> +    d->arch.hvm_domain.viridian.reference_tsc.raw = ctxt.reference_tsc;
> +
> +    if ( d->arch.hvm_domain.viridian.reference_tsc.fields.enabled )
> +        update_reference_tsc(d, 0);
>  
>      return 0;
>  }
> diff --git a/xen/include/asm-x86/hvm/viridian.h b/xen/include/asm-x86/hvm/viridian.h
> index 496da33..35a22f5 100644
> --- a/xen/include/asm-x86/hvm/viridian.h
> +++ b/xen/include/asm-x86/hvm/viridian.h
> @@ -48,10 +48,35 @@ union viridian_hypercall_gpa
>      } fields;
>  };
>  
> +union viridian_reference_tsc
> +{
> +    uint64_t raw;
> +    struct
> +    {
> +        uint64_t enabled:1;
> +        uint64_t reserved_preserved:11;
> +        uint64_t pfn:48;
> +    } fields;
> +};
> +
> +/*
> + * Type defintion as in Microsoft Hypervisor Top-Level Functional
> + * Specification v4.0a, section 15.4.2.
> + */
> +typedef struct _HV_REFERENCE_TSC_PAGE
> +{
> +    uint32_t TscSequence;
> +    uint32_t Reserved1;
> +    uint64_t TscScale;
> +    int64_t  TscOffset;
> +    uint64_t Reserved2[509];
> +} HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE;
> +
>  struct viridian_domain
>  {
>      union viridian_guest_os_id guest_os_id;
>      union viridian_hypercall_gpa hypercall_gpa;
> +    union viridian_reference_tsc reference_tsc;
>  };
>  
>  int
> @@ -76,3 +101,13 @@ int
>  viridian_hypercall(struct cpu_user_regs *regs);
>  
>  #endif /* __ASM_X86_HVM_VIRIDIAN_H__ */
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
> index 16d85a3..006924b 100644
> --- a/xen/include/public/arch-x86/hvm/save.h
> +++ b/xen/include/public/arch-x86/hvm/save.h
> @@ -568,6 +568,7 @@ struct hvm_hw_cpu_xsave {
>  struct hvm_viridian_domain_context {
>      uint64_t hypercall_gpa;
>      uint64_t guest_os_id;
> +    uint64_t reference_tsc;
>  };
>  
>  DECLARE_HVM_SAVE_TYPE(VIRIDIAN_DOMAIN, 15, struct hvm_viridian_domain_context);
> @@ -616,3 +617,13 @@ struct hvm_msr {
>  #define HVM_SAVE_CODE_MAX 20
>  
>  #endif /* __XEN_PUBLIC_HVM_SAVE_X86_H__ */
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/include/public/hvm/params.h b/xen/include/public/hvm/params.h
> index 3c51072..a2d43bc 100644
> --- a/xen/include/public/hvm/params.h
> +++ b/xen/include/public/hvm/params.h
> @@ -92,10 +92,15 @@
>  #define _HVMPV_time_ref_count 2
>  #define HVMPV_time_ref_count  (1 << _HVMPV_time_ref_count)
>  
> +/* Enable Reference TSC Page (HV_X64_MSR_REFERENCE_TSC) */
> +#define _HVMPV_reference_tsc 3
> +#define HVMPV_reference_tsc  (1 << _HVMPV_reference_tsc)
> +
>  #define HVMPV_feature_mask \
>  	(HVMPV_base_freq | \
>  	 HVMPV_no_freq | \
> -	 HVMPV_time_ref_count)
> +	 HVMPV_time_ref_count | \
> +	 HVMPV_reference_tsc)
>  
>  #endif
>  
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] x86/viridian: Add Partition Reference Time enlightenment
  2014-09-29 10:28 [PATCH v2] x86/viridian: Add Partition Reference Time enlightenment Paul Durrant
  2014-10-10 11:55 ` Egger, Christoph
@ 2014-10-10 16:36 ` Matt Wilson
  2014-10-13  8:10   ` Jan Beulich
  1 sibling, 1 reply; 14+ messages in thread
From: Matt Wilson @ 2014-10-10 16:36 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Christoph Egger,
	Ian Jackson, xen-devel, Jan Beulich, Anthony Liguori

On Mon, Sep 29, 2014 at 11:28:44AM +0100, Paul Durrant wrote:
> The presence of the partition reference time enlightenment persuades newer
> versions of Windows to prefer the TSC as their primary time source. Hence,
> if rdtsc is not being emulated and is invariant then many vmexits (for
> alternative time sources such as the HPET or reference counter MSR) can
> be avoided.
> 
> The implementation is not yet complete as no attempt is made to prevent
> emulation of rdtsc if the enlightenment is active and guest and host
> TSC frequencies differ. To do that requires invasive changes in the core
> x86 time code and hence a lot more testing.
> 
> This patch avoids the issue by disabling the enlightenment if rdtsc is
> being emulated, causing Windows to choose another time source. This is
> safe, but may cause a big variation in performance of guests migrated
> between hosts of differing TSC frequency. Thus the enlightenment is not
> enabled in the default set, but may be enabled to improve guest performance
> where such migrations are not a concern.
> 
> See section 15.4 of the Microsoft Hypervisor Top Level Functional
> Specification v4.0a for details.
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Cc: Keir Fraser <keir@xen.org>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Cc: Christoph Egger <chegger@amazon.de>

[...]

> +    /*
> +     * The guest will calculate reference time according to the following
> +     * formula:
> +     *
> +     * ReferenceTime = ((RDTSC() * TscScale) >> 64) + TscOffset
> +     *
> +     * Windows uses a 100ns tick, so we need a scale which is cpu
> +     * ticks per 100ns shifted left by 64.
> +     */
> +    p->TscScale = ((10000ul << 32) / d->arch.tsc_khz) << 32;
> +
> +    do {
> +        p->TscSequence++;
> +    } while ( p->TscSequence == 0xFFFFFFFF ||
> +              p->TscSequence == 0 ); /* Avoid both 'invalid' values */

Anthony Liguori and I were looking this over today and he pointed
something out: couldn't a second vCPU of the guest write 0 or
0xffffffff in a tight loop to cause a hypervisor DoS?

--msw

> +    unmap_domain_page(p);
> +
> +    put_page_and_type(page);
> +}
> +
[...]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] x86/viridian: Add Partition Reference Time enlightenment
  2014-10-10 16:36 ` Matt Wilson
@ 2014-10-13  8:10   ` Jan Beulich
  2014-10-14  7:45     ` Ian Campbell
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2014-10-13  8:10 UTC (permalink / raw)
  To: Paul Durrant, Matt Wilson
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Christoph Egger,
	Ian Jackson, xen-devel, Anthony Liguori

>>> On 10.10.14 at 18:36, <msw@linux.com> wrote:
> On Mon, Sep 29, 2014 at 11:28:44AM +0100, Paul Durrant wrote:
>> +    /*
>> +     * The guest will calculate reference time according to the following
>> +     * formula:
>> +     *
>> +     * ReferenceTime = ((RDTSC() * TscScale) >> 64) + TscOffset
>> +     *
>> +     * Windows uses a 100ns tick, so we need a scale which is cpu
>> +     * ticks per 100ns shifted left by 64.
>> +     */
>> +    p->TscScale = ((10000ul << 32) / d->arch.tsc_khz) << 32;
>> +
>> +    do {
>> +        p->TscSequence++;
>> +    } while ( p->TscSequence == 0xFFFFFFFF ||
>> +              p->TscSequence == 0 ); /* Avoid both 'invalid' values */
> 
> Anthony Liguori and I were looking this over today and he pointed
> something out: couldn't a second vCPU of the guest write 0 or
> 0xffffffff in a tight loop to cause a hypervisor DoS?

Yes, this is at least a theoretical issue that should be fixed. I don't
think it's a practical issue though: I'd expect the compiler to eliminate
the two reads of the field and instead directly use the result of the
increment.

Jan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] x86/viridian: Add Partition Reference Time enlightenment
  2014-10-10 11:55 ` Egger, Christoph
@ 2014-10-13  8:53   ` Egger, Christoph
  2014-10-13  9:31     ` Jan Beulich
  2014-10-13 10:33     ` Paul Durrant
  0 siblings, 2 replies; 14+ messages in thread
From: Egger, Christoph @ 2014-10-13  8:53 UTC (permalink / raw)
  To: Paul Durrant, xen-devel
  Cc: Keir Fraser, Ian Jackson, Ian Campbell, Jan Beulich,
	Stefano Stabellini

On 10.10.14 13:55, Egger, Christoph wrote:
> On 29.09.14 12:28, Paul Durrant wrote:
>> The presence of the partition reference time enlightenment persuades newer
>> versions of Windows to prefer the TSC as their primary time source. Hence,
>> if rdtsc is not being emulated and is invariant then many vmexits (for
>> alternative time sources such as the HPET or reference counter MSR) can
>> be avoided.
>>
>> The implementation is not yet complete as no attempt is made to prevent
>> emulation of rdtsc if the enlightenment is active and guest and host
>> TSC frequencies differ. To do that requires invasive changes in the core
>> x86 time code and hence a lot more testing.
>>
>> This patch avoids the issue by disabling the enlightenment if rdtsc is
>> being emulated, causing Windows to choose another time source. This is
>> safe, but may cause a big variation in performance of guests migrated
>> between hosts of differing TSC frequency. Thus the enlightenment is not
>> enabled in the default set, but may be enabled to improve guest performance
>> where such migrations are not a concern.
>>
>> See section 15.4 of the Microsoft Hypervisor Top Level Functional
>> Specification v4.0a for details.
>>
>> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
>> Cc: Keir Fraser <keir@xen.org>
>> Cc: Jan Beulich <jbeulich@suse.com>
>> Cc: Ian Campbell <ian.campbell@citrix.com>
>> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
>> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>> Cc: Christoph Egger <chegger@amazon.de>
> 
> Reviewed-by: Christoph Egger <chegger@amazon.de>

I found one issue in update_reference_tsc(). See below.

Christoph

> 
> Christoph
> 
>> ---
>> v2: - Addressed comments from Jan:
>>       - Leave enlightenment out of default set and add a comment explaining
>>         why.
>>       - De-dup code surrounding adjustment of sequence number on resume.
>>       - Remove accidental code deletion.
>>
>>  docs/man/xl.cfg.pod.5                  |    6 ++
>>  tools/libxl/libxl_dom.c                |    3 +
>>  tools/libxl/libxl_types.idl            |    1 +
>>  xen/arch/x86/hvm/viridian.c            |  113 +++++++++++++++++++++++++++++++-
>>  xen/include/asm-x86/hvm/viridian.h     |   35 ++++++++++
>>  xen/include/public/arch-x86/hvm/save.h |   11 ++++
>>  xen/include/public/hvm/params.h        |    7 +-
>>  7 files changed, 174 insertions(+), 2 deletions(-)
>>
>> diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
>> index 8bba21c..925adbb 100644
>> --- a/docs/man/xl.cfg.pod.5
>> +++ b/docs/man/xl.cfg.pod.5
>> @@ -1163,6 +1163,12 @@ This group incorporates Partition Time Reference Counter MSR. This
>>  enlightenment can improve performance of Windows 8 and Windows
>>  Server 2012 onwards.
>>  
>> +=item B<reference_tsc>
>> +
>> +This set incorporates the Partition Reference TSC MSR. This
>> +enlightenment can improve performance of Windows 7 and Windows
>> +Server 2008 R2 onwards.
>> +
>>  =item B<defaults>
>>  
>>  This is a special value that enables the default set of groups, which
>> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
>> index bd21841..d04fc0d 100644
>> --- a/tools/libxl/libxl_dom.c
>> +++ b/tools/libxl/libxl_dom.c
>> @@ -262,6 +262,9 @@ static int hvm_set_viridian_features(libxl__gc *gc, uint32_t domid,
>>      if (libxl_bitmap_test(&enlightenments, LIBXL_VIRIDIAN_ENLIGHTENMENT_TIME_REF_COUNT))
>>          mask |= HVMPV_time_ref_count;
>>  
>> +    if (libxl_bitmap_test(&enlightenments, LIBXL_VIRIDIAN_ENLIGHTENMENT_REFERENCE_TSC))
>> +        mask |= HVMPV_reference_tsc;
>> +
>>      if (mask != 0 &&
>>          xc_hvm_param_set(CTX->xch,
>>                           domid,
>> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
>> index 90d152f..da75a40 100644
>> --- a/tools/libxl/libxl_types.idl
>> +++ b/tools/libxl/libxl_types.idl
>> @@ -183,6 +183,7 @@ libxl_viridian_enlightenment = Enumeration("viridian_enlightenment", [
>>      (0, "base"),
>>      (1, "freq"),
>>      (2, "time_ref_count"),
>> +    (3, "reference_tsc"),
>>      ])
>>  
>>  #
>> diff --git a/xen/arch/x86/hvm/viridian.c b/xen/arch/x86/hvm/viridian.c
>> index 6726168..a7d3283 100644
>> --- a/xen/arch/x86/hvm/viridian.c
>> +++ b/xen/arch/x86/hvm/viridian.c
>> @@ -21,6 +21,7 @@
>>  #define VIRIDIAN_MSR_HYPERCALL                  0x40000001
>>  #define VIRIDIAN_MSR_VP_INDEX                   0x40000002
>>  #define VIRIDIAN_MSR_TIME_REF_COUNT             0x40000020
>> +#define VIRIDIAN_MSR_REFERENCE_TSC              0x40000021
>>  #define VIRIDIAN_MSR_TSC_FREQUENCY              0x40000022
>>  #define VIRIDIAN_MSR_APIC_FREQUENCY             0x40000023
>>  #define VIRIDIAN_MSR_EOI                        0x40000070
>> @@ -40,6 +41,7 @@
>>  #define CPUID3A_MSR_APIC_ACCESS    (1 << 4)
>>  #define CPUID3A_MSR_HYPERCALL      (1 << 5)
>>  #define CPUID3A_MSR_VP_INDEX       (1 << 6)
>> +#define CPUID3A_MSR_REFERENCE_TSC  (1 << 9)
>>  #define CPUID3A_MSR_FREQ           (1 << 11)
>>  
>>  /* Viridian CPUID 4000004, Implementation Recommendations. */
>> @@ -95,6 +97,8 @@ int cpuid_viridian_leaves(unsigned int leaf, unsigned int *eax,
>>              *eax |= CPUID3A_MSR_FREQ;
>>          if ( viridian_feature_mask(d) & HVMPV_time_ref_count )
>>              *eax |= CPUID3A_MSR_TIME_REF_COUNT;
>> +        if ( viridian_feature_mask(d) & HVMPV_reference_tsc )
>> +            *eax |= CPUID3A_MSR_REFERENCE_TSC;
>>          break;
>>      case 4:
>>          /* Recommended hypercall usage. */
>> @@ -155,6 +159,17 @@ static void dump_apic_assist(const struct vcpu *v)
>>             v, aa->fields.enabled, (unsigned long)aa->fields.pfn);
>>  }
>>  
>> +static void dump_reference_tsc(const struct domain *d)
>> +{
>> +    const union viridian_reference_tsc *rt;
>> +
>> +    rt = &d->arch.hvm_domain.viridian.reference_tsc;
>> +    
>> +    printk(XENLOG_G_INFO "d%d: VIRIDIAN REFERENCE_TSC: enabled: %x pfn: %lx\n",
>> +           d->domain_id,
>> +           rt->fields.enabled, (unsigned long)rt->fields.pfn);
>> +}
>> +
>>  static void enable_hypercall_page(struct domain *d)
>>  {
>>      unsigned long gmfn = d->arch.hvm_domain.viridian.hypercall_gpa.fields.pfn;
>> @@ -224,6 +239,78 @@ static void initialize_apic_assist(struct vcpu *v)
>>      put_page_and_type(page);
>>  }
>>  
>> +static void update_reference_tsc(struct domain *d, bool_t initialize)
>> +{
>> +    unsigned long gmfn = d->arch.hvm_domain.viridian.reference_tsc.fields.pfn;
>> +    struct page_info *page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
>> +    HV_REFERENCE_TSC_PAGE *p;
>> +
>> +    if ( !page || !get_page_type(page, PGT_writable_page) )
>> +    {
>> +        if ( page )
>> +            put_page(page);
>> +        gdprintk(XENLOG_WARNING, "Bad GMFN %lx (MFN %lx)\n", gmfn,
>> +                 page ? page_to_mfn(page) : INVALID_MFN);
>> +        return;
>> +    }
>> +
>> +    p = __map_domain_page(page);
>> +
>> +    if ( initialize )
>> +        clear_page(p);
>> +
>> +    /*
>> +     * This enlightenment must be disabled is the host TSC is not invariant.
>> +     * However it is also disabled if vtsc is true (which means rdtsc is being
>> +     * emulated). This generally happens when guest TSC freq and host TSC freq
>> +     * don't match. The TscScale value could be adjusted to cope with this,
>> +     * allowing vtsc to be turned off, but support for this is not yet present
>> +     * in the hypervisor. Thus is it is possible that migrating a Windows VM
>> +     * between hosts of differing TSC frequencies may result in large
>> +     * differences in guest performance.
>> +     */
>> +    if ( !host_tsc_is_safe() || d->arch.vtsc )
>> +    {
>> +        /*
>> +         * The specification states that valid values of TscSequence range
>> +         * from 0 to 0xFFFFFFFE. The value 0xFFFFFFFF is used to indicate
>> +         * this mechanism is no longer a reliable source of time and that
>> +         * the VM should fall back to a different source.
>> +         *
>> +         * Server 2012 (6.2 kernel) and 2012 R2 (6.3 kernel) actually violate
>> +         * the spec. and rely on a value of 0 to indicate that this
>> +         * enlightenment should no longer be used. These two kernel
>> +         * versions are currently the only ones to make use of this
>> +         * enlightenment, so just use 0 here.
>> +         */
>> +        p->TscSequence = 0;
>> +
>> +        printk(XENLOG_G_INFO "d%d: VIRIDIAN REFERENCE_TSC: invalidated\n",
>> +               d->domain_id);
>> +        return;
>> +    }
>> +
>> +    /*
>> +     * The guest will calculate reference time according to the following
>> +     * formula:
>> +     *
>> +     * ReferenceTime = ((RDTSC() * TscScale) >> 64) + TscOffset
>> +     *
>> +     * Windows uses a 100ns tick, so we need a scale which is cpu
>> +     * ticks per 100ns shifted left by 64.
>> +     */
>> +    p->TscScale = ((10000ul << 32) / d->arch.tsc_khz) << 32;
>> +
>> +    do {
>> +        p->TscSequence++;
>> +    } while ( p->TscSequence == 0xFFFFFFFF ||
>> +              p->TscSequence == 0 ); /* Avoid both 'invalid' values */

This is reading guest memory so a malicious guest can spin writing 0 and
cause a DoS.

Better do here:

        p->TscSequence++;
        if ( p->TscSequence == 0xFFFFFFFF || p->TscSequence == 0 )
            p->TscSequence = 1;

Christoph

>> +
>> +    unmap_domain_page(p);
>> +
>> +    put_page_and_type(page);
>> +}
>> +
>>  int wrmsr_viridian_regs(uint32_t idx, uint64_t val)
>>  {
>>      struct vcpu *v = current;
>> @@ -282,6 +369,17 @@ int wrmsr_viridian_regs(uint32_t idx, uint64_t val)
>>              initialize_apic_assist(v);
>>          break;
>>  
>> +    case VIRIDIAN_MSR_REFERENCE_TSC:
>> +        if ( !(viridian_feature_mask(d) & HVMPV_reference_tsc) )
>> +            return 0;
>> +
>> +        perfc_incr(mshv_wrmsr_tsc_msr);
>> +        d->arch.hvm_domain.viridian.reference_tsc.raw = val;
>> +        dump_reference_tsc(d);
>> +        if ( d->arch.hvm_domain.viridian.reference_tsc.fields.enabled )
>> +            update_reference_tsc(d, 1);
>> +        break;
>> +
>>      default:
>>          return 0;
>>      }
>> @@ -346,6 +444,14 @@ int rdmsr_viridian_regs(uint32_t idx, uint64_t *val)
>>          *val = v->arch.hvm_vcpu.viridian.apic_assist.raw;
>>          break;
>>  
>> +    case VIRIDIAN_MSR_REFERENCE_TSC:
>> +        if ( !(viridian_feature_mask(d) & HVMPV_reference_tsc) )
>> +            return 0;
>> +
>> +        perfc_incr(mshv_rdmsr_tsc_msr);
>> +        *val = d->arch.hvm_domain.viridian.reference_tsc.raw;
>> +        break;
>> +
>>      case VIRIDIAN_MSR_TIME_REF_COUNT:
>>      {
>>          uint64_t tsc;
>> @@ -452,6 +558,7 @@ static int viridian_save_domain_ctxt(struct domain *d, hvm_domain_context_t *h)
>>  
>>      ctxt.hypercall_gpa = d->arch.hvm_domain.viridian.hypercall_gpa.raw;
>>      ctxt.guest_os_id   = d->arch.hvm_domain.viridian.guest_os_id.raw;
>> +    ctxt.reference_tsc = d->arch.hvm_domain.viridian.reference_tsc.raw;
>>  
>>      return (hvm_save_entry(VIRIDIAN_DOMAIN, 0, h, &ctxt) != 0);
>>  }
>> @@ -460,11 +567,15 @@ static int viridian_load_domain_ctxt(struct domain *d, hvm_domain_context_t *h)
>>  {
>>      struct hvm_viridian_domain_context ctxt;
>>  
>> -    if ( hvm_load_entry(VIRIDIAN_DOMAIN, h, &ctxt) != 0 )
>> +    if ( hvm_load_entry_zeroextend(VIRIDIAN_DOMAIN, h, &ctxt) != 0 )
>>          return -EINVAL;
>>  
>>      d->arch.hvm_domain.viridian.hypercall_gpa.raw = ctxt.hypercall_gpa;
>>      d->arch.hvm_domain.viridian.guest_os_id.raw   = ctxt.guest_os_id;
>> +    d->arch.hvm_domain.viridian.reference_tsc.raw = ctxt.reference_tsc;
>> +
>> +    if ( d->arch.hvm_domain.viridian.reference_tsc.fields.enabled )
>> +        update_reference_tsc(d, 0);
>>  
>>      return 0;
>>  }
>> diff --git a/xen/include/asm-x86/hvm/viridian.h b/xen/include/asm-x86/hvm/viridian.h
>> index 496da33..35a22f5 100644
>> --- a/xen/include/asm-x86/hvm/viridian.h
>> +++ b/xen/include/asm-x86/hvm/viridian.h
>> @@ -48,10 +48,35 @@ union viridian_hypercall_gpa
>>      } fields;
>>  };
>>  
>> +union viridian_reference_tsc
>> +{
>> +    uint64_t raw;
>> +    struct
>> +    {
>> +        uint64_t enabled:1;
>> +        uint64_t reserved_preserved:11;
>> +        uint64_t pfn:48;
>> +    } fields;
>> +};
>> +
>> +/*
>> + * Type defintion as in Microsoft Hypervisor Top-Level Functional
>> + * Specification v4.0a, section 15.4.2.
>> + */
>> +typedef struct _HV_REFERENCE_TSC_PAGE
>> +{
>> +    uint32_t TscSequence;
>> +    uint32_t Reserved1;
>> +    uint64_t TscScale;
>> +    int64_t  TscOffset;
>> +    uint64_t Reserved2[509];
>> +} HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE;
>> +
>>  struct viridian_domain
>>  {
>>      union viridian_guest_os_id guest_os_id;
>>      union viridian_hypercall_gpa hypercall_gpa;
>> +    union viridian_reference_tsc reference_tsc;
>>  };
>>  
>>  int
>> @@ -76,3 +101,13 @@ int
>>  viridian_hypercall(struct cpu_user_regs *regs);
>>  
>>  #endif /* __ASM_X86_HVM_VIRIDIAN_H__ */
>> +
>> +/*
>> + * Local variables:
>> + * mode: C
>> + * c-file-style: "BSD"
>> + * c-basic-offset: 4
>> + * tab-width: 4
>> + * indent-tabs-mode: nil
>> + * End:
>> + */
>> diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
>> index 16d85a3..006924b 100644
>> --- a/xen/include/public/arch-x86/hvm/save.h
>> +++ b/xen/include/public/arch-x86/hvm/save.h
>> @@ -568,6 +568,7 @@ struct hvm_hw_cpu_xsave {
>>  struct hvm_viridian_domain_context {
>>      uint64_t hypercall_gpa;
>>      uint64_t guest_os_id;
>> +    uint64_t reference_tsc;
>>  };
>>  
>>  DECLARE_HVM_SAVE_TYPE(VIRIDIAN_DOMAIN, 15, struct hvm_viridian_domain_context);
>> @@ -616,3 +617,13 @@ struct hvm_msr {
>>  #define HVM_SAVE_CODE_MAX 20
>>  
>>  #endif /* __XEN_PUBLIC_HVM_SAVE_X86_H__ */
>> +
>> +/*
>> + * Local variables:
>> + * mode: C
>> + * c-file-style: "BSD"
>> + * c-basic-offset: 4
>> + * tab-width: 4
>> + * indent-tabs-mode: nil
>> + * End:
>> + */
>> diff --git a/xen/include/public/hvm/params.h b/xen/include/public/hvm/params.h
>> index 3c51072..a2d43bc 100644
>> --- a/xen/include/public/hvm/params.h
>> +++ b/xen/include/public/hvm/params.h
>> @@ -92,10 +92,15 @@
>>  #define _HVMPV_time_ref_count 2
>>  #define HVMPV_time_ref_count  (1 << _HVMPV_time_ref_count)
>>  
>> +/* Enable Reference TSC Page (HV_X64_MSR_REFERENCE_TSC) */
>> +#define _HVMPV_reference_tsc 3
>> +#define HVMPV_reference_tsc  (1 << _HVMPV_reference_tsc)
>> +
>>  #define HVMPV_feature_mask \
>>  	(HVMPV_base_freq | \
>>  	 HVMPV_no_freq | \
>> -	 HVMPV_time_ref_count)
>> +	 HVMPV_time_ref_count | \
>> +	 HVMPV_reference_tsc)
>>  
>>  #endif
>>  
>>
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] x86/viridian: Add Partition Reference Time enlightenment
  2014-10-13  8:53   ` Egger, Christoph
@ 2014-10-13  9:31     ` Jan Beulich
  2014-10-13 10:33     ` Paul Durrant
  1 sibling, 0 replies; 14+ messages in thread
From: Jan Beulich @ 2014-10-13  9:31 UTC (permalink / raw)
  To: Christoph Egger, Paul Durrant, xen-devel
  Cc: Keir Fraser, Ian Jackson, IanCampbell, Stefano Stabellini

>>> On 13.10.14 at 10:53, <chegger@amazon.de> wrote:
> On 10.10.14 13:55, Egger, Christoph wrote:
>> On 29.09.14 12:28, Paul Durrant wrote:
>> Reviewed-by: Christoph Egger <chegger@amazon.de>
> 
> I found one issue in update_reference_tsc(). See below.

As pointed out before.

>>> +    /*
>>> +     * The guest will calculate reference time according to the following
>>> +     * formula:
>>> +     *
>>> +     * ReferenceTime = ((RDTSC() * TscScale) >> 64) + TscOffset
>>> +     *
>>> +     * Windows uses a 100ns tick, so we need a scale which is cpu
>>> +     * ticks per 100ns shifted left by 64.
>>> +     */
>>> +    p->TscScale = ((10000ul << 32) / d->arch.tsc_khz) << 32;
>>> +
>>> +    do {
>>> +        p->TscSequence++;
>>> +    } while ( p->TscSequence == 0xFFFFFFFF ||
>>> +              p->TscSequence == 0 ); /* Avoid both 'invalid' values */
> 
> This is reading guest memory so a malicious guest can spin writing 0 and
> cause a DoS.
> 
> Better do here:
> 
>         p->TscSequence++;
>         if ( p->TscSequence == 0xFFFFFFFF || p->TscSequence == 0 )
>             p->TscSequence = 1;

	switch ( ++p->TscSequence )
	{
	case 0: case 0xffffffff:
		p->TscSequence = 1;
	}

And then again I would think it's wrong to store an "invalid" value
here even temporarily. I.e. we'd probably be better off computing
old value plus one into a local variable, skip the "invalid" values if
necessary, and only then store the new sequence number.

Jan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] x86/viridian: Add Partition Reference Time enlightenment
  2014-10-13  8:53   ` Egger, Christoph
  2014-10-13  9:31     ` Jan Beulich
@ 2014-10-13 10:33     ` Paul Durrant
  1 sibling, 0 replies; 14+ messages in thread
From: Paul Durrant @ 2014-10-13 10:33 UTC (permalink / raw)
  To: Egger, Christoph, xen-devel@lists.xen.org
  Cc: Ian Jackson, Stefano Stabellini, Keir (Xen.org), Ian Campbell,
	Jan Beulich

> -----Original Message-----
> From: Egger, Christoph [mailto:chegger@amazon.de]
> Sent: 13 October 2014 09:54
> To: Paul Durrant; xen-devel@lists.xen.org
> Cc: Ian Jackson; Keir (Xen.org); Ian Campbell; Jan Beulich; Stefano Stabellini
> Subject: Re: [Xen-devel] [PATCH v2] x86/viridian: Add Partition Reference
> Time enlightenment
> 
> On 10.10.14 13:55, Egger, Christoph wrote:
> > On 29.09.14 12:28, Paul Durrant wrote:
> >> The presence of the partition reference time enlightenment persuades
> newer
> >> versions of Windows to prefer the TSC as their primary time source.
> Hence,
> >> if rdtsc is not being emulated and is invariant then many vmexits (for
> >> alternative time sources such as the HPET or reference counter MSR) can
> >> be avoided.
> >>
> >> The implementation is not yet complete as no attempt is made to prevent
> >> emulation of rdtsc if the enlightenment is active and guest and host
> >> TSC frequencies differ. To do that requires invasive changes in the core
> >> x86 time code and hence a lot more testing.
> >>
> >> This patch avoids the issue by disabling the enlightenment if rdtsc is
> >> being emulated, causing Windows to choose another time source. This is
> >> safe, but may cause a big variation in performance of guests migrated
> >> between hosts of differing TSC frequency. Thus the enlightenment is not
> >> enabled in the default set, but may be enabled to improve guest
> performance
> >> where such migrations are not a concern.
> >>
> >> See section 15.4 of the Microsoft Hypervisor Top Level Functional
> >> Specification v4.0a for details.
> >>
> >> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> >> Cc: Keir Fraser <keir@xen.org>
> >> Cc: Jan Beulich <jbeulich@suse.com>
> >> Cc: Ian Campbell <ian.campbell@citrix.com>
> >> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> >> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> >> Cc: Christoph Egger <chegger@amazon.de>
> >
> > Reviewed-by: Christoph Egger <chegger@amazon.de>
> 
> I found one issue in update_reference_tsc(). See below.
> 
> Christoph
> 
> >
> > Christoph
> >
> >> ---
> >> v2: - Addressed comments from Jan:
> >>       - Leave enlightenment out of default set and add a comment
> explaining
> >>         why.
> >>       - De-dup code surrounding adjustment of sequence number on
> resume.
> >>       - Remove accidental code deletion.
> >>
> >>  docs/man/xl.cfg.pod.5                  |    6 ++
> >>  tools/libxl/libxl_dom.c                |    3 +
> >>  tools/libxl/libxl_types.idl            |    1 +
> >>  xen/arch/x86/hvm/viridian.c            |  113
> +++++++++++++++++++++++++++++++-
> >>  xen/include/asm-x86/hvm/viridian.h     |   35 ++++++++++
> >>  xen/include/public/arch-x86/hvm/save.h |   11 ++++
> >>  xen/include/public/hvm/params.h        |    7 +-
> >>  7 files changed, 174 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
> >> index 8bba21c..925adbb 100644
> >> --- a/docs/man/xl.cfg.pod.5
> >> +++ b/docs/man/xl.cfg.pod.5
> >> @@ -1163,6 +1163,12 @@ This group incorporates Partition Time
> Reference Counter MSR. This
> >>  enlightenment can improve performance of Windows 8 and Windows
> >>  Server 2012 onwards.
> >>
> >> +=item B<reference_tsc>
> >> +
> >> +This set incorporates the Partition Reference TSC MSR. This
> >> +enlightenment can improve performance of Windows 7 and Windows
> >> +Server 2008 R2 onwards.
> >> +
> >>  =item B<defaults>
> >>
> >>  This is a special value that enables the default set of groups, which
> >> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> >> index bd21841..d04fc0d 100644
> >> --- a/tools/libxl/libxl_dom.c
> >> +++ b/tools/libxl/libxl_dom.c
> >> @@ -262,6 +262,9 @@ static int hvm_set_viridian_features(libxl__gc *gc,
> uint32_t domid,
> >>      if (libxl_bitmap_test(&enlightenments,
> LIBXL_VIRIDIAN_ENLIGHTENMENT_TIME_REF_COUNT))
> >>          mask |= HVMPV_time_ref_count;
> >>
> >> +    if (libxl_bitmap_test(&enlightenments,
> LIBXL_VIRIDIAN_ENLIGHTENMENT_REFERENCE_TSC))
> >> +        mask |= HVMPV_reference_tsc;
> >> +
> >>      if (mask != 0 &&
> >>          xc_hvm_param_set(CTX->xch,
> >>                           domid,
> >> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> >> index 90d152f..da75a40 100644
> >> --- a/tools/libxl/libxl_types.idl
> >> +++ b/tools/libxl/libxl_types.idl
> >> @@ -183,6 +183,7 @@ libxl_viridian_enlightenment =
> Enumeration("viridian_enlightenment", [
> >>      (0, "base"),
> >>      (1, "freq"),
> >>      (2, "time_ref_count"),
> >> +    (3, "reference_tsc"),
> >>      ])
> >>
> >>  #
> >> diff --git a/xen/arch/x86/hvm/viridian.c b/xen/arch/x86/hvm/viridian.c
> >> index 6726168..a7d3283 100644
> >> --- a/xen/arch/x86/hvm/viridian.c
> >> +++ b/xen/arch/x86/hvm/viridian.c
> >> @@ -21,6 +21,7 @@
> >>  #define VIRIDIAN_MSR_HYPERCALL                  0x40000001
> >>  #define VIRIDIAN_MSR_VP_INDEX                   0x40000002
> >>  #define VIRIDIAN_MSR_TIME_REF_COUNT             0x40000020
> >> +#define VIRIDIAN_MSR_REFERENCE_TSC              0x40000021
> >>  #define VIRIDIAN_MSR_TSC_FREQUENCY              0x40000022
> >>  #define VIRIDIAN_MSR_APIC_FREQUENCY             0x40000023
> >>  #define VIRIDIAN_MSR_EOI                        0x40000070
> >> @@ -40,6 +41,7 @@
> >>  #define CPUID3A_MSR_APIC_ACCESS    (1 << 4)
> >>  #define CPUID3A_MSR_HYPERCALL      (1 << 5)
> >>  #define CPUID3A_MSR_VP_INDEX       (1 << 6)
> >> +#define CPUID3A_MSR_REFERENCE_TSC  (1 << 9)
> >>  #define CPUID3A_MSR_FREQ           (1 << 11)
> >>
> >>  /* Viridian CPUID 4000004, Implementation Recommendations. */
> >> @@ -95,6 +97,8 @@ int cpuid_viridian_leaves(unsigned int leaf, unsigned
> int *eax,
> >>              *eax |= CPUID3A_MSR_FREQ;
> >>          if ( viridian_feature_mask(d) & HVMPV_time_ref_count )
> >>              *eax |= CPUID3A_MSR_TIME_REF_COUNT;
> >> +        if ( viridian_feature_mask(d) & HVMPV_reference_tsc )
> >> +            *eax |= CPUID3A_MSR_REFERENCE_TSC;
> >>          break;
> >>      case 4:
> >>          /* Recommended hypercall usage. */
> >> @@ -155,6 +159,17 @@ static void dump_apic_assist(const struct vcpu
> *v)
> >>             v, aa->fields.enabled, (unsigned long)aa->fields.pfn);
> >>  }
> >>
> >> +static void dump_reference_tsc(const struct domain *d)
> >> +{
> >> +    const union viridian_reference_tsc *rt;
> >> +
> >> +    rt = &d->arch.hvm_domain.viridian.reference_tsc;
> >> +
> >> +    printk(XENLOG_G_INFO "d%d: VIRIDIAN REFERENCE_TSC: enabled:
> %x pfn: %lx\n",
> >> +           d->domain_id,
> >> +           rt->fields.enabled, (unsigned long)rt->fields.pfn);
> >> +}
> >> +
> >>  static void enable_hypercall_page(struct domain *d)
> >>  {
> >>      unsigned long gmfn = d-
> >arch.hvm_domain.viridian.hypercall_gpa.fields.pfn;
> >> @@ -224,6 +239,78 @@ static void initialize_apic_assist(struct vcpu *v)
> >>      put_page_and_type(page);
> >>  }
> >>
> >> +static void update_reference_tsc(struct domain *d, bool_t initialize)
> >> +{
> >> +    unsigned long gmfn = d-
> >arch.hvm_domain.viridian.reference_tsc.fields.pfn;
> >> +    struct page_info *page = get_page_from_gfn(d, gmfn, NULL,
> P2M_ALLOC);
> >> +    HV_REFERENCE_TSC_PAGE *p;
> >> +
> >> +    if ( !page || !get_page_type(page, PGT_writable_page) )
> >> +    {
> >> +        if ( page )
> >> +            put_page(page);
> >> +        gdprintk(XENLOG_WARNING, "Bad GMFN %lx (MFN %lx)\n", gmfn,
> >> +                 page ? page_to_mfn(page) : INVALID_MFN);
> >> +        return;
> >> +    }
> >> +
> >> +    p = __map_domain_page(page);
> >> +
> >> +    if ( initialize )
> >> +        clear_page(p);
> >> +
> >> +    /*
> >> +     * This enlightenment must be disabled is the host TSC is not invariant.
> >> +     * However it is also disabled if vtsc is true (which means rdtsc is being
> >> +     * emulated). This generally happens when guest TSC freq and host
> TSC freq
> >> +     * don't match. The TscScale value could be adjusted to cope with this,
> >> +     * allowing vtsc to be turned off, but support for this is not yet present
> >> +     * in the hypervisor. Thus is it is possible that migrating a Windows VM
> >> +     * between hosts of differing TSC frequencies may result in large
> >> +     * differences in guest performance.
> >> +     */
> >> +    if ( !host_tsc_is_safe() || d->arch.vtsc )
> >> +    {
> >> +        /*
> >> +         * The specification states that valid values of TscSequence range
> >> +         * from 0 to 0xFFFFFFFE. The value 0xFFFFFFFF is used to indicate
> >> +         * this mechanism is no longer a reliable source of time and that
> >> +         * the VM should fall back to a different source.
> >> +         *
> >> +         * Server 2012 (6.2 kernel) and 2012 R2 (6.3 kernel) actually violate
> >> +         * the spec. and rely on a value of 0 to indicate that this
> >> +         * enlightenment should no longer be used. These two kernel
> >> +         * versions are currently the only ones to make use of this
> >> +         * enlightenment, so just use 0 here.
> >> +         */
> >> +        p->TscSequence = 0;
> >> +
> >> +        printk(XENLOG_G_INFO "d%d: VIRIDIAN REFERENCE_TSC:
> invalidated\n",
> >> +               d->domain_id);
> >> +        return;
> >> +    }
> >> +
> >> +    /*
> >> +     * The guest will calculate reference time according to the following
> >> +     * formula:
> >> +     *
> >> +     * ReferenceTime = ((RDTSC() * TscScale) >> 64) + TscOffset
> >> +     *
> >> +     * Windows uses a 100ns tick, so we need a scale which is cpu
> >> +     * ticks per 100ns shifted left by 64.
> >> +     */
> >> +    p->TscScale = ((10000ul << 32) / d->arch.tsc_khz) << 32;
> >> +
> >> +    do {
> >> +        p->TscSequence++;
> >> +    } while ( p->TscSequence == 0xFFFFFFFF ||
> >> +              p->TscSequence == 0 ); /* Avoid both 'invalid' values */
> 
> This is reading guest memory so a malicious guest can spin writing 0 and
> cause a DoS.
> 

Good point. That could only happen on the initial setting up of the page though, as the domain is paused in the other case. There's already a flag that indicates the initial case (since that clears the page). In this case the loop could simply be avoided and a value of 1 written in.

  Paul

> Better do here:
> 
>         p->TscSequence++;
>         if ( p->TscSequence == 0xFFFFFFFF || p->TscSequence == 0 )
>             p->TscSequence = 1;
> 
> Christoph
> 
> >> +
> >> +    unmap_domain_page(p);
> >> +
> >> +    put_page_and_type(page);
> >> +}
> >> +
> >>  int wrmsr_viridian_regs(uint32_t idx, uint64_t val)
> >>  {
> >>      struct vcpu *v = current;
> >> @@ -282,6 +369,17 @@ int wrmsr_viridian_regs(uint32_t idx, uint64_t
> val)
> >>              initialize_apic_assist(v);
> >>          break;
> >>
> >> +    case VIRIDIAN_MSR_REFERENCE_TSC:
> >> +        if ( !(viridian_feature_mask(d) & HVMPV_reference_tsc) )
> >> +            return 0;
> >> +
> >> +        perfc_incr(mshv_wrmsr_tsc_msr);
> >> +        d->arch.hvm_domain.viridian.reference_tsc.raw = val;
> >> +        dump_reference_tsc(d);
> >> +        if ( d->arch.hvm_domain.viridian.reference_tsc.fields.enabled )
> >> +            update_reference_tsc(d, 1);
> >> +        break;
> >> +
> >>      default:
> >>          return 0;
> >>      }
> >> @@ -346,6 +444,14 @@ int rdmsr_viridian_regs(uint32_t idx, uint64_t
> *val)
> >>          *val = v->arch.hvm_vcpu.viridian.apic_assist.raw;
> >>          break;
> >>
> >> +    case VIRIDIAN_MSR_REFERENCE_TSC:
> >> +        if ( !(viridian_feature_mask(d) & HVMPV_reference_tsc) )
> >> +            return 0;
> >> +
> >> +        perfc_incr(mshv_rdmsr_tsc_msr);
> >> +        *val = d->arch.hvm_domain.viridian.reference_tsc.raw;
> >> +        break;
> >> +
> >>      case VIRIDIAN_MSR_TIME_REF_COUNT:
> >>      {
> >>          uint64_t tsc;
> >> @@ -452,6 +558,7 @@ static int viridian_save_domain_ctxt(struct domain
> *d, hvm_domain_context_t *h)
> >>
> >>      ctxt.hypercall_gpa = d->arch.hvm_domain.viridian.hypercall_gpa.raw;
> >>      ctxt.guest_os_id   = d->arch.hvm_domain.viridian.guest_os_id.raw;
> >> +    ctxt.reference_tsc = d->arch.hvm_domain.viridian.reference_tsc.raw;
> >>
> >>      return (hvm_save_entry(VIRIDIAN_DOMAIN, 0, h, &ctxt) != 0);
> >>  }
> >> @@ -460,11 +567,15 @@ static int viridian_load_domain_ctxt(struct
> domain *d, hvm_domain_context_t *h)
> >>  {
> >>      struct hvm_viridian_domain_context ctxt;
> >>
> >> -    if ( hvm_load_entry(VIRIDIAN_DOMAIN, h, &ctxt) != 0 )
> >> +    if ( hvm_load_entry_zeroextend(VIRIDIAN_DOMAIN, h, &ctxt) != 0 )
> >>          return -EINVAL;
> >>
> >>      d->arch.hvm_domain.viridian.hypercall_gpa.raw = ctxt.hypercall_gpa;
> >>      d->arch.hvm_domain.viridian.guest_os_id.raw   = ctxt.guest_os_id;
> >> +    d->arch.hvm_domain.viridian.reference_tsc.raw = ctxt.reference_tsc;
> >> +
> >> +    if ( d->arch.hvm_domain.viridian.reference_tsc.fields.enabled )
> >> +        update_reference_tsc(d, 0);
> >>
> >>      return 0;
> >>  }
> >> diff --git a/xen/include/asm-x86/hvm/viridian.h b/xen/include/asm-
> x86/hvm/viridian.h
> >> index 496da33..35a22f5 100644
> >> --- a/xen/include/asm-x86/hvm/viridian.h
> >> +++ b/xen/include/asm-x86/hvm/viridian.h
> >> @@ -48,10 +48,35 @@ union viridian_hypercall_gpa
> >>      } fields;
> >>  };
> >>
> >> +union viridian_reference_tsc
> >> +{
> >> +    uint64_t raw;
> >> +    struct
> >> +    {
> >> +        uint64_t enabled:1;
> >> +        uint64_t reserved_preserved:11;
> >> +        uint64_t pfn:48;
> >> +    } fields;
> >> +};
> >> +
> >> +/*
> >> + * Type defintion as in Microsoft Hypervisor Top-Level Functional
> >> + * Specification v4.0a, section 15.4.2.
> >> + */
> >> +typedef struct _HV_REFERENCE_TSC_PAGE
> >> +{
> >> +    uint32_t TscSequence;
> >> +    uint32_t Reserved1;
> >> +    uint64_t TscScale;
> >> +    int64_t  TscOffset;
> >> +    uint64_t Reserved2[509];
> >> +} HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE;
> >> +
> >>  struct viridian_domain
> >>  {
> >>      union viridian_guest_os_id guest_os_id;
> >>      union viridian_hypercall_gpa hypercall_gpa;
> >> +    union viridian_reference_tsc reference_tsc;
> >>  };
> >>
> >>  int
> >> @@ -76,3 +101,13 @@ int
> >>  viridian_hypercall(struct cpu_user_regs *regs);
> >>
> >>  #endif /* __ASM_X86_HVM_VIRIDIAN_H__ */
> >> +
> >> +/*
> >> + * Local variables:
> >> + * mode: C
> >> + * c-file-style: "BSD"
> >> + * c-basic-offset: 4
> >> + * tab-width: 4
> >> + * indent-tabs-mode: nil
> >> + * End:
> >> + */
> >> diff --git a/xen/include/public/arch-x86/hvm/save.h
> b/xen/include/public/arch-x86/hvm/save.h
> >> index 16d85a3..006924b 100644
> >> --- a/xen/include/public/arch-x86/hvm/save.h
> >> +++ b/xen/include/public/arch-x86/hvm/save.h
> >> @@ -568,6 +568,7 @@ struct hvm_hw_cpu_xsave {
> >>  struct hvm_viridian_domain_context {
> >>      uint64_t hypercall_gpa;
> >>      uint64_t guest_os_id;
> >> +    uint64_t reference_tsc;
> >>  };
> >>
> >>  DECLARE_HVM_SAVE_TYPE(VIRIDIAN_DOMAIN, 15, struct
> hvm_viridian_domain_context);
> >> @@ -616,3 +617,13 @@ struct hvm_msr {
> >>  #define HVM_SAVE_CODE_MAX 20
> >>
> >>  #endif /* __XEN_PUBLIC_HVM_SAVE_X86_H__ */
> >> +
> >> +/*
> >> + * Local variables:
> >> + * mode: C
> >> + * c-file-style: "BSD"
> >> + * c-basic-offset: 4
> >> + * tab-width: 4
> >> + * indent-tabs-mode: nil
> >> + * End:
> >> + */
> >> diff --git a/xen/include/public/hvm/params.h
> b/xen/include/public/hvm/params.h
> >> index 3c51072..a2d43bc 100644
> >> --- a/xen/include/public/hvm/params.h
> >> +++ b/xen/include/public/hvm/params.h
> >> @@ -92,10 +92,15 @@
> >>  #define _HVMPV_time_ref_count 2
> >>  #define HVMPV_time_ref_count  (1 << _HVMPV_time_ref_count)
> >>
> >> +/* Enable Reference TSC Page (HV_X64_MSR_REFERENCE_TSC) */
> >> +#define _HVMPV_reference_tsc 3
> >> +#define HVMPV_reference_tsc  (1 << _HVMPV_reference_tsc)
> >> +
> >>  #define HVMPV_feature_mask \
> >>  	(HVMPV_base_freq | \
> >>  	 HVMPV_no_freq | \
> >> -	 HVMPV_time_ref_count)
> >> +	 HVMPV_time_ref_count | \
> >> +	 HVMPV_reference_tsc)
> >>
> >>  #endif
> >>
> >>
> >
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel
> >

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2] x86/viridian: Add Partition Reference Time enlightenment
@ 2014-10-13 14:24 Paul Durrant
  2014-10-13 14:25 ` Paul Durrant
  0 siblings, 1 reply; 14+ messages in thread
From: Paul Durrant @ 2014-10-13 14:24 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Christoph Egger,
	Ian Jackson, Paul Durrant, Jan Beulich

The presence of the partition reference time enlightenment persuades newer
versions of Windows to prefer the TSC as their primary time source. Hence,
if rdtsc is not being emulated and is invariant then many vmexits (for
alternative time sources such as the HPET or reference counter MSR) can
be avoided.

The implementation is not yet complete as no attempt is made to prevent
emulation of rdtsc if the enlightenment is active and guest and host
TSC frequencies differ. To do that requires invasive changes in the core
x86 time code and hence a lot more testing.

This patch avoids the issue by disabling the enlightenment if rdtsc is
being emulated, causing Windows to choose another time source. This is
safe, but may cause a big variation in performance of guests migrated
between hosts of differing TSC frequency. Thus the enlightenment is not
enabled in the default set, but may be enabled to improve guest performance
where such migrations are not a concern.

See section 15.4 of the Microsoft Hypervisor Top Level Functional
Specification v4.0a for details.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Christoph Egger <chegger@amazon.de>
---
v2:
 - Prevent TscSequence increment loop from being exposed to a running
   domain

 docs/man/xl.cfg.pod.5                  |    6 ++
 tools/libxl/libxl_dom.c                |    3 +
 tools/libxl/libxl_types.idl            |    1 +
 xen/arch/x86/hvm/viridian.c            |  116 +++++++++++++++++++++++++++++++-
 xen/include/asm-x86/hvm/viridian.h     |   35 ++++++++++
 xen/include/public/arch-x86/hvm/save.h |   11 +++
 xen/include/public/hvm/params.h        |    7 +-
 7 files changed, 177 insertions(+), 2 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 8bba21c..925adbb 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -1163,6 +1163,12 @@ This group incorporates Partition Time Reference Counter MSR. This
 enlightenment can improve performance of Windows 8 and Windows
 Server 2012 onwards.
 
+=item B<reference_tsc>
+
+This set incorporates the Partition Reference TSC MSR. This
+enlightenment can improve performance of Windows 7 and Windows
+Server 2008 R2 onwards.
+
 =item B<defaults>
 
 This is a special value that enables the default set of groups, which
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index bd21841..d04fc0d 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -262,6 +262,9 @@ static int hvm_set_viridian_features(libxl__gc *gc, uint32_t domid,
     if (libxl_bitmap_test(&enlightenments, LIBXL_VIRIDIAN_ENLIGHTENMENT_TIME_REF_COUNT))
         mask |= HVMPV_time_ref_count;
 
+    if (libxl_bitmap_test(&enlightenments, LIBXL_VIRIDIAN_ENLIGHTENMENT_REFERENCE_TSC))
+        mask |= HVMPV_reference_tsc;
+
     if (mask != 0 &&
         xc_hvm_param_set(CTX->xch,
                          domid,
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 90d152f..da75a40 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -183,6 +183,7 @@ libxl_viridian_enlightenment = Enumeration("viridian_enlightenment", [
     (0, "base"),
     (1, "freq"),
     (2, "time_ref_count"),
+    (3, "reference_tsc"),
     ])
 
 #
diff --git a/xen/arch/x86/hvm/viridian.c b/xen/arch/x86/hvm/viridian.c
index 6726168..528bb8b 100644
--- a/xen/arch/x86/hvm/viridian.c
+++ b/xen/arch/x86/hvm/viridian.c
@@ -21,6 +21,7 @@
 #define VIRIDIAN_MSR_HYPERCALL                  0x40000001
 #define VIRIDIAN_MSR_VP_INDEX                   0x40000002
 #define VIRIDIAN_MSR_TIME_REF_COUNT             0x40000020
+#define VIRIDIAN_MSR_REFERENCE_TSC              0x40000021
 #define VIRIDIAN_MSR_TSC_FREQUENCY              0x40000022
 #define VIRIDIAN_MSR_APIC_FREQUENCY             0x40000023
 #define VIRIDIAN_MSR_EOI                        0x40000070
@@ -40,6 +41,7 @@
 #define CPUID3A_MSR_APIC_ACCESS    (1 << 4)
 #define CPUID3A_MSR_HYPERCALL      (1 << 5)
 #define CPUID3A_MSR_VP_INDEX       (1 << 6)
+#define CPUID3A_MSR_REFERENCE_TSC  (1 << 9)
 #define CPUID3A_MSR_FREQ           (1 << 11)
 
 /* Viridian CPUID 4000004, Implementation Recommendations. */
@@ -95,6 +97,8 @@ int cpuid_viridian_leaves(unsigned int leaf, unsigned int *eax,
             *eax |= CPUID3A_MSR_FREQ;
         if ( viridian_feature_mask(d) & HVMPV_time_ref_count )
             *eax |= CPUID3A_MSR_TIME_REF_COUNT;
+        if ( viridian_feature_mask(d) & HVMPV_reference_tsc )
+            *eax |= CPUID3A_MSR_REFERENCE_TSC;
         break;
     case 4:
         /* Recommended hypercall usage. */
@@ -155,6 +159,17 @@ static void dump_apic_assist(const struct vcpu *v)
            v, aa->fields.enabled, (unsigned long)aa->fields.pfn);
 }
 
+static void dump_reference_tsc(const struct domain *d)
+{
+    const union viridian_reference_tsc *rt;
+
+    rt = &d->arch.hvm_domain.viridian.reference_tsc;
+    
+    printk(XENLOG_G_INFO "d%d: VIRIDIAN REFERENCE_TSC: enabled: %x pfn: %lx\n",
+           d->domain_id,
+           rt->fields.enabled, (unsigned long)rt->fields.pfn);
+}
+
 static void enable_hypercall_page(struct domain *d)
 {
     unsigned long gmfn = d->arch.hvm_domain.viridian.hypercall_gpa.fields.pfn;
@@ -224,6 +239,81 @@ static void initialize_apic_assist(struct vcpu *v)
     put_page_and_type(page);
 }
 
+static void update_reference_tsc(struct domain *d, bool_t initialize)
+{
+    unsigned long gmfn = d->arch.hvm_domain.viridian.reference_tsc.fields.pfn;
+    struct page_info *page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
+    HV_REFERENCE_TSC_PAGE *p;
+
+    if ( !page || !get_page_type(page, PGT_writable_page) )
+    {
+        if ( page )
+            put_page(page);
+        gdprintk(XENLOG_WARNING, "Bad GMFN %lx (MFN %lx)\n", gmfn,
+                 page ? page_to_mfn(page) : INVALID_MFN);
+        return;
+    }
+
+    p = __map_domain_page(page);
+
+    if ( initialize )
+        clear_page(p);
+
+    /*
+     * This enlightenment must be disabled is the host TSC is not invariant.
+     * However it is also disabled if vtsc is true (which means rdtsc is being
+     * emulated). This generally happens when guest TSC freq and host TSC freq
+     * don't match. The TscScale value could be adjusted to cope with this,
+     * allowing vtsc to be turned off, but support for this is not yet present
+     * in the hypervisor. Thus is it is possible that migrating a Windows VM
+     * between hosts of differing TSC frequencies may result in large
+     * differences in guest performance.
+     */
+    if ( !host_tsc_is_safe() || d->arch.vtsc )
+    {
+        /*
+         * The specification states that valid values of TscSequence range
+         * from 0 to 0xFFFFFFFE. The value 0xFFFFFFFF is used to indicate
+         * this mechanism is no longer a reliable source of time and that
+         * the VM should fall back to a different source.
+         *
+         * Server 2012 (6.2 kernel) and 2012 R2 (6.3 kernel) actually violate
+         * the spec. and rely on a value of 0 to indicate that this
+         * enlightenment should no longer be used. These two kernel
+         * versions are currently the only ones to make use of this
+         * enlightenment, so just use 0 here.
+         */
+        p->TscSequence = 0;
+
+        printk(XENLOG_G_INFO "d%d: VIRIDIAN REFERENCE_TSC: invalidated\n",
+               d->domain_id);
+        return;
+    }
+
+    /*
+     * The guest will calculate reference time according to the following
+     * formula:
+     *
+     * ReferenceTime = ((RDTSC() * TscScale) >> 64) + TscOffset
+     *
+     * Windows uses a 100ns tick, so we need a scale which is cpu
+     * ticks per 100ns shifted left by 64.
+     */
+    p->TscScale = ((10000ul << 32) / d->arch.tsc_khz) << 32;
+
+    if ( initialize )
+        p->TscSequence = 1;
+    else /* guest is paused */
+        do {
+            p->TscSequence++;
+        } while ( p->TscSequence == 0xFFFFFFFF ||
+                  p->TscSequence == 0 ); /* Avoid both 'invalid' values */
+
+    unmap_domain_page(p);
+
+    put_page_and_type(page);
+}
+
 int wrmsr_viridian_regs(uint32_t idx, uint64_t val)
 {
     struct vcpu *v = current;
@@ -282,6 +372,17 @@ int wrmsr_viridian_regs(uint32_t idx, uint64_t val)
             initialize_apic_assist(v);
         break;
 
+    case VIRIDIAN_MSR_REFERENCE_TSC:
+        if ( !(viridian_feature_mask(d) & HVMPV_reference_tsc) )
+            return 0;
+
+        perfc_incr(mshv_wrmsr_tsc_msr);
+        d->arch.hvm_domain.viridian.reference_tsc.raw = val;
+        dump_reference_tsc(d);
+        if ( d->arch.hvm_domain.viridian.reference_tsc.fields.enabled )
+            update_reference_tsc(d, 1);
+        break;
+
     default:
         return 0;
     }
@@ -346,6 +447,14 @@ int rdmsr_viridian_regs(uint32_t idx, uint64_t *val)
         *val = v->arch.hvm_vcpu.viridian.apic_assist.raw;
         break;
 
+    case VIRIDIAN_MSR_REFERENCE_TSC:
+        if ( !(viridian_feature_mask(d) & HVMPV_reference_tsc) )
+            return 0;
+
+        perfc_incr(mshv_rdmsr_tsc_msr);
+        *val = d->arch.hvm_domain.viridian.reference_tsc.raw;
+        break;
+
     case VIRIDIAN_MSR_TIME_REF_COUNT:
     {
         uint64_t tsc;
@@ -452,6 +561,7 @@ static int viridian_save_domain_ctxt(struct domain *d, hvm_domain_context_t *h)
 
     ctxt.hypercall_gpa = d->arch.hvm_domain.viridian.hypercall_gpa.raw;
     ctxt.guest_os_id   = d->arch.hvm_domain.viridian.guest_os_id.raw;
+    ctxt.reference_tsc = d->arch.hvm_domain.viridian.reference_tsc.raw;
 
     return (hvm_save_entry(VIRIDIAN_DOMAIN, 0, h, &ctxt) != 0);
 }
@@ -460,11 +570,15 @@ static int viridian_load_domain_ctxt(struct domain *d, hvm_domain_context_t *h)
 {
     struct hvm_viridian_domain_context ctxt;
 
-    if ( hvm_load_entry(VIRIDIAN_DOMAIN, h, &ctxt) != 0 )
+    if ( hvm_load_entry_zeroextend(VIRIDIAN_DOMAIN, h, &ctxt) != 0 )
         return -EINVAL;
 
     d->arch.hvm_domain.viridian.hypercall_gpa.raw = ctxt.hypercall_gpa;
     d->arch.hvm_domain.viridian.guest_os_id.raw   = ctxt.guest_os_id;
+    d->arch.hvm_domain.viridian.reference_tsc.raw = ctxt.reference_tsc;
+
+    if ( d->arch.hvm_domain.viridian.reference_tsc.fields.enabled )
+        update_reference_tsc(d, 0);
 
     return 0;
 }
diff --git a/xen/include/asm-x86/hvm/viridian.h b/xen/include/asm-x86/hvm/viridian.h
index 496da33..35a22f5 100644
--- a/xen/include/asm-x86/hvm/viridian.h
+++ b/xen/include/asm-x86/hvm/viridian.h
@@ -48,10 +48,35 @@ union viridian_hypercall_gpa
     } fields;
 };
 
+union viridian_reference_tsc
+{
+    uint64_t raw;
+    struct
+    {
+        uint64_t enabled:1;
+        uint64_t reserved_preserved:11;
+        uint64_t pfn:48;
+    } fields;
+};
+
+/*
+ * Type defintion as in Microsoft Hypervisor Top-Level Functional
+ * Specification v4.0a, section 15.4.2.
+ */
+typedef struct _HV_REFERENCE_TSC_PAGE
+{
+    uint32_t TscSequence;
+    uint32_t Reserved1;
+    uint64_t TscScale;
+    int64_t  TscOffset;
+    uint64_t Reserved2[509];
+} HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE;
+
 struct viridian_domain
 {
     union viridian_guest_os_id guest_os_id;
     union viridian_hypercall_gpa hypercall_gpa;
+    union viridian_reference_tsc reference_tsc;
 };
 
 int
@@ -76,3 +101,13 @@ int
 viridian_hypercall(struct cpu_user_regs *regs);
 
 #endif /* __ASM_X86_HVM_VIRIDIAN_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
index 16d85a3..006924b 100644
--- a/xen/include/public/arch-x86/hvm/save.h
+++ b/xen/include/public/arch-x86/hvm/save.h
@@ -568,6 +568,7 @@ struct hvm_hw_cpu_xsave {
 struct hvm_viridian_domain_context {
     uint64_t hypercall_gpa;
     uint64_t guest_os_id;
+    uint64_t reference_tsc;
 };
 
 DECLARE_HVM_SAVE_TYPE(VIRIDIAN_DOMAIN, 15, struct hvm_viridian_domain_context);
@@ -616,3 +617,13 @@ struct hvm_msr {
 #define HVM_SAVE_CODE_MAX 20
 
 #endif /* __XEN_PUBLIC_HVM_SAVE_X86_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/public/hvm/params.h b/xen/include/public/hvm/params.h
index 3c51072..a2d43bc 100644
--- a/xen/include/public/hvm/params.h
+++ b/xen/include/public/hvm/params.h
@@ -92,10 +92,15 @@
 #define _HVMPV_time_ref_count 2
 #define HVMPV_time_ref_count  (1 << _HVMPV_time_ref_count)
 
+/* Enable Reference TSC Page (HV_X64_MSR_REFERENCE_TSC) */
+#define _HVMPV_reference_tsc 3
+#define HVMPV_reference_tsc  (1 << _HVMPV_reference_tsc)
+
 #define HVMPV_feature_mask \
 	(HVMPV_base_freq | \
 	 HVMPV_no_freq | \
-	 HVMPV_time_ref_count)
+	 HVMPV_time_ref_count | \
+	 HVMPV_reference_tsc)
 
 #endif
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] x86/viridian: Add Partition Reference Time enlightenment
  2014-10-13 14:24 Paul Durrant
@ 2014-10-13 14:25 ` Paul Durrant
  0 siblings, 0 replies; 14+ messages in thread
From: Paul Durrant @ 2014-10-13 14:25 UTC (permalink / raw)
  To: Paul Durrant, xen-devel@lists.xenproject.org
  Cc: Keir (Xen.org), Ian Campbell, Christoph Egger, Stefano Stabellini,
	Jan Beulich, Ian Jackson

Sorry, this should have been v3. I'll re-post.

  Paul

> -----Original Message-----
> From: Paul Durrant [mailto:paul.durrant@citrix.com]
> Sent: 13 October 2014 15:24
> To: xen-devel@lists.xenproject.org
> Cc: Paul Durrant; Keir (Xen.org); Jan Beulich; Ian Campbell; Ian Jackson;
> Stefano Stabellini; Christoph Egger
> Subject: [PATCH v2] x86/viridian: Add Partition Reference Time
> enlightenment
> 
> The presence of the partition reference time enlightenment persuades
> newer
> versions of Windows to prefer the TSC as their primary time source. Hence,
> if rdtsc is not being emulated and is invariant then many vmexits (for
> alternative time sources such as the HPET or reference counter MSR) can
> be avoided.
> 
> The implementation is not yet complete as no attempt is made to prevent
> emulation of rdtsc if the enlightenment is active and guest and host
> TSC frequencies differ. To do that requires invasive changes in the core
> x86 time code and hence a lot more testing.
> 
> This patch avoids the issue by disabling the enlightenment if rdtsc is
> being emulated, causing Windows to choose another time source. This is
> safe, but may cause a big variation in performance of guests migrated
> between hosts of differing TSC frequency. Thus the enlightenment is not
> enabled in the default set, but may be enabled to improve guest
> performance
> where such migrations are not a concern.
> 
> See section 15.4 of the Microsoft Hypervisor Top Level Functional
> Specification v4.0a for details.
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Cc: Keir Fraser <keir@xen.org>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Cc: Christoph Egger <chegger@amazon.de>
> ---
> v2:
>  - Prevent TscSequence increment loop from being exposed to a running
>    domain
> 
>  docs/man/xl.cfg.pod.5                  |    6 ++
>  tools/libxl/libxl_dom.c                |    3 +
>  tools/libxl/libxl_types.idl            |    1 +
>  xen/arch/x86/hvm/viridian.c            |  116
> +++++++++++++++++++++++++++++++-
>  xen/include/asm-x86/hvm/viridian.h     |   35 ++++++++++
>  xen/include/public/arch-x86/hvm/save.h |   11 +++
>  xen/include/public/hvm/params.h        |    7 +-
>  7 files changed, 177 insertions(+), 2 deletions(-)
> 
> diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
> index 8bba21c..925adbb 100644
> --- a/docs/man/xl.cfg.pod.5
> +++ b/docs/man/xl.cfg.pod.5
> @@ -1163,6 +1163,12 @@ This group incorporates Partition Time Reference
> Counter MSR. This
>  enlightenment can improve performance of Windows 8 and Windows
>  Server 2012 onwards.
> 
> +=item B<reference_tsc>
> +
> +This set incorporates the Partition Reference TSC MSR. This
> +enlightenment can improve performance of Windows 7 and Windows
> +Server 2008 R2 onwards.
> +
>  =item B<defaults>
> 
>  This is a special value that enables the default set of groups, which
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index bd21841..d04fc0d 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -262,6 +262,9 @@ static int hvm_set_viridian_features(libxl__gc *gc,
> uint32_t domid,
>      if (libxl_bitmap_test(&enlightenments,
> LIBXL_VIRIDIAN_ENLIGHTENMENT_TIME_REF_COUNT))
>          mask |= HVMPV_time_ref_count;
> 
> +    if (libxl_bitmap_test(&enlightenments,
> LIBXL_VIRIDIAN_ENLIGHTENMENT_REFERENCE_TSC))
> +        mask |= HVMPV_reference_tsc;
> +
>      if (mask != 0 &&
>          xc_hvm_param_set(CTX->xch,
>                           domid,
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index 90d152f..da75a40 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -183,6 +183,7 @@ libxl_viridian_enlightenment =
> Enumeration("viridian_enlightenment", [
>      (0, "base"),
>      (1, "freq"),
>      (2, "time_ref_count"),
> +    (3, "reference_tsc"),
>      ])
> 
>  #
> diff --git a/xen/arch/x86/hvm/viridian.c b/xen/arch/x86/hvm/viridian.c
> index 6726168..528bb8b 100644
> --- a/xen/arch/x86/hvm/viridian.c
> +++ b/xen/arch/x86/hvm/viridian.c
> @@ -21,6 +21,7 @@
>  #define VIRIDIAN_MSR_HYPERCALL                  0x40000001
>  #define VIRIDIAN_MSR_VP_INDEX                   0x40000002
>  #define VIRIDIAN_MSR_TIME_REF_COUNT             0x40000020
> +#define VIRIDIAN_MSR_REFERENCE_TSC              0x40000021
>  #define VIRIDIAN_MSR_TSC_FREQUENCY              0x40000022
>  #define VIRIDIAN_MSR_APIC_FREQUENCY             0x40000023
>  #define VIRIDIAN_MSR_EOI                        0x40000070
> @@ -40,6 +41,7 @@
>  #define CPUID3A_MSR_APIC_ACCESS    (1 << 4)
>  #define CPUID3A_MSR_HYPERCALL      (1 << 5)
>  #define CPUID3A_MSR_VP_INDEX       (1 << 6)
> +#define CPUID3A_MSR_REFERENCE_TSC  (1 << 9)
>  #define CPUID3A_MSR_FREQ           (1 << 11)
> 
>  /* Viridian CPUID 4000004, Implementation Recommendations. */
> @@ -95,6 +97,8 @@ int cpuid_viridian_leaves(unsigned int leaf, unsigned int
> *eax,
>              *eax |= CPUID3A_MSR_FREQ;
>          if ( viridian_feature_mask(d) & HVMPV_time_ref_count )
>              *eax |= CPUID3A_MSR_TIME_REF_COUNT;
> +        if ( viridian_feature_mask(d) & HVMPV_reference_tsc )
> +            *eax |= CPUID3A_MSR_REFERENCE_TSC;
>          break;
>      case 4:
>          /* Recommended hypercall usage. */
> @@ -155,6 +159,17 @@ static void dump_apic_assist(const struct vcpu *v)
>             v, aa->fields.enabled, (unsigned long)aa->fields.pfn);
>  }
> 
> +static void dump_reference_tsc(const struct domain *d)
> +{
> +    const union viridian_reference_tsc *rt;
> +
> +    rt = &d->arch.hvm_domain.viridian.reference_tsc;
> +
> +    printk(XENLOG_G_INFO "d%d: VIRIDIAN REFERENCE_TSC: enabled: %x
> pfn: %lx\n",
> +           d->domain_id,
> +           rt->fields.enabled, (unsigned long)rt->fields.pfn);
> +}
> +
>  static void enable_hypercall_page(struct domain *d)
>  {
>      unsigned long gmfn = d-
> >arch.hvm_domain.viridian.hypercall_gpa.fields.pfn;
> @@ -224,6 +239,81 @@ static void initialize_apic_assist(struct vcpu *v)
>      put_page_and_type(page);
>  }
> 
> +static void update_reference_tsc(struct domain *d, bool_t initialize)
> +{
> +    unsigned long gmfn = d-
> >arch.hvm_domain.viridian.reference_tsc.fields.pfn;
> +    struct page_info *page = get_page_from_gfn(d, gmfn, NULL,
> P2M_ALLOC);
> +    HV_REFERENCE_TSC_PAGE *p;
> +
> +    if ( !page || !get_page_type(page, PGT_writable_page) )
> +    {
> +        if ( page )
> +            put_page(page);
> +        gdprintk(XENLOG_WARNING, "Bad GMFN %lx (MFN %lx)\n", gmfn,
> +                 page ? page_to_mfn(page) : INVALID_MFN);
> +        return;
> +    }
> +
> +    p = __map_domain_page(page);
> +
> +    if ( initialize )
> +        clear_page(p);
> +
> +    /*
> +     * This enlightenment must be disabled is the host TSC is not invariant.
> +     * However it is also disabled if vtsc is true (which means rdtsc is being
> +     * emulated). This generally happens when guest TSC freq and host TSC
> freq
> +     * don't match. The TscScale value could be adjusted to cope with this,
> +     * allowing vtsc to be turned off, but support for this is not yet present
> +     * in the hypervisor. Thus is it is possible that migrating a Windows VM
> +     * between hosts of differing TSC frequencies may result in large
> +     * differences in guest performance.
> +     */
> +    if ( !host_tsc_is_safe() || d->arch.vtsc )
> +    {
> +        /*
> +         * The specification states that valid values of TscSequence range
> +         * from 0 to 0xFFFFFFFE. The value 0xFFFFFFFF is used to indicate
> +         * this mechanism is no longer a reliable source of time and that
> +         * the VM should fall back to a different source.
> +         *
> +         * Server 2012 (6.2 kernel) and 2012 R2 (6.3 kernel) actually violate
> +         * the spec. and rely on a value of 0 to indicate that this
> +         * enlightenment should no longer be used. These two kernel
> +         * versions are currently the only ones to make use of this
> +         * enlightenment, so just use 0 here.
> +         */
> +        p->TscSequence = 0;
> +
> +        printk(XENLOG_G_INFO "d%d: VIRIDIAN REFERENCE_TSC:
> invalidated\n",
> +               d->domain_id);
> +        return;
> +    }
> +
> +    /*
> +     * The guest will calculate reference time according to the following
> +     * formula:
> +     *
> +     * ReferenceTime = ((RDTSC() * TscScale) >> 64) + TscOffset
> +     *
> +     * Windows uses a 100ns tick, so we need a scale which is cpu
> +     * ticks per 100ns shifted left by 64.
> +     */
> +    p->TscScale = ((10000ul << 32) / d->arch.tsc_khz) << 32;
> +
> +    if ( initialize )
> +        p->TscSequence = 1;
> +    else /* guest is paused */
> +        do {
> +            p->TscSequence++;
> +        } while ( p->TscSequence == 0xFFFFFFFF ||
> +                  p->TscSequence == 0 ); /* Avoid both 'invalid' values */
> +
> +    unmap_domain_page(p);
> +
> +    put_page_and_type(page);
> +}
> +
>  int wrmsr_viridian_regs(uint32_t idx, uint64_t val)
>  {
>      struct vcpu *v = current;
> @@ -282,6 +372,17 @@ int wrmsr_viridian_regs(uint32_t idx, uint64_t val)
>              initialize_apic_assist(v);
>          break;
> 
> +    case VIRIDIAN_MSR_REFERENCE_TSC:
> +        if ( !(viridian_feature_mask(d) & HVMPV_reference_tsc) )
> +            return 0;
> +
> +        perfc_incr(mshv_wrmsr_tsc_msr);
> +        d->arch.hvm_domain.viridian.reference_tsc.raw = val;
> +        dump_reference_tsc(d);
> +        if ( d->arch.hvm_domain.viridian.reference_tsc.fields.enabled )
> +            update_reference_tsc(d, 1);
> +        break;
> +
>      default:
>          return 0;
>      }
> @@ -346,6 +447,14 @@ int rdmsr_viridian_regs(uint32_t idx, uint64_t *val)
>          *val = v->arch.hvm_vcpu.viridian.apic_assist.raw;
>          break;
> 
> +    case VIRIDIAN_MSR_REFERENCE_TSC:
> +        if ( !(viridian_feature_mask(d) & HVMPV_reference_tsc) )
> +            return 0;
> +
> +        perfc_incr(mshv_rdmsr_tsc_msr);
> +        *val = d->arch.hvm_domain.viridian.reference_tsc.raw;
> +        break;
> +
>      case VIRIDIAN_MSR_TIME_REF_COUNT:
>      {
>          uint64_t tsc;
> @@ -452,6 +561,7 @@ static int viridian_save_domain_ctxt(struct domain
> *d, hvm_domain_context_t *h)
> 
>      ctxt.hypercall_gpa = d->arch.hvm_domain.viridian.hypercall_gpa.raw;
>      ctxt.guest_os_id   = d->arch.hvm_domain.viridian.guest_os_id.raw;
> +    ctxt.reference_tsc = d->arch.hvm_domain.viridian.reference_tsc.raw;
> 
>      return (hvm_save_entry(VIRIDIAN_DOMAIN, 0, h, &ctxt) != 0);
>  }
> @@ -460,11 +570,15 @@ static int viridian_load_domain_ctxt(struct domain
> *d, hvm_domain_context_t *h)
>  {
>      struct hvm_viridian_domain_context ctxt;
> 
> -    if ( hvm_load_entry(VIRIDIAN_DOMAIN, h, &ctxt) != 0 )
> +    if ( hvm_load_entry_zeroextend(VIRIDIAN_DOMAIN, h, &ctxt) != 0 )
>          return -EINVAL;
> 
>      d->arch.hvm_domain.viridian.hypercall_gpa.raw = ctxt.hypercall_gpa;
>      d->arch.hvm_domain.viridian.guest_os_id.raw   = ctxt.guest_os_id;
> +    d->arch.hvm_domain.viridian.reference_tsc.raw = ctxt.reference_tsc;
> +
> +    if ( d->arch.hvm_domain.viridian.reference_tsc.fields.enabled )
> +        update_reference_tsc(d, 0);
> 
>      return 0;
>  }
> diff --git a/xen/include/asm-x86/hvm/viridian.h b/xen/include/asm-
> x86/hvm/viridian.h
> index 496da33..35a22f5 100644
> --- a/xen/include/asm-x86/hvm/viridian.h
> +++ b/xen/include/asm-x86/hvm/viridian.h
> @@ -48,10 +48,35 @@ union viridian_hypercall_gpa
>      } fields;
>  };
> 
> +union viridian_reference_tsc
> +{
> +    uint64_t raw;
> +    struct
> +    {
> +        uint64_t enabled:1;
> +        uint64_t reserved_preserved:11;
> +        uint64_t pfn:48;
> +    } fields;
> +};
> +
> +/*
> + * Type defintion as in Microsoft Hypervisor Top-Level Functional
> + * Specification v4.0a, section 15.4.2.
> + */
> +typedef struct _HV_REFERENCE_TSC_PAGE
> +{
> +    uint32_t TscSequence;
> +    uint32_t Reserved1;
> +    uint64_t TscScale;
> +    int64_t  TscOffset;
> +    uint64_t Reserved2[509];
> +} HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE;
> +
>  struct viridian_domain
>  {
>      union viridian_guest_os_id guest_os_id;
>      union viridian_hypercall_gpa hypercall_gpa;
> +    union viridian_reference_tsc reference_tsc;
>  };
> 
>  int
> @@ -76,3 +101,13 @@ int
>  viridian_hypercall(struct cpu_user_regs *regs);
> 
>  #endif /* __ASM_X86_HVM_VIRIDIAN_H__ */
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/include/public/arch-x86/hvm/save.h
> b/xen/include/public/arch-x86/hvm/save.h
> index 16d85a3..006924b 100644
> --- a/xen/include/public/arch-x86/hvm/save.h
> +++ b/xen/include/public/arch-x86/hvm/save.h
> @@ -568,6 +568,7 @@ struct hvm_hw_cpu_xsave {
>  struct hvm_viridian_domain_context {
>      uint64_t hypercall_gpa;
>      uint64_t guest_os_id;
> +    uint64_t reference_tsc;
>  };
> 
>  DECLARE_HVM_SAVE_TYPE(VIRIDIAN_DOMAIN, 15, struct
> hvm_viridian_domain_context);
> @@ -616,3 +617,13 @@ struct hvm_msr {
>  #define HVM_SAVE_CODE_MAX 20
> 
>  #endif /* __XEN_PUBLIC_HVM_SAVE_X86_H__ */
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/include/public/hvm/params.h
> b/xen/include/public/hvm/params.h
> index 3c51072..a2d43bc 100644
> --- a/xen/include/public/hvm/params.h
> +++ b/xen/include/public/hvm/params.h
> @@ -92,10 +92,15 @@
>  #define _HVMPV_time_ref_count 2
>  #define HVMPV_time_ref_count  (1 << _HVMPV_time_ref_count)
> 
> +/* Enable Reference TSC Page (HV_X64_MSR_REFERENCE_TSC) */
> +#define _HVMPV_reference_tsc 3
> +#define HVMPV_reference_tsc  (1 << _HVMPV_reference_tsc)
> +
>  #define HVMPV_feature_mask \
>  	(HVMPV_base_freq | \
>  	 HVMPV_no_freq | \
> -	 HVMPV_time_ref_count)
> +	 HVMPV_time_ref_count | \
> +	 HVMPV_reference_tsc)
> 
>  #endif
> 
> --
> 1.7.10.4

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] x86/viridian: Add Partition Reference Time enlightenment
  2014-10-13  8:10   ` Jan Beulich
@ 2014-10-14  7:45     ` Ian Campbell
  2014-10-14  9:56       ` Jan Beulich
  0 siblings, 1 reply; 14+ messages in thread
From: Ian Campbell @ 2014-10-14  7:45 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Keir Fraser, Stefano Stabellini, Matt Wilson, Christoph Egger,
	Ian Jackson, xen-devel, Paul Durrant, Anthony Liguori

On Mon, 2014-10-13 at 09:10 +0100, Jan Beulich wrote:
> >>> On 10.10.14 at 18:36, <msw@linux.com> wrote:
> > On Mon, Sep 29, 2014 at 11:28:44AM +0100, Paul Durrant wrote:
> >> +    /*
> >> +     * The guest will calculate reference time according to the following
> >> +     * formula:
> >> +     *
> >> +     * ReferenceTime = ((RDTSC() * TscScale) >> 64) + TscOffset
> >> +     *
> >> +     * Windows uses a 100ns tick, so we need a scale which is cpu
> >> +     * ticks per 100ns shifted left by 64.
> >> +     */
> >> +    p->TscScale = ((10000ul << 32) / d->arch.tsc_khz) << 32;
> >> +
> >> +    do {
> >> +        p->TscSequence++;
> >> +    } while ( p->TscSequence == 0xFFFFFFFF ||
> >> +              p->TscSequence == 0 ); /* Avoid both 'invalid' values */
> > 
> > Anthony Liguori and I were looking this over today and he pointed
> > something out: couldn't a second vCPU of the guest write 0 or
> > 0xffffffff in a tight loop to cause a hypervisor DoS?
> 
> Yes, this is at least a theoretical issue that should be fixed. I don't
> think it's a practical issue though: I'd expect the compiler to eliminate
> the two reads of the field and instead directly use the result of the
> increment.

Wouldn't that just mean the attacker needs to write fffffffe or ffffffff
instead?

Ian.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] x86/viridian: Add Partition Reference Time enlightenment
  2014-10-14  7:45     ` Ian Campbell
@ 2014-10-14  9:56       ` Jan Beulich
  2014-10-14 10:04         ` Ian Campbell
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2014-10-14  9:56 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Keir Fraser, Stefano Stabellini, Matt Wilson, Christoph Egger,
	Ian Jackson, xen-devel, Paul Durrant, Anthony Liguori

>>> On 14.10.14 at 09:45, <ian.campbell@citrix.com> wrote:
> On Mon, 2014-10-13 at 09:10 +0100, Jan Beulich wrote:
>> >>> On 10.10.14 at 18:36, <msw@linux.com> wrote:
>> > On Mon, Sep 29, 2014 at 11:28:44AM +0100, Paul Durrant wrote:
>> >> +    /*
>> >> +     * The guest will calculate reference time according to the following
>> >> +     * formula:
>> >> +     *
>> >> +     * ReferenceTime = ((RDTSC() * TscScale) >> 64) + TscOffset
>> >> +     *
>> >> +     * Windows uses a 100ns tick, so we need a scale which is cpu
>> >> +     * ticks per 100ns shifted left by 64.
>> >> +     */
>> >> +    p->TscScale = ((10000ul << 32) / d->arch.tsc_khz) << 32;
>> >> +
>> >> +    do {
>> >> +        p->TscSequence++;
>> >> +    } while ( p->TscSequence == 0xFFFFFFFF ||
>> >> +              p->TscSequence == 0 ); /* Avoid both 'invalid' values */
>> > 
>> > Anthony Liguori and I were looking this over today and he pointed
>> > something out: couldn't a second vCPU of the guest write 0 or
>> > 0xffffffff in a tight loop to cause a hypervisor DoS?
>> 
>> Yes, this is at least a theoretical issue that should be fixed. I don't
>> think it's a practical issue though: I'd expect the compiler to eliminate
>> the two reads of the field and instead directly use the result of the
>> increment.
> 
> Wouldn't that just mean the attacker needs to write fffffffe or ffffffff
> instead?

No. The effect of what I said would amount to

	x = p->TscSequence;
	do {
		x++;
	} while ( !(x + 1) || !x )
	p->TscSequence = x;

(or something equivalent without using a loop).

Jan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] x86/viridian: Add Partition Reference Time enlightenment
  2014-10-14  9:56       ` Jan Beulich
@ 2014-10-14 10:04         ` Ian Campbell
  2014-10-14 10:12           ` Jan Beulich
  0 siblings, 1 reply; 14+ messages in thread
From: Ian Campbell @ 2014-10-14 10:04 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Keir Fraser, Stefano Stabellini, Matt Wilson, Christoph Egger,
	Ian Jackson, xen-devel, Paul Durrant, Anthony Liguori

On Tue, 2014-10-14 at 10:56 +0100, Jan Beulich wrote:
> >>> On 14.10.14 at 09:45, <ian.campbell@citrix.com> wrote:
> > On Mon, 2014-10-13 at 09:10 +0100, Jan Beulich wrote:
> >> >>> On 10.10.14 at 18:36, <msw@linux.com> wrote:
> >> > On Mon, Sep 29, 2014 at 11:28:44AM +0100, Paul Durrant wrote:
> >> >> +    /*
> >> >> +     * The guest will calculate reference time according to the following
> >> >> +     * formula:
> >> >> +     *
> >> >> +     * ReferenceTime = ((RDTSC() * TscScale) >> 64) + TscOffset
> >> >> +     *
> >> >> +     * Windows uses a 100ns tick, so we need a scale which is cpu
> >> >> +     * ticks per 100ns shifted left by 64.
> >> >> +     */
> >> >> +    p->TscScale = ((10000ul << 32) / d->arch.tsc_khz) << 32;
> >> >> +
> >> >> +    do {
> >> >> +        p->TscSequence++;
> >> >> +    } while ( p->TscSequence == 0xFFFFFFFF ||
> >> >> +              p->TscSequence == 0 ); /* Avoid both 'invalid' values */
> >> > 
> >> > Anthony Liguori and I were looking this over today and he pointed
> >> > something out: couldn't a second vCPU of the guest write 0 or
> >> > 0xffffffff in a tight loop to cause a hypervisor DoS?
> >> 
> >> Yes, this is at least a theoretical issue that should be fixed. I don't
> >> think it's a practical issue though: I'd expect the compiler to eliminate
> >> the two reads of the field and instead directly use the result of the
> >> increment.
> > 
> > Wouldn't that just mean the attacker needs to write fffffffe or ffffffff
> > instead?
> 
> No. The effect of what I said would amount to
> 
> 	x = p->TscSequence;
> 	do {
> 		x++;
> 	} while ( !(x + 1) || !x )
> 	p->TscSequence = x;
> 
> (or something equivalent without using a loop).

Ah right. Perhaps it would better to write it that way and use some sort
of ACCESS_ONCE like macrot enforce it actually ends up that way rather
than rely on the vagaries of the compiler?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] x86/viridian: Add Partition Reference Time enlightenment
  2014-10-14 10:04         ` Ian Campbell
@ 2014-10-14 10:12           ` Jan Beulich
  2014-10-14 10:16             ` Paul Durrant
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2014-10-14 10:12 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Keir Fraser, Stefano Stabellini, Matt Wilson, Christoph Egger,
	IanJackson, xen-devel, Paul Durrant, Anthony Liguori

>>> On 14.10.14 at 12:04, <ian.campbell@citrix.com> wrote:
> On Tue, 2014-10-14 at 10:56 +0100, Jan Beulich wrote:
>> >>> On 14.10.14 at 09:45, <ian.campbell@citrix.com> wrote:
>> > On Mon, 2014-10-13 at 09:10 +0100, Jan Beulich wrote:
>> >> >>> On 10.10.14 at 18:36, <msw@linux.com> wrote:
>> >> > On Mon, Sep 29, 2014 at 11:28:44AM +0100, Paul Durrant wrote:
>> >> >> +    /*
>> >> >> +     * The guest will calculate reference time according to the following
>> >> >> +     * formula:
>> >> >> +     *
>> >> >> +     * ReferenceTime = ((RDTSC() * TscScale) >> 64) + TscOffset
>> >> >> +     *
>> >> >> +     * Windows uses a 100ns tick, so we need a scale which is cpu
>> >> >> +     * ticks per 100ns shifted left by 64.
>> >> >> +     */
>> >> >> +    p->TscScale = ((10000ul << 32) / d->arch.tsc_khz) << 32;
>> >> >> +
>> >> >> +    do {
>> >> >> +        p->TscSequence++;
>> >> >> +    } while ( p->TscSequence == 0xFFFFFFFF ||
>> >> >> +              p->TscSequence == 0 ); /* Avoid both 'invalid' values */
>> >> > 
>> >> > Anthony Liguori and I were looking this over today and he pointed
>> >> > something out: couldn't a second vCPU of the guest write 0 or
>> >> > 0xffffffff in a tight loop to cause a hypervisor DoS?
>> >> 
>> >> Yes, this is at least a theoretical issue that should be fixed. I don't
>> >> think it's a practical issue though: I'd expect the compiler to eliminate
>> >> the two reads of the field and instead directly use the result of the
>> >> increment.
>> > 
>> > Wouldn't that just mean the attacker needs to write fffffffe or ffffffff
>> > instead?
>> 
>> No. The effect of what I said would amount to
>> 
>> 	x = p->TscSequence;
>> 	do {
>> 		x++;
>> 	} while ( !(x + 1) || !x )
>> 	p->TscSequence = x;
>> 
>> (or something equivalent without using a loop).
> 
> Ah right. Perhaps it would better to write it that way and use some sort
> of ACCESS_ONCE like macrot enforce it actually ends up that way rather
> than rely on the vagaries of the compiler?

Of course - that's why I said it's at least a theoretical issue and
needs fixing. But the v3 Paul sent deals with this differently
anyway, so not much point in continuing finding another clean
solution.

Jan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] x86/viridian: Add Partition Reference Time enlightenment
  2014-10-14 10:12           ` Jan Beulich
@ 2014-10-14 10:16             ` Paul Durrant
  0 siblings, 0 replies; 14+ messages in thread
From: Paul Durrant @ 2014-10-14 10:16 UTC (permalink / raw)
  To: Jan Beulich, Ian Campbell
  Cc: Keir (Xen.org), Matt Wilson, Christoph Egger,
	xen-devel@lists.xen.org, Stefano Stabellini, Anthony Liguori,
	Ian Jackson

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 14 October 2014 11:12
> To: Ian Campbell
> Cc: Christoph Egger; Paul Durrant; Anthony Liguori; Ian Jackson; Stefano
> Stabellini; Matt Wilson; xen-devel@lists.xen.org; Keir (Xen.org)
> Subject: Re: [Xen-devel] [PATCH v2] x86/viridian: Add Partition Reference
> Time enlightenment
> 
> >>> On 14.10.14 at 12:04, <ian.campbell@citrix.com> wrote:
> > On Tue, 2014-10-14 at 10:56 +0100, Jan Beulich wrote:
> >> >>> On 14.10.14 at 09:45, <ian.campbell@citrix.com> wrote:
> >> > On Mon, 2014-10-13 at 09:10 +0100, Jan Beulich wrote:
> >> >> >>> On 10.10.14 at 18:36, <msw@linux.com> wrote:
> >> >> > On Mon, Sep 29, 2014 at 11:28:44AM +0100, Paul Durrant wrote:
> >> >> >> +    /*
> >> >> >> +     * The guest will calculate reference time according to the
> following
> >> >> >> +     * formula:
> >> >> >> +     *
> >> >> >> +     * ReferenceTime = ((RDTSC() * TscScale) >> 64) + TscOffset
> >> >> >> +     *
> >> >> >> +     * Windows uses a 100ns tick, so we need a scale which is cpu
> >> >> >> +     * ticks per 100ns shifted left by 64.
> >> >> >> +     */
> >> >> >> +    p->TscScale = ((10000ul << 32) / d->arch.tsc_khz) << 32;
> >> >> >> +
> >> >> >> +    do {
> >> >> >> +        p->TscSequence++;
> >> >> >> +    } while ( p->TscSequence == 0xFFFFFFFF ||
> >> >> >> +              p->TscSequence == 0 ); /* Avoid both 'invalid' values */
> >> >> >
> >> >> > Anthony Liguori and I were looking this over today and he pointed
> >> >> > something out: couldn't a second vCPU of the guest write 0 or
> >> >> > 0xffffffff in a tight loop to cause a hypervisor DoS?
> >> >>
> >> >> Yes, this is at least a theoretical issue that should be fixed. I don't
> >> >> think it's a practical issue though: I'd expect the compiler to eliminate
> >> >> the two reads of the field and instead directly use the result of the
> >> >> increment.
> >> >
> >> > Wouldn't that just mean the attacker needs to write fffffffe or ffffffff
> >> > instead?
> >>
> >> No. The effect of what I said would amount to
> >>
> >> 	x = p->TscSequence;
> >> 	do {
> >> 		x++;
> >> 	} while ( !(x + 1) || !x )
> >> 	p->TscSequence = x;
> >>
> >> (or something equivalent without using a loop).
> >
> > Ah right. Perhaps it would better to write it that way and use some sort
> > of ACCESS_ONCE like macrot enforce it actually ends up that way rather
> > than rely on the vagaries of the compiler?
> 
> Of course - that's why I said it's at least a theoretical issue and
> needs fixing. But the v3 Paul sent deals with this differently
> anyway, so not much point in continuing finding another clean
> solution.
> 

I'll actually go with Matt's suggestion and just ditch the loop entirely. I'll send v4 in a few minutes.

  Paul

> Jan

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2014-10-14 10:16 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-09-29 10:28 [PATCH v2] x86/viridian: Add Partition Reference Time enlightenment Paul Durrant
2014-10-10 11:55 ` Egger, Christoph
2014-10-13  8:53   ` Egger, Christoph
2014-10-13  9:31     ` Jan Beulich
2014-10-13 10:33     ` Paul Durrant
2014-10-10 16:36 ` Matt Wilson
2014-10-13  8:10   ` Jan Beulich
2014-10-14  7:45     ` Ian Campbell
2014-10-14  9:56       ` Jan Beulich
2014-10-14 10:04         ` Ian Campbell
2014-10-14 10:12           ` Jan Beulich
2014-10-14 10:16             ` Paul Durrant
  -- strict thread matches above, loose matches on Subject: below --
2014-10-13 14:24 Paul Durrant
2014-10-13 14:25 ` Paul Durrant

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).