* [PATCH 0/4] Fixes to several domctls for migration
@ 2014-06-04 17:26 Andrew Cooper
  2014-06-04 17:26 ` [PATCH 1/4] x86/domctl: Implement XEN_DOMCTL_{get, set}_vcpu_msrs Andrew Cooper
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Andrew Cooper @ 2014-06-04 17:26 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper

These are all bugfixes for hypercalls relating to migration.  They were
discovered during the migration v2 work, but are logically distinct from it
and are provided separately to help reduce the length of the migration v2
series.

Andrew Cooper (4):
  x86/domctl: Implement XEN_DOMCTL_{get,set}_vcpu_msrs
  tools/libxc: Use an explicit check for PV MSRs in xc_domain_save()
  x86/domctl: Remove PV MSR parts of XEN_DOMCTL_[gs]et_ext_vcpucontext
  x86/domctl: Two functional fixes to XEN_DOMCTL_[gs]etvcpuextstate

 tools/libxc/xc_domain_save.c |   38 +++++++
 xen/arch/x86/domctl.c        |  257 +++++++++++++++++++++++++-----------------
 xen/include/public/domctl.h  |   50 ++++----
 3 files changed, 225 insertions(+), 120 deletions(-)

-- 
1.7.10.4


* [PATCH 1/4] x86/domctl: Implement XEN_DOMCTL_{get, set}_vcpu_msrs
  2014-06-04 17:26 [PATCH 0/4] Fixes to several domctls for migration Andrew Cooper
@ 2014-06-04 17:26 ` Andrew Cooper
  2014-06-05 12:46   ` Jan Beulich
  2014-06-04 17:26 ` [PATCH 2/4] tools/libxc: Use an explicit check for PV MSRs in xc_domain_save() Andrew Cooper
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Andrew Cooper @ 2014-06-04 17:26 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Keir Fraser, Jan Beulich

Despite my 'Reviewed-by' tag on c/s 65e3554908 "x86/PV: support data
breakpoint extension registers", I have re-evaluated my position as far as the
hypercall interface is concerned.

Previously, for the sake of not modifying the migration code in libxc,
XEN_DOMCTL_get_ext_vcpucontext would jump through hoops to return -ENOBUFS if
and only if MSRs were in use and no buffer was present.

This is fragile, and awkward from a toolstack point of view when actually
sending MSR content in the migration stream.  It also complicates fixing a
further race condition, between querying the number of MSRs for a vcpu, and
the vcpu touching a new one.

As this code is still only in staging, take this opportunity to redesign the
interface.  This patch introduces the brand new XEN_DOMCTL_{get,set}_vcpu_msrs
subops.
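
As an illustration of the intended calling pattern (a sketch only, modelled
on the xc_domain_save() hunk in patch 2 of this series; xch, dom and vcpu
are assumed to be in scope, and error/cleanup paths are elided):

    DECLARE_DOMCTL;
    DECLARE_HYPERCALL_BUFFER(xen_domctl_vcpu_msr_t, msrs);

    domctl.cmd = XEN_DOMCTL_get_vcpu_msrs;
    domctl.domain = dom;
    memset(&domctl.u, 0, sizeof(domctl.u)); /* NULL handle => query max count */
    domctl.u.vcpu_msrs.vcpu = vcpu;

    if ( xc_domctl(xch, &domctl) < 0 )
        return -1;

    if ( domctl.u.vcpu_msrs.msr_count )
    {
        msrs = xc_hypercall_buffer_alloc(xch, msrs,
                                         domctl.u.vcpu_msrs.msr_count *
                                         sizeof(*msrs));
        if ( !msrs )
            return -1;
        set_xen_guest_handle(domctl.u.vcpu_msrs.msrs, msrs);

        /* Second call: Xen fills the buffer and rewrites msr_count to the
         * number of MSRs actually saved. */
        if ( xc_domctl(xch, &domctl) < 0 )
            return -1;

        /* ... consume msrs[0 .. msr_count-1] ... */
        xc_hypercall_buffer_free(xch, msrs);
    }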

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <JBeulich@suse.com>
---
 xen/arch/x86/domctl.c       |  126 +++++++++++++++++++++++++++++++++++++++++++
 xen/include/public/domctl.h |   31 +++++++++++
 2 files changed, 157 insertions(+)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 1285dd0..c0c2d1b 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -1341,6 +1341,132 @@ long arch_do_domctl(
     }
     break;
 
+    case XEN_DOMCTL_get_vcpu_msrs:
+    case XEN_DOMCTL_set_vcpu_msrs:
+    {
+        struct xen_domctl_vcpu_msrs *vmsrs = &domctl->u.vcpu_msrs;
+        struct xen_domctl_vcpu_msr msr;
+        struct vcpu *v;
+        uint32_t nr_msrs = 0;
+
+        ret = -ESRCH;
+        if ( (vmsrs->vcpu >= d->max_vcpus) ||
+             ((v = d->vcpu[vmsrs->vcpu]) == NULL) )
+            break;
+
+        ret = -EINVAL;
+        if ( (v == current) || /* no vcpu_pause() */
+             !is_pv_domain(d) )
+            break;
+
+        /* Count maximum number of optional msrs. */
+        if ( boot_cpu_has(X86_FEATURE_DBEXT) )
+            nr_msrs += 4;
+
+        if ( domctl->cmd == XEN_DOMCTL_get_vcpu_msrs )
+        {
+            /* NULL guest handle is a request for max size. */
+            if ( guest_handle_is_null(vmsrs->msrs) ||
+                 (vmsrs->msr_count < nr_msrs) )
+            {
+                vmsrs->msr_count = nr_msrs;
+                ret = guest_handle_is_null(vmsrs->msrs) ? 0 : -ENOBUFS;
+            }
+            else
+            {
+                ret = i = 0;
+
+                vcpu_pause(v);
+
+                if ( boot_cpu_has(X86_FEATURE_DBEXT) )
+                {
+                    unsigned int j;
+
+                    if ( v->arch.pv_vcpu.dr_mask[0] )
+                    {
+                        msr.index = MSR_AMD64_DR0_ADDRESS_MASK;
+                        msr.reserved = 0;
+                        msr.value = v->arch.pv_vcpu.dr_mask[0];
+                        if ( copy_to_guest_offset(vmsrs->msrs, i, &msr, 1) )
+                            ret = -EFAULT;
+                        else
+                            ++i;
+                    }
+
+                    for ( j = 0; j < 3; ++j )
+                    {
+                        if ( !v->arch.pv_vcpu.dr_mask[1 + j] )
+                            continue;
+                        if ( !ret )
+                        {
+                            msr.index = MSR_AMD64_DR1_ADDRESS_MASK + j;
+                            msr.reserved = 0;
+                            msr.value = v->arch.pv_vcpu.dr_mask[1 + j];
+                            if ( copy_to_guest_offset(vmsrs->msrs, i, &msr, 1) )
+                                ret = -EFAULT;
+                            else
+                                ++i;
+                        }
+                    }
+
+                }
+
+                vcpu_unpause(v);
+
+                /* Check we didn't lie to userspace then overflow the buffer */
+                BUG_ON(i > nr_msrs);
+                vmsrs->msr_count = i;
+            }
+
+            copyback = 1;
+        }
+        else
+        {
+            ret = -EINVAL;
+            if ( vmsrs->msr_count > nr_msrs )
+                break;
+
+            vcpu_pause(v);
+
+            for ( i = 0; i < vmsrs->msr_count; ++i )
+            {
+                ret = -EFAULT;
+                if ( copy_from_guest_offset(&msr, vmsrs->msrs, i, 1) )
+                    break;
+
+                ret = -EINVAL;
+                if ( msr.reserved )
+                    break;
+
+                switch ( msr.index )
+                {
+                case MSR_AMD64_DR0_ADDRESS_MASK:
+                    if ( !boot_cpu_has(X86_FEATURE_DBEXT) ||
+                         (msr.value >> 32) )
+                        break;
+                    v->arch.pv_vcpu.dr_mask[0] = msr.value;
+                    continue;
+
+                case MSR_AMD64_DR1_ADDRESS_MASK ...
+                    MSR_AMD64_DR3_ADDRESS_MASK:
+                    if ( !boot_cpu_has(X86_FEATURE_DBEXT) ||
+                         (msr.value >> 32) )
+                        break;
+                    msr.index -= MSR_AMD64_DR1_ADDRESS_MASK - 1;
+                    v->arch.pv_vcpu.dr_mask[msr.index] = msr.value;
+                    continue;
+                }
+                break;
+            }
+
+            vcpu_unpause(v);
+
+            if ( i == vmsrs->msr_count )
+                ret = 0;
+        }
+    }
+    break;
+
     default:
         ret = iommu_do_domctl(domctl, d, u_domctl);
         break;
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 385b053..7a13e25 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -895,6 +895,34 @@ struct xen_domctl_cacheflush {
 typedef struct xen_domctl_cacheflush xen_domctl_cacheflush_t;
 DEFINE_XEN_GUEST_HANDLE(xen_domctl_cacheflush_t);
 
+#if defined(__i386__) || defined(__x86_64__)
+struct xen_domctl_vcpu_msr {
+    uint32_t         index;
+    uint32_t         reserved;
+    uint64_aligned_t value;
+};
+typedef struct xen_domctl_vcpu_msr xen_domctl_vcpu_msr_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_vcpu_msr_t);
+
+/*
+ * XEN_DOMCTL_set_vcpu_msrs / XEN_DOMCTL_get_vcpu_msrs.
+ *
+ * For GET, a NULL 'msrs' guest handle is a request for the maximum
+ * 'msr_count' which is required to save the vcpu state.
+ *
+ * When actually getting the MSRs, 'msr_count' shall reflect the actual number
+ * of MSRs saved, which might be fewer than the maximum, if some are not
+ * currently used by the vcpu.
+ */
+struct xen_domctl_vcpu_msrs {
+    uint32_t vcpu;                                   /* IN     */
+    uint32_t msr_count;                              /* IN/OUT */
+    XEN_GUEST_HANDLE_64(xen_domctl_vcpu_msr_t) msrs; /* IN/OUT */
+};
+typedef struct xen_domctl_vcpu_msrs xen_domctl_vcpu_msrs_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_vcpu_msrs_t);
+#endif
+
 struct xen_domctl {
     uint32_t cmd;
 #define XEN_DOMCTL_createdomain                   1
@@ -965,6 +993,8 @@ struct xen_domctl {
 #define XEN_DOMCTL_getnodeaffinity               69
 #define XEN_DOMCTL_set_max_evtchn                70
 #define XEN_DOMCTL_cacheflush                    71
+#define XEN_DOMCTL_get_vcpu_msrs                 72
+#define XEN_DOMCTL_set_vcpu_msrs                 73
 #define XEN_DOMCTL_gdbsx_guestmemio            1000
 #define XEN_DOMCTL_gdbsx_pausevcpu             1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
@@ -1014,6 +1044,7 @@ struct xen_domctl {
 #if defined(__i386__) || defined(__x86_64__)
         struct xen_domctl_cpuid             cpuid;
         struct xen_domctl_vcpuextstate      vcpuextstate;
+        struct xen_domctl_vcpu_msrs         vcpu_msrs;
 #endif
         struct xen_domctl_set_access_required access_required;
         struct xen_domctl_audit_p2m         audit_p2m;
-- 
1.7.10.4


* [PATCH 2/4] tools/libxc: Use an explicit check for PV MSRs in xc_domain_save()
  2014-06-04 17:26 [PATCH 0/4] Fixes to several domctls for migration Andrew Cooper
  2014-06-04 17:26 ` [PATCH 1/4] x86/domctl: Implement XEN_DOMCTL_{get, set}_vcpu_msrs Andrew Cooper
@ 2014-06-04 17:26 ` Andrew Cooper
  2014-06-05 13:41   ` Jan Beulich
  2014-06-05 15:52   ` Ian Campbell
  2014-06-04 17:26 ` [PATCH 3/4] x86/domctl: Remove PV MSR parts of XEN_DOMCTL_[gs]et_ext_vcpucontext Andrew Cooper
  2014-06-04 17:26 ` [PATCH 4/4] x86/domctl: Two functional fixes to XEN_DOMCTL_[gs]etvcpuextstate Andrew Cooper
  3 siblings, 2 replies; 19+ messages in thread
From: Andrew Cooper @ 2014-06-04 17:26 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Ian Jackson, Ian Campbell, Jan Beulich

Migrating PV domains using MSRs is not supported.  This uses the new
XEN_DOMCTL_get_vcpu_msrs and will fail the migration with an explicit error.

This is an improvement upon the current failure of
  "No extended context for VCPUxx (ENOBUFS)"

Support for migrating PV domains which are using MSRs will be included in the
migration v2 work.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
 tools/libxc/xc_domain_save.c |   38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
index acf3685..7ef5183 100644
--- a/tools/libxc/xc_domain_save.c
+++ b/tools/libxc/xc_domain_save.c
@@ -1995,6 +1995,44 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
             goto out;
         }
 
+        /* Check there are no PV MSRs in use. */
+        domctl.cmd = XEN_DOMCTL_get_vcpu_msrs;
+        domctl.domain = dom;
+        memset(&domctl.u, 0, sizeof(domctl.u));
+        domctl.u.vcpu_msrs.vcpu = i;
+        if ( xc_domctl(xch, &domctl) < 0 )
+        {
+            PERROR("Error querying maximum number of MSRs for VCPU%d", i);
+            goto out;
+        }
+
+        if ( domctl.u.vcpu_msrs.msr_count )
+        {
+            buffer = xc_hypercall_buffer_alloc(xch, buffer,
+                                               domctl.u.vcpu_msrs.msr_count *
+                                               sizeof(xen_domctl_vcpu_msr_t));
+            if ( !buffer )
+            {
+                PERROR("Insufficient memory for getting MSRs for VCPU%d", i);
+                goto out;
+            }
+            set_xen_guest_handle(domctl.u.vcpu_msrs.msrs, buffer);
+
+            if ( xc_domctl(xch, &domctl) < 0 )
+            {
+                PERROR("Error querying MSRs for VCPU%d", i);
+                goto out;
+            }
+
+            xc_hypercall_buffer_free(xch, buffer);
+            if ( domctl.u.vcpu_msrs.msr_count )
+            {
+                errno = EOPNOTSUPP;
+                PERROR("Unable to migrate PV guest using MSRs (yet)");
+                goto out;
+            }
+        }
+
         /* Start to fetch CPU eXtended States */
         /* Get buffer size first */
         domctl.cmd = XEN_DOMCTL_getvcpuextstate;
-- 
1.7.10.4


* [PATCH 3/4] x86/domctl: Remove PV MSR parts of XEN_DOMCTL_[gs]et_ext_vcpucontext
  2014-06-04 17:26 [PATCH 0/4] Fixes to several domctls for migration Andrew Cooper
  2014-06-04 17:26 ` [PATCH 1/4] x86/domctl: Implement XEN_DOMCTL_{get, set}_vcpu_msrs Andrew Cooper
  2014-06-04 17:26 ` [PATCH 2/4] tools/libxc: Use an explicit check for PV MSRs in xc_domain_save() Andrew Cooper
@ 2014-06-04 17:26 ` Andrew Cooper
  2014-06-05  7:52   ` Frediano Ziglio
  2014-06-04 17:26 ` [PATCH 4/4] x86/domctl: Two functional fixes to XEN_DOMCTL_[gs]etvcpuextstate Andrew Cooper
  3 siblings, 1 reply; 19+ messages in thread
From: Andrew Cooper @ 2014-06-04 17:26 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Keir Fraser, Jan Beulich

The PV MSR functionality is now implemented as a separate set of domctls.

This is a revert of parts of c/s 65e3554908
  "x86/PV: support data breakpoint extension registers"

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <JBeulich@suse.com>
---
 xen/arch/x86/domctl.c       |   78 +------------------------------------------
 xen/include/public/domctl.h |   19 -----------
 2 files changed, 1 insertion(+), 96 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index c0c2d1b..8f5b287 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -818,7 +818,6 @@ long arch_do_domctl(
     {
         struct xen_domctl_ext_vcpucontext *evc;
         struct vcpu *v;
-        struct xen_domctl_ext_vcpu_msr msr;
 
         evc = &domctl->u.ext_vcpucontext;
 
@@ -864,42 +863,7 @@ long arch_do_domctl(
             evc->vmce.mci_ctl2_bank0 = v->arch.vmce.bank[0].mci_ctl2;
             evc->vmce.mci_ctl2_bank1 = v->arch.vmce.bank[1].mci_ctl2;
 
-            i = ret = 0;
-            if ( boot_cpu_has(X86_FEATURE_DBEXT) )
-            {
-                unsigned int j;
-
-                if ( v->arch.pv_vcpu.dr_mask[0] )
-                {
-                    if ( i < evc->msr_count && !ret )
-                    {
-                        msr.index = MSR_AMD64_DR0_ADDRESS_MASK;
-                        msr.reserved = 0;
-                        msr.value = v->arch.pv_vcpu.dr_mask[0];
-                        if ( copy_to_guest_offset(evc->msrs, i, &msr, 1) )
-                            ret = -EFAULT;
-                    }
-                    ++i;
-                }
-                for ( j = 0; j < 3; ++j )
-                {
-                    if ( !v->arch.pv_vcpu.dr_mask[1 + j] )
-                        continue;
-                    if ( i < evc->msr_count && !ret )
-                    {
-                        msr.index = MSR_AMD64_DR1_ADDRESS_MASK + j;
-                        msr.reserved = 0;
-                        msr.value = v->arch.pv_vcpu.dr_mask[1 + j];
-                        if ( copy_to_guest_offset(evc->msrs, i, &msr, 1) )
-                            ret = -EFAULT;
-                    }
-                    ++i;
-                }
-            }
-            if ( i > evc->msr_count && !ret )
-                ret = -ENOBUFS;
-            evc->msr_count = i;
-
+            ret = 0;
             vcpu_unpause(v);
             copyback = 1;
         }
@@ -954,49 +918,9 @@ long arch_do_domctl(
 
                 ret = vmce_restore_vcpu(v, &vmce);
             }
-            else if ( evc->size > offsetof(typeof(*evc), vmce) )
-                ret = -EINVAL;
             else
                 ret = 0;
 
-            if ( ret || evc->size <= offsetof(typeof(*evc), msrs) )
-                /* nothing */;
-            else if ( evc->size < offsetof(typeof(*evc), msrs) +
-                                  sizeof(evc->msrs) )
-                ret = -EINVAL;
-            else
-            {
-                for ( i = 0; i < evc->msr_count; ++i )
-                {
-                    ret = -EFAULT;
-                    if ( copy_from_guest_offset(&msr, evc->msrs, i, 1) )
-                        break;
-                    ret = -EINVAL;
-                    if ( msr.reserved )
-                        break;
-                    switch ( msr.index )
-                    {
-                    case MSR_AMD64_DR0_ADDRESS_MASK:
-                        if ( !boot_cpu_has(X86_FEATURE_DBEXT) ||
-                             (msr.value >> 32) )
-                            break;
-                        v->arch.pv_vcpu.dr_mask[0] = msr.value;
-                        continue;
-                    case MSR_AMD64_DR1_ADDRESS_MASK ...
-                         MSR_AMD64_DR3_ADDRESS_MASK:
-                        if ( !boot_cpu_has(X86_FEATURE_DBEXT) ||
-                             (msr.value >> 32) )
-                            break;
-                        msr.index -= MSR_AMD64_DR1_ADDRESS_MASK - 1;
-                        v->arch.pv_vcpu.dr_mask[msr.index] = msr.value;
-                        continue;
-                    }
-                    break;
-                }
-                if ( i == evc->msr_count )
-                    ret = 0;
-            }
-
             domain_unpause(d);
         }
     }
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 7a13e25..4321a64 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -543,16 +543,6 @@ typedef struct xen_domctl_pin_mem_cacheattr xen_domctl_pin_mem_cacheattr_t;
 DEFINE_XEN_GUEST_HANDLE(xen_domctl_pin_mem_cacheattr_t);
 
 
-#if defined(__i386__) || defined(__x86_64__)
-struct xen_domctl_ext_vcpu_msr {
-    uint32_t         index;
-    uint32_t         reserved;
-    uint64_aligned_t value;
-};
-typedef struct xen_domctl_ext_vcpu_msr xen_domctl_ext_vcpu_msr_t;
-DEFINE_XEN_GUEST_HANDLE(xen_domctl_ext_vcpu_msr_t);
-#endif
-
 /* XEN_DOMCTL_set_ext_vcpucontext */
 /* XEN_DOMCTL_get_ext_vcpucontext */
 struct xen_domctl_ext_vcpucontext {
@@ -572,14 +562,6 @@ struct xen_domctl_ext_vcpucontext {
     uint16_t         sysenter_callback_cs;
     uint8_t          syscall32_disables_events;
     uint8_t          sysenter_disables_events;
-    /*
-     * When, for the "get" version, msr_count is too small to cover all MSRs
-     * the hypervisor needs to be saved, the call will return -ENOBUFS and
-     * set msr_count to the required (minimum) value. Furthermore, for both
-     * "get" and "set", that field as well as the msrs one only get looked at
-     * if the size field above covers the structure up to the entire msrs one.
-     */
-    uint16_t         msr_count;
 #if defined(__GNUC__)
     union {
         uint64_aligned_t mcg_cap;
@@ -588,7 +570,6 @@ struct xen_domctl_ext_vcpucontext {
 #else
     struct hvm_vmce_vcpu vmce;
 #endif
-    XEN_GUEST_HANDLE_64(xen_domctl_ext_vcpu_msr_t) msrs;
 #endif
 };
 typedef struct xen_domctl_ext_vcpucontext xen_domctl_ext_vcpucontext_t;
-- 
1.7.10.4


* [PATCH 4/4] x86/domctl: Two functional fixes to XEN_DOMCTL_[gs]etvcpuextstate
  2014-06-04 17:26 [PATCH 0/4] Fixes to several domctls for migration Andrew Cooper
                   ` (2 preceding siblings ...)
  2014-06-04 17:26 ` [PATCH 3/4] x86/domctl: Remove PV MSR parts of XEN_DOMCTL_[gs]et_ext_vcpucontext Andrew Cooper
@ 2014-06-04 17:26 ` Andrew Cooper
  3 siblings, 0 replies; 19+ messages in thread
From: Andrew Cooper @ 2014-06-04 17:26 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Keir Fraser, Jan Beulich

Interacting with the vcpu itself should be protected by vcpu_pause().
Buggy/naive toolstacks might encounter adverse interaction with a vcpu context
switch, or an increase of xcr0_accum.  There are no such problems with the
current in-tree code.

Explicitly permit a NULL guest handle as a request for the buffer size.  It is the
prevailing Xen style, and without it, valgrind's ioctl handler is unable to
determine whether evc->buffer actually got written to.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <JBeulich@suse.com>

---

There are further issues which are specifically not fixed in this patch.
There is a race condition between querying evc->size and trying to write the
xsave state into a buffer of that size, if the vcpu goes and increases xcr0_accum
in the meantime.

The legacy migration stream is sufficiently inflexible that I can't find a
backwards-compatible way of fixing this which still works in the
heterogeneous migration case.

As there are further problems with the legacy migration stream anyway
(vcpus with different xcr0_accum's will result in stream corruption in
xc_domain_restore()), I intend to wait until the v2 code is present and v1
code removed, at which point this issue shall be trivial to fix.
---
 xen/arch/x86/domctl.c |   53 +++++++++++++++++++++++++++----------------------
 1 file changed, 29 insertions(+), 24 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 8f5b287..99163f9 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -1097,45 +1097,48 @@ long arch_do_domctl(
              ((v = d->vcpu[evc->vcpu]) == NULL) )
             goto vcpuextstate_out;
 
+        ret = -EINVAL;
+        if ( v == current ) /* no vcpu_pause() */
+            goto vcpuextstate_out;
+
         if ( domctl->cmd == XEN_DOMCTL_getvcpuextstate )
         {
-            unsigned int size = PV_XSAVE_SIZE(v->arch.xcr0_accum);
+            unsigned int size;
 
-            if ( !evc->size && !evc->xfeature_mask )
+            ret = 0;
+            vcpu_pause(v);
+
+            size = PV_XSAVE_SIZE(v->arch.xcr0_accum);
+            if ( (!evc->size && !evc->xfeature_mask) ||
+                 guest_handle_is_null(evc->buffer) )
             {
                 evc->xfeature_mask = xfeature_mask;
                 evc->size = size;
-                ret = 0;
+                vcpu_unpause(v);
                 goto vcpuextstate_out;
             }
+
             if ( evc->size != size || evc->xfeature_mask != xfeature_mask )
-            {
                 ret = -EINVAL;
-                goto vcpuextstate_out;
-            }
-            if ( copy_to_guest_offset(domctl->u.vcpuextstate.buffer,
-                                      offset, (void *)&v->arch.xcr0,
-                                      sizeof(v->arch.xcr0)) )
-            {
+
+            if ( !ret && copy_to_guest_offset(evc->buffer, offset,
+                                              (void *)&v->arch.xcr0,
+                                              sizeof(v->arch.xcr0)) )
                 ret = -EFAULT;
-                goto vcpuextstate_out;
-            }
+
             offset += sizeof(v->arch.xcr0);
-            if ( copy_to_guest_offset(domctl->u.vcpuextstate.buffer,
-                                      offset, (void *)&v->arch.xcr0_accum,
-                                      sizeof(v->arch.xcr0_accum)) )
-            {
+            if ( !ret && copy_to_guest_offset(evc->buffer, offset,
+                                              (void *)&v->arch.xcr0_accum,
+                                              sizeof(v->arch.xcr0_accum)) )
                 ret = -EFAULT;
-                goto vcpuextstate_out;
-            }
+
             offset += sizeof(v->arch.xcr0_accum);
-            if ( copy_to_guest_offset(domctl->u.vcpuextstate.buffer,
-                                      offset, (void *)v->arch.xsave_area,
-                                      size - 2 * sizeof(uint64_t)) )
-            {
+            if ( !ret && copy_to_guest_offset(evc->buffer, offset,
+                                              (void *)v->arch.xsave_area,
+                                              size - 2 * sizeof(uint64_t)) )
                 ret = -EFAULT;
-                goto vcpuextstate_out;
-            }
+
+            vcpu_unpause(v);
         }
         else
         {
@@ -1183,12 +1186,14 @@ long arch_do_domctl(
 
             if ( evc->size <= PV_XSAVE_SIZE(_xcr0_accum) )
             {
+                vcpu_pause(v);
                 v->arch.xcr0 = _xcr0;
                 v->arch.xcr0_accum = _xcr0_accum;
                 if ( _xcr0_accum & XSTATE_NONLAZY )
                     v->arch.nonlazy_xstate_used = 1;
                 memcpy(v->arch.xsave_area, _xsave_area,
                        evc->size - 2 * sizeof(uint64_t));
+                vcpu_unpause(v);
             }
             else
                 ret = -EINVAL;
-- 
1.7.10.4


* Re: [PATCH 3/4] x86/domctl: Remove PV MSR parts of XEN_DOMCTL_[gs]et_ext_vcpucontext
  2014-06-04 17:26 ` [PATCH 3/4] x86/domctl: Remove PV MSR parts of XEN_DOMCTL_[gs]et_ext_vcpucontext Andrew Cooper
@ 2014-06-05  7:52   ` Frediano Ziglio
  2014-06-05  9:25     ` Andrew Cooper
  0 siblings, 1 reply; 19+ messages in thread
From: Frediano Ziglio @ 2014-06-05  7:52 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Keir Fraser, Jan Beulich, Xen-devel

On Wed, 2014-06-04 at 18:26 +0100, Andrew Cooper wrote:
> The PV MSR functionality is now implemented as a separate set of domctls.
> 
> This is a revert of parts of c/s 65e3554908
>   "x86/PV: support data breakpoint extension registers"
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <JBeulich@suse.com>
> ---
>  xen/arch/x86/domctl.c       |   78 +------------------------------------------
>  xen/include/public/domctl.h |   19 -----------
>  2 files changed, 1 insertion(+), 96 deletions(-)
> 
> diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
> index c0c2d1b..8f5b287 100644
> --- a/xen/arch/x86/domctl.c
> +++ b/xen/arch/x86/domctl.c
> @@ -818,7 +818,6 @@ long arch_do_domctl(
>      {
>          struct xen_domctl_ext_vcpucontext *evc;
>          struct vcpu *v;
> -        struct xen_domctl_ext_vcpu_msr msr;
>  
>          evc = &domctl->u.ext_vcpucontext;
>  
> @@ -864,42 +863,7 @@ long arch_do_domctl(
>              evc->vmce.mci_ctl2_bank0 = v->arch.vmce.bank[0].mci_ctl2;
>              evc->vmce.mci_ctl2_bank1 = v->arch.vmce.bank[1].mci_ctl2;
>  
> -            i = ret = 0;
> -            if ( boot_cpu_has(X86_FEATURE_DBEXT) )
> -            {
> -                unsigned int j;
> -
> -                if ( v->arch.pv_vcpu.dr_mask[0] )
> -                {
> -                    if ( i < evc->msr_count && !ret )
> -                    {
> -                        msr.index = MSR_AMD64_DR0_ADDRESS_MASK;
> -                        msr.reserved = 0;
> -                        msr.value = v->arch.pv_vcpu.dr_mask[0];
> -                        if ( copy_to_guest_offset(evc->msrs, i, &msr, 1) )
> -                            ret = -EFAULT;
> -                    }
> -                    ++i;
> -                }
> -                for ( j = 0; j < 3; ++j )
> -                {
> -                    if ( !v->arch.pv_vcpu.dr_mask[1 + j] )
> -                        continue;
> -                    if ( i < evc->msr_count && !ret )
> -                    {
> -                        msr.index = MSR_AMD64_DR1_ADDRESS_MASK + j;
> -                        msr.reserved = 0;
> -                        msr.value = v->arch.pv_vcpu.dr_mask[1 + j];
> -                        if ( copy_to_guest_offset(evc->msrs, i, &msr, 1) )
> -                            ret = -EFAULT;
> -                    }
> -                    ++i;
> -                }
> -            }
> -            if ( i > evc->msr_count && !ret )
> -                ret = -ENOBUFS;
> -            evc->msr_count = i;
> -
> +            ret = 0;
>              vcpu_unpause(v);
>              copyback = 1;
>          }
> @@ -954,49 +918,9 @@ long arch_do_domctl(
>  
>                  ret = vmce_restore_vcpu(v, &vmce);
>              }
> -            else if ( evc->size > offsetof(typeof(*evc), vmce) )
> -                ret = -EINVAL;
>              else
>                  ret = 0;
>  
> -            if ( ret || evc->size <= offsetof(typeof(*evc), msrs) )
> -                /* nothing */;
> -            else if ( evc->size < offsetof(typeof(*evc), msrs) +
> -                                  sizeof(evc->msrs) )
> -                ret = -EINVAL;
> -            else
> -            {
> -                for ( i = 0; i < evc->msr_count; ++i )
> -                {
> -                    ret = -EFAULT;
> -                    if ( copy_from_guest_offset(&msr, evc->msrs, i, 1) )
> -                        break;
> -                    ret = -EINVAL;
> -                    if ( msr.reserved )
> -                        break;
> -                    switch ( msr.index )
> -                    {
> -                    case MSR_AMD64_DR0_ADDRESS_MASK:
> -                        if ( !boot_cpu_has(X86_FEATURE_DBEXT) ||
> -                             (msr.value >> 32) )
> -                            break;
> -                        v->arch.pv_vcpu.dr_mask[0] = msr.value;
> -                        continue;
> -                    case MSR_AMD64_DR1_ADDRESS_MASK ...
> -                         MSR_AMD64_DR3_ADDRESS_MASK:
> -                        if ( !boot_cpu_has(X86_FEATURE_DBEXT) ||
> -                             (msr.value >> 32) )
> -                            break;
> -                        msr.index -= MSR_AMD64_DR1_ADDRESS_MASK - 1;
> -                        v->arch.pv_vcpu.dr_mask[msr.index] = msr.value;
> -                        continue;
> -                    }
> -                    break;
> -                }
> -                if ( i == evc->msr_count )
> -                    ret = 0;
> -            }
> -
>              domain_unpause(d);
>          }
>      }
> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> index 7a13e25..4321a64 100644
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -543,16 +543,6 @@ typedef struct xen_domctl_pin_mem_cacheattr xen_domctl_pin_mem_cacheattr_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_domctl_pin_mem_cacheattr_t);
>  
> 
> -#if defined(__i386__) || defined(__x86_64__)
> -struct xen_domctl_ext_vcpu_msr {
> -    uint32_t         index;
> -    uint32_t         reserved;
> -    uint64_aligned_t value;
> -};
> -typedef struct xen_domctl_ext_vcpu_msr xen_domctl_ext_vcpu_msr_t;
> -DEFINE_XEN_GUEST_HANDLE(xen_domctl_ext_vcpu_msr_t);
> -#endif
> -
>  /* XEN_DOMCTL_set_ext_vcpucontext */
>  /* XEN_DOMCTL_get_ext_vcpucontext */
>  struct xen_domctl_ext_vcpucontext {
> @@ -572,14 +562,6 @@ struct xen_domctl_ext_vcpucontext {
>      uint16_t         sysenter_callback_cs;
>      uint8_t          syscall32_disables_events;
>      uint8_t          sysenter_disables_events;
> -    /*
> -     * When, for the "get" version, msr_count is too small to cover all MSRs
> -     * the hypervisor needs to be saved, the call will return -ENOBUFS and
> -     * set msr_count to the required (minimum) value. Furthermore, for both
> -     * "get" and "set", that field as well as the msrs one only get looked at
> -     * if the size field above covers the structure up to the entire msrs one.
> -     */
> -    uint16_t         msr_count;
>  #if defined(__GNUC__)
>      union {
>          uint64_aligned_t mcg_cap;
> @@ -588,7 +570,6 @@ struct xen_domctl_ext_vcpucontext {
>  #else
>      struct hvm_vmce_vcpu vmce;
>  #endif
> -    XEN_GUEST_HANDLE_64(xen_domctl_ext_vcpu_msr_t) msrs;
>  #endif
>  };
>  typedef struct xen_domctl_ext_vcpucontext xen_domctl_ext_vcpucontext_t;

This is an ABI change; you should bump XEN_DOMCTL_INTERFACE_VERSION too
(if not already done; if it has been done, there should be a comment in
the header stating when it was bumped).

Frediano


* Re: [PATCH 3/4] x86/domctl: Remove PV MSR parts of XEN_DOMCTL_[gs]et_ext_vcpucontext
  2014-06-05  7:52   ` Frediano Ziglio
@ 2014-06-05  9:25     ` Andrew Cooper
  0 siblings, 0 replies; 19+ messages in thread
From: Andrew Cooper @ 2014-06-05  9:25 UTC (permalink / raw)
  To: Frediano Ziglio; +Cc: Keir Fraser, Jan Beulich, Xen-devel

On 05/06/14 08:52, Frediano Ziglio wrote:
> On Wed, 2014-06-04 at 18:26 +0100, Andrew Cooper wrote:
>> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
>> index 7a13e25..4321a64 100644
>> --- a/xen/include/public/domctl.h
>> +++ b/xen/include/public/domctl.h
>> @@ -543,16 +543,6 @@ typedef struct xen_domctl_pin_mem_cacheattr xen_domctl_pin_mem_cacheattr_t;
>>  DEFINE_XEN_GUEST_HANDLE(xen_domctl_pin_mem_cacheattr_t);
>>  
>>
>> -#if defined(__i386__) || defined(__x86_64__)
>> -struct xen_domctl_ext_vcpu_msr {
>> -    uint32_t         index;
>> -    uint32_t         reserved;
>> -    uint64_aligned_t value;
>> -};
>> -typedef struct xen_domctl_ext_vcpu_msr xen_domctl_ext_vcpu_msr_t;
>> -DEFINE_XEN_GUEST_HANDLE(xen_domctl_ext_vcpu_msr_t);
>> -#endif
>> -
>>  /* XEN_DOMCTL_set_ext_vcpucontext */
>>  /* XEN_DOMCTL_get_ext_vcpucontext */
>>  struct xen_domctl_ext_vcpucontext {
>> @@ -572,14 +562,6 @@ struct xen_domctl_ext_vcpucontext {
>>      uint16_t         sysenter_callback_cs;
>>      uint8_t          syscall32_disables_events;
>>      uint8_t          sysenter_disables_events;
>> -    /*
>> -     * When, for the "get" version, msr_count is too small to cover all MSRs
>> -     * the hypervisor needs to be saved, the call will return -ENOBUFS and
>> -     * set msr_count to the required (minimum) value. Furthermore, for both
>> -     * "get" and "set", that field as well as the msrs one only get looked at
>> -     * if the size field above covers the structure up to the entire msrs one.
>> -     */
>> -    uint16_t         msr_count;
>>  #if defined(__GNUC__)
>>      union {
>>          uint64_aligned_t mcg_cap;
>> @@ -588,7 +570,6 @@ struct xen_domctl_ext_vcpucontext {
>>  #else
>>      struct hvm_vmce_vcpu vmce;
>>  #endif
>> -    XEN_GUEST_HANDLE_64(xen_domctl_ext_vcpu_msr_t) msrs;
>>  #endif
>>  };
>>  typedef struct xen_domctl_ext_vcpucontext xen_domctl_ext_vcpucontext_t;
> This is an ABI change; you should bump XEN_DOMCTL_INTERFACE_VERSION too
> (if not already done; if it has been done, there should be a comment in
> the header stating when it was bumped).
>
> Frediano
>
>

The ABI version only needs bumping once per Xen release.  The ABI bump
post-4.4 was in the patch which I am partially reverting.
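
(For reference, the constant lives in the public header; the value shown
here is illustrative only:)

    /* xen/include/public/domctl.h; bumped at most once per release, on
     * the first ABI-changing domctl modification after that release. */
    #define XEN_DOMCTL_INTERFACE_VERSION 0x0000000a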

~Andrew


* Re: [PATCH 1/4] x86/domctl: Implement XEN_DOMCTL_{get, set}_vcpu_msrs
  2014-06-04 17:26 ` [PATCH 1/4] x86/domctl: Implement XEN_DOMCTL_{get, set}_vcpu_msrs Andrew Cooper
@ 2014-06-05 12:46   ` Jan Beulich
  2014-06-05 13:01     ` Andrew Cooper
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2014-06-05 12:46 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Keir Fraser, Xen-devel

>>> On 04.06.14 at 19:26, <andrew.cooper3@citrix.com> wrote:
> Despite my 'Reviewed-by' tag on c/s 65e3554908 "x86/PV: support data
> breakpoint extension registers", I have re-evaluated my position as far as 
> the hypercall interface is concerned.
> 
> Previously, for the sake of not modifying the migration code in libxc,
> XEN_DOMCTL_get_ext_vcpucontext would jump through hoops to return -ENOBUFS if
> and only if MSRs were in use and no buffer was present.
> 
> This is fragile, and awkward from a toolstack point of view when actually
> sending MSR content in the migration stream.  It also complicates fixing a
> further race condition, between querying the number of MSRs for a vcpu, and
> the vcpu touching a new one.
> 
> As this code is still only in staging, take this opportunity to redesign the

s/staging/unstable/

> --- a/xen/arch/x86/domctl.c
> +++ b/xen/arch/x86/domctl.c
> @@ -1341,6 +1341,132 @@ long arch_do_domctl(
>      }
>      break;
>  
> +    case XEN_DOMCTL_get_vcpu_msrs:
> +    case XEN_DOMCTL_set_vcpu_msrs:
> +    {
> +        struct xen_domctl_vcpu_msrs *vmsrs = &domctl->u.vcpu_msrs;
> +        struct xen_domctl_vcpu_msr msr;
> +        struct vcpu *v;
> +        uint32_t nr_msrs = 0;
> +
> +        ret = -ESRCH;
> +        if ( (vmsrs->vcpu >= d->max_vcpus) ||
> +             ((v = d->vcpu[vmsrs->vcpu]) == NULL) )
> +            break;
> +
> +        ret = -EINVAL;
> +        if ( (v == current) || /* no vcpu_pause() */
> +             !is_pv_domain(d) )
> +            break;
> +
> +        /* Count maximum number of optional msrs. */
> +        if ( boot_cpu_has(X86_FEATURE_DBEXT) )
> +            nr_msrs += 4;
> +
> +        if ( domctl->cmd == XEN_DOMCTL_get_vcpu_msrs )
> +        {
> +            /* NULL guest handle is a request for max size. */
> +            if ( guest_handle_is_null(vmsrs->msrs) ||
> +                 (vmsrs->msr_count < nr_msrs) )
> +            {
> +                vmsrs->msr_count = nr_msrs;
> +                ret = guest_handle_is_null(vmsrs->msrs) ? 0 : -ENOBUFS;

I don't think you should be failing "get" if there is enough space in
the provided buffer to store the actually used number of MSRs. That
way the caller may make a first call with a few (rather than none at
all) entries, and grow the buffer only if this wasn't sufficient.

> +            }
> +            else
> +            {
> +                ret = i = 0;
> +
> +                vcpu_pause(v);
> +
> +                if ( boot_cpu_has(X86_FEATURE_DBEXT) )
> +                {
> +                    unsigned int j;
> +
> +                    if ( v->arch.pv_vcpu.dr_mask[0] )
> +                    {
> +                        msr.index = MSR_AMD64_DR0_ADDRESS_MASK;
> +                        msr.reserved = 0;
> +                        msr.value = v->arch.pv_vcpu.dr_mask[0];
> +                        if ( copy_to_guest_offset(vmsrs->msrs, i, &msr, 1) )
> +                            ret = -EFAULT;
> +                        else
> +                            ++i;
> +                    }
> +
> +                    for ( j = 0; j < 3; ++j )
> +                    {
> +                        if ( !v->arch.pv_vcpu.dr_mask[1 + j] )
> +                            continue;
> +                        if ( !ret )
> +                        {
> +                            msr.index = MSR_AMD64_DR1_ADDRESS_MASK + j;
> +                            msr.reserved = 0;
> +                            msr.value = v->arch.pv_vcpu.dr_mask[1 + j];
> +                            if ( copy_to_guest_offset(vmsrs->msrs, i, &msr, 1) )
> +                                ret = -EFAULT;
> +                            else
> +                                ++i;
> +                        }
> +                    }
> +

Stray blank line.

> +                }
> +
> +                vcpu_unpause(v);
> +
> +                /* Check we didn't lie to userspace then overflow the buffer */
> +                BUG_ON(i > nr_msrs);
> +                vmsrs->msr_count = i;
> +            }
> +
> +            copyback = 1;
> +        }
> +        else
> +        {
> +            ret = -EINVAL;
> +            if ( vmsrs->msr_count > nr_msrs )
> +                break;

Similarly I don't think you should fail the operation simply based on
too large a count. Who knows when or why it may make sense for the
caller to specify multiple entries with the same index.

But then again the way you do it we have a statically determined
maximum and don't need to worry about preemption here until a
really large number of MSRs would get handled.

> +
> +            vcpu_pause(v);
> +
> +            for ( i = 0; i < vmsrs->msr_count; ++i )
> +            {
> +                ret = -EFAULT;
> +                if ( copy_from_guest_offset(&msr, vmsrs->msrs, i, 1) )
> +                    break;
> +
> +                ret = -EINVAL;
> +                if ( msr.reserved )
> +                    break;
> +
> +                switch ( msr.index )
> +                {
> +                case MSR_AMD64_DR0_ADDRESS_MASK:
> +                    if ( !boot_cpu_has(X86_FEATURE_DBEXT) ||
> +                         (msr.value >> 32) )
> +                        break;
> +                    v->arch.pv_vcpu.dr_mask[0] = msr.value;
> +                    continue;
> +
> +                case MSR_AMD64_DR1_ADDRESS_MASK ...
> +                    MSR_AMD64_DR3_ADDRESS_MASK:
> +                    if ( !boot_cpu_has(X86_FEATURE_DBEXT) ||
> +                         (msr.value >> 32) )
> +                        break;
> +                    msr.index -= MSR_AMD64_DR1_ADDRESS_MASK - 1;
> +                    v->arch.pv_vcpu.dr_mask[msr.index] = msr.value;
> +                    continue;
> +                }
> +                break;
> +            }
> +
> +            vcpu_unpause(v);
> +
> +            if ( i == vmsrs->msr_count )
> +                ret = 0;

else {
    vmsrs->msr_count = i;
    copyback = 1;
}

to at once give the caller an indication which slot failed?

Jan


* Re: [PATCH 1/4] x86/domctl: Implement XEN_DOMCTL_{get, set}_vcpu_msrs
  2014-06-05 12:46   ` Jan Beulich
@ 2014-06-05 13:01     ` Andrew Cooper
  2014-06-05 13:33       ` Jan Beulich
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Cooper @ 2014-06-05 13:01 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir Fraser, Xen-devel

On 05/06/14 13:46, Jan Beulich wrote:
>>>> On 04.06.14 at 19:26, <andrew.cooper3@citrix.com> wrote:
>> Despite my 'Reviewed-by' tag on c/s 65e3554908 "x86/PV: support data
>> breakpoint extension registers", I have re-evaluated my position as far as 
>> the hypercall interface is concerned.
>>
>> Previously, for the sake of not modifying the migration code in libxc,
>> XEN_DOMCTL_get_ext_vcpucontext would jump through hoops to return -ENOBUFS if
>> and only if MSRs were in use and no buffer was present.
>>
>> This is fragile, and awkward from a toolstack point of view when actually
>> sending MSR content in the migration stream.  It also complicates fixing a
>> further race condition, between querying the number of MSRs for a vcpu, and
>> the vcpu touching a new one.
>>
>> As this code is still only in staging, take this opportunity to redesign the
> s/staging/unstable/

Oops - yes.

>
>> --- a/xen/arch/x86/domctl.c
>> +++ b/xen/arch/x86/domctl.c
>> @@ -1341,6 +1341,132 @@ long arch_do_domctl(
>>      }
>>      break;
>>  
>> +    case XEN_DOMCTL_get_vcpu_msrs:
>> +    case XEN_DOMCTL_set_vcpu_msrs:
>> +    {
>> +        struct xen_domctl_vcpu_msrs *vmsrs = &domctl->u.vcpu_msrs;
>> +        struct xen_domctl_vcpu_msr msr;
>> +        struct vcpu *v;
>> +        uint32_t nr_msrs = 0;
>> +
>> +        ret = -ESRCH;
>> +        if ( (vmsrs->vcpu >= d->max_vcpus) ||
>> +             ((v = d->vcpu[vmsrs->vcpu]) == NULL) )
>> +            break;
>> +
>> +        ret = -EINVAL;
>> +        if ( (v == current) || /* no vcpu_pause() */
>> +             !is_pv_domain(d) )
>> +            break;
>> +
>> +        /* Count maximum number of optional msrs. */
>> +        if ( boot_cpu_has(X86_FEATURE_DBEXT) )
>> +            nr_msrs += 4;
>> +
>> +        if ( domctl->cmd == XEN_DOMCTL_get_vcpu_msrs )
>> +        {
>> +            /* NULL guest handle is a request for max size. */
>> +            if ( guest_handle_is_null(vmsrs->msrs) ||
>> +                 (vmsrs->msr_count < nr_msrs) )
>> +            {
>> +                vmsrs->msr_count = nr_msrs;
>> +                ret = guest_handle_is_null(vmsrs->msrs) ? 0 : -ENOBUFS;
> I don't think you should be failing "get" if there is enough space in
> the provided buffer to store the actually used number of MSRs. That
> way the caller may make a first call with a few (rather than none at
> all) entries, and grow the buffer only if this wasn't sufficient.

I am not sure I agree.  The MSRs appear in the buffer in an order which
the caller cannot control, and issuing the hypercall again with a larger
buffer will rewrite it from the start.

The sole user of this hypercall needs to ensure that all MSRs are
retrieved, otherwise VM corruption will occur.  Permitting a partial get
would make the return value ambiguous for a caller which makes this
hypercall a single time with a guessed size, although I suspect we are
less interested in that problem.

>
>> +                }
>> +
>> +                vcpu_unpause(v);
>> +
>> +                /* Check we didn't lie to userspace then overflow the buffer */
>> +                BUG_ON(i > nr_msrs);
>> +                vmsrs->msr_count = i;
>> +            }
>> +
>> +            copyback = 1;
>> +        }
>> +        else
>> +        {
>> +            ret = -EINVAL;
>> +            if ( vmsrs->msr_count > nr_msrs )
>> +                break;
> Similarly I don't think you should fail the operation simply based on
> too large a count. Who knows when or why it may make sense for the
> caller to specify multiple entries with the same index.
>
> But then again the way you do it we have a statically determined
> maximum and don't need to worry about preemption here until a
> really large number of MSRs would get handled.

For now, this prevents looping to the end of a guest controlled
uint32_t, as per some of the concerns in XSA-77.

If in the future, there is a scenario where reloading the same MSR
several times becomes valid, the logic can be re-addressed.

>
>> +
>> +            vcpu_pause(v);
>> +
>> +            for ( i = 0; i < vmsrs->msr_count; ++i )
>> +            {
>> +                ret = -EFAULT;
>> +                if ( copy_from_guest_offset(&msr, vmsrs->msrs, i, 1) )
>> +                    break;
>> +
>> +                ret = -EINVAL;
>> +                if ( msr.reserved )
>> +                    break;
>> +
>> +                switch ( msr.index )
>> +                {
>> +                case MSR_AMD64_DR0_ADDRESS_MASK:
>> +                    if ( !boot_cpu_has(X86_FEATURE_DBEXT) ||
>> +                         (msr.value >> 32) )
>> +                        break;
>> +                    v->arch.pv_vcpu.dr_mask[0] = msr.value;
>> +                    continue;
>> +
>> +                case MSR_AMD64_DR1_ADDRESS_MASK ...
>> +                    MSR_AMD64_DR3_ADDRESS_MASK:
>> +                    if ( !boot_cpu_has(X86_FEATURE_DBEXT) ||
>> +                         (msr.value >> 32) )
>> +                        break;
>> +                    msr.index -= MSR_AMD64_DR1_ADDRESS_MASK - 1;
>> +                    v->arch.pv_vcpu.dr_mask[msr.index] = msr.value;
>> +                    continue;
>> +                }
>> +                break;
>> +            }
>> +
>> +            vcpu_unpause(v);
>> +
>> +            if ( i == vmsrs->msr_count )
>> +                ret = 0;
> else {
>     vmsrs->msr_count = i;
>     copyback = 1;
> }
>
> to at once give the caller an indication which slot failed?
>
> Jan

Ok - seems useful as a debugging measure.

~Andrew


* Re: [PATCH 1/4] x86/domctl: Implement XEN_DOMCTL_{get, set}_vcpu_msrs
  2014-06-05 13:01     ` Andrew Cooper
@ 2014-06-05 13:33       ` Jan Beulich
  2014-06-06 14:53         ` Andrew Cooper
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2014-06-05 13:33 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Keir Fraser, Xen-devel

>>> On 05.06.14 at 15:01, <andrew.cooper3@citrix.com> wrote:
> On 05/06/14 13:46, Jan Beulich wrote:
>>>>> On 04.06.14 at 19:26, <andrew.cooper3@citrix.com> wrote:
>>> +        if ( domctl->cmd == XEN_DOMCTL_get_vcpu_msrs )
>>> +        {
>>> +            /* NULL guest handle is a request for max size. */
>>> +            if ( guest_handle_is_null(vmsrs->msrs) ||
>>> +                 (vmsrs->msr_count < nr_msrs) )
>>> +            {
>>> +                vmsrs->msr_count = nr_msrs;
>>> +                ret = guest_handle_is_null(vmsrs->msrs) ? 0 : -ENOBUFS;
>> I don't think you should be failing "get" if there is enough space in
>> the provided buffer to store the actually used number of MSRs. That
>> way the caller may make a first call with a few (rather than none at
>> all) entries, and grow the buffer only if this wasn't sufficient.
> 
> I am not sure I agree.  The MSRs appear in the buffer in an order which
> the caller cannot control, and issuing the hypercall again with a larger
> buffer will rewrite it from the start.

And I didn't suggest incremental use.

> The sole user of this hypercall needs to ensure that all MSRs are
> retrieved, otherwise VM corruption will occur.  Permitting a partial get
> would make the return value ambiguous for a caller which makes this
> hypercall a single time with a guessed size, although I suspect we are
> less interested in that problem.

Why would the return value be ambiguous? You'd get -ENOBUFS if you
provided too few slots, and you'd get to know the maximum number
at that point at once.

Jan


* Re: [PATCH 2/4] tools/libxc: Use an explicit check for PV MSRs in xc_domain_save()
  2014-06-04 17:26 ` [PATCH 2/4] tools/libxc: Use an explicit check for PV MSRs in xc_domain_save() Andrew Cooper
@ 2014-06-05 13:41   ` Jan Beulich
  2014-06-05 15:52   ` Ian Campbell
  1 sibling, 0 replies; 19+ messages in thread
From: Jan Beulich @ 2014-06-05 13:41 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Ian Jackson, Ian Campbell, Xen-devel

>>> On 04.06.14 at 19:26, <andrew.cooper3@citrix.com> wrote:
> +        /* Check there are no PV MSRs in use. */
> +        domctl.cmd = XEN_DOMCTL_get_vcpu_msrs;
> +        domctl.domain = dom;
> +        memset(&domctl.u, 0, sizeof(domctl.u));
> +        domctl.u.vcpu_msrs.vcpu = i;
> +        if ( xc_domctl(xch, &domctl) < 0 )
> +        {
> +            PERROR("Error querying maximum number of MSRs for VCPU%d", i);
> +            goto out;
> +        }
> +
> +        if ( domctl.u.vcpu_msrs.msr_count )
> +        {
> +            buffer = xc_hypercall_buffer_alloc(xch, buffer,
> +                                               domctl.u.vcpu_msrs.msr_count *
> +                                               sizeof(xen_domctl_vcpu_msr_t));
> +            if ( !buffer )
> +            {
> +                PERROR("Insufficient memory for getting MSRs for VCPU%d", i);
> +                goto out;
> +            }
> +            set_xen_guest_handle(domctl.u.vcpu_msrs.msrs, buffer);
> +
> +            if ( xc_domctl(xch, &domctl) < 0 )
> +            {
> +                PERROR("Error querying MSRs for VCPU%d", i);
> +                goto out;
> +            }
> +
> +            xc_hypercall_buffer_free(xch, buffer);
> +            if ( domctl.u.vcpu_msrs.msr_count )
> +            {
> +                errno = EOPNOTSUPP;
> +                PERROR("Unable to migrate PV guest using MSRs (yet)");
> +                goto out;
> +            }
> +        }

If you changed the hypervisor side as suggested, you'd get away
with a single xc_domctl() and less than half of the number of added
lines I suppose: you'd simply ask for one MSR, and if you get one, or
get -ENOBUFS, you'd know you can't handle it (for now).
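
Something like this, perhaps (a sketch only, and it assumes the
hypervisor-side change suggested earlier, i.e. that "get" fills up to the
provided count and fails with -ENOBUFS when more MSRs remain; domctl, xch,
dom and i as in the patch context):

    DECLARE_HYPERCALL_BUFFER(xen_domctl_vcpu_msr_t, msr);
    int rc;

    msr = xc_hypercall_buffer_alloc(xch, msr, sizeof(*msr));
    if ( !msr )
        goto out;

    domctl.cmd = XEN_DOMCTL_get_vcpu_msrs;
    domctl.domain = dom;
    memset(&domctl.u, 0, sizeof(domctl.u));
    domctl.u.vcpu_msrs.vcpu = i;
    domctl.u.vcpu_msrs.msr_count = 1;
    set_xen_guest_handle(domctl.u.vcpu_msrs.msrs, msr);

    rc = xc_domctl(xch, &domctl);
    xc_hypercall_buffer_free(xch, msr);

    /* Any MSR in use means this guest can't be migrated (yet). */
    if ( (rc < 0 && errno == ENOBUFS) ||
         (rc == 0 && domctl.u.vcpu_msrs.msr_count) )
    {
        errno = EOPNOTSUPP;
        PERROR("Unable to migrate PV guest using MSRs (yet)");
        goto out;
    }
    else if ( rc < 0 )
    {
        PERROR("Error querying MSRs for VCPU%d", i);
        goto out;
    }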

Jan


* Re: [PATCH 2/4] tools/libxc: Use an explicit check for PV MSRs in xc_domain_save()
  2014-06-04 17:26 ` [PATCH 2/4] tools/libxc: Use an explicit check for PV MSRs in xc_domain_save() Andrew Cooper
  2014-06-05 13:41   ` Jan Beulich
@ 2014-06-05 15:52   ` Ian Campbell
  2014-06-05 15:57     ` Andrew Cooper
  1 sibling, 1 reply; 19+ messages in thread
From: Ian Campbell @ 2014-06-05 15:52 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Ian Jackson, Jan Beulich, Xen-devel

On Wed, 2014-06-04 at 18:26 +0100, Andrew Cooper wrote:
> Migrating PV domains using MSRs is not supported.  This uses the new
> XEN_DOMCTL_get_vcpu_msrs and will fail the migration with an explicit error.
> 
> This is an improvement upon the current failure of
>   "No extended context for VCPUxx (ENOBUFS)"
> 
> Support for migrating PV domains which are using MSRs will be included in the
> migration v2 work.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
> ---
>  tools/libxc/xc_domain_save.c |   38 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 38 insertions(+)
> 
> diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
> index acf3685..7ef5183 100644
> --- a/tools/libxc/xc_domain_save.c
> +++ b/tools/libxc/xc_domain_save.c
> @@ -1995,6 +1995,44 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
>              goto out;
>          }
>  
> +        /* Check there are no PV MSRs in use. */
> +        domctl.cmd = XEN_DOMCTL_get_vcpu_msrs;
> +        domctl.domain = dom;
> +        memset(&domctl.u, 0, sizeof(domctl.u));
> +        domctl.u.vcpu_msrs.vcpu = i;
> +        if ( xc_domctl(xch, &domctl) < 0 )
> +        {
> +            PERROR("Error querying maximum number of MSRs for VCPU%d", i);
> +            goto out;
> +        }
> +
> +        if ( domctl.u.vcpu_msrs.msr_count )
> +        {
> +            buffer = xc_hypercall_buffer_alloc(xch, buffer,
> +                                               domctl.u.vcpu_msrs.msr_count *
> +                                               sizeof(xen_domctl_vcpu_msr_t));
> +            if ( !buffer )
> +            {
> +                PERROR("Insufficient memory for getting MSRs for VCPU%d", i);
> +                goto out;
> +            }
> +            set_xen_guest_handle(domctl.u.vcpu_msrs.msrs, buffer);
> +
> +            if ( xc_domctl(xch, &domctl) < 0 )
> +            {
> +                PERROR("Error querying MSRs for VCPU%d", i);
> +                goto out;
> +            }
> +
> +            xc_hypercall_buffer_free(xch, buffer);
> +            if ( domctl.u.vcpu_msrs.msr_count )

I'm obviously missing something.

You first call it with a NULL buffer to get
domctl.u.vcpu_msrs.msr_count. Then if msr_count is non-zero you allocate
a buffer and ask Xen to fill it. If it turns out that Xen did actually
fill the buffer with something then you consider that an error.

Can you not just error out on the basis of the initial msr_count?

Ian.


* Re: [PATCH 2/4] tools/libxc: Use an explicit check for PV MSRs in xc_domain_save()
  2014-06-05 15:52   ` Ian Campbell
@ 2014-06-05 15:57     ` Andrew Cooper
  2014-06-06  9:15       ` Ian Campbell
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Cooper @ 2014-06-05 15:57 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Ian Jackson, Jan Beulich, Xen-devel

On 05/06/14 16:52, Ian Campbell wrote:
> On Wed, 2014-06-04 at 18:26 +0100, Andrew Cooper wrote:
>> Migrating PV domains using MSRs is not supported.  This uses the new
>> XEN_DOMCTL_get_vcpu_msrs and will fail the migration with an explicit error.
>>
>> This is an improvement upon the current failure of
>>   "No extended context for VCPUxx (ENOBUFS)"
>>
>> Support for migrating PV domains which are using MSRs will be included in the
>> migration v2 work.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
>> ---
>>  tools/libxc/xc_domain_save.c |   38 ++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 38 insertions(+)
>>
>> diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
>> index acf3685..7ef5183 100644
>> --- a/tools/libxc/xc_domain_save.c
>> +++ b/tools/libxc/xc_domain_save.c
>> @@ -1995,6 +1995,44 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
>>              goto out;
>>          }
>>  
>> +        /* Check there are no PV MSRs in use. */
>> +        domctl.cmd = XEN_DOMCTL_get_vcpu_msrs;
>> +        domctl.domain = dom;
>> +        memset(&domctl.u, 0, sizeof(domctl.u));
>> +        domctl.u.vcpu_msrs.vcpu = i;
>> +        if ( xc_domctl(xch, &domctl) < 0 )
>> +        {
>> +            PERROR("Error querying maximum number of MSRs for VCPU%d", i);
>> +            goto out;
>> +        }
>> +
>> +        if ( domctl.u.vcpu_msrs.msr_count )
>> +        {
>> +            buffer = xc_hypercall_buffer_alloc(xch, buffer,
>> +                                               domctl.u.vcpu_msrs.msr_count *
>> +                                               sizeof(xen_domctl_vcpu_msr_t));
>> +            if ( !buffer )
>> +            {
>> +                PERROR("Insufficient memory for getting MSRs for VCPU%d", i);
>> +                goto out;
>> +            }
>> +            set_xen_guest_handle(domctl.u.vcpu_msrs.msrs, buffer);
>> +
>> +            if ( xc_domctl(xch, &domctl) < 0 )
>> +            {
>> +                PERROR("Error querying MSRs for VCPU%d", i);
>> +                goto out;
>> +            }
>> +
>> +            xc_hypercall_buffer_free(xch, buffer);
>> +            if ( domctl.u.vcpu_msrs.msr_count )
> I'm obviously missing something.
>
> You first call it with a NULL buffer to get
> domctl.u.vcpu_msrs.msr_count. Then if msr_count is non-zero you allocate
> a buffer and ask Xen to fill it. If it turns out that Xen did actually
> fill the buffer with something then you consider that an error.

Correct.  This is the "ask once for the size, a second time for the
content" style used in other domctls.

>
> Can you not just error out on the basis of the initial msr_count?
>
> Ian.
>

No.

To avoid a race condition with the vcpu touching a new MSR between the
two hypercalls, Xen must return the maximum possible msr_count with the
request for size, so the toolstack can guarantee to allocate a large
enough buffer.

Otherwise, the second hypercall would fail because of an undersized
buffer, despite querying for size beforehand.
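
To illustrate, a minimal hypervisor-side sketch of those size-query
semantics (nr_max_msrs and copy_out_in_use_msrs are invented names for
this sketch, not the actual domctl.c code):

if ( guest_handle_is_null(vmsrs->msrs) )
{
    /* Size query: report the most Xen might ever write, not the
     * number currently in use, so a buffer of this size cannot be
     * outgrown between the two hypercalls. */
    vmsrs->msr_count = nr_max_msrs;
    ret = 0;
}
else if ( vmsrs->msr_count < nr_max_msrs )
    ret = -ENOBUFS;
else
    ret = copy_out_in_use_msrs(v, vmsrs); /* writes <= nr_max_msrs */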

~Andrew

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/4] tools/libxc: Use an explicit check for PV MSRs in xc_domain_save()
  2014-06-05 15:57     ` Andrew Cooper
@ 2014-06-06  9:15       ` Ian Campbell
  2014-06-06  9:44         ` Andrew Cooper
  0 siblings, 1 reply; 19+ messages in thread
From: Ian Campbell @ 2014-06-06  9:15 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Ian Jackson, Jan Beulich, Xen-devel

On Thu, 2014-06-05 at 16:57 +0100, Andrew Cooper wrote:
> To avoid a race condition with the vcpu touching a new MSR between the
> two hypercalls, Xen must return the maximum possible msr_count with the
> request for size, so the toolstack can guarantee to allocate a large
> enough buffer.

If the msr_count returned is a build-time constant, can't it just be
part of the interface then? Either as a #define or an explicitly sized
array type somewhere.

Ian.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/4] tools/libxc: Use an explicit check for PV MSRs in xc_domain_save()
  2014-06-06  9:15       ` Ian Campbell
@ 2014-06-06  9:44         ` Andrew Cooper
  2014-06-06  9:48           ` Ian Campbell
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Cooper @ 2014-06-06  9:44 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Ian Jackson, Jan Beulich, Xen-devel

On 06/06/14 10:15, Ian Campbell wrote:
> On Thu, 2014-06-05 at 16:57 +0100, Andrew Cooper wrote:
>> To avoid a race condition with the vcpu touching a new MSR between the
>> two hypercalls, Xen must return the maximum possible msr_count with the
>> request for size, so the toolstack can guarantee to allocate a large
>> enough buffer.
> If the msr_count returned is a build-time constant, can't it just be
> part of the interface then? Either as a #define or an explicitly sized
> array type somewhere.
>
> Ian.
>

The maximum possible msr_count depends on architecture, features
available and, plausibly, domain cpuid policy.

At the moment it is trivial, Intel = 0 and AMD = {0, 4}, but this will
get more complicated as we add support for more MSRs.

The information does have to be made available on a per-domain basis.
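
For illustration, a hypothetical helper reflecting today's trivial
case (the helper name is invented; X86_FEATURE_DBEXT is the real
feature flag for the AMD debug extensions):

static uint32_t vcpu_max_msr_count(const struct domain *d)
{
    uint32_t nr = 0;

    /* Four MSR_AMD64_DR{0-3}_ADDRESS_MASK MSRs when DBEXT exists. */
    if ( boot_cpu_has(X86_FEATURE_DBEXT) )
        nr += 4;

    /* 'd' is unused today, but domain cpuid policy would hang off
     * it once the calculation becomes genuinely per-domain. */
    return nr;
}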

~Andrew

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/4] tools/libxc: Use an explicit check for PV MSRs in xc_domain_save()
  2014-06-06  9:44         ` Andrew Cooper
@ 2014-06-06  9:48           ` Ian Campbell
  0 siblings, 0 replies; 19+ messages in thread
From: Ian Campbell @ 2014-06-06  9:48 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Ian Jackson, Jan Beulich, Xen-devel

On Fri, 2014-06-06 at 10:44 +0100, Andrew Cooper wrote:
> On 06/06/14 10:15, Ian Campbell wrote:
> > On Thu, 2014-06-05 at 16:57 +0100, Andrew Cooper wrote:
> >> To avoid a race condition with the vcpu touching a new MSR between the
> >> two hypercalls, Xen must return the maximum possible msr_count with the
> >> request for size, so the toolstack can guarantee to allocate a large
> >> enough buffer.
> > If the msr_count returned is a build-time constant, can't it just be
> > part of the interface then? Either as a #define or an explicitly sized
> > array type somewhere.
> >
> > Ian.
> >
> 
> The maximum possible msr_count depends on architecture, features
> available and, plausibly, domain cpuid policy.
> 
> At the moment it is trivial, Intel = 0 and AMD = {0, 4}, but this will
> get more complicated as we add support for more MSRs.
> 
> The information does have to be made available on a per-domain basis.

OK then.

Ian.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/4] x86/domctl: Implement XEN_DOMCTL_{get, set}_vcpu_msrs
  2014-06-05 13:33       ` Jan Beulich
@ 2014-06-06 14:53         ` Andrew Cooper
  2014-06-06 15:09           ` Jan Beulich
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Cooper @ 2014-06-06 14:53 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir Fraser, Xen-devel

On 05/06/14 14:33, Jan Beulich wrote:
> <snip>
>> The sole use of this hypercall needs to ensure that all MSRs are gotten,
>> otherwise VM corruption will occur.  Permitting a partial get would make
>> the return value ambiguous for a caller making this hypercall a single
>> time and guessing at the size to use, although I suspect we are less
>> interested in this problem.
> Why would the return value be ambiguous? You'd get -ENOBUFS if you
> provided too few slots, and you'd get to know the maximum number
> at that point at once.
>
> Jan
>

Having tried to implement these improvements, I hit problems, so I would
like to decide upon an interface before hacking further.

Current behaviour for get:
* Null guest handle returns msr_count set to maximum number of msrs Xen
might write
* msr_count < max_msrs fails with -ENOBUFS
* if msrs are written, msr_count reflects the number written (likely
less than max_msrs)

Current behaviour for set:
* msr_count > max_msrs fails with -EINVAL
* problems with individual msrs fail with -EINVAL

Suggestions:
* for get, msr_count < max_msrs should perform a partial write,
returning -ENOBUFS if Xen needs to write more than msr_count msrs.

This reduces the amount of code added to xc_domain_save() to fail
migrations actually using PV msrs.  I am not too concerned about this
code, as it will be rm'd in the migration-v2 series which implements PV
MSR migration properly.  I am a little bit hesitant about supporting
partial writes, although I suppose it is plausible to want to know "how
many MSRs is the vcpu currently using", and doing that with a single
hypercall is preferable to requiring two.

* for set, in the case of a bad msr, identify it back to the caller to
aid with debugging.

This is useful to help debugging, but needs disambiguating against the
other cases which fail with -EINVAL, including the paths which would
fail before having a chance to set msr_count to the index of the bad
msr.  Therefore, msr_count *can't* be overloaded for this purpose.


I see one solution to these problems.  Using:

struct xen_domctl_vcpu_msrs {
    u32 vcpu;
    union {
        u16 max_msrs; /* OUT from get */
        u16 err_idx;  /* Possibly OUT from set */
    };
    u16 msr_count;
    XEN_GUEST_HANDLE_64(xen_domctl_vcpu_msr_t) msrs;
};

max_msrs and current msrs can be reported at the same time (both on a
NULL guest handle).  If the caller of set sets err_idx to ~0 before the
call, it can unambiguously identify the offending MSR without confusing
it with the other -EINVAL failure cases.
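
A caller-side sketch of that disambiguation, assuming the union above
were accepted (variable names are illustrative):

domctl.u.vcpu_msrs.err_idx = (uint16_t)~0; /* sentinel: no bad MSR */

if ( xc_domctl(xch, &domctl) < 0 )
{
    if ( domctl.u.vcpu_msrs.err_idx != (uint16_t)~0 )
        PERROR("VCPU%u: rejected MSR at index %u",
               vcpu, domctl.u.vcpu_msrs.err_idx);
    else
        PERROR("VCPU%u: setting MSRs failed", vcpu);
}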

Does this look plausible? Can we get away with anonymous unions in the
public header files?

~Andrew

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/4] x86/domctl: Implement XEN_DOMCTL_{get, set}_vcpu_msrs
  2014-06-06 14:53         ` Andrew Cooper
@ 2014-06-06 15:09           ` Jan Beulich
  2014-06-06 15:28             ` Andrew Cooper
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2014-06-06 15:09 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Keir Fraser, Xen-devel

>>> On 06.06.14 at 16:53, <andrew.cooper3@citrix.com> wrote:
> On 05/06/14 14:33, Jan Beulich wrote:
>> <snip>
>>> The sole use of this hypercall needs to ensure that all MSRs are gotten,
>>> otherwise VM corruption will occur.  Permitting a partial get would make
>>> the return value ambiguous for a caller making this hypercall a single
>>> time and guessing at the size to use, although I suspect we are less
>>> interested in this problem.
>> Why would the return value be ambiguous? You'd get -ENOBUFS if you
>> provided too few slots, and you'd get to know the maximum number
>> at that point at once.
>>
>> Jan
>>
> 
> Having tried to implement these improvements, I hit problems, so I would
> like to decide upon an interface before hacking further.
> 
> Current behaviour for get:
> * Null guest handle returns msr_count set to maximum number of msrs Xen
> might write
> * msr_count < max_msrs fails with -ENOBUFS
> * if msrs are written, msr_count reflects the number written (likely
> less than max_msrs)
> 
> Current behaviour for set:
> * msr_count > max_msrs fails with -EINVAL
> * problems with individual msrs fail with -EINVAL
> 
> Suggestions:
> * for get, msr_count < max_msrs should perform a partial write,
> returning -ENOBUFS if Xen needs to write more than msr_count msrs.
> 
> This reduces the amount of code added to xc_domain_save() to fail
> migrations actually using PV msrs.  I am not too concerned about this
> code, as it will be rm'd in the migration-v2 series which implements PV
> MSR migration properly.  I am a little bit hesitant about supporting
> partial writes, although I suppose it is plausible to want to know "how
> many MSRs is the vcpu currently using", and doing that with a single
> hypercall is preferable to requiring two.

Yes. I didn't see above what problem you found with this.

> * for set, in the case of a bad msr, identify it back to the caller to
> aid with debugging.
> 
> This is useful to help debugging, but needs disambiguating against the
> other cases which fail with -EINVAL, including the paths which would
> fail before having a chance to set msr_count to the index of the bad
> msr.  Therefore, msr_count *can't* be overloaded for this purpose.

Actually it can - the caller will know the number it put there, and if it's
unchanged then the failure was not associated with a particular array
entry (all possible values on error would be smaller than the value
originally there).
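
In code, the overloading described above would look roughly like this
(illustrative only):

uint16_t submitted = domctl.u.vcpu_msrs.msr_count;

if ( xc_domctl(xch, &domctl) < 0 )
{
    if ( domctl.u.vcpu_msrs.msr_count < submitted )
        /* Xen rewrote msr_count: it indexes the offending entry. */
        PERROR("Bad MSR at index %u", domctl.u.vcpu_msrs.msr_count);
    else
        /* msr_count unchanged: failure not tied to one entry. */
        PERROR("Setting MSRs failed");
}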

> I see one solution to these problems.  Using:
> 
> struct xen_domctl_vcpu_msrs {
>     u32 vcpu;
>     union {
>         u16 max_msrs; /* OUT from get */
>         u16 err_idx;  /* Possibly OUT from set */
>     };
>     u16 msr_count;
>     XEN_GUEST_HANDLE_64(xen_domctl_vcpu_msr_t) msrs;
> };
> 
> max_msrs and current msrs can be reported at the same time (both on a
> NULL guest handle).  If the caller of set sets err_idx to ~0 before the
> call, it can unambiguously identify the offending MSR without confusing
> it with the other -EINVAL failure cases.
> 
> Does this look plausible? Can we get away with anonymous unions in the
> public header files?

No, nothing that isn't C89 is permitted in the public headers.

Jan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/4] x86/domctl: Implement XEN_DOMCTL_{get, set}_vcpu_msrs
  2014-06-06 15:09           ` Jan Beulich
@ 2014-06-06 15:28             ` Andrew Cooper
  0 siblings, 0 replies; 19+ messages in thread
From: Andrew Cooper @ 2014-06-06 15:28 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir Fraser, Xen-devel

On 06/06/14 16:09, Jan Beulich wrote:
>>
>> Having tried to implement these improvements, I hit problems, so I would
>> like to decide upon an interface before hacking further.
>>
>> Current behaviour for get:
>> * Null guest handle returns msr_count set to maximum number of msrs Xen
>> might write
>> * msr_count < max_msrs fails with -ENOBUFS
>> * if msrs are written, msr_count reflects the number written (likely
>> less than max_msrs)
>>
>> Current behaviour for set:
>> * msr_count > max_msrs fails with -EINVAL
>> * problems with individual msrs fail with -EINVAL
>>
>> Suggestions:
>> * for get, msr_count < max_msrs should perform a partial write,
>> returning -ENOBUFS if Xen needs to write more than msr_count msrs.
>>
>> This reduces the amount of code added to xc_domain_save() to fail
>> migrations actually using PV msrs.  I am not too concerned about this
>> code, as it will be rm'd in the migration-v2 series which implements PV
>> MSR migration properly.  I am a little bit hesitant about supporting
>> partial writes, although I suppose it is plausible to want to know "how
>> many MSRs is the vcpu currently using", and doing that with a single
>> hypercall is preferable to requiring two.
> Yes. I didn't see above what problem you found with this.

Not a problem per se, just a concern.

>
>> * for set, in the case of a bad msr, identify it back to the caller to
>> aid with debugging.
>>
>> This is useful to help debugging, but needs disambiguating against the
>> other cases which fail with -EINVAL, including the paths which would
>> fail before having a chance to set msr_count to the index of the bad
>> msr.  Therefore, msr_count *can't* be overloaded for this purpose.
> Actually it can - the caller will know the number it put there, and if it's
> unchanged then the failure was not associated with a particular array
> entry (all possible values on error would be smaller than the value
> originally there).

Ah yes - quite correct.  That makes the set-side debugging trivial.

I will see about making these alterations.

~Andrew

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread

Thread overview: 19+ messages
2014-06-04 17:26 [PATCH 0/4] Fixes to several domctls for migration Andrew Cooper
2014-06-04 17:26 ` [PATCH 1/4] x86/domctl: Implement XEN_DOMCTL_{get, set}_vcpu_msrs Andrew Cooper
2014-06-05 12:46   ` Jan Beulich
2014-06-05 13:01     ` Andrew Cooper
2014-06-05 13:33       ` Jan Beulich
2014-06-06 14:53         ` Andrew Cooper
2014-06-06 15:09           ` Jan Beulich
2014-06-06 15:28             ` Andrew Cooper
2014-06-04 17:26 ` [PATCH 2/4] tools/libxc: Use an explicit check for PV MSRs in xc_domain_save() Andrew Cooper
2014-06-05 13:41   ` Jan Beulich
2014-06-05 15:52   ` Ian Campbell
2014-06-05 15:57     ` Andrew Cooper
2014-06-06  9:15       ` Ian Campbell
2014-06-06  9:44         ` Andrew Cooper
2014-06-06  9:48           ` Ian Campbell
2014-06-04 17:26 ` [PATCH 3/4] x86/domctl: Remove PV MSR parts of XEN_DOMCTL_[gs]et_ext_vcpucontext Andrew Cooper
2014-06-05  7:52   ` Frediano Ziglio
2014-06-05  9:25     ` Andrew Cooper
2014-06-04 17:26 ` [PATCH 4/4] x86/domctl: Two functional fixes to XEN_DOMCTL_[gs]etvcpuextstate Andrew Cooper
