All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: huaitong.han@intel.com, Luwei Kang <luwei.kang@intel.com>,
	yong.y.wang@intel.com, xen-devel@lists.xen.org
Subject: Re: [PATCH] x86/cpuid: fix dom0 crash on skylake machine
Date: Thu, 2 Jun 2016 12:12:35 +0100	[thread overview]
Message-ID: <575014A3.4000503@citrix.com> (raw)
In-Reply-To: <574EFF2E02000078000F06E6@prv-mh.provo.novell.com>

[-- Attachment #1: Type: text/plain, Size: 2343 bytes --]

On 01/06/16 14:28, Jan Beulich wrote:
>>>> On 01.06.16 at 15:03, <andrew.cooper3@citrix.com> wrote:
>> On 01/06/16 13:01, Jan Beulich wrote:
>>>>>> I want to adjust the representation of cpuid information in struct
>>>>>> domain. The current loop in domain_cpuid() causes an O(N) overhead for
>>>>>> every query, which is very poor for actions which really should be a
>>>>>> single bit test at a fixed offset.
>>>>>>
>>>>>> This needs to be combined with properly splitting the per-domain and
>>>>>> per-vcpu information, which requires knowing the expected vcpu topology
>>>>>> during domain creation.
>>>>>>
>>>>>> On top of that, there needs to be verification logic to check the
>>>>>> correctness of information passed from the toolstack.
>>>>>>
>>>>>> All of these areas are covered in the "known issues" section of the
>>>>>> feature doc, and I do plan to fix them all.  However, it isn't a couple
>>>>>> of hours worth of work.
>>>>> All understood, yet not to the point: The original remark was that
>>>>> the very XSTATE handling could be done better with far not as much
>>>>> of a change, at least afaict without having tried.
>>>> In which case I don't know what you were suggesting.
>>> Make {hvm,pv}_cpuid() invoke themselves recursively to
>>> determine what bits to mask off from CPUID[0xd].EAX.
>> So that would work.  However, to do this, you need to query leaves 1,
>> 0x80000001 and 7, all of which will hit the O(N) loop in domain_cpuid()
>>
>> Luckily, none of those specific paths further recurse into {hvm,pv}_cpuid().
>>
>> I am unsure which to go with.  My gut feel is that this would be quite a
>> performance hit, but I have no evidence either way.  OTOH, it will give
>> the correct answer, rather than an approximation.
> Not only since I believe performance is very close to irrelevant for
> CPUID leaf 0xD invocations, I think I'd prefer correctness over
> performance (as would be basically always the case). How about
> you?

Right - this is the alternative, doing the calculation in
{hvm,pv}_cpuid(), based on top of your cleanup from yesterday.

There is a bugfix in the PV side (pv_featureset[FEATURESET_1c] should be
taken into account even for control/hardware domain accesses), and a
preemptive fix on the HVM side to avoid advertising any XSS states, as
we don't support any yet.

Thoughts?

~Andrew

[-- Attachment #2: 0001-xen-x86-Clip-guests-view-of-xfeature_mask.patch --]
[-- Type: text/x-diff, Size: 8366 bytes --]

>From be056a4b78929e428d50ca386ec56b42c9cde9ac Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Thu, 2 Jun 2016 12:08:42 +0100
Subject: [PATCH] xen/x86: Clip guests view of xfeature_mask

---
 xen/arch/x86/hvm/hvm.c       | 46 +++++++++++++++++++++++++++++--
 xen/arch/x86/traps.c         | 65 +++++++++++++++++++++++++++++++++++++-------
 xen/include/asm-x86/xstate.h | 31 ++++++++++++++-------
 3 files changed, 120 insertions(+), 22 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index bb98051..53f2bdd 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3362,7 +3362,7 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
 
     switch ( input )
     {
-        unsigned int _ecx, _edx;
+        unsigned int _ebx, _ecx, _edx;
 
     case 0x1:
         /* Fix up VLAPIC details. */
@@ -3443,6 +3443,44 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
         switch ( count )
         {
         case 0:
+        {
+            uint64_t xfeature_mask = XSTATE_SSE | XSTATE_FP;
+            uint32_t xstate_size = XSTATE_AREA_MIN_SIZE;
+
+            if ( _ecx & cpufeat_mask(X86_FEATURE_AVX) )
+            {
+                xfeature_mask |= XSTATE_YMM;
+                xstate_size += xstate_sizes[_XSTATE_YMM];
+            }
+
+            _ecx = 0;
+            hvm_cpuid(7, NULL, &_ebx, &_ecx, NULL);
+
+            if ( _ebx & cpufeat_mask(X86_FEATURE_MPX) )
+            {
+                xfeature_mask |= XSTATE_BNDREGS | XSTATE_BNDCSR;
+                xstate_size += xstate_sizes[_XSTATE_BNDREGS] +
+                    xstate_sizes[_XSTATE_BNDCSR];
+            }
+
+            if ( _ebx & cpufeat_mask(X86_FEATURE_PKU) )
+            {
+                xfeature_mask |= XSTATE_PKRU;
+                xstate_size += xstate_sizes[_XSTATE_PKRU];
+            }
+
+            hvm_cpuid(0x80000001, NULL, NULL, &_ecx, NULL);
+
+            if ( _ecx & cpufeat_mask(X86_FEATURE_LWP) )
+            {
+                xfeature_mask |= XSTATE_LWP;
+                xstate_size += xstate_sizes[_XSTATE_LWP];
+            }
+
+            *eax = (uint32_t)xfeature_mask;
+            *edx = (uint32_t)(xfeature_mask >> 32);
+            *ecx = xstate_size;
+
             /*
              * Always read CPUID[0xD,0].EBX from hardware, rather than domain
              * policy.  It varies with enabled xstate, and the correct xcr0 is
@@ -3450,6 +3488,8 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
              */
             cpuid_count(input, count, &dummy, ebx, &dummy, &dummy);
             break;
+        }
+
         case 1:
             *eax &= hvm_featureset[FEATURESET_Da1];
 
@@ -3463,7 +3503,9 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
                 cpuid_count(input, count, &dummy, ebx, &dummy, &dummy);
             }
             else
-                *ebx = *ecx = *edx = 0;
+                *ebx = 0;
+
+            *ecx = *edx = 0;
             break;
         }
         break;
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 5d7232d..4ba90a4 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -928,7 +928,7 @@ void pv_cpuid(struct cpu_user_regs *regs)
 
     switch ( leaf )
     {
-        uint32_t tmp;
+        uint32_t tmp, _ebx, _ecx, _edx;
 
     case 0x00000001:
         c &= pv_featureset[FEATURESET_1c];
@@ -1087,19 +1087,63 @@ void pv_cpuid(struct cpu_user_regs *regs)
         break;
 
     case XSTATE_CPUID:
-        if ( !((!is_control_domain(currd) && !is_hardware_domain(currd)
-                ? ({
-                    uint32_t ecx;
-
-                    domain_cpuid(currd, 1, 0, &tmp, &tmp, &ecx, &tmp);
-                    ecx & pv_featureset[FEATURESET_1c];
-                  })
-                : cpuid_ecx(1)) & cpufeat_mask(X86_FEATURE_XSAVE)) ||
-             subleaf >= 63 )
+
+        if ( !is_control_domain(currd) && !is_hardware_domain(currd) )
+            domain_cpuid(currd, 1, 0, &tmp, &tmp, &_ecx, &tmp);
+        else
+            _ecx = cpuid_ecx(1);
+        _ecx &= pv_featureset[FEATURESET_1c];
+
+        if ( !(_ecx & cpufeat_mask(X86_FEATURE_XSAVE)) || subleaf >= 63 )
             goto unsupported;
         switch ( subleaf )
         {
         case 0:
+        {
+            uint64_t xfeature_mask = XSTATE_SSE | XSTATE_FP;
+            uint32_t xstate_size = XSTATE_AREA_MIN_SIZE;
+
+            if ( _ecx & cpufeat_mask(X86_FEATURE_AVX) )
+            {
+                xfeature_mask |= XSTATE_YMM;
+                xstate_size += xstate_sizes[_XSTATE_YMM];
+            }
+
+            if ( !is_control_domain(currd) && !is_hardware_domain(currd) )
+                domain_cpuid(currd, 7, 0, &tmp, &_ebx, &tmp, &tmp);
+            else
+                cpuid_count(7, 0, &tmp, &_ebx, &tmp, &tmp);
+            _ebx &= pv_featureset[FEATURESET_7b0];
+
+            if ( _ebx & cpufeat_mask(X86_FEATURE_MPX) )
+            {
+                xfeature_mask |= XSTATE_BNDREGS | XSTATE_BNDCSR;
+                xstate_size += xstate_sizes[_XSTATE_BNDREGS] +
+                    xstate_sizes[_XSTATE_BNDCSR];
+            }
+
+            if ( _ebx & cpufeat_mask(X86_FEATURE_PKU) )
+            {
+                xfeature_mask |= XSTATE_PKRU;
+                xstate_size += xstate_sizes[_XSTATE_PKRU];
+            }
+
+            if ( !is_control_domain(currd) && !is_hardware_domain(currd) )
+                domain_cpuid(currd, 0x80000001, 0, &tmp, &tmp, &tmp, &_edx);
+            else
+                _edx = cpuid_edx(0x80000001);
+            _edx &= pv_featureset[FEATURESET_e1d];
+
+            if ( _ecx & cpufeat_mask(X86_FEATURE_LWP) )
+            {
+                xfeature_mask |= XSTATE_LWP;
+                xstate_size += xstate_sizes[_XSTATE_LWP];
+            }
+
+            a = (uint32_t)xfeature_mask;
+            d = (uint32_t)(xfeature_mask >> 32);
+            c = xstate_size;
+
             /*
              * Always read CPUID.0xD[ECX=0].EBX from hardware, rather than
              * domain policy.  It varies with enabled xstate, and the correct
@@ -1108,6 +1152,7 @@ void pv_cpuid(struct cpu_user_regs *regs)
             if ( !is_control_domain(currd) && !is_hardware_domain(currd) )
                 cpuid_count(leaf, subleaf, &tmp, &b, &tmp, &tmp);
             break;
+        }
 
         case 1:
             a &= pv_featureset[FEATURESET_Da1];
diff --git a/xen/include/asm-x86/xstate.h b/xen/include/asm-x86/xstate.h
index 4535354..4b0c33e 100644
--- a/xen/include/asm-x86/xstate.h
+++ b/xen/include/asm-x86/xstate.h
@@ -26,16 +26,27 @@
 #define XSAVE_HDR_OFFSET          FXSAVE_SIZE
 #define XSTATE_AREA_MIN_SIZE      (FXSAVE_SIZE + XSAVE_HDR_SIZE)
 
-#define XSTATE_FP      (1ULL << 0)
-#define XSTATE_SSE     (1ULL << 1)
-#define XSTATE_YMM     (1ULL << 2)
-#define XSTATE_BNDREGS (1ULL << 3)
-#define XSTATE_BNDCSR  (1ULL << 4)
-#define XSTATE_OPMASK  (1ULL << 5)
-#define XSTATE_ZMM     (1ULL << 6)
-#define XSTATE_HI_ZMM  (1ULL << 7)
-#define XSTATE_PKRU    (1ULL << 9)
-#define XSTATE_LWP     (1ULL << 62) /* AMD lightweight profiling */
+#define _XSTATE_FP                0
+#define XSTATE_FP                 (1ULL << _XSTATE_FP)
+#define _XSTATE_SSE               1
+#define XSTATE_SSE                (1ULL << _XSTATE_SSE)
+#define _XSTATE_YMM               2
+#define XSTATE_YMM                (1ULL << _XSTATE_YMM)
+#define _XSTATE_BNDREGS           3
+#define XSTATE_BNDREGS            (1ULL << _XSTATE_BNDREGS)
+#define _XSTATE_BNDCSR            4
+#define XSTATE_BNDCSR             (1ULL << _XSTATE_BNDCSR)
+#define _XSTATE_OPMASK            5
+#define XSTATE_OPMASK             (1ULL << _XSTATE_OPMASK)
+#define _XSTATE_ZMM               6
+#define XSTATE_ZMM                (1ULL << _XSTATE_ZMM)
+#define _XSTATE_HI_ZMM            7
+#define XSTATE_HI_ZMM             (1ULL << _XSTATE_HI_ZMM)
+#define _XSTATE_PKRU              9
+#define XSTATE_PKRU               (1ULL << _XSTATE_PKRU)
+#define _XSTATE_LWP               62
+#define XSTATE_LWP                (1ULL << _XSTATE_LWP)
+
 #define XSTATE_FP_SSE  (XSTATE_FP | XSTATE_SSE)
 #define XCNTXT_MASK    (XSTATE_FP | XSTATE_SSE | XSTATE_YMM | XSTATE_OPMASK | \
                         XSTATE_ZMM | XSTATE_HI_ZMM | XSTATE_NONLAZY)
-- 
2.1.4


[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

  reply	other threads:[~2016-06-02 11:12 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-01  4:58 [PATCH] x86/cpuid: fix dom0 crash on skylake machine Luwei Kang
2016-06-01  5:54 ` Han, Huaitong
2016-06-01  8:49   ` Jan Beulich
2016-06-01  9:00     ` Han, Huaitong
2016-06-01  9:03 ` Andrew Cooper
2016-06-01  9:17   ` Jan Beulich
2016-06-01  9:34     ` Andrew Cooper
2016-06-01  9:43       ` Jan Beulich
2016-06-01 11:27         ` Andrew Cooper
2016-06-01 11:38           ` Jan Beulich
2016-06-01 11:45             ` Andrew Cooper
2016-06-01 12:01               ` Jan Beulich
2016-06-01 13:03                 ` Andrew Cooper
2016-06-01 13:28                   ` Jan Beulich
2016-06-02 11:12                     ` Andrew Cooper [this message]
2016-06-02 11:34                       ` Jan Beulich
2016-06-02 11:44                         ` Andrew Cooper
2016-06-02 12:15                           ` Jan Beulich
2016-06-01  9:21   ` Han, Huaitong
2016-06-01  9:30   ` Wei Liu
2016-06-01  9:45     ` Andrew Cooper
2016-06-01 10:54   ` Kang, Luwei
2016-06-01 10:57     ` Andrew Cooper
2016-06-01  9:04 ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=575014A3.4000503@citrix.com \
    --to=andrew.cooper3@citrix.com \
    --cc=JBeulich@suse.com \
    --cc=huaitong.han@intel.com \
    --cc=luwei.kang@intel.com \
    --cc=xen-devel@lists.xen.org \
    --cc=yong.y.wang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.