xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: Xen-devel <xen-devel@lists.xen.org>
Subject: Re: [PATCH v2 15/30] xen/x86: Improvements to in-hypervisor cpuid sanity checks
Date: Mon, 15 Feb 2016 17:12:45 +0000	[thread overview]
Message-ID: <56C2070D.10803@citrix.com> (raw)
In-Reply-To: <56C2003902000078000D23AC@prv-mh.provo.novell.com>

On 15/02/16 15:43, Jan Beulich wrote:
>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>> @@ -4617,50 +4618,39 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
>>          /* Fix up VLAPIC details. */
>>          *ebx &= 0x00FFFFFFu;
>>          *ebx |= (v->vcpu_id * 2) << 24;
>> +
>> +        *ecx &= hvm_featureset[FEATURESET_1c];
>> +        *edx &= hvm_featureset[FEATURESET_1d];
> Looks like I've overlooked an issue in patch 11, which becomes
> apparent here: How can you use a domain-independent featureset
> here, when features vary between HAP and shadow mode guests?
> I.e. in the earlier patch I suppose you need to calculate two
> hvm_*_featureset[]s, with the HAP one perhaps empty when
> !hvm_funcs.hap_supported.

Their use here is a halfway house between nothing and the planned full
per-domain policies.

In this case, the "don't expose $X to a non-hap domain" checks have been
retained, to cover the difference.

>
>> @@ -4694,6 +4687,9 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
>>          break;
>>  
>>      case 0x80000001:
>> +        *ecx &= hvm_featureset[FEATURESET_e1c];
>> +        *edx &= hvm_featureset[FEATURESET_e1d];
> Looking at the uses here, wouldn't it be better (less ambiguous) for
> these to be named FEATURESET_x1c and FEATURESET_x1d
> respectively, since x is not a hex digit, but e is?

I originally chose e because these are commonly known as the extended
cpuid leaves.  'e' being a hex digit is why the number is specified as
an upper case hex number, as shown with FEATURESET_Da1.

I suppose x could work here, but I don't see being less ambiguous. 
(Perhaps I am clouded by having this syntax firmly embedded in my
expectation).

>
>> @@ -841,69 +842,70 @@ void pv_cpuid(struct cpu_user_regs *regs)
>>      else
>>          cpuid_count(leaf, subleaf, &a, &b, &c, &d);
>>  
>> -    if ( (leaf & 0x7fffffff) == 0x00000001 )
>> -    {
>> -        /* Modify Feature Information. */
>> -        if ( !cpu_has_apic )
>> -            __clear_bit(X86_FEATURE_APIC, &d);
>> -
>> -        if ( !is_pvh_domain(currd) )
>> -        {
>> -            __clear_bit(X86_FEATURE_PSE, &d);
>> -            __clear_bit(X86_FEATURE_PGE, &d);
>> -            __clear_bit(X86_FEATURE_PSE36, &d);
>> -            __clear_bit(X86_FEATURE_VME, &d);
>> -        }
>> -    }
>> -
>>      switch ( leaf )
>>      {
>>      case 0x00000001:
>> -        /* Modify Feature Information. */
>> -        if ( !cpu_has_sep )
>> -            __clear_bit(X86_FEATURE_SEP, &d);
>> -        __clear_bit(X86_FEATURE_DS, &d);
>> -        __clear_bit(X86_FEATURE_ACC, &d);
>> -        __clear_bit(X86_FEATURE_PBE, &d);
>> -        if ( is_pvh_domain(currd) )
>> -            __clear_bit(X86_FEATURE_MTRR, &d);
>> -
>> -        __clear_bit(X86_FEATURE_DTES64 % 32, &c);
>> -        __clear_bit(X86_FEATURE_MWAIT % 32, &c);
>> -        __clear_bit(X86_FEATURE_DSCPL % 32, &c);
>> -        __clear_bit(X86_FEATURE_VMXE % 32, &c);
>> -        __clear_bit(X86_FEATURE_SMXE % 32, &c);
>> -        __clear_bit(X86_FEATURE_TM2 % 32, &c);
>> +        c &= pv_featureset[FEATURESET_1c];
>> +        d &= pv_featureset[FEATURESET_1d];
>> +
>>          if ( is_pv_32bit_domain(currd) )
>> -            __clear_bit(X86_FEATURE_CX16 % 32, &c);
>> -        __clear_bit(X86_FEATURE_XTPR % 32, &c);
>> -        __clear_bit(X86_FEATURE_PDCM % 32, &c);
>> -        __clear_bit(X86_FEATURE_PCID % 32, &c);
>> -        __clear_bit(X86_FEATURE_DCA % 32, &c);
>> -        if ( !cpu_has_xsave )
>> -        {
>> -            __clear_bit(X86_FEATURE_XSAVE % 32, &c);
>> -            __clear_bit(X86_FEATURE_AVX % 32, &c);
>> -        }
>> -        if ( !cpu_has_apic )
>> -           __clear_bit(X86_FEATURE_X2APIC % 32, &c);
>> -        __set_bit(X86_FEATURE_HYPERVISOR % 32, &c);
>> +            c &= ~cpufeat_mask(X86_FEATURE_CX16);
> Shouldn't this be taken care of by clearing LM and then applying
> your dependencies correction? Or is that meant to only get
> enforced later? Is it maybe still worth having both pv64_featureset[]
> and pv32_featureset[]?

Again, this is just used as a halfway house.  Longterm, I plan to read
the featureset straight from the per-domain policy, which won't be
stored as an adhoc array.

>
>> +        /*
>> +         * !!! Warning - OSXSAVE handling for PV guests is non-architectural !!!
>> +         *
>> +         * Architecturally, the correct code here is simply:
>> +         *
>> +         *   if ( curr->arch.pv_vcpu.ctrlreg[4] & X86_CR4_OSXSAVE )
>> +         *       c |= cpufeat_mask(X86_FEATURE_OSXSAVE);
>> +         *
>> +         * However because of bugs in Xen (before c/s bd19080b, Nov 2010, the
>> +         * XSAVE cpuid flag leaked into guests despite the feature not being
>> +         * avilable for use), buggy workarounds where introduced to Linux (c/s
>> +         * 947ccf9c, also Nov 2010) which relied on the fact that Xen also
>> +         * incorrectly leaked OSXSAVE into the guest.
>> +         *
>> +         * Furthermore, providing architectural OSXSAVE behaviour to a many
>> +         * Linux PV guests triggered a further kernel bug when the fpu code
>> +         * observes that XSAVEOPT is available, assumes that xsave state had
>> +         * been set up for the task, and follows a wild pointer.
>> +         *
>> +         * Therefore, the leaking of Xen's OSXSAVE setting has become a
>> +         * defacto part of the PV ABI and can't reasonably be corrected.
>> +         *
>> +         * The following situations and logic now applies:
>> +         *
>> +         * - Hardware without CPUID faulting support and native CPUID:
>> +         *    There is nothing Xen can do here.  The hosts XSAVE flag will
>> +         *    leak through and Xen's OSXSAVE choice will leak through.
>> +         *
>> +         *    In the case that the guest kernel has not set up OSXSAVE, only
>> +         *    SSE will be set in xcr0, and guest userspace can't do too much
>> +         *    damage itself.
>> +         *
>> +         * - Enlightened CPUID or CPUID faulting available:
>> +         *    Xen can fully control what is seen here.  Guest kernels need to
>> +         *    see the leaked OSXSAVE, but guest userspace is given
>> +         *    architectural behaviour, to reflect the guest kernels
>> +         *    intentions.
>> +         */
> Well, I think all of this is too harsh: In a hypervisor-guest
> relationship of PV kind I don't view it as entirely wrong to let the
> guest kernel know of whether the hypervisor enabled XSAVE
> support by inspecting the OSXSAVE bit. From guest kernel
> perspective, the hypervisor is kind of the OS.

Xen shadows the guests %cr4.  That alone means that the emulated cpuid
should have matched.

If you are going for the OS argument, that means "Xen has XSAVE turned
on and the guest can the provided features" and not "Xen supports the
guest configuring its own XCR0", which is what it is taken to mean.  In
both of these cases, deviating from the architectural behaviour confuses
the situation, and adds adds needless divergence from the native
implementation in guests.

I admit that all of this stems from the complete fiasco which was the
original XSAVE support, but all of it would be unnecessary had buggy
bugfixes not been stack up on each other.

>  While I won't
> make weakening the above a little a requirement, I'd appreciate
> you doing so.

I don't which to be deliberately over the top, but I really don't think
I am in this case.  None of this should be necessary.

>
>> +    case 0x80000007:
>> +        d &= pv_featureset[FEATURESET_e7d];
>> +        break;
> By not clearing eax and ebx (not sure about ecx) here you would
> again expose flags to guests without proper white listing.
>
>> +    case 0x80000008:
>> +        b &= pv_featureset[FEATURESET_e8b];
>>          break;
> Same here for ecx and edx and perhaps the upper 8 bits of eax.

Both of these would be changes to how these things are currently
handled, whereby a guest gets to see whatever the toolstack managed to
find in said leaf.  I was hoping to put off some of these decisions, but
they probably need making now.  On the PV side they definitely can't be
fully hidden, as these leaves are not maskable.

~Andrew

  reply	other threads:[~2016-02-15 17:12 UTC|newest]

Thread overview: 139+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-05 13:41 [PATCH RFC v2 00/30] x86: Improvements to cpuid handling for guests Andrew Cooper
2016-02-05 13:41 ` [PATCH v2 01/30] xen/x86: Drop X86_FEATURE_3DNOW_ALT Andrew Cooper
2016-02-05 13:41 ` [PATCH v2 02/30] xen/x86: Do not store VIA/Cyrix/Centaur CPU features Andrew Cooper
2016-02-05 13:41 ` [PATCH v2 03/30] xen/x86: Drop cpuinfo_x86.x86_power Andrew Cooper
2016-02-05 13:41 ` [PATCH v2 04/30] xen/x86: Improvements to pv_cpuid() Andrew Cooper
2016-02-05 13:41 ` [PATCH v2 05/30] xen/public: Export cpu featureset information in the public API Andrew Cooper
2016-02-12 16:27   ` Jan Beulich
2016-02-17 13:08     ` Andrew Cooper
2016-02-17 13:34       ` Jan Beulich
2016-02-19 17:29   ` Joao Martins
2016-02-19 17:55     ` Andrew Cooper
2016-02-19 22:03       ` Joao Martins
2016-02-20 16:17         ` Andrew Cooper
2016-02-20 17:39           ` Joao Martins
2016-02-20 19:17             ` Andrew Cooper
2016-02-22 18:50               ` Joao Martins
2016-02-05 13:41 ` [PATCH v2 06/30] xen/x86: Script to automatically process featureset information Andrew Cooper
2016-02-12 16:36   ` Jan Beulich
2016-02-12 16:43     ` Andrew Cooper
2016-02-05 13:42 ` [PATCH v2 07/30] xen/x86: Collect more cpuid feature leaves Andrew Cooper
2016-02-12 16:38   ` Jan Beulich
2016-02-05 13:42 ` [PATCH v2 08/30] xen/x86: Mask out unknown features from Xen's capabilities Andrew Cooper
2016-02-12 16:43   ` Jan Beulich
2016-02-12 16:48     ` Andrew Cooper
2016-02-12 17:14       ` Jan Beulich
2016-02-17 13:12         ` Andrew Cooper
2016-02-05 13:42 ` [PATCH v2 09/30] xen/x86: Store antifeatures inverted in a featureset Andrew Cooper
2016-02-12 16:47   ` Jan Beulich
2016-02-12 16:50     ` Andrew Cooper
2016-02-12 17:15       ` Jan Beulich
2016-02-05 13:42 ` [PATCH v2 10/30] xen/x86: Annotate VM applicability in featureset Andrew Cooper
2016-02-12 17:05   ` Jan Beulich
2016-02-12 17:42     ` Andrew Cooper
2016-02-15  9:20       ` Jan Beulich
2016-02-15 14:38         ` Andrew Cooper
2016-02-15 14:50           ` Jan Beulich
2016-02-15 14:53             ` Andrew Cooper
2016-02-15 15:02               ` Jan Beulich
2016-02-15 15:41                 ` Andrew Cooper
2016-02-17 19:02                   ` Is: PVH dom0 - MWAIT detection logic to get deeper C-states exposed in ACPI AML code. Was:Re: " Konrad Rzeszutek Wilk
2016-02-17 19:58                     ` Boris Ostrovsky
2016-02-18 15:02                     ` Roger Pau Monné
2016-02-18 15:12                       ` Andrew Cooper
2016-02-18 16:24                         ` Boris Ostrovsky
2016-02-18 16:48                           ` Andrew Cooper
2016-02-18 17:03                         ` Roger Pau Monné
2016-02-18 22:08                           ` Konrad Rzeszutek Wilk
2016-02-18 15:16                       ` David Vrabel
2016-02-05 13:42 ` [PATCH v2 11/30] xen/x86: Calculate maximum host and guest featuresets Andrew Cooper
2016-02-15 13:37   ` Jan Beulich
2016-02-15 14:57     ` Andrew Cooper
2016-02-15 15:07       ` Jan Beulich
2016-02-15 15:52         ` Andrew Cooper
2016-02-05 13:42 ` [PATCH v2 12/30] xen/x86: Generate deep dependencies of features Andrew Cooper
2016-02-15 14:06   ` Jan Beulich
2016-02-15 15:28     ` Andrew Cooper
2016-02-15 15:52       ` Jan Beulich
2016-02-15 16:09         ` Andrew Cooper
2016-02-15 16:27           ` Jan Beulich
2016-02-15 19:07             ` Andrew Cooper
2016-02-16  9:54               ` Jan Beulich
2016-02-17 10:25                 ` Andrew Cooper
2016-02-17 10:42                   ` Jan Beulich
2016-02-05 13:42 ` [PATCH v2 13/30] xen/x86: Clear dependent features when clearing a cpu cap Andrew Cooper
2016-02-15 14:53   ` Jan Beulich
2016-02-15 15:33     ` Andrew Cooper
2016-02-15 14:56   ` Jan Beulich
2016-02-05 13:42 ` [PATCH v2 14/30] xen/x86: Improve disabling of features which have dependencies Andrew Cooper
2016-02-05 13:42 ` [PATCH v2 15/30] xen/x86: Improvements to in-hypervisor cpuid sanity checks Andrew Cooper
2016-02-15 15:43   ` Jan Beulich
2016-02-15 17:12     ` Andrew Cooper [this message]
2016-02-16 10:06       ` Jan Beulich
2016-02-17 10:43         ` Andrew Cooper
2016-02-17 10:55           ` Jan Beulich
2016-02-17 14:02             ` Andrew Cooper
2016-02-17 14:45               ` Jan Beulich
2016-02-18 12:17                 ` Andrew Cooper
2016-02-18 13:23                   ` Jan Beulich
2016-02-05 13:42 ` [PATCH v2 16/30] x86/cpu: Move set_cpumask() calls into c_early_init() Andrew Cooper
2016-02-16 14:10   ` Jan Beulich
2016-02-17 10:45     ` Andrew Cooper
2016-02-17 10:58       ` Jan Beulich
2016-02-18 12:41         ` Andrew Cooper
2016-02-05 13:42 ` [PATCH v2 17/30] x86/cpu: Common infrastructure for levelling context switching Andrew Cooper
2016-02-16 14:15   ` Jan Beulich
2016-02-17  8:15     ` Jan Beulich
2016-02-17 10:46       ` Andrew Cooper
2016-02-17 19:06   ` Konrad Rzeszutek Wilk
2016-02-05 13:42 ` [PATCH v2 18/30] x86/cpu: Rework AMD masking MSR setup Andrew Cooper
2016-02-17  7:40   ` Jan Beulich
2016-02-17 10:56     ` Andrew Cooper
2016-02-05 13:42 ` [PATCH v2 19/30] x86/cpu: Rework Intel masking/faulting setup Andrew Cooper
2016-02-17  7:57   ` Jan Beulich
2016-02-17 10:59     ` Andrew Cooper
2016-02-05 13:42 ` [PATCH v2 20/30] x86/cpu: Context switch cpuid masks and faulting state in context_switch() Andrew Cooper
2016-02-17  8:06   ` Jan Beulich
2016-02-05 13:42 ` [PATCH v2 21/30] x86/pv: Provide custom cpumasks for PV domains Andrew Cooper
2016-02-17  8:13   ` Jan Beulich
2016-02-17 11:03     ` Andrew Cooper
2016-02-17 11:14       ` Jan Beulich
2016-02-18 12:48         ` Andrew Cooper
2016-02-05 13:42 ` [PATCH v2 22/30] x86/domctl: Update PV domain cpumasks when setting cpuid policy Andrew Cooper
2016-02-17  8:22   ` Jan Beulich
2016-02-17 12:13     ` Andrew Cooper
2016-02-05 13:42 ` [PATCH v2 23/30] xen+tools: Export maximum host and guest cpu featuresets via SYSCTL Andrew Cooper
2016-02-05 16:12   ` Wei Liu
2016-02-17  8:30   ` Jan Beulich
2016-02-17 12:17     ` Andrew Cooper
2016-02-17 12:23       ` Jan Beulich
2016-02-05 13:42 ` [PATCH v2 24/30] tools/libxc: Modify bitmap operations to take void pointers Andrew Cooper
2016-02-05 16:12   ` Wei Liu
2016-02-08 11:40     ` Andrew Cooper
2016-02-08 16:23   ` Tim Deegan
2016-02-08 16:36     ` Ian Campbell
2016-02-10 10:07       ` Andrew Cooper
2016-02-10 10:18         ` Ian Campbell
2016-02-18 13:37           ` Andrew Cooper
2016-02-17 20:06         ` Konrad Rzeszutek Wilk
2016-02-05 13:42 ` [PATCH v2 25/30] tools/libxc: Use public/featureset.h for cpuid policy generation Andrew Cooper
2016-02-05 16:12   ` Wei Liu
2016-02-05 13:42 ` [PATCH v2 26/30] tools/libxc: Expose the automatically generated cpu featuremask information Andrew Cooper
2016-02-05 16:12   ` Wei Liu
2016-02-05 16:15     ` Wei Liu
2016-02-05 13:42 ` [PATCH v2 27/30] tools: Utility for dealing with featuresets Andrew Cooper
2016-02-05 16:13   ` Wei Liu
2016-02-05 13:42 ` [PATCH v2 28/30] tools/libxc: Wire a featureset through to cpuid policy logic Andrew Cooper
2016-02-05 16:13   ` Wei Liu
2016-02-05 13:42 ` [PATCH v2 29/30] tools/libxc: Use featuresets rather than guesswork Andrew Cooper
2016-02-05 16:13   ` Wei Liu
2016-02-17  8:55   ` Jan Beulich
2016-02-17 13:03     ` Andrew Cooper
2016-02-17 13:19       ` Jan Beulich
2016-02-05 13:42 ` [PATCH v2 30/30] tools/libxc: Calculate xstate cpuid leaf from guest information Andrew Cooper
2016-02-05 14:28   ` Jan Beulich
2016-02-05 15:22     ` Andrew Cooper
2016-02-08 17:26 ` [PATCH v2.5 31/30] Fix PV guest XSAVE handling with levelling Andrew Cooper
2016-02-17  9:02   ` Jan Beulich
2016-02-17 13:06     ` Andrew Cooper
2016-02-17 13:36       ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56C2070D.10803@citrix.com \
    --to=andrew.cooper3@citrix.com \
    --cc=JBeulich@suse.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).