Re: [PATCH] KVM: nVMX: Fix setting of CR0 and CR4 in guest mode

From: Jan Kiszka <jan.kiszka@web.de>
To: Gleb Natapov <gleb@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>, kvm <kvm@vger.kernel.org>,
	Nadav Har'El <nyh@math.technion.ac.il>,
	"Nakajima, Jun" <jun.nakajima@intel.com>
Subject: Re: [PATCH] KVM: nVMX: Fix setting of CR0 and CR4 in guest mode
Date: Mon, 04 Mar 2013 21:12:25 +0100
Message-ID: <51350029.6080908@web.de>
In-Reply-To: <20130304200033.GH14220@redhat.com>

On 2013-03-04 21:00, Gleb Natapov wrote:
> On Mon, Mar 04, 2013 at 08:37:38PM +0100, Jan Kiszka wrote:
>> On 2013-03-04 20:33, Gleb Natapov wrote:
>>> On Mon, Mar 04, 2013 at 08:23:52PM +0100, Jan Kiszka wrote:
>>>> On 2013-03-04 19:39, Gleb Natapov wrote:
>>>>> On Mon, Mar 04, 2013 at 07:08:08PM +0100, Jan Kiszka wrote:
>>>>>> On 2013-03-04 18:56, Gleb Natapov wrote:
>>>>>>> On Mon, Mar 04, 2013 at 03:25:47PM +0100, Jan Kiszka wrote:
>>>>>>>> On 2013-03-04 15:15, Gleb Natapov wrote:
>>>>>>>>> On Mon, Mar 04, 2013 at 03:09:51PM +0100, Jan Kiszka wrote:
>>>>>>>>>> On 2013-03-04 14:22, Gleb Natapov wrote:
>>>>>>>>>>> On Thu, Feb 28, 2013 at 10:44:47AM +0100, Jan Kiszka wrote:
>>>>>>>>>>>> The logic for calculating the value with which we call kvm_set_cr0/4 was
>>>>>>>>>>>> broken (will definitely be visible with nested unrestricted guest mode
>>>>>>>>>>>> support). Also, we performed the check regarding CR0_ALWAYSON too early
>>>>>>>>>>>> when in guest mode.
>>>>>>>>>>>>
>>>>>>>>>>>> What really needs to be done on both CR0 and CR4 is to mask out L1-owned
>>>>>>>>>>>> bits and merge them in from GUEST_CR0/4. In contrast, arch.cr0/4 and
>>>>>>>>>>>> arch.cr0/4_guest_owned_bits contain the mangled L0+L1 state and, thus,
>>>>>>>>>>>> are not suited as input.
>>>>>>>>>>>>
>>>>>>>>>>>> For both CRs, we can then apply the check against VMXON_CRx_ALWAYSON and
>>>>>>>>>>>> refuse the update if it fails. To be fully consistent, we implement this
>>>>>>>>>>>> check now also for CR4.
>>>>>>>>>>>>
>>>>>>>>>>>> Finally, we have to set the shadow to the value L2 wanted to write
>>>>>>>>>>>> originally.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>>
>>>>>>>>>>>> Found while getting unrestricted guest mode to work. Not sure what impact
>>>>>>>>>>>> the bugs had at the current feature level, if any.
>>>>>>>>>>>>
>>>>>>>>>>>> For interested folks, I've pushed my nEPT environment here:
>>>>>>>>>>>>
>>>>>>>>>>>>     git://git.kiszka.org/linux-kvm.git nept-hacking
>>>>>>>>>>>>
>>>>>>>>>>>>  arch/x86/kvm/vmx.c |   49 ++++++++++++++++++++++++++++++-------------------
>>>>>>>>>>>>  1 files changed, 30 insertions(+), 19 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>>>>>>>>>> index 7cc566b..d1dac08 100644
>>>>>>>>>>>> --- a/arch/x86/kvm/vmx.c
>>>>>>>>>>>> +++ b/arch/x86/kvm/vmx.c
>>>>>>>>>>>> @@ -4605,37 +4605,48 @@ vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall)
>>>>>>>>>>>>  /* called to set cr0 as appropriate for a mov-to-cr0 exit. */
>>>>>>>>>>>>  static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val)
>>>>>>>>>>>>  {
>>>>>>>>>>>> -	if (to_vmx(vcpu)->nested.vmxon &&
>>>>>>>>>>>> -	    ((val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON))
>>>>>>>>>>>> -		return 1;
>>>>>>>>>>>> -
>>>>>>>>>>>>  	if (is_guest_mode(vcpu)) {
>>>>>>>>>>>> -		/*
>>>>>>>>>>>> -		 * We get here when L2 changed cr0 in a way that did not change
>>>>>>>>>>>> -		 * any of L1's shadowed bits (see nested_vmx_exit_handled_cr),
>>>>>>>>>>>> -		 * but did change L0 shadowed bits. This can currently happen
>>>>>>>>>>>> -		 * with the TS bit: L0 may want to leave TS on (for lazy fpu
>>>>>>>>>>>> -		 * loading) while pretending to allow the guest to change it.
>>>>>>>>>>>> -		 */
>>>>>>>>>>> Can't say I understand this patch yet, but it looks like the comment is
>>>>>>>>>>> still valid. Why have you removed it?
>>>>>>>>>>
>>>>>>>>>> L0 allows L1 or L2 at most to own TS, the rest is host-owned. I think
>>>>>>>>>> the comment was always misleading.
>>>>>>>>>>
>>>>>>>>> I do not see how it is misleading. For everything but TS we will not get
>>>>>>>>> here (if L1 is kvm). For TS we will get here if L1 allows L2 to change
>>>>>>>>> it, but L0 does not.
>>>>>>>>
>>>>>>>> For everything *but guest-owned* we will get here, thus for most CR0
>>>>>>>> accesses (bit-wise, not regarding frequency).
>>>>>>>>
>>>>>>> I do not see how. If a bit is trapped by L1 we will not get here. We will
>>>>>>> do a vmexit to L1 instead. nested_vmx_exit_handled_cr() checks this condition.
>>>>>>> I am not arguing about your code (didn't grok it yet), but the comment
>>>>>>> still makes sense to me.
>>>>>>
>>>>>> "We get here when L2 changed cr0 in a way that did not change any of
>>>>>> L1's shadowed bits (see nested_vmx_exit_handled_cr), but did change L0
>>>>>> shadowed bits." That I can sign. But the rest about TS is just
>>>>>> misleading as we trap _every_ change in L0 - except for TS under certain
>>>>>> conditions. The old code was tested against TS only; that's what the
>>>>>> comment bears witness to.
>>>>>>
>>>>> TS is just an example of how we can get here with KVM on KVM. Obviously
>>>>> other hypervisors may have a different configuration. L1 may allow L2 full
>>>>> access to CR0 and then each CR0 write by L2 will be handled here.
>>>>> Under what other condition "we trap _every_ change in L0 - except for
>>>>> TS" here?
>>>>
>>>> On FPU activation:
>>>>
>>>>     cr0_guest_owned_bits = X86_CR0_TS;
>>>>
>>>> And on FPU deactivation:
>>>>
>>>>     cr0_guest_owned_bits = 0;
>>>>
>>> That's exactly TS case that comment explains. Note that
>>> CR0_GUEST_HOST_MASK = ~cr0_guest_owned_bits.
>>
>> Again, it's the inverse of what the comment suggests: we enter
>> handle_set_cr0 for every change on CR0 that doesn't match the shadow -
>> except when TS was given to the guest by both L1 and L0 (or TS isn't
>> changed either).
> That doesn't make sense to me. I am not even sure what you are saying
> since you do not specify which shadow is matched. From the code I see
> that on a CR0 exit to L0 from L2 we check if L2 tries to change CR0 bits
> that L1 claims to belong to it and do a #vmexit to L1 if it does:
> 
>    if (vmcs12->cr0_guest_host_mask & (val ^ vmcs12->cr0_read_shadow))
>             return 1;
> 
> We never reach handle_set_cr0() in that case.
> 
> Can you provide an example with actual values for L2/L1/L0 of what you
> are trying to say?

I already provided a concrete one: L1 clears PE/PG from its
guest_host_mask (assuming we support unrestricted guest mode for L1), L2
switches from real to protected mode, thus sets PE=1 while the shadow
(set by L0) holds 0 => we end up in handle_set_cr0.
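
To spell that example out with numbers, here is a minimal user-space
sketch, not the actual vmx.c code. struct vmcs12_cr0 is a hypothetical,
reduced stand-in for the relevant vmcs12 fields, and the ET/NE values are
only illustrative. l1_wants_exit() mirrors the check you quoted from
nested_vmx_exit_handled_cr(); merge_cr0() is my reading of "mask out
L1-owned bits and merge them in from GUEST_CR0" from the patch
description.

```c
#include <assert.h>
#include <stdbool.h>

#define X86_CR0_PE (1UL << 0)   /* protection enable */
#define X86_CR0_TS (1UL << 3)   /* task switched */
#define X86_CR0_ET (1UL << 4)   /* extension type */
#define X86_CR0_NE (1UL << 5)   /* numeric error */
#define X86_CR0_PG (1UL << 31)  /* paging */

/* Hypothetical, reduced view of the vmcs12 fields involved. */
struct vmcs12_cr0 {
	unsigned long guest_host_mask; /* bits L1 wants to intercept */
	unsigned long read_shadow;     /* what L2 reads for those bits */
	unsigned long guest_cr0;       /* CR0 value L1 set up for L2 */
};

/* True if the write touches a bit L1 owns, i.e. L1 gets the exit. */
static bool l1_wants_exit(const struct vmcs12_cr0 *v, unsigned long val)
{
	return (v->guest_host_mask & (val ^ v->read_shadow)) != 0;
}

/* L2-owned bits come from the new value, L1-owned bits from GUEST_CR0. */
static unsigned long merge_cr0(const struct vmcs12_cr0 *v, unsigned long val)
{
	return (val & ~v->guest_host_mask) |
	       (v->guest_cr0 & v->guest_host_mask);
}

/* L1 leaves PE/PG to L2; L2 switches from real to protected mode. */
unsigned long real_to_protected_example(void)
{
	const struct vmcs12_cr0 v = {
		.guest_host_mask = ~(X86_CR0_PE | X86_CR0_PG),
		.read_shadow     = X86_CR0_ET,              /* PE=0 */
		.guest_cr0       = X86_CR0_ET | X86_CR0_NE,
	};
	unsigned long val = X86_CR0_ET | X86_CR0_PE;        /* L2 sets PE */

	/* PE is not in L1's mask, so L0 has to handle the write itself. */
	assert(!l1_wants_exit(&v, val));
	/* Touching TS as well would go to L1, since this mask covers TS. */
	assert(l1_wants_exit(&v, val | X86_CR0_TS));

	return merge_cr0(&v, val);
}
```

With these values the PE write is not reflected to L1, and the merged
value keeps PE from L2's write while ET and NE come from L1's guest_cr0.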

Jan


