From: Jan Kiszka <jan.kiszka@siemens.com>
To: Gleb Natapov <gleb@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>, kvm <kvm@vger.kernel.org>,
"Nadav Har'El" <nyh@math.technion.ac.il>
Subject: Re: [PATCH v2] KVM: nVMX: Fix setting of CR0 and CR4 in guest mode
Date: Thu, 07 Mar 2013 12:57:27 +0100 [thread overview]
Message-ID: <513880A7.8070109@siemens.com> (raw)
In-Reply-To: <20130307115058.GG11223@redhat.com>
On 2013-03-07 12:50, Gleb Natapov wrote:
> On Thu, Mar 07, 2013 at 12:25:26PM +0100, Jan Kiszka wrote:
>> On 2013-03-07 12:06, Gleb Natapov wrote:
>>> On Thu, Mar 07, 2013 at 11:37:43AM +0100, Jan Kiszka wrote:
>>>> On 2013-03-07 09:57, Gleb Natapov wrote:
>>>>> On Thu, Mar 07, 2013 at 09:53:49AM +0100, Jan Kiszka wrote:
>>>>>> On 2013-03-07 09:43, Gleb Natapov wrote:
>>>>>>> On Thu, Mar 07, 2013 at 09:12:19AM +0100, Jan Kiszka wrote:
>>>>>>>> On 2013-03-07 08:51, Gleb Natapov wrote:
>>>>>>>>> On Mon, Mar 04, 2013 at 08:40:29PM +0100, Jan Kiszka wrote:
>>>>>>>>>> The logic for calculating the value with which we call kvm_set_cr0/4 was
>>>>>>>>>> broken (will definitely be visible with nested unrestricted guest mode
>>>>>>>>>> support). Also, we performed the check regarding CR0_ALWAYSON too early
>>>>>>>>>> when in guest mode.
>>>>>>>>>>
>>>>>>>>>> What really needs to be done on both CR0 and CR4 is to mask out L1-owned
>>>>>>>>>> bits and merge them in from GUEST_CR0/4. In contrast, arch.cr0/4 and
>>>>>>>>>> arch.cr0/4_guest_owned_bits contain the mangled L0+L1 state and, thus,
>>>>>>>>>> are not suited as input.
>>>>>>>>>>
>>>>>>>>>> For both CRs, we can then apply the check against VMXON_CRx_ALWAYSON and
>>>>>>>>>> refuse the update if it fails. To be fully consistent, we implement this
>>>>>>>>>> check now also for CR4.
>>>>>>>>>>
>>>>>>>>>> Finally, we have to set the shadow to the value L2 wanted to write
>>>>>>>>>> originally.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>>>>>>>> ---
>>>>>>>>>>
>>>>>>>>>> Changes in v2:
>>>>>>>>>> - keep the non-misleading part of the comment in handle_set_cr0
>>>>>>>>>>
>>>>>>>>>> arch/x86/kvm/vmx.c | 46 +++++++++++++++++++++++++++++++---------------
>>>>>>>>>> 1 files changed, 31 insertions(+), 15 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>>>>>>>> index 7cc566b..832b7b4 100644
>>>>>>>>>> --- a/arch/x86/kvm/vmx.c
>>>>>>>>>> +++ b/arch/x86/kvm/vmx.c
>>>>>>>>>> @@ -4605,37 +4605,53 @@ vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall)
>>>>>>>>>> /* called to set cr0 as appropriate for a mov-to-cr0 exit. */
>>>>>>>>>> static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val)
>>>>>>>>>> {
>>>>>>>>>> - if (to_vmx(vcpu)->nested.vmxon &&
>>>>>>>>>> - ((val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON))
>>>>>>>>>> - return 1;
>>>>>>>>>> -
>>>>>>>>>> if (is_guest_mode(vcpu)) {
>>>>>>>>>> + struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
>>>>>>>>>> + unsigned long orig_val = val;
>>>>>>>>>> +
>>>>>>>>>> /*
>>>>>>>>>> * We get here when L2 changed cr0 in a way that did not change
>>>>>>>>>> * any of L1's shadowed bits (see nested_vmx_exit_handled_cr),
>>>>>>>>>> - * but did change L0 shadowed bits. This can currently happen
>>>>>>>>>> - * with the TS bit: L0 may want to leave TS on (for lazy fpu
>>>>>>>>>> - * loading) while pretending to allow the guest to change it.
>>>>>>>>>> + * but did change L0 shadowed bits.
>>>>>>>>>> */
>>>>>>>>>> - if (kvm_set_cr0(vcpu, (val & vcpu->arch.cr0_guest_owned_bits) |
>>>>>>>>>> - (vcpu->arch.cr0 & ~vcpu->arch.cr0_guest_owned_bits)))
>>>>>>>>>> + val = (val & ~vmcs12->cr0_guest_host_mask) |
>>>>>>>>>> + (vmcs_read64(GUEST_CR0) & vmcs12->cr0_guest_host_mask);
>>>>>>>>> I think using GUEST_CR0 here is incorrect. It contains combination of bits
>>>>>>>>> set by L2, L1 and L0 and here we need to get only L2/L1 mix which is in
>>>>>>>>> vcpu->arch.cr0 (almost, but good enough for this case). Why vcpu->arch.cr0
>>>>>>>>> contains right L2/L1 mix?
>>>>>>>>
>>>>>>>> L0/L1. E.g., kvm_set_cr0 unconditionally injects X86_CR0_ET and masks
>>>>>>>> out reserved bits. But you are right, GUEST_CR0 isn't much better. And
>>>>>>>> maybe that mangling in kvm_set_cr0 is a corner case we can ignore.
>>>>>>>>
>>>>>>> I think we can. ET is R/O and wired to 1, so it does not matter what
>>>>>>> guest writes there it should be treated as 1. About reserved bits spec
>>>>>>> says that software should write what it reads there and does not specify
>>>>>>> what happens if software does not follow this.
>>>>>>>
>>>>>>>>> Because it was set to vmcs12->guest_cr0 during
>>>>>>>>> L2 #vmentry. While L2 is running three things may happen to CR0:
>>>>>>>>>
>>>>>>>>> 1. L2 writes to a bit that is not shadowed neither by L1 nor by L0. It
>>>>>>>>> will go strait to GUEST_CR0.
>>>>>>>>> 2. L2 writes to a bit shadowed by L1. L1 #vmexit will be emulated. On the
>>>>>>>>> next #vmetry vcpu->arch.cr0 will be set to whatever value L1 calculated.
>>>>>>>>> 3. L2 writes to a bit shadowed by L0, but not L1. This is the case we
>>>>>>>>> are handling here. And if we will do it right vcpu->arch.cr0 will be
>>>>>>>>> up-to-date at the end.
>>>>>>>>>
>>>>>>>>> The only case when, while this code running, vcpu->arch.cr0 has not
>>>>>>>>> up-to-date value is if 1 happened, but since L2 guest overwriting cr0
>>>>>>>>> here anyway it does not matter what it previously set in GUEST_CR0. The
>>>>>>>>> correct bits are in a new cr0 value provided by val and accessible by
>>>>>>>>> (val & ~vmcs12->cr0_guest_host_mask).
>>>>>>>>
>>>>>>>> I need to think about it again. Maybe vmcs12->guest_cr0 is best, but
>>>>>>>> that's a shot from the hips now.
>>>>>>>>
>>>>>>> I do not think it is correct because case 3 does not update it. So if 3
>>>>>>> happens twice without L1 #vmexit between then vmcs12->guest_cr0 will be
>>>>>>> outdated.
>>>>>>
>>>>>> Again, the only thing that matters here is L1's, not L0's view on the
>>>>>> "real" CR0 value. So guest_cr0 is never outdated (/wrt
>>>>>> cr0_guest_host_mask) as it will be updated by L1 in step 2. Even if
>>>>>> arch.cr0 vs. guest_cr0 makes no difference in practice, the latter is
>>>>>> more consistent, so I will go for it unless you can convince me it is wrong.
>>>>>>
>>>>> Hmm, yes you are right that wrt cr0_guest_host_mask guest_cr0 should be
>>>>> up-to-date. Please write a big comment about it.
>>>>
>>>> Will do.
>>>>
>>>>> And what about moving VMXON_CR0_ALWAYSON check into vmx_set_cr0()?
>>>>
>>>> That doesn't make much sense for CR0 (due to the differences between
>>>> vmxon and guest mode - and lacking return code of set_cr4). But I can
>>>> consolidate the CR4 checks.
>>>>
>>> Isn't vmxon check is implicit in a guest mode. i.e if is_guest_mode() is
>>> trues then vmxon is on? Return code can be added.
>>
>> Ah, sorry, you are not seeing what I'm looking at: The test will change
>> for L2 context once unrestricted guest mode is added. At that point, it
>> makes more sense to split it into one version that checks against
>> VMXON_CR0_ALWAYSON while in vmxon, targeting L1, and another that does
>> more complex evaluation for L2, depending on nested_cpu_has2(vmcs12,
>> SECONDARY_EXEC_UNRESTRICTED_GUEST).
>>
> Ah, OK. Hard to argue that those checks can be consolidated without
> seeing them :) So you want to implement unrestricted L1 on restricted L0 and
> let L0 emulate real mode of L2 directly?
Err, no. :) Well, that emulation might even work but doesn't help unless
you also emulate EPT (not unrestricted guest mode without EPT support -
according to the spec).
I just want make L0's unrestricted guest mode support available for L1
(in fact, I already did this in my hacking branch).
Jan
--
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
next prev parent reply other threads:[~2013-03-07 11:57 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-03-04 19:40 [PATCH v2] KVM: nVMX: Fix setting of CR0 and CR4 in guest mode Jan Kiszka
2013-03-05 12:57 ` Gleb Natapov
2013-03-05 13:02 ` Gleb Natapov
2013-03-05 13:18 ` Jan Kiszka
2013-03-07 0:33 ` Marcelo Tosatti
2013-03-07 7:51 ` Gleb Natapov
2013-03-07 8:12 ` Jan Kiszka
2013-03-07 8:27 ` Jan Kiszka
2013-03-07 8:48 ` Gleb Natapov
2013-03-07 8:43 ` Gleb Natapov
2013-03-07 8:53 ` Jan Kiszka
2013-03-07 8:57 ` Gleb Natapov
2013-03-07 10:37 ` Jan Kiszka
2013-03-07 11:06 ` Gleb Natapov
2013-03-07 11:25 ` Jan Kiszka
2013-03-07 11:50 ` Gleb Natapov
2013-03-07 11:57 ` Jan Kiszka [this message]
2013-03-07 12:05 ` Gleb Natapov
2013-03-07 12:18 ` Jan Kiszka
2013-03-07 12:21 ` Gleb Natapov
2013-03-07 12:48 ` Jan Kiszka
2013-03-07 13:04 ` Gleb Natapov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=513880A7.8070109@siemens.com \
--to=jan.kiszka@siemens.com \
--cc=gleb@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=mtosatti@redhat.com \
--cc=nyh@math.technion.ac.il \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox