From: Xiaoyao Li <xiaoyao.li@intel.com>
To: Jim Mattson <jmattson@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
    Chenyi Qiang <chenyi.qiang@intel.com>,
    Sean Christopherson <seanjc@google.com>,
    Vitaly Kuznetsov <vkuznets@redhat.com>,
    Wanpeng Li <wanpengli@tencent.com>,
    Joerg Roedel <joro@8bytes.org>,
    kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] KVM: VMX: Enable Notify VM exit
Date: Tue, 1 Mar 2022 09:40:55 +0800
Message-ID: <1cca344e-1c2d-8ebf-87ae-d9298a73306a@intel.com>
In-Reply-To: <CALMp9eQj4Xr9VAdHw4BfPEskQYptEYYHRrpmFfVU1TCQJmHwug@mail.gmail.com>
On 2/28/2022 10:30 PM, Jim Mattson wrote:
> On Sun, Feb 27, 2022 at 11:10 PM Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>>
>> On 2/26/2022 10:24 PM, Jim Mattson wrote:
>>> On Fri, Feb 25, 2022 at 10:24 PM Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>>>>
>>>> On 2/26/2022 12:53 PM, Jim Mattson wrote:
>>>>> On Fri, Feb 25, 2022 at 8:25 PM Jim Mattson <jmattson@google.com> wrote:
>>>>>>
>>>>>> On Fri, Feb 25, 2022 at 8:07 PM Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>>>>>>>
>>>>>>> On 2/25/2022 11:13 PM, Paolo Bonzini wrote:
>>>>>>>> On 2/25/22 16:12, Xiaoyao Li wrote:
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I don't like the idea of making things up without notifying userspace
>>>>>>>>>>> that this is fictional. How is my customer running nested VMs supposed
>>>>>>>>>>> to know that L2 didn't actually shutdown, but L0 killed it because the
>>>>>>>>>>> notify window was exceeded? If this information isn't reported to
>>>>>>>>>>> userspace, I have no way of getting the information to the customer.
>>>>>>>>>>
>>>>>>>>>> Then, maybe a dedicated software-defined VM exit for it instead of
>>>>>>>>>> reusing triple fault?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Second thought, we can even just return Notify VM exit to L1 to tell
>>>>>>>>> L2 causes Notify VM exit, even though Notify VM exit is not exposed
>>>>>>>>> to L1.
>>>>>>>>
>>>>>>>> That might cause NULL pointer dereferences or other nasty occurrences.
>>>>>>>
>>>>>>> IMO, a well written VMM (in L1) should handle it correctly.
>>>>>>>
>>>>>>> L0 KVM reports no Notify VM Exit support to L1, so L1 runs without
>>>>>>> setting Notify VM exit. If a L2 causes notify_vm_exit with
>>>>>>> invalid_vm_context, L0 just reflects it to L1. In L1's view, there is no
>>>>>>> support of Notify VM Exit from VMX MSR capability. The following L1
>>>>>>> handlers are possible:
>>>>>>>
>>>>>>> a) if (notify_vm_exit available & notify_vm_exit enabled) {
>>>>>>>            handle in b)
>>>>>>>    } else {
>>>>>>>            report unexpected vm exit reason to userspace;
>>>>>>>    }
>>>>>>>
>>>>>>> b) similar handler like we implement in KVM:
>>>>>>>    if (!vm_context_invalid)
>>>>>>>            re-enter guest;
>>>>>>>    else
>>>>>>>            report to userspace;
>>>>>>>
>>>>>>> c) no Notify VM Exit related code (e.g. old KVM), it's treated as
>>>>>>>    unsupported exit reason
>>>>>>>
>>>>>>> As long as it belongs to any case above, I think L1 can handle it
>>>>>>> correctly. Any nasty occurrence should be caused by incorrect handler in
>>>>>>> L1 VMM, in my opinion.
>>>>>>
>>>>>> Please test some common hypervisors (e.g. ESXi and Hyper-V).
>>>>>
>>>>> I took a look at KVM in Linux v4.9 (one of our more popular guests),
>>>>> and it will not handle this case well:
>>>>>
>>>>>     if (exit_reason < kvm_vmx_max_exit_handlers
>>>>>         && kvm_vmx_exit_handlers[exit_reason])
>>>>>             return kvm_vmx_exit_handlers[exit_reason](vcpu);
>>>>>     else {
>>>>>             WARN_ONCE(1, "vmx: unexpected exit reason 0x%x\n", exit_reason);
>>>>>             kvm_queue_exception(vcpu, UD_VECTOR);
>>>>>             return 1;
>>>>>     }
>>>>>
>>>>> At least there's an L1 kernel log message for the first unexpected
>>>>> NOTIFY VM-exit, but after that, there is silence. Just a completely
>>>>> inexplicable #UD in L2, assuming that L2 is resumable at this point.
>>>>
>>>> At least there is a message to tell L1 a notify VM exit is triggered in
>>>> L2. Yes, the inexplicable #UD won't be hit unless L2 triggers Notify VM
>>>> exit with invalid_context, which is malicious to L0 and L1.
>>>
>>> There is only an L1 kernel log message *the first time*. That's not
>>> good enough. And this is just one of the myriad of possible L1
>>> hypervisors.
>>>
>>>> If we use triple_fault (i.e., shutdown), then there is no info to tell
>>>> L1 that it's caused by Notify VM exit with invalid context. Triple fault
>>>> needs to be extended and the L1 kernel needs to be enlightened. It doesn't
>>>> help old guest kernels.
>>>>
>>>> If we use Machine Check, it's similarly inexplicable to L2 unless
>>>> it's enlightened. But it doesn't help old guest kernels.
>>>>
>>>> Anyway, for Notify VM exit with invalid context from L2, I don't see a
>>>> good solution to tell L1 VMM it's a "Notify VM exit with invalid context
>>>> from L2" and keep all kinds of L1 VMM happy, especially for those with
>>>> old kernel versions.
>>>
>>> I agree that there is no way to make every conceivable L1 happy.
>>> That's why the information needs to be surfaced to the L0 userspace. I
>>> contend that any time L0 kvm violates the architectural specification
>>> in its emulation of L1 or L2, the L0 userspace *must* be informed.
>>
>> We can design it to exit to userspace on notify vm exit
>> unconditionally, with exit_qualification passed along. Userspace can then
>> take the same action that this patch takes in KVM:
>>
>>   - re-enter the guest when context_invalid is false;
>>   - stop running the guest if context_invalid is true (userspace can
>>     definitely re-enter the guest in this case, but it needs to take the
>>     fall for this).
>>
>> Then, for the nested case, L0 needs to enable it transparently for L2 if
>> this feature is enabled for the L1 guest (the reason, as we all agreed, is
>> that L1 must not be allowed to escape just by creating an L2). Then what
>> should KVM do on a notify vm exit from L2?
>>
>> - Exit to L0 userspace on L2's notify vm exit. L0 userspace takes the
>>   same action:
>>     - re-enter if context_invalid is false;
>>     - kill L1 if context_invalid is true (I don't know if there is any
>>       interface for L0 userspace to kill L2).
>>   This opens the potential door for a malicious user to kill L1 by
>>   creating an L2 that triggers a fatal notify vm exit. If you guys accept
>>   it, we can implement it this way.
>>
>>
>> In conclusion, we have the solutions below:
>>
>> 1. Take this patch as is. The drawback is that the L1 VMM receives a
>> triple_fault from L2 when L2 triggers a notify vm exit with invalid
>> context. None of the L1 VMM, L1 userspace, or the L2 kernel knows it was
>> caused by a notify vm exit. There is only a kernel log in L0, which does
>> not seem accessible to the L1 user or the L2 guest.
>
> You are correct on that last point, and I feel that I cannot stress it
> enough. In a typical environment, the L0 kernel log is only available
> to the administrator of the L0 host.
>
>> 2. a) Inject the notify vm exit back to L1 if L2 triggers a notify vm exit
>> with invalid context. The drawback is that an old L1 hypervisor is not
>> enlightened about it and may misbehave.
>>
>> b) Inject a synthesized SHUTDOWN exit to L1, with additional info to
>> tell it that it was caused by a fatal notify vm exit from L2. It has the
>> same drawback: an old hypervisor has no idea of it and may misbehave.
>>
>> 3. Exit to L0 userspace unconditionally, no matter whether it is caused by
>> L1 or L2. Then it may open the door for an L1 user to kill L1.
>>
>> Do you have any better solution other than above? If no, we need to pick
>> one from above though it cannot make everyone happy.
>
> Yes, I believe I have a better solution. We obviously need an API for
> userspace to synthesize a SHUTDOWN event for a vCPU.
Can you elaborate on it? Do you mean that userspace would inject a
synthesized SHUTDOWN into the guest? If so, I have no idea how it would work.
> In addition, to
> avoid breaking legacy userspace, the NOTIFY VM-exit should be opt-in.
Yes, it is already designed as opt-in: the feature is off by default.
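
For reference, the userspace policy from the quoted proposal (re-enter the
guest when context_invalid is false, stop running it otherwise) could look
roughly like the sketch below. This is only an illustration under stated
assumptions: the names notify_action, notify_context_invalid() and
handle_notify_exit() are hypothetical and not an existing KVM API, and the
position of the context-invalid flag in the exit qualification (bit 0) is
assumed here rather than taken from this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: bit 0 of the exit qualification flags an invalid VM context. */
#define NOTIFY_CONTEXT_INVALID (UINT64_C(1) << 0)

/* Hypothetical actions for the L0 userspace VMM; not a KVM interface. */
enum notify_action {
    NOTIFY_RESUME_GUEST, /* context still valid: re-enter the guest */
    NOTIFY_STOP_GUEST,   /* context invalid: stop running the guest */
};

/* Test the (assumed) context-invalid bit of the exit qualification. */
static bool notify_context_invalid(uint64_t exit_qualification)
{
    return (exit_qualification & NOTIFY_CONTEXT_INVALID) != 0;
}

/* Decide what to do with a notify VM exit that was forwarded to userspace. */
static enum notify_action handle_notify_exit(uint64_t exit_qualification)
{
    if (notify_context_invalid(exit_qualification))
        return NOTIFY_STOP_GUEST;
    return NOTIFY_RESUME_GUEST;
}
```

A VMM would call handle_notify_exit() with the exit qualification it received
and either resume the vCPU or stop it. Whether the stop path should kill the
whole L1 or only report the event is exactly the open question in this thread,
so the sketch leaves that decision to the caller.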