From: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
To: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
Tianyu Lan <ltykernel@gmail.com>,
"Michael Kelley (LINUX)" <mikelley@microsoft.com>
Subject: Re: "KVM: x86/mmu: Overhaul TDP MMU zapping and flushing" breaks SVM on Hyper-V
Date: Mon, 13 Feb 2023 13:44:46 +0100
Message-ID: <9a046de1-8085-3df4-94cd-39bb893c8c9a@linux.microsoft.com>
In-Reply-To: <Y+aQyFJt9Tn2PJnC@google.com>

On 10/02/2023 19:45, Sean Christopherson wrote:
> On Fri, Feb 10, 2023, Jeremi Piotrowski wrote:
>> Hi Paolo/Sean,
>>
>> We've noticed that changes introduced in "KVM: x86/mmu: Overhaul TDP MMU zapping and flushing"
>> conflict with a nested Hyper-V enlightenment that is always enabled on AMD CPUs
>> (HV_X64_NESTED_ENLIGHTENED_TLB). The affected scenario is L0 Hyper-V + L1 KVM on AMD.
>>
>> L2 VMs fail to boot due to stale data being seen on the L1/L2 side; it looks
>> like the NPT is not in sync with L0. I can reproduce this on any kernel >= 5.18.
>> The easiest way is to launch qemu in a loop with debug OVMF: you can observe
>> various #GP faults, assert failures, or the guest just suddenly dies. You can try it
>> for yourself in Azure by launching an Ubuntu 22.10 image on an AMD SKU with nested
>> virtualization (Da_v5).
>>
>> While investigating, I found that any one of three things allows L2 guests to boot again:
>> * force tdp_mmu=N when loading kvm
>> * recompile L1 kernel to force disable HV_X64_NESTED_ENLIGHTENED_TLB
>> * revert both of these commits (found through bisecting):
>> bb95dfb9e2dfbe6b3f5eb5e8a20e0259dadbe906 "KVM: x86/mmu: Defer TLB flush to caller when freeing TDP MMU shadow pages"
>> efd995dae5eba57c5d28d6886a85298b390a4f07 "KVM: x86/mmu: Zap defunct roots via asynchronous worker"
>>
>> I'll paste our understanding of what is happening (thanks Tianyu):
>> """
>> Hyper-V provides the HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE
>> and HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST hypercalls so that the L1
>> hypervisor can notify Hyper-V after it changes the L2 GPA <-> L1 GPA address
>> translation tables (EPT on Intel, NPT on AMD). This may let Hyper-V avoid
>> marking the L1 hypervisor's whole address translation tables as write-protected,
>> and so avoid the vmexits triggered when the L1 hypervisor changes those tables.
>>
>> The commits listed above defer these two hypercalls when the L1 hypervisor's
>> address translation table changes. Because of the delay, Hyper-V can't sync/shadow
>> the L1 address translation table right away, and this may cause a mismatch between
>> the shadow page table in Hyper-V and the L1 address translation table. IIRC, the KVM
>> side always write-protects the translation tables it shadows, and so doesn't hit
>> this issue with these commits.
>> """
>>
>> Let me know if either of you have any ideas on how to approach fixing this.
>> I'm not familiar enough with TDP MMU code to be able to contribute a fix directly
>> but I'm happy to help in any way I can.
>
> As a hopefully quick-and-easy first step, can you try running KVM built from:
>
> https://github.com/kvm-x86/linux/tree/mmu
>
> specifically to get the fixes for KVM's usage of range-based TLB flushes:
>
> https://lore.kernel.org/all/cover.1665214747.git.houwenlong.hwl@antgroup.com

Just built a kernel from that tree, and it displays the same behavior. The problem
is not that the addresses are wrong, but that the flushes are now issued at the wrong
time, at least for what "enlightened NPT TLB flush" requires.
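
To illustrate the ordering issue, here is a minimal user-space sketch. It is
not kernel code and not how Hyper-V is implemented: the struct, the function
names, and the "L0 copies the table on flush" behavior are all hypothetical
simplifications of the description quoted above. It only shows that when the
flush notification is deferred, L0's view can lag behind L1's table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NPT_ENTRIES 4

/* Hypothetical model: L1's NPT plus the shadow copy L0 runs L2 on. */
struct nested_mmu {
	uint64_t l1_npt[NPT_ENTRIES];    /* table edited by L1 KVM */
	uint64_t l0_shadow[NPT_ENTRIES]; /* L0's view, refreshed only on flush */
};

/* Stand-in for the flush hypercall: L0 resyncs its shadow from L1's table. */
void flush_guest_physical_address_space(struct nested_mmu *m)
{
	for (size_t i = 0; i < NPT_ENTRIES; i++)
		m->l0_shadow[i] = m->l1_npt[i];
}

/* L1 zaps an entry; flush_now selects immediate vs. deferred notification. */
void l1_zap_entry(struct nested_mmu *m, size_t idx, bool flush_now)
{
	assert(idx < NPT_ENTRIES);
	m->l1_npt[idx] = 0;
	if (flush_now)
		flush_guest_physical_address_space(m);
}

/* What L2 would get on its next access: L0's shadow, not L1's table. */
uint64_t l2_translate(const struct nested_mmu *m, size_t idx)
{
	assert(idx < NPT_ENTRIES);
	return m->l0_shadow[idx];
}

/* True if L2 can still use a translation that L1 has already changed. */
bool l2_sees_stale_entry(const struct nested_mmu *m, size_t idx)
{
	return l2_translate(m, idx) != m->l1_npt[idx];
}
```

In this model, calling l1_zap_entry() with flush_now == false leaves
l2_sees_stale_entry() true until flush_guest_physical_address_space() runs,
which is the window I mean by "wrong time".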

Jeremi