From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: Sean Christopherson <seanjc@google.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [FYI PATCH] Revert "KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range()"
Date: Fri, 25 Mar 2022 16:03:06 +0100
Message-ID: <87r16qnkgl.fsf@redhat.com>
In-Reply-To: <Yj0FYSC2sT4k/ELl@google.com>
Sean Christopherson <seanjc@google.com> writes:
...
>> So I went back to "KVM: x86/mmu: Zap only TDP MMU leafs in
>> kvm_zap_gfn_range()" and confirmed that with the patch in place Hyper-V
>> always crashes, sooner or later. With the patch reverted (as well as
>> with current 'kvm/queue') it boots.
>
> Actually, since this is apparently specific to kvm_zap_gfn_range(), can you add
> printk "tracing" in update_mtrr(), kvm_post_set_cr0(), and __kvm_request_apicv_update()
> to see what is actually triggering zaps? Capturing the start and end GFNs would be very
> helpful for the MTRR case.
>
> The APICv update seems unlikely to affect only Hyper-V guests, though there is the auto
> EOI crud. And the other two only come into play with non-coherent DMA. In other words,
> figuring out exactly what sequence leads to failure should be straightforward.
The tricky part here is that Hyper-V doesn't crash immediately: the
crash is different every time (if you look at the BSOD) and happens at
different times. The crashes mention various things, like trying to
execute non-executable memory, ...
I've added the tracing you suggested:
- __kvm_request_apicv_update() happens only once, at the very beginning.
- update_mtrr() never actually reaches kvm_zap_gfn_range().
- kvm_post_set_cr0() calls happen in early boot, but the crash happens
much, much later. E.g.:
...
qemu-system-x86-117525 [019] ..... 4738.682954: kvm_post_set_cr0: vCPU 12 10 11
qemu-system-x86-117525 [019] ..... 4738.682997: kvm_post_set_cr0: vCPU 12 11 80000011
qemu-system-x86-117525 [019] ..... 4738.683053: kvm_post_set_cr0: vCPU 12 80000011 c0000011
qemu-system-x86-117525 [019] ..... 4738.683059: kvm_post_set_cr0: vCPU 12 c0000011 80010031
qemu-system-x86-117526 [005] ..... 4738.812107: kvm_post_set_cr0: vCPU 13 10 11
qemu-system-x86-117526 [005] ..... 4738.812148: kvm_post_set_cr0: vCPU 13 11 80000011
qemu-system-x86-117526 [005] ..... 4738.812198: kvm_post_set_cr0: vCPU 13 80000011 c0000011
qemu-system-x86-117526 [005] ..... 4738.812205: kvm_post_set_cr0: vCPU 13 c0000011 80010031
qemu-system-x86-117527 [003] ..... 4738.941004: kvm_post_set_cr0: vCPU 14 10 11
qemu-system-x86-117527 [003] ..... 4738.941107: kvm_post_set_cr0: vCPU 14 11 80000011
qemu-system-x86-117527 [003] ..... 4738.941218: kvm_post_set_cr0: vCPU 14 80000011 c0000011
qemu-system-x86-117527 [003] ..... 4738.941235: kvm_post_set_cr0: vCPU 14 c0000011 80010031
qemu-system-x86-117528 [035] ..... 4739.070338: kvm_post_set_cr0: vCPU 15 10 11
qemu-system-x86-117528 [035] ..... 4739.070428: kvm_post_set_cr0: vCPU 15 11 80000011
qemu-system-x86-117528 [035] ..... 4739.070539: kvm_post_set_cr0: vCPU 15 80000011 c0000011
qemu-system-x86-117528 [035] ..... 4739.070557: kvm_post_set_cr0: vCPU 15 c0000011 80010031
##### CPU 8 buffer started ####
qemu-system-x86-117528 [008] ..... 4760.099532: kvm_hv_set_msr_pw: 15
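To make the CR0 values in the trace easier to read, they can be decoded with a small standalone helper. This is only an illustrative sketch and not part of the debug patches: cr0_decode() and cr0_changed() are hypothetical names, and the table lists the architectural CR0 bits (the same ones the kernel names X86_CR0_*).

```c
#include <stdio.h>
#include <string.h>

/* Architectural CR0 bits, lowest bit first. */
struct cr0_bit {
	unsigned long long mask;
	const char *name;
};

static const struct cr0_bit cr0_bits[] = {
	{ 1ULL << 0,  "PE" }, { 1ULL << 1,  "MP" }, { 1ULL << 2,  "EM" },
	{ 1ULL << 3,  "TS" }, { 1ULL << 4,  "ET" }, { 1ULL << 5,  "NE" },
	{ 1ULL << 16, "WP" }, { 1ULL << 18, "AM" }, { 1ULL << 29, "NW" },
	{ 1ULL << 30, "CD" }, { 1ULL << 31, "PG" },
};

/*
 * Hypothetical helper: write the names of the bits set in cr0 into buf,
 * separated by '|'. len must be at least 1. Returns buf.
 */
static char *cr0_decode(unsigned long long cr0, char *buf, size_t len)
{
	size_t i;

	buf[0] = '\0';
	for (i = 0; i < sizeof(cr0_bits) / sizeof(cr0_bits[0]); i++) {
		if (!(cr0 & cr0_bits[i].mask))
			continue;
		if (buf[0])
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, cr0_bits[i].name, len - strlen(buf) - 1);
	}
	return buf;
}

/* Hypothetical helper: names of the bits that differ between two values. */
static char *cr0_changed(unsigned long long old_cr0, unsigned long long cr0,
			 char *buf, size_t len)
{
	return cr0_decode(old_cr0 ^ cr0, buf, len);
}
```

With this, the last transition logged for each vCPU (c0000011 to 80010031) decodes as a change in NE, WP and CD: the guest clears CR0.CD and sets NE and WP.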
The debug patch for kvm_post_set_cr0() is:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4fa4d8269e5b..db7c5a05e574 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -870,6 +870,8 @@ EXPORT_SYMBOL_GPL(load_pdptrs);
 
 void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned long cr0)
 {
+	trace_printk("vCPU %d %lx %lx\n", vcpu->vcpu_id, old_cr0, cr0);
+
 	if ((cr0 ^ old_cr0) & X86_CR0_PG) {
 		kvm_clear_async_pf_completion_queue(vcpu);
 		kvm_async_pf_hash_reset(vcpu);
The kvm_hv_set_msr_pw() call happens when Hyper-V writes to
HV_X64_MSR_CRASH_CTL (the 'hv-crash' QEMU flag is needed to enable the
feature). The debug patch is:
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index a32f54ab84a2..59a72f6ced99 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1391,6 +1391,7 @@ static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data,
 
 			/* Send notification about crash to user space */
 			kvm_make_request(KVM_REQ_HV_CRASH, vcpu);
+			trace_printk("%d\n", vcpu->vcpu_id);
 		}
 		break;
 	case HV_X64_MSR_RESET:
So it's 20 seconds (!) between the last kvm_post_set_cr0() call and the
crash. My (disappointing) conclusion is: the problem can be anywhere and
Hyper-V detects it much much later.
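The gap can be read straight off the trace timestamps. A minimal sketch, assuming the default ftrace line layout seen in the trace (task-pid, [cpu], flags, then seconds.microseconds followed by a colon); trace_timestamp_us() and trace_gap_us() are hypothetical helpers written for this illustration, not kernel or trace-cmd APIs.

```c
#include <stdio.h>

/*
 * Hypothetical helper: extract the timestamp of one ftrace line in
 * microseconds. Assumes the layout "task-pid [cpu] flags sec.usec: ..."
 * with a six-digit fractional part.
 * Returns 0 on success, -1 if the line does not match that layout.
 */
static int trace_timestamp_us(const char *line, long long *us)
{
	long long sec, usec;

	if (sscanf(line, "%*s [%*d] %*s %lld.%6lld", &sec, &usec) != 2)
		return -1;
	*us = sec * 1000000LL + usec;
	return 0;
}

/*
 * Hypothetical helper: gap in microseconds between two trace lines,
 * or -1 if either line cannot be parsed.
 */
static long long trace_gap_us(const char *first, const char *second)
{
	long long a, b;

	if (trace_timestamp_us(first, &a) || trace_timestamp_us(second, &b))
		return -1;
	return b - a;
}
```

Fed the last kvm_post_set_cr0 line (4739.070557) and the kvm_hv_set_msr_pw line (4760.099532), trace_gap_us() returns 21028975, i.e. roughly 21 seconds.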
--
Vitaly