From: Sean Christopherson <seanjc@google.com>
To: Lai Jiangshan <laijs@linux.alibaba.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>,
LKML <linux-kernel@vger.kernel.org>,
kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
Vitaly Kuznetsov <vkuznets@redhat.com>,
Wanpeng Li <wanpengli@tencent.com>,
Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
X86 ML <x86@kernel.org>, "H. Peter Anvin" <hpa@zytor.com>
Subject: Re: [RFC PATCH 5/6] KVM: X86: Alloc pae_root shadow page
Date: Thu, 6 Jan 2022 19:41:47 +0000
Message-ID: <YddF+6eX7ycAsZLr@google.com>
In-Reply-To: <dc8f2508-35ac-0dee-2465-4b5a8e3879ca@linux.alibaba.com>
On Thu, Jan 06, 2022, Lai Jiangshan wrote:
>
>
> On 2022/1/6 00:45, Sean Christopherson wrote:
> > On Wed, Jan 05, 2022, Lai Jiangshan wrote:
> > > On Wed, Jan 5, 2022 at 5:54 AM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > > >
> > > > > default_pae_pdpte is needed because the cpu expect PAE pdptes are
> > > > > present when VMenter.
> > > >
> > > > That's incorrect. Neither Intel nor AMD require PDPTEs to be present. Not present
> > > > is perfectly ok, present with reserved bits is what's not allowed.
> > > >
> > > > Intel SDM:
> > > > A VM entry that checks the validity of the PDPTEs uses the same checks that are
> > > > used when CR3 is loaded with MOV to CR3 when PAE paging is in use[7]. If MOV to CR3
> > > > would cause a general-protection exception due to the PDPTEs that would be loaded
> > > > (e.g., because a reserved bit is set), the VM entry fails.
> > > >
> > > > 7. This implies that (1) bits 11:9 in each PDPTE are ignored; and (2) if bit 0
> > > > (present) is clear in one of the PDPTEs, bits 63:1 of that PDPTE are ignored.
> > >
> > > But in practice, the VM entry fails if the present bit is not set in the
> > > PDPTE for the linear address being accessed (when EPT enabled at least). The
> > > host kvm complains and dumps the vmcs state.
> >
> > That doesn't make any sense. If EPT is enabled, KVM should never use a pae_root.
> > The vmcs.GUEST_PDPTRn fields are in play, but those shouldn't derive from KVM's
> > shadow page tables.
>
> Oh, I wrote the negative what I want to say again when I try to emphasis
> something after I wrote a sentence and modified it several times.
>
> I wanted to mean "EPT not enabled" when vmx.

Heh, that makes a lot more sense.

> The VM entry fails when the guest is in very early stage when booting which
> might be still in real mode.
>
> VMEXIT: intr_info=00000000 errorcode=0000000 ilen=00000000
> reason=80000021 qualification=0000000000000002

Yep, that's the signature for an illegal PDPTE at VM-Enter. But as noted above,
a not-present PDPTE is perfectly legal; VM-Enter should fail if and only if a
PDPTE is present and has reserved bits set.

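For reference, a minimal sketch of how the quoted reason/qualification pair
decodes. The constant names are local to this sketch (they are not copied from
any header), and the values reflect my reading of the SDM: bit 31 of the exit
reason flags a failed VM-Enter, basic reason 33 is "invalid guest state", and
qualification 2 means the failure came from loading the PDPTEs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants, named for this sketch only. */
#define SKETCH_FAILED_VMENTRY     (1u << 31) /* exit reason bit 31 */
#define SKETCH_BASIC_REASON_MASK  0xffffu    /* exit reason bits 15:0 */
#define SKETCH_INVALID_GUEST      33u        /* basic exit reason 0x21 */
#define SKETCH_ENTRY_FAIL_PDPTE   2u         /* exit qualification value */

/* Returns true if the pair describes a VM-Enter that failed on PDPTE load. */
static bool is_pdpte_entry_failure(uint32_t reason, uint64_t qualification)
{
	bool failed_entry = (reason & SKETCH_FAILED_VMENTRY) != 0;
	uint32_t basic = reason & SKETCH_BASIC_REASON_MASK;

	return failed_entry &&
	       basic == SKETCH_INVALID_GUEST &&
	       qualification == SKETCH_ENTRY_FAIL_PDPTE;
}
```

Feeding it reason=0x80000021 and qualification=2 from the dump above matches
all three conditions.
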
> IDTVectoring: info=00000000 errorcode=00000000
>
> >
> > And I doubt there is a VMX ucode bug at play, as KVM currently uses '0' in its
> > shadow page tables for not-present PDPTEs.
> >
> > If you can post/provide the patches that lead to VM-Fail, I'd be happy to help
> > debug.
>
> If you can try this patchset, you can just set the default_pae_pdpte to 0 to test
> it.

I can't reproduce the failure with this on top of your series + kvm/queue (commit
cc0e35f9c2d4 ("KVM: SVM: Nullify vcpu_(un)blocking() hooks if AVIC is disabled")).

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f6f7caf76b70..b7170a840330 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -728,22 +728,11 @@ static u64 default_pae_pdpte;
 
 static void free_default_pae_pdpte(void)
 {
-	free_page((unsigned long)__va(default_pae_pdpte & PAGE_MASK));
 	default_pae_pdpte = 0;
 }
 
 static int alloc_default_pae_pdpte(void)
 {
-	unsigned long p = __get_free_page(GFP_KERNEL | __GFP_ZERO);
-
-	if (!p)
-		return -ENOMEM;
-	default_pae_pdpte = __pa(p) | PT_PRESENT_MASK | shadow_me_mask;
-	if (WARN_ON(is_shadow_present_pte(default_pae_pdpte) ||
-		    is_mmio_spte(default_pae_pdpte))) {
-		free_default_pae_pdpte();
-		return -EINVAL;
-	}
 	return 0;
 }
 

Are you using a different base and/or running with other changes?

To aid debugging, the below patch will dump the PDPTEs from the current MMU root
on failure (I'll also submit this as a formal patch). On failure, I would expect
that at least one of the PDPTEs will be present with reserved bits set.

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index fe06b02994e6..c13f37ef1bbc 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5773,11 +5773,19 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
 	pr_err("CR4: actual=0x%016lx, shadow=0x%016lx, gh_mask=%016lx\n",
 	       cr4, vmcs_readl(CR4_READ_SHADOW), vmcs_readl(CR4_GUEST_HOST_MASK));
 	pr_err("CR3 = 0x%016lx\n", vmcs_readl(GUEST_CR3));
-	if (cpu_has_vmx_ept()) {
+	if (enable_ept) {
 		pr_err("PDPTR0 = 0x%016llx PDPTR1 = 0x%016llx\n",
 		       vmcs_read64(GUEST_PDPTR0), vmcs_read64(GUEST_PDPTR1));
 		pr_err("PDPTR2 = 0x%016llx PDPTR3 = 0x%016llx\n",
 		       vmcs_read64(GUEST_PDPTR2), vmcs_read64(GUEST_PDPTR3));
+	} else if (vcpu->arch.mmu->shadow_root_level == PT32E_ROOT_LEVEL &&
+		   VALID_PAGE(vcpu->arch.mmu->root_hpa)) {
+		u64 *pdpte = __va(vcpu->arch.mmu->root_hpa);
+
+		pr_err("PDPTE0 = 0x%016llx PDPTE1 = 0x%016llx\n",
+		       pdpte[0], pdpte[1]);
+		pr_err("PDPTE2 = 0x%016llx PDPTE3 = 0x%016llx\n",
+		       pdpte[2], pdpte[3]);
 	}
 	pr_err("RSP = 0x%016lx RIP = 0x%016lx\n",
 	       vmcs_readl(GUEST_RSP), vmcs_readl(GUEST_RIP));
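
Once the four PDPTEs are in the log, the rule stated earlier (not-present is
legal, present with reserved bits is not) can be applied by hand. A rough
userspace sketch of that check follows. It is not KVM code: the helper names
are made up for illustration, MAXPHYADDR is passed in as a parameter, and the
reserved-bit layout (bits 2:1, bits 8:5, bits 63:MAXPHYADDR, with bits 11:9
ignored) is my reading of the PAE PDPTE format in the SDM.

```c
#include <stdint.h>

#define SKETCH_PDPTE_PRESENT (1ULL << 0)

/* Mask with bits lo..hi (inclusive) set; requires 0 <= lo <= hi <= 63. */
static uint64_t sketch_bits(unsigned int lo, unsigned int hi)
{
	return (~0ULL >> (63 - hi)) & (~0ULL << lo);
}

/* Reserved bits of a PAE PDPTE for a given MAXPHYADDR (assumed < 64). */
static uint64_t sketch_pdpte_rsvd_mask(unsigned int maxphyaddr)
{
	return sketch_bits(1, 2) | sketch_bits(5, 8) |
	       sketch_bits(maxphyaddr, 63);
}

/*
 * Returns the index of the first PDPTE that is present and has a reserved
 * bit set, or -1 if all four entries would pass the check.
 */
static int sketch_first_illegal_pdpte(const uint64_t pdpte[4],
				      unsigned int maxphyaddr)
{
	uint64_t rsvd = sketch_pdpte_rsvd_mask(maxphyaddr);
	int i;

	for (i = 0; i < 4; i++) {
		/* Not-present entries are legal; bits 63:1 are ignored. */
		if (!(pdpte[i] & SKETCH_PDPTE_PRESENT))
			continue;
		if (pdpte[i] & rsvd)
			return i;
	}
	return -1;
}
```

With this, an all-zero set of PDPTEs (what KVM currently uses for not-present
entries) is reported as legal.
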
> If you can't try this patchset, the mmu->pae_root can be possible to be modified
> to test it.
>
> I guess the vmx fails to translate %rip when VMentry in this case.

No, the CPU doesn't translate RIP at VM-Enter; vmcs.GUEST_RIP is only checked for
legality, e.g. that it's canonical. Translating RIP through page tables is firmly
a post-VM-Enter code fetch action.
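
To illustrate the kind of legality check meant here, below is a small sketch of
a canonical-address test. It is only an illustration: the function name is
invented, the number of implemented linear-address bits is a parameter (48 is
assumed in the examples), and the exact RIP checks performed at VM-Enter depend
on the guest's mode, so the SDM remains the reference. The point is that the
test looks only at the address bits and never touches page tables.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * An address is canonical if bits 63:(vaddr_bits - 1) are all zeros or all
 * ones. Requires 1 <= vaddr_bits <= 64.
 */
static bool sketch_is_canonical(uint64_t addr, unsigned int vaddr_bits)
{
	uint64_t upper = addr >> (vaddr_bits - 1);
	uint64_t all_ones = ~0ULL >> (vaddr_bits - 1);

	return upper == 0 || upper == all_ones;
}
```

For example, with 48 linear-address bits, 0xffff800000000000 passes while
0x0000800000000000 does not.
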