public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Reima Ishii <ishiir@g.ecc.u-tokyo.ac.jp>
Cc: ishiir@g.ecc.u-tokyo.ac.jp, shina@ecc.u-tokyo.ac.jp,
	Paolo Bonzini <pbonzini@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] KVM: nVMX: Prevent vmlaunch with EPTP pointing outside assigned memory area
Date: Wed, 28 Jun 2023 08:37:45 -0700	[thread overview]
Message-ID: <ZJxTTZzZnfbyMVIH@google.com> (raw)
In-Reply-To: <20230628071217.71126-1-ishiir@g.ecc.u-tokyo.ac.jp>

On Wed, Jun 28, 2023, Reima Ishii wrote:
> In nested virtualization, if L1 sets an EPTP in VMCS12 that points
> beyond the assigned memory area and initiates a vmlaunch to L2, the
> existing KVM code handles the vmlaunch, and passes the VMCS
> consistency check. However, due to EPTP pointing outside accessible
> memory from the vCPU in mmu_check_root(), it attempts to trigger a
> triple fault.
> 
> As a result, the nested_vmx_triple_fault() and nested_vmx_vmexit() are
> called before the actual vmlaunch to L2 occurs. Despite the vmlaunch
> not actually being executed in L2, KVM attempts a vmexit to L1,
> resulting in a warning in nested_vmx_vmexit() due to the
> nested_run_pending flag.
> 
> This commit resolves the issue by modifying the nested_vmx_check_eptp()
> to return false when EPTP points outside the assigned memory area.
> This effectively prevents a vmlaunch with such an EPTP, thus avoiding
> the unnecessary warning.
> 
> Signed-off-by: Reima Ishii <ishiir@g.ecc.u-tokyo.ac.jp>
> ---
>  arch/x86/kvm/vmx/nested.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index e35cf0bd0df9..721f98a5dc24 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -2727,6 +2727,10 @@ static bool nested_vmx_check_eptp(struct kvm_vcpu *vcpu, u64 new_eptp)
>  			return false;
>  	}
>  
> +	/* Check if EPTP points to an assigned memory area */
> +	if (!kvm_vcpu_is_visible_gfn(vcpu, new_eptp >> PAGE_SHIFT))
> +		return false;

This is wrong, an EPTP that has a legal GPA but points at non-existent memory
does not trigger VM-Fail.  But the existing behavior isn't correct either
(obviously), because even though a bad EPT will *probably* lead to shutdown,
(a) it's not guaranteed and (b) the CPU won't read the underlying memory until
after VM-Enter succeeds.  E.g. if L1 runs L2 with a VMX preemption timer value
of '0', then architecturally the preemption timer VM-Exit is guaranteed to occur
before the CPU executes any instruction, i.e. before the CPU needs to translate
a GPA to a HPA (so long as there are no injected events with higher priority than
the preemption timer).

Furthermore, KVM applies this logic *only* to the root (and PDPTRs).  E.g. if
a lower level page table points at a KVM-internal memslot, KVM will happily read
the backing memory and use it as the guest PTE value.

The behavior is quite ancient, and unsurprisingly the changelog provides no real
justification for the behavior.

  commit 8986ecc0ef58c96eec48d8502c048f3ab67fd8e2
  Author: Marcelo Tosatti <mtosatti@redhat.com>
  Date:   Tue May 12 18:55:45 2009 -0300

    KVM: x86: check for cr3 validity in mmu_alloc_roots
    
    Verify the cr3 address stored in vcpu->arch.cr3 points to an existant
    memslot. If not, inject a triple fault.

I'm struggling to think of anything that will break (in KVM) if we simply drop the
check, e.g. L1 can already read and write to KVM-internal memslots, so it's not
like the data is sensitive.  The guest is going to have weird behavior, especially
for the APIC access page memslot, but that's more architecturally correct than
injecting a triple fault, e.g. KVM would effectively act like there's a "hidden"
MMIO device at the address.

So I think we should try this:

---
 arch/x86/kvm/mmu/mmu.c   | 19 -------------------
 include/linux/kvm_host.h |  1 -
 virt/kvm/kvm_main.c      | 13 ++-----------
 3 files changed, 2 insertions(+), 31 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 60397a1beda3..e305737edf84 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3671,19 +3671,6 @@ void kvm_mmu_free_guest_mode_roots(struct kvm *kvm, struct kvm_mmu *mmu)
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_free_guest_mode_roots);
 
-
-static int mmu_check_root(struct kvm_vcpu *vcpu, gfn_t root_gfn)
-{
-	int ret = 0;
-
-	if (!kvm_vcpu_is_visible_gfn(vcpu, root_gfn)) {
-		kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
-		ret = 1;
-	}
-
-	return ret;
-}
-
 static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, int quadrant,
 			    u8 level)
 {
@@ -3821,9 +3808,6 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	root_pgd = kvm_mmu_get_guest_pgd(vcpu, mmu);
 	root_gfn = root_pgd >> PAGE_SHIFT;
 
-	if (mmu_check_root(vcpu, root_gfn))
-		return 1;
-
 	/*
 	 * On SVM, reading PDPTRs might access guest memory, which might fault
 	 * and thus might sleep.  Grab the PDPTRs before acquiring mmu_lock.
@@ -3833,9 +3817,6 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 			pdptrs[i] = mmu->get_pdptr(vcpu, i);
 			if (!(pdptrs[i] & PT_PRESENT_MASK))
 				continue;
-
-			if (mmu_check_root(vcpu, pdptrs[i] >> PAGE_SHIFT))
-				return 1;
 		}
 	}
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d90331f16db1..4645c205a4d1 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1234,7 +1234,6 @@ int kvm_gfn_to_hva_cache_init(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
 int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len);
 struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);
 bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
-bool kvm_vcpu_is_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
 unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn);
 void mark_page_dirty_in_slot(struct kvm *kvm, const struct kvm_memory_slot *memslot, gfn_t gfn);
 void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 66c1447d3c7f..61ab6e367397 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1720,9 +1720,8 @@ static void kvm_invalidate_memslot(struct kvm *kvm,
 
 	/*
 	 * From this point no new shadow pages pointing to a deleted, or moved,
-	 * memslot will be created.  Validation of sp->gfn happens in:
-	 *	- gfn_to_hva (kvm_read_guest, gfn_to_pfn)
-	 *	- kvm_is_visible_gfn (mmu_check_root)
+	 * memslot will be created.  Arch MMUs must zap all relevant mappings,
+	 * and must not follow the address of an INVALID memslots.
 	 */
 	kvm_arch_flush_shadow_memslot(kvm, old);
 	kvm_arch_guest_memory_reclaimed(kvm);
@@ -2345,14 +2344,6 @@ bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn)
 }
 EXPORT_SYMBOL_GPL(kvm_is_visible_gfn);
 
-bool kvm_vcpu_is_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
-{
-	struct kvm_memory_slot *memslot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
-
-	return kvm_is_visible_memslot(memslot);
-}
-EXPORT_SYMBOL_GPL(kvm_vcpu_is_visible_gfn);
-
 unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
 	struct vm_area_struct *vma;

base-commit: 88bb466c9dec4f70d682cf38c685324e7b1b3d60
-- 


  reply	other threads:[~2023-06-28 15:38 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-28  7:12 [PATCH] KVM: nVMX: Prevent vmlaunch with EPTP pointing outside assigned memory area Reima Ishii
2023-06-28 15:37 ` Sean Christopherson [this message]
2023-06-29  7:21   ` Yan Zhao
2023-06-29 17:56     ` Sean Christopherson
2023-06-29 20:30       ` Sean Christopherson
2023-06-30  5:01         ` Yuan Yao
2023-06-30 15:37           ` Sean Christopherson
2023-07-03  2:20             ` Yuan Yao
2023-07-03  9:40         ` Yan Zhao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZJxTTZzZnfbyMVIH@google.com \
    --to=seanjc@google.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=ishiir@g.ecc.u-tokyo.ac.jp \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=shina@ecc.u-tokyo.ac.jp \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox