public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Mingwei Zhang <mizhang@google.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] KVM: x86/mmu: fix potential races when walking host page table
Date: Fri, 29 Apr 2022 15:47:23 +0000	[thread overview]
Message-ID: <YmwIi3bXr/1yhYV/@google.com> (raw)
In-Reply-To: <67222fe0-7bf0-ec7a-0791-a4d48391a15e@redhat.com>

On Fri, Apr 29, 2022, Paolo Bonzini wrote:
> On 4/29/22 16:35, Sean Christopherson wrote:
> > On Fri, Apr 29, 2022, Paolo Bonzini wrote:
> > > > +out:
> > > > +	local_irq_restore(flags);
> > > > +	return level;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(kvm_lookup_address_level_in_mm);
> > > 
> > > Exporting is not needed.
> > > 
> > > Thanks for writing the walk code though.  I'll adapt it and integrate the
> > > patch.
> > 
> > But why are we fixing this only in KVM?  I liked the idea of stealing perf's
> > implementation because it was a seemlingly perfect fit and wouldn't introduce
> > new code (ignoring wrappers, etc...).
> > 
> > We _know_ that at least one subsystem is misusing lookup_address_in_pgd() and
> > given that its wrappers are exported, I highly doubt KVM is the only offender.
> > It really feels like we're passing the buck here by burying the fix in KVM.
> 
> There are two ways to do it:
> 
> * having a generic function in mm/.  The main issue there is the lack of a
> PG_LEVEL_{P4D,PUD,PMD,PTE} enum at the mm/ level.  We could use (ctz(x) -
> 12) / 9 to go from size to level, but it's ugly and there could be
> architectures with heterogeneous page table sizes.
> 
> * having a generic function in arch/x86/.  In this case KVM seems to be the
> odd one that doesn't need the PTE.  For example vc_slow_virt_to_phys needs
> the PTE, and needs the size rather than the "level" per se.
> 
> So for now I punted, while keeping open the door for moving code from
> arch/x86/kvm/ to mm/ if anyone else (even other KVM ports) need the same
> logic.

Ugh.  I was going to say that KVM is the only in-tree user that's subtly broken,
but then I saw vc_slow_virt_to_phys()...  So there's at least one other use case
for walking user addresses and being able to tolerate a not-present mapping.

There are no other users of lookup_address_in_mm(), and other than SEV-ES's
dastardly use of lookup_address_in_pgd(), pgd can only ever come from init_mm or
efi_mm, i.e. can't work with user address anyways.

If we go the KVM-only route, can send the below revert along with it?  The least
we can do is not give others an easy way to screw up.

Until I saw the #VC crud, I was hoping we could also explicitly prevent using
lookup_address_in_pgd() with user addresses.  If/when #VC is fixed, we can/should
add this:

@@ -592,6 +592,15 @@ pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address,

 	*level = PG_LEVEL_NONE;

+	/*
+	 * The below walk does not guard against user page tables being torn
+	 * down, attempting to walk a user address is dangerous and likely to
+	 * explode sooner or later.  This helper is intended only for use with
+	 * kernel-only mm_structs, e.g. init_mm and efi_mm.
+	 */
+	if (WARN_ON_ONCE(address < TASK_SIZE_MAX))
+		return NULL;
+


From: Sean Christopherson <seanjc@google.com>
Date: Fri, 29 Apr 2022 07:57:53 -0700
Subject: [PATCH] Revert "x86/mm: Introduce lookup_address_in_mm()"

Drop lookup_address_in_mm() now that KVM is providing it's own variant
of lookup_address_in_pgd() that is safe for use with user addresses, e.g.
guards against page tables being torn down.  A variant that provides a
non-init mm is inherently dangerous and flawed, as the only reason to use
an mm other than init_mm is to walk a userspace mapping, and
lookup_address_in_pgd() does not play nice with userspace mappings, e.g.
doesn't disable IRQs to block TLB shootdowns and doesn't use READ_ONCE()
to ensure an upper level entry isn't converted to a huge page between
checking the PAGE_SIZE bit and grabbing the address of the next level
down.

This reverts commit 13c72c060f1ba6f4eddd7b1c4f52a8aded43d6d9.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/pgtable_types.h |  4 ----
 arch/x86/mm/pat/set_memory.c         | 11 -----------
 2 files changed, 15 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 40497a9020c6..407084d9fd99 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -559,10 +559,6 @@ static inline void update_page_count(int level, unsigned long pages) { }
 extern pte_t *lookup_address(unsigned long address, unsigned int *level);
 extern pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address,
 				    unsigned int *level);
-
-struct mm_struct;
-extern pte_t *lookup_address_in_mm(struct mm_struct *mm, unsigned long address,
-				   unsigned int *level);
 extern pmd_t *lookup_pmd_address(unsigned long address);
 extern phys_addr_t slow_virt_to_phys(void *__address);
 extern int __init kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn,
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index abf5ed76e4b7..0656db33574d 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -638,17 +638,6 @@ pte_t *lookup_address(unsigned long address, unsigned int *level)
 }
 EXPORT_SYMBOL_GPL(lookup_address);

-/*
- * Lookup the page table entry for a virtual address in a given mm. Return a
- * pointer to the entry and the level of the mapping.
- */
-pte_t *lookup_address_in_mm(struct mm_struct *mm, unsigned long address,
-			    unsigned int *level)
-{
-	return lookup_address_in_pgd(pgd_offset(mm, address), address, level);
-}
-EXPORT_SYMBOL_GPL(lookup_address_in_mm);
-
 static pte_t *_lookup_address_cpa(struct cpa_data *cpa, unsigned long address,
 				  unsigned int *level)
 {

base-commit: 6f363ed2fa4c24c400acc29b659c96e4dc7930e8
--




  reply	other threads:[~2022-04-29 15:47 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-29  3:17 [PATCH] KVM: x86/mmu: fix potential races when walking host page table Mingwei Zhang
2022-04-29  9:10 ` Paolo Bonzini
2022-04-29 14:35   ` Sean Christopherson
2022-04-29 14:45     ` Paolo Bonzini
2022-04-29 15:47       ` Sean Christopherson [this message]
2022-04-29 15:33   ` Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YmwIi3bXr/1yhYV/@google.com \
    --to=seanjc@google.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mizhang@google.com \
    --cc=pbonzini@redhat.com \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox