From: Paolo Bonzini <pbonzini@redhat.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Sean Christopherson <seanjc@google.com>,
	Alexander Bulekov <bkov@amazon.com>,
	Fred Griffoul <fgriffo@amazon.co.uk>,
	stable@vger.kernel.org
Subject: stable backports for "KVM: x86: Fix shadow paging use-after-free due to unexpected GFN"
Date: Tue, 5 May 2026 12:13:34 +0200
Message-ID: <62bedd23-a9d8-4c05-bf39-662c2d37b793@redhat.com>
In-Reply-To: <20260503201029.106481-1-pbonzini@redhat.com>

I have started sending out backports for stable kernels up to 6.1.  For 
5.10 and 5.15 I have identified the required patches and backported 
them, but I haven't tested them yet.

I'll get to testing and sending them out, but it will take a while; if 
anybody wants to help testing, I can provide my tentative patches.

This is the list for 5.15:

     27a59d57f073 KVM: x86/mmu: Use a bool for direct
     86938ab6925b KVM: x86/mmu: Stop passing "direct" to mmu_alloc_root()
     2e65e842c57d KVM: x86/mmu: Derive shadow MMU page role from parent
     7f49777550e5 KVM: x86/mmu: Always pass 0 for @quadrant when gptes
                  are 8 bytes
     0cd8dc739833 KVM: x86/mmu: pull call to drop_large_spte() into
                  __link_shadow_page()
     0cb2af2ea66a KVM: x86: Fix shadow paging use-after-free due to
                  unexpected GFN


and the longer one for 5.10:

     b37233c911cb KVM: x86/mmu: Capture 'mmu' in a local variable when
                  allocating roots
     ba0a194ffbfb KVM: x86/mmu: Allocate the lm_root before allocating
                  PAE roots
     748e52b9b736 KVM: x86/mmu: Allocate pae_root and lm_root pages in
                  dedicated helper
     6e6ec5848574 KVM: x86/mmu: Ensure MMU pages are available when
                  allocating roots
     27a59d57f073 KVM: x86/mmu: Use a bool for direct
     86938ab6925b KVM: x86/mmu: Stop passing "direct" to mmu_alloc_root()
     03fffc5493c8 KVM: x86/mmu: Refactor shadow walk in __direct_map() to
                  reduce indentation
     f81602958c11 KVM: X86: Fix missed remote tlb flush in
                  rmap_write_protect()
     65855ed8b034 KVM: X86: Synchronize the shadow pagetable before link
                  it
     2e65e842c57d KVM: x86/mmu: Derive shadow MMU page role from parent
     7f49777550e5 KVM: x86/mmu: Always pass 0 for @quadrant when gptes
                  are 8 bytes
     6e0918aec49a KVM: x86/mmu: Check PDPTRs before allocating PAE roots
     0cd8dc739833 KVM: x86/mmu: pull call to drop_large_spte() into
                  __link_shadow_page()
     0cb2af2ea66a KVM: x86: Fix shadow paging use-after-free due to
                  unexpected GFN

Paolo

On Sun, May 3, 2026 at 10:10 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> From: Sean Christopherson <seanjc@google.com>
>
> The shadow MMU computes GFNs for direct shadow pages using sp->gfn plus
> the SPTE index. This assumption breaks for shadow paging if the guest
> page tables are modified between VM entries (similar to commit
> aad885e77496, "KVM: x86/mmu: Drop/zap existing present SPTE even
> when creating an MMIO SPTE", 2026-03-27).  The flow is as follows:
>
> - a PDE is installed for a 2MB mapping, and a page in that area is
>   accessed.  KVM creates a kvm_mmu_page whose 512 SPTEs map 4KB pages;
>   FNAME(fetch) marks the kvm_mmu_page as direct-mapped because the
>   guest's mapping is a huge page (and thus contiguous).
>
> - the PDE mapping is changed from outside the guest.
>
> - the guest accesses another page in the same 2MB area.  KVM installs
>   a new leaf SPTE and rmap entry; the SPTE uses the "correct" GFN
>   (i.e. based on the new mapping, as changed in the previous step) but
>   that GFN is outside of the [sp->gfn, sp->gfn + 511] range; therefore
>   the rmap entry cannot be found and removed when the kvm_mmu_page
>   is zapped.
>
> - the memslot that covers the first 2MB mapping is deleted, and the
>   kvm_mmu_page for the now-invalid GPA is zapped.  However, rmap_remove()
>   only looks at the [sp->gfn, sp->gfn + 511] range established in step 1,
>   and fails to find the rmap entry that was recorded by step 3.
>
> - any operation that causes an rmap walk for the same page accessed
>   by step 3 then walks a stale rmap and dereferences a freed
>   kvm_mmu_page.  This includes dirty logging or MMU notifier
>   invalidations (e.g., from MADV_DONTNEED).
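>
> As a rough illustration of the sp->gfn-plus-index computation described
> above (a minimal userspace sketch: struct sp and direct_spte_gfn() are
> simplified stand-ins for kvm_mmu_page and kvm_mmu_page_get_gfn(), not
> the exact kernel code), the GFN reconstructed for a direct shadow page
> is derived solely from sp->gfn and the SPTE index, so the zap in step 4
> can never reach an rmap entry recorded outside the original range:
>
>     #include <stdint.h>
>     #include <stdio.h>
>
>     typedef uint64_t gfn_t;
>
>     /* simplified stand-in for struct kvm_mmu_page */
>     struct sp {
>             gfn_t gfn;   /* base GFN captured at creation (step 1) */
>             int level;   /* 1 == a page table of 4KB leaf SPTEs */
>     };
>
>     /* each paging level resolves 9 bits of GFN (512 entries/table) */
>     static gfn_t direct_spte_gfn(const struct sp *sp, int index)
>     {
>             return sp->gfn + ((gfn_t)index << ((sp->level - 1) * 9));
>     }
>
>     int main(void)
>     {
>             struct sp sp = { .gfn = 0x800, .level = 1 };
>
>             /*
>              * Every index maps into [0x800, 0x9ff].  If step 3 recorded
>              * an rmap entry for, say, GFN 0x1234 after the PDE changed,
>              * no computed GFN ever matches it, so that entry survives
>              * the zap and goes stale.
>              */
>             printf("index   5 -> gfn %#llx\n",
>                    (unsigned long long)direct_spte_gfn(&sp, 5));
>             printf("index 511 -> gfn %#llx\n",
>                    (unsigned long long)direct_spte_gfn(&sp, 511));
>             return 0;
>     }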
>
> The underlying issue is that KVM's walking of shadow PTEs assumes that
> if a SPTE is present when KVM wants to install a non-leaf SPTE, the
> existing kvm_mmu_page must be for the correct gfn, because the only way
> for the gfn to be wrong is if KVM messed up and failed to zap a SPTE...
> which shouldn't happen, but as the flow above shows *actually* does
> happen when the guest's page tables are modified.
>
> That bug dates back literally forever, as even the first version of KVM
> assumed that the GFN matched and walked into the "wrong" shadow page.
> However, that was only an imprecision until commit 2032a93d66fa ("KVM:
> MMU: Don't allocate gfns page for direct mmu pages") came along.
>
> Fix it by checking for a target gfn mismatch and zapping the existing
> SPTE.  That way the old SP and rmap entries are gone, KVM installs
> the rmap in the right location, and everyone is happy.
>
> Fixes: 2032a93d66fa ("KVM: MMU: Don't allocate gfns page for direct mmu pages")
> Fixes: 6aa8b732ca01 ("kvm: userspace interface")
> Reported-by: Alexander Bulekov <bkov@amazon.com>
> Reported-by: Fred Griffoul <fgriffo@amazon.co.uk>
> Cc: stable@vger.kernel.org
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 35 ++++++++++++++---------------------
>  1 file changed, 14 insertions(+), 21 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 24fbc9ea502a..892246204435 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -182,6 +182,8 @@ static struct kmem_cache *pte_list_desc_cache;
>  struct kmem_cache *mmu_page_header_cache;
>
>  static void mmu_spte_set(u64 *sptep, u64 spte);
> +static int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
> +                           u64 *spte, struct list_head *invalid_list);
>
>  struct kvm_mmu_role_regs {
>         const unsigned long cr0;
> @@ -1287,19 +1289,6 @@ static void drop_spte(struct kvm *kvm, u64 *sptep)
>                 rmap_remove(kvm, sptep);
>  }
>
> -static void drop_large_spte(struct kvm *kvm, u64 *sptep, bool flush)
> -{
> -       struct kvm_mmu_page *sp;
> -
> -       sp = sptep_to_sp(sptep);
> -       WARN_ON_ONCE(sp->role.level == PG_LEVEL_4K);
> -
> -       drop_spte(kvm, sptep);
> -
> -       if (flush)
> -               kvm_flush_remote_tlbs_sptep(kvm, sptep);
> -}
> -
>  /*
>   * Write-protect on the specified @sptep, @pt_protect indicates whether
>   * spte write-protection is caused by protecting shadow page table.
> @@ -2466,7 +2455,8 @@ static struct kvm_mmu_page *kvm_mmu_get_child_sp(struct kvm_vcpu *vcpu,
>  {
>         union kvm_mmu_page_role role;
>
> -       if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep))
> +       if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep) &&
> +           spte_to_child_sp(*sptep) && spte_to_child_sp(*sptep)->gfn == gfn)
>                 return ERR_PTR(-EEXIST);
>
>         role = kvm_mmu_child_role(sptep, direct, access);
> @@ -2544,13 +2534,16 @@ static void __link_shadow_page(struct kvm *kvm,
>
>         BUILD_BUG_ON(VMX_EPT_WRITABLE_MASK != PT_WRITABLE_MASK);
>
> -       /*
> -        * If an SPTE is present already, it must be a leaf and therefore
> -        * a large one.  Drop it, and flush the TLB if needed, before
> -        * installing sp.
> -        */
> -       if (is_shadow_present_pte(*sptep))
> -               drop_large_spte(kvm, sptep, flush);
> +       if (is_shadow_present_pte(*sptep)) {
> +               struct kvm_mmu_page *parent_sp;
> +               LIST_HEAD(invalid_list);
> +
> +               parent_sp = sptep_to_sp(sptep);
> +               WARN_ON_ONCE(parent_sp->role.level == PG_LEVEL_4K);
> +
> +               mmu_page_zap_pte(kvm, parent_sp, sptep, &invalid_list);
> +               kvm_mmu_remote_flush_or_zap(kvm, &invalid_list, true);
> +       }
>
>         spte = make_nonleaf_spte(sp->spt, sp_ad_disabled(sp));
>
> --
> 2.54.0

