* [PATCH 1/2] KVM: x86/mmu: Drop RWX=0 SPTEs during ept_sync_page()
2022-05-13 19:49 [PATCH 0/2] KVM: x86/mmu: nEPT X-only unsync bug fix Sean Christopherson
@ 2022-05-13 19:49 ` Sean Christopherson
2022-05-13 20:54 ` David Matlack
2022-05-13 19:50 ` [PATCH 2/2] KVM: x86/mmu: Comment FNAME(sync_page) to document TLB flushing logic Sean Christopherson
2022-05-20 13:31 ` [PATCH 0/2] KVM: x86/mmu: nEPT X-only unsync bug fix Paolo Bonzini
2 siblings, 1 reply; 9+ messages in thread
From: Sean Christopherson @ 2022-05-13 19:49 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
Joerg Roedel, kvm, linux-kernel, David Matlack, Ben Gardon
Drop SPTEs whose new protections will yield a RWX=0 SPTE, i.e. a SPTE
that is marked shadow-present but is not-present in the page tables. If
EPT with execute-only support is in use by L1, KVM can create a RWX=0
SPTE can be created for an EPTE if the upper level combined permissions
are R (or RW) and the leaf EPTE is changed from R (or RW) to X. Because
the EPTE is considered present when viewed in isolation, and no reserved
bits are set, FNAME(prefetch_invalid_gpte) will consider the GPTE valid.
Creating a not-present SPTE isn't fatal as the SPTE is "correct" in the
sense that the guest translation is inaccessible (the combined protections
of all levels yield RWX=0), i.e. the guest won't get stuck in an infinite
loop. If EPT A/D bits are disabled, KVM can mistake the SPTE for an
access-tracked SPTE. But again, such confusion isn't fatal as the "saved"
protections are also RWX=0.
Add a WARN in make_spte() to detect creation of SPTEs that will result in
RWX=0 protections, which is the real motivation for fixing ept_sync_page().
Creating a useless SPTE means KVM messed up _something_, even if whatever
goof occurred doesn't manifest as a functional bug.
Fixes: d95c55687e11 ("kvm: mmu: track read permission explicitly for shadow EPT page tables")
Cc: David Matlack <dmatlack@google.com>
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/paging_tmpl.h | 9 ++++++++-
arch/x86/kvm/mmu/spte.c | 2 ++
2 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index b025decf610d..d9f98f9ed4a0 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -1052,7 +1052,14 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
if (sync_mmio_spte(vcpu, &sp->spt[i], gfn, pte_access))
continue;
- if (gfn != sp->gfns[i]) {
+ /*
+ * Drop the SPTE if the new protections would result in a RWX=0
+ * SPTE or if the gfn is changing. The RWX=0 case only affects
+ * EPT with execute-only support, i.e. EPT without an effective
+ * "present" bit, as all other paging modes will create a
+ * read-only SPTE if pte_access is zero.
+ */
+ if ((!pte_access && !shadow_present_mask) || gfn != sp->gfns[i]) {
drop_spte(vcpu->kvm, &sp->spt[i]);
flush = true;
continue;
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 75c9e87d446a..9ad60662beac 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -101,6 +101,8 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
u64 spte = SPTE_MMU_PRESENT_MASK;
bool wrprot = false;
+ WARN_ON_ONCE(!pte_access && !shadow_present_mask);
+
if (sp->role.ad_disabled)
spte |= SPTE_TDP_AD_DISABLED_MASK;
else if (kvm_mmu_page_ad_need_write_protect(sp))
--
2.36.0.550.gb090851708-goog
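
For illustration only (this is not part of the patch), the new drop condition in FNAME(sync_page) can be modelled in standalone C. The names demo_sync_decision(), demo_sync_example() and enum demo_sync_action are hypothetical, and the sketch assumes, as the comment in the hunk above states, that shadow_present_mask is zero only for EPT with execute-only support.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical outcome of syncing one SPTE; not a KVM type. */
enum demo_sync_action { DEMO_SYNC_UPDATE, DEMO_SYNC_DROP };

/*
 * Mirror of the condition added by the patch: drop the SPTE if the new
 * protections are RWX=0 and the paging mode has no "present" bit to fall
 * back on, or if the gfn changed.
 */
static inline enum demo_sync_action
demo_sync_decision(unsigned int pte_access, uint64_t present_mask,
		   uint64_t new_gfn, uint64_t old_gfn)
{
	if ((!pte_access && !present_mask) || new_gfn != old_gfn)
		return DEMO_SYNC_DROP;
	return DEMO_SYNC_UPDATE;
}

static inline void demo_sync_example(void)
{
	/* Execute-only EPT (present_mask == 0) with RWX=0: drop. */
	assert(demo_sync_decision(0, 0, 0x1000, 0x1000) == DEMO_SYNC_DROP);
	/* A mode with a present bit keeps the SPTE even if pte_access is 0. */
	assert(demo_sync_decision(0, 1, 0x1000, 0x1000) == DEMO_SYNC_UPDATE);
}
```

The second case corresponds to the "all other paging modes" remark in the comment: with a non-zero present mask, the existing update path is taken instead of the drop.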
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] KVM: x86/mmu: Drop RWX=0 SPTEs during ept_sync_page()
2022-05-13 19:49 ` [PATCH 1/2] KVM: x86/mmu: Drop RWX=0 SPTEs during ept_sync_page() Sean Christopherson
@ 2022-05-13 20:54 ` David Matlack
2022-05-14 0:55 ` Sean Christopherson
0 siblings, 1 reply; 9+ messages in thread
From: David Matlack @ 2022-05-13 20:54 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
Joerg Roedel, kvm list, LKML, Ben Gardon
On Fri, May 13, 2022 at 12:50 PM Sean Christopherson <seanjc@google.com> wrote:
>
> Drop SPTEs whose new protections will yield a RWX=0 SPTE, i.e. a SPTE
> that is marked shadow-present but is not-present in the page tables. If
> EPT with execute-only support is in use by L1, KVM can create a RWX=0
> SPTE can be created for an EPTE if the upper level combined permissions
> are R (or RW) and the leaf EPTE is changed from R (or RW) to X.
For some reason I found this sentence hard to read. What about this:
When shadowing EPT and NX HugePages is enabled, if the guest changes
the permissions on a huge page in the EPT12 to be execute-only, KVM
will end up shadowing it with an RWX=0 SPTE in the EPT02 when it picks up
the change in FNAME(sync_page). Note that the guest can't induce KVM
to create a RWX=0 SPTE during FNAME(fetch), since the only valid way for
the guest to fault in an execute-only huge page is with an instruction
fetch, which KVM will handle by mapping the page as an executable 4KiB
page.
> Because
> the EPTE is considered present when viewed in isolation, and no reserved
> bits are set, FNAME(prefetch_invalid_gpte) will consider the GPTE valid.
>
> Creating a not-present SPTE isn't fatal as the SPTE is "correct" in the
> sense that the guest translation is inaccessible (the combined protections
> of all levels yield RWX=0), i.e. the guest won't get stuck in an infinite
> loop. If EPT A/D bits are disabled, KVM can mistake the SPTE for an
> access-tracked SPTE. But again, such confusion isn't fatal as the "saved"
> protections are also RWX=0.
>
> Add a WARN in make_spte() to detect creation of SPTEs that will result in
> RWX=0 protections, which is the real motivation for fixing ept_sync_page().
> Creating a useless SPTE means KVM messed up _something_, even if whatever
> goof occurred doesn't manifest as a functional bug.
>
> Fixes: d95c55687e11 ("kvm: mmu: track read permission explicitly for shadow EPT page tables")
> Cc: David Matlack <dmatlack@google.com>
> Cc: Ben Gardon <bgardon@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/kvm/mmu/paging_tmpl.h | 9 ++++++++-
> arch/x86/kvm/mmu/spte.c | 2 ++
> 2 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index b025decf610d..d9f98f9ed4a0 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -1052,7 +1052,14 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
> if (sync_mmio_spte(vcpu, &sp->spt[i], gfn, pte_access))
> continue;
>
> - if (gfn != sp->gfns[i]) {
> + /*
> + * Drop the SPTE if the new protections would result in a RWX=0
> + * SPTE or if the gfn is changing. The RWX=0 case only affects
> + * EPT with execute-only support, i.e. EPT without an effective
> + * "present" bit, as all other paging modes will create a
> + * read-only SPTE if pte_access is zero.
> + */
> + if ((!pte_access && !shadow_present_mask) || gfn != sp->gfns[i]) {
> drop_spte(vcpu->kvm, &sp->spt[i]);
> flush = true;
> continue;
> diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
> index 75c9e87d446a..9ad60662beac 100644
> --- a/arch/x86/kvm/mmu/spte.c
> +++ b/arch/x86/kvm/mmu/spte.c
> @@ -101,6 +101,8 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
> u64 spte = SPTE_MMU_PRESENT_MASK;
> bool wrprot = false;
>
> + WARN_ON_ONCE(!pte_access && !shadow_present_mask);
> +
> if (sp->role.ad_disabled)
> spte |= SPTE_TDP_AD_DISABLED_MASK;
> else if (kvm_mmu_page_ad_need_write_protect(sp))
> --
> 2.36.0.550.gb090851708-goog
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] KVM: x86/mmu: Drop RWX=0 SPTEs during ept_sync_page()
2022-05-13 20:54 ` David Matlack
@ 2022-05-14 0:55 ` Sean Christopherson
2022-05-16 22:22 ` David Matlack
0 siblings, 1 reply; 9+ messages in thread
From: Sean Christopherson @ 2022-05-14 0:55 UTC (permalink / raw)
To: David Matlack
Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
Joerg Roedel, kvm list, LKML, Ben Gardon
On Fri, May 13, 2022, David Matlack wrote:
> On Fri, May 13, 2022 at 12:50 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > Drop SPTEs whose new protections will yield a RWX=0 SPTE, i.e. a SPTE
> > that is marked shadow-present but is not-present in the page tables. If
> > EPT with execute-only support is in use by L1, KVM can create a RWX=0
> > SPTE can be created for an EPTE if the upper level combined permissions
> > are R (or RW) and the leaf EPTE is changed from R (or RW) to X.
>
> For some reason I found this sentence hard to read.
Heh, probably because "KVM can create a RWX=0 SPTE can be created" is nonsensical.
I botched a late edit to the changelog...
> What about this:
>
> When shadowing EPT and NX HugePages is enabled, if the guest changes
This doesn't have anything to do with NX HugePages, it's an execute-only specific
bug where L1 can create a gPTE that is !READABLE but is considered PRESENT because
it is EXECUTABLE. If the upper level protections are R or RW, the resulting
protections for the entire translation are RWX=0. All of sync_page()'s existing
checks filter out only !PRESENT gPTE, because without execute-only, all upper
levels are guaranteed to be at least READABLE.
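
A minimal standalone C sketch of the scenario described above, for illustration only. The DEMO_* bits and the demo_* helpers are hypothetical and are not KVM's ACC_* definitions. It assumes the effective access of a translation is the bitwise AND of the upper-level and leaf permissions, and that an execute-only-capable EPT entry counts as present when any of R, W or X is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical permission bits for the sketch. */
#define DEMO_R 0x1u
#define DEMO_W 0x2u
#define DEMO_X 0x4u

/* With execute-only support, an entry is "present" if any RWX bit is set. */
static inline bool demo_epte_present(unsigned int epte)
{
	return (epte & (DEMO_R | DEMO_W | DEMO_X)) != 0;
}

/* Effective access is the intersection of upper-level and leaf permissions. */
static inline unsigned int demo_combined_access(unsigned int upper,
						unsigned int leaf)
{
	return upper & leaf;
}

static inline void demo_xonly_example(void)
{
	unsigned int upper = DEMO_R | DEMO_W;	/* upper levels grant RW */
	unsigned int leaf = DEMO_X;		/* L1 changes the leaf to X-only */

	/* The leaf passes a present check when viewed in isolation... */
	assert(demo_epte_present(leaf));
	/* ...yet the translation as a whole ends up with RWX=0. */
	assert(demo_combined_access(upper, leaf) == 0);
}
```

If the upper levels also granted X, the combined access would be X and the SPTE would be usable, which is why the problem only appears when the upper levels are R or RW.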
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] KVM: x86/mmu: Drop RWX=0 SPTEs during ept_sync_page()
2022-05-14 0:55 ` Sean Christopherson
@ 2022-05-16 22:22 ` David Matlack
2022-05-17 3:52 ` Sean Christopherson
0 siblings, 1 reply; 9+ messages in thread
From: David Matlack @ 2022-05-16 22:22 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
Joerg Roedel, kvm list, LKML, Ben Gardon
On Fri, May 13, 2022 at 5:56 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, May 13, 2022, David Matlack wrote:
> > On Fri, May 13, 2022 at 12:50 PM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > Drop SPTEs whose new protections will yield a RWX=0 SPTE, i.e. a SPTE
> > > that is marked shadow-present but is not-present in the page tables. If
> > > EPT with execute-only support is in use by L1, KVM can create a RWX=0
> > > SPTE can be created for an EPTE if the upper level combined permissions
> > > are R (or RW) and the leaf EPTE is changed from R (or RW) to X.
> >
> > For some reason I found this sentence hard to read.
>
> Heh, probably because "KVM can create a RWX=0 SPTE can be created" is nonsensical.
> I botched a late edit to the changelog...
>
> > What about this:
> >
> > When shadowing EPT and NX HugePages is enabled, if the guest changes
>
> This doesn't have anything to do with NX HugePages, it's an execute-only specific
> bug where L1 can create a gPTE that is !READABLE but is considered PRESENT because
> it is EXECUTABLE. If the upper level protections are R or RW, the resulting
> protections for the entire translation are RWX=0. All of sync_page()'s existing
> checks filter out only !PRESENT gPTE, because without execute-only, all upper
> levels are guaranteed to be at least READABLE.
I see what you mean, thanks.
And I also recall now you mentioned (off-list) that the NX HugePage
scenario isn't possible because KVM does not let huge pages go unsync.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] KVM: x86/mmu: Drop RWX=0 SPTEs during ept_sync_page()
2022-05-16 22:22 ` David Matlack
@ 2022-05-17 3:52 ` Sean Christopherson
0 siblings, 0 replies; 9+ messages in thread
From: Sean Christopherson @ 2022-05-17 3:52 UTC (permalink / raw)
To: David Matlack
Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
Joerg Roedel, kvm list, LKML, Ben Gardon
On Mon, May 16, 2022, David Matlack wrote:
> On Fri, May 13, 2022 at 5:56 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Fri, May 13, 2022, David Matlack wrote:
> > > On Fri, May 13, 2022 at 12:50 PM Sean Christopherson <seanjc@google.com> wrote:
> > > >
> > > > Drop SPTEs whose new protections will yield a RWX=0 SPTE, i.e. a SPTE
> > > > that is marked shadow-present but is not-present in the page tables. If
> > > > EPT with execute-only support is in use by L1, KVM can create a RWX=0
> > > > SPTE can be created for an EPTE if the upper level combined permissions
> > > > are R (or RW) and the leaf EPTE is changed from R (or RW) to X.
> > >
> > > For some reason I found this sentence hard to read.
> >
> > Heh, probably because "KVM can create a RWX=0 SPTE can be created" is nonsensical.
> > I botched a late edit to the changelog...
> >
> > > What about this:
> > >
> > > When shadowing EPT and NX HugePages is enabled, if the guest changes
> >
> > This doesn't have anything to do with NX HugePages, it's an execute-only specific
> > bug where L1 can create a gPTE that is !READABLE but is considered PRESENT because
> > it is EXECUTABLE. If the upper level protections are R or RW, the resulting
> > protections for the entire translation are RWX=0. All of sync_page()'s existing
> > checks filter out only !PRESENT gPTE, because without execute-only, all upper
> > levels are guaranteed to be at least READABLE.
>
> I see what you mean, thanks.
>
> And I also recall now you mentioned (off-list) that the NX HugePage
> scenario isn't possible because KVM does not let huge pages go unsync.
Yep. The other thing that's semi-relevant and I've mentioned off-list at least
once is that our (Google's) old kernel has a different NX HugePage implementation
that _can_ result in RWX=0 SPTEs. Unlike upstream, the internal NX HugePage
implementation shatters a huge page _after_ installing said huge page, whereas
upstream demotes the huge page before it's installed. If shattering fails on a
huge page that L1 created with just X permissions, KVM is left with a RWX=0
huge page.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 2/2] KVM: x86/mmu: Comment FNAME(sync_page) to document TLB flushing logic
2022-05-13 19:49 [PATCH 0/2] KVM: x86/mmu: nEPT X-only unsync bug fix Sean Christopherson
2022-05-13 19:49 ` [PATCH 1/2] KVM: x86/mmu: Drop RWX=0 SPTEs during ept_sync_page() Sean Christopherson
@ 2022-05-13 19:50 ` Sean Christopherson
2022-05-13 20:27 ` Jim Mattson
2022-05-20 13:31 ` [PATCH 0/2] KVM: x86/mmu: nEPT X-only unsync bug fix Paolo Bonzini
2 siblings, 1 reply; 9+ messages in thread
From: Sean Christopherson @ 2022-05-13 19:50 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
Joerg Roedel, kvm, linux-kernel, David Matlack, Ben Gardon
Add a comment to FNAME(sync_page) to explain why the TLB flushing logic
conspicuously doesn't handle the scenario of guest protections being
reduced. Specifically, if synchronizing a SPTE drops execute protections,
KVM will not emit a TLB flush, whereas dropping writable or clearing A/D
bits does trigger a flush via mmu_spte_update(). Architecturally, until
the GPTE is implicitly or explicitly flushed from the guest's perspective,
KVM is not required to flush any old, stale translations.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/paging_tmpl.h | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index d9f98f9ed4a0..d39706e46ad9 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -1076,6 +1076,15 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
flush |= mmu_spte_update(sptep, spte);
}
+ /*
+ * Note, any flush is purely for KVM's correctness, e.g. when dropping
+ * an existing SPTE or clearing W/A/D bits to ensure an mmu_notifier
+ * unmap or dirty logging event doesn't fail to flush. The guest is
+ * responsible for flushing the TLB to ensure any changes in protection
+ * bits are recognized, i.e. until the guest flushes or page faults on
+ * a relevant address, KVM is architecturally allowed to let vCPUs use
+ * cached translations with the old protection bits.
+ */
return flush;
}
--
2.36.0.550.gb090851708-goog
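
For illustration only (not part of the patch), the flush behaviour described in the changelog can be sketched in standalone C. This is a simplified model and not the actual mmu_spte_update() logic: the DEMO_SPTE_* bit positions and demo_update_needs_flush() are hypothetical. The only assumption taken from the changelog is that clearing W, A or D triggers a flush while clearing X does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen only for the sketch. */
#define DEMO_SPTE_W (1ull << 1)
#define DEMO_SPTE_X (1ull << 2)
#define DEMO_SPTE_A (1ull << 8)
#define DEMO_SPTE_D (1ull << 9)

/*
 * Return true if the update clears a bit that KVM itself relies on
 * (writable, accessed, dirty). Clearing execute is left to the guest
 * to flush, so it does not request a flush here.
 */
static inline bool demo_update_needs_flush(uint64_t old_spte,
					   uint64_t new_spte)
{
	const uint64_t tracked = DEMO_SPTE_W | DEMO_SPTE_A | DEMO_SPTE_D;
	uint64_t cleared = old_spte & ~new_spte;

	return (cleared & tracked) != 0;
}

static inline void demo_flush_example(void)
{
	/* Dropping W requests a flush. */
	assert(demo_update_needs_flush(DEMO_SPTE_W | DEMO_SPTE_X, DEMO_SPTE_X));
	/* Dropping only X does not. */
	assert(!demo_update_needs_flush(DEMO_SPTE_W | DEMO_SPTE_X, DEMO_SPTE_W));
}
```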
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] KVM: x86/mmu: Comment FNAME(sync_page) to document TLB flushing logic
2022-05-13 19:50 ` [PATCH 2/2] KVM: x86/mmu: Comment FNAME(sync_page) to document TLB flushing logic Sean Christopherson
@ 2022-05-13 20:27 ` Jim Mattson
0 siblings, 0 replies; 9+ messages in thread
From: Jim Mattson @ 2022-05-13 20:27 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel, kvm,
linux-kernel, David Matlack, Ben Gardon
On Fri, May 13, 2022 at 12:50 PM Sean Christopherson <seanjc@google.com> wrote:
>
> Add a comment to FNAME(sync_page) to explain why the TLB flushing logic
> conspicuously doesn't handle the scenario of guest protections being
> reduced. Specifically, if synchronizing a SPTE drops execute protections,
> KVM will not emit a TLB flush, whereas dropping writable or clearing A/D
> bits does trigger a flush via mmu_spte_update(). Architecturally, until
> the GPTE is implicitly or explicitly flushed from the guest's perspective,
> KVM is not required to flush any old, stale translations.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
Reviewed-by: Jim Mattson <jmattson@google.com>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/2] KVM: x86/mmu: nEPT X-only unsync bug fix
2022-05-13 19:49 [PATCH 0/2] KVM: x86/mmu: nEPT X-only unsync bug fix Sean Christopherson
2022-05-13 19:49 ` [PATCH 1/2] KVM: x86/mmu: Drop RWX=0 SPTEs during ept_sync_page() Sean Christopherson
2022-05-13 19:50 ` [PATCH 2/2] KVM: x86/mmu: Comment FNAME(sync_page) to document TLB flushing logic Sean Christopherson
@ 2022-05-20 13:31 ` Paolo Bonzini
2 siblings, 0 replies; 9+ messages in thread
From: Paolo Bonzini @ 2022-05-20 13:31 UTC (permalink / raw)
To: Sean Christopherson
Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
linux-kernel, David Matlack, Ben Gardon
Queued, thanks. Here is the new message for patch 1:
All of sync_page()'s existing checks filter out only !PRESENT gPTE,
because without execute-only, all upper levels are guaranteed to be at
least READABLE. However, if EPT with execute-only support is in use by
L1, KVM can create an SPTE that is shadow-present but guest-inaccessible
(RWX=0) if the upper level combined permissions are R (or RW) and
the leaf EPTE is changed from R (or RW) to X. Because the EPTE is
considered present when viewed in isolation, and no reserved bits are set,
FNAME(prefetch_invalid_gpte) will consider the GPTE valid, and cause a
not-present SPTE to be created.
The SPTE is "correct": the guest translation is inaccessible because
the combined protections of all levels yield RWX=0, and KVM will just
redirect any vmexits to the guest. If EPT A/D bits are disabled, KVM
can mistake the SPTE for an access-tracked SPTE, but again such confusion
isn't fatal, as the "saved" protections are also RWX=0. However,
creating a useless SPTE in general means that KVM messed up something,
even if this particular goof didn't manifest as a functional bug.
So, drop SPTEs whose new protections will yield a RWX=0 SPTE, and
add a WARN in make_spte() to detect creation of SPTEs that will
result in RWX=0 protections.
Paolo
^ permalink raw reply [flat|nested] 9+ messages in thread