From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Lai Jiangshan <laijs@linux.alibaba.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Ovidiu Panait <ovidiu.panait@windriver.com>
Subject: [PATCH 5.4 12/27] KVM: X86: MMU: Use the correct inherited permissions to get shadow page
Date: Fri, 13 Aug 2021 17:07:10 +0200 [thread overview]
Message-ID: <20210813150523.765921441@linuxfoundation.org> (raw)
In-Reply-To: <20210813150523.364549385@linuxfoundation.org>
From: Lai Jiangshan <laijs@linux.alibaba.com>
commit b1bd5cba3306691c771d558e94baa73e8b0b96b7 upstream.
When computing the access permissions of a shadow page, use the effective
permissions of the walk up to that point, i.e. the logic AND of its parents'
permissions. Two guest PxE entries that point at the same table gfn need to
be shadowed with different shadow pages if their parents' permissions are
different. KVM currently uses the effective permissions of the last
non-leaf entry for all non-leaf entries. Because all non-leaf SPTEs have
full ("uwx") permissions, and the effective permissions are recorded only
in role.access and merged into the leaves, this can lead to incorrect
reuse of a shadow page and eventually to a missing guest protection page
fault.
For example, here is a shared pagetable:
pgd[] pud[] pmd[] virtual address pointers
/->pmd1(u--)->pte1(uw-)->page1 <- ptr1 (u--)
/->pud1(uw-)--->pmd2(uw-)->pte2(uw-)->page2 <- ptr2 (uw-)
pgd-| (shared pmd[] as above)
\->pud2(u--)--->pmd1(u--)->pte1(uw-)->page1 <- ptr3 (u--)
\->pmd2(uw-)->pte2(uw-)->page2 <- ptr4 (u--)
pud1 and pud2 point to the same pmd table, so:
- ptr1 and ptr3 points to the same page.
- ptr2 and ptr4 points to the same page.
(pud1 and pud2 here are pud entries, while pmd1 and pmd2 here are pmd entries)
- First, the guest reads from ptr1 first and KVM prepares a shadow
page table with role.access=u--, from ptr1's pud1 and ptr1's pmd1.
"u--" comes from the effective permissions of pgd, pud1 and
pmd1, which are stored in pt->access. "u--" is used also to get
the pagetable for pud1, instead of "uw-".
- Then the guest writes to ptr2 and KVM reuses pud1 which is present.
The hypervisor set up a shadow page for ptr2 with pt->access is "uw-"
even though the pud1 pmd (because of the incorrect argument to
kvm_mmu_get_page in the previous step) has role.access="u--".
- Then the guest reads from ptr3. The hypervisor reuses pud1's
shadow pmd for pud2, because both use "u--" for their permissions.
Thus, the shadow pmd already includes entries for both pmd1 and pmd2.
- At last, the guest writes to ptr4. This causes no vmexit or pagefault,
because pud1's shadow page structures included an "uw-" page even though
its role.access was "u--".
Any kind of shared pagetable might have the similar problem when in
virtual machine without TDP enabled if the permissions are different
from different ancestors.
In order to fix the problem, we change pt->access to be an array, and
any access in it will not include permissions ANDed from child ptes.
The test code is: https://lore.kernel.org/kvm/20210603050537.19605-1-jiangshanlai@gmail.com/
Remember to test it with TDP disabled.
The problem had existed long before the commit 41074d07c78b ("KVM: MMU:
Fix inherited permissions for emulated guest pte updates"), and it
is hard to find which is the culprit. So there is no fixes tag here.
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20210603052455.21023-1-jiangshanlai@gmail.com>
Cc: stable@vger.kernel.org
Fixes: cea0f0e7ea54 ("[PATCH] KVM: MMU: Shadow page table caching")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[OP: - apply arch/x86/kvm/mmu/* changes to arch/x86/kvm
- apply documentation changes to Documentation/virt/kvm/mmu.txt
- adjusted context in arch/x86/kvm/paging_tmpl.h]
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
Documentation/virt/kvm/mmu.txt | 4 ++--
arch/x86/kvm/paging_tmpl.h | 14 +++++++++-----
2 files changed, 11 insertions(+), 7 deletions(-)
--- a/Documentation/virt/kvm/mmu.txt
+++ b/Documentation/virt/kvm/mmu.txt
@@ -152,8 +152,8 @@ Shadow pages contain the following infor
shadow pages) so role.quadrant takes values in the range 0..3. Each
quadrant maps 1GB virtual address space.
role.access:
- Inherited guest access permissions in the form uwx. Note execute
- permission is positive, not negative.
+ Inherited guest access permissions from the parent ptes in the form uwx.
+ Note execute permission is positive, not negative.
role.invalid:
The page is invalid and should not be used. It is a root page that is
currently pinned (by a cpu hardware register pointing to it); once it is
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -90,8 +90,8 @@ struct guest_walker {
gpa_t pte_gpa[PT_MAX_FULL_LEVELS];
pt_element_t __user *ptep_user[PT_MAX_FULL_LEVELS];
bool pte_writable[PT_MAX_FULL_LEVELS];
- unsigned pt_access;
- unsigned pte_access;
+ unsigned int pt_access[PT_MAX_FULL_LEVELS];
+ unsigned int pte_access;
gfn_t gfn;
struct x86_exception fault;
};
@@ -406,13 +406,15 @@ retry_walk:
}
walker->ptes[walker->level - 1] = pte;
+
+ /* Convert to ACC_*_MASK flags for struct guest_walker. */
+ walker->pt_access[walker->level - 1] = FNAME(gpte_access)(pt_access ^ walk_nx_mask);
} while (!is_last_gpte(mmu, walker->level, pte));
pte_pkey = FNAME(gpte_pkeys)(vcpu, pte);
accessed_dirty = have_ad ? pte_access & PT_GUEST_ACCESSED_MASK : 0;
/* Convert to ACC_*_MASK flags for struct guest_walker. */
- walker->pt_access = FNAME(gpte_access)(pt_access ^ walk_nx_mask);
walker->pte_access = FNAME(gpte_access)(pte_access ^ walk_nx_mask);
errcode = permission_fault(vcpu, mmu, walker->pte_access, pte_pkey, access);
if (unlikely(errcode))
@@ -451,7 +453,8 @@ retry_walk:
}
pgprintk("%s: pte %llx pte_access %x pt_access %x\n",
- __func__, (u64)pte, walker->pte_access, walker->pt_access);
+ __func__, (u64)pte, walker->pte_access,
+ walker->pt_access[walker->level - 1]);
return 1;
error:
@@ -620,7 +623,7 @@ static int FNAME(fetch)(struct kvm_vcpu
{
struct kvm_mmu_page *sp = NULL;
struct kvm_shadow_walk_iterator it;
- unsigned direct_access, access = gw->pt_access;
+ unsigned int direct_access, access;
int top_level, ret;
gfn_t gfn, base_gfn;
@@ -652,6 +655,7 @@ static int FNAME(fetch)(struct kvm_vcpu
sp = NULL;
if (!is_shadow_present_pte(*it.sptep)) {
table_gfn = gw->table_gfn[it.level - 2];
+ access = gw->pt_access[it.level - 2];
sp = kvm_mmu_get_page(vcpu, table_gfn, addr, it.level-1,
false, access);
}
next prev parent reply other threads:[~2021-08-13 15:14 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-08-13 15:06 [PATCH 5.4 00/27] 5.4.141-rc1 review Greg Kroah-Hartman
2021-08-13 15:06 ` [PATCH 5.4 01/27] KVM: SVM: Fix off-by-one indexing when nullifying last used SEV VMCB Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 02/27] tee: Correct inappropriate usage of TEE_SHM_DMA_BUF flag Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 03/27] media: v4l2-mem2mem: always consider OUTPUT queue during poll Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 04/27] tracing: Reject string operand in the histogram expression Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 05/27] usb: dwc3: Stop active transfers before halting the controller Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 06/27] usb: dwc3: gadget: Allow runtime suspend if UDC unbinded Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 07/27] usb: dwc3: gadget: Restart DWC3 gadget when enabling pullup Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 08/27] usb: dwc3: gadget: Prevent EP queuing while stopping transfers Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 09/27] usb: dwc3: gadget: Clear DEP flags after stop transfers in ep disable Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 10/27] usb: dwc3: gadget: Disable gadget IRQ during pullup disable Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 11/27] usb: dwc3: gadget: Avoid runtime resume if disabling pullup Greg Kroah-Hartman
2021-08-13 15:07 ` Greg Kroah-Hartman [this message]
2021-08-13 15:07 ` [PATCH 5.4 13/27] USB:ehci:fix Kunpeng920 ehci hardware problem Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 14/27] ALSA: hda: Add quirk for ASUS Flow x13 Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 15/27] ppp: Fix generating ppp unit id when ifname is not specified Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 16/27] ovl: prevent private clone if bind mount is not allowed Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 17/27] btrfs: make qgroup_free_reserved_data take btrfs_inode Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 18/27] btrfs: make btrfs_qgroup_reserve_data " Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 19/27] btrfs: qgroup: allow to unreserve range without releasing other ranges Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 20/27] btrfs: qgroup: try to flush qgroup space when we get -EDQUOT Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 21/27] btrfs: transaction: Cleanup unused TRANS_STATE_BLOCKED Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 22/27] btrfs: qgroup: remove ASYNC_COMMIT mechanism in favor of reserve retry-after-EDQUOT Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 23/27] btrfs: fix lockdep splat when enabling and disabling qgroups Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 24/27] net: xilinx_emaclite: Do not print real IOMEM pointer Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 25/27] btrfs: qgroup: dont commit transaction when we already hold the handle Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 26/27] btrfs: export and rename qgroup_reserve_meta Greg Kroah-Hartman
2021-08-13 15:07 ` [PATCH 5.4 27/27] btrfs: dont flush from btrfs_delayed_inode_reserve_metadata Greg Kroah-Hartman
2021-08-13 23:24 ` [PATCH 5.4 00/27] 5.4.141-rc1 review Shuah Khan
2021-08-14 11:11 ` Sudip Mukherjee
2021-08-14 11:39 ` Naresh Kamboju
2021-08-14 18:15 ` Guenter Roeck
2021-08-16 3:02 ` Samuel Zou
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210813150523.765921441@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=laijs@linux.alibaba.com \
--cc=linux-kernel@vger.kernel.org \
--cc=ovidiu.panait@windriver.com \
--cc=pbonzini@redhat.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox