From: David Matlack <dmatlack@google.com>
To: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Marc Zyngier <maz@kernel.org>,
	Huacai Chen <chenhuacai@kernel.org>,
	Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Peter Xu <peterx@redhat.com>, Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	Peter Feiner <pfeiner@google.com>,
	Andrew Jones <drjones@redhat.com>,
	maciej.szmigiero@oracle.com, kvm@vger.kernel.org
Subject: Re: [PATCH 02/23] KVM: x86/mmu: Derive shadow MMU page role from parent
Date: Fri, 4 Mar 2022 00:22:16 +0000
Message-ID: <YiFbuDaMfNItGwLw@google.com>
In-Reply-To: <YhBEaPWDoBiTpNV3@google.com>

On Sat, Feb 19, 2022 at 01:14:16AM +0000, Sean Christopherson wrote:
> On Thu, Feb 03, 2022, David Matlack wrote:
> > Instead of computing the shadow page role from scratch for every new
> > page, we can derive most of the information from the parent shadow page.
> > This avoids redundant calculations such as the quadrant, and reduces the
> 
> Uh, calculating quadrant isn't redundant.  The quadrant forces KVM to use different
> (multiple) shadow pages to shadow a single guest PTE when the guest is using 32-bit
> paging (1024 PTEs per page table vs. 512 PTEs per page table).  The reason quadrant
> is "quad" and not more or less is because 32-bit paging has two levels.  First-level
> PTEs can have quadrant=0/1, and that gets doubled for second-level PTEs because we
> need to use four PTEs (two to handle 2x guest PTEs, and each of those needs to be
> unique for the first-level PTEs they point at).
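To make the address math concrete, here is a standalone sketch of how the
quadrant falls out of a guest address when shadowing 32-bit non-PAE paging
(illustrative names and constants only, not code lifted from mmu.c):

#include <stdint.h>

#define GUEST_PT_BITS	10	/* 1024 4-byte PTEs per guest page table */
#define SHADOW_PT_BITS	 9	/*  512 8-byte SPTEs per shadow page table */

/*
 * Which of the shadow pages backing a single guest page table (level 1)
 * or guest page directory (level 2) covers @gaddr?  Yields 0/1 at level 1
 * and 0-3 at level 2, hence the "quad" in quadrant.
 */
static unsigned int quadrant_for(uint32_t gaddr, int level)
{
	unsigned int q = gaddr >> (12 + SHADOW_PT_BITS * level);

	return q & ((1u << ((GUEST_PT_BITS - SHADOW_PT_BITS) * level)) - 1);
}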

One solution is to keep the quadrant calculation in kvm_mmu_get_page().
The obvious problem for eager page splitting is that the existing
calculation needs the faulting address to derive the quadrant, and there
is no faulting address when doing eager page splitting. That doesn't
really matter though, since there is little value in eagerly splitting
huge pages that are shadowing a 32-bit non-PAE guest, so we can just skip
huge pages mapped by shadow pages with has_4_byte_gpte and hard-code the
quadrant to 0.
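For illustration, the skip could be as simple as something like this (a
rough, untested sketch; the helper name is made up):

/*
 * Rough sketch: don't bother eagerly splitting huge pages that are
 * shadowing 32-bit non-PAE guest paging, so the split path never needs
 * a quadrant other than 0.
 */
static bool eager_split_worthwhile(struct kvm_mmu_page *huge_sp)
{
	return !huge_sp->role.has_4_byte_gpte;
}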

Plumbing all that shouldn't be too hard. But it occurs to me that it might
not be necessary. The quadrant cannot be literally copied from the parent
SP the way this commit does, but I think it can still be derived from the
parent. The upside is that we don't need any special-casing of
has_4_byte_gpte or hard-coding of the quadrant in the eager page splitting
code, and we can still get rid of passing the faulting address into
kvm_mmu_get_page().

Here's what it would (roughly) look like, applied on top of this commit:

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6941b9b99a90..4184662b42bf 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2110,9 +2110,9 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, gfn_t gfn,
        return sp;
 }

-static union kvm_mmu_page_role kvm_mmu_child_role(struct kvm_mmu_page *parent_sp,
-                                                 bool direct, u32 access)
+static union kvm_mmu_page_role kvm_mmu_child_role(u64 *sptep, bool direct, u32 access)
 {
+       struct kvm_mmu_page *parent_sp = sptep_to_sp(sptep);
        union kvm_mmu_page_role role;

        role = parent_sp->role;
@@ -2120,6 +2120,28 @@ static union kvm_mmu_page_role kvm_mmu_child_role(struct kvm_mmu_page *parent_sp
        role.access = access;
        role.direct = direct;

+       /*
+        * If the guest has 4-byte PTEs then that means it's using 32-bit,
+        * 2-level, non-PAE paging. KVM shadows such guests using 4 PAE page
+        * directories, each mapping 1/4 of the guest's linear address space
+        * (1GiB). The shadow pages for those 4 page directories are
+        * pre-allocated and assigned a separate quadrant in their role.
+        *
+        * Since we are allocating a child shadow page and there are only 2
+        * levels, this must be a PG_LEVEL_4K shadow page. Here the quadrant
+        * will either be 0 or 1 because it maps 1/2 of the address space mapped
+        * by the guest's PG_LEVEL_4K page table (or 4MiB huge page) that it
+        * is shadowing. In this case, the quadrant can be derived from the index
+        * of the SPTE that points to the new child shadow page in the page
+        * directory (parent_sp). Specifically, every 2 SPTEs in parent_sp
+        * shadow one half of a guest's page table (or 4MiB huge page) so the
+        * quadrant is just the parity of the index of the SPTE.
+        */
+       if (role.has_4_byte_gpte) {
+               BUG_ON(role.level != PG_LEVEL_4K);
+               role.quadrant = (sptep - parent_sp->spt) % 2;
+       }
+
        return role;
 }

@@ -2127,11 +2149,9 @@ static struct kvm_mmu_page *kvm_mmu_get_child_sp(struct kvm_vcpu *vcpu,
                                                 u64 *sptep, gfn_t gfn,
                                                 bool direct, u32 access)
 {
-       struct kvm_mmu_page *parent_sp = sptep_to_sp(sptep);
        union kvm_mmu_page_role role;

-       role = kvm_mmu_child_role(parent_sp, direct, access);
-
+       role = kvm_mmu_child_role(sptep, direct, access);
        return kvm_mmu_get_page(vcpu, gfn, role);
 }
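To spell the parity rule out with concrete numbers: SPTE indices 6 and 7 of
such a shadow page directory both map the 4MiB region covered by a single
guest PDE, so the child shadow page behind index 6 shadows the low 2MiB of
that guest page table (quadrant 0) and the one behind index 7 shadows the
high 2MiB (quadrant 1), i.e. exactly the index parity computed above.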

> 
> Indeed, this fails spectacularly when attempting to boot a 32-bit non-PAE kernel
> with shadow paging enabled.
> 
>  \���	���\���	��\���
>  	P��\��`
>  BUG: unable to handle page fault for address: ff9fa81c
>  #PF: supervisor read access in kernel mode
>  #PF: error_code(0x0000) - not-present page
>  *pde = 00000000
>  ����
>  Oops: 0000 [#1]��<���� SMP��<������<������<����
>  ��<����CPU: 0 PID: 0 Comm: swapper ��<����G        W         5.12.0 #10
>  ��<����EIP: memblock_add_range.isra.18.constprop.23d�r
>  ��<����Code: <83> 79 04 00 75 2c 83 38 01 75 06 83 78 08 00 74 02 0f 0b 89 11 8b
>  ��<����EAX: c2af24bc EBX: fdffffff ECX: ff9fa818 EDX: 02000000
>  ��<����ESI: 02000000 EDI: 00000000 EBP: c2909f30 ESP: c2909f0c
>  ��<����DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00210006
>  ��<����CR0: 80050033 CR2: ff9fa81c CR3: 02b76000 CR4: 00040600
>  ��<����Call Trace:
>  ��<���� ? printkd�r
>  ��<���� ��<����memblock_reserved�r
>  ��<���� ? 0xc2000000
>  ��<���� ��<����setup_archd�r
>  ��<���� ? vprintk_defaultd�r
>  ��<���� ? vprintkd�r
>  ��<���� ��<����start_kerneld�r
>  ��<���� ��<����i386_start_kerneld�r
>  ��<���� ��<����startup_32_smpd�r
> 
>  ����
>  CR2: 00000000ff9fa81c
> 
>  ��<����EIP: memblock_add_range.isra.18.constprop.23d�r
>  ��<����Code: <83> 79 04 00 75 2c 83 38 01 75 06 83 78 08 00 74 02 0f 0b 89 11 8b
>  ��<����EAX: c2af24bc EBX: fdffffff ECX: ff9fa818 EDX: 02000000
>  ��<����ESI: 02000000 EDI: 00000000 EBP: c2909f30 ESP: c2909f0c
>  ��<����DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00210006
>  ��<����CR0: 80050033 CR2: ff9fa81c CR3: 02b76000 CR4: 00040600
> 
> > number of parameters to kvm_mmu_get_page().
> > 
> > Preemptivel split out the role calculation to a separate function for
> 
> Preemptively.
> 
> > use in a following commit.
> > 
> > No functional change intended.
> > 
> > Signed-off-by: David Matlack <dmatlack@google.com>
> > ---


Thread overview: 65+ messages
2022-02-03  1:00 [PATCH 00/23] Extend Eager Page Splitting to the shadow MMU David Matlack
2022-02-03  1:00 ` [PATCH 01/23] KVM: x86/mmu: Optimize MMU page cache lookup for all direct SPs David Matlack
2022-02-19  0:57   ` Sean Christopherson
2022-02-03  1:00 ` [PATCH 02/23] KVM: x86/mmu: Derive shadow MMU page role from parent David Matlack
2022-02-19  1:14   ` Sean Christopherson
2022-02-24 18:45     ` David Matlack
2022-03-04  0:22     ` David Matlack [this message]
2022-02-03  1:00 ` [PATCH 03/23] KVM: x86/mmu: Decompose kvm_mmu_get_page() into separate functions David Matlack
2022-02-19  1:25   ` Sean Christopherson
2022-02-24 18:54     ` David Matlack
2022-02-03  1:00 ` [PATCH 04/23] KVM: x86/mmu: Rename shadow MMU functions that deal with shadow pages David Matlack
2022-02-03  1:00 ` [PATCH 05/23] KVM: x86/mmu: Pass memslot to kvm_mmu_create_sp() David Matlack
2022-02-03  1:00 ` [PATCH 06/23] KVM: x86/mmu: Separate shadow MMU sp allocation from initialization David Matlack
2022-02-16 19:37   ` Ben Gardon
2022-02-16 21:42     ` David Matlack
2022-02-03  1:00 ` [PATCH 07/23] KVM: x86/mmu: Move huge page split sp allocation code to mmu.c David Matlack
2022-02-03  1:00 ` [PATCH 08/23] KVM: x86/mmu: Use common code to free kvm_mmu_page structs David Matlack
2022-02-03  1:00 ` [PATCH 09/23] KVM: x86/mmu: Use common code to allocate kvm_mmu_page structs from vCPU caches David Matlack
2022-02-03  1:00 ` [PATCH 10/23] KVM: x86/mmu: Pass const memslot to rmap_add() David Matlack
2022-02-23 23:25   ` Ben Gardon
2022-02-03  1:00 ` [PATCH 11/23] KVM: x86/mmu: Pass const memslot to kvm_mmu_init_sp() and descendants David Matlack
2022-02-23 23:27   ` Ben Gardon
2022-02-03  1:00 ` [PATCH 12/23] KVM: x86/mmu: Decouple rmap_add() and link_shadow_page() from kvm_vcpu David Matlack
2022-02-23 23:30   ` Ben Gardon
2022-02-03  1:00 ` [PATCH 13/23] KVM: x86/mmu: Update page stats in __rmap_add() David Matlack
2022-02-23 23:32   ` Ben Gardon
2022-02-23 23:35     ` Ben Gardon
2022-02-03  1:00 ` [PATCH 14/23] KVM: x86/mmu: Cache the access bits of shadowed translations David Matlack
2022-02-28 20:30   ` Ben Gardon
2022-02-03  1:00 ` [PATCH 15/23] KVM: x86/mmu: Pass access information to make_huge_page_split_spte() David Matlack
2022-02-28 20:32   ` Ben Gardon
2022-02-03  1:00 ` [PATCH 16/23] KVM: x86/mmu: Zap collapsible SPTEs at all levels in the shadow MMU David Matlack
2022-02-28 20:39   ` Ben Gardon
2022-03-03 19:42     ` David Matlack
2022-02-03  1:00 ` [PATCH 17/23] KVM: x86/mmu: Pass bool flush parameter to drop_large_spte() David Matlack
2022-02-28 20:47   ` Ben Gardon
2022-03-03 19:52     ` David Matlack
2022-02-03  1:00 ` [PATCH 18/23] KVM: x86/mmu: Extend Eager Page Splitting to the shadow MMU David Matlack
2022-02-28 21:09   ` Ben Gardon
2022-02-28 23:29     ` David Matlack
2022-02-03  1:00 ` [PATCH 19/23] KVM: Allow for different capacities in kvm_mmu_memory_cache structs David Matlack
2022-02-24 11:28   ` Marc Zyngier
2022-02-24 19:20     ` David Matlack
2022-03-04 21:59       ` David Matlack
2022-03-04 22:24         ` David Matlack
2022-03-05 16:55         ` Marc Zyngier
2022-03-07 23:49           ` David Matlack
2022-03-08  7:42             ` Marc Zyngier
2022-03-09 21:49             ` David Matlack
2022-03-10  8:30               ` Marc Zyngier
2022-02-03  1:00 ` [PATCH 20/23] KVM: Allow GFP flags to be passed when topping up MMU caches David Matlack
2022-02-28 21:12   ` Ben Gardon
2022-02-03  1:00 ` [PATCH 21/23] KVM: x86/mmu: Fully split huge pages that require extra pte_list_desc structs David Matlack
2022-02-28 21:22   ` Ben Gardon
2022-02-28 23:41     ` David Matlack
2022-03-01  0:37       ` Ben Gardon
2022-03-03 19:59         ` David Matlack
2022-02-03  1:00 ` [PATCH 22/23] KVM: x86/mmu: Split huge pages aliased by multiple SPTEs David Matlack
2022-02-03  1:00 ` [PATCH 23/23] KVM: selftests: Map x86_64 guest virtual memory with huge pages David Matlack
2022-03-07  5:21 ` [PATCH 00/23] Extend Eager Page Splitting to the shadow MMU Peter Xu
2022-03-07 23:39   ` David Matlack
2022-03-09  7:31     ` Peter Xu
2022-03-09 23:39       ` David Matlack
2022-03-10  7:03         ` Peter Xu
2022-03-10 19:26           ` David Matlack
