kvmarm.lists.cs.columbia.edu archive mirror
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: David Matlack <dmatlack@google.com>
Cc: Marc Zyngier <maz@kernel.org>, Albert Ou <aou@eecs.berkeley.edu>,
	"open list:KERNEL VIRTUAL MACHINE FOR MIPS \(KVM/mips\)"
	<kvm@vger.kernel.org>, Huacai Chen <chenhuacai@kernel.org>,
	"open list:KERNEL VIRTUAL MACHINE FOR MIPS \(KVM/mips\)"
	<linux-mips@vger.kernel.org>,
	Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	"open list:KERNEL VIRTUAL MACHINE FOR RISC-V \(KVM/riscv\)"
	<kvm-riscv@lists.infradead.org>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Ben Gardon <bgardon@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	maciej.szmigiero@oracle.com,
	"moderated list:KERNEL VIRTUAL MACHINE FOR ARM64 \(KVM/arm64\)"
	<kvmarm@lists.cs.columbia.edu>, Peter Feiner <pfeiner@google.com>
Subject: Re: [PATCH v3 00/23] KVM: Extend Eager Page Splitting to the shadow MMU
Date: Mon, 11 Apr 2022 17:12:40 +0000	[thread overview]
Message-ID: <YlRhiF1O71TWQr5r@google.com> (raw)
In-Reply-To: <20220401175554.1931568-1-dmatlack@google.com>

On Fri, Apr 01, 2022, David Matlack wrote:
> This series extends KVM's Eager Page Splitting to also split huge pages
> mapped by the shadow MMU, i.e. huge pages present in the memslot rmaps.
> This will be useful for configurations that use Nested Virtualization,
> disable the TDP MMU, or disable/lack TDP hardware support.
> 
> For background on Eager Page Splitting, see:
>  - Proposal: https://lore.kernel.org/kvm/CALzav=dV_U4r1K9oDq4esb4mpBQDQ2ROQ5zH5wV3KpOaZrRW-A@mail.gmail.com/
>  - TDP MMU support: https://lore.kernel.org/kvm/20220119230739.2234394-1-dmatlack@google.com/
> 
> Splitting huge pages mapped by the shadow MMU is more complicated than
> the TDP MMU, but it is also more important for performance as the shadow
> MMU handles huge page write-protection faults under the write lock.  See
> the Performance section for more details.
> 
> The extra complexity of splitting huge pages mapped by the shadow MMU
> comes from a few places:

I think we should restrict eager page splitting to the TDP MMU being enabled,
i.e. restrict shadow MMU support to nested MMUs.

A decent chunk of the churn and complexity in this series comes from having to
deal with the intersection of things no one cares about in practice (!TDP shadow
paging), and/or things we should be putting into maintenance-only mode (legacy MMU
with TDP enabled).  I see zero reason to support this for legacy shadow paging
without a very concrete, very performance sensitive use case, and the legacy MMU
with TDP should be a hard "no".

With those out of the way, unsync support can also be jettisoned, because barring
use cases I don't know about, hypervisors don't modify TDP entries in the same way
that kernels modify native page tables, i.e. don't benefit from allowing SPTEs to
go unsync.

The other feature that I think we should deprecate (which I'm pretty sure someone on
our team, maybe even you, is planning on proposing upstream) is support for zapping
KVM shadow pages for the shrinker.  In hindsight, we should have done that a few
years ago instead of fixing the bug that made KVM's support meaningful (see commit
ebdb292dac79 ("KVM: x86/mmu: Batch zap MMU pages when shrinking the slab").  Doing
that for nested MMUs only (or at least first) should be less controversial.

The other thing we want to do sooner than later is improve the scalability of the
nested MMU.  A relatively simple way to pick some juicy low hanging fruit, if we
drop the aforementioned features we don't actually need for nested MMUs, would be
to turn all of the tracking structures needed for handling a page fault into
per-root lists/structures, e.g. active_mmu_pages and mmu_page_hash.  Unless L1 is
doing something funky, there is unlikely to be overlap between nested TDP page
tables, i.e. per-root tracking shouldn't cause a memory explosion.

At that point, as a first step/stopgap toward a more scalable nested MMU implementation,
nested TDP page faults, zapping of obsolete pages (memslot updates), and eager page
splitting (I think) can take mmu_lock for read and then take a per-root spinlock.

At a bare minimum, taking mmu_lock for read would prevent a nested vCPU from blocking
the TDP MMU, which in itself should be a big win.  Zapping after a memslot updates
would not interfere at all with re-faulting memory since zapping the obsolete roots
would never get a lock conflict.  And for use cases that spin up a large number of small
L2 VMs, per-root locking will allow KVM to handle page faults for each L2 in parallel,
which could be a huge performance boost for select use cases.

Circling back to eager page splitting, this series could be reworked to take the
first step of forking FNAME(page_fault), FNAME(fetch) and kvm_mmu_get_page() in
order to provide the necessary path for reworking nested MMU page faults.  Then it
can remove unsync and shrinker support for nested MMUs.  With those gone,
dissecting the nested MMU variant of kvm_mmu_get_page() should be simpler/cleaner
than dealing with the existing kvm_mmu_get_page(), i.e. should eliminate at least
some of the complexity/churn.

> Performance
> -----------
> 
> To measure the performance impact of Eager Page Splitting I ran
> dirty_log_perf_test with tdp_mmu=N, various virtual CPU counts, 1GiB per
> vCPU, and backed by 1GiB HugeTLB memory. The amount of memory that was
> written to versus read was controlled with the -f option.
> 
> To measure the imapct of customer performance, we can look at the time
> it takes all vCPUs to dirty memory after dirty logging has been enabled.
> Without Eager Page Splitting enabled, such dirtying must take faults to
> split huge pages and bottleneck on the MMU lock.
> 
>              | Config: ept=Y, tdp_mmu=N, 100% writes                   |
>              | Config: ept=Y, tdp_mmu=N, 100% writes                   |
>              | Config: ept=Y, tdp_mmu=N, 100% writes initially-all-set |
>              | Config: ept=Y, tdp_mmu=N, 100% writes initially-all-set |
>              | Config: ept=N, tdp_mmu=Y, 100% writes                   |
>              | Config: ept=N, tdp_mmu=Y, 50% writes                    |
>              | Config: ept=N, tdp_mmu=Y, 5% writes                     |

IMO, to justify this there needs to be performance numbers for ept=Y, tdp_mmu=Y,
i.e. for the use case we actually care about.  I don't expect the outcome to be
any different, but it really should be explicitly tested.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

  parent reply	other threads:[~2022-04-11 17:12 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-01 17:55 [PATCH v3 00/23] KVM: Extend Eager Page Splitting to the shadow MMU David Matlack
2022-04-01 17:55 ` [PATCH v3 01/23] KVM: x86/mmu: Optimize MMU page cache lookup for all direct SPs David Matlack
2022-04-01 17:55 ` [PATCH v3 02/23] KVM: x86/mmu: Use a bool for direct David Matlack
2022-04-08 22:24   ` Sean Christopherson
2022-04-01 17:55 ` [PATCH v3 03/23] KVM: x86/mmu: Derive shadow MMU page role from parent David Matlack
2022-04-01 17:55 ` [PATCH v3 04/23] KVM: x86/mmu: Decompose kvm_mmu_get_page() into separate functions David Matlack
2022-04-01 17:55 ` [PATCH v3 05/23] KVM: x86/mmu: Rename shadow MMU functions that deal with shadow pages David Matlack
2022-04-01 17:55 ` [PATCH v3 06/23] KVM: x86/mmu: Pass memslot to kvm_mmu_new_shadow_page() David Matlack
2022-04-01 17:55 ` [PATCH v3 07/23] KVM: x86/mmu: Separate shadow MMU sp allocation from initialization David Matlack
2022-04-01 17:55 ` [PATCH v3 08/23] KVM: x86/mmu: Link spt to sp during allocation David Matlack
2022-04-01 17:55 ` [PATCH v3 09/23] KVM: x86/mmu: Move huge page split sp allocation code to mmu.c David Matlack
2022-04-01 17:55 ` [PATCH v3 10/23] KVM: x86/mmu: Use common code to free kvm_mmu_page structs David Matlack
2022-04-01 17:55 ` [PATCH v3 11/23] KVM: x86/mmu: Use common code to allocate shadow pages from vCPU caches David Matlack
2022-04-01 17:55 ` [PATCH v3 12/23] KVM: x86/mmu: Pass const memslot to rmap_add() David Matlack
2022-04-01 17:55 ` [PATCH v3 13/23] KVM: x86/mmu: Pass const memslot to init_shadow_page() and descendants David Matlack
2022-04-01 17:55 ` [PATCH v3 14/23] KVM: x86/mmu: Decouple rmap_add() and link_shadow_page() from kvm_vcpu David Matlack
2022-04-01 17:55 ` [PATCH v3 15/23] KVM: x86/mmu: Update page stats in __rmap_add() David Matlack
2022-04-01 17:55 ` [PATCH v3 16/23] KVM: x86/mmu: Cache the access bits of shadowed translations David Matlack
2022-04-02  6:19   ` kernel test robot
2022-04-02  7:01   ` kernel test robot
2022-04-09  0:02   ` Sean Christopherson
2022-04-14 16:47     ` David Matlack
2022-04-01 17:55 ` [PATCH v3 17/23] KVM: x86/mmu: Extend make_huge_page_split_spte() for the shadow MMU David Matlack
2022-04-01 17:55 ` [PATCH v3 18/23] KVM: x86/mmu: Zap collapsible SPTEs at all levels in " David Matlack
2022-04-01 17:55 ` [PATCH v3 19/23] KVM: x86/mmu: Refactor drop_large_spte() David Matlack
2022-04-01 17:55 ` [PATCH v3 20/23] KVM: Allow for different capacities in kvm_mmu_memory_cache structs David Matlack
2022-04-20 10:55   ` Anup Patel
2022-04-21 16:19   ` Ben Gardon
2022-04-21 16:33     ` David Matlack
2022-04-01 17:55 ` [PATCH v3 21/23] KVM: Allow GFP flags to be passed when topping up MMU caches David Matlack
2022-04-01 17:55 ` [PATCH v3 22/23] KVM: x86/mmu: Support Eager Page Splitting in the shadow MMU David Matlack
2022-04-09  0:39   ` Sean Christopherson
2022-04-14 16:50     ` David Matlack
2022-04-01 17:55 ` [PATCH v3 23/23] KVM: selftests: Map x86_64 guest virtual memory with huge pages David Matlack
2022-04-11 17:12 ` Sean Christopherson [this message]
2022-04-11 17:54   ` [PATCH v3 00/23] KVM: Extend Eager Page Splitting to the shadow MMU David Matlack
2022-04-11 20:12     ` Sean Christopherson
2022-04-11 23:41       ` David Matlack
2022-04-12  0:39         ` Sean Christopherson
2022-04-12 16:49           ` David Matlack
2022-04-13  1:02             ` Sean Christopherson
2022-04-13 17:57               ` David Matlack
2022-04-13 18:28                 ` Sean Christopherson
2022-04-13 21:22                   ` David Matlack

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YlRhiF1O71TWQr5r@google.com \
    --to=seanjc@google.com \
    --cc=aleksandar.qemu.devel@gmail.com \
    --cc=aou@eecs.berkeley.edu \
    --cc=bgardon@google.com \
    --cc=chenhuacai@kernel.org \
    --cc=dmatlack@google.com \
    --cc=kvm-riscv@lists.infradead.org \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-mips@vger.kernel.org \
    --cc=maciej.szmigiero@oracle.com \
    --cc=maz@kernel.org \
    --cc=palmer@dabbelt.com \
    --cc=paul.walmsley@sifive.com \
    --cc=pbonzini@redhat.com \
    --cc=pfeiner@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).