From: Sean Christopherson <seanjc@google.com>
To: David Matlack <dmatlack@google.com>
Cc: Peter Xu <peterx@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org, Ben Gardon <bgardon@google.com>,
	Joerg Roedel <joro@8bytes.org>, Jim Mattson <jmattson@google.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Janis Schoetterl-Glausch <scgl@linux.vnet.ibm.com>,
	Junaid Shahid <junaids@google.com>,
	Oliver Upton <oupton@google.com>,
	Harish Barathvajasankar <hbarath@google.com>,
	Peter Shier <pshier@google.com>
Subject: Re: [RFC PATCH 12/15] KVM: x86/mmu: Split large pages when dirty logging is enabled
Date: Thu, 2 Dec 2021 18:42:54 +0000
Message-ID: <YakTrkA6xzD5dzyN@google.com>
In-Reply-To: <CALzav=dDEhU3uN9CofYQqCukT3QJUm+pjRz2WTr-Ss9TNVBgLg@mail.gmail.com>

On Thu, Dec 02, 2021, David Matlack wrote:
> On Wed, Dec 1, 2021 at 3:37 PM Sean Christopherson <seanjc@google.com> wrote:
> > It's not just the extra objects, it's the overall complexity that bothers me.
> > Complexity isn't really the correct word; it's more that, as written, the logic
> > is spread over several files and is disingenuous in that the split_cache lives
> > in kvm->arch, which implies persistence, but the caches are completely torn
> > down after every memslot split.
> >
> > I suspect part of the problem is that the code is trying to plan for a future
> > where nested MMUs also support splitting large pages.  Usually I'm all for that
> > sort of thing, but in this case it creates a lot of APIs that should not exist,
> > either because the function is not needed at all, or because it's a helper buried
> > in tdp_mmu.c.  E.g. assert_split_caches_invariants() is overkill.
> >
> > That's solvable by refactoring and shuffling code, but using kvm_mmu_memory_cache
> > still feels wrong.  The caches don't fully solve the might_sleep() problem since
> > the loop still has to drop mmu_lock purely because it needs to allocate memory,
> 
> I thought dropping the lock to allocate memory was a good thing. It
> reduces the length of time we hold the RCU read lock and mmu_lock in
> read mode. Plus it avoids the retry-with-reclaim and lets us reuse the
> existing sp allocation code.

It's not a simple reuse though, e.g. it needs new logic to detect when the caches
are empty, requires a variant of tdp_mmu_iter_cond_resched(), needs its own
instances of the caches and thus initialization/destruction of those caches, etc...
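
Just to illustrate the kind of glue the cache-based approach ends up needing
(very rough sketch, the helpers and fields below are made up for illustration
and don't exist in KVM or in this series):

  /*
   * Sketch only: bail out of the walk and drop mmu_lock whenever the split
   * caches run dry, refill them with allocations that may sleep, then
   * restart the walk from the root.
   */
  static bool split_caches_need_topup(struct kvm *kvm)
  {
          /* Hypothetical fields; check every cache the split path pulls from. */
          return kvm->arch.split_page_header_cache.nobjs == 0 ||
                 kvm->arch.split_shadow_page_cache.nobjs == 0;
  }

  ...

  if (split_caches_need_topup(kvm)) {
          rcu_read_unlock();
          read_unlock(&kvm->mmu_lock);

          r = topup_split_caches(kvm);            /* GFP_KERNEL, may sleep */

          read_lock(&kvm->mmu_lock);
          rcu_read_lock();
          if (r)
                  break;                          /* OOM, give up on splitting */
          goto restart;                           /* walk starts over at the root */
  }

And that's on top of allocating and freeing the caches themselves around each
memslot split.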

> Eager page splitting itself does not need to be that performant since
> it's not on the critical path of vCPU execution. But holding the MMU
> lock can negatively affect vCPU performance.
> 
> But your preference is to allocate without dropping the lock when possible. Why?

Because they're two different things.  Lock contention is already handled by
tdp_mmu_iter_cond_resched().  If mmu_lock is not contended, holding it for a long
duration is a complete non-issue.

Dropping mmu_lock means restarting the walk at the root because a different task
may have zapped/changed upper level entries.  If every allocation is dropping
mmu_lock, that adds up to a lot of extra memory accesses, especially when using
5-level paging.

Batching allocations via mmu_caches mostly works around that problem, but IMO
it's more complex overall than the retry-on-failure approach because it bleeds
core details into several locations, e.g. the split logic needs to know intimate
details of kvm_mmu_memory_cache, and we end up with two (or one complex) versions
of tdp_mmu_iter_cond_resched().
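
For comparison, the retry-on-failure flow I'm thinking of is roughly the below
(again a rough sketch; alloc_sp_for_split() is a stand-in for whatever the
allocation helper ends up being named):

  /*
   * Sketch only: try to allocate without dropping mmu_lock.  GFP_NOWAIT
   * never sleeps and never triggers direct reclaim, so it's safe while
   * holding the lock.  Only if that fails, drop the lock, retry the
   * allocation with GFP_KERNEL_ACCOUNT (which can reclaim and sleep), and
   * resume the walk, i.e. pay the restart-at-root cost only in the rare
   * failure case.
   */
  sp = alloc_sp_for_split(GFP_NOWAIT | __GFP_ACCOUNT);
  if (!sp) {
          rcu_read_unlock();
          read_unlock(&kvm->mmu_lock);

          sp = alloc_sp_for_split(GFP_KERNEL_ACCOUNT);

          read_lock(&kvm->mmu_lock);
          rcu_read_lock();

          if (!sp)
                  break;          /* truly OOM, bail on splitting */

          goto restart;           /* resume the walk from the root */
  }

The common case never drops mmu_lock and never restarts the walk, and the
failure case degrades to exactly what the cache approach does on every topup.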

In general, I also dislike relying on magic numbers (the capacity of the cache)
for performance.  At best, we have to justify the magic number, now and in the
future.  At worst, someone will have a use case that doesn't play nice with KVM's
chosen magic number and then we have to do more tuning, e.g. see the PTE prefetch
stuff where the magic number of '8' (well, 7) ran out of gas for modern usage.
I don't actually think tuning will be problematic for this case, but I'd rather
avoid the discussion entirely if possible.

I'm not completely opposed to using kvm_mmu_memory_cache to batch allocations,
but I think we should do so if and only if batching has measurably better
performance for things we care about.  E.g. if eager splitting takes n% longer
under heavy memory pressure, but vCPUs aren't impacted, do we care?

