From: Sean Christopherson <seanjc@google.com>
To: Peter Xu <peterx@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
Vitaly Kuznetsov <vkuznets@redhat.com>,
Wanpeng Li <wanpengli@tencent.com>,
Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/4] KVM: x86/mmu: Shrink pte_list_desc size when KVM is using TDP
Date: Tue, 12 Jul 2022 22:53:48 +0000
Message-ID: <Ys37fNK6uQ+YTcBh@google.com>
In-Reply-To: <Ys33RtxeDz0egEM0@xz-m1.local>
On Tue, Jul 12, 2022, Peter Xu wrote:
> On Fri, Jun 24, 2022 at 11:27:34PM +0000, Sean Christopherson wrote:
> > Dynamically size struct pte_list_desc's array of sptes based on whether
> > or not KVM is using TDP. Commit dc1cff969101 ("KVM: X86: MMU: Tune
> > PTE_LIST_EXT to be bigger") bumped the number of entries in order to
> > improve performance when using shadow paging, but its analysis that the
> > larger size would not affect TDP was wrong. Consuming pte_list_desc
> > objects for nested TDP is indeed rare, but _allocating_ objects is not,
> > as KVM allocates 40 objects for each per-vCPU cache. Reducing the size
> > from 128 bytes to 32 bytes reduces that per-vCPU cost from 5120 bytes to
> > 1280 bytes, and also provides similar savings when eager page splitting for
> > nested MMUs kicks in.
> >
> > The per-vCPU overhead could be further reduced by using a custom, smaller
> > capacity for the per-vCPU caches, but that's more of an "and" than
> > an "or" change, e.g. it wouldn't help the eager page split use case.
> >
> > Set the list size to the bare minimum without completely defeating the
> > purpose of an array (and because pte_list_add() assumes the array is at
> > least two entries deep). A larger size, e.g. 4, would reduce the number
> > of "allocations", but those "allocations" only become allocations in
> > truth if a single vCPU depletes its cache to where a topup is needed,
> > i.e. if a single vCPU "allocates" 30+ lists. Conversely, those 2 extra
> > entries consume 16 bytes * 40 * nr_vcpus in the caches the instant nested
> > TDP is used.
> >
> > In the unlikely event that performance of aliased gfns for nested TDP
> > really is (or becomes) a priority for oddball workloads, KVM could add a
> > knob to let the admin tune the array size for their environment.
> >
> > Note, KVM also unnecessarily tops up the per-vCPU caches even when not
> > using rmaps; this can also be addressed separately.
>
> The only possible way of using pte_list_desc when tdp=1 is when the
> hypervisor tries to map the same host pages with different GPAs?
Yes, if by "host pages" you mean L1 GPAs. It happens if the L1 VMM maps multiple
L2 GFNs to a single L1 GFN, in which case KVM's nTDP shadow MMU needs to rmap
that single L1 GFN to multiple L2 GFNs.
> And we don't really have a real use case of that, or.. do we?
QEMU does it during boot/pre-boot when BIOS remaps the flash region into the lower
1MB, i.e. aliases high GPAs to low GPAs.
> Sorry to start with asking questions, it's just that if we know that
> pte_list_desc is probably not gonna be used then could we simply skip the
> cache layer as a whole? IOW, we don't make the "array size of pte list
> desc" dynamic, instead we make the whole "pte list desc cache layer"
> dynamic. Is it possible?
Not really? It's theoretically possible, but it'd require pre-checking that there
aren't aliases, and to do that race-free we'd have to do it under mmu_lock, which means
having to support bailing from the page fault to top up the cache. The memory
overhead for the cache isn't so significant that it's worth that level of complexity.