From: Sean Christopherson <seanjc@google.com>
To: Yong He <zhuangel570@gmail.com>
Cc: pbonzini@redhat.com, kvm@vger.kernel.org, wanpengli@tencent.com,
alexyonghe@tencent.com, junaids@google.com
Subject: Re: [PATCH 2/2] KVM: x86: introduce cache configurations for previous CR3s
Date: Tue, 29 Oct 2024 08:14:50 -0700 [thread overview]
Message-ID: <ZyD76t8kY3dvO6Yg@google.com> (raw)
In-Reply-To: <20241029031400.622854-3-alexyonghe@tencent.com>
On Tue, Oct 29, 2024, Yong He wrote:
> From: Yong He <alexyonghe@tencent.com>
>
> Introduce a prev_roots_num param, so that we can cache more
> previous CR3/root_hpa pairs, which helps us reduce shadow
> page table eviction and rebuild overhead.
>
> Signed-off-by: Yong He <alexyonghe@tencent.com>
> ---
...
> +uint __read_mostly prev_roots_num = KVM_MMU_NUM_PREV_ROOTS;
> +EXPORT_SYMBOL_GPL(prev_roots_num);
> +module_param_cb(prev_roots_num, &prev_roots_num_ops,
> +		&prev_roots_num, 0644);
Allowing the variable to be changed while KVM is running is unsafe.
I also think a module param is the wrong way to try to allow for bigger caches.
The caches themselves are relatively cheap, at 16 bytes per entry. And I doubt
the cost of searching a larger cache in fast_pgd_switch() would have a measurable
impact, since the most recently used roots will be at the front of the cache,
i.e. only near-misses and misses will be affected.
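The MRU-ordering argument can be modeled in plain C. This is a userspace sketch, not KVM's actual fast_pgd_switch(); the names (cached_root, search_prev_roots, NUM_PREV_ROOTS) are illustrative. The point it demonstrates: because a hit is rotated to the front, the hottest roots are found after one or two comparisons, so growing the cache mostly lengthens the walk only for near-misses and misses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a prev_roots entry: ~16 bytes each. */
struct cached_root {
	uint64_t pgd;      /* guest CR3 */
	uint64_t root_hpa; /* host physical address of the root */
};

#define NUM_PREV_ROOTS 16
static struct cached_root prev_roots[NUM_PREV_ROOTS];

/*
 * Linear search kept in MRU order: a hit is rotated to slot 0, shifting
 * the intervening entries down one slot.  Returns the cached root_hpa,
 * or 0 on a miss.
 */
static uint64_t search_prev_roots(uint64_t pgd)
{
	for (size_t i = 0; i < NUM_PREV_ROOTS; i++) {
		if (prev_roots[i].pgd != pgd)
			continue;

		struct cached_root hit = prev_roots[i];

		/* Move entries 0..i-1 down one slot, put the hit first. */
		memmove(&prev_roots[1], &prev_roots[0], i * sizeof(hit));
		prev_roots[0] = hit;
		return hit.root_hpa;
	}
	return 0;
}
```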
The only potential downside to larger caches I can think of is that keeping
root_count elevated would make it more difficult to reclaim shadow pages from
roots that are no longer relevant to the guest. kvm_mmu_zap_oldest_mmu_pages()
in particular would refuse to reclaim roots. That shouldn't be problematic for
legacy shadow paging, because KVM doesn't recursively zap shadow pages. But for
nested TDP, mmu_page_zap_pte() frees the entire tree, in the common case that
child SPTEs aren't shared across multiple trees (common in legacy shadow paging,
extremely uncommon in nested TDP).
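The reclaim interaction can be illustrated with a toy model. This is userspace C, not kernel code; toy_sp and zap_oldest only mimic the shape of kvm_mmu_zap_oldest_mmu_pages() skipping pages whose root_count is elevated. With more roots cached, more of the oldest-first list becomes unreclaimable this way.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a shadow page on the active-pages list. */
struct toy_sp {
	int root_count; /* > 0 while some vCPU still caches this root */
	int zapped;
};

/*
 * Mimics the oldest-first reclaim walk: zap until nr_to_zap pages are
 * freed, but refuse any page that is still in use as a cached root.
 */
static int zap_oldest(struct toy_sp *pages, size_t n, int nr_to_zap)
{
	int zapped = 0;

	for (size_t i = 0; i < n && zapped < nr_to_zap; i++) {
		if (pages[i].root_count > 0)
			continue; /* cached in someone's prev_roots, skip */
		pages[i].zapped = 1;
		zapped++;
	}
	return zapped;
}
```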
And for the nested TDP issue, if it's actually a problem, I would *love* to
solve that problem by making KVM's forced reclaim more sophisticated. E.g. one
idea would be to kick all vCPUs if the maximum number of pages has been reached,
have each vCPU purge old roots from prev_roots, and then reclaim unused roots.
It would be a bit more complicated than that, as KVM would need a way to ensure
forward progress, e.g. if the shadow pages limit has been reached with a single
root. But even then, kvm_mmu_zap_oldest_mmu_pages() could be made a _lot_ smarter.
TL;DR: what if we simply bump the number of cached roots to ~16?
Thread overview: 8+ messages
2024-10-29 3:13 [PATCH 0/2] Introduce configuration for LRU cache of previous CR3s Yong He
2024-10-29 3:13 ` [PATCH 1/2] KVM: x86: expand the " Yong He
2024-10-29 14:40 ` Sean Christopherson
2024-10-30 12:08 ` zhuangel570
2024-10-29 3:14 ` [PATCH 2/2] KVM: x86: introduce cache configurations for " Yong He
2024-10-29 15:14 ` Sean Christopherson [this message]
2024-10-30 12:51 ` zhuangel570
2024-11-06 1:42 ` Sean Christopherson