From: Sean Christopherson <seanjc@google.com>
To: Yan Zhao <yan.y.zhao@intel.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
pbonzini@redhat.com, chao.gao@intel.com, kai.huang@intel.com,
robert.hoo.linux@gmail.com, yuan.yao@linux.intel.com
Subject: Re: [PATCH v4 09/12] KVM: x86/mmu: serialize vCPUs to zap gfn when guest MTRRs are honored
Date: Fri, 25 Aug 2023 15:47:11 -0700 [thread overview]
Message-ID: <ZOkvbzR0Sft1lnD1@google.com> (raw)
In-Reply-To: <20230714065454.20688-1-yan.y.zhao@intel.com>
On Fri, Jul 14, 2023, Yan Zhao wrote:
> +/*
> + * Add @range into kvm->arch.mtrr_zap_list and sort the list in
> + * "length" ascending + "start" descending order, so that
> + * ranges consuming more zap cycles can be dequeued later and their
> + * chances of being found duplicated are increased.
Wrap comments as close to 80 chars as possible.
> + */
> +static void kvm_add_mtrr_zap_list(struct kvm *kvm, struct mtrr_zap_range *range)
> +{
> + struct list_head *head = &kvm->arch.mtrr_zap_list;
> + u64 len = range->end - range->start;
> + struct mtrr_zap_range *cur, *n;
> + bool added = false;
> +
> + spin_lock(&kvm->arch.mtrr_zap_list_lock);
> +
> + if (list_empty(head)) {
> + list_add(&range->node, head);
> + spin_unlock(&kvm->arch.mtrr_zap_list_lock);
> + return;
Make this
goto out;
or
goto out_unlock;
and then do the same instead of the break; in the loop. Then "added" goes away
and there's a single unlock.
> + }
> +
> + list_for_each_entry_safe(cur, n, head, node) {
This shouldn't need to use the _safe() variant, it's not deleting anything.
> + u64 cur_len = cur->end - cur->start;
> +
> + if (len < cur_len)
> + break;
> +
> + if (len > cur_len)
> + continue;
> +
> + if (range->start > cur->start)
> + break;
> +
> + if (range->start < cur->start)
> + continue;
Looking at kvm_zap_mtrr_zap_list(), wouldn't we be better off sorting by start,
and then batching in kvm_zap_mtrr_zap_list()? And maybe make the batching "fuzzy"
for fixed MTRRs? I.e. if KVM is zapping any fixed MTRRs, zap all fixed MTRR ranges
even if there's a gap.
> +
> + /* equal len & start, no need to add */
> + added = true;
> + kfree(range);
Hmm, the memory allocations are a bit of complexity that'd I'd prefer to avoid.
At a minimum, I think kvm_add_mtrr_zap_list() should do the allocation. That'll
dedup a decount amount of code.
At the risk of rehashing the old memslots implementation, I think we should simply
have a statically sized array in struct kvm to hold "range to zap". E.g. use 16
entries, bin all fixed MTRRs into a single range, and if the remaining 15 fill up,
purge and fall back to a full zap.
128 bytes per VM is totally acceptable, especially since we're burning waaay
more than that to deal with per-vCPU MTRRs. And a well-behaved guest should have
identical MTRRs across all vCPUs, or maybe at worst one config for the BSP and
one for APs.
> + break;
> + }
> +
> + if (!added)
> + list_add_tail(&range->node, &cur->node);
> +
> + spin_unlock(&kvm->arch.mtrr_zap_list_lock);
> +}
> +
> +static void kvm_zap_mtrr_zap_list(struct kvm *kvm)
> +{
> + struct list_head *head = &kvm->arch.mtrr_zap_list;
> + struct mtrr_zap_range *cur = NULL;
> +
> + spin_lock(&kvm->arch.mtrr_zap_list_lock);
> +
> + while (!list_empty(head)) {
> + u64 start, end;
> +
> + cur = list_first_entry(head, typeof(*cur), node);
> + start = cur->start;
> + end = cur->end;
> + list_del(&cur->node);
> + kfree(cur);
Hmm, the memory allocations are a bit of complexity that'd I'd prefer to avoid.
> + spin_unlock(&kvm->arch.mtrr_zap_list_lock);
> +
> + kvm_zap_gfn_range(kvm, start, end);
> +
> + spin_lock(&kvm->arch.mtrr_zap_list_lock);
> + }
> +
> + spin_unlock(&kvm->arch.mtrr_zap_list_lock);
> +}
> +
> +static void kvm_zap_or_wait_mtrr_zap_list(struct kvm *kvm)
> +{
> + if (atomic_cmpxchg_acquire(&kvm->arch.mtrr_zapping, 0, 1) == 0) {
> + kvm_zap_mtrr_zap_list(kvm);
> + atomic_set_release(&kvm->arch.mtrr_zapping, 0);
> + return;
> + }
> +
> + while (atomic_read(&kvm->arch.mtrr_zapping))
> + cpu_relax();
> +}
> +
> +static void kvm_mtrr_zap_gfn_range(struct kvm_vcpu *vcpu,
> + gfn_t gfn_start, gfn_t gfn_end)
> +{
> + struct mtrr_zap_range *range;
> +
> + range = kmalloc(sizeof(*range), GFP_KERNEL_ACCOUNT);
> + if (!range)
> + goto fail;
> +
> + range->start = gfn_start;
> + range->end = gfn_end;
> +
> + kvm_add_mtrr_zap_list(vcpu->kvm, range);
> +
> + kvm_zap_or_wait_mtrr_zap_list(vcpu->kvm);
> + return;
> +
> +fail:
> + kvm_zap_gfn_range(vcpu->kvm, gfn_start, gfn_end);
> +}
> +
> +void kvm_honors_guest_mtrrs_zap_on_cd_toggle(struct kvm_vcpu *vcpu)
Rather than provide a one-liner, add something like
void kvm_mtrr_cr0_cd_changed(struct kvm_vcpu *vcpu)
{
if (!kvm_mmu_honors_guest_mtrrs(vcpu->kvm))
return;
return kvm_zap_gfn_range(vcpu, 0, -1ull);
}
that avoids the comically long function name, and keeps the MTRR logic more
contained in the MTRR code.
> +{
> + return kvm_mtrr_zap_gfn_range(vcpu, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
Meh, just zap 0 => ~0ull. That 51:0 happens to be the theoretical max gfn on
x86 is coincidence (AFAIK). And if the guest.MAXPHYADDR < 52, shifting ~0ull
still doesn't yield a "legal" gfn.
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 32cc8bfaa5f1..bb79154cf465 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -943,7 +943,7 @@ void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned lon
>
> if (((cr0 ^ old_cr0) & X86_CR0_CD) &&
> kvm_mmu_honors_guest_mtrrs(vcpu->kvm))
> - kvm_zap_gfn_range(vcpu->kvm, 0, ~0ULL);
> + kvm_honors_guest_mtrrs_zap_on_cd_toggle(vcpu);
> }
> EXPORT_SYMBOL_GPL(kvm_post_set_cr0);
>
> @@ -12310,6 +12310,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> kvm->arch.guest_can_read_msr_platform_info = true;
> kvm->arch.enable_pmu = enable_pmu;
>
> + spin_lock_init(&kvm->arch.mtrr_zap_list_lock);
> + INIT_LIST_HEAD(&kvm->arch.mtrr_zap_list);
> +
> #if IS_ENABLED(CONFIG_HYPERV)
> spin_lock_init(&kvm->arch.hv_root_tdp_lock);
> kvm->arch.hv_root_tdp = INVALID_PAGE;
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index e7733dc4dccc..56d8755b2560 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -315,6 +315,7 @@ bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu *vcpu, gfn_t gfn,
> int page_num);
> void kvm_honors_guest_mtrrs_get_cd_memtype(struct kvm_vcpu *vcpu,
> u8 *type, bool *ipat);
> +void kvm_honors_guest_mtrrs_zap_on_cd_toggle(struct kvm_vcpu *vcpu);
> bool kvm_vector_hashing_enabled(void);
> void kvm_fixup_and_inject_pf_error(struct kvm_vcpu *vcpu, gva_t gva, u16 error_code);
> int x86_decode_emulated_instruction(struct kvm_vcpu *vcpu, int emulation_type,
> --
> 2.17.1
>
next prev parent reply other threads:[~2023-08-25 22:48 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-14 6:46 [PATCH v4 00/12] KVM: x86/mmu: refine memtype related mmu zap Yan Zhao
2023-07-14 6:50 ` [PATCH v4 01/12] KVM: x86/mmu: helpers to return if KVM honors guest MTRRs Yan Zhao
2023-10-07 7:00 ` Like Xu
2023-10-09 19:52 ` Sean Christopherson
2023-10-09 21:27 ` Sean Christopherson
2023-10-09 21:36 ` Sean Christopherson
2023-10-10 3:46 ` Yan Zhao
2023-10-11 0:08 ` Sean Christopherson
2023-10-11 1:47 ` Yan Zhao
2023-07-14 6:50 ` [PATCH v4 02/12] KVM: x86/mmu: Use KVM honors guest MTRRs helper in kvm_tdp_page_fault() Yan Zhao
2023-07-14 6:51 ` [PATCH v4 03/12] KVM: x86/mmu: Use KVM honors guest MTRRs helper when CR0.CD toggles Yan Zhao
2023-07-14 6:51 ` [PATCH v4 04/12] KVM: x86/mmu: Use KVM honors guest MTRRs helper when update mtrr Yan Zhao
2023-07-14 6:52 ` [PATCH v4 05/12] KVM: x86/mmu: zap KVM TDP when noncoherent DMA assignment starts/stops Yan Zhao
2023-07-14 6:52 ` [PATCH v4 06/12] KVM: x86/mmu: move TDP zaps from guest MTRRs update to CR0.CD toggling Yan Zhao
2023-07-14 6:53 ` [PATCH v4 07/12] KVM: VMX: drop IPAT in memtype when CD=1 for KVM_X86_QUIRK_CD_NW_CLEARED Yan Zhao
2023-08-25 21:43 ` Sean Christopherson
2023-09-04 7:41 ` Yan Zhao
2023-07-14 6:53 ` [PATCH v4 08/12] KVM: x86: centralize code to get CD=1 memtype when guest MTRRs are honored Yan Zhao
2023-08-25 21:46 ` Sean Christopherson
2023-09-04 7:46 ` Yan Zhao
2023-07-14 6:54 ` [PATCH v4 09/12] KVM: x86/mmu: serialize vCPUs to zap gfn " Yan Zhao
2023-08-25 22:47 ` Sean Christopherson [this message]
2023-09-04 8:24 ` Yan Zhao
2023-07-14 6:55 ` [PATCH v4 10/12] KVM: x86/mmu: fine-grained gfn zap " Yan Zhao
2023-08-25 23:13 ` Sean Christopherson
2023-09-04 8:37 ` Yan Zhao
2023-07-14 6:56 ` [PATCH v4 11/12] KVM: x86/mmu: split a single gfn zap range " Yan Zhao
2023-08-25 23:15 ` Sean Christopherson
2023-09-04 8:39 ` Yan Zhao
2023-07-14 6:56 ` [PATCH v4 12/12] KVM: x86/mmu: convert kvm_zap_gfn_range() to use shared mmu_lock in TDP MMU Yan Zhao
2023-08-25 21:34 ` Sean Christopherson
2023-09-04 7:31 ` Yan Zhao
2023-09-05 22:31 ` Sean Christopherson
2023-09-06 0:50 ` Yan Zhao
2023-08-25 23:17 ` [PATCH v4 00/12] KVM: x86/mmu: refine memtype related mmu zap Sean Christopherson
2023-09-04 8:48 ` Yan Zhao
2023-10-05 1:29 ` Sean Christopherson
2023-10-05 2:19 ` Huang, Kai
2023-10-05 2:28 ` Sean Christopherson
2023-10-06 0:50 ` Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZOkvbzR0Sft1lnD1@google.com \
--to=seanjc@google.com \
--cc=chao.gao@intel.com \
--cc=kai.huang@intel.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=robert.hoo.linux@gmail.com \
--cc=yan.y.zhao@intel.com \
--cc=yuan.yao@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).