From: Sean Christopherson <seanjc@google.com>
To: Yan Zhao <yan.y.zhao@intel.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
pbonzini@redhat.com, chao.gao@intel.com, kai.huang@intel.com,
robert.hoo.linux@gmail.com, yuan.yao@linux.intel.com
Subject: Re: [PATCH v4 09/12] KVM: x86/mmu: serialize vCPUs to zap gfn when guest MTRRs are honored
Date: Fri, 25 Aug 2023 15:47:11 -0700 [thread overview]
Message-ID: <ZOkvbzR0Sft1lnD1@google.com> (raw)
In-Reply-To: <20230714065454.20688-1-yan.y.zhao@intel.com>
On Fri, Jul 14, 2023, Yan Zhao wrote:
> +/*
> + * Add @range into kvm->arch.mtrr_zap_list and sort the list in
> + * "length" ascending + "start" descending order, so that
> + * ranges consuming more zap cycles can be dequeued later and their
> + * chances of being found duplicated are increased.
Wrap comments as close to 80 chars as possible.
> + */
> +static void kvm_add_mtrr_zap_list(struct kvm *kvm, struct mtrr_zap_range *range)
> +{
> + struct list_head *head = &kvm->arch.mtrr_zap_list;
> + u64 len = range->end - range->start;
> + struct mtrr_zap_range *cur, *n;
> + bool added = false;
> +
> + spin_lock(&kvm->arch.mtrr_zap_list_lock);
> +
> + if (list_empty(head)) {
> + list_add(&range->node, head);
> + spin_unlock(&kvm->arch.mtrr_zap_list_lock);
> + return;
Make this
goto out;
or
goto out_unlock;
and then do the same instead of the break; in the loop. Then "added" goes away
and there's a single unlock.
> + }
> +
> + list_for_each_entry_safe(cur, n, head, node) {
This shouldn't need to use the _safe() variant, it's not deleting anything.
> + u64 cur_len = cur->end - cur->start;
> +
> + if (len < cur_len)
> + break;
> +
> + if (len > cur_len)
> + continue;
> +
> + if (range->start > cur->start)
> + break;
> +
> + if (range->start < cur->start)
> + continue;
Looking at kvm_zap_mtrr_zap_list(), wouldn't we be better off sorting by start,
and then batching in kvm_zap_mtrr_zap_list()? And maybe make the batching "fuzzy"
for fixed MTRRs? I.e. if KVM is zapping any fixed MTRRs, zap all fixed MTRR ranges
even if there's a gap.
> +
> + /* equal len & start, no need to add */
> + added = true;
> + kfree(range);
Hmm, the memory allocations are a bit of complexity that'd I'd prefer to avoid.
At a minimum, I think kvm_add_mtrr_zap_list() should do the allocation. That'll
dedup a decount amount of code.
At the risk of rehashing the old memslots implementation, I think we should simply
have a statically sized array in struct kvm to hold "range to zap". E.g. use 16
entries, bin all fixed MTRRs into a single range, and if the remaining 15 fill up,
purge and fall back to a full zap.
128 bytes per VM is totally acceptable, especially since we're burning waaay
more than that to deal with per-vCPU MTRRs. And a well-behaved guest should have
identical MTRRs across all vCPUs, or maybe at worst one config for the BSP and
one for APs.
> + break;
> + }
> +
> + if (!added)
> + list_add_tail(&range->node, &cur->node);
> +
> + spin_unlock(&kvm->arch.mtrr_zap_list_lock);
> +}
> +
> +static void kvm_zap_mtrr_zap_list(struct kvm *kvm)
> +{
> + struct list_head *head = &kvm->arch.mtrr_zap_list;
> + struct mtrr_zap_range *cur = NULL;
> +
> + spin_lock(&kvm->arch.mtrr_zap_list_lock);
> +
> + while (!list_empty(head)) {
> + u64 start, end;
> +
> + cur = list_first_entry(head, typeof(*cur), node);
> + start = cur->start;
> + end = cur->end;
> + list_del(&cur->node);
> + kfree(cur);
Hmm, the memory allocations are a bit of complexity that'd I'd prefer to avoid.
> + spin_unlock(&kvm->arch.mtrr_zap_list_lock);
> +
> + kvm_zap_gfn_range(kvm, start, end);
> +
> + spin_lock(&kvm->arch.mtrr_zap_list_lock);
> + }
> +
> + spin_unlock(&kvm->arch.mtrr_zap_list_lock);
> +}
> +
> +static void kvm_zap_or_wait_mtrr_zap_list(struct kvm *kvm)
> +{
> + if (atomic_cmpxchg_acquire(&kvm->arch.mtrr_zapping, 0, 1) == 0) {
> + kvm_zap_mtrr_zap_list(kvm);
> + atomic_set_release(&kvm->arch.mtrr_zapping, 0);
> + return;
> + }
> +
> + while (atomic_read(&kvm->arch.mtrr_zapping))
> + cpu_relax();
> +}
> +
> +static void kvm_mtrr_zap_gfn_range(struct kvm_vcpu *vcpu,
> + gfn_t gfn_start, gfn_t gfn_end)
> +{
> + struct mtrr_zap_range *range;
> +
> + range = kmalloc(sizeof(*range), GFP_KERNEL_ACCOUNT);
> + if (!range)
> + goto fail;
> +
> + range->start = gfn_start;
> + range->end = gfn_end;
> +
> + kvm_add_mtrr_zap_list(vcpu->kvm, range);
> +
> + kvm_zap_or_wait_mtrr_zap_list(vcpu->kvm);
> + return;
> +
> +fail:
> + kvm_zap_gfn_range(vcpu->kvm, gfn_start, gfn_end);
> +}
> +
> +void kvm_honors_guest_mtrrs_zap_on_cd_toggle(struct kvm_vcpu *vcpu)
Rather than provide a one-liner, add something like
void kvm_mtrr_cr0_cd_changed(struct kvm_vcpu *vcpu)
{
if (!kvm_mmu_honors_guest_mtrrs(vcpu->kvm))
return;
return kvm_zap_gfn_range(vcpu, 0, -1ull);
}
that avoids the comically long function name, and keeps the MTRR logic more
contained in the MTRR code.
> +{
> + return kvm_mtrr_zap_gfn_range(vcpu, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
Meh, just zap 0 => ~0ull. That 51:0 happens to be the theoretical max gfn on
x86 is coincidence (AFAIK). And if the guest.MAXPHYADDR < 52, shifting ~0ull
still doesn't yield a "legal" gfn.
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 32cc8bfaa5f1..bb79154cf465 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -943,7 +943,7 @@ void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned lon
>
> if (((cr0 ^ old_cr0) & X86_CR0_CD) &&
> kvm_mmu_honors_guest_mtrrs(vcpu->kvm))
> - kvm_zap_gfn_range(vcpu->kvm, 0, ~0ULL);
> + kvm_honors_guest_mtrrs_zap_on_cd_toggle(vcpu);
> }
> EXPORT_SYMBOL_GPL(kvm_post_set_cr0);
>
> @@ -12310,6 +12310,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> kvm->arch.guest_can_read_msr_platform_info = true;
> kvm->arch.enable_pmu = enable_pmu;
>
> + spin_lock_init(&kvm->arch.mtrr_zap_list_lock);
> + INIT_LIST_HEAD(&kvm->arch.mtrr_zap_list);
> +
> #if IS_ENABLED(CONFIG_HYPERV)
> spin_lock_init(&kvm->arch.hv_root_tdp_lock);
> kvm->arch.hv_root_tdp = INVALID_PAGE;
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index e7733dc4dccc..56d8755b2560 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -315,6 +315,7 @@ bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu *vcpu, gfn_t gfn,
> int page_num);
> void kvm_honors_guest_mtrrs_get_cd_memtype(struct kvm_vcpu *vcpu,
> u8 *type, bool *ipat);
> +void kvm_honors_guest_mtrrs_zap_on_cd_toggle(struct kvm_vcpu *vcpu);
> bool kvm_vector_hashing_enabled(void);
> void kvm_fixup_and_inject_pf_error(struct kvm_vcpu *vcpu, gva_t gva, u16 error_code);
> int x86_decode_emulated_instruction(struct kvm_vcpu *vcpu, int emulation_type,
> --
> 2.17.1
>
next prev parent reply other threads:[~2023-08-25 22:48 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-14 6:46 [PATCH v4 00/12] KVM: x86/mmu: refine memtype related mmu zap Yan Zhao
2023-07-14 6:50 ` [PATCH v4 01/12] KVM: x86/mmu: helpers to return if KVM honors guest MTRRs Yan Zhao
2023-10-07 7:00 ` Like Xu
2023-10-09 19:52 ` Sean Christopherson
2023-10-09 21:27 ` Sean Christopherson
2023-10-09 21:36 ` Sean Christopherson
2023-10-10 3:46 ` Yan Zhao
2023-10-11 0:08 ` Sean Christopherson
2023-10-11 1:47 ` Yan Zhao
2023-07-14 6:50 ` [PATCH v4 02/12] KVM: x86/mmu: Use KVM honors guest MTRRs helper in kvm_tdp_page_fault() Yan Zhao
2023-07-14 6:51 ` [PATCH v4 03/12] KVM: x86/mmu: Use KVM honors guest MTRRs helper when CR0.CD toggles Yan Zhao
2023-07-14 6:51 ` [PATCH v4 04/12] KVM: x86/mmu: Use KVM honors guest MTRRs helper when update mtrr Yan Zhao
2023-07-14 6:52 ` [PATCH v4 05/12] KVM: x86/mmu: zap KVM TDP when noncoherent DMA assignment starts/stops Yan Zhao
2023-07-14 6:52 ` [PATCH v4 06/12] KVM: x86/mmu: move TDP zaps from guest MTRRs update to CR0.CD toggling Yan Zhao
2023-07-14 6:53 ` [PATCH v4 07/12] KVM: VMX: drop IPAT in memtype when CD=1 for KVM_X86_QUIRK_CD_NW_CLEARED Yan Zhao
2023-08-25 21:43 ` Sean Christopherson
2023-09-04 7:41 ` Yan Zhao
2023-07-14 6:53 ` [PATCH v4 08/12] KVM: x86: centralize code to get CD=1 memtype when guest MTRRs are honored Yan Zhao
2023-08-25 21:46 ` Sean Christopherson
2023-09-04 7:46 ` Yan Zhao
2023-07-14 6:54 ` [PATCH v4 09/12] KVM: x86/mmu: serialize vCPUs to zap gfn " Yan Zhao
2023-08-25 22:47 ` Sean Christopherson [this message]
2023-09-04 8:24 ` Yan Zhao
2023-07-14 6:55 ` [PATCH v4 10/12] KVM: x86/mmu: fine-grained gfn zap " Yan Zhao
2023-08-25 23:13 ` Sean Christopherson
2023-09-04 8:37 ` Yan Zhao
2023-07-14 6:56 ` [PATCH v4 11/12] KVM: x86/mmu: split a single gfn zap range " Yan Zhao
2023-08-25 23:15 ` Sean Christopherson
2023-09-04 8:39 ` Yan Zhao
2023-07-14 6:56 ` [PATCH v4 12/12] KVM: x86/mmu: convert kvm_zap_gfn_range() to use shared mmu_lock in TDP MMU Yan Zhao
2023-08-25 21:34 ` Sean Christopherson
2023-09-04 7:31 ` Yan Zhao
2023-09-05 22:31 ` Sean Christopherson
2023-09-06 0:50 ` Yan Zhao
2023-08-25 23:17 ` [PATCH v4 00/12] KVM: x86/mmu: refine memtype related mmu zap Sean Christopherson
2023-09-04 8:48 ` Yan Zhao
2023-10-05 1:29 ` Sean Christopherson
2023-10-05 2:19 ` Huang, Kai
2023-10-05 2:28 ` Sean Christopherson
2023-10-06 0:50 ` Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZOkvbzR0Sft1lnD1@google.com \
--to=seanjc@google.com \
--cc=chao.gao@intel.com \
--cc=kai.huang@intel.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=robert.hoo.linux@gmail.com \
--cc=yan.y.zhao@intel.com \
--cc=yuan.yao@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.