[PATCH 6/9] KVM: x86: do not report a vCPU as preempted outside instruction boundaries

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Rishabh Bhatnagar <risbhat@amazon.com>
To: <gregkh@linuxfoundation.org>, <stable@vger.kernel.org>
Cc: <lee@kernel.org>, <seanjc@google.com>, <kvm@vger.kernel.org>,
	<bp@alien8.de>, <mingo@redhat.com>, <tglx@linutronix.de>,
	<pbonzini@redhat.com>, <vkuznets@redhat.com>,
	<wanpengli@tencent.com>, <jmattson@google.com>, <joro@8bytes.org>,
	Jann Horn <jannh@google.com>,
	Rishabh Bhatnagar <risbhat@amazon.com>,
	Allen Pais <apais@linux.microsoft.com>
Subject: [PATCH 6/9] KVM: x86: do not report a vCPU as preempted outside instruction boundaries
Date: Wed, 10 May 2023 18:15:44 +0000	[thread overview]
Message-ID: <20230510181547.22451-7-risbhat@amazon.com> (raw)
In-Reply-To: <20230510181547.22451-1-risbhat@amazon.com>

From: Paolo Bonzini <pbonzini@redhat.com>

commit 6cd88243c7e03845a450795e134b488fc2afb736 upstream.

If a vCPU is outside guest mode and is scheduled out, it might be in the
process of making a memory access.  A problem occurs if another vCPU uses
the PV TLB flush feature during the period when the vCPU is scheduled
out, and a virtual address has already been translated but has not yet
been accessed, because this is equivalent to using a stale TLB entry.

To avoid this, only report a vCPU as preempted if sure that the guest
is at an instruction boundary.  A rescheduling request will be delivered
to the host physical CPU as an external interrupt, so for simplicity
consider any vmexit *not* instruction boundary except for external
interrupts.

It would in principle be okay to report the vCPU as preempted also
if it is sleeping in kvm_vcpu_block(): a TLB flush IPI will incur the
vmentry/vmexit overhead unnecessarily, and optimistic spinning is
also unlikely to succeed.  However, leave it for later because right
now kvm_vcpu_check_block() is doing memory accesses.  Even
though the TLB flush issue only applies to virtual memory address,
it's very much preferrable to be conservative.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[risbhat@amazon.com: Use VCPU_STAT to expose preemption_reported and
preemption_other debugfs entries.]
Signed-off-by: Rishabh Bhatnagar <risbhat@amazon.com>
Tested-by: Allen Pais <apais@linux.microsoft.com>
Acked-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  3 +++
 arch/x86/kvm/svm/svm.c          |  2 ++
 arch/x86/kvm/vmx/vmx.c          |  1 +
 arch/x86/kvm/x86.c              | 22 ++++++++++++++++++++++
 4 files changed, 28 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2452c28b3e69..3e9f1c820edb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -553,6 +553,7 @@ struct kvm_vcpu_arch {
 	u64 ia32_misc_enable_msr;
 	u64 smbase;
 	u64 smi_count;
+	bool at_instruction_boundary;
 	bool tpr_access_reporting;
 	bool xsaves_enabled;
 	u64 ia32_xss;
@@ -1061,6 +1062,8 @@ struct kvm_vcpu_stat {
 	u64 req_event;
 	u64 halt_poll_success_ns;
 	u64 halt_poll_fail_ns;
+	u64 preemption_reported;
+	u64 preemption_other;
 };
 
 struct x86_instruction_info;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 5775983fec56..7b2b61309d8a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3983,6 +3983,8 @@ static int svm_check_intercept(struct kvm_vcpu *vcpu,
 
 static void svm_handle_exit_irqoff(struct kvm_vcpu *vcpu)
 {
+	if (to_svm(vcpu)->vmcb->control.exit_code == SVM_EXIT_INTR)
+		vcpu->arch.at_instruction_boundary = true;
 }
 
 static void svm_sched_in(struct kvm_vcpu *vcpu, int cpu)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 2c5d8b9f9873..16943e923902 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6510,6 +6510,7 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
 		return;
 
 	handle_interrupt_nmi_irqoff(vcpu, gate_offset(desc));
+	vcpu->arch.at_instruction_boundary = true;
 }
 
 static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 80231f62e129..116a225fb26e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -231,6 +231,8 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	VCPU_STAT("l1d_flush", l1d_flush),
 	VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns),
 	VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns),
+	VCPU_STAT("preemption_reported", preemption_reported),
+	VCPU_STAT("preemption_other", preemption_other),
 	VM_STAT("mmu_shadow_zapped", mmu_shadow_zapped),
 	VM_STAT("mmu_pte_write", mmu_pte_write),
 	VM_STAT("mmu_pde_zapped", mmu_pde_zapped),
@@ -4095,6 +4097,19 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
 	struct kvm_memslots *slots;
 	static const u8 preempted = KVM_VCPU_PREEMPTED;
 
+	/*
+	 * The vCPU can be marked preempted if and only if the VM-Exit was on
+	 * an instruction boundary and will not trigger guest emulation of any
+	 * kind (see vcpu_run).  Vendor specific code controls (conservatively)
+	 * when this is true, for example allowing the vCPU to be marked
+	 * preempted if and only if the VM-Exit was due to a host interrupt.
+	 */
+	if (!vcpu->arch.at_instruction_boundary) {
+		vcpu->stat.preemption_other++;
+		return;
+	}
+
+	vcpu->stat.preemption_reported++;
 	if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
 		return;
 
@@ -9399,6 +9414,13 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
 	vcpu->arch.l1tf_flush_l1d = true;
 
 	for (;;) {
+		/*
+		 * If another guest vCPU requests a PV TLB flush in the middle
+		 * of instruction emulation, the rest of the emulation could
+		 * use a stale page translation. Assume that any code after
+		 * this point can start executing an instruction.
+		 */
+		vcpu->arch.at_instruction_boundary = false;
 		if (kvm_vcpu_running(vcpu)) {
 			r = vcpu_enter_guest(vcpu);
 		} else {
-- 
2.39.2

next prev parent reply	other threads:[~2023-05-10 18:16 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-10 18:15 [PATCH 0/9] KVM backports to 5.10 Rishabh Bhatnagar
2023-05-10 18:15 ` [PATCH 1/9] KVM: x86: Ensure PV TLB flush tracepoint reflects KVM behavior Rishabh Bhatnagar
2023-05-10 18:15 ` [PATCH 2/9] KVM: x86: Fix recording of guest steal time / preempted status Rishabh Bhatnagar
2023-05-10 18:15 ` [PATCH 3/9] KVM: Fix steal time asm constraints Rishabh Bhatnagar
2023-05-10 18:15 ` [PATCH 4/9] KVM: x86: Remove obsolete disabling of page faults in kvm_arch_vcpu_put() Rishabh Bhatnagar
2023-05-10 18:15 ` [PATCH 5/9] KVM: x86: do not set st->preempted when going back to user space Rishabh Bhatnagar
2023-05-10 18:15 ` Rishabh Bhatnagar [this message]
2023-05-10 18:15 ` [PATCH 7/9] KVM: x86: revalidate steal time cache if MSR value changes Rishabh Bhatnagar
2023-05-10 18:15 ` [PATCH 8/9] KVM: x86: do not report preemption if the steal time cache is stale Rishabh Bhatnagar
2023-05-10 18:15 ` [PATCH 9/9] KVM: x86: move guest_pv_has out of user_access section Rishabh Bhatnagar
2023-05-15 12:47 ` [PATCH 0/9] KVM backports to 5.10 Greg KH
  -- strict thread matches above, loose matches on Subject: below --
2022-09-09 18:55 Rishabh Bhatnagar
2022-09-09 18:55 ` [PATCH 6/9] KVM: x86: do not report a vCPU as preempted outside instruction boundaries Rishabh Bhatnagar
2022-09-09 18:13 [PATCH 0/9] KVM backports to 5.10 Rishabh Bhatnagar
2022-09-09 18:13 ` [PATCH 6/9] KVM: x86: do not report a vCPU as preempted outside instruction boundaries Rishabh Bhatnagar

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:2452c28b3e6 dfblob:3e9f1c820ed dfblob:5775983fec5
dfblob:7b2b61309d8 dfblob:2c5d8b9f987 dfblob:16943e92390
dfblob:80231f62e12 dfblob:116a225fb26 )
 OR (
bs:"[PATCH 6/9] KVM: x86: do not report a vCPU as preempted outside instruction boundaries" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230510181547.22451-7-risbhat@amazon.com \
    --to=risbhat@amazon.com \
    --cc=apais@linux.microsoft.com \
    --cc=bp@alien8.de \
    --cc=gregkh@linuxfoundation.org \
    --cc=jannh@google.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=lee@kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.