All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Jann Horn <jannh@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Sasha Levin <sashal@kernel.org>
Subject: [PATCH 5.18 08/35] KVM: x86: do not report a vCPU as preempted outside instruction boundaries
Date: Tue,  9 Aug 2022 20:00:37 +0200	[thread overview]
Message-ID: <20220809175515.374218393@linuxfoundation.org> (raw)
In-Reply-To: <20220809175515.046484486@linuxfoundation.org>

From: Paolo Bonzini <pbonzini@redhat.com>

[ Upstream commit 6cd88243c7e03845a450795e134b488fc2afb736 ]

If a vCPU is outside guest mode and is scheduled out, it might be in the
process of making a memory access.  A problem occurs if another vCPU uses
the PV TLB flush feature during the period when the vCPU is scheduled
out, and a virtual address has already been translated but has not yet
been accessed, because this is equivalent to using a stale TLB entry.

To avoid this, only report a vCPU as preempted if sure that the guest
is at an instruction boundary.  A rescheduling request will be delivered
to the host physical CPU as an external interrupt, so for simplicity
consider any vmexit *not* instruction boundary except for external
interrupts.

It would in principle be okay to report the vCPU as preempted also
if it is sleeping in kvm_vcpu_block(): a TLB flush IPI will incur the
vmentry/vmexit overhead unnecessarily, and optimistic spinning is
also unlikely to succeed.  However, leave it for later because right
now kvm_vcpu_check_block() is doing memory accesses.  Even
though the TLB flush issue only applies to virtual memory address,
it's very much preferrable to be conservative.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/include/asm/kvm_host.h |  3 +++
 arch/x86/kvm/svm/svm.c          |  2 ++
 arch/x86/kvm/vmx/vmx.c          |  1 +
 arch/x86/kvm/x86.c              | 22 ++++++++++++++++++++++
 4 files changed, 28 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4ff36610af6a..9fdaa847d4b6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -651,6 +651,7 @@ struct kvm_vcpu_arch {
 	u64 ia32_misc_enable_msr;
 	u64 smbase;
 	u64 smi_count;
+	bool at_instruction_boundary;
 	bool tpr_access_reporting;
 	bool xsaves_enabled;
 	bool xfd_no_write_intercept;
@@ -1289,6 +1290,8 @@ struct kvm_vcpu_stat {
 	u64 nested_run;
 	u64 directed_yield_attempted;
 	u64 directed_yield_successful;
+	u64 preemption_reported;
+	u64 preemption_other;
 	u64 guest_mode;
 };
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 6bfb0b0e66bd..c667214c630b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4166,6 +4166,8 @@ static int svm_check_intercept(struct kvm_vcpu *vcpu,
 
 static void svm_handle_exit_irqoff(struct kvm_vcpu *vcpu)
 {
+	if (to_svm(vcpu)->vmcb->control.exit_code == SVM_EXIT_INTR)
+		vcpu->arch.at_instruction_boundary = true;
 }
 
 static void svm_sched_in(struct kvm_vcpu *vcpu, int cpu)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4b6a0268c78e..597c3c08da50 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6630,6 +6630,7 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
 		return;
 
 	handle_interrupt_nmi_irqoff(vcpu, gate_offset(desc));
+	vcpu->arch.at_instruction_boundary = true;
 }
 
 static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 53b6fdf30c99..df74ec51c7f3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -291,6 +291,8 @@ const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
 	STATS_DESC_COUNTER(VCPU, nested_run),
 	STATS_DESC_COUNTER(VCPU, directed_yield_attempted),
 	STATS_DESC_COUNTER(VCPU, directed_yield_successful),
+	STATS_DESC_COUNTER(VCPU, preemption_reported),
+	STATS_DESC_COUNTER(VCPU, preemption_other),
 	STATS_DESC_ICOUNTER(VCPU, guest_mode)
 };
 
@@ -4607,6 +4609,19 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
 	struct kvm_memslots *slots;
 	static const u8 preempted = KVM_VCPU_PREEMPTED;
 
+	/*
+	 * The vCPU can be marked preempted if and only if the VM-Exit was on
+	 * an instruction boundary and will not trigger guest emulation of any
+	 * kind (see vcpu_run).  Vendor specific code controls (conservatively)
+	 * when this is true, for example allowing the vCPU to be marked
+	 * preempted if and only if the VM-Exit was due to a host interrupt.
+	 */
+	if (!vcpu->arch.at_instruction_boundary) {
+		vcpu->stat.preemption_other++;
+		return;
+	}
+
+	vcpu->stat.preemption_reported++;
 	if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
 		return;
 
@@ -10363,6 +10378,13 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
 	vcpu->arch.l1tf_flush_l1d = true;
 
 	for (;;) {
+		/*
+		 * If another guest vCPU requests a PV TLB flush in the middle
+		 * of instruction emulation, the rest of the emulation could
+		 * use a stale page translation. Assume that any code after
+		 * this point can start executing an instruction.
+		 */
+		vcpu->arch.at_instruction_boundary = false;
 		if (kvm_vcpu_running(vcpu)) {
 			r = vcpu_enter_guest(vcpu);
 		} else {
-- 
2.35.1




  parent reply	other threads:[~2022-08-09 18:20 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-09 18:00 [PATCH 5.18 00/35] 5.18.17-rc1 review Greg Kroah-Hartman
2022-08-09 18:00 ` [PATCH 5.18 01/35] x86/speculation: Make all RETbleed mitigations 64-bit only Greg Kroah-Hartman
2022-08-09 18:00 ` [PATCH 5.18 02/35] block: fix default IO priority handling again Greg Kroah-Hartman
2022-08-09 18:00 ` [PATCH 5.18 03/35] tools/vm/slabinfo: Handle files in debugfs Greg Kroah-Hartman
2022-08-09 18:00 ` [PATCH 5.18 04/35] ACPI: video: Force backlight native for some TongFang devices Greg Kroah-Hartman
2022-08-09 18:00 ` [PATCH 5.18 05/35] ACPI: video: Shortening quirk list by identifying Clevo by board_name only Greg Kroah-Hartman
2022-08-09 18:00 ` [PATCH 5.18 06/35] ACPI: APEI: Better fix to avoid spamming the console with old error logs Greg Kroah-Hartman
2022-08-09 18:00 ` [PATCH 5.18 07/35] crypto: arm64/poly1305 - fix a read out-of-bound Greg Kroah-Hartman
2022-08-09 18:00 ` Greg Kroah-Hartman [this message]
2022-08-09 18:00 ` [PATCH 5.18 09/35] KVM: x86: do not set st->preempted when going back to user space Greg Kroah-Hartman
2022-08-09 18:00 ` [PATCH 5.18 10/35] KVM: selftests: Make hyperv_clock selftest more stable Greg Kroah-Hartman
2022-08-09 18:00 ` [PATCH 5.18 11/35] KVM: x86/MMU: Zap non-leaf SPTEs when disabling dirty logging Greg Kroah-Hartman
2022-08-09 18:00 ` [PATCH 5.18 12/35] entry/kvm: Exit to user mode when TIF_NOTIFY_SIGNAL is set Greg Kroah-Hartman
2022-08-09 18:00 ` [PATCH 5.18 13/35] KVM: x86: disable preemption while updating apicv inhibition Greg Kroah-Hartman
2022-08-09 18:00 ` [PATCH 5.18 14/35] KVM: x86: disable preemption around the call to kvm_arch_vcpu_{un|}blocking Greg Kroah-Hartman
2022-08-09 18:00 ` [PATCH 5.18 15/35] KVM: selftests: Restrict test region to 48-bit physical addresses when using nested Greg Kroah-Hartman
2022-08-09 18:00 ` [PATCH 5.18 16/35] tools/kvm_stat: fix display of error when multiple processes are found Greg Kroah-Hartman
2022-08-09 18:00 ` [PATCH 5.18 17/35] selftests: KVM: Handle compiler optimizations in ucall Greg Kroah-Hartman
2022-08-09 18:00 ` [PATCH 5.18 18/35] KVM: x86/svm: add __GFP_ACCOUNT to __sev_dbg_{en,de}crypt_user() Greg Kroah-Hartman
2022-08-09 18:00 ` [PATCH 5.18 19/35] arm64: set UXN on swapper page tables Greg Kroah-Hartman
2022-08-09 18:00 ` [PATCH 5.18 20/35] btrfs: zoned: prevent allocation from previous data relocation BG Greg Kroah-Hartman
2022-08-09 18:00 ` [PATCH 5.18 21/35] btrfs: zoned: fix critical section of relocation inode writeback Greg Kroah-Hartman
2022-08-09 18:00 ` [PATCH 5.18 22/35] btrfs: zoned: drop optimization of zone finish Greg Kroah-Hartman
2022-08-09 18:00 ` [PATCH 5.18 23/35] Bluetooth: hci_qca: Return wakeup for qca_wakeup Greg Kroah-Hartman
2022-08-09 18:00 ` [PATCH 5.18 24/35] Bluetooth: hci_bcm: Add BCM4349B1 variant Greg Kroah-Hartman
2022-08-09 18:00 ` [PATCH 5.18 25/35] Bluetooth: hci_bcm: Add DT compatible for CYW55572 Greg Kroah-Hartman
2022-08-09 18:00 ` [PATCH 5.18 26/35] dt-bindings: bluetooth: broadcom: Add BCM4349B1 DT binding Greg Kroah-Hartman
2022-08-09 18:00 ` [PATCH 5.18 27/35] Bluetooth: btusb: Add support of IMC Networks PID 0x3568 Greg Kroah-Hartman
2022-08-09 18:00 ` [PATCH 5.18 28/35] Bluetooth: btusb: Add Realtek RTL8852C support ID 0x04CA:0x4007 Greg Kroah-Hartman
2022-08-09 18:00 ` [PATCH 5.18 29/35] Bluetooth: btusb: Add Realtek RTL8852C support ID 0x04C5:0x1675 Greg Kroah-Hartman
2022-08-09 18:00 ` [PATCH 5.18 30/35] Bluetooth: btusb: Add Realtek RTL8852C support ID 0x0CB8:0xC558 Greg Kroah-Hartman
2022-08-09 18:01 ` [PATCH 5.18 31/35] Bluetooth: btusb: Add Realtek RTL8852C support ID 0x13D3:0x3587 Greg Kroah-Hartman
2022-08-09 18:01 ` [PATCH 5.18 32/35] Bluetooth: btusb: Add Realtek RTL8852C support ID 0x13D3:0x3586 Greg Kroah-Hartman
2022-08-09 18:01 ` [PATCH 5.18 33/35] macintosh/adb: fix oob read in do_adb_query() function Greg Kroah-Hartman
2022-08-09 18:01 ` [PATCH 5.18 34/35] x86/speculation: Add RSB VM Exit protections Greg Kroah-Hartman
2022-08-09 18:01 ` [PATCH 5.18 35/35] x86/speculation: Add LFENCE to RSB fill sequence Greg Kroah-Hartman
2022-08-09 21:47 ` [PATCH 5.18 00/35] 5.18.17-rc1 review Florian Fainelli
2022-08-10  6:16 ` Naresh Kamboju
2022-08-10 12:54 ` Ron Economos
2022-08-10 13:26 ` Sudip Mukherjee (Codethink)
2022-08-10 13:33 ` Guenter Roeck
2022-08-10 14:17 ` Justin Forbes
2022-08-10 14:25 ` Jon Hunter
2022-08-10 14:31 ` Shuah Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220809175515.374218393@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=jannh@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=sashal@kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.