patches.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev, Omar Sandoval <osandov@fb.com>,
	Sean Christopherson <seanjc@google.com>
Subject: [PATCH 6.18 10/29] KVM: SVM: Dont skip unrelated instruction if INT3/INTO is replaced
Date: Wed, 10 Dec 2025 16:30:20 +0900	[thread overview]
Message-ID: <20251210072944.652611812@linuxfoundation.org> (raw)
In-Reply-To: <20251210072944.363788552@linuxfoundation.org>

6.18-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Omar Sandoval <osandov@fb.com>

commit 4da3768e1820cf15cced390242d8789aed34f54d upstream.

When re-injecting a soft interrupt from an INT3, INT0, or (select) INTn
instruction, discard the exception and retry the instruction if the code
stream is changed (e.g. by a different vCPU) between when the CPU
executes the instruction and when KVM decodes the instruction to get the
next RIP.

As effectively predicted by commit 6ef88d6e36c2 ("KVM: SVM: Re-inject
INT3/INTO instead of retrying the instruction"), failure to verify that
the correct INTn instruction was decoded can effectively clobber guest
state due to decoding the wrong instruction and thus specifying the
wrong next RIP.

The bug most often manifests as "Oops: int3" panics on static branch
checks in Linux guests.  Enabling or disabling a static branch in Linux
uses the kernel's "text poke" code patching mechanism.  To modify code
while other CPUs may be executing that code, Linux (temporarily)
replaces the first byte of the original instruction with an int3 (opcode
0xcc), then patches in the new code stream except for the first byte,
and finally replaces the int3 with the first byte of the new code
stream.  If a CPU hits the int3, i.e. executes the code while it's being
modified, then the guest kernel must look up the RIP to determine how to
handle the #BP, e.g. by emulating the new instruction.  If the RIP is
incorrect, then this lookup fails and the guest kernel panics.

The bug reproduces almost instantly by hacking the guest kernel to
repeatedly check a static branch[1] while running a drgn script[2] on
the host to constantly swap out the memory containing the guest's TSS.

[1]: https://gist.github.com/osandov/44d17c51c28c0ac998ea0334edf90b5a
[2]: https://gist.github.com/osandov/10e45e45afa29b11e0c7209247afc00b

Fixes: 6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction")
Cc: stable@vger.kernel.org
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Link: https://patch.msgid.link/1cc6dcdf36e3add7ee7c8d90ad58414eeb6c3d34.1762278762.git.osandov@fb.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/kvm_host.h |    9 +++++++++
 arch/x86/kvm/svm/svm.c          |   24 +++++++++++++-----------
 arch/x86/kvm/x86.c              |   21 +++++++++++++++++++++
 3 files changed, 43 insertions(+), 11 deletions(-)

--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2143,6 +2143,11 @@ u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu);
  *			     the gfn, i.e. retrying the instruction will hit a
  *			     !PRESENT fault, which results in a new shadow page
  *			     and sends KVM back to square one.
+ *
+ * EMULTYPE_SKIP_SOFT_INT - Set in combination with EMULTYPE_SKIP to only skip
+ *                          an instruction if it could generate a given software
+ *                          interrupt, which must be encoded via
+ *                          EMULTYPE_SET_SOFT_INT_VECTOR().
  */
 #define EMULTYPE_NO_DECODE	    (1 << 0)
 #define EMULTYPE_TRAP_UD	    (1 << 1)
@@ -2153,6 +2158,10 @@ u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu);
 #define EMULTYPE_PF		    (1 << 6)
 #define EMULTYPE_COMPLETE_USER_EXIT (1 << 7)
 #define EMULTYPE_WRITE_PF_TO_SP	    (1 << 8)
+#define EMULTYPE_SKIP_SOFT_INT	    (1 << 9)
+
+#define EMULTYPE_SET_SOFT_INT_VECTOR(v)	((u32)((v) & 0xff) << 16)
+#define EMULTYPE_GET_SOFT_INT_VECTOR(e)	(((e) >> 16) & 0xff)
 
 static inline bool kvm_can_emulate_event_vectoring(int emul_type)
 {
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -272,6 +272,7 @@ static void svm_set_interrupt_shadow(str
 }
 
 static int __svm_skip_emulated_instruction(struct kvm_vcpu *vcpu,
+					   int emul_type,
 					   bool commit_side_effects)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -293,7 +294,7 @@ static int __svm_skip_emulated_instructi
 		if (unlikely(!commit_side_effects))
 			old_rflags = svm->vmcb->save.rflags;
 
-		if (!kvm_emulate_instruction(vcpu, EMULTYPE_SKIP))
+		if (!kvm_emulate_instruction(vcpu, emul_type))
 			return 0;
 
 		if (unlikely(!commit_side_effects))
@@ -311,11 +312,13 @@ done:
 
 static int svm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
 {
-	return __svm_skip_emulated_instruction(vcpu, true);
+	return __svm_skip_emulated_instruction(vcpu, EMULTYPE_SKIP, true);
 }
 
-static int svm_update_soft_interrupt_rip(struct kvm_vcpu *vcpu)
+static int svm_update_soft_interrupt_rip(struct kvm_vcpu *vcpu, u8 vector)
 {
+	const int emul_type = EMULTYPE_SKIP | EMULTYPE_SKIP_SOFT_INT |
+			      EMULTYPE_SET_SOFT_INT_VECTOR(vector);
 	unsigned long rip, old_rip = kvm_rip_read(vcpu);
 	struct vcpu_svm *svm = to_svm(vcpu);
 
@@ -331,7 +334,7 @@ static int svm_update_soft_interrupt_rip
 	 * in use, the skip must not commit any side effects such as clearing
 	 * the interrupt shadow or RFLAGS.RF.
 	 */
-	if (!__svm_skip_emulated_instruction(vcpu, !nrips))
+	if (!__svm_skip_emulated_instruction(vcpu, emul_type, !nrips))
 		return -EIO;
 
 	rip = kvm_rip_read(vcpu);
@@ -367,7 +370,7 @@ static void svm_inject_exception(struct
 	kvm_deliver_exception_payload(vcpu, ex);
 
 	if (kvm_exception_is_soft(ex->vector) &&
-	    svm_update_soft_interrupt_rip(vcpu))
+	    svm_update_soft_interrupt_rip(vcpu, ex->vector))
 		return;
 
 	svm->vmcb->control.event_inj = ex->vector
@@ -3633,11 +3636,12 @@ static bool svm_set_vnmi_pending(struct
 
 static void svm_inject_irq(struct kvm_vcpu *vcpu, bool reinjected)
 {
+	struct kvm_queued_interrupt *intr = &vcpu->arch.interrupt;
 	struct vcpu_svm *svm = to_svm(vcpu);
 	u32 type;
 
-	if (vcpu->arch.interrupt.soft) {
-		if (svm_update_soft_interrupt_rip(vcpu))
+	if (intr->soft) {
+		if (svm_update_soft_interrupt_rip(vcpu, intr->nr))
 			return;
 
 		type = SVM_EVTINJ_TYPE_SOFT;
@@ -3645,12 +3649,10 @@ static void svm_inject_irq(struct kvm_vc
 		type = SVM_EVTINJ_TYPE_INTR;
 	}
 
-	trace_kvm_inj_virq(vcpu->arch.interrupt.nr,
-			   vcpu->arch.interrupt.soft, reinjected);
+	trace_kvm_inj_virq(intr->nr, intr->soft, reinjected);
 	++vcpu->stat.irq_injections;
 
-	svm->vmcb->control.event_inj = vcpu->arch.interrupt.nr |
-				       SVM_EVTINJ_VALID | type;
+	svm->vmcb->control.event_inj = intr->nr | SVM_EVTINJ_VALID | type;
 }
 
 void svm_complete_interrupt_delivery(struct kvm_vcpu *vcpu, int delivery_mode,
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9337,6 +9337,23 @@ static bool is_vmware_backdoor_opcode(st
 	return false;
 }
 
+static bool is_soft_int_instruction(struct x86_emulate_ctxt *ctxt,
+				    int emulation_type)
+{
+	u8 vector = EMULTYPE_GET_SOFT_INT_VECTOR(emulation_type);
+
+	switch (ctxt->b) {
+	case 0xcc:
+		return vector == BP_VECTOR;
+	case 0xcd:
+		return vector == ctxt->src.val;
+	case 0xce:
+		return vector == OF_VECTOR;
+	default:
+		return false;
+	}
+}
+
 /*
  * Decode an instruction for emulation.  The caller is responsible for handling
  * code breakpoints.  Note, manually detecting code breakpoints is unnecessary
@@ -9447,6 +9464,10 @@ int x86_emulate_instruction(struct kvm_v
 	 * injecting single-step #DBs.
 	 */
 	if (emulation_type & EMULTYPE_SKIP) {
+		if (emulation_type & EMULTYPE_SKIP_SOFT_INT &&
+		    !is_soft_int_instruction(ctxt, emulation_type))
+			return 0;
+
 		if (ctxt->mode != X86EMUL_MODE_PROT64)
 			ctxt->eip = (u32)ctxt->_eip;
 		else



  parent reply	other threads:[~2025-12-10  7:36 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-10  7:30 [PATCH 6.18 00/29] 6.18.1-rc1 review Greg Kroah-Hartman
2025-12-10  7:30 ` [PATCH 6.18 01/29] Documentation: process: Also mention Sasha Levin as stable tree maintainer Greg Kroah-Hartman
2025-12-10  7:30 ` [PATCH 6.18 02/29] jbd2: avoid bug_on in jbd2_journal_get_create_access() when file system corrupted Greg Kroah-Hartman
2025-12-10  7:30 ` [PATCH 6.18 03/29] ext4: refresh inline data size before write operations Greg Kroah-Hartman
2025-12-10  7:30 ` [PATCH 6.18 04/29] ksmbd: ipc: fix use-after-free in ipc_msg_send_request Greg Kroah-Hartman
2025-12-10  7:30 ` [PATCH 6.18 05/29] locking/spinlock/debug: Fix data-race in do_raw_write_lock Greg Kroah-Hartman
2025-12-10  7:30 ` [PATCH 6.18 06/29] crypto: zstd - fix double-free in per-CPU stream cleanup Greg Kroah-Hartman
2025-12-10  7:30 ` [PATCH 6.18 07/29] ext4: add i_data_sem protection in ext4_destroy_inline_data_nolock() Greg Kroah-Hartman
2025-12-10  7:30 ` [PATCH 6.18 08/29] rust_binder: fix race condition on death_list Greg Kroah-Hartman
2025-12-10  7:30 ` [PATCH 6.18 09/29] comedi: pcl818: fix null-ptr-deref in pcl818_ai_cancel() Greg Kroah-Hartman
2025-12-10  7:30 ` Greg Kroah-Hartman [this message]
2025-12-10  7:30 ` [PATCH 6.18 11/29] USB: serial: option: add Foxconn T99W760 Greg Kroah-Hartman
2025-12-10  7:30 ` [PATCH 6.18 12/29] USB: serial: option: add Telit Cinterion FE910C04 new compositions Greg Kroah-Hartman
2025-12-10  7:30 ` [PATCH 6.18 13/29] USB: serial: option: move Telit 0x10c7 composition in the right place Greg Kroah-Hartman
2025-12-10  7:30 ` [PATCH 6.18 14/29] USB: serial: ftdi_sio: match on interface number for jtag Greg Kroah-Hartman
2025-12-10  7:30 ` [PATCH 6.18 15/29] serial: add support of CPCI cards Greg Kroah-Hartman
2025-12-10  7:30 ` [PATCH 6.18 16/29] dt-bindings: serial: rsci: Drop "uart-has-rtscts: false" Greg Kroah-Hartman
2025-12-10  7:30 ` [PATCH 6.18 17/29] serial: sh-sci: Fix deadlock during RSCI FIFO overrun error Greg Kroah-Hartman
2025-12-10  7:30 ` [PATCH 6.18 18/29] USB: serial: belkin_sa: fix TIOCMBIS and TIOCMBIC Greg Kroah-Hartman
2025-12-10  7:30 ` [PATCH 6.18 19/29] USB: serial: kobil_sct: " Greg Kroah-Hartman
2025-12-10  7:30 ` [PATCH 6.18 20/29] Documentation/rtla: rename common_xxx.rst files to common_xxx.txt Greg Kroah-Hartman
2025-12-10  7:30 ` [PATCH 6.18 21/29] wifi: rtl8xxxu: Add USB ID 2001:3328 for D-Link AN3U rev. A1 Greg Kroah-Hartman
2025-12-10  7:30 ` [PATCH 6.18 22/29] wifi: rtw88: Add USB ID 2001:3329 for D-Link AC13U " Greg Kroah-Hartman
2025-12-10  7:30 ` [PATCH 6.18 23/29] iio: adc: ad4080: fix chip identification Greg Kroah-Hartman
2025-12-10  7:30 ` [PATCH 6.18 24/29] comedi: c6xdigio: Fix invalid PNP driver unregistration Greg Kroah-Hartman
2025-12-10  7:30 ` [PATCH 6.18 25/29] comedi: multiq3: sanitize config options in multiq3_attach() Greg Kroah-Hartman
2025-12-10  7:30 ` [PATCH 6.18 26/29] comedi: check devices attached status in compat ioctls Greg Kroah-Hartman
2025-12-10  7:30 ` [PATCH 6.18 27/29] staging: rtl8723bs: fix out-of-bounds read in rtw_get_ie() parser Greg Kroah-Hartman
2025-12-10  7:30 ` [PATCH 6.18 28/29] staging: rtl8723bs: fix stack buffer overflow in OnAssocReq IE parsing Greg Kroah-Hartman
2025-12-10  7:30 ` [PATCH 6.18 29/29] staging: rtl8723bs: fix out-of-bounds read in OnBeacon ESR " Greg Kroah-Hartman
2025-12-10 10:15 ` [PATCH 6.18 00/29] 6.18.1-rc1 review Brett A C Sheffield
2025-12-10 12:17 ` Takeshi Ogasawara
2025-12-10 13:18 ` Jeffrin Thalakkottoor
2025-12-10 14:01 ` Achill Gilgenast
2025-12-10 14:32 ` Peter Schneider
2025-12-10 19:42 ` Florian Fainelli
2025-12-10 21:02   ` Dileep malepu
2025-12-10 21:49   ` Ronald Warsow
2025-12-10 20:43 ` Hardik Garg
2025-12-10 21:54 ` Ron Economos
2025-12-11  6:44 ` Naresh Kamboju
2025-12-11  9:02 ` Mark Brown
2025-12-12  9:25 ` Jon Hunter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251210072944.652611812@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=osandov@fb.com \
    --cc=patches@lists.linux.dev \
    --cc=seanjc@google.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).