public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] KVM: SVM: Don't skip unrelated instruction if INT3 is replaced
@ 2025-10-27 23:06 Omar Sandoval
  2025-10-28  0:24 ` Sean Christopherson
  0 siblings, 1 reply; 4+ messages in thread
From: Omar Sandoval @ 2025-10-27 23:06 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, kvm; +Cc: Gregory Price, kernel-team

From: Omar Sandoval <osandov@fb.com>

We've been seeing guest VM kernel panics with "Oops: int3" coming from
static branch checks. I found that the reported RIP is one instruction
after where it's supposed to be, and I determined that when this
happens, the RIP was injected by __svm_skip_emulated_instruction() on
the host when a nested page fault was hit while vectoring an int3.

Static branches use the smp_text_poke mechanism, which works by
temporarily inserting an int3, then overwriting it. In the meantime,
smp_text_poke_int3_handler() relies on getting an accurate RIP.

This temporary int3 from smp_text_poke is triggering the exact scenario
described in the fixes commit: "if the guest executes INTn (no KVM
injection), an exit occurs while vectoring the INTn, and the guest
modifies the code stream while the exit is being handled, KVM will
compute the incorrect next_rip due to "skipping" the wrong instruction."

I'm able to reproduce this almost instantly by patching the guest kernel
to hammer on a static branch [1] while a drgn script [2] on the host
constantly swaps out the memory containing the guest's Task State
Segment.

The fixes commit also suggests a workaround: "A future enhancement to
make this less awful would be for KVM to detect that the decoded
instruction is not the correct INTn and drop the to-be-injected soft
event." This implements that workaround.

[1]: https://gist.github.com/osandov/44d17c51c28c0ac998ea0334edf90b5a
[2]: https://gist.github.com/osandov/10e45e45afa29b11e0c7209247afc00b

Fixes: 6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction")
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
Based on Linus's current tree.

 arch/x86/kvm/svm/svm.c | 40 ++++++++++++++++++++++++++++++++++++----
 1 file changed, 36 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 153c12dbf3eb..4d72c308b50b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -271,8 +271,29 @@ static void svm_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask)
 
 }
 
+static bool emulated_instruction_matches_vector(struct kvm_vcpu *vcpu,
+						unsigned int vector)
+{
+	switch (vector) {
+	case BP_VECTOR:
+		return vcpu->arch.emulate_ctxt->b == 0xcc ||
+		       (vcpu->arch.emulate_ctxt->b == 0xcd &&
+			vcpu->arch.emulate_ctxt->src.val == BP_VECTOR);
+	case OF_VECTOR:
+		return vcpu->arch.emulate_ctxt->b == 0xce;
+	default:
+		return false;
+	}
+}
+
+/*
+ * If vector != 0, then this skips the instruction only if the instruction could
+ * generate an interrupt with that vector. If not, then it fails, indicating
+ * that the instruction should be retried.
+ */
 static int __svm_skip_emulated_instruction(struct kvm_vcpu *vcpu,
-					   bool commit_side_effects)
+					   bool commit_side_effects,
+					   unsigned int vector)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	unsigned long old_rflags;
@@ -293,8 +314,18 @@ static int __svm_skip_emulated_instruction(struct kvm_vcpu *vcpu,
 		if (unlikely(!commit_side_effects))
 			old_rflags = svm->vmcb->save.rflags;
 
-		if (!kvm_emulate_instruction(vcpu, EMULTYPE_SKIP))
+		if (vector == 0) {
+			if (!kvm_emulate_instruction(vcpu, EMULTYPE_SKIP))
+				return 0;
+		} else if (x86_decode_emulated_instruction(vcpu, EMULTYPE_SKIP,
+							   NULL,
+							   0) != EMULATION_OK ||
+			   !emulated_instruction_matches_vector(vcpu, vector) ||
+			   !kvm_emulate_instruction(vcpu,
+						    EMULTYPE_SKIP |
+						    EMULTYPE_NO_DECODE)) {
 			return 0;
+		}
 
 		if (unlikely(!commit_side_effects))
 			svm->vmcb->save.rflags = old_rflags;
@@ -311,7 +342,7 @@ static int __svm_skip_emulated_instruction(struct kvm_vcpu *vcpu,
 
 static int svm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
 {
-	return __svm_skip_emulated_instruction(vcpu, true);
+	return __svm_skip_emulated_instruction(vcpu, true, 0);
 }
 
 static int svm_update_soft_interrupt_rip(struct kvm_vcpu *vcpu)
@@ -331,7 +362,8 @@ static int svm_update_soft_interrupt_rip(struct kvm_vcpu *vcpu)
 	 * in use, the skip must not commit any side effects such as clearing
 	 * the interrupt shadow or RFLAGS.RF.
 	 */
-	if (!__svm_skip_emulated_instruction(vcpu, !nrips))
+	if (!__svm_skip_emulated_instruction(vcpu, !nrips,
+					     vcpu->arch.exception.vector))
 		return -EIO;
 
 	rip = kvm_rip_read(vcpu);
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2025-10-29  0:18 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-27 23:06 [PATCH] KVM: SVM: Don't skip unrelated instruction if INT3 is replaced Omar Sandoval
2025-10-28  0:24 ` Sean Christopherson
2025-10-28  6:44   ` Omar Sandoval
2025-10-29  0:18     ` Sean Christopherson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox