* [PATCH v2] KVM: SVM: Don't skip unrelated instruction if INT3/INTO is replaced
@ 2025-10-29 21:50 Omar Sandoval
2025-10-31 22:44 ` Sean Christopherson
0 siblings, 1 reply; 3+ messages in thread
From: Omar Sandoval @ 2025-10-29 21:50 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, kvm; +Cc: Gregory Price, kernel-team
From: Omar Sandoval <osandov@fb.com>
When re-injecting a soft interrupt from an INT3, INT0, or (select) INTn
instruction, discard the exception and retry the instruction if the code
stream is changed (e.g. by a different vCPU) between when the CPU
executes the instruction and when KVM decodes the instruction to get the
next RIP.
As effectively predicted by commit 6ef88d6e36c2 ("KVM: SVM: Re-inject
INT3/INTO instead of retrying the instruction"), failure to verify that
the correct INTn instruction was decoded can effectively clobber guest
state due to decoding the wrong instruction and thus specifying the
wrong next RIP.
The bug most often manifests as "Oops: int3" panics on static branch
checks in Linux guests. Enabling or disabling a static branch in Linux
uses the kernel's "text poke" code patching mechanism. To modify code
while other CPUs may be executing that code, Linux (temporarily)
replaces the first byte of the original instruction with an int3 (opcode
0xcc), then patches in the new code stream except for the first byte,
and finally replaces the int3 with the first byte of the new code
stream. If a CPU hits the int3, i.e. executes the code while it's being
modified, then the guest kernel must look up the RIP to determine how to
handle the #BP, e.g. by emulating the new instruction. If the RIP is
incorrect, then this lookup fails and the guest kernel panics.
The bug reproduces almost instantly by hacking the guest kernel to
repeatedly check a static branch[1] while running a drgn script[2] on
the host to constantly swap out the memory containing the guest's TSS.
[1]: https://gist.github.com/osandov/44d17c51c28c0ac998ea0334edf90b5a
[2]: https://gist.github.com/osandov/10e45e45afa29b11e0c7209247afc00b
Fixes: 6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction")
Cc: stable@vger.kernel.org
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
Changes from v1 (https://lore.kernel.org/all/71043b76fc073af0fb27493a8e8d7f38c3c782c0.1761606191.git.osandov@fb.com/):
- Incorporated Sean's suggested commit message (with some edits) and
approach for implementing this in the generic KVM code instead (adding
a comment for EMULTYPE_SKIP_SOFT_INT).
- Rebased on Linus's tree as of e53642b87a4f4b03a8d7e5f8507fc3cd0c595ea6.
arch/x86/include/asm/kvm_host.h | 9 +++++++++
arch/x86/kvm/svm/svm.c | 24 +++++++++++++-----------
arch/x86/kvm/x86.c | 21 +++++++++++++++++++++
3 files changed, 43 insertions(+), 11 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 48598d017d6f..f0a61615b67b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2143,6 +2143,11 @@ u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu);
* the gfn, i.e. retrying the instruction will hit a
* !PRESENT fault, which results in a new shadow page
* and sends KVM back to square one.
+ *
+ * EMULTYPE_SKIP_SOFT_INT - Set in combination with EMULTYPE_SKIP to only skip
+ * an instruction if it could generate a given software
+ * interrupt, which must be encoded via
+ * EMULTYPE_SET_SOFT_INT_VECTOR().
*/
#define EMULTYPE_NO_DECODE (1 << 0)
#define EMULTYPE_TRAP_UD (1 << 1)
@@ -2153,6 +2158,10 @@ u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu);
#define EMULTYPE_PF (1 << 6)
#define EMULTYPE_COMPLETE_USER_EXIT (1 << 7)
#define EMULTYPE_WRITE_PF_TO_SP (1 << 8)
+#define EMULTYPE_SKIP_SOFT_INT (1 << 9)
+
+#define EMULTYPE_SET_SOFT_INT_VECTOR(v) (((v) & 0xff) << 16)
+#define EMULTYPE_GET_SOFT_INT_VECTOR(e) (((e) >> 16) & 0xff)
static inline bool kvm_can_emulate_event_vectoring(int emul_type)
{
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 153c12dbf3eb..a675585e6b1d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -272,6 +272,7 @@ static void svm_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask)
}
static int __svm_skip_emulated_instruction(struct kvm_vcpu *vcpu,
+ int emul_type,
bool commit_side_effects)
{
struct vcpu_svm *svm = to_svm(vcpu);
@@ -293,7 +294,7 @@ static int __svm_skip_emulated_instruction(struct kvm_vcpu *vcpu,
if (unlikely(!commit_side_effects))
old_rflags = svm->vmcb->save.rflags;
- if (!kvm_emulate_instruction(vcpu, EMULTYPE_SKIP))
+ if (!kvm_emulate_instruction(vcpu, emul_type))
return 0;
if (unlikely(!commit_side_effects))
@@ -311,11 +312,13 @@ static int __svm_skip_emulated_instruction(struct kvm_vcpu *vcpu,
static int svm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
{
- return __svm_skip_emulated_instruction(vcpu, true);
+ return __svm_skip_emulated_instruction(vcpu, EMULTYPE_SKIP, true);
}
-static int svm_update_soft_interrupt_rip(struct kvm_vcpu *vcpu)
+static int svm_update_soft_interrupt_rip(struct kvm_vcpu *vcpu, u8 vector)
{
+ const int emul_type = EMULTYPE_SKIP | EMULTYPE_SKIP_SOFT_INT |
+ EMULTYPE_GET_SOFT_INT_VECTOR(vector);
unsigned long rip, old_rip = kvm_rip_read(vcpu);
struct vcpu_svm *svm = to_svm(vcpu);
@@ -331,7 +334,7 @@ static int svm_update_soft_interrupt_rip(struct kvm_vcpu *vcpu)
* in use, the skip must not commit any side effects such as clearing
* the interrupt shadow or RFLAGS.RF.
*/
- if (!__svm_skip_emulated_instruction(vcpu, !nrips))
+ if (!__svm_skip_emulated_instruction(vcpu, emul_type, !nrips))
return -EIO;
rip = kvm_rip_read(vcpu);
@@ -367,7 +370,7 @@ static void svm_inject_exception(struct kvm_vcpu *vcpu)
kvm_deliver_exception_payload(vcpu, ex);
if (kvm_exception_is_soft(ex->vector) &&
- svm_update_soft_interrupt_rip(vcpu))
+ svm_update_soft_interrupt_rip(vcpu, ex->vector))
return;
svm->vmcb->control.event_inj = ex->vector
@@ -3637,11 +3640,12 @@ static bool svm_set_vnmi_pending(struct kvm_vcpu *vcpu)
static void svm_inject_irq(struct kvm_vcpu *vcpu, bool reinjected)
{
+ struct kvm_queued_interrupt *intr = &vcpu->arch.interrupt;
struct vcpu_svm *svm = to_svm(vcpu);
u32 type;
- if (vcpu->arch.interrupt.soft) {
- if (svm_update_soft_interrupt_rip(vcpu))
+ if (intr->soft) {
+ if (svm_update_soft_interrupt_rip(vcpu, intr->nr))
return;
type = SVM_EVTINJ_TYPE_SOFT;
@@ -3649,12 +3653,10 @@ static void svm_inject_irq(struct kvm_vcpu *vcpu, bool reinjected)
type = SVM_EVTINJ_TYPE_INTR;
}
- trace_kvm_inj_virq(vcpu->arch.interrupt.nr,
- vcpu->arch.interrupt.soft, reinjected);
+ trace_kvm_inj_virq(intr->nr, intr->soft, reinjected);
++vcpu->stat.irq_injections;
- svm->vmcb->control.event_inj = vcpu->arch.interrupt.nr |
- SVM_EVTINJ_VALID | type;
+ svm->vmcb->control.event_inj = intr->nr | SVM_EVTINJ_VALID | type;
}
void svm_complete_interrupt_delivery(struct kvm_vcpu *vcpu, int delivery_mode,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b4b5d2d09634..9dc66cca154d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9338,6 +9338,23 @@ static bool is_vmware_backdoor_opcode(struct x86_emulate_ctxt *ctxt)
return false;
}
+static bool is_soft_int_instruction(struct x86_emulate_ctxt *ctxt,
+ int emulation_type)
+{
+ u8 vector = EMULTYPE_GET_SOFT_INT_VECTOR(emulation_type);
+
+ switch (ctxt->b) {
+ case 0xcc:
+ return vector == BP_VECTOR;
+ case 0xcd:
+ return vector == ctxt->src.val;
+ case 0xce:
+ return vector == OF_VECTOR;
+ default:
+ return false;
+ }
+}
+
/*
* Decode an instruction for emulation. The caller is responsible for handling
* code breakpoints. Note, manually detecting code breakpoints is unnecessary
@@ -9448,6 +9465,10 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
* injecting single-step #DBs.
*/
if (emulation_type & EMULTYPE_SKIP) {
+ if (emulation_type & EMULTYPE_SKIP_SOFT_INT &&
+ !is_soft_int_instruction(ctxt, emulation_type))
+ return 0;
+
if (ctxt->mode != X86EMUL_MODE_PROT64)
ctxt->eip = (u32)ctxt->_eip;
else
--
2.51.0
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH v2] KVM: SVM: Don't skip unrelated instruction if INT3/INTO is replaced
2025-10-29 21:50 [PATCH v2] KVM: SVM: Don't skip unrelated instruction if INT3/INTO is replaced Omar Sandoval
@ 2025-10-31 22:44 ` Sean Christopherson
2025-11-04 17:29 ` Omar Sandoval
0 siblings, 1 reply; 3+ messages in thread
From: Sean Christopherson @ 2025-10-31 22:44 UTC (permalink / raw)
To: Omar Sandoval; +Cc: Paolo Bonzini, kvm, Gregory Price, kernel-team
On Wed, Oct 29, 2025, Omar Sandoval wrote:
> @@ -2153,6 +2158,10 @@ u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu);
> #define EMULTYPE_PF (1 << 6)
> #define EMULTYPE_COMPLETE_USER_EXIT (1 << 7)
> #define EMULTYPE_WRITE_PF_TO_SP (1 << 8)
> +#define EMULTYPE_SKIP_SOFT_INT (1 << 9)
> +
> +#define EMULTYPE_SET_SOFT_INT_VECTOR(v) (((v) & 0xff) << 16)
> +#define EMULTYPE_GET_SOFT_INT_VECTOR(e) (((e) >> 16) & 0xff)
>
> static inline bool kvm_can_emulate_event_vectoring(int emul_type)
> {
...
> +static int svm_update_soft_interrupt_rip(struct kvm_vcpu *vcpu, u8 vector)
> {
> + const int emul_type = EMULTYPE_SKIP | EMULTYPE_SKIP_SOFT_INT |
> + EMULTYPE_GET_SOFT_INT_VECTOR(vector);
Apparently our friendly neighborhood test bots[*] are the only ones that tested
this :-)
This should be EMULTYPE_SET_SOFT_INT_VECTOR()
^
|
And I suspect EMULTYPE_SET_SOFT_INT_VECTOR() needs to cast (v) to a u32 so as
not to overflow the shift.
[*] https://lore.kernel.org/all/202510310909.y5ClH2qW-lkp@intel.com
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH v2] KVM: SVM: Don't skip unrelated instruction if INT3/INTO is replaced
2025-10-31 22:44 ` Sean Christopherson
@ 2025-11-04 17:29 ` Omar Sandoval
0 siblings, 0 replies; 3+ messages in thread
From: Omar Sandoval @ 2025-11-04 17:29 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, Gregory Price, kernel-team
On Fri, Oct 31, 2025 at 03:44:13PM -0700, Sean Christopherson wrote:
> On Wed, Oct 29, 2025, Omar Sandoval wrote:
> > @@ -2153,6 +2158,10 @@ u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu);
> > #define EMULTYPE_PF (1 << 6)
> > #define EMULTYPE_COMPLETE_USER_EXIT (1 << 7)
> > #define EMULTYPE_WRITE_PF_TO_SP (1 << 8)
> > +#define EMULTYPE_SKIP_SOFT_INT (1 << 9)
> > +
> > +#define EMULTYPE_SET_SOFT_INT_VECTOR(v) (((v) & 0xff) << 16)
> > +#define EMULTYPE_GET_SOFT_INT_VECTOR(e) (((e) >> 16) & 0xff)
> >
> > static inline bool kvm_can_emulate_event_vectoring(int emul_type)
> > {
>
> ...
>
> > +static int svm_update_soft_interrupt_rip(struct kvm_vcpu *vcpu, u8 vector)
> > {
> > + const int emul_type = EMULTYPE_SKIP | EMULTYPE_SKIP_SOFT_INT |
> > + EMULTYPE_GET_SOFT_INT_VECTOR(vector);
>
> Apparently our friendly neighborhood test bots[*] are the only ones that tested
> this :-)
>
> This should be EMULTYPE_SET_SOFT_INT_VECTOR()
> ^
> |
Oof, I'm disappointed that I missed that. FWIW, I did test this with my
reproducer, and it passed, only because this buggy version will always
retry an INT3. I don't have a test case for the original bug that commit
6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the
instruction") fixed.
> And I suspect EMULTYPE_SET_SOFT_INT_VECTOR() needs to cast (v) to a u32 so as
> not to overflow the shift.
u8 (unsigned char) gets promoted to int before the shift, so this won't
overflow, but we might as well make smatch happy.
I'll send a new version.
Thanks,
Omar
> [*] https://lore.kernel.org/all/202510310909.y5ClH2qW-lkp@intel.com
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2025-11-04 17:29 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-29 21:50 [PATCH v2] KVM: SVM: Don't skip unrelated instruction if INT3/INTO is replaced Omar Sandoval
2025-10-31 22:44 ` Sean Christopherson
2025-11-04 17:29 ` Omar Sandoval
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).