From: Sean Christopherson <seanjc@google.com>
To: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
Wu Zongyo <wuzongyo@mail.ustc.edu.cn>
Subject: Re: [PATCH 0/2] KVM: SVM: Fix unexpected #UD on INT3 in SEV guests
Date: Wed, 23 Aug 2023 13:03:33 -0700 [thread overview]
Message-ID: <ZOZmFe7MT7zwrf/c@google.com> (raw)
In-Reply-To: <d183c3f2-d94d-5f22-184d-eab80f9d0fe8@amd.com>
On Wed, Aug 23, 2023, Tom Lendacky wrote:
> On 8/22/23 10:14, Sean Christopherson wrote:
> > On Tue, Aug 22, 2023, Tom Lendacky wrote:
> > > On 8/10/23 18:49, Sean Christopherson wrote:
> > > > Fix a bug where KVM injects a bogus #UD for SEV guests when trying to skip
> > > > an INT3 as part of re-injecting the associated #BP that got kinda sorta
> > > > intercepted due to a #NPF occuring while vectoring/delivering the #BP.
> > > >
> > > > I haven't actually confirmed that patch 1 fixes the bug, as it's a
> > > > different change than what I originally proposed. I'm 99% certain it will
> > > > work, but I definitely need verification that it fixes the problem
> > > >
> > > > Patch 2 is a tangentially related cleanup to make NRIPS a requirement for
> > > > enabling SEV, e.g. so that we don't ever get "bug" reports of SEV guests
> > > > not working when NRIPS is disabled.
> > > >
> > > > Sean Christopherson (2):
> > > > KVM: SVM: Don't inject #UD if KVM attempts emulation of SEV guest w/o
> > > > insn
> > > > KVM: SVM: Require nrips support for SEV guests (and beyond)
> > > >
> > > > arch/x86/kvm/svm/sev.c | 2 +-
> > > > arch/x86/kvm/svm/svm.c | 37 ++++++++++++++++++++-----------------
> > > > arch/x86/kvm/svm/svm.h | 1 +
> > > > 3 files changed, 22 insertions(+), 18 deletions(-)
> > >
> > > We ran some stress tests against a version of the kernel without this fix
> > > and we're able to reproduce the issue, but not reliably, after a few hours.
> > > With this patch, it has not reproduced after running for a week.
> > >
> > > Not as reliable a scenario as the original reporter, but this looks like it
> > > resolves the issue.
> >
> > Thanks Tom! I'll apply this for v6.6, that'll give us plenty of time to change
> > course if necessary.
>
> I may have spoke to soon... When the #UD was triggered it was here:
>
> [ 0.118524] Spectre V2 : Enabling Restricted Speculation for firmware calls
> [ 0.118524] Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
> [ 0.118524] Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl
> [ 0.118524] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> [ 0.118524] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.2.2-amdsos-build50-ubuntu-20.04+ #1
> [ 0.118524] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> [ 0.118524] RIP: 0010:int3_selftest_ip+0x0/0x60
> [ 0.118524] Code: b9 25 05 00 00 48 c7 c2 e8 7c 80 b0 48 c7 c6 fe 1c d3 b0 48 c7 c7 f0 7d da b0 e8 4c 2c 0b ff e8 75 da 15 ff 0f 0b 48 8d 7d f4 <cc> 90 90 90 90 83 7d f4 01 74 2f 80 3d 39 7f a8 00 00 74 24 b9 34
>
>
> Now (after about a week) we've encountered a hang here:
>
> [ 0.106216] Spectre V2 : Enabling Restricted Speculation for firmware calls
> [ 0.106216] Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
> [ 0.106216] Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl
>
> It is in the very same spot and so I wonder if the return false (without
> queuing a #UD) is causing an infinite loop here that appears as a guest
> hang. Whereas, we have some systems running the first patch that you
> created that have not hit this hang.
>
> But I'm not sure why or how this patch could cause the guest hang. I
> would think that the retry of the instruction would resolve everything
> and the guest would continue. Unfortunately, the guest was killed, so I'll
> try to reproduce and get a dump or trace points of the VM to see what is
> going on.
Gah, it's because x86_emulate_instruction() returns '1' and not '0' when
svm_can_emulate_instruction() returns false. svm_update_soft_interrupt_rip()
would then continue with the injection, i.e. inject #BP on the INT3 RIP, not on
the RIP following the INT3, which would cause this check to fail
if (regs->ip - INT3_INSN_SIZE != selftest)
return NOTIFY_DONE;
and eventually send do_trap_no_signal() to die().
I distinctly remember seeing the return value problem when writing the patch, but
missed that it would result in KVM injecting the unexpected #BP.
I punted on trying to properly fix this by having can_emulate_instruction()
differentiate between "retry insn" and "inject exception", because that change
is painfully invasive and I though I could get away with the simple fix. Drat.
I think the best option is to add a "temporary" patch so that the fix for @stable
is short, sweet, and safe, and then do the can_emulate_instruction() cleanup that
I was avoiding.
E.g. this as patch 2/4 (or maybe 2/5) of this series:
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7cb5ef5835c2..8457a36b44c1 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -364,6 +364,8 @@ static void svm_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask)
svm->vmcb->control.int_state |= SVM_INTERRUPT_SHADOW_MASK;
}
+static bool svm_can_emulate_instruction(struct kvm_vcpu *vcpu, int emul_type,
+ void *insn, int insn_len);
static int __svm_skip_emulated_instruction(struct kvm_vcpu *vcpu,
bool commit_side_effects)
@@ -384,6 +386,14 @@ static int __svm_skip_emulated_instruction(struct kvm_vcpu *vcpu,
}
if (!svm->next_rip) {
+ /*
+ * FIXME: Drop this when kvm_emulate_instruction() does the
+ * right thing and treats "can't emulate" as outright failure
+ * for EMULTYPE_SKIP.
+ */
+ if (!svm_can_emulate_instruction(vcpu, EMULTYPE_SKIP, NULL, 0))
+ return 0;
+
if (unlikely(!commit_side_effects))
old_rflags = svm->vmcb->save.rflags;
next prev parent reply other threads:[~2023-08-23 20:04 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-10 23:49 [PATCH 0/2] KVM: SVM: Fix unexpected #UD on INT3 in SEV guests Sean Christopherson
2023-08-10 23:49 ` [PATCH 1/2] KVM: SVM: Don't inject #UD if KVM attempts emulation of SEV guest w/o insn Sean Christopherson
2023-08-10 23:49 ` [PATCH 2/2] KVM: SVM: Require nrips support for SEV guests (and beyond) Sean Christopherson
2023-08-22 13:55 ` [PATCH 0/2] KVM: SVM: Fix unexpected #UD on INT3 in SEV guests Tom Lendacky
2023-08-22 15:14 ` Sean Christopherson
2023-08-23 16:18 ` Tom Lendacky
2023-08-23 20:03 ` Sean Christopherson [this message]
2023-08-23 20:36 ` Tom Lendacky
2023-08-23 21:12 ` Sean Christopherson
2023-08-24 16:08 ` Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZOZmFe7MT7zwrf/c@google.com \
--to=seanjc@google.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=thomas.lendacky@amd.com \
--cc=wuzongyo@mail.ustc.edu.cn \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.