From: Sean Christopherson <seanjc@google.com>
To: Srikanth Aithal <sraithal@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>,
Paolo Bonzini <pbonzini@redhat.com>,
open list <linux-kernel@vger.kernel.org>,
KVM <kvm@vger.kernel.org>
Subject: Re: SEV-ES guest shutdown: linux-next regression with QEMU 10.2.2; smp>1
Date: Tue, 7 Apr 2026 13:38:02 -0700
Message-ID: <adVrKvvDZ-Ca29rb@google.com>
In-Reply-To: <adVLL0NKFN_2rPND@google.com>
On Tue, Apr 07, 2026, Sean Christopherson wrote:
> On Tue, Apr 07, 2026, Srikanth Aithal wrote:
> > On 4/1/2026 9:48 PM, Sean Christopherson wrote:
> > > > - With older host kernels (< next-20260304) + QEMU 10.2.2 → clean shutdown
> > > > (no hang, no termination message, QEMU exits normally).
> > > > - With linux-next (next-20260331) + QEMU 10.2.2 → hang at the register dump
> > > > after "reboot: Power down"; only Ctrl+C triggers the "SEV-ES guest requested
> > > > termination: 0x0:0x0" message.
> > > > - With linux-next + QEMU master (or 10.2.2 + cherry-pick of 56d89db2cfd8) →
> > > > no hang (the termination is converted to a guest panic instead).
> > >
> > > What guest kernel are you using? Bisecting to that commit for just the *host*
> > > kernel is baffling. I could see it preventing KVM from loading or something, but
> > > it should be completely out of scope with respect to guest activity.
> > >
> > > How are you initiating shutdown within the guest? What's the full QEMU command
> > > line?
> > >
> > > Can you also provide the OVMF image? E.g. in case the hang occurs in EFI runtime
> > > services or something.
> > >
> > > I want to get this sorted out before the merge window and so would prefer not to
> > > delay root causing this by a week or more.
> >
> >
> > The bisection was actually performed on the guest kernel (not the host).
> > Sorry for the confusion.
>
> LOL, that changes things slightly. I reproduced this on my end (finally built
> a new version of OVMF that supports SEV-ES). I'll take it from here.
Ok, this is hilarious. TL;DR: Bugs for everyone!
QEMU is flawed and advertises SVM support to the guest even though it can't be
supported in practice. As per my side note when I posted the patch[*] for commit
ccd85d90ce09 ("KVM: SVM: Treat SVM as unsupported when running as an SEV guest"),
I _did_ get a nested guest running under SEV:
: FWIW, I did get nested SVM working on SEV by decrypting all structures
: that are shadowed by L0, albeit with many restrictions. So even though
: there's unlikely to be a legitimate use case, I don't think KVM (as L0)
: needs to be changed to disallow nSVM for SEV guests, userspace is
: ultimately the one that should hide SVM from L1.
And so I didn't modify KVM (as L0) to go out of its way to prevent advertising SVM
to SEV+ guests.
But while hacking KVM (as L1) to support running as an SEV guest is feasible
(and absolutely ridiculous), running as an SEV-ES guest is outright impossible
(without a paravirt interface) as emulating VMLOAD, VMSAVE, VMRUN, etc. requires
access to guest register state.
Anyways, before commit 428afac5a8ea ("KVM: x86: Move bulk of emergency virtualizaton
logic to virt subsystem"), the "emergency" callbacks would run if and only if
KVM was fully loaded. And thanks to commit ccd85d90ce09, KVM would refuse to
load when running as an SEV+ guest.
When the emergency code got moved to the core kernel, I forgot about the whole "SVM
might be advertised to SEV+ guests" wrinkle, and so x86_svm_init() can succeed
because it only looks for X86_FEATURE_SVM. Which _should_ be "fine", because
absent a downstream user like KVM, EFER.SVME should never be set and thus
x86_svm_disable_virtualization_cpu() should be an expensive nop.
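Here is a rough user-space model of the registration change for anyone following along. It is a sketch, not kernel code. struct platform_model and the model_*() helpers are hypothetical. The predicates only encode the two facts above: KVM refuses to load as an SEV+ guest (commit ccd85d90ce09), and after commit 428afac5a8ea, x86_svm_init() only checks X86_FEATURE_SVM.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the platform properties that matter here. */
struct platform_model {
	bool has_svm;   /* X86_FEATURE_SVM as advertised to the guest */
	bool sev_guest; /* running as an SEV/SEV-ES/SNP guest */
};

/* Per commit ccd85d90ce09, KVM refuses to load as an SEV+ guest. */
static bool model_kvm_loads(const struct platform_model *p)
{
	return p->has_svm && !p->sev_guest;
}

/* Before 428afac5a8ea: emergency callbacks ran iff KVM was fully loaded. */
static bool model_emergency_armed_old(const struct platform_model *p)
{
	return model_kvm_loads(p);
}

/* After 428afac5a8ea: x86_svm_init() only looks at X86_FEATURE_SVM. */
static bool model_emergency_armed_new(const struct platform_model *p)
{
	return p->has_svm;
}
```

Consider an SEV-ES guest to which QEMU advertises SVM (has_svm=true, sev_guest=true). The old predicate is false and the new one is true, which is exactly the window the regression falls into.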
Except SEV-ES (the architecture) has a nasty little virtualization hole due to
the requirement that interception of EFER (and other MSRs) be disabled (because
the untrusted hypervisor can't set e.g. EFER.LME on behalf of the guest). The
hole is that hardware doesn't hide EFER.SVME, and so the SEV-ES guest sees
EFER.SVME=1, thinks it has enabled virtualization, and attempts STGI. Which
also _should_ be fine.
But wait, there's more! In its infinite paranoia, the #VC handler doesn't
consider the possibility that maybe, just maybe, the kernel may execute an
instruction that triggers a #VC and isn't known to vc_check_opcode_bytes(). And
so attempting to execute STGI escalates what should be a benign #UD (thanks to
exception fixup) into a termination. Which is quite frustrating, as the immediate
termination makes debugging painful as the VM grinds to a halt before anything
actually gets sent to dmesg. *sigh*
Note, the saving grace in all of this is that SEV-ES-the-hardware isn't completely
hosed: it silently ignores attempts to clear EFER.SVME from the guest (clearing
EFER.SVME while in guest mode is architecturally undefined behavior).
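The EFER.SVME hole can be modeled the same way. This is another hypothetical user-space sketch. It assumes the disable path follows the usual sequence: check EFER.SVME, execute STGI, then clear SVME. struct efer_model and the model_*() helpers are invented for illustration. EFER.SVME is bit 12 of EFER.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EFER_SVME (1ULL << 12) /* EFER.SVME is bit 12 */

/*
 * Hypothetical model of the guest's view of EFER. Under SEV-ES, EFER is
 * not intercepted, so the guest sees SVME=1, and guest writes that try to
 * clear SVME are silently ignored.
 */
struct efer_model {
	uint64_t value;
	bool sev_es;
};

static uint64_t model_rdmsr_efer(const struct efer_model *m)
{
	return m->value;
}

static void model_wrmsr_efer(struct efer_model *m, uint64_t val)
{
	/* SEV-ES hardware ignores attempts to clear EFER.SVME. */
	if (m->sev_es)
		val |= m->value & EFER_SVME;
	m->value = val;
}

/*
 * Assumed shape of the disable path: if EFER.SVME looks set, force GIF=1
 * via STGI, then clear SVME. Returns true if STGI would have been executed.
 */
static bool model_disable_virtualization(struct efer_model *m)
{
	uint64_t efer = model_rdmsr_efer(m);

	if (!(efer & EFER_SVME))
		return false; /* the expected "expensive nop" */

	/* STGI would execute here; under SEV-ES it raises #VC. */
	model_wrmsr_efer(m, efer & ~EFER_SVME);
	return true;
}
```

In the model, the clear is ignored under SEV-ES, so every later call still sees EFER.SVME=1 and attempts STGI again.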
So, the immediate fix is to treat SVM as unsupported if SEV+ is enabled.
diff --git a/arch/x86/virt/hw.c b/arch/x86/virt/hw.c
index c898f16fe612..f647557d38ac 100644
--- a/arch/x86/virt/hw.c
+++ b/arch/x86/virt/hw.c
@@ -269,7 +269,8 @@ static __init int x86_svm_init(void)
 		.emergency_disable_virtualization_cpu = x86_svm_emergency_disable_virtualization_cpu,
 	};
 
-	if (!cpu_feature_enabled(X86_FEATURE_SVM))
+	if (!cpu_feature_enabled(X86_FEATURE_SVM) ||
+	    cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
 		return -EOPNOTSUPP;
 
 	memcpy(&virt_ops, &svm_ops, sizeof(virt_ops));
Alternatively, we could force-clear X86_FEATURE_SVM, but the above is much safer
as a fixup for 7.1.
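The patched check boils down to a two-input predicate. Below is a standalone C sketch of it. struct cpu_caps and model_x86_svm_init() are hypothetical stand-ins for cpu_feature_enabled(X86_FEATURE_SVM) and cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT); they are not the kernel's types.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two kernel queries used by the fix. */
struct cpu_caps {
	bool svm;               /* cpu_feature_enabled(X86_FEATURE_SVM) */
	bool guest_mem_encrypt; /* cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) */
};

/* Models the patched early-exit check in x86_svm_init(). */
static int model_x86_svm_init(const struct cpu_caps *caps)
{
	if (!caps->svm || caps->guest_mem_encrypt)
		return -EOPNOTSUPP;

	/* The real code would memcpy() svm_ops into virt_ops here. */
	return 0;
}
```

Only the svm && !guest_mem_encrypt combination returns 0. Every other combination gets -EOPNOTSUPP and so never copies svm_ops into virt_ops. That includes an SEV+ guest to which QEMU has (mis)advertised SVM.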
Longer term, IMO we need something like this as well:
diff --git a/arch/x86/coco/sev/vc-shared.c b/arch/x86/coco/sev/vc-shared.c
index 58b2f985d546..3c15b77427d2 100644
--- a/arch/x86/coco/sev/vc-shared.c
+++ b/arch/x86/coco/sev/vc-shared.c
@@ -92,7 +92,8 @@ static enum es_result vc_check_opcode_bytes(struct es_em_ctxt *ctxt,
 	sev_printk(KERN_ERR "Wrong/unhandled opcode bytes: 0x%x, exit_code: 0x%lx, rIP: 0x%lx\n",
 		   opcode, exit_code, ctxt->regs->ip);
 
-	return ES_UNSUPPORTED;
+	ctxt->fi.vector = UD_VECTOR;
+	return ES_EXCEPTION;
 }
 
 static bool vc_decoding_needed(unsigned long exit_code)
so that unknown instructions get "normal" handling instead of immediate death.
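The sketch below spells out the difference between the two #VC paths as a hypothetical user-space model. The model_* names are invented. Per the description above, it assumes that ES_UNSUPPORTED ends in guest termination, while ES_EXCEPTION with UD_VECTOR is forwarded as a normal #UD, where exception fixup can handle it. UD_VECTOR is 6 on x86.

```c
#include <assert.h>
#include <stdbool.h>

#define UD_VECTOR 6 /* #UD is exception vector 6 on x86 */

/* Hypothetical, trimmed-down mirror of the relevant es_result values. */
enum model_es_result { MODEL_ES_OK, MODEL_ES_UNSUPPORTED, MODEL_ES_EXCEPTION };

enum model_outcome { MODEL_EMULATED, MODEL_TERMINATE, MODEL_INJECT_EXCEPTION };

struct model_ctxt {
	int fi_vector; /* stands in for ctxt->fi.vector */
};

/*
 * Models the tail of vc_check_opcode_bytes(). 'known' says whether the
 * opcode matched the exit code; 'inject_ud' selects the proposed behavior.
 */
static enum model_es_result model_check_opcode(struct model_ctxt *ctxt,
					       bool known, bool inject_ud)
{
	if (known)
		return MODEL_ES_OK;
	if (!inject_ud)
		return MODEL_ES_UNSUPPORTED; /* current behavior */
	ctxt->fi_vector = UD_VECTOR;         /* proposed behavior */
	return MODEL_ES_EXCEPTION;
}

/* Models how the #VC handler acts on the result. */
static enum model_outcome model_handle(enum model_es_result r)
{
	switch (r) {
	case MODEL_ES_OK:
		return MODEL_EMULATED;
	case MODEL_ES_EXCEPTION:
		/* Forwarded as a normal exception, so fixup can apply. */
		return MODEL_INJECT_EXCEPTION;
	case MODEL_ES_UNSUPPORTED:
		break;
	}
	/* Unsupported: the guest requests termination. */
	return MODEL_TERMINATE;
}
```

With inject_ud=false, an unknown opcode such as the STGI above leads straight to MODEL_TERMINATE. With inject_ud=true, the same opcode becomes an injected #UD.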
[*] https://lore.kernel.org/all/20210202212017.2486595-1-seanjc@google.com