* [PATCH kernel] x86/compressed/64: reduce #VC nesting for intercepted CPUID for SEV-SNP guest
@ 2023-09-25 4:23 Alexey Kardashevskiy
2023-09-25 8:41 ` Ingo Molnar
0 siblings, 1 reply; 4+ messages in thread
From: Alexey Kardashevskiy @ 2023-09-25 4:23 UTC (permalink / raw)
To: x86; +Cc: linux-kernel, Tom Lendacky, Alexey Kardashevskiy
For certain intercepts an SNP guest uses the GHCB protocol to talk to
the hypervisor from the #VC handler. The protocol requires a shared page so
there is one per vCPU. In case NMI arrives in a middle of #VC or the NMI
handler triggers a #VC, there is another "backup" GHCB page which stores
the content of the first one while SVM_VMGEXIT_NMI_COMPLETE is sent.
The vc_raw_handle_exception() handler manages main and backup GHCB pages
via __sev_get_ghcb/__sev_put_ghcb.
This works fine for #VC and occasional NMIs. This does not work so fine if
the #VC handler causes intercept + another #VC, if NMI arrives during
the second #VC, there are no more pages for SVM_VMGEXIT_NMI_COMPLETE.
The problem place is the #VC CPUID handler. Running perf in the SNP guest
crashes with:
Kernel panic - not syncing: Unable to handle #VC exception! GHCB and Backup GHCB are already in use
vc_raw_handle_exception #1: exit_code 72 (CPUID) eax d ecx 1
We lock the main GHCB and while it is locked we get to
snp_cpuid_postprocess() which executes "rdmsr" of MSR_IA32_XSS==0xda0 which
triggers:
vc_raw_handle_exception #2: exit_code 7c (MSR) ecx da0
Here we lock the backup ghcb.
And then PMC NMI comes which cannot complete as there is no GHCB page left
to use:
CPU: 5 PID: 566 Comm: touch Not tainted 6.5.0-rc2-aik-ad9c-g7413e71d3dcf-dirty #27
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown unknown
Call Trace:
<NMI>
dump_stack_lvl+0x44/0x60
panic+0x222/0x310
____sev_get_ghcb+0x21e/0x220
__sev_es_nmi_complete+0x28/0xf0
exc_nmi+0x1ac/0x1c0
end_repeat_nmi+0x16/0x67
...
</NMI>
<TASK>
vc_raw_handle_exception+0x9e/0x2c0
kernel_exc_vmm_communication+0x4d/0xa0
asm_exc_vmm_communication+0x31/0x60
RIP: 0010:snp_cpuid+0x2ad/0x420
Drop one #VC by replacing "rdmsr" with GHCB's VMGEXIT to read the value from
the hypervisor.
Fixes: ee0bfa08a345 ("x86/compressed/64: Add support for SEV-SNP CPUID table in #VC handlers")
Cc: x86@kernel.org
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
---
This is made on top of (which has the "efi/unaccepted: Make sure unaccepted table is mapped"
fix for booting SNP):
b996cbe1203c (tip/master) 15 hours ago Ingo Molnar Merge branch into tip/master: 'x86/tdx'
plus:
https://lore.kernel.org/lkml/a5856fa1ebe3879de91a8f6298b6bbd901c61881.1690578565.git.thomas.lendacky@amd.com/
---
arch/x86/kernel/sev-shared.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index e73c90c9cc5b..399219de5a9b 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -477,11 +477,19 @@ static int snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
if (leaf->subfn == 1) {
/* Get XSS value if XSAVES is enabled. */
if (leaf->eax & BIT(3)) {
- unsigned long lo, hi;
-
- asm volatile("rdmsr" : "=a" (lo), "=d" (hi)
- : "c" (MSR_IA32_XSS));
- xss = (hi << 32) | lo;
+ /*
+ * Since we're here, it is SNP and rdmsr will trigger
+ * another #VC and waste one of just two GHCB pages.
+ * Skip the intercept and do direct hypercall.
+ */
+ ghcb_set_rcx(ghcb, MSR_IA32_XSS);
+ if (sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_MSR, 0, 0) != ES_OK)
+ return -EINVAL;
+
+ xss = (ghcb->save.rdx << 32) | ghcb->save.rax;
+
+ /* Invalidate qwords for likely another following GHCB call */
+ vc_ghcb_invalidate(ghcb);
}
/*
--
2.41.0
^ permalink raw reply related [flat|nested] 4+ messages in thread* Re: [PATCH kernel] x86/compressed/64: reduce #VC nesting for intercepted CPUID for SEV-SNP guest 2023-09-25 4:23 [PATCH kernel] x86/compressed/64: reduce #VC nesting for intercepted CPUID for SEV-SNP guest Alexey Kardashevskiy @ 2023-09-25 8:41 ` Ingo Molnar 2023-09-25 9:23 ` Alexey Kardashevskiy 0 siblings, 1 reply; 4+ messages in thread From: Ingo Molnar @ 2023-09-25 8:41 UTC (permalink / raw) To: Alexey Kardashevskiy; +Cc: x86, linux-kernel, Tom Lendacky * Alexey Kardashevskiy <aik@amd.com> wrote: > For certain intercepts an SNP guest uses the GHCB protocol to talk to > the hypervisor from the #VC handler. The protocol requires a shared page so > there is one per vCPU. In case NMI arrives in a middle of #VC or the NMI > handler triggers a #VC, there is another "backup" GHCB page which stores > the content of the first one while SVM_VMGEXIT_NMI_COMPLETE is sent. > The vc_raw_handle_exception() handler manages main and backup GHCB pages > via __sev_get_ghcb/__sev_put_ghcb. > > This works fine for #VC and occasional NMIs. This does not work so fine if > the #VC handler causes intercept + another #VC, if NMI arrives during > the second #VC, there are no more pages for SVM_VMGEXIT_NMI_COMPLETE. > The problem place is the #VC CPUID handler. Running perf in the SNP guest > crashes with: > > Kernel panic - not syncing: Unable to handle #VC exception! GHCB and Backup GHCB are already in use > > vc_raw_handle_exception #1: exit_code 72 (CPUID) eax d ecx 1 > We lock the main GHCB and while it is locked we get to > snp_cpuid_postprocess() which executes "rdmsr" of MSR_IA32_XSS==0xda0 which > triggers: > > vc_raw_handle_exception #2: exit_code 7c (MSR) ecx da0 > Here we lock the backup ghcb. > > And then PMC NMI comes which cannot complete as there is no GHCB page left > to use: > > CPU: 5 PID: 566 Comm: touch Not tainted 6.5.0-rc2-aik-ad9c-g7413e71d3dcf-dirty #27 > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown unknown > Call Trace: > <NMI> > dump_stack_lvl+0x44/0x60 > panic+0x222/0x310 > ____sev_get_ghcb+0x21e/0x220 > __sev_es_nmi_complete+0x28/0xf0 > exc_nmi+0x1ac/0x1c0 > end_repeat_nmi+0x16/0x67 > ... > </NMI> > <TASK> > vc_raw_handle_exception+0x9e/0x2c0 > kernel_exc_vmm_communication+0x4d/0xa0 > asm_exc_vmm_communication+0x31/0x60 > RIP: 0010:snp_cpuid+0x2ad/0x420 > > Drop one #VC by replacing "rdmsr" with GHCB's VMGEXIT to read the value from > the hypervisor. > > Fixes: ee0bfa08a345 ("x86/compressed/64: Add support for SEV-SNP CPUID table in #VC handlers") > Cc: x86@kernel.org > Signed-off-by: Alexey Kardashevskiy <aik@amd.com> > --- > > This is made on top of (which has the "efi/unaccepted: Make sure unaccepted table is mapped" > fix for booting SNP): > b996cbe1203c (tip/master) 15 hours ago Ingo Molnar Merge branch into tip/master: 'x86/tdx' > > plus: > https://lore.kernel.org/lkml/a5856fa1ebe3879de91a8f6298b6bbd901c61881.1690578565.git.thomas.lendacky@amd.com/ > --- > arch/x86/kernel/sev-shared.c | 18 +++++++++++++----- > 1 file changed, 13 insertions(+), 5 deletions(-) > > diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c > index e73c90c9cc5b..399219de5a9b 100644 > --- a/arch/x86/kernel/sev-shared.c > +++ b/arch/x86/kernel/sev-shared.c > @@ -477,11 +477,19 @@ static int snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt, > if (leaf->subfn == 1) { > /* Get XSS value if XSAVES is enabled. */ > if (leaf->eax & BIT(3)) { > - unsigned long lo, hi; > - > - asm volatile("rdmsr" : "=a" (lo), "=d" (hi) > - : "c" (MSR_IA32_XSS)); > - xss = (hi << 32) | lo; > + /* > + * Since we're here, it is SNP and rdmsr will trigger > + * another #VC and waste one of just two GHCB pages. > + * Skip the intercept and do direct hypercall. > + */ > + ghcb_set_rcx(ghcb, MSR_IA32_XSS); > + if (sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_MSR, 0, 0) != ES_OK) > + return -EINVAL; > + > + xss = (ghcb->save.rdx << 32) | ghcb->save.rax; > + > + /* Invalidate qwords for likely another following GHCB call */ > + vc_ghcb_invalidate(ghcb); Ok, so I agree with this fix, but could you please reduce the ugliness of this open-coded RDMSR by factoring out this sequence into a new helper, called rdmsr_GHCB() or so, with a similar signature as rdmsr(), where rdmsr_GHCB() emulates RDMSR behavior via a hypercall? That makes this workaround to reduce nesting a lot easier to read & maintain in the longer run. Thanks, Ingo ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH kernel] x86/compressed/64: reduce #VC nesting for intercepted CPUID for SEV-SNP guest 2023-09-25 8:41 ` Ingo Molnar @ 2023-09-25 9:23 ` Alexey Kardashevskiy 2023-09-25 10:03 ` Ingo Molnar 0 siblings, 1 reply; 4+ messages in thread From: Alexey Kardashevskiy @ 2023-09-25 9:23 UTC (permalink / raw) To: Ingo Molnar; +Cc: x86, linux-kernel, Tom Lendacky On 25/9/23 18:41, Ingo Molnar wrote: > > * Alexey Kardashevskiy <aik@amd.com> wrote: > >> For certain intercepts an SNP guest uses the GHCB protocol to talk to >> the hypervisor from the #VC handler. The protocol requires a shared page so >> there is one per vCPU. In case NMI arrives in a middle of #VC or the NMI >> handler triggers a #VC, there is another "backup" GHCB page which stores >> the content of the first one while SVM_VMGEXIT_NMI_COMPLETE is sent. >> The vc_raw_handle_exception() handler manages main and backup GHCB pages >> via __sev_get_ghcb/__sev_put_ghcb. >> >> This works fine for #VC and occasional NMIs. This does not work so fine if >> the #VC handler causes intercept + another #VC, if NMI arrives during >> the second #VC, there are no more pages for SVM_VMGEXIT_NMI_COMPLETE. >> The problem place is the #VC CPUID handler. Running perf in the SNP guest >> crashes with: >> >> Kernel panic - not syncing: Unable to handle #VC exception! GHCB and Backup GHCB are already in use >> >> vc_raw_handle_exception #1: exit_code 72 (CPUID) eax d ecx 1 >> We lock the main GHCB and while it is locked we get to >> snp_cpuid_postprocess() which executes "rdmsr" of MSR_IA32_XSS==0xda0 which >> triggers: >> >> vc_raw_handle_exception #2: exit_code 7c (MSR) ecx da0 >> Here we lock the backup ghcb. >> >> And then PMC NMI comes which cannot complete as there is no GHCB page left >> to use: >> >> CPU: 5 PID: 566 Comm: touch Not tainted 6.5.0-rc2-aik-ad9c-g7413e71d3dcf-dirty #27 >> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown unknown >> Call Trace: >> <NMI> >> dump_stack_lvl+0x44/0x60 >> panic+0x222/0x310 >> ____sev_get_ghcb+0x21e/0x220 >> __sev_es_nmi_complete+0x28/0xf0 >> exc_nmi+0x1ac/0x1c0 >> end_repeat_nmi+0x16/0x67 >> ... >> </NMI> >> <TASK> >> vc_raw_handle_exception+0x9e/0x2c0 >> kernel_exc_vmm_communication+0x4d/0xa0 >> asm_exc_vmm_communication+0x31/0x60 >> RIP: 0010:snp_cpuid+0x2ad/0x420 >> >> Drop one #VC by replacing "rdmsr" with GHCB's VMGEXIT to read the value from >> the hypervisor. >> >> Fixes: ee0bfa08a345 ("x86/compressed/64: Add support for SEV-SNP CPUID table in #VC handlers") >> Cc: x86@kernel.org >> Signed-off-by: Alexey Kardashevskiy <aik@amd.com> >> --- >> >> This is made on top of (which has the "efi/unaccepted: Make sure unaccepted table is mapped" >> fix for booting SNP): >> b996cbe1203c (tip/master) 15 hours ago Ingo Molnar Merge branch into tip/master: 'x86/tdx' >> >> plus: >> https://lore.kernel.org/lkml/a5856fa1ebe3879de91a8f6298b6bbd901c61881.1690578565.git.thomas.lendacky@amd.com/ >> --- >> arch/x86/kernel/sev-shared.c | 18 +++++++++++++----- >> 1 file changed, 13 insertions(+), 5 deletions(-) >> >> diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c >> index e73c90c9cc5b..399219de5a9b 100644 >> --- a/arch/x86/kernel/sev-shared.c >> +++ b/arch/x86/kernel/sev-shared.c >> @@ -477,11 +477,19 @@ static int snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt, >> if (leaf->subfn == 1) { >> /* Get XSS value if XSAVES is enabled. */ >> if (leaf->eax & BIT(3)) { >> - unsigned long lo, hi; >> - >> - asm volatile("rdmsr" : "=a" (lo), "=d" (hi) >> - : "c" (MSR_IA32_XSS)); >> - xss = (hi << 32) | lo; >> + /* >> + * Since we're here, it is SNP and rdmsr will trigger >> + * another #VC and waste one of just two GHCB pages. >> + * Skip the intercept and do direct hypercall. >> + */ >> + ghcb_set_rcx(ghcb, MSR_IA32_XSS); >> + if (sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_MSR, 0, 0) != ES_OK) >> + return -EINVAL; >> + >> + xss = (ghcb->save.rdx << 32) | ghcb->save.rax; >> + >> + /* Invalidate qwords for likely another following GHCB call */ >> + vc_ghcb_invalidate(ghcb); > > Ok, so I agree with this fix, but could you please reduce the ugliness > of this open-coded RDMSR by factoring out this sequence into a new > helper, called rdmsr_GHCB() or so, with a similar signature as > rdmsr(), where rdmsr_GHCB() emulates RDMSR behavior via a hypercall? > > That makes this workaround to reduce nesting a lot easier to read & maintain > in the longer run. Currently it is: #define rdmsr(msr, low, high) Will the below signature do? #define rdmsr_GHCB(msr, low, high, ghcb, ctxt) {( \ int __ret; \ ghcb_set_rcx((ghcb), (msr)); \ __ret = sev_es_ghcb_hv_call((ghcb), (ctxt), SVM_EXIT_MSR, 0, 0); \ if (__ret = ES_OK) { \ low = (ghcb)->save.rax; \ high = (ghcb)->save.rdx; \ } \ __ret; )} rdmsr() does not return a value but rdmsr_GHCB() has to, can even make it static inline function, should I? Thanks, > > Thanks, > > Ingo -- Alexey ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH kernel] x86/compressed/64: reduce #VC nesting for intercepted CPUID for SEV-SNP guest 2023-09-25 9:23 ` Alexey Kardashevskiy @ 2023-09-25 10:03 ` Ingo Molnar 0 siblings, 0 replies; 4+ messages in thread From: Ingo Molnar @ 2023-09-25 10:03 UTC (permalink / raw) To: Alexey Kardashevskiy; +Cc: x86, linux-kernel, Tom Lendacky * Alexey Kardashevskiy <aik@amd.com> wrote: > > That makes this workaround to reduce nesting a lot easier to read & maintain in the > > longer run. > > Currently it is: > > #define rdmsr(msr, low, high) > > Will the below signature do? > > #define rdmsr_GHCB(msr, low, high, ghcb, ctxt) {( \ > int __ret; \ > ghcb_set_rcx((ghcb), (msr)); \ > __ret = sev_es_ghcb_hv_call((ghcb), (ctxt), SVM_EXIT_MSR, 0, 0); \ > if (__ret = ES_OK) { \ > low = (ghcb)->save.rax; \ > high = (ghcb)->save.rdx; \ > } \ > __ret; )} Sounds about right. Small nit: please prettify the macro a bit for readability, ie. something like: #define rdmsr_safe_GHCB(msr, low, high, ghcb, ctxt) {( \ int __ret; \ \ ghcb_set_rcx((ghcb), (msr)); \ __ret = sev_es_ghcb_hv_call((ghcb), (ctxt), SVM_EXIT_MSR, 0, 0); \ if (__ret = ES_OK) { \ low = (ghcb)->save.rax; \ high = (ghcb)->save.rdx; \ } \ __ret; )} > rdmsr() does not return a value but rdmsr_GHCB() has to, can even make it > static inline function, should I? It's a bit like rdmsr_safe()? My above naming reflects this. I'd keep it a CPP macro, it's simple enough, the other MSR accessors are doing it too, and to make sure this isn't instrumented the wrong way... Thanks, Ingo ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2023-09-25 10:05 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-09-25 4:23 [PATCH kernel] x86/compressed/64: reduce #VC nesting for intercepted CPUID for SEV-SNP guest Alexey Kardashevskiy 2023-09-25 8:41 ` Ingo Molnar 2023-09-25 9:23 ` Alexey Kardashevskiy 2023-09-25 10:03 ` Ingo Molnar
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox