kvm.vger.kernel.org archive mirror
* [PATCH] target/i386: KVM: add hack for Windows vCPU hotplug with SGX
@ 2025-06-09 13:23 Andrey Zhadchenko
  2025-06-09 16:12 ` Paolo Bonzini
  0 siblings, 1 reply; 7+ messages in thread
From: Andrey Zhadchenko @ 2025-06-09 13:23 UTC (permalink / raw)
  To: pbonzini, zhao1.liu, mtosatti; +Cc: qemu-devel, kvm, den, andrey.drobyshev

When hotplugging vCPUs into Windows VMs, we observed a strange guest
crash on an Intel(R) Xeon(R) CPU E3-1230 v6:
panic hyper-v: arg1='0x3e', arg2='0x46d359bbdff', arg3='0x56d359bbdff', arg4='0x0', arg5='0x0'

Presumably, Windows thinks that the hotplugged CPU is not "equivalent enough"
to the existing ones. The problem lies in MSR 0x3a. During startup,
Windows assigns a value to this register. During hotplug it
expects the same value in MSR 0x3a on the new vCPU, but by default it
is zero.

   CPU 0/KVM-16856   [007] .......   380.398695: kvm_msr: msr_read 3a = 0x0        <debug_before_write>
   CPU 0/KVM-16856   [007] .......   380.398696: kvm_msr: msr_write 3a = 0x40005
   CPU 3/KVM-16859   [001] .......   380.398914: kvm_msr: msr_read 3a = 0x0        <debug_before_write>
   CPU 3/KVM-16859   [001] .......   380.398914: kvm_msr: msr_write 3a = 0x40005
   CPU 2/KVM-16858   [006] .......   380.398963: kvm_msr: msr_read 3a = 0x0        <debug_before_write>
   CPU 2/KVM-16858   [006] .......   380.398964: kvm_msr: msr_write 3a = 0x40005
   CPU 1/KVM-16857   [004] .......   380.399007: kvm_msr: msr_read 3a = 0x0        <debug_before_write>
   CPU 1/KVM-16857   [004] .......   380.399007: kvm_msr: msr_write 3a = 0x40005
   CPU 0/KVM-16856   [001] .......   384.497714: kvm_msr: msr_read 3a = 0x40005
   CPU 0/KVM-16856   [001] .......   384.497716: kvm_msr: msr_read 3a = 0x40005
   CPU 1/KVM-16857   [007] .......   384.934791: kvm_msr: msr_read 3a = 0x40005
   CPU 1/KVM-16857   [007] .......   384.934793: kvm_msr: msr_read 3a = 0x40005
   CPU 2/KVM-16858   [002] .......   384.977871: kvm_msr: msr_read 3a = 0x40005
   CPU 2/KVM-16858   [002] .......   384.977873: kvm_msr: msr_read 3a = 0x40005
   CPU 3/KVM-16859   [006] .......   385.021217: kvm_msr: msr_read 3a = 0x40005
   CPU 3/KVM-16859   [006] .......   385.021220: kvm_msr: msr_read 3a = 0x40005
   CPU 4/KVM-17500   [002] .......   453.733743: kvm_msr: msr_read 3a = 0x0        <- new vcpu, Windows wants to see 0x40005 here instead of default value>
   CPU 4/KVM-17500   [002] .......   453.733745: kvm_msr: msr_read 3a = 0x0

Bit #18 probably means that Intel SGX is supported, because disabling
it via CPU arguments results in a successful hotplug (and an MSR value of 0x5).
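
For reference, the value seen in the trace can be split into individual
bits. The sketch below is only an illustration: the bit positions follow
the architectural layout of IA32_FEATURE_CONTROL (MSR 0x3a), while the
macro names and the feat_ctl_has() helper are made up for this example
and are not taken from QEMU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions in IA32_FEATURE_CONTROL (MSR 0x3a). */
#define FEAT_CTL_LOCKED           (1ULL << 0)   /* lock bit */
#define FEAT_CTL_VMX_INSIDE_SMX   (1ULL << 1)   /* VMXON allowed inside SMX */
#define FEAT_CTL_VMX_OUTSIDE_SMX  (1ULL << 2)   /* VMXON allowed outside SMX */
#define FEAT_CTL_SGX_LC_ENABLED   (1ULL << 17)  /* SGX launch control */
#define FEAT_CTL_SGX_ENABLED      (1ULL << 18)  /* SGX global enable */

/* Hypothetical helper: true if every bit of mask is set in value. */
static bool feat_ctl_has(uint64_t value, uint64_t mask)
{
    return (value & mask) == mask;
}
```

With these definitions, 0x40005 is exactly
FEAT_CTL_LOCKED | FEAT_CTL_VMX_OUTSIDE_SMX | FEAT_CTL_SGX_ENABLED, and
0x5 is the same value without the SGX bit.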

At least Win2k16, Win2k19 and Win2k22 are affected. This is a Windows bug,
but in my opinion, given the broad range of affected OSes, it is worth
having a hack.

This patch introduces a new CPU option: QEMU will copy the MSR 0x3a value
from the first vCPU during hotplug. The problem may not be limited to the
SGX feature, so the whole register is copied.
By default the option is set to auto, and enabled Hyper-V enlightenments
are used as the indicator of a Windows guest to turn the feature on.
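
The decision of when the copy happens can be summarised by the following
standalone model. It is a sketch that mirrors the checks in the patch
below, not code from it: HackMode stands in for QEMU's OnOffAuto, the
function name is hypothetical, and the model assumes a boot vCPU always
exists (the patch checks first_cpu explicitly).

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for QEMU's OnOffAuto tri-state. */
typedef enum { HACK_AUTO, HACK_ON, HACK_OFF } HackMode;

/*
 * Return true when the model would copy IA32_FEATURE_CONTROL from the
 * first vCPU into a vCPU whose state is about to be written to KVM.
 */
static bool should_copy_feature_control(HackMode mode, bool hyperv_enabled,
                                        bool is_intel, bool hotplugged,
                                        uint64_t current_value)
{
    /* "off" disables the hack; "auto" needs Hyper-V enlightenments. */
    if (mode == HACK_OFF || (mode == HACK_AUTO && !hyperv_enabled)) {
        return false;
    }
    /* Never overwrite a value that has already been set. */
    if (current_value != 0) {
        return false;
    }
    /* Only hotplugged vCPUs on Intel CPU models are affected. */
    return is_intel && hotplugged;
}
```

Note that "on" forces the copy even without Hyper-V enlightenments, while
a non-zero MSR value always wins over the hack.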

Resolves: #2669
Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
---
 target/i386/cpu.c     |  2 ++
 target/i386/cpu.h     |  3 +++
 target/i386/kvm/kvm.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 48 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 40aefb38f6..5c02f0962d 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -9389,6 +9389,8 @@ static const Property x86_cpu_properties[] = {
     DEFINE_PROP_BOOL("x-intel-pt-auto-level", X86CPU, intel_pt_auto_level,
                      true),
     DEFINE_PROP_BOOL("x-l1-cache-per-thread", X86CPU, l1_cache_per_core, true),
+    DEFINE_PROP_ON_OFF_AUTO("kvm-win-hack-sgx-cpu-hotplug", X86CPU,
+                            kvm_win_hack_sgx_cpu_hotplug, ON_OFF_AUTO_AUTO),
 };
 
 #ifndef CONFIG_USER_ONLY
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 545851cbde..0505d3d1cd 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -2301,6 +2301,9 @@ struct ArchCPU {
     /* Forcefully disable KVM PV features not exposed in guest CPUIDs */
     bool kvm_pv_enforce_cpuid;
 
+    /* Copy msr 3a on cpu hotplug */
+    OnOffAuto kvm_win_hack_sgx_cpu_hotplug;
+
     /* Number of physical address bits supported */
     uint32_t phys_bits;
 
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 56a6b9b638..c1e7d15e2e 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -5266,6 +5266,42 @@ static int kvm_get_nested_state(X86CPU *cpu)
     return ret;
 }
 
+static int kvm_win_hack_hotplug_with_sgx(CPUState *cs)
+{
+    DeviceState *dev = DEVICE(cs);
+    X86CPU *cpu = X86_CPU(cs);
+    int ret;
+
+    /*
+     * If CPU supports Intel SGX, Windows guests expect readmsr 0x3a after
+     * hotplug to have some bits set, just like on other vCPUs. Unfortunately
+     * by default it is zero and other vCPUs registers are filled by Windows
+     * itself during startup.
+     * Just copy the value from another vCPU.
+     */
+
+    if (cpu->kvm_win_hack_sgx_cpu_hotplug == ON_OFF_AUTO_OFF ||
+        (cpu->kvm_win_hack_sgx_cpu_hotplug == ON_OFF_AUTO_AUTO &&
+        !hyperv_enabled(cpu))) {
+        return 0;
+    }
+
+    if (cpu->env.msr_ia32_feature_control) {
+        return 0;
+    }
+
+    if (IS_INTEL_CPU(&cpu->env) && dev->hotplugged && first_cpu) {
+        ret = kvm_get_one_msr(X86_CPU(first_cpu),
+                              MSR_IA32_FEATURE_CONTROL,
+                              &cpu->env.msr_ia32_feature_control);
+        if (ret != 1) {
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
 int kvm_arch_put_registers(CPUState *cpu, int level, Error **errp)
 {
     X86CPU *x86_cpu = X86_CPU(cpu);
@@ -5273,6 +5309,13 @@ int kvm_arch_put_registers(CPUState *cpu, int level, Error **errp)
 
     assert(cpu_is_stopped(cpu) || qemu_cpu_is_self(cpu));
 
+    if (level == KVM_PUT_FULL_STATE) {
+        ret = kvm_win_hack_hotplug_with_sgx(cpu);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
     /*
      * Put MSR_IA32_FEATURE_CONTROL first, this ensures the VM gets out of VMX
      * root operation upon vCPU reset. kvm_put_msr_feature_control() should also
-- 
2.43.0



* Re: [PATCH] target/i386: KVM: add hack for Windows vCPU hotplug with SGX
  2025-06-09 13:23 [PATCH] target/i386: KVM: add hack for Windows vCPU hotplug with SGX Andrey Zhadchenko
@ 2025-06-09 16:12 ` Paolo Bonzini
  2025-06-09 16:26   ` Denis V. Lunev
  0 siblings, 1 reply; 7+ messages in thread
From: Paolo Bonzini @ 2025-06-09 16:12 UTC (permalink / raw)
  To: Andrey Zhadchenko, zhao1.liu, mtosatti
  Cc: qemu-devel, kvm, den, andrey.drobyshev

On 6/9/25 15:23, Andrey Zhadchenko wrote:
> When hotplugging vCPUs to the Windows vms, we observed strange instance
> crash on Intel(R) Xeon(R) CPU E3-1230 v6:
> panic hyper-v: arg1='0x3e', arg2='0x46d359bbdff', arg3='0x56d359bbdff', arg4='0x0', arg5='0x0'
> 
> Presumably, Windows thinks that hotplugged CPU is not "equivalent enough"
> to the previous ones. The problem lies within msr 3a. During the startup,
> Windows assigns some value to this register. During the hotplug it
> expects similar value on the new vCPU in msr 3a. But by default it
> is zero.

If I understand correctly, you checked that it's Windows that writes 
0x40005 to the MSR on non-hotplugged CPUs.

>     CPU 0/KVM-16856   [007] .......   380.398695: kvm_msr: msr_read 3a = 0x0
>     CPU 0/KVM-16856   [007] .......   380.398696: kvm_msr: msr_write 3a = 0x40005
>     CPU 3/KVM-16859   [001] .......   380.398914: kvm_msr: msr_read 3a = 0x0
>     CPU 3/KVM-16859   [001] .......   380.398914: kvm_msr: msr_write 3a = 0x40005
>     CPU 2/KVM-16858   [006] .......   380.398963: kvm_msr: msr_read 3a = 0x0
>     CPU 2/KVM-16858   [006] .......   380.398964: kvm_msr: msr_write 3a = 0x40005
>     CPU 1/KVM-16857   [004] .......   380.399007: kvm_msr: msr_read 3a = 0x0
>     CPU 1/KVM-16857   [004] .......   380.399007: kvm_msr: msr_write 3a = 0x40005

This is a random check happening, like the one below:

>     CPU 0/KVM-16856   [001] .......   384.497714: kvm_msr: msr_read 3a = 0x40005
>     CPU 0/KVM-16856   [001] .......   384.497716: kvm_msr: msr_read 3a = 0x40005
>     CPU 1/KVM-16857   [007] .......   384.934791: kvm_msr: msr_read 3a = 0x40005
>     CPU 1/KVM-16857   [007] .......   384.934793: kvm_msr: msr_read 3a = 0x40005
>     CPU 2/KVM-16858   [002] .......   384.977871: kvm_msr: msr_read 3a = 0x40005
>     CPU 2/KVM-16858   [002] .......   384.977873: kvm_msr: msr_read 3a = 0x40005
>     CPU 3/KVM-16859   [006] .......   385.021217: kvm_msr: msr_read 3a = 0x40005
>     CPU 3/KVM-16859   [006] .......   385.021220: kvm_msr: msr_read 3a = 0x40005
>     CPU 4/KVM-17500   [002] .......   453.733743: kvm_msr: msr_read 3a = 0x0        <- new vcpu, Windows wants to see 0x40005 here instead of default value>
>     CPU 4/KVM-17500   [002] .......   453.733745: kvm_msr: msr_read 3a = 0x0
> 
> Bit #18 probably means that Intel SGX is supported, because disabling
> it via CPU arguments results in a successful hotplug (and msr value 0x5).

What is the trace like in this case?  Does Windows "accept" 0x0 and 
write 0x5?

Does anything in edk2 run during the hotplug process (on real hardware 
it does, because the whole hotplug is managed via SMM)?  If so maybe 
that could be a better place to write the value.

So many questions, but I'd really prefer to avoid this hack if the only 
reason for it is SGX...

Paolo




* Re: [PATCH] target/i386: KVM: add hack for Windows vCPU hotplug with SGX
  2025-06-09 16:12 ` Paolo Bonzini
@ 2025-06-09 16:26   ` Denis V. Lunev
  2025-06-09 16:39     ` Sean Christopherson
  0 siblings, 1 reply; 7+ messages in thread
From: Denis V. Lunev @ 2025-06-09 16:26 UTC (permalink / raw)
  To: Paolo Bonzini, Andrey Zhadchenko, zhao1.liu, mtosatti
  Cc: qemu-devel, kvm, andrey.drobyshev

On 6/9/25 18:12, Paolo Bonzini wrote:
> On 6/9/25 15:23, Andrey Zhadchenko wrote:
>> When hotplugging vCPUs to the Windows vms, we observed strange instance
>> crash on Intel(R) Xeon(R) CPU E3-1230 v6:
>> panic hyper-v: arg1='0x3e', arg2='0x46d359bbdff', arg3='0x56d359bbdff', arg4='0x0', arg5='0x0'
>>
>> Presumably, Windows thinks that hotplugged CPU is not "equivalent enough"
>> to the previous ones. The problem lies within msr 3a. During the startup,
>> Windows assigns some value to this register. During the hotplug it
>> expects similar value on the new vCPU in msr 3a. But by default it
>> is zero.
>
> If I understand correctly, you checked that it's Windows that writes 
> 0x40005 to the MSR on non-hotplugged CPUs.
>
>>     CPU 0/KVM-16856   [007] .......   380.398695: kvm_msr: msr_read 3a = 0x0
>>     CPU 0/KVM-16856   [007] .......   380.398696: kvm_msr: msr_write 3a = 0x40005
>>     CPU 3/KVM-16859   [001] .......   380.398914: kvm_msr: msr_read 3a = 0x0
>>     CPU 3/KVM-16859   [001] .......   380.398914: kvm_msr: msr_write 3a = 0x40005
>>     CPU 2/KVM-16858   [006] .......   380.398963: kvm_msr: msr_read 3a = 0x0
>>     CPU 2/KVM-16858   [006] .......   380.398964: kvm_msr: msr_write 3a = 0x40005
>>     CPU 1/KVM-16857   [004] .......   380.399007: kvm_msr: msr_read 3a = 0x0
>>     CPU 1/KVM-16857   [004] .......   380.399007: kvm_msr: msr_write 3a = 0x40005
>
> This is a random check happening, like the one below:
>
>>     CPU 0/KVM-16856   [001] .......   384.497714: kvm_msr: msr_read 3a = 0x40005
>>     CPU 0/KVM-16856   [001] .......   384.497716: kvm_msr: msr_read 3a = 0x40005
>>     CPU 1/KVM-16857   [007] .......   384.934791: kvm_msr: msr_read 3a = 0x40005
>>     CPU 1/KVM-16857   [007] .......   384.934793: kvm_msr: msr_read 3a = 0x40005
>>     CPU 2/KVM-16858   [002] .......   384.977871: kvm_msr: msr_read 3a = 0x40005
>>     CPU 2/KVM-16858   [002] .......   384.977873: kvm_msr: msr_read 3a = 0x40005
>>     CPU 3/KVM-16859   [006] .......   385.021217: kvm_msr: msr_read 3a = 0x40005
>>     CPU 3/KVM-16859   [006] .......   385.021220: kvm_msr: msr_read 3a = 0x40005
>>     CPU 4/KVM-17500   [002] .......   453.733743: kvm_msr: msr_read 3a = 0x0        <- new vcpu, Windows wants to see 0x40005 here instead of default value>
>>     CPU 4/KVM-17500   [002] .......   453.733745: kvm_msr: msr_read 3a = 0x0
>>
>> Bit #18 probably means that Intel SGX is supported, because disabling
>> it via CPU arguments results in a successful hotplug (and msr value 0x5).
>
> What is the trace like in this case?  Does Windows "accept" 0x0 and 
> write 0x5?
>
> Does anything in edk2 run during the hotplug process (on real hardware 
> it does, because the whole hotplug is managed via SMM)? If so maybe 
> that could be a better place to write the value.
>
> So many questions, but I'd really prefer to avoid this hack if the 
> only reason for it is SGX...
>
This problem was originally reported in
     https://gitlab.com/qemu-project/qemu/-/issues/2669
and is fairly reproducible on
   vendor_id    : GenuineIntel
   cpu family    : 6
   model        : 158
   model name    : Intel(R) Xeon(R) CPU E3-1230 v6 @ 3.50GHz
   stepping    : 9
   microcode    : 0xf4
We are completely blocked without this patch on our test
cluster with this hardware.

The BSOD is the following:

MULTIPROCESSOR_CONFIGURATION_NOT_SUPPORTED (3e)
The system has multiple processors, but they are asymmetric in relation
to one another. In order to be symmetric all processors must be of
the same type and level. For example, trying to mix a Pentium level
processor with an 80486 would cause this BugCheck.
Arguments:
Arg1: 0000046d359bbdff
Arg2: 0000056d359bbdff
Arg3: 0000000000000000
Arg4: 0000000000000000
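
The two non-zero bugcheck arguments differ in a single bit, which can be
checked with a few lines of C. This is only a sketch for comparing the
values: the helper names are made up, and what Windows encodes in that
bit is an internal detail that this thread does not establish.

```c
#include <stdint.h>

/* Bits that differ between two 64-bit feature masks. */
static uint64_t differing_bits(uint64_t a, uint64_t b)
{
    return a ^ b;
}

/* Index of the lowest set bit in x, or -1 if x is zero. */
static int lowest_set_bit(uint64_t x)
{
    for (int i = 0; i < 64; i++) {
        if (x & (1ULL << i)) {
            return i;
        }
    }
    return -1;
}
```

For 0x46d359bbdff and 0x56d359bbdff the XOR is 0x10000000000, i.e. only
bit 40 differs between the existing CPUs and the hotplugged one.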

STACK_TEXT:
ffff9b81`085768e0 fffff802`adadfa45 : ffff9b81`085771b8 00000000`00000000 ffff9b81`08577160 00000000`00000004 : nt!KiStartDynamicProcessor+0x417
ffff9b81`085770e0 fffff809`d2c11c08 : ffffab8c`dbbcb820 ffffab8c`e4561e40 ffffab8c`e4561e40 fffff809`d2c0c340 : nt!KeStartDynamicProcessor+0x69
ffff9b81`08577110 fffff809`d2be4363 : 00000000`00000001 fffff802`ad6e0000 00000000`00000004 fffff802`00000004 : ACPI!ACPIProcessorStartDevice+0x275b8
ffff9b81`085771a0 fffff809`d2ac98e2 : 00000000`00000007 ffffab8c`e2f14970 ffffab8c`e2783c60 00000000`00000000 : ACPI!ACPIDispatchIrp+0x223
(Inline Function) --------`-------- : --------`-------- --------`-------- --------`-------- --------`-------- : Wdf01000!FxIrp::CallDriver+0x14 [d:\rs1\minkernel\wdf\framework\shared\inc\private\km\fxirpkm.hpp @ 85]
ffff9b81`08577220 fffff809`d2acc431 : ffffab8c`e2783c60 ffffab8c`e2f14970 00000000`00000002 00000000`00000000 : Wdf01000!FxPkgFdo::PnpSendStartDeviceDownTheStackOverload+0xd2 [d:\rs1\minkernel\wdf\framework\shared\irphandlers\pnp\fxpkgfdo.cpp @ 1100]
ffff9b81`08577290 fffff809`d2ac6a89 : ffffab8c`e2f14970 00000000`00000106 00000000`00000105 fffff809`d2b43290 : Wdf01000!FxPkgPnp::PnpEventInitStarting+0x11 [d:\rs1\minkernel\wdf\framework\shared\irphandlers\pnp\pnpstatemachine.cpp @ 1328]
(Inline Function) --------`-------- : --------`-------- --------`-------- --------`-------- --------`-------- : Wdf01000!FxPkgPnp::PnpEnterNewState+0xda [d:\rs1\minkernel\wdf\framework\shared\irphandlers\pnp\pnpstatemachine.cpp @ 1234]
ffff9b81`085772c0 fffff809`d2ac41a8 : ffffab8c`e2f14ac8 ffff9b81`00000000 ffffab8c`e2f14aa0 00000000`00000001 : Wdf01000!FxPkgPnp::PnpProcessEventInner+0x1c9 [d:\rs1\minkernel\wdf\framework\shared\irphandlers\pnp\pnpstatemachine.cpp @ 1150]
ffff9b81`08577370 fffff809`d2ad6e9e : 00000000`00000000 ffff9b81`08577479 00000000`00000000 ffffab8c`e2a40270 : Wdf01000!FxPkgPnp::PnpProcessEvent+0x158 [d:\rs1\minkernel\wdf\framework\shared\irphandlers\pnp\pnpstatemachine.cpp @ 933]
ffff9b81`08577410 fffff809`d2aa3e7f : ffffab8c`e2f14970 ffff9b81`08577479 00000000`00000000 ffffab8c`e2783c60 : Wdf01000!FxPkgPnp::_PnpStartDevice+0x1e [d:\rs1\minkernel\wdf\framework\shared\irphandlers\pnp\fxpkgpnp.cpp @ 1845]
ffff9b81`08577440 fffff809`d2aa34f5 : ffffab8c`e2783c60 ffffab8c`e2f14970 ffffab8c`e2783c60 fffff802`00000003 : Wdf01000!FxPkgPnp::Dispatch+0xef [d:\rs1\minkernel\wdf\framework\shared\irphandlers\pnp\fxpkgpnp.cpp @ 654]
(Inline Function) --------`-------- : --------`-------- --------`-------- --------`-------- --------`-------- : Wdf01000!DispatchWorker+0xdf [d:\rs1\minkernel\wdf\framework\shared\core\fxdevice.cpp @ 1572]
(Inline Function) --------`-------- : --------`-------- --------`-------- --------`-------- --------`-------- : Wdf01000!FxDevice::Dispatch+0xeb [d:\rs1\minkernel\wdf\framework\shared\core\fxdevice.cpp @ 1586]
ffff9b81`085774e0 fffff802`ad908d7d : ffffab8c`e2f10e20 ffff9b81`08577604 00000000`00000000 00000000`00000000 : Wdf01000!FxDevice::DispatchWithLock+0x155 [d:\rs1\minkernel\wdf\framework\shared\core\fxdevice.cpp @ 1430]
ffff9b81`085775d0 fffff802`ad5512f6 : ffffab8c`e4561e40 00000000`00000001 ffffab8c`e2b95bf0 00000000`00000000 : nt!PnpAsynchronousCall+0xe5
ffff9b81`08577610 fffff802`ad57f738 : 00000000`00000000 ffffab8c`e4561e40 fffff802`ad550e14 fffff802`ad550e14 : nt!PnpSendIrp+0x92
ffff9b81`08577680 fffff802`ad9084c7 : ffffab8c`e28b8190 ffffab8c`e2b95bf0 00000000`00000000 00000000`00000000 : nt!PnpStartDevice+0x88
ffff9b81`08577710 fffff802`ad8ff8c3 : ffffab8c`e28b8190 ffff9b81`085778e0 00000000`00000000 ffffab8c`e28b8190 : nt!PnpStartDeviceNode+0xdb
ffff9b81`085777a0 fffff802`ad96670d : ffffab8c`e28b8190 00000000`00000001 00000000`00000001 ffffab8c`dba25d30 : nt!PipProcessStartPhase1+0x53
ffff9b81`085777e0 fffff802`ad9063ae : ffffab8c`e227d990 00000000`00000000 ffff9b81`08577b19 fffff802`ad966c17 : nt!PipProcessDevNodeTree+0x401
ffff9b81`08577a60 fffff802`ad550176 : 00000001`00000003 00000000`00000000 00000000`00000000 00000000`00000000 : nt!PiProcessReenumeration+0xa6
ffff9b81`08577ab0 fffff802`ad4ff6b9 : ffffab8c`dc66e800 fffff802`ad7ae380 fffff802`ad8502c0 fffff802`ad8502c0 : nt!PnpDeviceActionWorker+0x166
ffff9b81`08577b80 fffff802`ad5957b9 : ffffab8c`dc66e800 00000000`00000080 ffffab8c`db6b33c0 ffffab8c`dc66e800 : nt!ExpWorkerThread+0xe9
ffff9b81`08577c10 fffff802`ad5f6966 : ffff9b81`08100180 ffffab8c`dc66e800 fffff802`ad595778 00000000`00000000 : nt!PspSystemThreadStartup+0x41
ffff9b81`08577c60 00000000`00000000 : ffff9b81`08578000 ffff9b81`08572000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16

Linux handles this well on its own and assigns the MSRs properly (we observe
the corresponding set_msr on the hotplugged CPU).

Den


* Re: [PATCH] target/i386: KVM: add hack for Windows vCPU hotplug with SGX
  2025-06-09 16:26   ` Denis V. Lunev
@ 2025-06-09 16:39     ` Sean Christopherson
  2025-06-09 17:54       ` Andrey Zhadchenko
  0 siblings, 1 reply; 7+ messages in thread
From: Sean Christopherson @ 2025-06-09 16:39 UTC (permalink / raw)
  To: Denis V. Lunev
  Cc: Paolo Bonzini, Andrey Zhadchenko, zhao1.liu, mtosatti, qemu-devel,
	kvm, andrey.drobyshev

On Mon, Jun 09, 2025, Denis V. Lunev wrote:
> On 6/9/25 18:12, Paolo Bonzini wrote:
> > On 6/9/25 15:23, Andrey Zhadchenko wrote:
> > > When hotplugging vCPUs to the Windows vms, we observed strange instance
> > > crash on Intel(R) Xeon(R) CPU E3-1230 v6:
> > > panic hyper-v: arg1='0x3e', arg2='0x46d359bbdff',
> > > arg3='0x56d359bbdff', arg4='0x0', arg5='0x0'
> > > 
> > > > Presumably, Windows thinks that hotplugged CPU is not "equivalent enough"
> > > > to the previous ones. The problem lies within msr 3a. During the startup,
> > > > Windows assigns some value to this register. During the hotplug it
> > > > expects similar value on the new vCPU in msr 3a. But by default it
> > > > is zero.
> > 
> > If I understand correctly, you checked that it's Windows that writes
> > 0x40005 to the MSR on non-hotplugged CPUs.

...

> > > Bit #18 probably means that Intel SGX is supported, because disabling
> > > > it via CPU arguments results in a successful hotplug (and msr value 0x5).
> > 
> > What is the trace like in this case?  Does Windows "accept" 0x0 and
> > write 0x5?
> > 
> > Does anything in edk2 run during the hotplug process (on real hardware
> > it does, because the whole hotplug is managed via SMM)? If so maybe that
> > could be a better place to write the value.

Yeah, I would expect firmware to write and lock IA32_FEATURE_CONTROL.

> > So many questions, but I'd really prefer to avoid this hack if the only
> > reason for it is SGX...

Does your setup actually support SGX?  I.e. expose EPC sections to the guest?
If not, can't you simply disable SGX in CPUID?

> Linux by itself handles this well and assigns MSRs properly (we observe
> corresponding set_msr on the hotplugged CPU).

Linux is much more tolerant of oddities, and quite a bit of effort went into
making sure that IA32_FEATURE_CONTROL was initialized if firmware left it unlocked.


* Re: [PATCH] target/i386: KVM: add hack for Windows vCPU hotplug with SGX
  2025-06-09 16:39     ` Sean Christopherson
@ 2025-06-09 17:54       ` Andrey Zhadchenko
  2025-06-09 18:25         ` Sean Christopherson
  0 siblings, 1 reply; 7+ messages in thread
From: Andrey Zhadchenko @ 2025-06-09 17:54 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: zhao1.liu, mtosatti, qemu-devel, kvm, andrey.drobyshev,
	Denis V. Lunev



On 6/9/25 18:39, Sean Christopherson wrote:

> 
> On Mon, Jun 09, 2025, Denis V. Lunev wrote:
>> On 6/9/25 18:12, Paolo Bonzini wrote:
>>> On 6/9/25 15:23, Andrey Zhadchenko wrote:
>>>> When hotplugging vCPUs to the Windows vms, we observed strange instance
>>>> crash on Intel(R) Xeon(R) CPU E3-1230 v6:
>>>> panic hyper-v: arg1='0x3e', arg2='0x46d359bbdff',
>>>> arg3='0x56d359bbdff', arg4='0x0', arg5='0x0'
>>>>
>>>> Presumably, Windows thinks that hotplugged CPU is not "equivalent enough"
>>>> to the previous ones. The problem lies within msr 3a. During the startup,
>>>> Windows assigns some value to this register. During the hotplug it
>>>> expects similar value on the new vCPU in msr 3a. But by default it
>>>> is zero.
>>>
>>> If I understand correctly, you checked that it's Windows that writes
>>> 0x40005 to the MSR on non-hotplugged CPUs.
> 
> ...

Actually no, it may also be the firmware.
We are only sure that it is Windows code that crashes the VM.

> 
>>>> Bit #18 probably means that Intel SGX is supported, because disabling
>>>> it via CPU arguments results in a successful hotplug (and msr value 0x5).
>>>
>>> What is the trace like in this case?  Does Windows "accept" 0x0 and
>>> write 0x5?

It 'accepts' 0x0, but does not write anything there.

>>>
>>> Does anything in edk2 run during the hotplug process (on real hardware
>>> it does, because the whole hotplug is managed via SMM)? If so maybe that
>>> could be a better place to write the value.
> 
> Yeah, I would expect firmware to write and lock IA32_FEATURE_CONTROL.
> 
>>> So many questions, but I'd really prefer to avoid this hack if the only
>>> reason for it is SGX...
> 
> Does your setup actually support SGX?  I.e. expose EPC sections to the guest?
> If not, can't you simply disable SGX in CPUID?

We do not have any TYPE_MEMORY_BACKEND_EPC objects in our default 
config, but we do have the following CPU flags: 
sgx=on,sgx1=on,sgx-debug=on,sgx-mode64=on,sgx-provisionkey=on,sgx-tokenkey=on
We found this during testing, and SGX can indeed be disabled on our 
testing setup without any worries.
I have no data on whether anyone actually configures it properly in the 
wild, though that is still possible.

> 
>> Linux by itself handles this well and assigns MSRs properly (we observe
>> corresponding set_msr on the hotplugged CPU).

I think Linux, at least the old 4.4 kernel, does not write the MSR on 
hotplug. Anyway, it hotplugs fine and tolerates a different value, unlike 
Windows.

> 
> Linux is much more tolerant of oddities, and quite a bit of effort went into
> making sure that IA32_FEATURE_CONTROL was initialized if firmware left it unlocked.

Thanks everyone for the ideas. I focused on Windows too much and did not 
investigate the firmware, so perhaps this is rather a firmware problem?
I think by default we are using SeaBIOS, not OVMF/edk2. I will update 
after some testing with different configurations.


* Re: [PATCH] target/i386: KVM: add hack for Windows vCPU hotplug with SGX
  2025-06-09 17:54       ` Andrey Zhadchenko
@ 2025-06-09 18:25         ` Sean Christopherson
  2025-06-12 12:23           ` Andrey Zhadchenko
  0 siblings, 1 reply; 7+ messages in thread
From: Sean Christopherson @ 2025-06-09 18:25 UTC (permalink / raw)
  To: Andrey Zhadchenko
  Cc: Paolo Bonzini, zhao1.liu, mtosatti, qemu-devel, kvm,
	andrey.drobyshev, Denis V. Lunev

On Mon, Jun 09, 2025, Andrey Zhadchenko wrote:
> On 6/9/25 18:39, Sean Christopherson wrote:
> > On Mon, Jun 09, 2025, Denis V. Lunev wrote:
> > > > Does anything in edk2 run during the hotplug process (on real hardware
> > > > it does, because the whole hotplug is managed via SMM)? If so maybe that
> > > > could be a better place to write the value.
> > 
> > Yeah, I would expect firmware to write and lock IA32_FEATURE_CONTROL.
> > 
> > > > So many questions, but I'd really prefer to avoid this hack if the only
> > > > reason for it is SGX...
> > 
> > Does your setup actually support SGX?  I.e. expose EPC sections to the guest?
> > If not, can't you simply disable SGX in CPUID?
> 
> We do not have any TYPE_MEMORY_BACKEND_EPC objects in our default config,
> but have the following:
> sgx=on,sgx1=on,sgx-debug=on,sgx-mode64=on,sgx-provisionkey=on,sgx-tokenkey=on
> We found this during testing, and it can be disabled on our testing setup
> without any worries indeed.
> I have no data whether someone actually sets it properly in the wild, which
> may still be possible.

The reason I ask is because on bare metal, I'm pretty sure SGX is incompatible
with true CPU hotplug.  It can work for the virtualization case, but I wouldn't
be all that surprised if the answer here is "don't do that".

> > > Linux by itself handles this well and assigns MSRs properly (we observe
> > > corresponding set_msr on the hotplugged CPU).
> 
> I think Linux, at least old 4.4, does not write msr on hotplug.

Yeah, it's a newer thing.  5.6+ should initialize IA32_FEATURE_CONTROL if it's
left unlocked (commit 1db2a6e1e29f ("x86/intel: Initialize IA32_FEAT_CTL MSR at boot")).
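
Roughly, the guest-side initialization behaves like the model below. It
is a simplified sketch and not the kernel's code: the function name and
the want_vmx/want_sgx parameters are made up for illustration, and
details such as SGX launch control and TXT/SMX handling are left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define FEAT_CTL_LOCKED           (1ULL << 0)
#define FEAT_CTL_VMX_OUTSIDE_SMX  (1ULL << 2)
#define FEAT_CTL_SGX_ENABLED      (1ULL << 18)

/*
 * Model of an OS initializing IA32_FEATURE_CONTROL on one CPU.
 * Returns the value the MSR holds afterwards.
 */
static uint64_t model_init_feat_ctl(uint64_t msr, bool want_vmx, bool want_sgx)
{
    /* Firmware already locked the MSR: it can no longer be changed. */
    if (msr & FEAT_CTL_LOCKED) {
        return msr;
    }
    /* Unlocked: ignore what was left there, then enable and lock. */
    msr = FEAT_CTL_LOCKED;
    if (want_vmx) {
        msr |= FEAT_CTL_VMX_OUTSIDE_SMX;
    }
    if (want_sgx) {
        msr |= FEAT_CTL_SGX_ENABLED;
    }
    return msr;
}
```

Starting from the hotplug default of 0 with both features wanted, this
model ends up at 0x40005, the value seen on the boot-time vCPUs in the
trace.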

> Anyway it hotplugs fine and tolerates different value unlike Windows

Heh, probably only because the VM isn't actively using KVM at the time of hotplug.
In pre-5.6 kernels, i.e. without the aforementioned handling, KVM (in the guest)
would refuse to load (though the hotplug would still work).  But if the guest is
actively running (nested) VMs at the time of hotplug, the hotplugged vCPUs would
hit a #GP when attempting to do VMXON, and would likely crash the kernel.
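
The VMXON condition can be written down as a small predicate. This is a
sketch of the architectural rule for the non-SMX case only; the function
name is made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define FEAT_CTL_LOCKED           (1ULL << 0)
#define FEAT_CTL_VMX_OUTSIDE_SMX  (1ULL << 2)

/*
 * VMXON outside SMX raises #GP unless IA32_FEATURE_CONTROL is locked
 * and has the "VMX outside SMX" bit set.
 */
static bool vmxon_permitted_outside_smx(uint64_t feat_ctl)
{
    const uint64_t required = FEAT_CTL_LOCKED | FEAT_CTL_VMX_OUTSIDE_SMX;

    return (feat_ctl & required) == required;
}
```

A hotplugged vCPU whose MSR is still 0 fails this check, whereas the
boot-time values 0x5 and 0x40005 pass it.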

> > Linux is much more tolerant of oddities, and quite a bit of effort went into
> > making sure that IA32_FEATURE_CONTROL was initialized if firmware left it unlocked.
> 
> Thanks everyone for the ideas. I focused on Windows too much and did not
> investigate into firmware, so perhaps this is rather a firmware problem?
> I think by default we are using seaBIOS, not ovmf/edk2. I will update after
> some testing with different configurations.

Generally speaking, firmware is expected to set and lock IA32_FEATURE_CONTROL.
But of course firmware doesn't always behave as expected, hence the hardening that
was added by commit 1db2a6e1e29f to avoid blowing up when running on weird/buggy
firmware.



* Re: [PATCH] target/i386: KVM: add hack for Windows vCPU hotplug with SGX
  2025-06-09 18:25         ` Sean Christopherson
@ 2025-06-12 12:23           ` Andrey Zhadchenko
  0 siblings, 0 replies; 7+ messages in thread
From: Andrey Zhadchenko @ 2025-06-12 12:23 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: zhao1.liu, mtosatti, qemu-devel, kvm, andrey.drobyshev,
	Denis V. Lunev

It looks like this is a firmware bug.
Both SeaBIOS and OVMF set IA32_FEATURE_CONTROL only during init, from the
QEMU-provided etc/msr_feature_control, so a hotplugged vCPU is left with
the default value of zero.
So the fix should probably be done in the firmware.

On 6/9/25 20:25, Sean Christopherson wrote:

> On Mon, Jun 09, 2025, Andrey Zhadchenko wrote:
>> On 6/9/25 18:39, Sean Christopherson wrote:
>>> On Mon, Jun 09, 2025, Denis V. Lunev wrote:
>>>>> Does anything in edk2 run during the hotplug process (on real hardware
>>>>> it does, because the whole hotplug is managed via SMM)? If so maybe that
>>>>> could be a better place to write the value.
>>>
>>> Yeah, I would expect firmware to write and lock IA32_FEATURE_CONTROL.
>>>
>>>>> So many questions, but I'd really prefer to avoid this hack if the only
>>>>> reason for it is SGX...
>>>
>>> Does your setup actually support SGX?  I.e. expose EPC sections to the guest?
>>> If not, can't you simply disable SGX in CPUID?
>>
>> We do not have any TYPE_MEMORY_BACKEND_EPC objects in our default config,
>> but have the following:
>> sgx=on,sgx1=on,sgx-debug=on,sgx-mode64=on,sgx-provisionkey=on,sgx-tokenkey=on
>> We found this during testing, and it can be disabled on our testing setup
>> without any worries indeed.
>> I have no data whether someone actually sets it properly in the wild, which
>> may still be possible.
> 
> The reason I ask is because on bare metal, I'm pretty sure SGX is incompatible
> with true CPU hotplug.  It can work for the virtualization case, but I wouldn't
> be all that surprised if the answer here is "don't do that".
> 
>>>> Linux by itself handles this well and assigns MSRs properly (we observe
>>>> corresponding set_msr on the hotplugged CPU).
>>
>> I think Linux, at least old 4.4, does not write msr on hotplug.
> 
> Yeah, it's a newer thing.  5.6+ should initialize IA32_FEATURE_CONTROL if it's
> left unlocked (commit 1db2a6e1e29f ("x86/intel: Initialize IA32_FEAT_CTL MSR at boot")).
> 
>> Anyway it hotplugs fine and tolerates different value unlike Windows
> 
> Heh, probably only because the VM isn't actively using KVM at the time of hotplug.
> In pre-5.6 kernels, i.e. without the aforementioned handling, KVM (in the guest)
> would refuse to load (though the hotplug would still work).  But if the guest is
> actively running (nested) VMs at the time of hotplug, the hotplugged vCPUs would
> hit a #GP when attempting to do VMXON, and would likely crash the kernel.
> 
>>> Linux is much more tolerant of oddities, and quite a bit of effort went into
>>> making sure that IA32_FEATURE_CONTROL was initialized if firmware left it unlocked.
>>
>> Thanks everyone for the ideas. I focused on Windows too much and did not
>> investigate into firmware, so perhaps this is rather a firmware problem?
>> I think by default we are using seaBIOS, not ovmf/edk2. I will update after
>> some testing with different configurations.
> 
> Generally speaking, firmware is expected to set and lock IA32_FEATURE_CONTROL.
> But of course firmware doesn't always behave as expected, hence the hardening that
> was added by commit 1db2a6e1e29f to avoid blowing up when running on weird/buggy
> firmware.
> 



end of thread, other threads:[~2025-06-12 12:23 UTC | newest]

Thread overview: 7+ messages
2025-06-09 13:23 [PATCH] target/i386: KVM: add hack for Windows vCPU hotplug with SGX Andrey Zhadchenko
2025-06-09 16:12 ` Paolo Bonzini
2025-06-09 16:26   ` Denis V. Lunev
2025-06-09 16:39     ` Sean Christopherson
2025-06-09 17:54       ` Andrey Zhadchenko
2025-06-09 18:25         ` Sean Christopherson
2025-06-12 12:23           ` Andrey Zhadchenko
