qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V2 0/3] improve -overcommit cpu-pm=on|off
@ 2024-05-24 20:00 Zide Chen
  2024-05-24 20:00 ` [PATCH V2 1/3] vl: Allow multiple -overcommit commands Zide Chen
                   ` (3 more replies)
  0 siblings, 4 replies; 29+ messages in thread
From: Zide Chen @ 2024-05-24 20:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: pbonzini, mst, thuth, cfontana, xiaoyao.li, qemu-trivial,
	Zide Chen

Currently, if running "-overcommit cpu-pm=on" on hosts that don't
have MWAIT support, the MWAIT/MONITOR feature is advertised to the
guest and executing MWAIT/MONITOR on the guest triggers #UD.

V2:
- [PATCH 1]: took Thomas' suggestion for more generic fix
- [PATCH 2/3]: no changes

Zide Chen (3):
  vl: Allow multiple -overcommit commands
  target/i386: call cpu_exec_realizefn before x86_cpu_filter_features
  target/i386: Move host_cpu_enable_cpu_pm into kvm_cpu_realizefn()

 system/vl.c               |  4 ++--
 target/i386/cpu.c         | 24 ++++++++++++------------
 target/i386/host-cpu.c    | 12 ------------
 target/i386/kvm/kvm-cpu.c | 12 +++++++++---
 4 files changed, 23 insertions(+), 29 deletions(-)

-- 
2.34.1



^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH V2 1/3] vl: Allow multiple -overcommit commands
  2024-05-24 20:00 [PATCH V2 0/3] improve -overcommit cpu-pm=on|off Zide Chen
@ 2024-05-24 20:00 ` Zide Chen
  2024-05-27  5:19   ` Thomas Huth
  2024-05-30 13:39   ` Zhao Liu
  2024-05-24 20:00 ` [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before x86_cpu_filter_features Zide Chen
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 29+ messages in thread
From: Zide Chen @ 2024-05-24 20:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: pbonzini, mst, thuth, cfontana, xiaoyao.li, qemu-trivial,
	Zide Chen

Both cpu-pm and mem-lock are related to system resource overcommit, but
they are separate from each other, in terms of how they are realized,
and of course, they are applied to different system resources.

It's tempting to use separate command lines to specify their behavior.
e.g., in the following example, the cpu-pm command is quietly
overwritten, and it's not easy to notice it without careful inspection.

  --overcommit mem-lock=on
  --overcommit cpu-pm=on

Fixes: c8c9dc42b7ca ("Remove the deprecated -realtime option")
Suggested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
---

v2:

Thanks to Thomas' suggestion, changed to this better approach, which
is more generic and can handle situations like: "enabled the option in
the config file, and now you'd like to disable it on the command line
again".

 system/vl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/system/vl.c b/system/vl.c
index a3eede5fa5b8..dfa6cdd9283b 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -3545,8 +3545,8 @@ void qemu_init(int argc, char **argv)
                 if (!opts) {
                     exit(1);
                 }
-                enable_mlock = qemu_opt_get_bool(opts, "mem-lock", false);
-                enable_cpu_pm = qemu_opt_get_bool(opts, "cpu-pm", false);
+                enable_mlock = qemu_opt_get_bool(opts, "mem-lock", enable_mlock);
+                enable_cpu_pm = qemu_opt_get_bool(opts, "cpu-pm", enable_cpu_pm);
                 break;
             case QEMU_OPTION_compat:
                 {
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before x86_cpu_filter_features
  2024-05-24 20:00 [PATCH V2 0/3] improve -overcommit cpu-pm=on|off Zide Chen
  2024-05-24 20:00 ` [PATCH V2 1/3] vl: Allow multiple -overcommit commands Zide Chen
@ 2024-05-24 20:00 ` Zide Chen
  2024-05-31  6:30   ` Zhao Liu
  2024-05-24 20:00 ` [PATCH V2 3/3] target/i386: Move host_cpu_enable_cpu_pm into kvm_cpu_realizefn() Zide Chen
  2024-05-28  9:23 ` [PATCH V2 0/3] improve -overcommit cpu-pm=on|off Igor Mammedov
  3 siblings, 1 reply; 29+ messages in thread
From: Zide Chen @ 2024-05-24 20:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: pbonzini, mst, thuth, cfontana, xiaoyao.li, qemu-trivial,
	Zide Chen

cpu_exec_realizefn which calls the accel-specific realizefn may expand
features.  e.g., some accel-specific options may require extra features
to be enabled, and it's appropriate to expand these features in accel-
specific realizefn.

One such example is the cpu-pm option, which may add CPUID_EXT_MONITOR.

Thus, call cpu_exec_realizefn before x86_cpu_filter_features to ensure
that it won't expose features not supported by the host.

Fixes: 662175b91ff2 ("i386: reorder call to cpu_exec_realizefn")
Suggested-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
---
 target/i386/cpu.c         | 24 ++++++++++++------------
 target/i386/kvm/kvm-cpu.c |  1 -
 2 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index bc2dceb647fa..a1c1c785bd2f 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -7604,6 +7604,18 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
         }
     }
 
+    /*
+     * note: the call to the framework needs to happen after feature expansion,
+     * but before the checks/modifications to ucode_rev, mwait, phys_bits.
+     * These may be set by the accel-specific code,
+     * and the results are subsequently checked / assumed in this function.
+     */
+    cpu_exec_realizefn(cs, &local_err);
+    if (local_err != NULL) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
     x86_cpu_filter_features(cpu, cpu->check_cpuid || cpu->enforce_cpuid);
 
     if (cpu->enforce_cpuid && x86_cpu_have_filtered_features(cpu)) {
@@ -7625,18 +7637,6 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
 
     x86_cpu_set_sgxlepubkeyhash(env);
 
-    /*
-     * note: the call to the framework needs to happen after feature expansion,
-     * but before the checks/modifications to ucode_rev, mwait, phys_bits.
-     * These may be set by the accel-specific code,
-     * and the results are subsequently checked / assumed in this function.
-     */
-    cpu_exec_realizefn(cs, &local_err);
-    if (local_err != NULL) {
-        error_propagate(errp, local_err);
-        return;
-    }
-
     if (xcc->host_cpuid_required && !accel_uses_host_cpuid()) {
         g_autofree char *name = x86_cpu_class_get_model_name(xcc);
         error_setg(&local_err, "CPU model '%s' requires KVM or HVF", name);
diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c
index f76972e47e61..3adcedf0dbc3 100644
--- a/target/i386/kvm/kvm-cpu.c
+++ b/target/i386/kvm/kvm-cpu.c
@@ -50,7 +50,6 @@ static bool kvm_cpu_realizefn(CPUState *cs, Error **errp)
      * nothing else has been set by the user (or by accelerators) in
      * cpu->ucode_rev and cpu->phys_bits, and updates the CPUID results in
      * mwait.ecx.
-     * This accel realization code also assumes cpu features are already expanded.
      *
      * realize order:
      *
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH V2 3/3] target/i386: Move host_cpu_enable_cpu_pm into kvm_cpu_realizefn()
  2024-05-24 20:00 [PATCH V2 0/3] improve -overcommit cpu-pm=on|off Zide Chen
  2024-05-24 20:00 ` [PATCH V2 1/3] vl: Allow multiple -overcommit commands Zide Chen
  2024-05-24 20:00 ` [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before x86_cpu_filter_features Zide Chen
@ 2024-05-24 20:00 ` Zide Chen
  2024-05-31  6:53   ` Zhao Liu
  2024-05-28  9:23 ` [PATCH V2 0/3] improve -overcommit cpu-pm=on|off Igor Mammedov
  3 siblings, 1 reply; 29+ messages in thread
From: Zide Chen @ 2024-05-24 20:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: pbonzini, mst, thuth, cfontana, xiaoyao.li, qemu-trivial,
	Zide Chen

It seems not a good idea to expand features in host_cpu_realizefn,
which is for host CPU only.  Additionally, cpu-pm option is KVM
specific, and it's cleaner to put it in kvm_cpu_realizefn(), together
with the WAITPKG code.

Fixes: f5cc5a5c1686 ("i386: split cpu accelerators from cpu.c, using AccelCPUClass")
Signed-off-by: Zide Chen <zide.chen@intel.com>
---
 target/i386/host-cpu.c    | 12 ------------
 target/i386/kvm/kvm-cpu.c | 11 +++++++++--
 2 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/target/i386/host-cpu.c b/target/i386/host-cpu.c
index 280e427c017c..8b8bf5afeccf 100644
--- a/target/i386/host-cpu.c
+++ b/target/i386/host-cpu.c
@@ -42,15 +42,6 @@ static uint32_t host_cpu_phys_bits(void)
     return host_phys_bits;
 }
 
-static void host_cpu_enable_cpu_pm(X86CPU *cpu)
-{
-    CPUX86State *env = &cpu->env;
-
-    host_cpuid(5, 0, &cpu->mwait.eax, &cpu->mwait.ebx,
-               &cpu->mwait.ecx, &cpu->mwait.edx);
-    env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
-}
-
 static uint32_t host_cpu_adjust_phys_bits(X86CPU *cpu)
 {
     uint32_t host_phys_bits = host_cpu_phys_bits();
@@ -83,9 +74,6 @@ bool host_cpu_realizefn(CPUState *cs, Error **errp)
     X86CPU *cpu = X86_CPU(cs);
     CPUX86State *env = &cpu->env;
 
-    if (cpu->max_features && enable_cpu_pm) {
-        host_cpu_enable_cpu_pm(cpu);
-    }
     if (env->features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM) {
         uint32_t phys_bits = host_cpu_adjust_phys_bits(cpu);
 
diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c
index 3adcedf0dbc3..197c892da89a 100644
--- a/target/i386/kvm/kvm-cpu.c
+++ b/target/i386/kvm/kvm-cpu.c
@@ -64,9 +64,16 @@ static bool kvm_cpu_realizefn(CPUState *cs, Error **errp)
      *   cpu_common_realizefn() (via xcc->parent_realize)
      */
     if (cpu->max_features) {
-        if (enable_cpu_pm && kvm_has_waitpkg()) {
-            env->features[FEAT_7_0_ECX] |= CPUID_7_0_ECX_WAITPKG;
+        if (enable_cpu_pm) {
+            if (kvm_has_waitpkg()) {
+                env->features[FEAT_7_0_ECX] |= CPUID_7_0_ECX_WAITPKG;
+            }
+
+            host_cpuid(5, 0, &cpu->mwait.eax, &cpu->mwait.ebx,
+                       &cpu->mwait.ecx, &cpu->mwait.edx);
+            env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
         }
+
         if (cpu->ucode_rev == 0) {
             cpu->ucode_rev =
                 kvm_arch_get_supported_msr_feature(kvm_state,
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 1/3] vl: Allow multiple -overcommit commands
  2024-05-24 20:00 ` [PATCH V2 1/3] vl: Allow multiple -overcommit commands Zide Chen
@ 2024-05-27  5:19   ` Thomas Huth
  2024-05-30 14:01     ` Zhao Liu
  2024-05-30 13:39   ` Zhao Liu
  1 sibling, 1 reply; 29+ messages in thread
From: Thomas Huth @ 2024-05-27  5:19 UTC (permalink / raw)
  To: Zide Chen, qemu-devel; +Cc: pbonzini, mst, cfontana, xiaoyao.li, qemu-trivial

On 24/05/2024 22.00, Zide Chen wrote:
> Both cpu-pm and mem-lock are related to system resource overcommit, but
> they are separate from each other, in terms of how they are realized,
> and of course, they are applied to different system resources.
> 
> It's tempting to use separate command lines to specify their behavior.
> e.g., in the following example, the cpu-pm command is quietly
> overwritten, and it's not easy to notice it without careful inspection.
> 
>    --overcommit mem-lock=on
>    --overcommit cpu-pm=on
> 
> Fixes: c8c9dc42b7ca ("Remove the deprecated -realtime option")
> Suggested-by: Thomas Huth <thuth@redhat.com>
> Signed-off-by: Zide Chen <zide.chen@intel.com>
> ---
> 
> v2:
> 
> Thanks to Thomas' suggestion, changed to this better approach, which
> is more generic and can handle situations like: "enabled the option in
> the config file, and now you'd like to disable it on the command line
> again".
> 
>   system/vl.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/system/vl.c b/system/vl.c
> index a3eede5fa5b8..dfa6cdd9283b 100644
> --- a/system/vl.c
> +++ b/system/vl.c
> @@ -3545,8 +3545,8 @@ void qemu_init(int argc, char **argv)
>                   if (!opts) {
>                       exit(1);
>                   }
> -                enable_mlock = qemu_opt_get_bool(opts, "mem-lock", false);
> -                enable_cpu_pm = qemu_opt_get_bool(opts, "cpu-pm", false);
> +                enable_mlock = qemu_opt_get_bool(opts, "mem-lock", enable_mlock);
> +                enable_cpu_pm = qemu_opt_get_bool(opts, "cpu-pm", enable_cpu_pm);
>                   break;
>               case QEMU_OPTION_compat:
>                   {

Reviewed-by: Thomas Huth <thuth@redhat.com>



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off
  2024-05-24 20:00 [PATCH V2 0/3] improve -overcommit cpu-pm=on|off Zide Chen
                   ` (2 preceding siblings ...)
  2024-05-24 20:00 ` [PATCH V2 3/3] target/i386: Move host_cpu_enable_cpu_pm into kvm_cpu_realizefn() Zide Chen
@ 2024-05-28  9:23 ` Igor Mammedov
  2024-05-28 18:16   ` Chen, Zide
  3 siblings, 1 reply; 29+ messages in thread
From: Igor Mammedov @ 2024-05-28  9:23 UTC (permalink / raw)
  To: Zide Chen
  Cc: qemu-devel, pbonzini, mst, thuth, cfontana, xiaoyao.li,
	qemu-trivial

On Fri, 24 May 2024 13:00:14 -0700
Zide Chen <zide.chen@intel.com> wrote:

> Currently, if running "-overcommit cpu-pm=on" on hosts that don't
> have MWAIT support, the MWAIT/MONITOR feature is advertised to the
> guest and executing MWAIT/MONITOR on the guest triggers #UD.

this is missing proper description how do you trigger issue
with reproducer and detailed description why guest sees MWAIT
when it's not supported by host.

> 
> V2:
> - [PATCH 1]: took Thomas' suggestion for more generic fix
> - [PATCH 2/3]: no changes
> 
> Zide Chen (3):
>   vl: Allow multiple -overcommit commands
>   target/i386: call cpu_exec_realizefn before x86_cpu_filter_features
>   target/i386: Move host_cpu_enable_cpu_pm into kvm_cpu_realizefn()
> 
>  system/vl.c               |  4 ++--
>  target/i386/cpu.c         | 24 ++++++++++++------------
>  target/i386/host-cpu.c    | 12 ------------
>  target/i386/kvm/kvm-cpu.c | 12 +++++++++---
>  4 files changed, 23 insertions(+), 29 deletions(-)
> 



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off
  2024-05-28  9:23 ` [PATCH V2 0/3] improve -overcommit cpu-pm=on|off Igor Mammedov
@ 2024-05-28 18:16   ` Chen, Zide
  2024-05-29 12:46     ` Igor Mammedov
  0 siblings, 1 reply; 29+ messages in thread
From: Chen, Zide @ 2024-05-28 18:16 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, pbonzini, mst, thuth, cfontana, xiaoyao.li,
	qemu-trivial



On 5/28/2024 2:23 AM, Igor Mammedov wrote:
> On Fri, 24 May 2024 13:00:14 -0700
> Zide Chen <zide.chen@intel.com> wrote:
> 
>> Currently, if running "-overcommit cpu-pm=on" on hosts that don't
>> have MWAIT support, the MWAIT/MONITOR feature is advertised to the
>> guest and executing MWAIT/MONITOR on the guest triggers #UD.
> 
> this is missing proper description how do you trigger issue
> with reproducer and detailed description why guest sees MWAIT
> when it's not supported by host.

If "overcommit cpu-pm=on" and "-cpu hpst" are present, as shown in the
following, CPUID_EXT_MONITOR is set after x86_cpu_filter_features(), so
that it doesn't have a chance to check MWAIT against host features and
will be advertised to the guest regardless of whether it's supported by
the host or not.

x86_cpu_realizefn()
  x86_cpu_filter_features()
  cpu_exec_realizefn()
    kvm_cpu_realizefn
      host_cpu_realizefn
        host_cpu_enable_cpu_pm
          env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;


If it's not supported by the host, executing MONITOR or MWAIT
instructions from the guest triggers #UD, no matter MWAIT_EXITING
control is set or not.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off
  2024-05-28 18:16   ` Chen, Zide
@ 2024-05-29 12:46     ` Igor Mammedov
  2024-05-29 17:31       ` Chen, Zide
  0 siblings, 1 reply; 29+ messages in thread
From: Igor Mammedov @ 2024-05-29 12:46 UTC (permalink / raw)
  To: Chen, Zide
  Cc: qemu-devel, pbonzini, mst, thuth, cfontana, xiaoyao.li,
	qemu-trivial

On Tue, 28 May 2024 11:16:59 -0700
"Chen, Zide" <zide.chen@intel.com> wrote:

> On 5/28/2024 2:23 AM, Igor Mammedov wrote:
> > On Fri, 24 May 2024 13:00:14 -0700
> > Zide Chen <zide.chen@intel.com> wrote:
> >   
> >> Currently, if running "-overcommit cpu-pm=on" on hosts that don't
> >> have MWAIT support, the MWAIT/MONITOR feature is advertised to the
> >> guest and executing MWAIT/MONITOR on the guest triggers #UD.  
> > 
> > this is missing proper description how do you trigger issue
> > with reproducer and detailed description why guest sees MWAIT
> > when it's not supported by host.  
> 
> If "overcommit cpu-pm=on" and "-cpu hpst" are present, as shown in the
it's bette to provide full QEMU CLI and host/guest kernels used and what
hardware was used if it's relevant so others can reproduce problem.

> following, CPUID_EXT_MONITOR is set after x86_cpu_filter_features(), so
> that it doesn't have a chance to check MWAIT against host features and
> will be advertised to the guest regardless of whether it's supported by
> the host or not.
> 
> x86_cpu_realizefn()
>   x86_cpu_filter_features()
>   cpu_exec_realizefn()
>     kvm_cpu_realizefn
>       host_cpu_realizefn
>         host_cpu_enable_cpu_pm
>           env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
> 
> 
> If it's not supported by the host, executing MONITOR or MWAIT
> instructions from the guest triggers #UD, no matter MWAIT_EXITING
> control is set or not.

If I recall right, kvm was able to emulate mwait/monitor.
So question is why it leads to exception instead?



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off
  2024-05-29 12:46     ` Igor Mammedov
@ 2024-05-29 17:31       ` Chen, Zide
  2024-05-30 13:54         ` Zhao Liu
  0 siblings, 1 reply; 29+ messages in thread
From: Chen, Zide @ 2024-05-29 17:31 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, pbonzini, mst, thuth, cfontana, xiaoyao.li,
	qemu-trivial



On 5/29/2024 5:46 AM, Igor Mammedov wrote:
> On Tue, 28 May 2024 11:16:59 -0700
> "Chen, Zide" <zide.chen@intel.com> wrote:
> 
>> On 5/28/2024 2:23 AM, Igor Mammedov wrote:
>>> On Fri, 24 May 2024 13:00:14 -0700
>>> Zide Chen <zide.chen@intel.com> wrote:
>>>   
>>>> Currently, if running "-overcommit cpu-pm=on" on hosts that don't
>>>> have MWAIT support, the MWAIT/MONITOR feature is advertised to the
>>>> guest and executing MWAIT/MONITOR on the guest triggers #UD.  
>>>
>>> this is missing proper description how do you trigger issue
>>> with reproducer and detailed description why guest sees MWAIT
>>> when it's not supported by host.  
>>
>> If "overcommit cpu-pm=on" and "-cpu host" are present, as shown in the
> it's bette to provide full QEMU CLI and host/guest kernels used and what
> hardware was used if it's relevant so others can reproduce problem.

I ever reproduced this on an older Intel Icelake machine, a
Sapphire Rapids and a Sierra Forest, but I believe this is a x86 generic
issue, not specific to particular models.

For the CLI, I think the only command line options that matter are
 -overcommit cpu-pm=on: to set enable_cpu_pm
 -cpu host: so that cpu->max_features is set

For QEMU version, as long as it's after this commit: 662175b91ff2
("i386: reorder call to cpu_exec_realizefn")

The guest fails to boot:

[ 24.825568] smpboot: x86: Booting SMP configuration:
[ 24.826377] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12
#13 #14 #15 #17
[ 24.985799] .... node #1, CPUs: #128 #129 #130 #131 #132 #133 #134 #135
#136 #137 #138 #139 #140 #141 #142 #143 #145
[ 25.136955] invalid opcode: 0000 1 PREEMPT SMP NOPTI
[ 25.137790] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0 #2
[ 25.137790] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/04
[ 25.137790] RIP: 0010:mwait_idle+0x35/0x80
[ 25.137790] Code: 6f f0 80 48 02 20 48 8b 10 83 e2 08 75 3e 65 48 8b 15
47 d6 56 6f 48 0f ba e2 27 72 41 31 d2 48 89 d8
[ 25.137790] RSP: 0000:ffffffff91403e70 EFLAGS: 00010046
[ 25.137790] RAX: ffffffff9140a980 RBX: ffffffff9140a980 RCX:
0000000000000000
[ 25.137790] RDX: 0000000000000000 RSI: ffff97f1ade21b20 RDI:
0000000000000004
[ 25.137790] RBP: 0000000000000000 R08: 00000005da4709cb R09:
0000000000000001
[ 25.137790] R10: 0000000000005da4 R11: 0000000000000009 R12:
0000000000000000
[ 25.137790] R13: ffff98573ff90fc0 R14: ffffffff9140a038 R15:
0000000000093ff0
[ 25.137790] FS: 0000000000000000(0000) GS:ffff97f1ade00000(0000)
knlGS:0000000000000000
[ 25.137790] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 25.137790] CR2: ffff97d8aa801000 CR3: 00000049e9430001 CR4:
0000000000770ef0
[ 25.137790] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 25.137790] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7:
0000000000000400
[ 25.137790] PKRU: 55555554
[ 25.137790] Call Trace:
[ 25.137790] <TASK>
[ 25.137790] ? die+0x37/0x90
[ 25.137790] ? do_trap+0xe3/0x110
[ 25.137790] ? mwait_idle+0x35/0x80
[ 25.137790] ? do_error_trap+0x6a/0x90
[ 25.137790] ? mwait_idle+0x35/0x80
[ 25.137790] ? exc_invalid_op+0x52/0x70
[ 25.137790] ? mwait_idle+0x35/0x80
[ 25.137790] ? asm_exc_invalid_op+0x1a/0x20
[ 25.137790] ? mwait_idle+0x35/0x80
[ 25.137790] default_idle_call+0x30/0x100
[ 25.137790] cpuidle_idle_call+0x12c/0x170
[ 25.137790] ? tsc_verify_tsc_adjust+0x73/0xd0
[ 25.137790] do_idle+0x7f/0xd0
[ 25.137790] cpu_startup_entry+0x29/0x30
[ 25.137790] rest_init+0xcc/0xd0
[ 25.137790] start_kernel+0x396/0x5d0
[ 25.137790] x86_64_start_reservations+0x18/0x30
[ 25.137790] x86_64_start_kernel+0xe7/0xf0
[ 25.137790] common_startup_64+0x13e/0x148
[ 25.137790] </TASK>
[ 25.137790] Modules linked in:
[ 25.137790] --[ end trace 0000000000000000 ]--
[ 25.137790] invalid opcode: 0000 2 PREEMPT SMP NOPTI
[ 25.137790] RIP: 0010:mwait_idle+0x35/0x80
[ 25.137790] Code: 6f f0 80 48 02 20 48 8b 10 83 e2 08 75 3e 65 48 8b 15
47 d6 56 6f 48 0f ba e2 27 72 41 31 d2 48 89 d8

> 
>> following, CPUID_EXT_MONITOR is set after x86_cpu_filter_features(), so
>> that it doesn't have a chance to check MWAIT against host features and
>> will be advertised to the guest regardless of whether it's supported by
>> the host or not.
>>
>> x86_cpu_realizefn()
>>   x86_cpu_filter_features()
>>   cpu_exec_realizefn()
>>     kvm_cpu_realizefn
>>       host_cpu_realizefn
>>         host_cpu_enable_cpu_pm
>>           env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
>>
>>
>> If it's not supported by the host, executing MONITOR or MWAIT
>> instructions from the guest triggers #UD, no matter MWAIT_EXITING
>> control is set or not.
> 
> If I recall right, kvm was able to emulate mwait/monitor.
> So question is why it leads to exception instead?

KVM can come to play only iff it can trigger MWAIT/MONITOR VM exits. I
didn't find explicit proof from Intel SDM that #UD exceptions take
precedence over MWAIT/MONITOR VM exits, but this is my speculation. For
example, in ancient machines which don't support MWAIT yet, the only way
it can do is #UD, not MWAIT VM exit?





^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 1/3] vl: Allow multiple -overcommit commands
  2024-05-24 20:00 ` [PATCH V2 1/3] vl: Allow multiple -overcommit commands Zide Chen
  2024-05-27  5:19   ` Thomas Huth
@ 2024-05-30 13:39   ` Zhao Liu
  1 sibling, 0 replies; 29+ messages in thread
From: Zhao Liu @ 2024-05-30 13:39 UTC (permalink / raw)
  To: Zide Chen
  Cc: qemu-devel, pbonzini, mst, thuth, cfontana, xiaoyao.li,
	qemu-trivial

On Fri, May 24, 2024 at 01:00:15PM -0700, Zide Chen wrote:
> Date: Fri, 24 May 2024 13:00:15 -0700
> From: Zide Chen <zide.chen@intel.com>
> Subject: [PATCH V2 1/3] vl: Allow multiple -overcommit commands
> X-Mailer: git-send-email 2.34.1
> 
> Both cpu-pm and mem-lock are related to system resource overcommit, but
> they are separate from each other, in terms of how they are realized,
> and of course, they are applied to different system resources.
> 
> It's tempting to use separate command lines to specify their behavior.
> e.g., in the following example, the cpu-pm command is quietly
> overwritten, and it's not easy to notice it without careful inspection.
> 
>   --overcommit mem-lock=on
>   --overcommit cpu-pm=on
> 
> Fixes: c8c9dc42b7ca ("Remove the deprecated -realtime option")
> Suggested-by: Thomas Huth <thuth@redhat.com>
> Signed-off-by: Zide Chen <zide.chen@intel.com>
> ---
> 
> v2:
> 
> Thanks to Thomas' suggestion, changed to this better approach, which
> is more generic and can handle situations like: "enabled the option in
> the config file, and now you'd like to disable it on the command line
> again".
> 
>  system/vl.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Zhao Liu <zhao1.liu@intel.com>



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off
  2024-05-29 17:31       ` Chen, Zide
@ 2024-05-30 13:54         ` Zhao Liu
  2024-05-30 14:34           ` Igor Mammedov
                             ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Zhao Liu @ 2024-05-30 13:54 UTC (permalink / raw)
  To: Chen, Zide
  Cc: Igor Mammedov, qemu-devel, pbonzini, mst, thuth, cfontana,
	xiaoyao.li, qemu-trivial

Hi Zide,

On Wed, May 29, 2024 at 10:31:21AM -0700, Chen, Zide wrote:
> Date: Wed, 29 May 2024 10:31:21 -0700
> From: "Chen, Zide" <zide.chen@intel.com>
> Subject: Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off
> 
> 
> 
> On 5/29/2024 5:46 AM, Igor Mammedov wrote:
> > On Tue, 28 May 2024 11:16:59 -0700
> > "Chen, Zide" <zide.chen@intel.com> wrote:
> > 
> >> On 5/28/2024 2:23 AM, Igor Mammedov wrote:
> >>> On Fri, 24 May 2024 13:00:14 -0700
> >>> Zide Chen <zide.chen@intel.com> wrote:
> >>>   
> >>>> Currently, if running "-overcommit cpu-pm=on" on hosts that don't
> >>>> have MWAIT support, the MWAIT/MONITOR feature is advertised to the
> >>>> guest and executing MWAIT/MONITOR on the guest triggers #UD.  
> >>>
> >>> this is missing proper description how do you trigger issue
> >>> with reproducer and detailed description why guest sees MWAIT
> >>> when it's not supported by host.  
> >>
> >> If "overcommit cpu-pm=on" and "-cpu host" are present, as shown in the
> > it's bette to provide full QEMU CLI and host/guest kernels used and what
> > hardware was used if it's relevant so others can reproduce problem.
> 
> I ever reproduced this on an older Intel Icelake machine, a
> Sapphire Rapids and a Sierra Forest, but I believe this is a x86 generic
> issue, not specific to particular models.
> 
> For the CLI, I think the only command line options that matter are
>  -overcommit cpu-pm=on: to set enable_cpu_pm
>  -cpu host: so that cpu->max_features is set
> 
> For QEMU version, as long as it's after this commit: 662175b91ff2
> ("i386: reorder call to cpu_exec_realizefn")
> 
> The guest fails to boot:
> 
> [ 24.825568] smpboot: x86: Booting SMP configuration:
> [ 24.826377] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12
> #13 #14 #15 #17
> [ 24.985799] .... node #1, CPUs: #128 #129 #130 #131 #132 #133 #134 #135
> #136 #137 #138 #139 #140 #141 #142 #143 #145
> [ 25.136955] invalid opcode: 0000 1 PREEMPT SMP NOPTI
> [ 25.137790] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0 #2
> [ 25.137790] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/04
> [ 25.137790] RIP: 0010:mwait_idle+0x35/0x80
> [ 25.137790] Code: 6f f0 80 48 02 20 48 8b 10 83 e2 08 75 3e 65 48 8b 15
> 47 d6 56 6f 48 0f ba e2 27 72 41 31 d2 48 89 d8
> [ 25.137790] RSP: 0000:ffffffff91403e70 EFLAGS: 00010046
> [ 25.137790] RAX: ffffffff9140a980 RBX: ffffffff9140a980 RCX:
> 0000000000000000
> [ 25.137790] RDX: 0000000000000000 RSI: ffff97f1ade21b20 RDI:
> 0000000000000004
> [ 25.137790] RBP: 0000000000000000 R08: 00000005da4709cb R09:
> 0000000000000001
> [ 25.137790] R10: 0000000000005da4 R11: 0000000000000009 R12:
> 0000000000000000
> [ 25.137790] R13: ffff98573ff90fc0 R14: ffffffff9140a038 R15:
> 0000000000093ff0
> [ 25.137790] FS: 0000000000000000(0000) GS:ffff97f1ade00000(0000)
> knlGS:0000000000000000
> [ 25.137790] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 25.137790] CR2: ffff97d8aa801000 CR3: 00000049e9430001 CR4:
> 0000000000770ef0
> [ 25.137790] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [ 25.137790] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7:
> 0000000000000400
> [ 25.137790] PKRU: 55555554
> [ 25.137790] Call Trace:
> [ 25.137790] <TASK>
> [ 25.137790] ? die+0x37/0x90
> [ 25.137790] ? do_trap+0xe3/0x110
> [ 25.137790] ? mwait_idle+0x35/0x80
> [ 25.137790] ? do_error_trap+0x6a/0x90
> [ 25.137790] ? mwait_idle+0x35/0x80
> [ 25.137790] ? exc_invalid_op+0x52/0x70
> [ 25.137790] ? mwait_idle+0x35/0x80
> [ 25.137790] ? asm_exc_invalid_op+0x1a/0x20
> [ 25.137790] ? mwait_idle+0x35/0x80
> [ 25.137790] default_idle_call+0x30/0x100
> [ 25.137790] cpuidle_idle_call+0x12c/0x170
> [ 25.137790] ? tsc_verify_tsc_adjust+0x73/0xd0
> [ 25.137790] do_idle+0x7f/0xd0
> [ 25.137790] cpu_startup_entry+0x29/0x30
> [ 25.137790] rest_init+0xcc/0xd0
> [ 25.137790] start_kernel+0x396/0x5d0
> [ 25.137790] x86_64_start_reservations+0x18/0x30
> [ 25.137790] x86_64_start_kernel+0xe7/0xf0
> [ 25.137790] common_startup_64+0x13e/0x148
> [ 25.137790] </TASK>
> [ 25.137790] Modules linked in:
> [ 25.137790] --[ end trace 0000000000000000 ]--
> [ 25.137790] invalid opcode: 0000 2 PREEMPT SMP NOPTI
> [ 25.137790] RIP: 0010:mwait_idle+0x35/0x80
> [ 25.137790] Code: 6f f0 80 48 02 20 48 8b 10 83 e2 08 75 3e 65 48 8b 15
> 47 d6 56 6f 48 0f ba e2 27 72 41 31 d2 48 89 d8
> 
> > 
> >> following, CPUID_EXT_MONITOR is set after x86_cpu_filter_features(), so
> >> that it doesn't have a chance to check MWAIT against host features and
> >> will be advertised to the guest regardless of whether it's supported by
> >> the host or not.
> >>
> >> x86_cpu_realizefn()
> >>   x86_cpu_filter_features()
> >>   cpu_exec_realizefn()
> >>     kvm_cpu_realizefn
> >>       host_cpu_realizefn
> >>         host_cpu_enable_cpu_pm
> >>           env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
> >>
> >>
> >> If it's not supported by the host, executing MONITOR or MWAIT
> >> instructions from the guest triggers #UD, no matter MWAIT_EXITING
> >> control is set or not.
> > 
> > If I recall right, kvm was able to emulate mwait/monitor.
> > So question is why it leads to exception instead?
> 
> KVM can come to play only iff it can trigger MWAIT/MONITOR VM exits. I
> didn't find explicit proof from Intel SDM that #UD exceptions take
> precedence over MWAIT/MONITOR VM exits, but this is my speculation. For
> example, in ancient machines which don't support MWAIT yet, the only way
> it can do is #UD, not MWAIT VM exit?

For the Host which doesn't support MWAIT, it shouldn't have the VMX
control bit for mwait exit either, right?

Could you pls check this on your machine? If VMX doesn't support this
exit event, then triggering an exception will make sense.

-Zhao



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 1/3] vl: Allow multiple -overcommit commands
  2024-05-27  5:19   ` Thomas Huth
@ 2024-05-30 14:01     ` Zhao Liu
  2024-05-31  4:57       ` Thomas Huth
  0 siblings, 1 reply; 29+ messages in thread
From: Zhao Liu @ 2024-05-30 14:01 UTC (permalink / raw)
  To: Thomas Huth
  Cc: Zide Chen, qemu-devel, pbonzini, mst, cfontana, xiaoyao.li,
	qemu-trivial

On Mon, May 27, 2024 at 07:19:56AM +0200, Thomas Huth wrote:
> Date: Mon, 27 May 2024 07:19:56 +0200
> From: Thomas Huth <thuth@redhat.com>
> Subject: Re: [PATCH V2 1/3] vl: Allow multiple -overcommit commands
> 
> On 24/05/2024 22.00, Zide Chen wrote:
> > Both cpu-pm and mem-lock are related to system resource overcommit, but
> > they are separate from each other, in terms of how they are realized,
> > and of course, they are applied to different system resources.
> > 
> > It's tempting to use separate command lines to specify their behavior.
> > e.g., in the following example, the cpu-pm command is quietly
> > overwritten, and it's not easy to notice it without careful inspection.
> > 
> >    --overcommit mem-lock=on
> >    --overcommit cpu-pm=on
> > 
> > Fixes: c8c9dc42b7ca ("Remove the deprecated -realtime option")
> > Suggested-by: Thomas Huth <thuth@redhat.com>
> > Signed-off-by: Zide Chen <zide.chen@intel.com>
> > ---
> > 
> > v2:
> > 
> > Thanks to Thomas' suggestion, changed to this better approach, which
> > is more generic and can handle situations like: "enabled the option in
> > the config file, and now you'd like to disable it on the command line
> > again".
> > 
> >   system/vl.c | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/system/vl.c b/system/vl.c
> > index a3eede5fa5b8..dfa6cdd9283b 100644
> > --- a/system/vl.c
> > +++ b/system/vl.c
> > @@ -3545,8 +3545,8 @@ void qemu_init(int argc, char **argv)
> >                   if (!opts) {
> >                       exit(1);
> >                   }
> > -                enable_mlock = qemu_opt_get_bool(opts, "mem-lock", false);
> > -                enable_cpu_pm = qemu_opt_get_bool(opts, "cpu-pm", false);
> > +                enable_mlock = qemu_opt_get_bool(opts, "mem-lock", enable_mlock);
> > +                enable_cpu_pm = qemu_opt_get_bool(opts, "cpu-pm", enable_cpu_pm);
> >                   break;
> >               case QEMU_OPTION_compat:
> >                   {
> 
> Reviewed-by: Thomas Huth <thuth@redhat.com>
> 

Hi Thomas,

BTW, do you think it's a good idea to define the overcommit via QAPI way
(defined in a json file)? ;-)

My rough understanding is that all APIs are better to be defined via
QAPI to go JSON compatible.




^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off
  2024-05-30 13:54         ` Zhao Liu
@ 2024-05-30 14:34           ` Igor Mammedov
  2024-05-30 14:53             ` Sean Christopherson
  2024-05-30 14:49           ` Igor Mammedov
  2024-05-30 16:15           ` Chen, Zide
  2 siblings, 1 reply; 29+ messages in thread
From: Igor Mammedov @ 2024-05-30 14:34 UTC (permalink / raw)
  To: Zhao Liu
  Cc: Chen, Zide, qemu-devel, pbonzini, mst, thuth, cfontana,
	xiaoyao.li, qemu-trivial, Sean Christopherson

On Thu, 30 May 2024 21:54:47 +0800
Zhao Liu <zhao1.liu@intel.com> wrote:

> Hi Zide,
> 
> On Wed, May 29, 2024 at 10:31:21AM -0700, Chen, Zide wrote:
> > Date: Wed, 29 May 2024 10:31:21 -0700
> > From: "Chen, Zide" <zide.chen@intel.com>
> > Subject: Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off
> > 
> > 
> > 
> > On 5/29/2024 5:46 AM, Igor Mammedov wrote:  
> > > On Tue, 28 May 2024 11:16:59 -0700
> > > "Chen, Zide" <zide.chen@intel.com> wrote:
> > >   
> > >> On 5/28/2024 2:23 AM, Igor Mammedov wrote:  
> > >>> On Fri, 24 May 2024 13:00:14 -0700
> > >>> Zide Chen <zide.chen@intel.com> wrote:
> > >>>     
> > >>>> Currently, if running "-overcommit cpu-pm=on" on hosts that don't
> > >>>> have MWAIT support, the MWAIT/MONITOR feature is advertised to the
> > >>>> guest and executing MWAIT/MONITOR on the guest triggers #UD.    
> > >>>
> > >>> this is missing proper description how do you trigger issue
> > >>> with reproducer and detailed description why guest sees MWAIT
> > >>> when it's not supported by host.    
> > >>
> > >> If "overcommit cpu-pm=on" and "-cpu host" are present, as shown in the  
> > > it's bette to provide full QEMU CLI and host/guest kernels used and what
> > > hardware was used if it's relevant so others can reproduce problem.  
> > 
> > I ever reproduced this on an older Intel Icelake machine, a
> > Sapphire Rapids and a Sierra Forest, but I believe this is a x86 generic
> > issue, not specific to particular models.
> > 
> > For the CLI, I think the only command line options that matter are
> >  -overcommit cpu-pm=on: to set enable_cpu_pm
> >  -cpu host: so that cpu->max_features is set
> > 
> > For QEMU version, as long as it's after this commit: 662175b91ff2
> > ("i386: reorder call to cpu_exec_realizefn")
> > 
> > The guest fails to boot:
> > 
> > [ 24.825568] smpboot: x86: Booting SMP configuration:
> > [ 24.826377] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12
> > #13 #14 #15 #17
> > [ 24.985799] .... node #1, CPUs: #128 #129 #130 #131 #132 #133 #134 #135
> > #136 #137 #138 #139 #140 #141 #142 #143 #145
> > [ 25.136955] invalid opcode: 0000 1 PREEMPT SMP NOPTI
> > [ 25.137790] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0 #2
> > [ 25.137790] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> > rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/04
> > [ 25.137790] RIP: 0010:mwait_idle+0x35/0x80
> > [ 25.137790] Code: 6f f0 80 48 02 20 48 8b 10 83 e2 08 75 3e 65 48 8b 15
> > 47 d6 56 6f 48 0f ba e2 27 72 41 31 d2 48 89 d8
> > [ 25.137790] RSP: 0000:ffffffff91403e70 EFLAGS: 00010046
> > [ 25.137790] RAX: ffffffff9140a980 RBX: ffffffff9140a980 RCX:
> > 0000000000000000
> > [ 25.137790] RDX: 0000000000000000 RSI: ffff97f1ade21b20 RDI:
> > 0000000000000004
> > [ 25.137790] RBP: 0000000000000000 R08: 00000005da4709cb R09:
> > 0000000000000001
> > [ 25.137790] R10: 0000000000005da4 R11: 0000000000000009 R12:
> > 0000000000000000
> > [ 25.137790] R13: ffff98573ff90fc0 R14: ffffffff9140a038 R15:
> > 0000000000093ff0
> > [ 25.137790] FS: 0000000000000000(0000) GS:ffff97f1ade00000(0000)
> > knlGS:0000000000000000
> > [ 25.137790] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 25.137790] CR2: ffff97d8aa801000 CR3: 00000049e9430001 CR4:
> > 0000000000770ef0
> > [ 25.137790] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > 0000000000000000
> > [ 25.137790] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7:
> > 0000000000000400
> > [ 25.137790] PKRU: 55555554
> > [ 25.137790] Call Trace:
> > [ 25.137790] <TASK>
> > [ 25.137790] ? die+0x37/0x90
> > [ 25.137790] ? do_trap+0xe3/0x110
> > [ 25.137790] ? mwait_idle+0x35/0x80
> > [ 25.137790] ? do_error_trap+0x6a/0x90
> > [ 25.137790] ? mwait_idle+0x35/0x80
> > [ 25.137790] ? exc_invalid_op+0x52/0x70
> > [ 25.137790] ? mwait_idle+0x35/0x80
> > [ 25.137790] ? asm_exc_invalid_op+0x1a/0x20
> > [ 25.137790] ? mwait_idle+0x35/0x80
> > [ 25.137790] default_idle_call+0x30/0x100
> > [ 25.137790] cpuidle_idle_call+0x12c/0x170
> > [ 25.137790] ? tsc_verify_tsc_adjust+0x73/0xd0
> > [ 25.137790] do_idle+0x7f/0xd0
> > [ 25.137790] cpu_startup_entry+0x29/0x30
> > [ 25.137790] rest_init+0xcc/0xd0
> > [ 25.137790] start_kernel+0x396/0x5d0
> > [ 25.137790] x86_64_start_reservations+0x18/0x30
> > [ 25.137790] x86_64_start_kernel+0xe7/0xf0
> > [ 25.137790] common_startup_64+0x13e/0x148
> > [ 25.137790] </TASK>
> > [ 25.137790] Modules linked in:
> > [ 25.137790] --[ end trace 0000000000000000 ]--
> > [ 25.137790] invalid opcode: 0000 2 PREEMPT SMP NOPTI
> > [ 25.137790] RIP: 0010:mwait_idle+0x35/0x80
> > [ 25.137790] Code: 6f f0 80 48 02 20 48 8b 10 83 e2 08 75 3e 65 48 8b 15
> > 47 d6 56 6f 48 0f ba e2 27 72 41 31 d2 48 89 d8
> >   
> > >   
> > >> following, CPUID_EXT_MONITOR is set after x86_cpu_filter_features(), so
> > >> that it doesn't have a chance to check MWAIT against host features and
> > >> will be advertised to the guest regardless of whether it's supported by
> > >> the host or not.
> > >>
> > >> x86_cpu_realizefn()
> > >>   x86_cpu_filter_features()
> > >>   cpu_exec_realizefn()
> > >>     kvm_cpu_realizefn
> > >>       host_cpu_realizefn
> > >>         host_cpu_enable_cpu_pm
> > >>           env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
> > >>
> > >>
> > >> If it's not supported by the host, executing MONITOR or MWAIT
> > >> instructions from the guest triggers #UD, no matter MWAIT_EXITING
> > >> control is set or not.  
> > > 
> > > If I recall right, kvm was able to emulate mwait/monitor.
> > > So question is why it leads to exception instead?  
> > 
> > KVM can come to play only iff it can trigger MWAIT/MONITOR VM exits. I
> > didn't find explicit proof from Intel SDM that #UD exceptions take
> > precedence over MWAIT/MONITOR VM exits, but this is my speculation. For
> > example, in ancient machines which don't support MWAIT yet, the only way
> > it can do is #UD, not MWAIT VM exit?  
> 
> For the Host which doesn't support MWAIT, it shouldn't have the VMX
> control bit for mwait exit either, right?
> 
> Could you pls check this on your machine? If VMX doesn't support this
> exit event, then triggering an exception will make sense.

My assumption (probably wrong) was that KVM would emulate mwait if it's unavailable,
unless we have KVM_CAP_X86_DISABLE_EXITS enabled. And in the later case it would
explode as expected, however then we shouldn't be able to set KVM_CAP_X86_DISABLE_EXITS
to begin with.

Recently Sean posted a patch related to that
[PATCH v2 12/49] KVM: x86: Reject disabling of MWAIT/HLT interception when not allowed
  https://lkml.org/lkml/2024/5/17/729

This needs someone with KVM expertise to chime in
Perhaps Paolo/Sean could clarify expected behavior.





^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off
  2024-05-30 13:54         ` Zhao Liu
  2024-05-30 14:34           ` Igor Mammedov
@ 2024-05-30 14:49           ` Igor Mammedov
  2024-06-02 21:54             ` Michael S. Tsirkin
  2024-05-30 16:15           ` Chen, Zide
  2 siblings, 1 reply; 29+ messages in thread
From: Igor Mammedov @ 2024-05-30 14:49 UTC (permalink / raw)
  To: Zhao Liu
  Cc: Chen, Zide, qemu-devel, pbonzini, mst, thuth, cfontana,
	xiaoyao.li, qemu-trivial, seanjc

On Thu, 30 May 2024 21:54:47 +0800
Zhao Liu <zhao1.liu@intel.com> wrote:

> Hi Zide,
> 
> On Wed, May 29, 2024 at 10:31:21AM -0700, Chen, Zide wrote:
> > Date: Wed, 29 May 2024 10:31:21 -0700
> > From: "Chen, Zide" <zide.chen@intel.com>
> > Subject: Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off
> > 
> > 
> > 
> > On 5/29/2024 5:46 AM, Igor Mammedov wrote:  
> > > On Tue, 28 May 2024 11:16:59 -0700
> > > "Chen, Zide" <zide.chen@intel.com> wrote:
> > >   
> > >> On 5/28/2024 2:23 AM, Igor Mammedov wrote:  
> > >>> On Fri, 24 May 2024 13:00:14 -0700
> > >>> Zide Chen <zide.chen@intel.com> wrote:
> > >>>     
> > >>>> Currently, if running "-overcommit cpu-pm=on" on hosts that don't
> > >>>> have MWAIT support, the MWAIT/MONITOR feature is advertised to the
> > >>>> guest and executing MWAIT/MONITOR on the guest triggers #UD.    
> > >>>
> > >>> this is missing proper description how do you trigger issue
> > >>> with reproducer and detailed description why guest sees MWAIT
> > >>> when it's not supported by host.    
> > >>
> > >> If "overcommit cpu-pm=on" and "-cpu host" are present, as shown in the  
> > > it's bette to provide full QEMU CLI and host/guest kernels used and what
> > > hardware was used if it's relevant so others can reproduce problem.  
> > 
> > I ever reproduced this on an older Intel Icelake machine, a
> > Sapphire Rapids and a Sierra Forest, but I believe this is a x86 generic
> > issue, not specific to particular models.
> > 
> > For the CLI, I think the only command line options that matter are
> >  -overcommit cpu-pm=on: to set enable_cpu_pm
> >  -cpu host: so that cpu->max_features is set
> > 
> > For QEMU version, as long as it's after this commit: 662175b91ff2
> > ("i386: reorder call to cpu_exec_realizefn")
> > 
> > The guest fails to boot:
> > 
> > [ 24.825568] smpboot: x86: Booting SMP configuration:
> > [ 24.826377] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12
> > #13 #14 #15 #17
> > [ 24.985799] .... node #1, CPUs: #128 #129 #130 #131 #132 #133 #134 #135
> > #136 #137 #138 #139 #140 #141 #142 #143 #145
> > [ 25.136955] invalid opcode: 0000 1 PREEMPT SMP NOPTI
> > [ 25.137790] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0 #2
> > [ 25.137790] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> > rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/04
> > [ 25.137790] RIP: 0010:mwait_idle+0x35/0x80
> > [ 25.137790] Code: 6f f0 80 48 02 20 48 8b 10 83 e2 08 75 3e 65 48 8b 15
> > 47 d6 56 6f 48 0f ba e2 27 72 41 31 d2 48 89 d8
> > [ 25.137790] RSP: 0000:ffffffff91403e70 EFLAGS: 00010046
> > [ 25.137790] RAX: ffffffff9140a980 RBX: ffffffff9140a980 RCX:
> > 0000000000000000
> > [ 25.137790] RDX: 0000000000000000 RSI: ffff97f1ade21b20 RDI:
> > 0000000000000004
> > [ 25.137790] RBP: 0000000000000000 R08: 00000005da4709cb R09:
> > 0000000000000001
> > [ 25.137790] R10: 0000000000005da4 R11: 0000000000000009 R12:
> > 0000000000000000
> > [ 25.137790] R13: ffff98573ff90fc0 R14: ffffffff9140a038 R15:
> > 0000000000093ff0
> > [ 25.137790] FS: 0000000000000000(0000) GS:ffff97f1ade00000(0000)
> > knlGS:0000000000000000
> > [ 25.137790] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 25.137790] CR2: ffff97d8aa801000 CR3: 00000049e9430001 CR4:
> > 0000000000770ef0
> > [ 25.137790] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > 0000000000000000
> > [ 25.137790] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7:
> > 0000000000000400
> > [ 25.137790] PKRU: 55555554
> > [ 25.137790] Call Trace:
> > [ 25.137790] <TASK>
> > [ 25.137790] ? die+0x37/0x90
> > [ 25.137790] ? do_trap+0xe3/0x110
> > [ 25.137790] ? mwait_idle+0x35/0x80
> > [ 25.137790] ? do_error_trap+0x6a/0x90
> > [ 25.137790] ? mwait_idle+0x35/0x80
> > [ 25.137790] ? exc_invalid_op+0x52/0x70
> > [ 25.137790] ? mwait_idle+0x35/0x80
> > [ 25.137790] ? asm_exc_invalid_op+0x1a/0x20
> > [ 25.137790] ? mwait_idle+0x35/0x80
> > [ 25.137790] default_idle_call+0x30/0x100
> > [ 25.137790] cpuidle_idle_call+0x12c/0x170
> > [ 25.137790] ? tsc_verify_tsc_adjust+0x73/0xd0
> > [ 25.137790] do_idle+0x7f/0xd0
> > [ 25.137790] cpu_startup_entry+0x29/0x30
> > [ 25.137790] rest_init+0xcc/0xd0
> > [ 25.137790] start_kernel+0x396/0x5d0
> > [ 25.137790] x86_64_start_reservations+0x18/0x30
> > [ 25.137790] x86_64_start_kernel+0xe7/0xf0
> > [ 25.137790] common_startup_64+0x13e/0x148
> > [ 25.137790] </TASK>
> > [ 25.137790] Modules linked in:
> > [ 25.137790] --[ end trace 0000000000000000 ]--
> > [ 25.137790] invalid opcode: 0000 2 PREEMPT SMP NOPTI
> > [ 25.137790] RIP: 0010:mwait_idle+0x35/0x80
> > [ 25.137790] Code: 6f f0 80 48 02 20 48 8b 10 83 e2 08 75 3e 65 48 8b 15
> > 47 d6 56 6f 48 0f ba e2 27 72 41 31 d2 48 89 d8
> >   
> > >   
> > >> following, CPUID_EXT_MONITOR is set after x86_cpu_filter_features(), so
> > >> that it doesn't have a chance to check MWAIT against host features and
> > >> will be advertised to the guest regardless of whether it's supported by
> > >> the host or not.
> > >>
> > >> x86_cpu_realizefn()
> > >>   x86_cpu_filter_features()
> > >>   cpu_exec_realizefn()
> > >>     kvm_cpu_realizefn
> > >>       host_cpu_realizefn
> > >>         host_cpu_enable_cpu_pm
> > >>           env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
> > >>
> > >>
> > >> If it's not supported by the host, executing MONITOR or MWAIT
> > >> instructions from the guest triggers #UD, no matter MWAIT_EXITING
> > >> control is set or not.  
> > > 
> > > If I recall right, kvm was able to emulate mwait/monitor.
> > > So question is why it leads to exception instead?  
> > 
> > KVM can come to play only iff it can trigger MWAIT/MONITOR VM exits. I
> > didn't find explicit proof from Intel SDM that #UD exceptions take
> > precedence over MWAIT/MONITOR VM exits, but this is my speculation. For
> > example, in ancient machines which don't support MWAIT yet, the only way
> > it can do is #UD, not MWAIT VM exit?  
> 
> For the Host which doesn't support MWAIT, it shouldn't have the VMX
> control bit for mwait exit either, right?
> 
> Could you pls check this on your machine? If VMX doesn't support this
> exit event, then triggering an exception will make sense.

My assumption (probably wrong) was that KVM would emulate mwait if it's unavailable,
unless we have KVM_CAP_X86_DISABLE_EXITS enabled. And in the later case it would
explode as expected, however then we shouldn't be able to set KVM_CAP_X86_DISABLE_EXITS
to begin with.

Recently Sean posted a patch related to that
[PATCH v2 12/49] KVM: x86: Reject disabling of MWAIT/HLT interception when not allowed
  https://lkml.org/lkml/2024/5/17/729

This needs someone with KVM expertise to chime in
Perhaps Paolo/Sean could clarify expected behavior.


> 
> -Zhao
> 



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off
  2024-05-30 14:34           ` Igor Mammedov
@ 2024-05-30 14:53             ` Sean Christopherson
  0 siblings, 0 replies; 29+ messages in thread
From: Sean Christopherson @ 2024-05-30 14:53 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Zhao Liu, Zide Chen, qemu-devel, pbonzini, mst, thuth, cfontana,
	xiaoyao.li, qemu-trivial

On Thu, May 30, 2024, Igor Mammedov wrote:
> On Thu, 30 May 2024 21:54:47 +0800 Zhao Liu <zhao1.liu@intel.com> wrote:

...

> > > >> following, CPUID_EXT_MONITOR is set after x86_cpu_filter_features(), so
> > > >> that it doesn't have a chance to check MWAIT against host features and
> > > >> will be advertised to the guest regardless of whether it's supported by
> > > >> the host or not.
> > > >>
> > > >> x86_cpu_realizefn()
> > > >>   x86_cpu_filter_features()
> > > >>   cpu_exec_realizefn()
> > > >>     kvm_cpu_realizefn
> > > >>       host_cpu_realizefn
> > > >>         host_cpu_enable_cpu_pm
> > > >>           env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
> > > >>
> > > >>
> > > >> If it's not supported by the host, executing MONITOR or MWAIT
> > > >> instructions from the guest triggers #UD, no matter MWAIT_EXITING
> > > >> control is set or not.  
> > > > 
> > > > If I recall right, kvm was able to emulate mwait/monitor.
> > > > So question is why it leads to exception instead?

Because KVM doesn't emulated MONITOR/MWAIT on #UD.

> > > KVM can come to play only iff it can trigger MWAIT/MONITOR VM exits. I
> > > didn't find explicit proof from Intel SDM that #UD exceptions take
> > > precedence over MWAIT/MONITOR VM exits, but this is my speculation.

Yeah, typically #UD takes priority over VM-Exit interception checks.  AMD's APM
is much more explicit and states that all exceptions are checked on MONITOR/MWAIT
before the interception check.

> > > For example, in ancient machines which don't support MWAIT yet, the only
> > > way it can do is #UD, not MWAIT VM exit?  

Not really relevant, because such CPUs wouldn't have MWAIT-exiting.

> > For the Host which doesn't support MWAIT, it shouldn't have the VMX
> > control bit for mwait exit either, right?
> > 
> > Could you pls check this on your machine? If VMX doesn't support this
> > exit event, then triggering an exception will make sense.
> 
> My assumption (probably wrong) was that KVM would emulate mwait if it's unavailable,

Nope.  In order to limit the attack surface of the emulator on modern CPUs, KVM
only emulates select instructions in response to a #UD.

But even if KVM did emulate MONITOR/MWAIT on #UD, this is inarguably a QEMU bug,
e.g. QEMU will effectively coerce the guest into using a idle-polling mechanism.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off
  2024-05-30 13:54         ` Zhao Liu
  2024-05-30 14:34           ` Igor Mammedov
  2024-05-30 14:49           ` Igor Mammedov
@ 2024-05-30 16:15           ` Chen, Zide
  2 siblings, 0 replies; 29+ messages in thread
From: Chen, Zide @ 2024-05-30 16:15 UTC (permalink / raw)
  To: Zhao Liu
  Cc: Igor Mammedov, qemu-devel, pbonzini, mst, thuth, cfontana,
	xiaoyao.li, qemu-trivial



On 5/30/2024 6:54 AM, Zhao Liu wrote:
> Hi Zide,
> 
> On Wed, May 29, 2024 at 10:31:21AM -0700, Chen, Zide wrote:
>> Date: Wed, 29 May 2024 10:31:21 -0700
>> From: "Chen, Zide" <zide.chen@intel.com>
>> Subject: Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off
>>
>>
>>
>> On 5/29/2024 5:46 AM, Igor Mammedov wrote:
>>> On Tue, 28 May 2024 11:16:59 -0700
>>> "Chen, Zide" <zide.chen@intel.com> wrote:
>>>
>>>> On 5/28/2024 2:23 AM, Igor Mammedov wrote:
>>>>> On Fri, 24 May 2024 13:00:14 -0700
>>>>> Zide Chen <zide.chen@intel.com> wrote:
>>>>>   
>>>>>> Currently, if running "-overcommit cpu-pm=on" on hosts that don't
>>>>>> have MWAIT support, the MWAIT/MONITOR feature is advertised to the
>>>>>> guest and executing MWAIT/MONITOR on the guest triggers #UD.  
>>>>>
>>>>> this is missing proper description how do you trigger issue
>>>>> with reproducer and detailed description why guest sees MWAIT
>>>>> when it's not supported by host.  
>>>>
>>>> If "overcommit cpu-pm=on" and "-cpu host" are present, as shown in the
>>> it's bette to provide full QEMU CLI and host/guest kernels used and what
>>> hardware was used if it's relevant so others can reproduce problem.
>>
>> I ever reproduced this on an older Intel Icelake machine, a
>> Sapphire Rapids and a Sierra Forest, but I believe this is a x86 generic
>> issue, not specific to particular models.
>>
>> For the CLI, I think the only command line options that matter are
>>  -overcommit cpu-pm=on: to set enable_cpu_pm
>>  -cpu host: so that cpu->max_features is set
>>
>> For QEMU version, as long as it's after this commit: 662175b91ff2
>> ("i386: reorder call to cpu_exec_realizefn")
>>
>> The guest fails to boot:
>>
>> [ 24.825568] smpboot: x86: Booting SMP configuration:
>> [ 24.826377] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12
>> #13 #14 #15 #17
>> [ 24.985799] .... node #1, CPUs: #128 #129 #130 #131 #132 #133 #134 #135
>> #136 #137 #138 #139 #140 #141 #142 #143 #145
>> [ 25.136955] invalid opcode: 0000 1 PREEMPT SMP NOPTI
>> [ 25.137790] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0 #2
>> [ 25.137790] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/04
>> [ 25.137790] RIP: 0010:mwait_idle+0x35/0x80
>> [ 25.137790] Code: 6f f0 80 48 02 20 48 8b 10 83 e2 08 75 3e 65 48 8b 15
>> 47 d6 56 6f 48 0f ba e2 27 72 41 31 d2 48 89 d8
>> [ 25.137790] RSP: 0000:ffffffff91403e70 EFLAGS: 00010046
>> [ 25.137790] RAX: ffffffff9140a980 RBX: ffffffff9140a980 RCX:
>> 0000000000000000
>> [ 25.137790] RDX: 0000000000000000 RSI: ffff97f1ade21b20 RDI:
>> 0000000000000004
>> [ 25.137790] RBP: 0000000000000000 R08: 00000005da4709cb R09:
>> 0000000000000001
>> [ 25.137790] R10: 0000000000005da4 R11: 0000000000000009 R12:
>> 0000000000000000
>> [ 25.137790] R13: ffff98573ff90fc0 R14: ffffffff9140a038 R15:
>> 0000000000093ff0
>> [ 25.137790] FS: 0000000000000000(0000) GS:ffff97f1ade00000(0000)
>> knlGS:0000000000000000
>> [ 25.137790] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 25.137790] CR2: ffff97d8aa801000 CR3: 00000049e9430001 CR4:
>> 0000000000770ef0
>> [ 25.137790] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>> 0000000000000000
>> [ 25.137790] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7:
>> 0000000000000400
>> [ 25.137790] PKRU: 55555554
>> [ 25.137790] Call Trace:
>> [ 25.137790] <TASK>
>> [ 25.137790] ? die+0x37/0x90
>> [ 25.137790] ? do_trap+0xe3/0x110
>> [ 25.137790] ? mwait_idle+0x35/0x80
>> [ 25.137790] ? do_error_trap+0x6a/0x90
>> [ 25.137790] ? mwait_idle+0x35/0x80
>> [ 25.137790] ? exc_invalid_op+0x52/0x70
>> [ 25.137790] ? mwait_idle+0x35/0x80
>> [ 25.137790] ? asm_exc_invalid_op+0x1a/0x20
>> [ 25.137790] ? mwait_idle+0x35/0x80
>> [ 25.137790] default_idle_call+0x30/0x100
>> [ 25.137790] cpuidle_idle_call+0x12c/0x170
>> [ 25.137790] ? tsc_verify_tsc_adjust+0x73/0xd0
>> [ 25.137790] do_idle+0x7f/0xd0
>> [ 25.137790] cpu_startup_entry+0x29/0x30
>> [ 25.137790] rest_init+0xcc/0xd0
>> [ 25.137790] start_kernel+0x396/0x5d0
>> [ 25.137790] x86_64_start_reservations+0x18/0x30
>> [ 25.137790] x86_64_start_kernel+0xe7/0xf0
>> [ 25.137790] common_startup_64+0x13e/0x148
>> [ 25.137790] </TASK>
>> [ 25.137790] Modules linked in:
>> [ 25.137790] --[ end trace 0000000000000000 ]--
>> [ 25.137790] invalid opcode: 0000 2 PREEMPT SMP NOPTI
>> [ 25.137790] RIP: 0010:mwait_idle+0x35/0x80
>> [ 25.137790] Code: 6f f0 80 48 02 20 48 8b 10 83 e2 08 75 3e 65 48 8b 15
>> 47 d6 56 6f 48 0f ba e2 27 72 41 31 d2 48 89 d8
>>
>>>
>>>> following, CPUID_EXT_MONITOR is set after x86_cpu_filter_features(), so
>>>> that it doesn't have a chance to check MWAIT against host features and
>>>> will be advertised to the guest regardless of whether it's supported by
>>>> the host or not.
>>>>
>>>> x86_cpu_realizefn()
>>>>   x86_cpu_filter_features()
>>>>   cpu_exec_realizefn()
>>>>     kvm_cpu_realizefn
>>>>       host_cpu_realizefn
>>>>         host_cpu_enable_cpu_pm
>>>>           env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
>>>>
>>>>
>>>> If it's not supported by the host, executing MONITOR or MWAIT
>>>> instructions from the guest triggers #UD, no matter MWAIT_EXITING
>>>> control is set or not.
>>>
>>> If I recall right, kvm was able to emulate mwait/monitor.
>>> So question is why it leads to exception instead?
>>
>> KVM can come to play only iff it can trigger MWAIT/MONITOR VM exits. I
>> didn't find explicit proof from Intel SDM that #UD exceptions take
>> precedence over MWAIT/MONITOR VM exits, but this is my speculation. For
>> example, in ancient machines which don't support MWAIT yet, the only way
>> it can do is #UD, not MWAIT VM exit?
> 
> For the Host which doesn't support MWAIT, it shouldn't have the VMX
> control bit for mwait exit either, right?
> 
> Could you pls check this on your machine? If VMX doesn't support this
> exit event, then triggering an exception will make sense.


As Sean just confirmed, #UD takes priority over MWAIT exiting VM-exit,
thus if the host doesn't support MWAIT, regardless the MWAIT exiting is
set or not, executing MWAIT instruction from the guest triggers #UD, and
the guest doesn't boot.

This is not desired and VMM should not advertise MWAIT to the guest in
this case.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 1/3] vl: Allow multiple -overcommit commands
  2024-05-30 14:01     ` Zhao Liu
@ 2024-05-31  4:57       ` Thomas Huth
  2024-06-03  8:44         ` Markus Armbruster
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Huth @ 2024-05-31  4:57 UTC (permalink / raw)
  To: Zhao Liu, Markus Armbruster
  Cc: Zide Chen, qemu-devel, pbonzini, mst, cfontana, xiaoyao.li,
	qemu-trivial

On 30/05/2024 16.01, Zhao Liu wrote:
> On Mon, May 27, 2024 at 07:19:56AM +0200, Thomas Huth wrote:
>> Date: Mon, 27 May 2024 07:19:56 +0200
>> From: Thomas Huth <thuth@redhat.com>
>> Subject: Re: [PATCH V2 1/3] vl: Allow multiple -overcommit commands
>>
>> On 24/05/2024 22.00, Zide Chen wrote:
>>> Both cpu-pm and mem-lock are related to system resource overcommit, but
>>> they are separate from each other, in terms of how they are realized,
>>> and of course, they are applied to different system resources.
>>>
>>> It's tempting to use separate command lines to specify their behavior.
>>> e.g., in the following example, the cpu-pm command is quietly
>>> overwritten, and it's not easy to notice it without careful inspection.
>>>
>>>     --overcommit mem-lock=on
>>>     --overcommit cpu-pm=on
>>>
>>> Fixes: c8c9dc42b7ca ("Remove the deprecated -realtime option")
>>> Suggested-by: Thomas Huth <thuth@redhat.com>
>>> Signed-off-by: Zide Chen <zide.chen@intel.com>
>>> ---
>>>
>>> v2:
>>>
>>> Thanks to Thomas' suggestion, changed to this better approach, which
>>> is more generic and can handle situations like: "enabled the option in
>>> the config file, and now you'd like to disable it on the command line
>>> again".
>>>
>>>    system/vl.c | 4 ++--
>>>    1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/system/vl.c b/system/vl.c
>>> index a3eede5fa5b8..dfa6cdd9283b 100644
>>> --- a/system/vl.c
>>> +++ b/system/vl.c
>>> @@ -3545,8 +3545,8 @@ void qemu_init(int argc, char **argv)
>>>                    if (!opts) {
>>>                        exit(1);
>>>                    }
>>> -                enable_mlock = qemu_opt_get_bool(opts, "mem-lock", false);
>>> -                enable_cpu_pm = qemu_opt_get_bool(opts, "cpu-pm", false);
>>> +                enable_mlock = qemu_opt_get_bool(opts, "mem-lock", enable_mlock);
>>> +                enable_cpu_pm = qemu_opt_get_bool(opts, "cpu-pm", enable_cpu_pm);
>>>                    break;
>>>                case QEMU_OPTION_compat:
>>>                    {
>>
>> Reviewed-by: Thomas Huth <thuth@redhat.com>
>>
> 
> Hi Thomas,
> 
> BTW, do you think it's a good idea to define the overcommit via QAPI way
> (defined in a json file)? ;-)
> 
> My rough understanding is that all APIs are better to be defined via
> QAPI to go JSON compatible.

Sorry, no clue whether it makes sense here... CC:-ing Markus for 
recommendations.

  Thomas




^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before x86_cpu_filter_features
  2024-05-24 20:00 ` [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before x86_cpu_filter_features Zide Chen
@ 2024-05-31  6:30   ` Zhao Liu
  2024-05-31 17:13     ` Chen, Zide
  0 siblings, 1 reply; 29+ messages in thread
From: Zhao Liu @ 2024-05-31  6:30 UTC (permalink / raw)
  To: Zide Chen
  Cc: qemu-devel, pbonzini, mst, thuth, cfontana, xiaoyao.li,
	qemu-trivial

Hi Zide,

On Fri, May 24, 2024 at 01:00:16PM -0700, Zide Chen wrote:
> Date: Fri, 24 May 2024 13:00:16 -0700
> From: Zide Chen <zide.chen@intel.com>
> Subject: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
>  x86_cpu_filter_features
> X-Mailer: git-send-email 2.34.1
> 
> cpu_exec_realizefn which calls the accel-specific realizefn may expand
> features.  e.g., some accel-specific options may require extra features
> to be enabled, and it's appropriate to expand these features in accel-
> specific realizefn.
> 
> One such example is the cpu-pm option, which may add CPUID_EXT_MONITOR.
> 
> Thus, call cpu_exec_realizefn before x86_cpu_filter_features to ensure
> that it won't expose features not supported by the host.
> 
> Fixes: 662175b91ff2 ("i386: reorder call to cpu_exec_realizefn")
> Suggested-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Zide Chen <zide.chen@intel.com>
> ---
>  target/i386/cpu.c         | 24 ++++++++++++------------
>  target/i386/kvm/kvm-cpu.c |  1 -
>  2 files changed, 12 insertions(+), 13 deletions(-)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index bc2dceb647fa..a1c1c785bd2f 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -7604,6 +7604,18 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
>          }
>      }
>  
> +    /*
> +     * note: the call to the framework needs to happen after feature expansion,
> +     * but before the checks/modifications to ucode_rev, mwait, phys_bits.
> +     * These may be set by the accel-specific code,
> +     * and the results are subsequently checked / assumed in this function.
> +     */
> +    cpu_exec_realizefn(cs, &local_err);
> +    if (local_err != NULL) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
>      x86_cpu_filter_features(cpu, cpu->check_cpuid || cpu->enforce_cpuid);

For your case, which sets cpu-pm=on via overcommit, then
x86_cpu_filter_features() will complain that mwait is not supported.

Such warning is not necessary, because the purpose of overcommit (from
code) is only to support mwait when possible, not to commit to support
mwait in Guest.

Additionally, I understand x86_cpu_filter_features() is primarily
intended to filter features configured by the user, and the changes of
CPUID after x86_cpu_filter_features() should by default be regarded like
"QEMU knows what it is doing".

I feel adding a check for the CPUID mwait bit in host_cpu_realizefn()
is enough, after all, this bit should be present if host supports mwait
and enable_cpu_pm (in kvm_arch_get_supported_cpuid()).

Thanks,
Zhao



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 3/3] target/i386: Move host_cpu_enable_cpu_pm into kvm_cpu_realizefn()
  2024-05-24 20:00 ` [PATCH V2 3/3] target/i386: Move host_cpu_enable_cpu_pm into kvm_cpu_realizefn() Zide Chen
@ 2024-05-31  6:53   ` Zhao Liu
  2024-05-31 17:13     ` Chen, Zide
  0 siblings, 1 reply; 29+ messages in thread
From: Zhao Liu @ 2024-05-31  6:53 UTC (permalink / raw)
  To: Zide Chen
  Cc: qemu-devel, pbonzini, mst, thuth, cfontana, xiaoyao.li,
	qemu-trivial

On Fri, May 24, 2024 at 01:00:17PM -0700, Zide Chen wrote:
> Date: Fri, 24 May 2024 13:00:17 -0700
> From: Zide Chen <zide.chen@intel.com>
> Subject: [PATCH V2 3/3] target/i386: Move host_cpu_enable_cpu_pm into
>  kvm_cpu_realizefn()
> X-Mailer: git-send-email 2.34.1
> 
> It seems not a good idea to expand features in host_cpu_realizefn,
> which is for host CPU only. 

It is restricted by max_features and should be part of the "max" CPU,
and the current target/i386/host-cpu.c should only serve the “host” CPU.

> Additionally, cpu-pm option is KVM
> specific, and it's cleaner to put it in kvm_cpu_realizefn(), together
> with the WAITPKG code.
> 
> Fixes: f5cc5a5c1686 ("i386: split cpu accelerators from cpu.c, using AccelCPUClass")
> Signed-off-by: Zide Chen <zide.chen@intel.com>
> ---
>  target/i386/host-cpu.c    | 12 ------------
>  target/i386/kvm/kvm-cpu.c | 11 +++++++++--
>  2 files changed, 9 insertions(+), 14 deletions(-)
> 
> diff --git a/target/i386/host-cpu.c b/target/i386/host-cpu.c
> index 280e427c017c..8b8bf5afeccf 100644
> --- a/target/i386/host-cpu.c
> +++ b/target/i386/host-cpu.c
> @@ -42,15 +42,6 @@ static uint32_t host_cpu_phys_bits(void)
>      return host_phys_bits;
>  }
>  
> -static void host_cpu_enable_cpu_pm(X86CPU *cpu)
> -{
> -    CPUX86State *env = &cpu->env;
> -
> -    host_cpuid(5, 0, &cpu->mwait.eax, &cpu->mwait.ebx,
> -               &cpu->mwait.ecx, &cpu->mwait.edx);
> -    env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
> -}
> -
>  static uint32_t host_cpu_adjust_phys_bits(X86CPU *cpu)
>  {
>      uint32_t host_phys_bits = host_cpu_phys_bits();
> @@ -83,9 +74,6 @@ bool host_cpu_realizefn(CPUState *cs, Error **errp)
>      X86CPU *cpu = X86_CPU(cs);
>      CPUX86State *env = &cpu->env;
>  
> -    if (cpu->max_features && enable_cpu_pm) {
> -        host_cpu_enable_cpu_pm(cpu);
> -    }
>      if (env->features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM) {
>          uint32_t phys_bits = host_cpu_adjust_phys_bits(cpu);
>  
> diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c
> index 3adcedf0dbc3..197c892da89a 100644
> --- a/target/i386/kvm/kvm-cpu.c
> +++ b/target/i386/kvm/kvm-cpu.c
> @@ -64,9 +64,16 @@ static bool kvm_cpu_realizefn(CPUState *cs, Error **errp)
>       *   cpu_common_realizefn() (via xcc->parent_realize)
>       */
>      if (cpu->max_features) {
> -        if (enable_cpu_pm && kvm_has_waitpkg()) {
> -            env->features[FEAT_7_0_ECX] |= CPUID_7_0_ECX_WAITPKG;
> +        if (enable_cpu_pm) {
> +            if (kvm_has_waitpkg()) {
> +                env->features[FEAT_7_0_ECX] |= CPUID_7_0_ECX_WAITPKG;
> +            }

If you agree with my comment in patch 2, here we need a mwait bit check.

> +            host_cpuid(5, 0, &cpu->mwait.eax, &cpu->mwait.ebx,
> +                       &cpu->mwait.ecx, &cpu->mwait.edx);
> +            env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
>          }
> +
>          if (cpu->ucode_rev == 0) {
>              cpu->ucode_rev =
>                  kvm_arch_get_supported_msr_feature(kvm_state,
> -- 
> 2.34.1
> 
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before x86_cpu_filter_features
  2024-05-31  6:30   ` Zhao Liu
@ 2024-05-31 17:13     ` Chen, Zide
  2024-06-01 15:26       ` Zhao Liu
  0 siblings, 1 reply; 29+ messages in thread
From: Chen, Zide @ 2024-05-31 17:13 UTC (permalink / raw)
  To: Zhao Liu
  Cc: qemu-devel, pbonzini, mst, thuth, cfontana, xiaoyao.li,
	qemu-trivial



On 5/30/2024 11:30 PM, Zhao Liu wrote:
> Hi Zide,
> 
> On Fri, May 24, 2024 at 01:00:16PM -0700, Zide Chen wrote:
>> Date: Fri, 24 May 2024 13:00:16 -0700
>> From: Zide Chen <zide.chen@intel.com>
>> Subject: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
>>  x86_cpu_filter_features
>> X-Mailer: git-send-email 2.34.1
>>
>> cpu_exec_realizefn which calls the accel-specific realizefn may expand
>> features.  e.g., some accel-specific options may require extra features
>> to be enabled, and it's appropriate to expand these features in accel-
>> specific realizefn.
>>
>> One such example is the cpu-pm option, which may add CPUID_EXT_MONITOR.
>>
>> Thus, call cpu_exec_realizefn before x86_cpu_filter_features to ensure
>> that it won't expose features not supported by the host.
>>
>> Fixes: 662175b91ff2 ("i386: reorder call to cpu_exec_realizefn")
>> Suggested-by: Xiaoyao Li <xiaoyao.li@intel.com>
>> Signed-off-by: Zide Chen <zide.chen@intel.com>
>> ---
>>  target/i386/cpu.c         | 24 ++++++++++++------------
>>  target/i386/kvm/kvm-cpu.c |  1 -
>>  2 files changed, 12 insertions(+), 13 deletions(-)
>>
>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>> index bc2dceb647fa..a1c1c785bd2f 100644
>> --- a/target/i386/cpu.c
>> +++ b/target/i386/cpu.c
>> @@ -7604,6 +7604,18 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
>>          }
>>      }
>>  
>> +    /*
>> +     * note: the call to the framework needs to happen after feature expansion,
>> +     * but before the checks/modifications to ucode_rev, mwait, phys_bits.
>> +     * These may be set by the accel-specific code,
>> +     * and the results are subsequently checked / assumed in this function.
>> +     */
>> +    cpu_exec_realizefn(cs, &local_err);
>> +    if (local_err != NULL) {
>> +        error_propagate(errp, local_err);
>> +        return;
>> +    }
>> +
>>      x86_cpu_filter_features(cpu, cpu->check_cpuid || cpu->enforce_cpuid);
> 
> For your case, which sets cpu-pm=on via overcommit, then
> x86_cpu_filter_features() will complain that mwait is not supported.
> 
> Such warning is not necessary, because the purpose of overcommit (from
> code) is only to support mwait when possible, not to commit to support
> mwait in Guest.
> 
> Additionally, I understand x86_cpu_filter_features() is primarily
> intended to filter features configured by the user, 

Yes, that's why this patches intends to let x86_cpu_filter_features()
filter out the MWAIT bit which is set from the overcommit option.

> and the changes of
> CPUID after x86_cpu_filter_features() should by default be regarded like
> "QEMU knows what it is doing".

Sure, we can add feature bits after x86_cpu_filter_features(), but I
think moving cpu_exec_realizefn() before x86_cpu_filter_features() is
more generic, and actually this is what QEMU did before commit 662175b91ff2.

- Less redundant code. Specifically, no need to call
x86_cpu_get_supported_feature_word() again.
- Potentially there could be other features could be added from the
accel-specific realizefn, kvm_cpu_realizefn() for example.  And these
features need to be checked against the host availability.

> 
> I feel adding a check for the CPUID mwait bit in host_cpu_realizefn()
> is enough, after all, this bit should be present if host supports mwait
> and enable_cpu_pm (in kvm_arch_get_supported_cpuid()).

Besides the above reasons, it seems to me expanding env->features in
host-cpu.c is confusing.

> Thanks,
> Zhao
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 3/3] target/i386: Move host_cpu_enable_cpu_pm into kvm_cpu_realizefn()
  2024-05-31  6:53   ` Zhao Liu
@ 2024-05-31 17:13     ` Chen, Zide
  0 siblings, 0 replies; 29+ messages in thread
From: Chen, Zide @ 2024-05-31 17:13 UTC (permalink / raw)
  To: Zhao Liu
  Cc: qemu-devel, pbonzini, mst, thuth, cfontana, xiaoyao.li,
	qemu-trivial



On 5/30/2024 11:53 PM, Zhao Liu wrote:
> On Fri, May 24, 2024 at 01:00:17PM -0700, Zide Chen wrote:
>> Date: Fri, 24 May 2024 13:00:17 -0700
>> From: Zide Chen <zide.chen@intel.com>
>> Subject: [PATCH V2 3/3] target/i386: Move host_cpu_enable_cpu_pm into
>>  kvm_cpu_realizefn()
>> X-Mailer: git-send-email 2.34.1
>>
>> It seems not a good idea to expand features in host_cpu_realizefn,
>> which is for host CPU only. 
> 
> It is restricted by max_features and should be part of the "max" CPU,
> and the current target/i386/host-cpu.c should only serve the “host” CPU.

Yes, since this file only serves for "host" CPU, that's why I proposed
to move this code to kvm_cpu_realizefn().

> 
>> Additionally, cpu-pm option is KVM
>> specific, and it's cleaner to put it in kvm_cpu_realizefn(), together
>> with the WAITPKG code.
>>
>> Fixes: f5cc5a5c1686 ("i386: split cpu accelerators from cpu.c, using AccelCPUClass")
>> Signed-off-by: Zide Chen <zide.chen@intel.com>
>> ---
>>  target/i386/host-cpu.c    | 12 ------------
>>  target/i386/kvm/kvm-cpu.c | 11 +++++++++--
>>  2 files changed, 9 insertions(+), 14 deletions(-)
>>
>> diff --git a/target/i386/host-cpu.c b/target/i386/host-cpu.c
>> index 280e427c017c..8b8bf5afeccf 100644
>> --- a/target/i386/host-cpu.c
>> +++ b/target/i386/host-cpu.c
>> @@ -42,15 +42,6 @@ static uint32_t host_cpu_phys_bits(void)
>>      return host_phys_bits;
>>  }
>>  
>> -static void host_cpu_enable_cpu_pm(X86CPU *cpu)
>> -{
>> -    CPUX86State *env = &cpu->env;
>> -
>> -    host_cpuid(5, 0, &cpu->mwait.eax, &cpu->mwait.ebx,
>> -               &cpu->mwait.ecx, &cpu->mwait.edx);
>> -    env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
>> -}
>> -
>>  static uint32_t host_cpu_adjust_phys_bits(X86CPU *cpu)
>>  {
>>      uint32_t host_phys_bits = host_cpu_phys_bits();
>> @@ -83,9 +74,6 @@ bool host_cpu_realizefn(CPUState *cs, Error **errp)
>>      X86CPU *cpu = X86_CPU(cs);
>>      CPUX86State *env = &cpu->env;
>>  
>> -    if (cpu->max_features && enable_cpu_pm) {
>> -        host_cpu_enable_cpu_pm(cpu);
>> -    }
>>      if (env->features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM) {
>>          uint32_t phys_bits = host_cpu_adjust_phys_bits(cpu);
>>  
>> diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c
>> index 3adcedf0dbc3..197c892da89a 100644
>> --- a/target/i386/kvm/kvm-cpu.c
>> +++ b/target/i386/kvm/kvm-cpu.c
>> @@ -64,9 +64,16 @@ static bool kvm_cpu_realizefn(CPUState *cs, Error **errp)
>>       *   cpu_common_realizefn() (via xcc->parent_realize)
>>       */
>>      if (cpu->max_features) {
>> -        if (enable_cpu_pm && kvm_has_waitpkg()) {
>> -            env->features[FEAT_7_0_ECX] |= CPUID_7_0_ECX_WAITPKG;
>> +        if (enable_cpu_pm) {
>> +            if (kvm_has_waitpkg()) {
>> +                env->features[FEAT_7_0_ECX] |= CPUID_7_0_ECX_WAITPKG;
>> +            }
> 
> If you agree with my comment in patch 2, here we need a mwait bit check.

I still think that take advantaged of x86_cpu_filter_features() to check
host availability is a better choice.

> 
>> +            host_cpuid(5, 0, &cpu->mwait.eax, &cpu->mwait.ebx,
>> +                       &cpu->mwait.ecx, &cpu->mwait.edx);
>> +            env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
>>          }
>> +
>>          if (cpu->ucode_rev == 0) {
>>              cpu->ucode_rev =
>>                  kvm_arch_get_supported_msr_feature(kvm_state,
>> -- 
>> 2.34.1
>>
>>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before x86_cpu_filter_features
  2024-05-31 17:13     ` Chen, Zide
@ 2024-06-01 15:26       ` Zhao Liu
  2024-06-03  9:30         ` Igor Mammedov
  2024-06-03 21:29         ` Chen, Zide
  0 siblings, 2 replies; 29+ messages in thread
From: Zhao Liu @ 2024-06-01 15:26 UTC (permalink / raw)
  To: Chen, Zide
  Cc: qemu-devel, pbonzini, mst, thuth, cfontana, xiaoyao.li,
	qemu-trivial

On Fri, May 31, 2024 at 10:13:47AM -0700, Chen, Zide wrote:
> Date: Fri, 31 May 2024 10:13:47 -0700
> From: "Chen, Zide" <zide.chen@intel.com>
> Subject: Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
>  x86_cpu_filter_features
> 
> On 5/30/2024 11:30 PM, Zhao Liu wrote:
> > Hi Zide,
> > 
> > On Fri, May 24, 2024 at 01:00:16PM -0700, Zide Chen wrote:
> >> Date: Fri, 24 May 2024 13:00:16 -0700
> >> From: Zide Chen <zide.chen@intel.com>
> >> Subject: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
> >>  x86_cpu_filter_features
> >> X-Mailer: git-send-email 2.34.1
> >>
> >> cpu_exec_realizefn which calls the accel-specific realizefn may expand
> >> features.  e.g., some accel-specific options may require extra features
> >> to be enabled, and it's appropriate to expand these features in accel-
> >> specific realizefn.
> >>
> >> One such example is the cpu-pm option, which may add CPUID_EXT_MONITOR.
> >>
> >> Thus, call cpu_exec_realizefn before x86_cpu_filter_features to ensure
> >> that it won't expose features not supported by the host.
> >>
> >> Fixes: 662175b91ff2 ("i386: reorder call to cpu_exec_realizefn")
> >> Suggested-by: Xiaoyao Li <xiaoyao.li@intel.com>
> >> Signed-off-by: Zide Chen <zide.chen@intel.com>
> >> ---
> >>  target/i386/cpu.c         | 24 ++++++++++++------------
> >>  target/i386/kvm/kvm-cpu.c |  1 -
> >>  2 files changed, 12 insertions(+), 13 deletions(-)
> >>
> >> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> >> index bc2dceb647fa..a1c1c785bd2f 100644
> >> --- a/target/i386/cpu.c
> >> +++ b/target/i386/cpu.c
> >> @@ -7604,6 +7604,18 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
> >>          }
> >>      }
> >>  
> >> +    /*
> >> +     * note: the call to the framework needs to happen after feature expansion,
> >> +     * but before the checks/modifications to ucode_rev, mwait, phys_bits.
> >> +     * These may be set by the accel-specific code,
> >> +     * and the results are subsequently checked / assumed in this function.
> >> +     */
> >> +    cpu_exec_realizefn(cs, &local_err);
> >> +    if (local_err != NULL) {
> >> +        error_propagate(errp, local_err);
> >> +        return;
> >> +    }
> >> +
> >>      x86_cpu_filter_features(cpu, cpu->check_cpuid || cpu->enforce_cpuid);
> > 
> > For your case, which sets cpu-pm=on via overcommit, then
> > x86_cpu_filter_features() will complain that mwait is not supported.
> > 
> > Such warning is not necessary, because the purpose of overcommit (from
> > code) is only to support mwait when possible, not to commit to support
> > mwait in Guest.
> > 
> > Additionally, I understand x86_cpu_filter_features() is primarily
> > intended to filter features configured by the user, 
> 
> Yes, that's why this patches intends to let x86_cpu_filter_features()
> filter out the MWAIT bit which is set from the overcommit option.

HMM, but in fact x86_cpu_filter_features() has already checked the MWAIT
bit set by "-overcommit cpu-pm=on". ;-)

(Pls correct me if I'm wrong) Revisiting what cpu-pm did to MWAIT:
* Firstly, it set MWAIT bit in x86_cpu_expand_features():
  x86_cpu_expand_features()
     -> x86_cpu_get_supported_feature_word()
        -> kvm_arch_get_supported_cpuid()
 This MWAIT is based on Host's MWAIT capability. This MWAIT enablement
 is fine for next x86_cpu_filter_features() and x86_cpu_filter_features()
 is working correctly here!

* Then, MWAIT was secondly set in host_cpu_enable_cpu_pm() regardless
  neither Host's support or previous MWAIT enablement result. This is
  the root cause of your issue.

Therefore, we should make cpu-pm honor his first MWAIT enablement result
instead of repeatly and unconditionally setting the MWAIT bit again in
host_cpu_enable_cpu_pm().

Additionally, I think the code in x86_cpu_realizefn():
  cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
has the similar issue because it also should check MWAIT feature bit.

Further, it may be possible to remove cpu->mwait: just check the MWAIT
bit in leaf 5 of cpu_x86_cpuid(), and if MWAIT is present, use host's
mwait info plus CPUID_MWAIT_EMX | CPUID_MWAIT_IBE.

> > and the changes of
> > CPUID after x86_cpu_filter_features() should by default be regarded like
> > "QEMU knows what it is doing".
> 
> Sure, we can add feature bits after x86_cpu_filter_features(), but I
> think moving cpu_exec_realizefn() before x86_cpu_filter_features() is
> more generic, and actually this is what QEMU did before commit 662175b91ff2.
> 
> - Less redundant code. Specifically, no need to call
> x86_cpu_get_supported_feature_word() again.
> - Potentially there could be other features could be added from the
> accel-specific realizefn, kvm_cpu_realizefn() for example.  And these
> features need to be checked against the host availability.

Mainly I don't think this reorder is a direct fix for the problem (I
just analyse it above), also in your case x86_cpu_filter_features() will
print a WARNING when QEMU boots, which I don't think is cpu-pm's intention.




^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off
  2024-05-30 14:49           ` Igor Mammedov
@ 2024-06-02 21:54             ` Michael S. Tsirkin
  0 siblings, 0 replies; 29+ messages in thread
From: Michael S. Tsirkin @ 2024-06-02 21:54 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Zhao Liu, Chen, Zide, qemu-devel, pbonzini, thuth, cfontana,
	xiaoyao.li, qemu-trivial, seanjc

On Thu, May 30, 2024 at 04:49:33PM +0200, Igor Mammedov wrote:
> On Thu, 30 May 2024 21:54:47 +0800
> Zhao Liu <zhao1.liu@intel.com> wrote:
> 
> > Hi Zide,
> > 
> > On Wed, May 29, 2024 at 10:31:21AM -0700, Chen, Zide wrote:
> > > Date: Wed, 29 May 2024 10:31:21 -0700
> > > From: "Chen, Zide" <zide.chen@intel.com>
> > > Subject: Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off
> > > 
> > > 
> > > 
> > > On 5/29/2024 5:46 AM, Igor Mammedov wrote:  
> > > > On Tue, 28 May 2024 11:16:59 -0700
> > > > "Chen, Zide" <zide.chen@intel.com> wrote:
> > > >   
> > > >> On 5/28/2024 2:23 AM, Igor Mammedov wrote:  
> > > >>> On Fri, 24 May 2024 13:00:14 -0700
> > > >>> Zide Chen <zide.chen@intel.com> wrote:
> > > >>>     
> > > >>>> Currently, if running "-overcommit cpu-pm=on" on hosts that don't
> > > >>>> have MWAIT support, the MWAIT/MONITOR feature is advertised to the
> > > >>>> guest and executing MWAIT/MONITOR on the guest triggers #UD.    
> > > >>>
> > > >>> this is missing proper description how do you trigger issue
> > > >>> with reproducer and detailed description why guest sees MWAIT
> > > >>> when it's not supported by host.    
> > > >>
> > > >> If "overcommit cpu-pm=on" and "-cpu host" are present, as shown in the  
> > > > it's bette to provide full QEMU CLI and host/guest kernels used and what
> > > > hardware was used if it's relevant so others can reproduce problem.  
> > > 
> > > I ever reproduced this on an older Intel Icelake machine, a
> > > Sapphire Rapids and a Sierra Forest, but I believe this is a x86 generic
> > > issue, not specific to particular models.
> > > 
> > > For the CLI, I think the only command line options that matter are
> > >  -overcommit cpu-pm=on: to set enable_cpu_pm
> > >  -cpu host: so that cpu->max_features is set
> > > 
> > > For QEMU version, as long as it's after this commit: 662175b91ff2
> > > ("i386: reorder call to cpu_exec_realizefn")
> > > 
> > > The guest fails to boot:
> > > 
> > > [ 24.825568] smpboot: x86: Booting SMP configuration:
> > > [ 24.826377] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12
> > > #13 #14 #15 #17
> > > [ 24.985799] .... node #1, CPUs: #128 #129 #130 #131 #132 #133 #134 #135
> > > #136 #137 #138 #139 #140 #141 #142 #143 #145
> > > [ 25.136955] invalid opcode: 0000 1 PREEMPT SMP NOPTI
> > > [ 25.137790] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0 #2
> > > [ 25.137790] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> > > rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/04
> > > [ 25.137790] RIP: 0010:mwait_idle+0x35/0x80
> > > [ 25.137790] Code: 6f f0 80 48 02 20 48 8b 10 83 e2 08 75 3e 65 48 8b 15
> > > 47 d6 56 6f 48 0f ba e2 27 72 41 31 d2 48 89 d8
> > > [ 25.137790] RSP: 0000:ffffffff91403e70 EFLAGS: 00010046
> > > [ 25.137790] RAX: ffffffff9140a980 RBX: ffffffff9140a980 RCX:
> > > 0000000000000000
> > > [ 25.137790] RDX: 0000000000000000 RSI: ffff97f1ade21b20 RDI:
> > > 0000000000000004
> > > [ 25.137790] RBP: 0000000000000000 R08: 00000005da4709cb R09:
> > > 0000000000000001
> > > [ 25.137790] R10: 0000000000005da4 R11: 0000000000000009 R12:
> > > 0000000000000000
> > > [ 25.137790] R13: ffff98573ff90fc0 R14: ffffffff9140a038 R15:
> > > 0000000000093ff0
> > > [ 25.137790] FS: 0000000000000000(0000) GS:ffff97f1ade00000(0000)
> > > knlGS:0000000000000000
> > > [ 25.137790] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 25.137790] CR2: ffff97d8aa801000 CR3: 00000049e9430001 CR4:
> > > 0000000000770ef0
> > > [ 25.137790] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > > 0000000000000000
> > > [ 25.137790] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7:
> > > 0000000000000400
> > > [ 25.137790] PKRU: 55555554
> > > [ 25.137790] Call Trace:
> > > [ 25.137790] <TASK>
> > > [ 25.137790] ? die+0x37/0x90
> > > [ 25.137790] ? do_trap+0xe3/0x110
> > > [ 25.137790] ? mwait_idle+0x35/0x80
> > > [ 25.137790] ? do_error_trap+0x6a/0x90
> > > [ 25.137790] ? mwait_idle+0x35/0x80
> > > [ 25.137790] ? exc_invalid_op+0x52/0x70
> > > [ 25.137790] ? mwait_idle+0x35/0x80
> > > [ 25.137790] ? asm_exc_invalid_op+0x1a/0x20
> > > [ 25.137790] ? mwait_idle+0x35/0x80
> > > [ 25.137790] default_idle_call+0x30/0x100
> > > [ 25.137790] cpuidle_idle_call+0x12c/0x170
> > > [ 25.137790] ? tsc_verify_tsc_adjust+0x73/0xd0
> > > [ 25.137790] do_idle+0x7f/0xd0
> > > [ 25.137790] cpu_startup_entry+0x29/0x30
> > > [ 25.137790] rest_init+0xcc/0xd0
> > > [ 25.137790] start_kernel+0x396/0x5d0
> > > [ 25.137790] x86_64_start_reservations+0x18/0x30
> > > [ 25.137790] x86_64_start_kernel+0xe7/0xf0
> > > [ 25.137790] common_startup_64+0x13e/0x148
> > > [ 25.137790] </TASK>
> > > [ 25.137790] Modules linked in:
> > > [ 25.137790] --[ end trace 0000000000000000 ]--
> > > [ 25.137790] invalid opcode: 0000 2 PREEMPT SMP NOPTI
> > > [ 25.137790] RIP: 0010:mwait_idle+0x35/0x80
> > > [ 25.137790] Code: 6f f0 80 48 02 20 48 8b 10 83 e2 08 75 3e 65 48 8b 15
> > > 47 d6 56 6f 48 0f ba e2 27 72 41 31 d2 48 89 d8
> > >   
> > > >   
> > > >> following, CPUID_EXT_MONITOR is set after x86_cpu_filter_features(), so
> > > >> that it doesn't have a chance to check MWAIT against host features and
> > > >> will be advertised to the guest regardless of whether it's supported by
> > > >> the host or not.
> > > >>
> > > >> x86_cpu_realizefn()
> > > >>   x86_cpu_filter_features()
> > > >>   cpu_exec_realizefn()
> > > >>     kvm_cpu_realizefn
> > > >>       host_cpu_realizefn
> > > >>         host_cpu_enable_cpu_pm
> > > >>           env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
> > > >>
> > > >>
> > > >> If it's not supported by the host, executing MONITOR or MWAIT
> > > >> instructions from the guest triggers #UD, no matter MWAIT_EXITING
> > > >> control is set or not.  
> > > > 
> > > > If I recall right, kvm was able to emulate mwait/monitor.
> > > > So question is why it leads to exception instead?  
> > > 
> > > KVM can come to play only iff it can trigger MWAIT/MONITOR VM exits. I
> > > didn't find explicit proof from Intel SDM that #UD exceptions take
> > > precedence over MWAIT/MONITOR VM exits, but this is my speculation. For
> > > example, in ancient machines which don't support MWAIT yet, the only way
> > > it can do is #UD, not MWAIT VM exit?  
> > 
> > For the Host which doesn't support MWAIT, it shouldn't have the VMX
> > control bit for mwait exit either, right?
> > 
> > Could you pls check this on your machine? If VMX doesn't support this
> > exit event, then triggering an exception will make sense.
> 
> My assumption (probably wrong) was that KVM would emulate mwait if it's unavailable,


emulating mwait correctly is very hard. KVM does not try.

> unless we have KVM_CAP_X86_DISABLE_EXITS enabled. And in the later case it would
> explode as expected, however then we shouldn't be able to set KVM_CAP_X86_DISABLE_EXITS
> to begin with.
> 
> Recently Sean posted a patch related to that
> [PATCH v2 12/49] KVM: x86: Reject disabling of MWAIT/HLT interception when not allowed
>   https://lkml.org/lkml/2024/5/17/729
> 
> This needs someone with KVM expertise to chime in
> Perhaps Paolo/Sean could clarify expected behavior.
> 
> 
> > 
> > -Zhao
> > 



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 1/3] vl: Allow multiple -overcommit commands
  2024-05-31  4:57       ` Thomas Huth
@ 2024-06-03  8:44         ` Markus Armbruster
  0 siblings, 0 replies; 29+ messages in thread
From: Markus Armbruster @ 2024-06-03  8:44 UTC (permalink / raw)
  To: Thomas Huth
  Cc: Zhao Liu, Zide Chen, qemu-devel, pbonzini, mst, cfontana,
	xiaoyao.li, qemu-trivial

Thomas Huth <thuth@redhat.com> writes:

> On 30/05/2024 16.01, Zhao Liu wrote:
>> Hi Thomas,
>> BTW, do you think it's a good idea to define the overcommit via QAPI way
>> (defined in a json file)? ;-)
>> My rough understanding is that all APIs are better to be defined via
>> QAPI to go JSON compatible.
>
> Sorry, no clue whether it makes sense here... CC:-ing Markus for recommendations.

I'd love to have a machine-friendly, QAPI-based CLI with a
human-friendly CLI layered on top, similar to machine-friendly,
QAPI-based QMP / human-friendly HMP.

To get this with reasonable effort, we need better infrastructure.  We
have done a few complex options manually, such as -blockdev.  I
recommend this only when there's a clear need for JSON on the command
line.

I doubt this is the case for -overcommit.



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before x86_cpu_filter_features
  2024-06-01 15:26       ` Zhao Liu
@ 2024-06-03  9:30         ` Igor Mammedov
  2024-06-03 21:29           ` Chen, Zide
  2024-06-03 21:29         ` Chen, Zide
  1 sibling, 1 reply; 29+ messages in thread
From: Igor Mammedov @ 2024-06-03  9:30 UTC (permalink / raw)
  To: Zhao Liu
  Cc: Chen, Zide, qemu-devel, pbonzini, mst, thuth, cfontana,
	xiaoyao.li, qemu-trivial

On Sat, 1 Jun 2024 23:26:55 +0800
Zhao Liu <zhao1.liu@intel.com> wrote:

> On Fri, May 31, 2024 at 10:13:47AM -0700, Chen, Zide wrote:
> > Date: Fri, 31 May 2024 10:13:47 -0700
> > From: "Chen, Zide" <zide.chen@intel.com>
> > Subject: Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
> >  x86_cpu_filter_features
> > 
> > On 5/30/2024 11:30 PM, Zhao Liu wrote:  
> > > Hi Zide,
> > > 
> > > On Fri, May 24, 2024 at 01:00:16PM -0700, Zide Chen wrote:  
> > >> Date: Fri, 24 May 2024 13:00:16 -0700
> > >> From: Zide Chen <zide.chen@intel.com>
> > >> Subject: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
> > >>  x86_cpu_filter_features
> > >> X-Mailer: git-send-email 2.34.1
> > >>
> > >> cpu_exec_realizefn which calls the accel-specific realizefn may expand
> > >> features.  e.g., some accel-specific options may require extra features
> > >> to be enabled, and it's appropriate to expand these features in accel-
> > >> specific realizefn.
> > >>
> > >> One such example is the cpu-pm option, which may add CPUID_EXT_MONITOR.
> > >>
> > >> Thus, call cpu_exec_realizefn before x86_cpu_filter_features to ensure
> > >> that it won't expose features not supported by the host.
> > >>
> > >> Fixes: 662175b91ff2 ("i386: reorder call to cpu_exec_realizefn")
> > >> Suggested-by: Xiaoyao Li <xiaoyao.li@intel.com>
> > >> Signed-off-by: Zide Chen <zide.chen@intel.com>
> > >> ---
> > >>  target/i386/cpu.c         | 24 ++++++++++++------------
> > >>  target/i386/kvm/kvm-cpu.c |  1 -
> > >>  2 files changed, 12 insertions(+), 13 deletions(-)
> > >>
> > >> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > >> index bc2dceb647fa..a1c1c785bd2f 100644
> > >> --- a/target/i386/cpu.c
> > >> +++ b/target/i386/cpu.c
> > >> @@ -7604,6 +7604,18 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
> > >>          }
> > >>      }
> > >>  
> > >> +    /*
> > >> +     * note: the call to the framework needs to happen after feature expansion,
> > >> +     * but before the checks/modifications to ucode_rev, mwait, phys_bits.
> > >> +     * These may be set by the accel-specific code,
> > >> +     * and the results are subsequently checked / assumed in this function.
> > >> +     */
> > >> +    cpu_exec_realizefn(cs, &local_err);
> > >> +    if (local_err != NULL) {
> > >> +        error_propagate(errp, local_err);
> > >> +        return;
> > >> +    }
> > >> +
> > >>      x86_cpu_filter_features(cpu, cpu->check_cpuid || cpu->enforce_cpuid);  
> > > 
> > > For your case, which sets cpu-pm=on via overcommit, then
> > > x86_cpu_filter_features() will complain that mwait is not supported.
> > > 
> > > Such warning is not necessary, because the purpose of overcommit (from
> > > code) is only to support mwait when possible, not to commit to support
> > > mwait in Guest.
> > > 
> > > Additionally, I understand x86_cpu_filter_features() is primarily
> > > intended to filter features configured by the user,   
> > 
> > Yes, that's why this patches intends to let x86_cpu_filter_features()
> > filter out the MWAIT bit which is set from the overcommit option.  
> 
> HMM, but in fact x86_cpu_filter_features() has already checked the MWAIT
> bit set by "-overcommit cpu-pm=on". ;-)
> 
> (Pls correct me if I'm wrong) Revisiting what cpu-pm did to MWAIT:
> * Firstly, it set MWAIT bit in x86_cpu_expand_features():
>   x86_cpu_expand_features()
>      -> x86_cpu_get_supported_feature_word()
>         -> kvm_arch_get_supported_cpuid()  
>  This MWAIT is based on Host's MWAIT capability. This MWAIT enablement
>  is fine for next x86_cpu_filter_features() and x86_cpu_filter_features()
>  is working correctly here!
> 
> * Then, MWAIT was secondly set in host_cpu_enable_cpu_pm() regardless
>   neither Host's support or previous MWAIT enablement result. This is
>   the root cause of your issue.
> 
> Therefore, we should make cpu-pm honor his first MWAIT enablement result
> instead of repeatly and unconditionally setting the MWAIT bit again in
> host_cpu_enable_cpu_pm().
> 
> Additionally, I think the code in x86_cpu_realizefn():
>   cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
> has the similar issue because it also should check MWAIT feature bit.
> 
> Further, it may be possible to remove cpu->mwait: just check the MWAIT
> bit in leaf 5 of cpu_x86_cpuid(), and if MWAIT is present, use host's
> mwait info plus CPUID_MWAIT_EMX | CPUID_MWAIT_IBE.

Agreed with above analysis,
we shouldn't have host_cpu_enable_cpu_pm() as kvm_arch_get_supported_cpuid
gets us MWAIT already.

filling in cpu->mwait.ecx is benign mistake which likely doesn't
trigger anything if CPUID_EXT_MONITOR is not present.
But for clarity it's better to add an explicit check there as well.

> 
> > > and the changes of
> > > CPUID after x86_cpu_filter_features() should by default be regarded like
> > > "QEMU knows what it is doing".  
> > 
> > Sure, we can add feature bits after x86_cpu_filter_features(), but I
> > think moving cpu_exec_realizefn() before x86_cpu_filter_features() is
> > more generic, and actually this is what QEMU did before commit 662175b91ff2.
> > 
> > - Less redundant code. Specifically, no need to call
> > x86_cpu_get_supported_feature_word() again.
> > - Potentially there could be other features could be added from the
> > accel-specific realizefn, kvm_cpu_realizefn() for example.  And these
> > features need to be checked against the host availability.  
> 
> Mainly I don't think this reorder is a direct fix for the problem (I
> just analyse it above), also in your case x86_cpu_filter_features() will
> print a WARNING when QEMU boots, which I don't think is cpu-pm's intention.

There is no problem with warning, I'd even say it's a good thing.
But you are right reordering just masks the issue.

As for expected behavior, if user asked for "-overcommit cpu-pm=on"
there are 2 options:
   * it's working as expected (mwait exiting is enabled successfully with CPUID MONITOR bit set)
   * QEMU shall fail to start.




^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before x86_cpu_filter_features
  2024-06-03  9:30         ` Igor Mammedov
@ 2024-06-03 21:29           ` Chen, Zide
  2024-06-05 15:07             ` Igor Mammedov
  0 siblings, 1 reply; 29+ messages in thread
From: Chen, Zide @ 2024-06-03 21:29 UTC (permalink / raw)
  To: Igor Mammedov, Zhao Liu
  Cc: qemu-devel, pbonzini, mst, thuth, cfontana, xiaoyao.li,
	qemu-trivial, Sean Christopherson



On 6/3/2024 2:30 AM, Igor Mammedov wrote:
> On Sat, 1 Jun 2024 23:26:55 +0800
> Zhao Liu <zhao1.liu@intel.com> wrote:
> 
>> On Fri, May 31, 2024 at 10:13:47AM -0700, Chen, Zide wrote:
>>> Date: Fri, 31 May 2024 10:13:47 -0700
>>> From: "Chen, Zide" <zide.chen@intel.com>
>>> Subject: Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
>>>  x86_cpu_filter_features
>>>
>>> On 5/30/2024 11:30 PM, Zhao Liu wrote:  
>>>> Hi Zide,
>>>>
>>>> On Fri, May 24, 2024 at 01:00:16PM -0700, Zide Chen wrote:  
>>>>> Date: Fri, 24 May 2024 13:00:16 -0700
>>>>> From: Zide Chen <zide.chen@intel.com>
>>>>> Subject: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
>>>>>  x86_cpu_filter_features
>>>>> X-Mailer: git-send-email 2.34.1
>>>>>
>>>>> cpu_exec_realizefn which calls the accel-specific realizefn may expand
>>>>> features.  e.g., some accel-specific options may require extra features
>>>>> to be enabled, and it's appropriate to expand these features in accel-
>>>>> specific realizefn.
>>>>>
>>>>> One such example is the cpu-pm option, which may add CPUID_EXT_MONITOR.
>>>>>
>>>>> Thus, call cpu_exec_realizefn before x86_cpu_filter_features to ensure
>>>>> that it won't expose features not supported by the host.
>>>>>
>>>>> Fixes: 662175b91ff2 ("i386: reorder call to cpu_exec_realizefn")
>>>>> Suggested-by: Xiaoyao Li <xiaoyao.li@intel.com>
>>>>> Signed-off-by: Zide Chen <zide.chen@intel.com>
>>>>> ---
>>>>>  target/i386/cpu.c         | 24 ++++++++++++------------
>>>>>  target/i386/kvm/kvm-cpu.c |  1 -
>>>>>  2 files changed, 12 insertions(+), 13 deletions(-)
>>>>>
>>>>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>>>>> index bc2dceb647fa..a1c1c785bd2f 100644
>>>>> --- a/target/i386/cpu.c
>>>>> +++ b/target/i386/cpu.c
>>>>> @@ -7604,6 +7604,18 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
>>>>>          }
>>>>>      }
>>>>>  
>>>>> +    /*
>>>>> +     * note: the call to the framework needs to happen after feature expansion,
>>>>> +     * but before the checks/modifications to ucode_rev, mwait, phys_bits.
>>>>> +     * These may be set by the accel-specific code,
>>>>> +     * and the results are subsequently checked / assumed in this function.
>>>>> +     */
>>>>> +    cpu_exec_realizefn(cs, &local_err);
>>>>> +    if (local_err != NULL) {
>>>>> +        error_propagate(errp, local_err);
>>>>> +        return;
>>>>> +    }
>>>>> +
>>>>>      x86_cpu_filter_features(cpu, cpu->check_cpuid || cpu->enforce_cpuid);  
>>>>
>>>> For your case, which sets cpu-pm=on via overcommit, then
>>>> x86_cpu_filter_features() will complain that mwait is not supported.
>>>>
>>>> Such warning is not necessary, because the purpose of overcommit (from
>>>> code) is only to support mwait when possible, not to commit to support
>>>> mwait in Guest.
>>>>
>>>> Additionally, I understand x86_cpu_filter_features() is primarily
>>>> intended to filter features configured by the user,   
>>>
>>> Yes, that's why this patches intends to let x86_cpu_filter_features()
>>> filter out the MWAIT bit which is set from the overcommit option.  
>>
>> HMM, but in fact x86_cpu_filter_features() has already checked the MWAIT
>> bit set by "-overcommit cpu-pm=on". ;-)
>>
>> (Pls correct me if I'm wrong) Revisiting what cpu-pm did to MWAIT:
>> * Firstly, it set MWAIT bit in x86_cpu_expand_features():
>>   x86_cpu_expand_features()
>>      -> x86_cpu_get_supported_feature_word()
>>         -> kvm_arch_get_supported_cpuid()  
>>  This MWAIT is based on Host's MWAIT capability. This MWAIT enablement
>>  is fine for next x86_cpu_filter_features() and x86_cpu_filter_features()
>>  is working correctly here!
>>
>> * Then, MWAIT was secondly set in host_cpu_enable_cpu_pm() regardless
>>   neither Host's support or previous MWAIT enablement result. This is
>>   the root cause of your issue.
>>
>> Therefore, we should make cpu-pm honor his first MWAIT enablement result
>> instead of repeatly and unconditionally setting the MWAIT bit again in
>> host_cpu_enable_cpu_pm().
>>
>> Additionally, I think the code in x86_cpu_realizefn():
>>   cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
>> has the similar issue because it also should check MWAIT feature bit.
>>
>> Further, it may be possible to remove cpu->mwait: just check the MWAIT
>> bit in leaf 5 of cpu_x86_cpuid(), and if MWAIT is present, use host's
>> mwait info plus CPUID_MWAIT_EMX | CPUID_MWAIT_IBE.
> 
> Agreed with above analysis,
> we shouldn't have host_cpu_enable_cpu_pm() as kvm_arch_get_supported_cpuid
> gets us MWAIT already.

Yes, I agree don't need to set CPUID_EXT_MONITOR besides
kvm_arch_get_supported_cpuid().

> 
> filling in cpu->mwait.ecx is benign mistake which likely doesn't
> trigger anything if CPUID_EXT_MONITOR is not present.
> But for clarity it's better to add an explicit check there as well.

Yes, I agree without MWAIT available and advertised, it's meaningless to
set the EMX and IBE bits. Seems to me it's cleaner to remove cpu->mwait
all together, and in cpu_x86_cpuid(), just read from host_cpuid(5) if
MWAIT is available to the guest. But I don't understand the history of
why QEMU unconditionally advertises these two bits, and don't know it it
could break some thing if these two bits are removed.

Even if we want to fix these two bits, we can do it in another separate
patch.

e737b32a36 (" Core 2 Duo specification (Alexander Graf).")
unconditionally adds "CPUID_MWAIT_EMX | CPUID_MWAIT_IBE" to CPUID.5.ECX
with further explanation.

2266d44311 ("i386/cpu: make -cpu host support monitor/mwait") adds
comment "We always wake on interrupt even if host does not have the
capability" to CPUID_MWAIT_IBE.


> 
>>
>>>> and the changes of
>>>> CPUID after x86_cpu_filter_features() should by default be regarded like
>>>> "QEMU knows what it is doing".  
>>>
>>> Sure, we can add feature bits after x86_cpu_filter_features(), but I
>>> think moving cpu_exec_realizefn() before x86_cpu_filter_features() is
>>> more generic, and actually this is what QEMU did before commit 662175b91ff2.
>>>
>>> - Less redundant code. Specifically, no need to call
>>> x86_cpu_get_supported_feature_word() again.
>>> - Potentially there could be other features could be added from the
>>> accel-specific realizefn, kvm_cpu_realizefn() for example.  And these
>>> features need to be checked against the host availability.  
>>
>> Mainly I don't think this reorder is a direct fix for the problem (I
>> just analyse it above), also in your case x86_cpu_filter_features() will
>> print a WARNING when QEMU boots, which I don't think is cpu-pm's intention.
> 
> There is no problem with warning, I'd even say it's a good thing.

I agree it's good to have the warning as well.

> But you are right reordering just masks the issue.
> 
> As for expected behavior, if user asked for "-overcommit cpu-pm=on"
> there are 2 options:
>    * it's working as expected (mwait exiting is enabled successfully with CPUID MONITOR bit set)
>    * QEMU shall fail to start.

I like the idea that QEMU refuses to launch the guest if the asked CPU
features are not available, which is more user friendly.  But the
problem is, "-overcommit cpu-pm=on" is an umbrella which intends to
enable all the following CPUIDs and KVM features if it's possible.  So,
if QEMU fails the guest in this case, then it needs to fail the WAITPKG
feature as well. Additionally, it may need to provide individual options
to enable these individual features, which I doubt could be too complicated.

KVM_X86_DISABLE_EXITS_MWAI
KVM_X86_DISABLE_EXITS_HLTKVM_X86_DISABLE_EXITS_PAUSE
KVM_X86_DISABLE_EXITS_CSTATE
CPUID.7.0:ECX.WAITPKG
CPUID.1.ECX.MWAIT


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before x86_cpu_filter_features
  2024-06-01 15:26       ` Zhao Liu
  2024-06-03  9:30         ` Igor Mammedov
@ 2024-06-03 21:29         ` Chen, Zide
  1 sibling, 0 replies; 29+ messages in thread
From: Chen, Zide @ 2024-06-03 21:29 UTC (permalink / raw)
  To: Zhao Liu
  Cc: qemu-devel, pbonzini, mst, thuth, cfontana, xiaoyao.li,
	qemu-trivial, Sean Christopherson



On 6/1/2024 8:26 AM, Zhao Liu wrote:
> On Fri, May 31, 2024 at 10:13:47AM -0700, Chen, Zide wrote:
>> Date: Fri, 31 May 2024 10:13:47 -0700
>> From: "Chen, Zide" <zide.chen@intel.com>
>> Subject: Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
>>  x86_cpu_filter_features
>>
>> On 5/30/2024 11:30 PM, Zhao Liu wrote:
>>> Hi Zide,
>>>
>>> On Fri, May 24, 2024 at 01:00:16PM -0700, Zide Chen wrote:
>>>> Date: Fri, 24 May 2024 13:00:16 -0700
>>>> From: Zide Chen <zide.chen@intel.com>
>>>> Subject: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
>>>>  x86_cpu_filter_features
>>>> X-Mailer: git-send-email 2.34.1
>>>>
>>>> cpu_exec_realizefn which calls the accel-specific realizefn may expand
>>>> features.  e.g., some accel-specific options may require extra features
>>>> to be enabled, and it's appropriate to expand these features in accel-
>>>> specific realizefn.
>>>>
>>>> One such example is the cpu-pm option, which may add CPUID_EXT_MONITOR.
>>>>
>>>> Thus, call cpu_exec_realizefn before x86_cpu_filter_features to ensure
>>>> that it won't expose features not supported by the host.
>>>>
>>>> Fixes: 662175b91ff2 ("i386: reorder call to cpu_exec_realizefn")
>>>> Suggested-by: Xiaoyao Li <xiaoyao.li@intel.com>
>>>> Signed-off-by: Zide Chen <zide.chen@intel.com>
>>>> ---
>>>>  target/i386/cpu.c         | 24 ++++++++++++------------
>>>>  target/i386/kvm/kvm-cpu.c |  1 -
>>>>  2 files changed, 12 insertions(+), 13 deletions(-)
>>>>
>>>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>>>> index bc2dceb647fa..a1c1c785bd2f 100644
>>>> --- a/target/i386/cpu.c
>>>> +++ b/target/i386/cpu.c
>>>> @@ -7604,6 +7604,18 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
>>>>          }
>>>>      }
>>>>  
>>>> +    /*
>>>> +     * note: the call to the framework needs to happen after feature expansion,
>>>> +     * but before the checks/modifications to ucode_rev, mwait, phys_bits.
>>>> +     * These may be set by the accel-specific code,
>>>> +     * and the results are subsequently checked / assumed in this function.
>>>> +     */
>>>> +    cpu_exec_realizefn(cs, &local_err);
>>>> +    if (local_err != NULL) {
>>>> +        error_propagate(errp, local_err);
>>>> +        return;
>>>> +    }
>>>> +
>>>>      x86_cpu_filter_features(cpu, cpu->check_cpuid || cpu->enforce_cpuid);
>>>
>>> For your case, which sets cpu-pm=on via overcommit, then
>>> x86_cpu_filter_features() will complain that mwait is not supported.
>>>
>>> Such warning is not necessary, because the purpose of overcommit (from
>>> code) is only to support mwait when possible, not to commit to support
>>> mwait in Guest.
>>>
>>> Additionally, I understand x86_cpu_filter_features() is primarily
>>> intended to filter features configured by the user, 
>>
>> Yes, that's why this patches intends to let x86_cpu_filter_features()
>> filter out the MWAIT bit which is set from the overcommit option.
> 
> HMM, but in fact x86_cpu_filter_features() has already checked the MWAIT
> bit set by "-overcommit cpu-pm=on". ;-)
> 
> (Pls correct me if I'm wrong) Revisiting what cpu-pm did to MWAIT:
> * Firstly, it set MWAIT bit in x86_cpu_expand_features():
>   x86_cpu_expand_features()
>      -> x86_cpu_get_supported_feature_word()
>         -> kvm_arch_get_supported_cpuid()
>  This MWAIT is based on Host's MWAIT capability. This MWAIT enablement
>  is fine for next x86_cpu_filter_features() and x86_cpu_filter_features()
>  is working correctly here!
> 
> * Then, MWAIT was secondly set in host_cpu_enable_cpu_pm() regardless
>   neither Host's support or previous MWAIT enablement result. This is
>   the root cause of your issue.
> 
> Therefore, we should make cpu-pm honor his first MWAIT enablement result
> instead of repeatly and unconditionally setting the MWAIT bit again in
> host_cpu_enable_cpu_pm().

Yes, we don't need to set CPUID_EXT_MONITOR in host_cpu_enable_cpu_pm().

I'll drop this patch though I still think it makes sense to reorder
cpu_exec_realizefn () call.


> 
> Additionally, I think the code in x86_cpu_realizefn():
>   cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
> has the similar issue because it also should check MWAIT feature bit.

As explained in another comment, I agree that these two bits need to be
checked against MWAIT availability, or even we don't need
cpu->mwait->ecx at all. But I don't understand why the code explicitly
states that "We always wake on interrupt even if host does not have the
capability", and don't know if it could cause problems if they are removed.

> 
> Further, it may be possible to remove cpu->mwait: just check the MWAIT
> bit in leaf 5 of cpu_x86_cpuid(), and if MWAIT is present, use host's
> mwait info plus CPUID_MWAIT_EMX | CPUID_MWAIT_IBE.
> 
>>> and the changes of
>>> CPUID after x86_cpu_filter_features() should by default be regarded like
>>> "QEMU knows what it is doing".
>>
>> Sure, we can add feature bits after x86_cpu_filter_features(), but I
>> think moving cpu_exec_realizefn() before x86_cpu_filter_features() is
>> more generic, and actually this is what QEMU did before commit 662175b91ff2.
>>
>> - Less redundant code. Specifically, no need to call
>> x86_cpu_get_supported_feature_word() again.
>> - Potentially there could be other features could be added from the
>> accel-specific realizefn, kvm_cpu_realizefn() for example.  And these
>> features need to be checked against the host availability.
> 
> Mainly I don't think this reorder is a direct fix for the problem (I
> just analyse it above), also in your case x86_cpu_filter_features() will
> print a WARNING when QEMU boots, which I don't think is cpu-pm's intention.
> 



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before x86_cpu_filter_features
  2024-06-03 21:29           ` Chen, Zide
@ 2024-06-05 15:07             ` Igor Mammedov
  2024-06-05 17:58               ` Chen, Zide
  0 siblings, 1 reply; 29+ messages in thread
From: Igor Mammedov @ 2024-06-05 15:07 UTC (permalink / raw)
  To: Chen, Zide
  Cc: Zhao Liu, qemu-devel, pbonzini, mst, thuth, cfontana, xiaoyao.li,
	qemu-trivial, Sean Christopherson

On Mon, 3 Jun 2024 14:29:50 -0700
"Chen, Zide" <zide.chen@intel.com> wrote:

> On 6/3/2024 2:30 AM, Igor Mammedov wrote:
> > On Sat, 1 Jun 2024 23:26:55 +0800
> > Zhao Liu <zhao1.liu@intel.com> wrote:
> >   
> >> On Fri, May 31, 2024 at 10:13:47AM -0700, Chen, Zide wrote:  
> >>> Date: Fri, 31 May 2024 10:13:47 -0700
> >>> From: "Chen, Zide" <zide.chen@intel.com>
> >>> Subject: Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
> >>>  x86_cpu_filter_features
> >>>
> >>> On 5/30/2024 11:30 PM, Zhao Liu wrote:    
> >>>> Hi Zide,
> >>>>
> >>>> On Fri, May 24, 2024 at 01:00:16PM -0700, Zide Chen wrote:    
> >>>>> Date: Fri, 24 May 2024 13:00:16 -0700
> >>>>> From: Zide Chen <zide.chen@intel.com>
> >>>>> Subject: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
> >>>>>  x86_cpu_filter_features
> >>>>> X-Mailer: git-send-email 2.34.1
> >>>>>
> >>>>> cpu_exec_realizefn which calls the accel-specific realizefn may expand
> >>>>> features.  e.g., some accel-specific options may require extra features
> >>>>> to be enabled, and it's appropriate to expand these features in accel-
> >>>>> specific realizefn.
> >>>>>
> >>>>> One such example is the cpu-pm option, which may add CPUID_EXT_MONITOR.
> >>>>>
> >>>>> Thus, call cpu_exec_realizefn before x86_cpu_filter_features to ensure
> >>>>> that it won't expose features not supported by the host.
> >>>>>
> >>>>> Fixes: 662175b91ff2 ("i386: reorder call to cpu_exec_realizefn")
> >>>>> Suggested-by: Xiaoyao Li <xiaoyao.li@intel.com>
> >>>>> Signed-off-by: Zide Chen <zide.chen@intel.com>
> >>>>> ---
> >>>>>  target/i386/cpu.c         | 24 ++++++++++++------------
> >>>>>  target/i386/kvm/kvm-cpu.c |  1 -
> >>>>>  2 files changed, 12 insertions(+), 13 deletions(-)
> >>>>>
> >>>>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> >>>>> index bc2dceb647fa..a1c1c785bd2f 100644
> >>>>> --- a/target/i386/cpu.c
> >>>>> +++ b/target/i386/cpu.c
> >>>>> @@ -7604,6 +7604,18 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
> >>>>>          }
> >>>>>      }
> >>>>>  
> >>>>> +    /*
> >>>>> +     * note: the call to the framework needs to happen after feature expansion,
> >>>>> +     * but before the checks/modifications to ucode_rev, mwait, phys_bits.
> >>>>> +     * These may be set by the accel-specific code,
> >>>>> +     * and the results are subsequently checked / assumed in this function.
> >>>>> +     */
> >>>>> +    cpu_exec_realizefn(cs, &local_err);
> >>>>> +    if (local_err != NULL) {
> >>>>> +        error_propagate(errp, local_err);
> >>>>> +        return;
> >>>>> +    }
> >>>>> +
> >>>>>      x86_cpu_filter_features(cpu, cpu->check_cpuid || cpu->enforce_cpuid);    
> >>>>
> >>>> For your case, which sets cpu-pm=on via overcommit, then
> >>>> x86_cpu_filter_features() will complain that mwait is not supported.
> >>>>
> >>>> Such warning is not necessary, because the purpose of overcommit (from
> >>>> code) is only to support mwait when possible, not to commit to support
> >>>> mwait in Guest.
> >>>>
> >>>> Additionally, I understand x86_cpu_filter_features() is primarily
> >>>> intended to filter features configured by the user,     
> >>>
> >>> Yes, that's why this patches intends to let x86_cpu_filter_features()
> >>> filter out the MWAIT bit which is set from the overcommit option.    
> >>
> >> HMM, but in fact x86_cpu_filter_features() has already checked the MWAIT
> >> bit set by "-overcommit cpu-pm=on". ;-)
> >>
> >> (Pls correct me if I'm wrong) Revisiting what cpu-pm did to MWAIT:
> >> * Firstly, it set MWAIT bit in x86_cpu_expand_features():
> >>   x86_cpu_expand_features()  
> >>      -> x86_cpu_get_supported_feature_word()
> >>         -> kvm_arch_get_supported_cpuid()    
> >>  This MWAIT is based on Host's MWAIT capability. This MWAIT enablement
> >>  is fine for next x86_cpu_filter_features() and x86_cpu_filter_features()
> >>  is working correctly here!
> >>
> >> * Then, MWAIT was secondly set in host_cpu_enable_cpu_pm() regardless
> >>   neither Host's support or previous MWAIT enablement result. This is
> >>   the root cause of your issue.
> >>
> >> Therefore, we should make cpu-pm honor his first MWAIT enablement result
> >> instead of repeatly and unconditionally setting the MWAIT bit again in
> >> host_cpu_enable_cpu_pm().
> >>
> >> Additionally, I think the code in x86_cpu_realizefn():
> >>   cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
> >> has the similar issue because it also should check MWAIT feature bit.
> >>
> >> Further, it may be possible to remove cpu->mwait: just check the MWAIT
> >> bit in leaf 5 of cpu_x86_cpuid(), and if MWAIT is present, use host's
> >> mwait info plus CPUID_MWAIT_EMX | CPUID_MWAIT_IBE.  
> > 
> > Agreed with above analysis,
> > we shouldn't have host_cpu_enable_cpu_pm() as kvm_arch_get_supported_cpuid
> > gets us MWAIT already.  
> 
> Yes, I agree don't need to set CPUID_EXT_MONITOR besides
> kvm_arch_get_supported_cpuid().
> 
> > 
> > filling in cpu->mwait.ecx is benign mistake which likely doesn't
> > trigger anything if CPUID_EXT_MONITOR is not present.
> > But for clarity it's better to add an explicit check there as well.  
> 
> Yes, I agree without MWAIT available and advertised, it's meaningless to
> set the EMX and IBE bits. Seems to me it's cleaner to remove cpu->mwait
> all together, and in cpu_x86_cpuid(), just read from host_cpuid(5) if
> MWAIT is available to the guest. But I don't understand the history of
> why QEMU unconditionally advertises these two bits, and don't know it it
> could break some thing if these two bits are removed.
> 
> Even if we want to fix these two bits, we can do it in another separate
> patch.
> 
> e737b32a36 (" Core 2 Duo specification (Alexander Graf).")
> unconditionally adds "CPUID_MWAIT_EMX | CPUID_MWAIT_IBE" to CPUID.5.ECX
> with further explanation.
> 
> 2266d44311 ("i386/cpu: make -cpu host support monitor/mwait") adds
> comment "We always wake on interrupt even if host does not have the
> capability" to CPUID_MWAIT_IBE.
> 
> 
> >   
> >>  
> >>>> and the changes of
> >>>> CPUID after x86_cpu_filter_features() should by default be regarded like
> >>>> "QEMU knows what it is doing".    
> >>>
> >>> Sure, we can add feature bits after x86_cpu_filter_features(), but I
> >>> think moving cpu_exec_realizefn() before x86_cpu_filter_features() is
> >>> more generic, and actually this is what QEMU did before commit 662175b91ff2.
> >>>
> >>> - Less redundant code. Specifically, no need to call
> >>> x86_cpu_get_supported_feature_word() again.
> >>> - Potentially there could be other features could be added from the
> >>> accel-specific realizefn, kvm_cpu_realizefn() for example.  And these
> >>> features need to be checked against the host availability.    
> >>
> >> Mainly I don't think this reorder is a direct fix for the problem (I
> >> just analyse it above), also in your case x86_cpu_filter_features() will
> >> print a WARNING when QEMU boots, which I don't think is cpu-pm's intention.  
> > 
> > There is no problem with warning, I'd even say it's a good thing.  
> 
> I agree it's good to have the warning as well.
> 
> > But you are right reordering just masks the issue.
> > 
> > As for expected behavior, if user asked for "-overcommit cpu-pm=on"
> > there are 2 options:
> >    * it's working as expected (mwait exiting is enabled successfully with CPUID MONITOR bit set)
> >    * QEMU shall fail to start.  
> 
> I like the idea that QEMU refuses to launch the guest if the asked CPU
> features are not available, which is more user friendly.  But the
> problem is, "-overcommit cpu-pm=on" is an umbrella which intends to
> enable all the following CPUIDs and KVM features if it's possible.  So,
> if QEMU fails the guest in this case, then it needs to fail the WAITPKG
> feature as well. Additionally, it may need to provide individual options
> to enable these individual features, which I doubt could be too complicated.

how about

kvm_arch_init()
    ...
    if (enable_cpu_pm) {                                                         
        int disable_exits = kvm_check_extension(s, KVM_CAP_X86_DISABLE_EXITS);   

/* Work around for kernel header with a typo. TODO: fix header and drop. */      
#if defined(KVM_X86_DISABLE_EXITS_HTL) && !defined(KVM_X86_DISABLE_EXITS_HLT)    
#define KVM_X86_DISABLE_EXITS_HLT KVM_X86_DISABLE_EXITS_HTL                      
#endif                                   
                                        
above comment probably needs to be cleaned up

        if (disable_exits) {                                                     
            disable_exits &= (KVM_X86_DISABLE_EXITS_MWAIT |                      
                              KVM_X86_DISABLE_EXITS_HLT |                        
                              KVM_X86_DISABLE_EXITS_PAUSE |                      
                              KVM_X86_DISABLE_EXITS_CSTATE);                     
        }  

fail here if filtered disable_exits is an empty set?                                                                
                                                                                 
        ret = kvm_vm_enable_cap(s, KVM_CAP_X86_DISABLE_EXITS, 0,                 
                                disable_exits);                                  
        if (ret < 0) {                                                           
            error_report("kvm: guest stopping CPU not supported: %s",            
                         strerror(-ret));                                        
        }                                                                        
    }                                                                            
      


> KVM_X86_DISABLE_EXITS_MWAI
> KVM_X86_DISABLE_EXITS_HLTKVM_X86_DISABLE_EXITS_PAUSE
> KVM_X86_DISABLE_EXITS_CSTATE
> CPUID.7.0:ECX.WAITPKG
> CPUID.1.ECX.MWAIT
> 



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before x86_cpu_filter_features
  2024-06-05 15:07             ` Igor Mammedov
@ 2024-06-05 17:58               ` Chen, Zide
  0 siblings, 0 replies; 29+ messages in thread
From: Chen, Zide @ 2024-06-05 17:58 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Zhao Liu, qemu-devel, pbonzini, mst, thuth, cfontana, xiaoyao.li,
	qemu-trivial, Sean Christopherson



On 6/5/2024 8:07 AM, Igor Mammedov wrote:
> On Mon, 3 Jun 2024 14:29:50 -0700
> "Chen, Zide" <zide.chen@intel.com> wrote:
> 
>> On 6/3/2024 2:30 AM, Igor Mammedov wrote:
>>> On Sat, 1 Jun 2024 23:26:55 +0800
>>> Zhao Liu <zhao1.liu@intel.com> wrote:
>>>   
>>>> On Fri, May 31, 2024 at 10:13:47AM -0700, Chen, Zide wrote:  
>>>>> Date: Fri, 31 May 2024 10:13:47 -0700
>>>>> From: "Chen, Zide" <zide.chen@intel.com>
>>>>> Subject: Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
>>>>>  x86_cpu_filter_features
>>>>>
>>>>> On 5/30/2024 11:30 PM, Zhao Liu wrote:    
>>>>>> Hi Zide,
>>>>>>
>>>>>> On Fri, May 24, 2024 at 01:00:16PM -0700, Zide Chen wrote:    
>>>>>>> Date: Fri, 24 May 2024 13:00:16 -0700
>>>>>>> From: Zide Chen <zide.chen@intel.com>
>>>>>>> Subject: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
>>>>>>>  x86_cpu_filter_features
>>>>>>> X-Mailer: git-send-email 2.34.1
>>>>>>>
>>>>>>> cpu_exec_realizefn which calls the accel-specific realizefn may expand
>>>>>>> features.  e.g., some accel-specific options may require extra features
>>>>>>> to be enabled, and it's appropriate to expand these features in accel-
>>>>>>> specific realizefn.
>>>>>>>
>>>>>>> One such example is the cpu-pm option, which may add CPUID_EXT_MONITOR.
>>>>>>>
>>>>>>> Thus, call cpu_exec_realizefn before x86_cpu_filter_features to ensure
>>>>>>> that it won't expose features not supported by the host.
>>>>>>>
>>>>>>> Fixes: 662175b91ff2 ("i386: reorder call to cpu_exec_realizefn")
>>>>>>> Suggested-by: Xiaoyao Li <xiaoyao.li@intel.com>
>>>>>>> Signed-off-by: Zide Chen <zide.chen@intel.com>
>>>>>>> ---
>>>>>>>  target/i386/cpu.c         | 24 ++++++++++++------------
>>>>>>>  target/i386/kvm/kvm-cpu.c |  1 -
>>>>>>>  2 files changed, 12 insertions(+), 13 deletions(-)
>>>>>>>
>>>>>>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>>>>>>> index bc2dceb647fa..a1c1c785bd2f 100644
>>>>>>> --- a/target/i386/cpu.c
>>>>>>> +++ b/target/i386/cpu.c
>>>>>>> @@ -7604,6 +7604,18 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
>>>>>>>          }
>>>>>>>      }
>>>>>>>  
>>>>>>> +    /*
>>>>>>> +     * note: the call to the framework needs to happen after feature expansion,
>>>>>>> +     * but before the checks/modifications to ucode_rev, mwait, phys_bits.
>>>>>>> +     * These may be set by the accel-specific code,
>>>>>>> +     * and the results are subsequently checked / assumed in this function.
>>>>>>> +     */
>>>>>>> +    cpu_exec_realizefn(cs, &local_err);
>>>>>>> +    if (local_err != NULL) {
>>>>>>> +        error_propagate(errp, local_err);
>>>>>>> +        return;
>>>>>>> +    }
>>>>>>> +
>>>>>>>      x86_cpu_filter_features(cpu, cpu->check_cpuid || cpu->enforce_cpuid);    
>>>>>>
>>>>>> For your case, which sets cpu-pm=on via overcommit, then
>>>>>> x86_cpu_filter_features() will complain that mwait is not supported.
>>>>>>
>>>>>> Such warning is not necessary, because the purpose of overcommit (from
>>>>>> code) is only to support mwait when possible, not to commit to support
>>>>>> mwait in Guest.
>>>>>>
>>>>>> Additionally, I understand x86_cpu_filter_features() is primarily
>>>>>> intended to filter features configured by the user,     
>>>>>
>>>>> Yes, that's why this patches intends to let x86_cpu_filter_features()
>>>>> filter out the MWAIT bit which is set from the overcommit option.    
>>>>
>>>> HMM, but in fact x86_cpu_filter_features() has already checked the MWAIT
>>>> bit set by "-overcommit cpu-pm=on". ;-)
>>>>
>>>> (Pls correct me if I'm wrong) Revisiting what cpu-pm did to MWAIT:
>>>> * Firstly, it set MWAIT bit in x86_cpu_expand_features():
>>>>   x86_cpu_expand_features()  
>>>>      -> x86_cpu_get_supported_feature_word()
>>>>         -> kvm_arch_get_supported_cpuid()    
>>>>  This MWAIT is based on Host's MWAIT capability. This MWAIT enablement
>>>>  is fine for next x86_cpu_filter_features() and x86_cpu_filter_features()
>>>>  is working correctly here!
>>>>
>>>> * Then, MWAIT was secondly set in host_cpu_enable_cpu_pm() regardless
>>>>   neither Host's support or previous MWAIT enablement result. This is
>>>>   the root cause of your issue.
>>>>
>>>> Therefore, we should make cpu-pm honor his first MWAIT enablement result
>>>> instead of repeatly and unconditionally setting the MWAIT bit again in
>>>> host_cpu_enable_cpu_pm().
>>>>
>>>> Additionally, I think the code in x86_cpu_realizefn():
>>>>   cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
>>>> has the similar issue because it also should check MWAIT feature bit.
>>>>
>>>> Further, it may be possible to remove cpu->mwait: just check the MWAIT
>>>> bit in leaf 5 of cpu_x86_cpuid(), and if MWAIT is present, use host's
>>>> mwait info plus CPUID_MWAIT_EMX | CPUID_MWAIT_IBE.  
>>>
>>> Agreed with above analysis,
>>> we shouldn't have host_cpu_enable_cpu_pm() as kvm_arch_get_supported_cpuid
>>> gets us MWAIT already.  
>>
>> Yes, I agree don't need to set CPUID_EXT_MONITOR besides
>> kvm_arch_get_supported_cpuid().
>>
>>>
>>> filling in cpu->mwait.ecx is benign mistake which likely doesn't
>>> trigger anything if CPUID_EXT_MONITOR is not present.
>>> But for clarity it's better to add an explicit check there as well.  
>>
>> Yes, I agree without MWAIT available and advertised, it's meaningless to
>> set the EMX and IBE bits. Seems to me it's cleaner to remove cpu->mwait
>> all together, and in cpu_x86_cpuid(), just read from host_cpuid(5) if
>> MWAIT is available to the guest. But I don't understand the history of
>> why QEMU unconditionally advertises these two bits, and don't know it it
>> could break some thing if these two bits are removed.
>>
>> Even if we want to fix these two bits, we can do it in another separate
>> patch.
>>
>> e737b32a36 (" Core 2 Duo specification (Alexander Graf).")
>> unconditionally adds "CPUID_MWAIT_EMX | CPUID_MWAIT_IBE" to CPUID.5.ECX
>> with further explanation.
>>
>> 2266d44311 ("i386/cpu: make -cpu host support monitor/mwait") adds
>> comment "We always wake on interrupt even if host does not have the
>> capability" to CPUID_MWAIT_IBE.
>>
>>
>>>   
>>>>  
>>>>>> and the changes of
>>>>>> CPUID after x86_cpu_filter_features() should by default be regarded like
>>>>>> "QEMU knows what it is doing".    
>>>>>
>>>>> Sure, we can add feature bits after x86_cpu_filter_features(), but I
>>>>> think moving cpu_exec_realizefn() before x86_cpu_filter_features() is
>>>>> more generic, and actually this is what QEMU did before commit 662175b91ff2.
>>>>>
>>>>> - Less redundant code. Specifically, no need to call
>>>>> x86_cpu_get_supported_feature_word() again.
>>>>> - Potentially there could be other features could be added from the
>>>>> accel-specific realizefn, kvm_cpu_realizefn() for example.  And these
>>>>> features need to be checked against the host availability.    
>>>>
>>>> Mainly I don't think this reorder is a direct fix for the problem (I
>>>> just analyse it above), also in your case x86_cpu_filter_features() will
>>>> print a WARNING when QEMU boots, which I don't think is cpu-pm's intention.  
>>>
>>> There is no problem with warning, I'd even say it's a good thing.  
>>
>> I agree it's good to have the warning as well.
>>
>>> But you are right reordering just masks the issue.
>>>
>>> As for expected behavior, if user asked for "-overcommit cpu-pm=on"
>>> there are 2 options:
>>>    * it's working as expected (mwait exiting is enabled successfully with CPUID MONITOR bit set)
>>>    * QEMU shall fail to start.  
>>
>> I like the idea that QEMU refuses to launch the guest if the asked CPU
>> features are not available, which is more user friendly.  But the
>> problem is, "-overcommit cpu-pm=on" is an umbrella which intends to
>> enable all the following CPUIDs and KVM features if it's possible.  So,
>> if QEMU fails the guest in this case, then it needs to fail the WAITPKG
>> feature as well. Additionally, it may need to provide individual options
>> to enable these individual features, which I doubt could be too complicated.
> 
> how about
> 
> kvm_arch_init()
>     ...
>     if (enable_cpu_pm) {                                                         
>         int disable_exits = kvm_check_extension(s, KVM_CAP_X86_DISABLE_EXITS);   
> 
> /* Work around for kernel header with a typo. TODO: fix header and drop. */      
> #if defined(KVM_X86_DISABLE_EXITS_HTL) && !defined(KVM_X86_DISABLE_EXITS_HLT)    
> #define KVM_X86_DISABLE_EXITS_HLT KVM_X86_DISABLE_EXITS_HTL                      
> #endif                                   
>                                         
> above comment probably needs to be cleaned up

Zhao already has a patch to clean it up.
https://lists.nongnu.org/archive/html/qemu-devel/2024-05/msg00889.html

> 
>         if (disable_exits) {                                                     
>             disable_exits &= (KVM_X86_DISABLE_EXITS_MWAIT |                      
>                               KVM_X86_DISABLE_EXITS_HLT |                        
>                               KVM_X86_DISABLE_EXITS_PAUSE |                      
>                               KVM_X86_DISABLE_EXITS_CSTATE);                     
>         }  
> 
> fail here if filtered disable_exits is an empty set?                                                                
>                                                                                  
>         ret = kvm_vm_enable_cap(s, KVM_CAP_X86_DISABLE_EXITS, 0,                 
>                                 disable_exits);                                  
>         if (ret < 0) {                                                           
>             error_report("kvm: guest stopping CPU not supported: %s",            
>                          strerror(-ret));                                        
>         }                                                                        
>     }                                                                            

We may do the following. But it's still confusing:

- If any of these cpu-pm features are available, we should not fail the
guest.  Meanwhile, users are not able to easily identify which features
are available and enabled, so that user may still get false assumptions.

- We may print out the capabilities that are enabled from
kvm_vm_enable_cap(), but seems this is not QEMU style.

- Since KVM commit 0abcc8f65cc ("KVM: VMX: enable X86_FEATURE_WAITPKG in
KVM capabilities"), WAITPKG is advertised to the guest by default, thus
it's not much meaningful to tight enable_cpu_pm with this feature.  But
before this logic is removed from QEMU, look at this case: WAITPKS is
available while none of KVM_X86_DISABLE_EXITS_xxx are available, we
don't fail the guest, but the behavior of the guest is exactly the same
with or without "cpu-pm=on".

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 891ce287e473..a4da23e2cd07 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2708,11 +2708,16 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
                               KVM_X86_DISABLE_EXITS_CSTATE);
         }

-        ret = kvm_vm_enable_cap(s, KVM_CAP_X86_DISABLE_EXITS, 0,
-                                disable_exits);
-        if (ret < 0) {
-            error_report("kvm: guest stopping CPU not supported: %s",
+        if (disable_exits)
+            ret = kvm_vm_enable_cap(s, KVM_CAP_X86_DISABLE_EXITS, 0,
+                                    disable_exits);
+       else
+            ret = -ENOTSUP;
+
+        if (ret < 0 && !has_msr_umwait) {
+            error_report("kvm: Failed to enable cpu-pm overcommit: %s",
                          strerror(-ret));
+           return ret;
         }
     }

> 
>> KVM_X86_DISABLE_EXITS_MWAI
>> KVM_X86_DISABLE_EXITS_HLTKVM_X86_DISABLE_EXITS_PAUSE
>> KVM_X86_DISABLE_EXITS_CSTATE
>> CPUID.7.0:ECX.WAITPKG
>> CPUID.1.ECX.MWAIT
>>
> 


^ permalink raw reply related	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2024-06-05 17:59 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-24 20:00 [PATCH V2 0/3] improve -overcommit cpu-pm=on|off Zide Chen
2024-05-24 20:00 ` [PATCH V2 1/3] vl: Allow multiple -overcommit commands Zide Chen
2024-05-27  5:19   ` Thomas Huth
2024-05-30 14:01     ` Zhao Liu
2024-05-31  4:57       ` Thomas Huth
2024-06-03  8:44         ` Markus Armbruster
2024-05-30 13:39   ` Zhao Liu
2024-05-24 20:00 ` [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before x86_cpu_filter_features Zide Chen
2024-05-31  6:30   ` Zhao Liu
2024-05-31 17:13     ` Chen, Zide
2024-06-01 15:26       ` Zhao Liu
2024-06-03  9:30         ` Igor Mammedov
2024-06-03 21:29           ` Chen, Zide
2024-06-05 15:07             ` Igor Mammedov
2024-06-05 17:58               ` Chen, Zide
2024-06-03 21:29         ` Chen, Zide
2024-05-24 20:00 ` [PATCH V2 3/3] target/i386: Move host_cpu_enable_cpu_pm into kvm_cpu_realizefn() Zide Chen
2024-05-31  6:53   ` Zhao Liu
2024-05-31 17:13     ` Chen, Zide
2024-05-28  9:23 ` [PATCH V2 0/3] improve -overcommit cpu-pm=on|off Igor Mammedov
2024-05-28 18:16   ` Chen, Zide
2024-05-29 12:46     ` Igor Mammedov
2024-05-29 17:31       ` Chen, Zide
2024-05-30 13:54         ` Zhao Liu
2024-05-30 14:34           ` Igor Mammedov
2024-05-30 14:53             ` Sean Christopherson
2024-05-30 14:49           ` Igor Mammedov
2024-06-02 21:54             ` Michael S. Tsirkin
2024-05-30 16:15           ` Chen, Zide

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).