qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 00/13] [uq/master] Patch queue, part IV (MCE edition)
@ 2011-02-15  8:23 Jan Kiszka
  2011-02-15  8:23 ` [Qemu-devel] [PATCH 01/13] x86: Account for MCE in cpu_has_work Jan Kiszka
                   ` (12 more replies)
  0 siblings, 13 replies; 21+ messages in thread
From: Jan Kiszka @ 2011-02-15  8:23 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Hidetoshi Seto, Jin Dongming, qemu-devel, kvm, Huang Ying

And the show goes on: This 4th round contains a rework of the MCE
handling, mostly for KVM, but also few bits used in emulation mode are
affected. Major contributions are:
 - fixed MCE injection race with other events
 - kick VCPU out of halt on asynchronous MCEs
 - unified TCG and KVM MCE injection code
 - unpoison pages on guest reboot (Huang Ying)

This won't be the last uq/master series during my effort to consolidate
qemu-kvm and upstream. A few more patches piled up again that will be
sent later on separately.

CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
CC: Huang Ying <ying.huang@intel.com>
CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>

Huang Ying (2):
  Add qemu_ram_remap
  KVM, MCE, unpoison memory address across reboot

Jan Kiszka (11):
  x86: Account for MCE in cpu_has_work
  x86: Perform implicit mcg_status reset
  x86: Small cleanups of MCE helpers
  x86: Refine error reporting of MCE injection services
  x86: Optionally avoid injecting AO MCEs while others are pending
  Synchronize VCPU states before reset
  kvm: x86: Move MCE functions together
  kvm: x86: Inject pending MCE events on state writeback
  kvm: x86: Consolidate TCG and KVM MCE injection code
  kvm: x86: Clean up kvm_setup_mce
  kvm: x86: Fail kvm_arch_init_vcpu if MCE initialization fails

 cpu-all.h             |    8 +-
 cpu-common.h          |    1 +
 exec.c                |   61 +++++++-
 monitor.c             |   11 +-
 qemu-common.h         |    6 +-
 target-i386/cpu.h     |   11 +-
 target-i386/exec.h    |   15 +-
 target-i386/helper.c  |  185 ++++++++++++-------
 target-i386/kvm.c     |  472 ++++++++++++++++++++-----------------------------
 target-i386/kvm_x86.h |   25 ---
 vl.c                  |    1 +
 11 files changed, 400 insertions(+), 396 deletions(-)
 delete mode 100644 target-i386/kvm_x86.h

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH 01/13] x86: Account for MCE in cpu_has_work
  2011-02-15  8:23 [Qemu-devel] [PATCH 00/13] [uq/master] Patch queue, part IV (MCE edition) Jan Kiszka
@ 2011-02-15  8:23 ` Jan Kiszka
  2011-02-15  8:23 ` [Qemu-devel] [PATCH 02/13] x86: Perform implicit mcg_status reset Jan Kiszka
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Jan Kiszka @ 2011-02-15  8:23 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Hidetoshi Seto, Jin Dongming, qemu-devel, kvm, Huang Ying

MCEs can be injected asynchronously, so they can also terminate the halt
state.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Huang Ying <ying.huang@intel.com>
CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
---
 target-i386/exec.h |   15 ++++++---------
 1 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/target-i386/exec.h b/target-i386/exec.h
index fc8945b..d050dd0 100644
--- a/target-i386/exec.h
+++ b/target-i386/exec.h
@@ -293,15 +293,12 @@ static inline void load_eflags(int eflags, int update_mask)
 
 static inline int cpu_has_work(CPUState *env)
 {
-    int work;
-
-    work = (env->interrupt_request & CPU_INTERRUPT_HARD) &&
-           (env->eflags & IF_MASK);
-    work |= env->interrupt_request & CPU_INTERRUPT_NMI;
-    work |= env->interrupt_request & CPU_INTERRUPT_INIT;
-    work |= env->interrupt_request & CPU_INTERRUPT_SIPI;
-
-    return work;
+    return ((env->interrupt_request & CPU_INTERRUPT_HARD) &&
+            (env->eflags & IF_MASK)) ||
+           (env->interrupt_request & (CPU_INTERRUPT_NMI |
+                                      CPU_INTERRUPT_INIT |
+                                      CPU_INTERRUPT_SIPI |
+                                      CPU_INTERRUPT_MCE));
 }
 
 static inline int cpu_halted(CPUState *env) {
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH 02/13] x86: Perform implicit mcg_status reset
  2011-02-15  8:23 [Qemu-devel] [PATCH 00/13] [uq/master] Patch queue, part IV (MCE edition) Jan Kiszka
  2011-02-15  8:23 ` [Qemu-devel] [PATCH 01/13] x86: Account for MCE in cpu_has_work Jan Kiszka
@ 2011-02-15  8:23 ` Jan Kiszka
  2011-02-15  8:23 ` [Qemu-devel] [PATCH 03/13] x86: Small cleanups of MCE helpers Jan Kiszka
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Jan Kiszka @ 2011-02-15  8:23 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Hidetoshi Seto, Jin Dongming, qemu-devel, kvm, Huang Ying

Reorder mcg_status in CPUState to achive automatic clearing on reset.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Huang Ying <ying.huang@intel.com>
CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
---
 target-i386/cpu.h    |    3 ++-
 target-i386/helper.c |    2 --
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 5f1df8b..75156e7 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -687,6 +687,8 @@ typedef struct CPUX86State {
 
     uint64_t pat;
 
+    uint64_t mcg_status;
+
     /* exception/interrupt handling */
     int error_code;
     int exception_is_int;
@@ -741,7 +743,6 @@ typedef struct CPUX86State {
     struct DeviceState *apic_state;
 
     uint64_t mcg_cap;
-    uint64_t mcg_status;
     uint64_t mcg_ctl;
     uint64_t mce_banks[MCE_BANKS_DEF*4];
 
diff --git a/target-i386/helper.c b/target-i386/helper.c
index f0c546d..f41416f 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -101,8 +101,6 @@ void cpu_reset(CPUX86State *env)
     env->dr[7] = DR7_FIXED_1;
     cpu_breakpoint_remove_all(env, BP_CPU);
     cpu_watchpoint_remove_all(env, BP_CPU);
-
-    env->mcg_status = 0;
 }
 
 void cpu_x86_close(CPUX86State *env)
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH 03/13] x86: Small cleanups of MCE helpers
  2011-02-15  8:23 [Qemu-devel] [PATCH 00/13] [uq/master] Patch queue, part IV (MCE edition) Jan Kiszka
  2011-02-15  8:23 ` [Qemu-devel] [PATCH 01/13] x86: Account for MCE in cpu_has_work Jan Kiszka
  2011-02-15  8:23 ` [Qemu-devel] [PATCH 02/13] x86: Perform implicit mcg_status reset Jan Kiszka
@ 2011-02-15  8:23 ` Jan Kiszka
  2011-02-15  8:23 ` [Qemu-devel] [PATCH 04/13] x86: Refine error reporting of MCE injection services Jan Kiszka
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Jan Kiszka @ 2011-02-15  8:23 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Hidetoshi Seto, Jin Dongming, qemu-devel, kvm, Huang Ying

Fix some code style issues, use proper headers, and align to cpu_x86
naming scheme. No functional changes.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Huang Ying <ying.huang@intel.com>
CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
---
 cpu-all.h            |    4 ----
 monitor.c            |    2 +-
 target-i386/cpu.h    |    5 +++++
 target-i386/helper.c |   41 ++++++++++++++++++++++++-----------------
 4 files changed, 30 insertions(+), 22 deletions(-)

diff --git a/cpu-all.h b/cpu-all.h
index 87b0f86..caf5e6c 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -971,8 +971,4 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf);
 int cpu_memory_rw_debug(CPUState *env, target_ulong addr,
                         uint8_t *buf, int len, int is_write);
 
-void cpu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
-                        uint64_t mcg_status, uint64_t addr, uint64_t misc,
-                        int broadcast);
-
 #endif /* CPU_ALL_H */
diff --git a/monitor.c b/monitor.c
index 22ae3bb..45b0cc2 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2713,7 +2713,7 @@ static void do_inject_mce(Monitor *mon, const QDict *qdict)
 
     for (cenv = first_cpu; cenv != NULL; cenv = cenv->next_cpu) {
         if (cenv->cpu_index == cpu_index && cenv->mcg_cap) {
-            cpu_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc,
+            cpu_x86_inject_mce(cenv, bank, status, mcg_status, addr, misc,
                                broadcast);
             break;
         }
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 75156e7..52bb48e 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -986,4 +986,9 @@ static inline void cpu_get_tb_cpu_state(CPUState *env, target_ulong *pc,
 
 void do_cpu_init(CPUState *env);
 void do_cpu_sipi(CPUState *env);
+
+void cpu_x86_inject_mce(CPUState *cenv, int bank, uint64_t status,
+                        uint64_t mcg_status, uint64_t addr, uint64_t misc,
+                        int broadcast);
+
 #endif /* CPU_I386_H */
diff --git a/target-i386/helper.c b/target-i386/helper.c
index f41416f..ba3bed9 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -28,6 +28,9 @@
 #include "qemu-common.h"
 #include "kvm.h"
 #include "kvm_x86.h"
+#ifndef CONFIG_USER_ONLY
+#include "sysemu.h"
+#endif
 
 //#define DEBUG_MMU
 
@@ -1063,11 +1066,9 @@ static void breakpoint_handler(CPUState *env)
         prev_debug_excp_handler(env);
 }
 
-/* This should come from sysemu.h - if we could include it here... */
-void qemu_system_reset_request(void);
-
-static void qemu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
-                        uint64_t mcg_status, uint64_t addr, uint64_t misc)
+static void
+qemu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
+                    uint64_t mcg_status, uint64_t addr, uint64_t misc)
 {
     uint64_t mcg_cap = cenv->mcg_cap;
     uint64_t *banks = cenv->mce_banks;
@@ -1077,15 +1078,17 @@ static void qemu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
      * reporting is disabled
      */
     if ((status & MCI_STATUS_UC) && (mcg_cap & MCG_CTL_P) &&
-        cenv->mcg_ctl != ~(uint64_t)0)
+        cenv->mcg_ctl != ~(uint64_t)0) {
         return;
+    }
     banks += 4 * bank;
     /*
      * if MSR_MCi_CTL is not all 1s, the uncorrected error
      * reporting is disabled for the bank
      */
-    if ((status & MCI_STATUS_UC) && banks[0] != ~(uint64_t)0)
+    if ((status & MCI_STATUS_UC) && banks[0] != ~(uint64_t)0) {
         return;
+    }
     if (status & MCI_STATUS_UC) {
         if ((cenv->mcg_status & MCG_STATUS_MCIP) ||
             !(cenv->cr[4] & CR4_MCE_MASK)) {
@@ -1095,8 +1098,9 @@ static void qemu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
             qemu_system_reset_request();
             return;
         }
-        if (banks[1] & MCI_STATUS_VAL)
+        if (banks[1] & MCI_STATUS_VAL) {
             status |= MCI_STATUS_OVER;
+        }
         banks[2] = addr;
         banks[3] = misc;
         cenv->mcg_status = mcg_status;
@@ -1104,16 +1108,18 @@ static void qemu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
         cpu_interrupt(cenv, CPU_INTERRUPT_MCE);
     } else if (!(banks[1] & MCI_STATUS_VAL)
                || !(banks[1] & MCI_STATUS_UC)) {
-        if (banks[1] & MCI_STATUS_VAL)
+        if (banks[1] & MCI_STATUS_VAL) {
             status |= MCI_STATUS_OVER;
+        }
         banks[2] = addr;
         banks[3] = misc;
         banks[1] = status;
-    } else
+    } else {
         banks[1] |= MCI_STATUS_OVER;
+    }
 }
 
-void cpu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
+void cpu_x86_inject_mce(CPUState *cenv, int bank, uint64_t status,
                         uint64_t mcg_status, uint64_t addr, uint64_t misc,
                         int broadcast)
 {
@@ -1155,15 +1161,16 @@ void cpu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
 
 static void mce_init(CPUX86State *cenv)
 {
-    unsigned int bank, bank_num;
+    unsigned int bank;
 
-    if (((cenv->cpuid_version >> 8)&0xf) >= 6
-        && (cenv->cpuid_features&(CPUID_MCE|CPUID_MCA)) == (CPUID_MCE|CPUID_MCA)) {
+    if (((cenv->cpuid_version >> 8) & 0xf) >= 6
+        && (cenv->cpuid_features & (CPUID_MCE | CPUID_MCA)) ==
+            (CPUID_MCE | CPUID_MCA)) {
         cenv->mcg_cap = MCE_CAP_DEF | MCE_BANKS_DEF;
         cenv->mcg_ctl = ~(uint64_t)0;
-        bank_num = MCE_BANKS_DEF;
-        for (bank = 0; bank < bank_num; bank++)
-            cenv->mce_banks[bank*4] = ~(uint64_t)0;
+        for (bank = 0; bank < MCE_BANKS_DEF; bank++) {
+            cenv->mce_banks[bank * 4] = ~(uint64_t)0;
+        }
     }
 }
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH 04/13] x86: Refine error reporting of MCE injection services
  2011-02-15  8:23 [Qemu-devel] [PATCH 00/13] [uq/master] Patch queue, part IV (MCE edition) Jan Kiszka
                   ` (2 preceding siblings ...)
  2011-02-15  8:23 ` [Qemu-devel] [PATCH 03/13] x86: Small cleanups of MCE helpers Jan Kiszka
@ 2011-02-15  8:23 ` Jan Kiszka
  2011-02-15  8:23 ` [Qemu-devel] [PATCH 05/13] x86: Optionally avoid injecting AO MCEs while others are pending Jan Kiszka
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Jan Kiszka @ 2011-02-15  8:23 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Hidetoshi Seto, Jin Dongming, qemu-devel, kvm, Huang Ying

As this service is used by the human monitor, make sure that errors get
reported to the right channel, and also raise the verbosity.

This requires to move Monitor typedef in qemu-common.h to resolve the
include dependency.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Huang Ying <ying.huang@intel.com>
CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
---
 monitor.c            |    4 +-
 qemu-common.h        |    6 ++--
 target-i386/cpu.h    |    6 ++--
 target-i386/helper.c |   79 +++++++++++++++++++++++++++++---------------------
 4 files changed, 54 insertions(+), 41 deletions(-)

diff --git a/monitor.c b/monitor.c
index 45b0cc2..662df7c 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2712,8 +2712,8 @@ static void do_inject_mce(Monitor *mon, const QDict *qdict)
     int broadcast = qdict_get_try_bool(qdict, "broadcast", 0);
 
     for (cenv = first_cpu; cenv != NULL; cenv = cenv->next_cpu) {
-        if (cenv->cpu_index == cpu_index && cenv->mcg_cap) {
-            cpu_x86_inject_mce(cenv, bank, status, mcg_status, addr, misc,
+        if (cenv->cpu_index == cpu_index) {
+            cpu_x86_inject_mce(mon, cenv, bank, status, mcg_status, addr, misc,
                                broadcast);
             break;
         }
diff --git a/qemu-common.h b/qemu-common.h
index a4d9c21..6ac29cc 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -18,6 +18,9 @@ typedef struct QEMUFile QEMUFile;
 typedef struct QEMUBH QEMUBH;
 typedef struct DeviceState DeviceState;
 
+struct Monitor;
+typedef struct Monitor Monitor;
+
 /* we put basic includes here to avoid repeating them in device drivers */
 #include <stdlib.h>
 #include <stdio.h>
@@ -324,9 +327,6 @@ void qemu_iovec_to_buffer(QEMUIOVector *qiov, void *buf);
 void qemu_iovec_from_buffer(QEMUIOVector *qiov, const void *buf, size_t count);
 void qemu_iovec_memset(QEMUIOVector *qiov, int c, size_t count);
 
-struct Monitor;
-typedef struct Monitor Monitor;
-
 /* Convert a byte between binary and BCD.  */
 static inline uint8_t to_bcd(uint8_t val)
 {
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 52bb48e..486af1d 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -987,8 +987,8 @@ static inline void cpu_get_tb_cpu_state(CPUState *env, target_ulong *pc,
 void do_cpu_init(CPUState *env);
 void do_cpu_sipi(CPUState *env);
 
-void cpu_x86_inject_mce(CPUState *cenv, int bank, uint64_t status,
-                        uint64_t mcg_status, uint64_t addr, uint64_t misc,
-                        int broadcast);
+void cpu_x86_inject_mce(Monitor *mon, CPUState *cenv, int bank,
+                        uint64_t status, uint64_t mcg_status, uint64_t addr,
+                        uint64_t misc, int broadcast);
 
 #endif /* CPU_I386_H */
diff --git a/target-i386/helper.c b/target-i386/helper.c
index ba3bed9..462d332 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -30,6 +30,7 @@
 #include "kvm_x86.h"
 #ifndef CONFIG_USER_ONLY
 #include "sysemu.h"
+#include "monitor.h"
 #endif
 
 //#define DEBUG_MMU
@@ -1067,33 +1068,38 @@ static void breakpoint_handler(CPUState *env)
 }
 
 static void
-qemu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
+qemu_inject_x86_mce(Monitor *mon, CPUState *cenv, int bank, uint64_t status,
                     uint64_t mcg_status, uint64_t addr, uint64_t misc)
 {
     uint64_t mcg_cap = cenv->mcg_cap;
-    uint64_t *banks = cenv->mce_banks;
-
-    /*
-     * if MSR_MCG_CTL is not all 1s, the uncorrected error
-     * reporting is disabled
-     */
-    if ((status & MCI_STATUS_UC) && (mcg_cap & MCG_CTL_P) &&
-        cenv->mcg_ctl != ~(uint64_t)0) {
-        return;
-    }
-    banks += 4 * bank;
-    /*
-     * if MSR_MCi_CTL is not all 1s, the uncorrected error
-     * reporting is disabled for the bank
-     */
-    if ((status & MCI_STATUS_UC) && banks[0] != ~(uint64_t)0) {
-        return;
-    }
+    uint64_t *banks = cenv->mce_banks + 4 * bank;
+
     if (status & MCI_STATUS_UC) {
+        /*
+         * if MSR_MCG_CTL is not all 1s, the uncorrected error
+         * reporting is disabled
+         */
+        if ((mcg_cap & MCG_CTL_P) && cenv->mcg_ctl != ~(uint64_t)0) {
+            monitor_printf(mon,
+                           "CPU %d: Uncorrected error reporting disabled\n",
+                           cenv->cpu_index);
+            return;
+        }
+
+        /*
+         * if MSR_MCi_CTL is not all 1s, the uncorrected error
+         * reporting is disabled for the bank
+         */
+        if (banks[0] != ~(uint64_t)0) {
+            monitor_printf(mon, "CPU %d: Uncorrected error reporting disabled "
+                           "for bank %d\n", cenv->cpu_index, bank);
+            return;
+        }
+
         if ((cenv->mcg_status & MCG_STATUS_MCIP) ||
             !(cenv->cr[4] & CR4_MCE_MASK)) {
-            fprintf(stderr, "injects mce exception while previous "
-                    "one is in progress!\n");
+            monitor_printf(mon, "CPU %d: Previous MCE still in progress, "
+                                "raising triple fault\n", cenv->cpu_index);
             qemu_log_mask(CPU_LOG_RESET, "Triple fault\n");
             qemu_system_reset_request();
             return;
@@ -1119,23 +1125,29 @@ qemu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
     }
 }
 
-void cpu_x86_inject_mce(CPUState *cenv, int bank, uint64_t status,
-                        uint64_t mcg_status, uint64_t addr, uint64_t misc,
-                        int broadcast)
+void cpu_x86_inject_mce(Monitor *mon, CPUState *cenv, int bank,
+                        uint64_t status, uint64_t mcg_status, uint64_t addr,
+                        uint64_t misc, int broadcast)
 {
     unsigned bank_num = cenv->mcg_cap & 0xff;
     CPUState *env;
     int flag = 0;
 
-    if (bank >= bank_num || !(status & MCI_STATUS_VAL)) {
+    if (!cenv->mcg_cap) {
+        monitor_printf(mon, "MCE injection not supported\n");
         return;
     }
-
-    if (broadcast) {
-        if (!cpu_x86_support_mca_broadcast(cenv)) {
-            fprintf(stderr, "Current CPU does not support broadcast\n");
-            return;
-        }
+    if (bank >= bank_num) {
+        monitor_printf(mon, "Invalid MCE bank number\n");
+        return;
+    }
+    if (!(status & MCI_STATUS_VAL)) {
+        monitor_printf(mon, "Invalid MCE status code\n");
+        return;
+    }
+    if (broadcast && !cpu_x86_support_mca_broadcast(cenv)) {
+        monitor_printf(mon, "Guest CPU does not support MCA broadcast\n");
+        return;
     }
 
     if (kvm_enabled()) {
@@ -1145,13 +1157,14 @@ void cpu_x86_inject_mce(CPUState *cenv, int bank, uint64_t status,
 
         kvm_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc, flag);
     } else {
-        qemu_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc);
+        qemu_inject_x86_mce(mon, cenv, bank, status, mcg_status, addr, misc);
         if (broadcast) {
             for (env = first_cpu; env != NULL; env = env->next_cpu) {
                 if (cenv == env) {
                     continue;
                 }
-                qemu_inject_x86_mce(env, 1, MCI_STATUS_VAL | MCI_STATUS_UC,
+                qemu_inject_x86_mce(mon, env, 1,
+                                    MCI_STATUS_VAL | MCI_STATUS_UC,
                                     MCG_STATUS_MCIP | MCG_STATUS_RIPV, 0, 0);
             }
         }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH 05/13] x86: Optionally avoid injecting AO MCEs while others are pending
  2011-02-15  8:23 [Qemu-devel] [PATCH 00/13] [uq/master] Patch queue, part IV (MCE edition) Jan Kiszka
                   ` (3 preceding siblings ...)
  2011-02-15  8:23 ` [Qemu-devel] [PATCH 04/13] x86: Refine error reporting of MCE injection services Jan Kiszka
@ 2011-02-15  8:23 ` Jan Kiszka
  2011-02-15  8:23 ` [Qemu-devel] [PATCH 06/13] Synchronize VCPU states before reset Jan Kiszka
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Jan Kiszka @ 2011-02-15  8:23 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Hidetoshi Seto, Jin Dongming, qemu-devel, kvm, Huang Ying

Allow to tell cpu_x86_inject_mce that it should ignore Action Optional
MCE events when the target VCPU is still processing another one. This
will be used by KVM soon.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Huang Ying <ying.huang@intel.com>
CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
---
 monitor.c            |    7 +++++--
 target-i386/cpu.h    |    5 ++++-
 target-i386/helper.c |   26 +++++++++++++++++++-------
 3 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/monitor.c b/monitor.c
index 662df7c..ae20927 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2709,12 +2709,15 @@ static void do_inject_mce(Monitor *mon, const QDict *qdict)
     uint64_t mcg_status = qdict_get_int(qdict, "mcg_status");
     uint64_t addr = qdict_get_int(qdict, "addr");
     uint64_t misc = qdict_get_int(qdict, "misc");
-    int broadcast = qdict_get_try_bool(qdict, "broadcast", 0);
+    int flags = MCE_INJECT_UNCOND_AO;
 
+    if (qdict_get_try_bool(qdict, "broadcast", 0)) {
+        flags |= MCE_INJECT_BROADCAST;
+    }
     for (cenv = first_cpu; cenv != NULL; cenv = cenv->next_cpu) {
         if (cenv->cpu_index == cpu_index) {
             cpu_x86_inject_mce(mon, cenv, bank, status, mcg_status, addr, misc,
-                               broadcast);
+                               flags);
             break;
         }
     }
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 486af1d..d0eae75 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -987,8 +987,11 @@ static inline void cpu_get_tb_cpu_state(CPUState *env, target_ulong *pc,
 void do_cpu_init(CPUState *env);
 void do_cpu_sipi(CPUState *env);
 
+#define MCE_INJECT_BROADCAST    1
+#define MCE_INJECT_UNCOND_AO    2
+
 void cpu_x86_inject_mce(Monitor *mon, CPUState *cenv, int bank,
                         uint64_t status, uint64_t mcg_status, uint64_t addr,
-                        uint64_t misc, int broadcast);
+                        uint64_t misc, int flags);
 
 #endif /* CPU_I386_H */
diff --git a/target-i386/helper.c b/target-i386/helper.c
index 462d332..e3ef40c 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -1069,11 +1069,20 @@ static void breakpoint_handler(CPUState *env)
 
 static void
 qemu_inject_x86_mce(Monitor *mon, CPUState *cenv, int bank, uint64_t status,
-                    uint64_t mcg_status, uint64_t addr, uint64_t misc)
+                    uint64_t mcg_status, uint64_t addr, uint64_t misc,
+                    int flags)
 {
     uint64_t mcg_cap = cenv->mcg_cap;
     uint64_t *banks = cenv->mce_banks + 4 * bank;
 
+    /*
+     * If there is an MCE exception being processed, ignore this SRAO MCE
+     * unless unconditional injection was requested.
+     */
+    if (!(flags & MCE_INJECT_UNCOND_AO) && !(status & MCI_STATUS_AR)
+        && (cenv->mcg_status & MCG_STATUS_MCIP)) {
+        return;
+    }
     if (status & MCI_STATUS_UC) {
         /*
          * if MSR_MCG_CTL is not all 1s, the uncorrected error
@@ -1127,7 +1136,7 @@ qemu_inject_x86_mce(Monitor *mon, CPUState *cenv, int bank, uint64_t status,
 
 void cpu_x86_inject_mce(Monitor *mon, CPUState *cenv, int bank,
                         uint64_t status, uint64_t mcg_status, uint64_t addr,
-                        uint64_t misc, int broadcast)
+                        uint64_t misc, int flags)
 {
     unsigned bank_num = cenv->mcg_cap & 0xff;
     CPUState *env;
@@ -1145,27 +1154,30 @@ void cpu_x86_inject_mce(Monitor *mon, CPUState *cenv, int bank,
         monitor_printf(mon, "Invalid MCE status code\n");
         return;
     }
-    if (broadcast && !cpu_x86_support_mca_broadcast(cenv)) {
+    if ((flags & MCE_INJECT_BROADCAST)
+        && !cpu_x86_support_mca_broadcast(cenv)) {
         monitor_printf(mon, "Guest CPU does not support MCA broadcast\n");
         return;
     }
 
     if (kvm_enabled()) {
-        if (broadcast) {
+        if (flags & MCE_INJECT_BROADCAST) {
             flag |= MCE_BROADCAST;
         }
 
         kvm_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc, flag);
     } else {
-        qemu_inject_x86_mce(mon, cenv, bank, status, mcg_status, addr, misc);
-        if (broadcast) {
+        qemu_inject_x86_mce(mon, cenv, bank, status, mcg_status, addr, misc,
+                            flags);
+        if (flags & MCE_INJECT_BROADCAST) {
             for (env = first_cpu; env != NULL; env = env->next_cpu) {
                 if (cenv == env) {
                     continue;
                 }
                 qemu_inject_x86_mce(mon, env, 1,
                                     MCI_STATUS_VAL | MCI_STATUS_UC,
-                                    MCG_STATUS_MCIP | MCG_STATUS_RIPV, 0, 0);
+                                    MCG_STATUS_MCIP | MCG_STATUS_RIPV, 0, 0,
+                                    flags);
             }
         }
     }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH 06/13] Synchronize VCPU states before reset
  2011-02-15  8:23 [Qemu-devel] [PATCH 00/13] [uq/master] Patch queue, part IV (MCE edition) Jan Kiszka
                   ` (4 preceding siblings ...)
  2011-02-15  8:23 ` [Qemu-devel] [PATCH 05/13] x86: Optionally avoid injecting AO MCEs while others are pending Jan Kiszka
@ 2011-02-15  8:23 ` Jan Kiszka
  2011-02-15  8:23 ` [Qemu-devel] [PATCH 07/13] kvm: x86: Move MCE functions together Jan Kiszka
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Jan Kiszka @ 2011-02-15  8:23 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: qemu-devel, kvm

This is required to support keeping VCPU states across a system reset.
If we do not read the current state before the reset,
cpu_synchronize_all_post_reset may write back incorrect state
information.

The first user of this will be MCE MSR synchronization which currently
works around the missing cpu_synchronize_all_states.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 vl.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/vl.c b/vl.c
index b436952..7751843 100644
--- a/vl.c
+++ b/vl.c
@@ -1452,6 +1452,7 @@ static void main_loop(void)
         }
         if (qemu_reset_requested()) {
             pause_all_vcpus();
+            cpu_synchronize_all_states();
             qemu_system_reset();
             resume_all_vcpus();
         }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH 07/13] kvm: x86: Move MCE functions together
  2011-02-15  8:23 [Qemu-devel] [PATCH 00/13] [uq/master] Patch queue, part IV (MCE edition) Jan Kiszka
                   ` (5 preceding siblings ...)
  2011-02-15  8:23 ` [Qemu-devel] [PATCH 06/13] Synchronize VCPU states before reset Jan Kiszka
@ 2011-02-15  8:23 ` Jan Kiszka
  2011-02-15  8:23 ` [Qemu-devel] [PATCH 08/13] kvm: x86: Inject pending MCE events on state writeback Jan Kiszka
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Jan Kiszka @ 2011-02-15  8:23 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Hidetoshi Seto, Jin Dongming, qemu-devel, kvm, Huang Ying

Pure function suffling to avoid multiple #ifdef KVM_CAP_MCE sections,
no functional changes. While at it, annotate some #ifdef sections.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Huang Ying <ying.huang@intel.com>
CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
---
 target-i386/kvm.c |  346 ++++++++++++++++++++++++++---------------------------
 1 files changed, 171 insertions(+), 175 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 0aa0a41..f909661 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -172,7 +172,7 @@ static int get_para_features(CPUState *env)
 #endif
     return features;
 }
-#endif
+#endif /* CONFIG_KVM_PARA */
 
 #ifdef KVM_CAP_MCE
 static int kvm_get_mce_cap_supported(KVMState *s, uint64_t *mce_cap,
@@ -273,8 +273,174 @@ static void kvm_inject_x86_mce_on(CPUState *env, struct kvm_x86_mce *mce,
     run_on_cpu(env, kvm_do_inject_x86_mce, &data);
 }
 
-static void kvm_mce_broadcast_rest(CPUState *env);
-#endif
+static void kvm_mce_broadcast_rest(CPUState *env)
+{
+    struct kvm_x86_mce mce = {
+        .bank = 1,
+        .status = MCI_STATUS_VAL | MCI_STATUS_UC,
+        .mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV,
+        .addr = 0,
+        .misc = 0,
+    };
+    CPUState *cenv;
+
+    /* Broadcast MCA signal for processor version 06H_EH and above */
+    if (cpu_x86_support_mca_broadcast(env)) {
+        for (cenv = first_cpu; cenv != NULL; cenv = cenv->next_cpu) {
+            if (cenv == env) {
+                continue;
+            }
+            kvm_inject_x86_mce_on(cenv, &mce, ABORT_ON_ERROR);
+        }
+    }
+}
+
+static void kvm_mce_inj_srar_dataload(CPUState *env, target_phys_addr_t paddr)
+{
+    struct kvm_x86_mce mce = {
+        .bank = 9,
+        .status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
+                  | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
+                  | MCI_STATUS_AR | 0x134,
+        .mcg_status = MCG_STATUS_MCIP | MCG_STATUS_EIPV,
+        .addr = paddr,
+        .misc = (MCM_ADDR_PHYS << 6) | 0xc,
+    };
+    int r;
+
+    r = kvm_set_mce(env, &mce);
+    if (r < 0) {
+        fprintf(stderr, "kvm_set_mce: %s\n", strerror(errno));
+        abort();
+    }
+    kvm_mce_broadcast_rest(env);
+}
+
+static void kvm_mce_inj_srao_memscrub(CPUState *env, target_phys_addr_t paddr)
+{
+    struct kvm_x86_mce mce = {
+        .bank = 9,
+        .status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
+                  | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
+                  | 0xc0,
+        .mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV,
+        .addr = paddr,
+        .misc = (MCM_ADDR_PHYS << 6) | 0xc,
+    };
+    int r;
+
+    r = kvm_set_mce(env, &mce);
+    if (r < 0) {
+        fprintf(stderr, "kvm_set_mce: %s\n", strerror(errno));
+        abort();
+    }
+    kvm_mce_broadcast_rest(env);
+}
+
+static void kvm_mce_inj_srao_memscrub2(CPUState *env, target_phys_addr_t paddr)
+{
+    struct kvm_x86_mce mce = {
+        .bank = 9,
+        .status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
+                  | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
+                  | 0xc0,
+        .mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV,
+        .addr = paddr,
+        .misc = (MCM_ADDR_PHYS << 6) | 0xc,
+    };
+
+    kvm_inject_x86_mce_on(env, &mce, ABORT_ON_ERROR);
+    kvm_mce_broadcast_rest(env);
+}
+#endif /* KVM_CAP_MCE */
+
+static void hardware_memory_error(void)
+{
+    fprintf(stderr, "Hardware memory error!\n");
+    exit(1);
+}
+
+int kvm_arch_on_sigbus_vcpu(CPUState *env, int code, void *addr)
+{
+#ifdef KVM_CAP_MCE
+    void *vaddr;
+    ram_addr_t ram_addr;
+    target_phys_addr_t paddr;
+
+    if ((env->mcg_cap & MCG_SER_P) && addr
+        && (code == BUS_MCEERR_AR
+            || code == BUS_MCEERR_AO)) {
+        vaddr = (void *)addr;
+        if (qemu_ram_addr_from_host(vaddr, &ram_addr) ||
+            !kvm_physical_memory_addr_from_ram(env->kvm_state, ram_addr, &paddr)) {
+            fprintf(stderr, "Hardware memory error for memory used by "
+                    "QEMU itself instead of guest system!\n");
+            /* Hope we are lucky for AO MCE */
+            if (code == BUS_MCEERR_AO) {
+                return 0;
+            } else {
+                hardware_memory_error();
+            }
+        }
+
+        if (code == BUS_MCEERR_AR) {
+            /* Fake an Intel architectural Data Load SRAR UCR */
+            kvm_mce_inj_srar_dataload(env, paddr);
+        } else {
+            /*
+             * If there is an MCE excpetion being processed, ignore
+             * this SRAO MCE
+             */
+            if (!kvm_mce_in_progress(env)) {
+                /* Fake an Intel architectural Memory scrubbing UCR */
+                kvm_mce_inj_srao_memscrub(env, paddr);
+            }
+        }
+    } else
+#endif /* KVM_CAP_MCE */
+    {
+        if (code == BUS_MCEERR_AO) {
+            return 0;
+        } else if (code == BUS_MCEERR_AR) {
+            hardware_memory_error();
+        } else {
+            return 1;
+        }
+    }
+    return 0;
+}
+
+int kvm_arch_on_sigbus(int code, void *addr)
+{
+#ifdef KVM_CAP_MCE
+    if ((first_cpu->mcg_cap & MCG_SER_P) && addr && code == BUS_MCEERR_AO) {
+        void *vaddr;
+        ram_addr_t ram_addr;
+        target_phys_addr_t paddr;
+
+        /* Hope we are lucky for AO MCE */
+        vaddr = addr;
+        if (qemu_ram_addr_from_host(vaddr, &ram_addr) ||
+            !kvm_physical_memory_addr_from_ram(first_cpu->kvm_state, ram_addr,
+                                               &paddr)) {
+            fprintf(stderr, "Hardware memory error for memory used by "
+                    "QEMU itself instead of guest system!: %p\n", addr);
+            return 0;
+        }
+        kvm_mce_inj_srao_memscrub2(first_cpu, paddr);
+    } else
+#endif /* KVM_CAP_MCE */
+    {
+        if (code == BUS_MCEERR_AO) {
+            return 0;
+        } else if (code == BUS_MCEERR_AR) {
+            hardware_memory_error();
+        } else {
+            return 1;
+        }
+    }
+    return 0;
+}
 
 void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
                         uint64_t mcg_status, uint64_t addr, uint64_t misc,
@@ -294,11 +460,11 @@ void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
     }
 
     kvm_inject_x86_mce_on(cenv, &mce, flag);
-#else
+#else /* !KVM_CAP_MCE*/
     if (flag & ABORT_ON_ERROR) {
         abort();
     }
-#endif
+#endif /* !KVM_CAP_MCE*/
 }
 
 static void cpu_update_state(void *opaque, int running, int reason)
@@ -1783,173 +1949,3 @@ bool kvm_arch_stop_on_emulation_error(CPUState *env)
     return !(env->cr[0] & CR0_PE_MASK) ||
            ((env->segs[R_CS].selector  & 3) != 3);
 }
-
-static void hardware_memory_error(void)
-{
-    fprintf(stderr, "Hardware memory error!\n");
-    exit(1);
-}
-
-#ifdef KVM_CAP_MCE
-static void kvm_mce_broadcast_rest(CPUState *env)
-{
-    struct kvm_x86_mce mce = {
-        .bank = 1,
-        .status = MCI_STATUS_VAL | MCI_STATUS_UC,
-        .mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV,
-        .addr = 0,
-        .misc = 0,
-    };
-    CPUState *cenv;
-
-    /* Broadcast MCA signal for processor version 06H_EH and above */
-    if (cpu_x86_support_mca_broadcast(env)) {
-        for (cenv = first_cpu; cenv != NULL; cenv = cenv->next_cpu) {
-            if (cenv == env) {
-                continue;
-            }
-            kvm_inject_x86_mce_on(cenv, &mce, ABORT_ON_ERROR);
-        }
-    }
-}
-
-static void kvm_mce_inj_srar_dataload(CPUState *env, target_phys_addr_t paddr)
-{
-    struct kvm_x86_mce mce = {
-        .bank = 9,
-        .status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
-                  | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
-                  | MCI_STATUS_AR | 0x134,
-        .mcg_status = MCG_STATUS_MCIP | MCG_STATUS_EIPV,
-        .addr = paddr,
-        .misc = (MCM_ADDR_PHYS << 6) | 0xc,
-    };
-    int r;
-
-    r = kvm_set_mce(env, &mce);
-    if (r < 0) {
-        fprintf(stderr, "kvm_set_mce: %s\n", strerror(errno));
-        abort();
-    }
-    kvm_mce_broadcast_rest(env);
-}
-
-static void kvm_mce_inj_srao_memscrub(CPUState *env, target_phys_addr_t paddr)
-{
-    struct kvm_x86_mce mce = {
-        .bank = 9,
-        .status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
-                  | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
-                  | 0xc0,
-        .mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV,
-        .addr = paddr,
-        .misc = (MCM_ADDR_PHYS << 6) | 0xc,
-    };
-    int r;
-
-    r = kvm_set_mce(env, &mce);
-    if (r < 0) {
-        fprintf(stderr, "kvm_set_mce: %s\n", strerror(errno));
-        abort();
-    }
-    kvm_mce_broadcast_rest(env);
-}
-
-static void kvm_mce_inj_srao_memscrub2(CPUState *env, target_phys_addr_t paddr)
-{
-    struct kvm_x86_mce mce = {
-        .bank = 9,
-        .status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
-                  | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
-                  | 0xc0,
-        .mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV,
-        .addr = paddr,
-        .misc = (MCM_ADDR_PHYS << 6) | 0xc,
-    };
-
-    kvm_inject_x86_mce_on(env, &mce, ABORT_ON_ERROR);
-    kvm_mce_broadcast_rest(env);
-}
-
-#endif
-
-int kvm_arch_on_sigbus_vcpu(CPUState *env, int code, void *addr)
-{
-#if defined(KVM_CAP_MCE)
-    void *vaddr;
-    ram_addr_t ram_addr;
-    target_phys_addr_t paddr;
-
-    if ((env->mcg_cap & MCG_SER_P) && addr
-        && (code == BUS_MCEERR_AR
-            || code == BUS_MCEERR_AO)) {
-        vaddr = (void *)addr;
-        if (qemu_ram_addr_from_host(vaddr, &ram_addr) ||
-            !kvm_physical_memory_addr_from_ram(env->kvm_state, ram_addr, &paddr)) {
-            fprintf(stderr, "Hardware memory error for memory used by "
-                    "QEMU itself instead of guest system!\n");
-            /* Hope we are lucky for AO MCE */
-            if (code == BUS_MCEERR_AO) {
-                return 0;
-            } else {
-                hardware_memory_error();
-            }
-        }
-
-        if (code == BUS_MCEERR_AR) {
-            /* Fake an Intel architectural Data Load SRAR UCR */
-            kvm_mce_inj_srar_dataload(env, paddr);
-        } else {
-            /*
-             * If there is an MCE excpetion being processed, ignore
-             * this SRAO MCE
-             */
-            if (!kvm_mce_in_progress(env)) {
-                /* Fake an Intel architectural Memory scrubbing UCR */
-                kvm_mce_inj_srao_memscrub(env, paddr);
-            }
-        }
-    } else
-#endif
-    {
-        if (code == BUS_MCEERR_AO) {
-            return 0;
-        } else if (code == BUS_MCEERR_AR) {
-            hardware_memory_error();
-        } else {
-            return 1;
-        }
-    }
-    return 0;
-}
-
-int kvm_arch_on_sigbus(int code, void *addr)
-{
-#if defined(KVM_CAP_MCE)
-    if ((first_cpu->mcg_cap & MCG_SER_P) && addr && code == BUS_MCEERR_AO) {
-        void *vaddr;
-        ram_addr_t ram_addr;
-        target_phys_addr_t paddr;
-
-        /* Hope we are lucky for AO MCE */
-        vaddr = addr;
-        if (qemu_ram_addr_from_host(vaddr, &ram_addr) ||
-            !kvm_physical_memory_addr_from_ram(first_cpu->kvm_state, ram_addr, &paddr)) {
-            fprintf(stderr, "Hardware memory error for memory used by "
-                    "QEMU itself instead of guest system!: %p\n", addr);
-            return 0;
-        }
-        kvm_mce_inj_srao_memscrub2(first_cpu, paddr);
-    } else
-#endif
-    {
-        if (code == BUS_MCEERR_AO) {
-            return 0;
-        } else if (code == BUS_MCEERR_AR) {
-            hardware_memory_error();
-        } else {
-            return 1;
-        }
-    }
-    return 0;
-}
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH 08/13] kvm: x86: Inject pending MCE events on state writeback
  2011-02-15  8:23 [Qemu-devel] [PATCH 00/13] [uq/master] Patch queue, part IV (MCE edition) Jan Kiszka
                   ` (6 preceding siblings ...)
  2011-02-15  8:23 ` [Qemu-devel] [PATCH 07/13] kvm: x86: Move MCE functions together Jan Kiszka
@ 2011-02-15  8:23 ` Jan Kiszka
  2011-02-17 16:35   ` [Qemu-devel] " Marcelo Tosatti
  2011-02-15  8:23 ` [Qemu-devel] [PATCH 09/13] kvm: x86: Consolidate TCG and KVM MCE injection code Jan Kiszka
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 21+ messages in thread
From: Jan Kiszka @ 2011-02-15  8:23 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Hidetoshi Seto, Jin Dongming, qemu-devel, kvm, Huang Ying

The current way of injecting MCE events without updating of and
synchronizing with the CPUState is broken and causes spurious
corruptions of the MCE-related parts of the CPUState.

As a first step towards a fix, enhance the state writeback code with
support for injecting events that are pending in the CPUState. A pending
exception will then be signaled via cpu_interrupt(CPU_INTERRUPT_MCE).
And, just like for TCG, we need to leave the halt state when
CPU_INTERRUPT_MCE is pending (left broken for the to-be-removed old KVM
code).

This will also allow to unify TCG and KVM injection code.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Huang Ying <ying.huang@intel.com>
CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
---
 target-i386/kvm.c |   75 +++++++++++++++++++++++++++++++++++++++++++++++++---
 1 files changed, 70 insertions(+), 5 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index f909661..46f45db 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -467,6 +467,44 @@ void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
 #endif /* !KVM_CAP_MCE*/
 }
 
+static int kvm_inject_mce_oldstyle(CPUState *env)
+{
+#ifdef KVM_CAP_MCE
+    if (kvm_has_vcpu_events()) {
+        return 0;
+    }
+    if (env->interrupt_request & CPU_INTERRUPT_MCE) {
+        unsigned int bank, bank_num = env->mcg_cap & 0xff;
+        struct kvm_x86_mce mce;
+
+        /* We must not raise CPU_INTERRUPT_MCE if it's not supported. */
+        assert(env->mcg_cap);
+
+        env->interrupt_request &= ~CPU_INTERRUPT_MCE;
+
+        /*
+         * There must be at least one bank in use if CPU_INTERRUPT_MCE was set.
+         * Find it and use its values for the event injection.
+         */
+        for (bank = 0; bank < bank_num; bank++) {
+            if (env->mce_banks[bank * 4 + 1] & MCI_STATUS_VAL) {
+                break;
+            }
+        }
+        assert(bank < bank_num);
+
+        mce.bank = bank;
+        mce.status = env->mce_banks[bank * 4 + 1];
+        mce.mcg_status = env->mcg_status;
+        mce.addr = env->mce_banks[bank * 4 + 2];
+        mce.misc = env->mce_banks[bank * 4 + 3];
+
+        return kvm_vcpu_ioctl(env, KVM_X86_SET_MCE, &mce);
+    }
+#endif /* KVM_CAP_MCE */
+    return 0;
+}
+
 static void cpu_update_state(void *opaque, int running, int reason)
 {
     CPUState *env = opaque;
@@ -1375,10 +1413,25 @@ static int kvm_put_vcpu_events(CPUState *env, int level)
         return 0;
     }
 
-    events.exception.injected = (env->exception_injected >= 0);
-    events.exception.nr = env->exception_injected;
-    events.exception.has_error_code = env->has_error_code;
-    events.exception.error_code = env->error_code;
+    if (env->interrupt_request & CPU_INTERRUPT_MCE) {
+        /* We must not raise CPU_INTERRUPT_MCE if it's not supported. */
+        assert(env->mcg_cap);
+
+        env->interrupt_request &= ~CPU_INTERRUPT_MCE;
+        if (env->exception_injected == EXCP08_DBLE) {
+            /* this means triple fault */
+            qemu_system_reset_request();
+            env->exit_request = 1;
+        }
+        events.exception.injected = 1;
+        events.exception.nr = EXCP12_MCHK;
+        events.exception.has_error_code = 0;
+    } else {
+        events.exception.injected = (env->exception_injected >= 0);
+        events.exception.nr = env->exception_injected;
+        events.exception.has_error_code = env->has_error_code;
+        events.exception.error_code = env->error_code;
+    }
 
     events.interrupt.injected = (env->interrupt_injected >= 0);
     events.interrupt.nr = env->interrupt_injected;
@@ -1539,6 +1592,11 @@ int kvm_arch_put_registers(CPUState *env, int level)
     if (ret < 0) {
         return ret;
     }
+    /* must be before kvm_put_msrs */
+    ret = kvm_inject_mce_oldstyle(env);
+    if (ret < 0) {
+        return ret;
+    }
     ret = kvm_put_msrs(env, level);
     if (ret < 0) {
         return ret;
@@ -1678,10 +1736,17 @@ void kvm_arch_post_run(CPUState *env, struct kvm_run *run)
 int kvm_arch_process_irqchip_events(CPUState *env)
 {
     if (kvm_irqchip_in_kernel()) {
+        if (env->interrupt_request & CPU_INTERRUPT_MCE) {
+            kvm_cpu_synchronize_state(env);
+            if (env->mp_state == KVM_MP_STATE_HALTED) {
+                env->mp_state = KVM_MP_STATE_RUNNABLE;
+            }
+        }
         return 0;
     }
 
-    if (env->interrupt_request & (CPU_INTERRUPT_HARD | CPU_INTERRUPT_NMI)) {
+    if (env->interrupt_request &
+        (CPU_INTERRUPT_HARD | CPU_INTERRUPT_NMI | CPU_INTERRUPT_MCE)) {
         env->halted = 0;
     }
     if (env->interrupt_request & CPU_INTERRUPT_INIT) {
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH 09/13] kvm: x86: Consolidate TCG and KVM MCE injection code
  2011-02-15  8:23 [Qemu-devel] [PATCH 00/13] [uq/master] Patch queue, part IV (MCE edition) Jan Kiszka
                   ` (7 preceding siblings ...)
  2011-02-15  8:23 ` [Qemu-devel] [PATCH 08/13] kvm: x86: Inject pending MCE events on state writeback Jan Kiszka
@ 2011-02-15  8:23 ` Jan Kiszka
  2011-02-17 18:08   ` [Qemu-devel] " Marcelo Tosatti
  2011-02-15  8:23 ` [Qemu-devel] [PATCH 10/13] kvm: x86: Clean up kvm_setup_mce Jan Kiszka
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 21+ messages in thread
From: Jan Kiszka @ 2011-02-15  8:23 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Hidetoshi Seto, Jin Dongming, qemu-devel, kvm, Huang Ying

This switches KVM's MCE injection path to cpu_x86_inject_mce, both for
SIGBUS and monitor initiated events. This means we prepare the MCA MSRs
in the VCPUState also for KVM.

We have to drop the MSRs writeback restrictions for this purpose which
is now safe as every uncoordinated MSR injection is removed with this
patch.

We have to execute every per-VCPU event setup on the target VCPU as we
are performing a read-modify-write on its state.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Huang Ying <ying.huang@intel.com>
CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
---
 target-i386/helper.c  |  105 ++++++++++++--------
 target-i386/kvm.c     |  256 +++++++------------------------------------------
 target-i386/kvm_x86.h |   25 -----
 3 files changed, 96 insertions(+), 290 deletions(-)
 delete mode 100644 target-i386/kvm_x86.h

diff --git a/target-i386/helper.c b/target-i386/helper.c
index e3ef40c..a08309f 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -27,7 +27,6 @@
 #include "exec-all.h"
 #include "qemu-common.h"
 #include "kvm.h"
-#include "kvm_x86.h"
 #ifndef CONFIG_USER_ONLY
 #include "sysemu.h"
 #include "monitor.h"
@@ -1067,29 +1066,42 @@ static void breakpoint_handler(CPUState *env)
         prev_debug_excp_handler(env);
 }
 
-static void
-qemu_inject_x86_mce(Monitor *mon, CPUState *cenv, int bank, uint64_t status,
-                    uint64_t mcg_status, uint64_t addr, uint64_t misc,
-                    int flags)
+typedef struct MCEInjectionParams {
+    Monitor *mon;
+    CPUState *env;
+    int bank;
+    uint64_t status;
+    uint64_t mcg_status;
+    uint64_t addr;
+    uint64_t misc;
+    int flags;
+} MCEInjectionParams;
+
+static void do_inject_x86_mce(void *data)
 {
-    uint64_t mcg_cap = cenv->mcg_cap;
-    uint64_t *banks = cenv->mce_banks + 4 * bank;
+    MCEInjectionParams *params = data;
+    CPUState *cenv = params->env;
+    uint64_t *banks = cenv->mce_banks + 4 * params->bank;
+
+    cpu_synchronize_state(cenv);
 
     /*
      * If there is an MCE exception being processed, ignore this SRAO MCE
      * unless unconditional injection was requested.
      */
-    if (!(flags & MCE_INJECT_UNCOND_AO) && !(status & MCI_STATUS_AR)
+    if (!(params->flags & MCE_INJECT_UNCOND_AO)
+        && !(params->status & MCI_STATUS_AR)
         && (cenv->mcg_status & MCG_STATUS_MCIP)) {
         return;
     }
-    if (status & MCI_STATUS_UC) {
+
+    if (params->status & MCI_STATUS_UC) {
         /*
          * if MSR_MCG_CTL is not all 1s, the uncorrected error
          * reporting is disabled
          */
-        if ((mcg_cap & MCG_CTL_P) && cenv->mcg_ctl != ~(uint64_t)0) {
-            monitor_printf(mon,
+        if ((cenv->mcg_cap & MCG_CTL_P) && cenv->mcg_ctl != ~(uint64_t)0) {
+            monitor_printf(params->mon,
                            "CPU %d: Uncorrected error reporting disabled\n",
                            cenv->cpu_index);
             return;
@@ -1100,35 +1112,39 @@ qemu_inject_x86_mce(Monitor *mon, CPUState *cenv, int bank, uint64_t status,
          * reporting is disabled for the bank
          */
         if (banks[0] != ~(uint64_t)0) {
-            monitor_printf(mon, "CPU %d: Uncorrected error reporting disabled "
-                           "for bank %d\n", cenv->cpu_index, bank);
+            monitor_printf(params->mon,
+                           "CPU %d: Uncorrected error reporting disabled for"
+                           " bank %d\n",
+                           cenv->cpu_index, params->bank);
             return;
         }
 
         if ((cenv->mcg_status & MCG_STATUS_MCIP) ||
             !(cenv->cr[4] & CR4_MCE_MASK)) {
-            monitor_printf(mon, "CPU %d: Previous MCE still in progress, "
-                                "raising triple fault\n", cenv->cpu_index);
+            monitor_printf(params->mon,
+                           "CPU %d: Previous MCE still in progress, raising"
+                           " triple fault\n",
+                           cenv->cpu_index);
             qemu_log_mask(CPU_LOG_RESET, "Triple fault\n");
             qemu_system_reset_request();
             return;
         }
         if (banks[1] & MCI_STATUS_VAL) {
-            status |= MCI_STATUS_OVER;
+            params->status |= MCI_STATUS_OVER;
         }
-        banks[2] = addr;
-        banks[3] = misc;
-        cenv->mcg_status = mcg_status;
-        banks[1] = status;
+        banks[2] = params->addr;
+        banks[3] = params->misc;
+        cenv->mcg_status = params->mcg_status;
+        banks[1] = params->status;
         cpu_interrupt(cenv, CPU_INTERRUPT_MCE);
     } else if (!(banks[1] & MCI_STATUS_VAL)
                || !(banks[1] & MCI_STATUS_UC)) {
         if (banks[1] & MCI_STATUS_VAL) {
-            status |= MCI_STATUS_OVER;
+            params->status |= MCI_STATUS_OVER;
         }
-        banks[2] = addr;
-        banks[3] = misc;
-        banks[1] = status;
+        banks[2] = params->addr;
+        banks[3] = params->misc;
+        banks[1] = params->status;
     } else {
         banks[1] |= MCI_STATUS_OVER;
     }
@@ -1138,9 +1154,18 @@ void cpu_x86_inject_mce(Monitor *mon, CPUState *cenv, int bank,
                         uint64_t status, uint64_t mcg_status, uint64_t addr,
                         uint64_t misc, int flags)
 {
+    MCEInjectionParams params = {
+        .mon = mon,
+        .env = cenv,
+        .bank = bank,
+        .status = status,
+        .mcg_status = mcg_status,
+        .addr = addr,
+        .misc = misc,
+        .flags = flags,
+    };
     unsigned bank_num = cenv->mcg_cap & 0xff;
     CPUState *env;
-    int flag = 0;
 
     if (!cenv->mcg_cap) {
         monitor_printf(mon, "MCE injection not supported\n");
@@ -1160,25 +1185,19 @@ void cpu_x86_inject_mce(Monitor *mon, CPUState *cenv, int bank,
         return;
     }
 
-    if (kvm_enabled()) {
-        if (flags & MCE_INJECT_BROADCAST) {
-            flag |= MCE_BROADCAST;
-        }
-
-        kvm_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc, flag);
-    } else {
-        qemu_inject_x86_mce(mon, cenv, bank, status, mcg_status, addr, misc,
-                            flags);
-        if (flags & MCE_INJECT_BROADCAST) {
-            for (env = first_cpu; env != NULL; env = env->next_cpu) {
-                if (cenv == env) {
-                    continue;
-                }
-                qemu_inject_x86_mce(mon, env, 1,
-                                    MCI_STATUS_VAL | MCI_STATUS_UC,
-                                    MCG_STATUS_MCIP | MCG_STATUS_RIPV, 0, 0,
-                                    flags);
+    run_on_cpu(cenv, do_inject_x86_mce, &params);
+    if (flags & MCE_INJECT_BROADCAST) {
+        params.bank = 1;
+        params.status = MCI_STATUS_VAL | MCI_STATUS_UC;
+        params.mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV;
+        params.addr = 0;
+        params.misc = 0;
+        for (env = first_cpu; env != NULL; env = env->next_cpu) {
+            if (cenv == env) {
+                continue;
             }
+            params.env = env;
+            run_on_cpu(cenv, do_inject_x86_mce, &params);
         }
     }
 }
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 46f45db..8be093b 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -28,7 +28,6 @@
 #include "hw/pc.h"
 #include "hw/apic.h"
 #include "ioport.h"
-#include "kvm_x86.h"
 
 #ifdef CONFIG_KVM_PARA
 #include <linux/kvm_para.h>
@@ -193,164 +192,23 @@ static int kvm_setup_mce(CPUState *env, uint64_t *mcg_cap)
     return kvm_vcpu_ioctl(env, KVM_X86_SETUP_MCE, mcg_cap);
 }
 
-static int kvm_set_mce(CPUState *env, struct kvm_x86_mce *m)
+static void kvm_mce_inject(CPUState *env, target_phys_addr_t paddr, int code)
 {
-    return kvm_vcpu_ioctl(env, KVM_X86_SET_MCE, m);
-}
-
-static int kvm_get_msr(CPUState *env, struct kvm_msr_entry *msrs, int n)
-{
-    struct kvm_msrs *kmsrs = qemu_malloc(sizeof *kmsrs + n * sizeof *msrs);
-    int r;
-
-    kmsrs->nmsrs = n;
-    memcpy(kmsrs->entries, msrs, n * sizeof *msrs);
-    r = kvm_vcpu_ioctl(env, KVM_GET_MSRS, kmsrs);
-    memcpy(msrs, kmsrs->entries, n * sizeof *msrs);
-    free(kmsrs);
-    return r;
-}
-
-/* FIXME: kill this and kvm_get_msr, use env->mcg_status instead */
-static int kvm_mce_in_progress(CPUState *env)
-{
-    struct kvm_msr_entry msr_mcg_status = {
-        .index = MSR_MCG_STATUS,
-    };
-    int r;
-
-    r = kvm_get_msr(env, &msr_mcg_status, 1);
-    if (r == -1 || r == 0) {
-        fprintf(stderr, "Failed to get MCE status\n");
-        return 0;
-    }
-    return !!(msr_mcg_status.data & MCG_STATUS_MCIP);
-}
-
-struct kvm_x86_mce_data
-{
-    CPUState *env;
-    struct kvm_x86_mce *mce;
-    int abort_on_error;
-};
-
-static void kvm_do_inject_x86_mce(void *_data)
-{
-    struct kvm_x86_mce_data *data = _data;
-    int r;
-
-    /* If there is an MCE exception being processed, ignore this SRAO MCE */
-    if ((data->env->mcg_cap & MCG_SER_P) &&
-        !(data->mce->status & MCI_STATUS_AR)) {
-        if (kvm_mce_in_progress(data->env)) {
-            return;
-        }
-    }
-
-    r = kvm_set_mce(data->env, data->mce);
-    if (r < 0) {
-        perror("kvm_set_mce FAILED");
-        if (data->abort_on_error) {
-            abort();
-        }
-    }
-}
-
-static void kvm_inject_x86_mce_on(CPUState *env, struct kvm_x86_mce *mce,
-                                  int flag)
-{
-    struct kvm_x86_mce_data data = {
-        .env = env,
-        .mce = mce,
-        .abort_on_error = (flag & ABORT_ON_ERROR),
-    };
-
-    if (!env->mcg_cap) {
-        fprintf(stderr, "MCE support is not enabled!\n");
-        return;
-    }
-
-    run_on_cpu(env, kvm_do_inject_x86_mce, &data);
-}
+    uint64_t status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN |
+                      MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S;
+    uint64_t mcg_status = MCG_STATUS_MCIP;
 
-static void kvm_mce_broadcast_rest(CPUState *env)
-{
-    struct kvm_x86_mce mce = {
-        .bank = 1,
-        .status = MCI_STATUS_VAL | MCI_STATUS_UC,
-        .mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV,
-        .addr = 0,
-        .misc = 0,
-    };
-    CPUState *cenv;
-
-    /* Broadcast MCA signal for processor version 06H_EH and above */
-    if (cpu_x86_support_mca_broadcast(env)) {
-        for (cenv = first_cpu; cenv != NULL; cenv = cenv->next_cpu) {
-            if (cenv == env) {
-                continue;
-            }
-            kvm_inject_x86_mce_on(cenv, &mce, ABORT_ON_ERROR);
-        }
-    }
-}
-
-static void kvm_mce_inj_srar_dataload(CPUState *env, target_phys_addr_t paddr)
-{
-    struct kvm_x86_mce mce = {
-        .bank = 9,
-        .status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
-                  | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
-                  | MCI_STATUS_AR | 0x134,
-        .mcg_status = MCG_STATUS_MCIP | MCG_STATUS_EIPV,
-        .addr = paddr,
-        .misc = (MCM_ADDR_PHYS << 6) | 0xc,
-    };
-    int r;
-
-    r = kvm_set_mce(env, &mce);
-    if (r < 0) {
-        fprintf(stderr, "kvm_set_mce: %s\n", strerror(errno));
-        abort();
-    }
-    kvm_mce_broadcast_rest(env);
-}
-
-static void kvm_mce_inj_srao_memscrub(CPUState *env, target_phys_addr_t paddr)
-{
-    struct kvm_x86_mce mce = {
-        .bank = 9,
-        .status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
-                  | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
-                  | 0xc0,
-        .mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV,
-        .addr = paddr,
-        .misc = (MCM_ADDR_PHYS << 6) | 0xc,
-    };
-    int r;
-
-    r = kvm_set_mce(env, &mce);
-    if (r < 0) {
-        fprintf(stderr, "kvm_set_mce: %s\n", strerror(errno));
-        abort();
+    if (code == BUS_MCEERR_AR) {
+        status |= MCI_STATUS_AR | 0x134;
+        mcg_status |= MCG_STATUS_EIPV;
+    } else {
+        status |= 0xc0;
+        mcg_status |= MCG_STATUS_RIPV;
     }
-    kvm_mce_broadcast_rest(env);
-}
-
-static void kvm_mce_inj_srao_memscrub2(CPUState *env, target_phys_addr_t paddr)
-{
-    struct kvm_x86_mce mce = {
-        .bank = 9,
-        .status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
-                  | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
-                  | 0xc0,
-        .mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV,
-        .addr = paddr,
-        .misc = (MCM_ADDR_PHYS << 6) | 0xc,
-    };
-
-    kvm_inject_x86_mce_on(env, &mce, ABORT_ON_ERROR);
-    kvm_mce_broadcast_rest(env);
+    cpu_x86_inject_mce(NULL, env, 9, status, mcg_status, paddr,
+                       (MCM_ADDR_PHYS << 6) | 0xc,
+                       cpu_x86_support_mca_broadcast(env) ?
+                       MCE_INJECT_BROADCAST : 0);
 }
 #endif /* KVM_CAP_MCE */
 
@@ -360,19 +218,17 @@ static void hardware_memory_error(void)
     exit(1);
 }
 
-int kvm_arch_on_sigbus_vcpu(CPUState *env, int code, void *addr)
+int kvm_arch_on_sigbus_vcpu(CPUState *env, int code, void *hvaddr)
 {
 #ifdef KVM_CAP_MCE
-    void *vaddr;
     ram_addr_t ram_addr;
-    target_phys_addr_t paddr;
-
-    if ((env->mcg_cap & MCG_SER_P) && addr
-        && (code == BUS_MCEERR_AR
-            || code == BUS_MCEERR_AO)) {
-        vaddr = (void *)addr;
-        if (qemu_ram_addr_from_host(vaddr, &ram_addr) ||
-            !kvm_physical_memory_addr_from_ram(env->kvm_state, ram_addr, &paddr)) {
+    target_phys_addr_t gpaddr;
+
+    if ((env->mcg_cap & MCG_SER_P) && hvaddr
+        && (code == BUS_MCEERR_AR || code == BUS_MCEERR_AO)) {
+        if (qemu_ram_addr_from_host(hvaddr, &ram_addr) ||
+            !kvm_physical_memory_addr_from_ram(env->kvm_state, ram_addr,
+                                               &gpaddr)) {
             fprintf(stderr, "Hardware memory error for memory used by "
                     "QEMU itself instead of guest system!\n");
             /* Hope we are lucky for AO MCE */
@@ -382,20 +238,7 @@ int kvm_arch_on_sigbus_vcpu(CPUState *env, int code, void *addr)
                 hardware_memory_error();
             }
         }
-
-        if (code == BUS_MCEERR_AR) {
-            /* Fake an Intel architectural Data Load SRAR UCR */
-            kvm_mce_inj_srar_dataload(env, paddr);
-        } else {
-            /*
-             * If there is an MCE excpetion being processed, ignore
-             * this SRAO MCE
-             */
-            if (!kvm_mce_in_progress(env)) {
-                /* Fake an Intel architectural Memory scrubbing UCR */
-                kvm_mce_inj_srao_memscrub(env, paddr);
-            }
-        }
+        kvm_mce_inject(env, gpaddr, code);
     } else
 #endif /* KVM_CAP_MCE */
     {
@@ -410,24 +253,22 @@ int kvm_arch_on_sigbus_vcpu(CPUState *env, int code, void *addr)
     return 0;
 }
 
-int kvm_arch_on_sigbus(int code, void *addr)
+int kvm_arch_on_sigbus(int code, void *hvaddr)
 {
 #ifdef KVM_CAP_MCE
-    if ((first_cpu->mcg_cap & MCG_SER_P) && addr && code == BUS_MCEERR_AO) {
-        void *vaddr;
+    if ((first_cpu->mcg_cap & MCG_SER_P) && hvaddr && code == BUS_MCEERR_AO) {
         ram_addr_t ram_addr;
-        target_phys_addr_t paddr;
+        target_phys_addr_t gpaddr;
 
         /* Hope we are lucky for AO MCE */
-        vaddr = addr;
-        if (qemu_ram_addr_from_host(vaddr, &ram_addr) ||
+        if (qemu_ram_addr_from_host(hvaddr, &ram_addr) ||
             !kvm_physical_memory_addr_from_ram(first_cpu->kvm_state, ram_addr,
-                                               &paddr)) {
+                                               &gpaddr)) {
             fprintf(stderr, "Hardware memory error for memory used by "
-                    "QEMU itself instead of guest system!: %p\n", addr);
+                    "QEMU itself instead of guest system!: %p\n", hvaddr);
             return 0;
         }
-        kvm_mce_inj_srao_memscrub2(first_cpu, paddr);
+        kvm_mce_inject(first_cpu, gpaddr, code);
     } else
 #endif /* KVM_CAP_MCE */
     {
@@ -442,31 +283,6 @@ int kvm_arch_on_sigbus(int code, void *addr)
     return 0;
 }
 
-void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
-                        uint64_t mcg_status, uint64_t addr, uint64_t misc,
-                        int flag)
-{
-#ifdef KVM_CAP_MCE
-    struct kvm_x86_mce mce = {
-        .bank = bank,
-        .status = status,
-        .mcg_status = mcg_status,
-        .addr = addr,
-        .misc = misc,
-    };
-
-    if (flag & MCE_BROADCAST) {
-        kvm_mce_broadcast_rest(cenv);
-    }
-
-    kvm_inject_x86_mce_on(cenv, &mce, flag);
-#else /* !KVM_CAP_MCE*/
-    if (flag & ABORT_ON_ERROR) {
-        abort();
-    }
-#endif /* !KVM_CAP_MCE*/
-}
-
 static int kvm_inject_mce_oldstyle(CPUState *env)
 {
 #ifdef KVM_CAP_MCE
@@ -1059,14 +875,10 @@ static int kvm_put_msrs(CPUState *env, int level)
     if (env->mcg_cap) {
         int i;
 
-        if (level == KVM_PUT_RESET_STATE) {
-            kvm_msr_entry_set(&msrs[n++], MSR_MCG_STATUS, env->mcg_status);
-        } else if (level == KVM_PUT_FULL_STATE) {
-            kvm_msr_entry_set(&msrs[n++], MSR_MCG_STATUS, env->mcg_status);
-            kvm_msr_entry_set(&msrs[n++], MSR_MCG_CTL, env->mcg_ctl);
-            for (i = 0; i < (env->mcg_cap & 0xff) * 4; i++) {
-                kvm_msr_entry_set(&msrs[n++], MSR_MC0_CTL + i, env->mce_banks[i]);
-            }
+        kvm_msr_entry_set(&msrs[n++], MSR_MCG_STATUS, env->mcg_status);
+        kvm_msr_entry_set(&msrs[n++], MSR_MCG_CTL, env->mcg_ctl);
+        for (i = 0; i < (env->mcg_cap & 0xff) * 4; i++) {
+            kvm_msr_entry_set(&msrs[n++], MSR_MC0_CTL + i, env->mce_banks[i]);
         }
     }
 #endif
diff --git a/target-i386/kvm_x86.h b/target-i386/kvm_x86.h
deleted file mode 100644
index 9d7b584..0000000
--- a/target-i386/kvm_x86.h
+++ /dev/null
@@ -1,25 +0,0 @@
-/*
- * QEMU KVM support
- *
- * Copyright (C) 2009 Red Hat Inc.
- * Copyright IBM, Corp. 2008
- *
- * Authors:
- *  Anthony Liguori   <aliguori@us.ibm.com>
- *
- * This work is licensed under the terms of the GNU GPL, version 2 or later.
- * See the COPYING file in the top-level directory.
- *
- */
-
-#ifndef __KVM_X86_H__
-#define __KVM_X86_H__
-
-#define ABORT_ON_ERROR  0x01
-#define MCE_BROADCAST   0x02
-
-void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
-                        uint64_t mcg_status, uint64_t addr, uint64_t misc,
-                        int flag);
-
-#endif
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH 10/13] kvm: x86: Clean up kvm_setup_mce
  2011-02-15  8:23 [Qemu-devel] [PATCH 00/13] [uq/master] Patch queue, part IV (MCE edition) Jan Kiszka
                   ` (8 preceding siblings ...)
  2011-02-15  8:23 ` [Qemu-devel] [PATCH 09/13] kvm: x86: Consolidate TCG and KVM MCE injection code Jan Kiszka
@ 2011-02-15  8:23 ` Jan Kiszka
  2011-02-15  8:23 ` [Qemu-devel] [PATCH 11/13] kvm: x86: Fail kvm_arch_init_vcpu if MCE initialization fails Jan Kiszka
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Jan Kiszka @ 2011-02-15  8:23 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Hidetoshi Seto, Jin Dongming, qemu-devel, kvm, Huang Ying

There is nothing to abstract here. Fold kvm_setup_mce into its caller
and fix up the error reporting (return code of kvm_vcpu_ioctl holds the
error value).

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Huang Ying <ying.huang@intel.com>
CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
---
 target-i386/kvm.c |   11 ++++-------
 1 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 8be093b..7cdff65 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -187,11 +187,6 @@ static int kvm_get_mce_cap_supported(KVMState *s, uint64_t *mce_cap,
     return -ENOSYS;
 }
 
-static int kvm_setup_mce(CPUState *env, uint64_t *mcg_cap)
-{
-    return kvm_vcpu_ioctl(env, KVM_X86_SETUP_MCE, mcg_cap);
-}
-
 static void kvm_mce_inject(CPUState *env, target_phys_addr_t paddr, int code)
 {
     uint64_t status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN |
@@ -446,6 +441,7 @@ int kvm_arch_init_vcpu(CPUState *env)
         && kvm_check_extension(env->kvm_state, KVM_CAP_MCE) > 0) {
         uint64_t mcg_cap;
         int banks;
+        int ret;
 
         if (kvm_get_mce_cap_supported(env->kvm_state, &mcg_cap, &banks)) {
             perror("kvm_get_mce_cap_supported FAILED");
@@ -454,8 +450,9 @@ int kvm_arch_init_vcpu(CPUState *env)
                 banks = MCE_BANKS_DEF;
             mcg_cap &= MCE_CAP_DEF;
             mcg_cap |= banks;
-            if (kvm_setup_mce(env, &mcg_cap)) {
-                perror("kvm_setup_mce FAILED");
+            ret = kvm_vcpu_ioctl(env, KVM_X86_SETUP_MCE, &mcg_cap);
+            if (ret < 0) {
+                fprintf(stderr, "KVM_X86_SETUP_MCE: %s", strerror(-ret));
             } else {
                 env->mcg_cap = mcg_cap;
             }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH 11/13] kvm: x86: Fail kvm_arch_init_vcpu if MCE initialization fails
  2011-02-15  8:23 [Qemu-devel] [PATCH 00/13] [uq/master] Patch queue, part IV (MCE edition) Jan Kiszka
                   ` (9 preceding siblings ...)
  2011-02-15  8:23 ` [Qemu-devel] [PATCH 10/13] kvm: x86: Clean up kvm_setup_mce Jan Kiszka
@ 2011-02-15  8:23 ` Jan Kiszka
  2011-02-15  8:23 ` [Qemu-devel] [PATCH 12/13] Add qemu_ram_remap Jan Kiszka
  2011-02-15  8:23 ` [Qemu-devel] [PATCH 13/13] KVM, MCE, unpoison memory address across reboot Jan Kiszka
  12 siblings, 0 replies; 21+ messages in thread
From: Jan Kiszka @ 2011-02-15  8:23 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Hidetoshi Seto, Jin Dongming, qemu-devel, kvm, Huang Ying

There is no reason to continue if the kernel claims to support MCE but
then fails to process our request.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Huang Ying <ying.huang@intel.com>
CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
---
 target-i386/kvm.c |   30 +++++++++++++++++-------------
 1 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 7cdff65..8eda78b 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -443,20 +443,24 @@ int kvm_arch_init_vcpu(CPUState *env)
         int banks;
         int ret;
 
-        if (kvm_get_mce_cap_supported(env->kvm_state, &mcg_cap, &banks)) {
-            perror("kvm_get_mce_cap_supported FAILED");
-        } else {
-            if (banks > MCE_BANKS_DEF)
-                banks = MCE_BANKS_DEF;
-            mcg_cap &= MCE_CAP_DEF;
-            mcg_cap |= banks;
-            ret = kvm_vcpu_ioctl(env, KVM_X86_SETUP_MCE, &mcg_cap);
-            if (ret < 0) {
-                fprintf(stderr, "KVM_X86_SETUP_MCE: %s", strerror(-ret));
-            } else {
-                env->mcg_cap = mcg_cap;
-            }
+        ret = kvm_get_mce_cap_supported(env->kvm_state, &mcg_cap, &banks);
+        if (ret < 0) {
+            fprintf(stderr, "kvm_get_mce_cap_supported: %s", strerror(-ret));
+            return ret;
         }
+
+        if (banks > MCE_BANKS_DEF) {
+            banks = MCE_BANKS_DEF;
+        }
+        mcg_cap &= MCE_CAP_DEF;
+        mcg_cap |= banks;
+        ret = kvm_vcpu_ioctl(env, KVM_X86_SETUP_MCE, &mcg_cap);
+        if (ret < 0) {
+            fprintf(stderr, "KVM_X86_SETUP_MCE: %s", strerror(-ret));
+            return ret;
+        }
+
+        env->mcg_cap = mcg_cap;
     }
 #endif
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH 12/13] Add qemu_ram_remap
  2011-02-15  8:23 [Qemu-devel] [PATCH 00/13] [uq/master] Patch queue, part IV (MCE edition) Jan Kiszka
                   ` (10 preceding siblings ...)
  2011-02-15  8:23 ` [Qemu-devel] [PATCH 11/13] kvm: x86: Fail kvm_arch_init_vcpu if MCE initialization fails Jan Kiszka
@ 2011-02-15  8:23 ` Jan Kiszka
  2011-02-15  8:23 ` [Qemu-devel] [PATCH 13/13] KVM, MCE, unpoison memory address across reboot Jan Kiszka
  12 siblings, 0 replies; 21+ messages in thread
From: Jan Kiszka @ 2011-02-15  8:23 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: qemu-devel, kvm, Huang Ying

From: Huang Ying <ying.huang@intel.com>

qemu_ram_remap() unmaps the specified RAM pages, then re-maps these
pages again.  This is used by KVM HWPoison support to clear HWPoisoned
page tables across guest rebooting, so that a new page may be
allocated later to recover the memory error.

[ Jan: style fixlets ]

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 cpu-all.h    |    4 +++
 cpu-common.h |    1 +
 exec.c       |   61 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 65 insertions(+), 1 deletions(-)

diff --git a/cpu-all.h b/cpu-all.h
index caf5e6c..4f4631d 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -863,10 +863,14 @@ target_phys_addr_t cpu_get_phys_page_debug(CPUState *env, target_ulong addr);
 extern int phys_ram_fd;
 extern ram_addr_t ram_size;
 
+/* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */
+#define RAM_PREALLOC_MASK   (1 << 0)
+
 typedef struct RAMBlock {
     uint8_t *host;
     ram_addr_t offset;
     ram_addr_t length;
+    uint32_t flags;
     char idstr[256];
     QLIST_ENTRY(RAMBlock) next;
 #if defined(__linux__) && !defined(TARGET_S390X)
diff --git a/cpu-common.h b/cpu-common.h
index 54d21d4..ef4e8da 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -50,6 +50,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name,
                         ram_addr_t size, void *host);
 ram_addr_t qemu_ram_alloc(DeviceState *dev, const char *name, ram_addr_t size);
 void qemu_ram_free(ram_addr_t addr);
+void qemu_ram_remap(ram_addr_t addr, ram_addr_t length);
 /* This should only be used for ram local to a device.  */
 void *qemu_get_ram_ptr(ram_addr_t addr);
 /* Same but slower, to use for migration, where the order of
diff --git a/exec.c b/exec.c
index d611100..e486458 100644
--- a/exec.c
+++ b/exec.c
@@ -2867,6 +2867,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name,
 
     if (host) {
         new_block->host = host;
+        new_block->flags |= RAM_PREALLOC_MASK;
     } else {
         if (mem_path) {
 #if defined (__linux__) && !defined(TARGET_S390X)
@@ -2920,7 +2921,9 @@ void qemu_ram_free(ram_addr_t addr)
     QLIST_FOREACH(block, &ram_list.blocks, next) {
         if (addr == block->offset) {
             QLIST_REMOVE(block, next);
-            if (mem_path) {
+            if (block->flags & RAM_PREALLOC_MASK) {
+                ;
+            } else if (mem_path) {
 #if defined (__linux__) && !defined(TARGET_S390X)
                 if (block->fd) {
                     munmap(block->host, block->length);
@@ -2943,6 +2946,62 @@ void qemu_ram_free(ram_addr_t addr)
 
 }
 
+void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
+{
+    RAMBlock *block;
+    ram_addr_t offset;
+    int flags;
+    void *area, *vaddr;
+
+    QLIST_FOREACH(block, &ram_list.blocks, next) {
+        offset = addr - block->offset;
+        if (offset < block->length) {
+            vaddr = block->host + offset;
+            if (block->flags & RAM_PREALLOC_MASK) {
+                ;
+            } else {
+                flags = MAP_FIXED;
+                munmap(vaddr, length);
+                if (mem_path) {
+#if defined(__linux__) && !defined(TARGET_S390X)
+                    if (block->fd) {
+#ifdef MAP_POPULATE
+                        flags |= mem_prealloc ? MAP_POPULATE | MAP_SHARED :
+                            MAP_PRIVATE;
+#else
+                        flags |= MAP_PRIVATE;
+#endif
+                        area = mmap(vaddr, length, PROT_READ | PROT_WRITE,
+                                    flags, block->fd, offset);
+                    } else {
+                        flags |= MAP_PRIVATE | MAP_ANONYMOUS;
+                        area = mmap(vaddr, length, PROT_READ | PROT_WRITE,
+                                    flags, -1, 0);
+                    }
+#endif
+                } else {
+#if defined(TARGET_S390X) && defined(CONFIG_KVM)
+                    flags |= MAP_SHARED | MAP_ANONYMOUS;
+                    area = mmap(vaddr, length, PROT_EXEC|PROT_READ|PROT_WRITE,
+                                flags, -1, 0);
+#else
+                    flags |= MAP_PRIVATE | MAP_ANONYMOUS;
+                    area = mmap(vaddr, length, PROT_READ | PROT_WRITE,
+                                flags, -1, 0);
+#endif
+                }
+                if (area != vaddr) {
+                    fprintf(stderr, "Could not remap addr: %lx@%lx\n",
+                            length, addr);
+                    exit(1);
+                }
+                qemu_madvise(vaddr, length, QEMU_MADV_MERGEABLE);
+            }
+            return;
+        }
+    }
+}
+
 /* Return a host pointer to ram allocated with qemu_ram_alloc.
    With the exception of the softmmu code in this file, this should
    only be used for local memory (e.g. video ram) that the device owns,
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH 13/13] KVM, MCE, unpoison memory address across reboot
  2011-02-15  8:23 [Qemu-devel] [PATCH 00/13] [uq/master] Patch queue, part IV (MCE edition) Jan Kiszka
                   ` (11 preceding siblings ...)
  2011-02-15  8:23 ` [Qemu-devel] [PATCH 12/13] Add qemu_ram_remap Jan Kiszka
@ 2011-02-15  8:23 ` Jan Kiszka
  12 siblings, 0 replies; 21+ messages in thread
From: Jan Kiszka @ 2011-02-15  8:23 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: qemu-devel, kvm, Huang Ying

From: Huang Ying <ying.huang@intel.com>

In Linux kernel HWPoison processing implementation, the virtual
address in processes mapping the error physical memory page is marked
as HWPoison.  So that, the further accessing to the virtual
address will kill corresponding processes with SIGBUS.

If the error physical memory page is used by a KVM guest, the SIGBUS
will be sent to QEMU, and QEMU will simulate a MCE to report that
memory error to the guest OS.  If the guest OS can not recover from
the error (for example, the page is accessed by kernel code), guest OS
will reboot the system.  But because the underlying host virtual
address backing the guest physical memory is still poisoned, if the
guest system accesses the corresponding guest physical memory even
after rebooting, the SIGBUS will still be sent to QEMU and MCE will be
simulated.  That is, guest system can not recover via rebooting.

In fact, across rebooting, the contents of guest physical memory page
need not to be kept.  We can allocate a new host physical page to
back the corresponding guest physical address.

This patch fixes this issue in QEMU-KVM via calling qemu_ram_remap()
to clear the corresponding page table entry, so that make it possible
to allocate a new page to recover the issue.

[ Jan: rebasing and tiny cleanups]

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 target-i386/kvm.c |   36 ++++++++++++++++++++++++++++++++++++
 1 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 8eda78b..45e366a 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -173,7 +173,40 @@ static int get_para_features(CPUState *env)
 }
 #endif /* CONFIG_KVM_PARA */
 
+typedef struct HWPoisonPage {
+    ram_addr_t ram_addr;
+    QLIST_ENTRY(HWPoisonPage) list;
+} HWPoisonPage;
+
+static QLIST_HEAD(, HWPoisonPage) hwpoison_page_list =
+    QLIST_HEAD_INITIALIZER(hwpoison_page_list);
+
+static void kvm_unpoison_all(void *param)
+{
+    HWPoisonPage *page, *next_page;
+
+    QLIST_FOREACH_SAFE(page, &hwpoison_page_list, list, next_page) {
+        QLIST_REMOVE(page, list);
+        qemu_ram_remap(page->ram_addr, TARGET_PAGE_SIZE);
+        qemu_free(page);
+    }
+}
+
 #ifdef KVM_CAP_MCE
+static void kvm_hwpoison_page_add(ram_addr_t ram_addr)
+{
+    HWPoisonPage *page;
+
+    QLIST_FOREACH(page, &hwpoison_page_list, list) {
+        if (page->ram_addr == ram_addr) {
+            return;
+        }
+    }
+    page = qemu_malloc(sizeof(HWPoisonPage));
+    page->ram_addr = ram_addr;
+    QLIST_INSERT_HEAD(&hwpoison_page_list, page, list);
+}
+
 static int kvm_get_mce_cap_supported(KVMState *s, uint64_t *mce_cap,
                                      int *max_banks)
 {
@@ -233,6 +266,7 @@ int kvm_arch_on_sigbus_vcpu(CPUState *env, int code, void *hvaddr)
                 hardware_memory_error();
             }
         }
+        kvm_hwpoison_page_add(ram_addr);
         kvm_mce_inject(env, gpaddr, code);
     } else
 #endif /* KVM_CAP_MCE */
@@ -263,6 +297,7 @@ int kvm_arch_on_sigbus(int code, void *hvaddr)
                     "QEMU itself instead of guest system!: %p\n", hvaddr);
             return 0;
         }
+        kvm_hwpoison_page_add(ram_addr);
         kvm_mce_inject(first_cpu, gpaddr, code);
     } else
 #endif /* KVM_CAP_MCE */
@@ -577,6 +612,7 @@ int kvm_arch_init(KVMState *s)
         fprintf(stderr, "e820_add_entry() table is full\n");
         return ret;
     }
+    qemu_register_reset(kvm_unpoison_all, NULL);
 
     return 0;
 }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] Re: [PATCH 08/13] kvm: x86: Inject pending MCE events on state writeback
  2011-02-15  8:23 ` [Qemu-devel] [PATCH 08/13] kvm: x86: Inject pending MCE events on state writeback Jan Kiszka
@ 2011-02-17 16:35   ` Marcelo Tosatti
  2011-02-17 17:06     ` Jan Kiszka
  0 siblings, 1 reply; 21+ messages in thread
From: Marcelo Tosatti @ 2011-02-17 16:35 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Hidetoshi Seto, kvm, qemu-devel, Avi Kivity, Huang Ying,
	Jin Dongming

On Tue, Feb 15, 2011 at 09:23:32AM +0100, Jan Kiszka wrote:
> The current way of injecting MCE events without updating of and
> synchronizing with the CPUState is broken and causes spurious
> corruptions of the MCE-related parts of the CPUState.

Can you explain how? The current pronlem with MCE is that it bypasses 
writeback code, but corruption has nothing to do with that.

> As a first step towards a fix, enhance the state writeback code with
> support for injecting events that are pending in the CPUState. A pending
> exception will then be signaled via cpu_interrupt(CPU_INTERRUPT_MCE).
> And, just like for TCG, we need to leave the halt state when
> CPU_INTERRUPT_MCE is pending (left broken for the to-be-removed old KVM
> code).
> 
> This will also allow to unify TCG and KVM injection code.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> CC: Huang Ying <ying.huang@intel.com>
> CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
> CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
> ---
>  target-i386/kvm.c |   75 +++++++++++++++++++++++++++++++++++++++++++++++++---
>  1 files changed, 70 insertions(+), 5 deletions(-)
> 
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index f909661..46f45db 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -467,6 +467,44 @@ void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
>  #endif /* !KVM_CAP_MCE*/
>  }
>  
> +static int kvm_inject_mce_oldstyle(CPUState *env)
> +{
> +#ifdef KVM_CAP_MCE
> +    if (kvm_has_vcpu_events()) {
> +        return 0;
> +    }
> +    if (env->interrupt_request & CPU_INTERRUPT_MCE) {
> +        unsigned int bank, bank_num = env->mcg_cap & 0xff;
> +        struct kvm_x86_mce mce;
> +
> +        /* We must not raise CPU_INTERRUPT_MCE if it's not supported. */
> +        assert(env->mcg_cap);
> +
> +        env->interrupt_request &= ~CPU_INTERRUPT_MCE;
> +
> +        /*
> +         * There must be at least one bank in use if CPU_INTERRUPT_MCE was set.
> +         * Find it and use its values for the event injection.
> +         */
> +        for (bank = 0; bank < bank_num; bank++) {
> +            if (env->mce_banks[bank * 4 + 1] & MCI_STATUS_VAL) {
> +                break;
> +            }
> +        }
> +        assert(bank < bank_num);
> +
> +        mce.bank = bank;
> +        mce.status = env->mce_banks[bank * 4 + 1];
> +        mce.mcg_status = env->mcg_status;
> +        mce.addr = env->mce_banks[bank * 4 + 2];
> +        mce.misc = env->mce_banks[bank * 4 + 3];
> +
> +        return kvm_vcpu_ioctl(env, KVM_X86_SET_MCE, &mce);
> +    }
> +#endif /* KVM_CAP_MCE */
> +    return 0;
> +}
> +
>  static void cpu_update_state(void *opaque, int running, int reason)
>  {
>      CPUState *env = opaque;
> @@ -1375,10 +1413,25 @@ static int kvm_put_vcpu_events(CPUState *env, int level)
>          return 0;
>      }
>  
> -    events.exception.injected = (env->exception_injected >= 0);
> -    events.exception.nr = env->exception_injected;
> -    events.exception.has_error_code = env->has_error_code;
> -    events.exception.error_code = env->error_code;
> +    if (env->interrupt_request & CPU_INTERRUPT_MCE) {
> +        /* We must not raise CPU_INTERRUPT_MCE if it's not supported. */
> +        assert(env->mcg_cap);
> +
> +        env->interrupt_request &= ~CPU_INTERRUPT_MCE;
> +        if (env->exception_injected == EXCP08_DBLE) {
> +            /* this means triple fault */
> +            qemu_system_reset_request();
> +            env->exit_request = 1;
> +        }
> +        events.exception.injected = 1;
> +        events.exception.nr = EXCP12_MCHK;
> +        events.exception.has_error_code = 0;
> +    } else {
> +        events.exception.injected = (env->exception_injected >= 0);
> +        events.exception.nr = env->exception_injected;
> +        events.exception.has_error_code = env->has_error_code;
> +        events.exception.error_code = env->error_code;
> +    }

IMO it is important to maintain a scope for kvm_put_vcpu_events /
kvm_get_vcpu_events: they synchronize state to/from the kernel. Not more
than that. Whatever you're trying to do here should be higher in the
vcpu loop code.

>      events.interrupt.injected = (env->interrupt_injected >= 0);
>      events.interrupt.nr = env->interrupt_injected;
> @@ -1539,6 +1592,11 @@ int kvm_arch_put_registers(CPUState *env, int level)
>      if (ret < 0) {
>          return ret;
>      }
> +    /* must be before kvm_put_msrs */
> +    ret = kvm_inject_mce_oldstyle(env);
> +    if (ret < 0) {
> +        return ret;
> +    }
>      ret = kvm_put_msrs(env, level);
>      if (ret < 0) {
>          return ret;
> @@ -1678,10 +1736,17 @@ void kvm_arch_post_run(CPUState *env, struct kvm_run *run)
>  int kvm_arch_process_irqchip_events(CPUState *env)
>  {
>      if (kvm_irqchip_in_kernel()) {
> +        if (env->interrupt_request & CPU_INTERRUPT_MCE) {
> +            kvm_cpu_synchronize_state(env);
> +            if (env->mp_state == KVM_MP_STATE_HALTED) {
> +                env->mp_state = KVM_MP_STATE_RUNNABLE;
> +            }
> +        }

Should not manipulate mp_state of a running vcpu (should only do that
for migration when vcpu is stopped), since its managed by the kernel,
for irqchip case.

>          return 0;
>      }
>  
> -    if (env->interrupt_request & (CPU_INTERRUPT_HARD | CPU_INTERRUPT_NMI)) {
> +    if (env->interrupt_request &
> +        (CPU_INTERRUPT_HARD | CPU_INTERRUPT_NMI | CPU_INTERRUPT_MCE)) {
>          env->halted = 0;
>      }
>      if (env->interrupt_request & CPU_INTERRUPT_INIT) {
> -- 
> 1.7.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Qemu-devel] Re: [PATCH 08/13] kvm: x86: Inject pending MCE events on state writeback
  2011-02-17 16:35   ` [Qemu-devel] " Marcelo Tosatti
@ 2011-02-17 17:06     ` Jan Kiszka
  2011-02-17 17:55       ` Marcelo Tosatti
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Kiszka @ 2011-02-17 17:06 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Hidetoshi Seto, kvm@vger.kernel.org, qemu-devel@nongnu.org,
	Avi Kivity, Huang Ying, Jin Dongming

On 2011-02-17 17:35, Marcelo Tosatti wrote:
> On Tue, Feb 15, 2011 at 09:23:32AM +0100, Jan Kiszka wrote:
>> The current way of injecting MCE events without updating of and
>> synchronizing with the CPUState is broken and causes spurious
>> corruptions of the MCE-related parts of the CPUState.
> 
> Can you explain how? The current pronlem with MCE is that it bypasses 
> writeback code, but corruption has nothing to do with that.

It's precisely the same scenario as with the old debug exception
re-injection: If we update the pending exception state via
KVM_SET_VCPU_EVENTS, we must not inject it via any other path. Otherwise
we end up with overwritten/lost events - which is extremely critical for
this rarely taken code paths.

Jut like parts of KVM_SET_GUEST_DEBUG, KVM_X86_SET_MCE pre-dates
KVM_SET_VCPU_EVENTS which obsoleted all other exception injection
mechanisms.

> 
>> As a first step towards a fix, enhance the state writeback code with
>> support for injecting events that are pending in the CPUState. A pending
>> exception will then be signaled via cpu_interrupt(CPU_INTERRUPT_MCE).
>> And, just like for TCG, we need to leave the halt state when
>> CPU_INTERRUPT_MCE is pending (left broken for the to-be-removed old KVM
>> code).
>>
>> This will also allow to unify TCG and KVM injection code.
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> CC: Huang Ying <ying.huang@intel.com>
>> CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
>> CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
>> ---
>>  target-i386/kvm.c |   75 +++++++++++++++++++++++++++++++++++++++++++++++++---
>>  1 files changed, 70 insertions(+), 5 deletions(-)
>>
>> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
>> index f909661..46f45db 100644
>> --- a/target-i386/kvm.c
>> +++ b/target-i386/kvm.c
>> @@ -467,6 +467,44 @@ void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
>>  #endif /* !KVM_CAP_MCE*/
>>  }
>>  
>> +static int kvm_inject_mce_oldstyle(CPUState *env)
>> +{
>> +#ifdef KVM_CAP_MCE
>> +    if (kvm_has_vcpu_events()) {
>> +        return 0;
>> +    }
>> +    if (env->interrupt_request & CPU_INTERRUPT_MCE) {
>> +        unsigned int bank, bank_num = env->mcg_cap & 0xff;
>> +        struct kvm_x86_mce mce;
>> +
>> +        /* We must not raise CPU_INTERRUPT_MCE if it's not supported. */
>> +        assert(env->mcg_cap);
>> +
>> +        env->interrupt_request &= ~CPU_INTERRUPT_MCE;
>> +
>> +        /*
>> +         * There must be at least one bank in use if CPU_INTERRUPT_MCE was set.
>> +         * Find it and use its values for the event injection.
>> +         */
>> +        for (bank = 0; bank < bank_num; bank++) {
>> +            if (env->mce_banks[bank * 4 + 1] & MCI_STATUS_VAL) {
>> +                break;
>> +            }
>> +        }
>> +        assert(bank < bank_num);
>> +
>> +        mce.bank = bank;
>> +        mce.status = env->mce_banks[bank * 4 + 1];
>> +        mce.mcg_status = env->mcg_status;
>> +        mce.addr = env->mce_banks[bank * 4 + 2];
>> +        mce.misc = env->mce_banks[bank * 4 + 3];
>> +
>> +        return kvm_vcpu_ioctl(env, KVM_X86_SET_MCE, &mce);
>> +    }
>> +#endif /* KVM_CAP_MCE */
>> +    return 0;
>> +}
>> +
>>  static void cpu_update_state(void *opaque, int running, int reason)
>>  {
>>      CPUState *env = opaque;
>> @@ -1375,10 +1413,25 @@ static int kvm_put_vcpu_events(CPUState *env, int level)
>>          return 0;
>>      }
>>  
>> -    events.exception.injected = (env->exception_injected >= 0);
>> -    events.exception.nr = env->exception_injected;
>> -    events.exception.has_error_code = env->has_error_code;
>> -    events.exception.error_code = env->error_code;
>> +    if (env->interrupt_request & CPU_INTERRUPT_MCE) {
>> +        /* We must not raise CPU_INTERRUPT_MCE if it's not supported. */
>> +        assert(env->mcg_cap);
>> +
>> +        env->interrupt_request &= ~CPU_INTERRUPT_MCE;
>> +        if (env->exception_injected == EXCP08_DBLE) {
>> +            /* this means triple fault */
>> +            qemu_system_reset_request();
>> +            env->exit_request = 1;
>> +        }
>> +        events.exception.injected = 1;
>> +        events.exception.nr = EXCP12_MCHK;
>> +        events.exception.has_error_code = 0;
>> +    } else {
>> +        events.exception.injected = (env->exception_injected >= 0);
>> +        events.exception.nr = env->exception_injected;
>> +        events.exception.has_error_code = env->has_error_code;
>> +        events.exception.error_code = env->error_code;
>> +    }
> 
> IMO it is important to maintain a scope for kvm_put_vcpu_events /
> kvm_get_vcpu_events: they synchronize state to/from the kernel. Not more
> than that. Whatever you're trying to do here should be higher in the
> vcpu loop code.

We pick up CPU_INTERRUPT_MCE and translate it into the right exception
that put_vcpu_events is about to sync to the kernel. What should be done
earlier of those steps? Calculating env->exception_injected?

> 
>>      events.interrupt.injected = (env->interrupt_injected >= 0);
>>      events.interrupt.nr = env->interrupt_injected;
>> @@ -1539,6 +1592,11 @@ int kvm_arch_put_registers(CPUState *env, int level)
>>      if (ret < 0) {
>>          return ret;
>>      }
>> +    /* must be before kvm_put_msrs */
>> +    ret = kvm_inject_mce_oldstyle(env);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>>      ret = kvm_put_msrs(env, level);
>>      if (ret < 0) {
>>          return ret;
>> @@ -1678,10 +1736,17 @@ void kvm_arch_post_run(CPUState *env, struct kvm_run *run)
>>  int kvm_arch_process_irqchip_events(CPUState *env)
>>  {
>>      if (kvm_irqchip_in_kernel()) {
>> +        if (env->interrupt_request & CPU_INTERRUPT_MCE) {
>> +            kvm_cpu_synchronize_state(env);
>> +            if (env->mp_state == KVM_MP_STATE_HALTED) {
>> +                env->mp_state = KVM_MP_STATE_RUNNABLE;
>> +            }
>> +        }
> 
> Should not manipulate mp_state of a running vcpu (should only do that
> for migration when vcpu is stopped), since its managed by the kernel,
> for irqchip case.

Not for asynchronously injected MCEs. The target CPU would simply
oversleep them. MCEs are not in the scope of the in-kernel irqchip.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Qemu-devel] Re: [PATCH 08/13] kvm: x86: Inject pending MCE events on state writeback
  2011-02-17 17:06     ` Jan Kiszka
@ 2011-02-17 17:55       ` Marcelo Tosatti
  2011-02-17 18:04         ` Jan Kiszka
  0 siblings, 1 reply; 21+ messages in thread
From: Marcelo Tosatti @ 2011-02-17 17:55 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Hidetoshi Seto, kvm@vger.kernel.org, qemu-devel@nongnu.org,
	Avi Kivity, Huang Ying, Jin Dongming

On Thu, Feb 17, 2011 at 06:06:19PM +0100, Jan Kiszka wrote:
> On 2011-02-17 17:35, Marcelo Tosatti wrote:
> > On Tue, Feb 15, 2011 at 09:23:32AM +0100, Jan Kiszka wrote:
> >> The current way of injecting MCE events without updating of and
> >> synchronizing with the CPUState is broken and causes spurious
> >> corruptions of the MCE-related parts of the CPUState.
> > 
> > Can you explain how? The current pronlem with MCE is that it bypasses 
> > writeback code, but corruption has nothing to do with that.
> 
> It's precisely the same scenario as with the old debug exception
> re-injection: If we update the pending exception state via
> KVM_SET_VCPU_EVENTS, we must not inject it via any other path. Otherwise
> we end up with overwritten/lost events - which is extremely critical for
> this rarely taken code paths.
> 
> Jut like parts of KVM_SET_GUEST_DEBUG, KVM_X86_SET_MCE pre-dates
> KVM_SET_VCPU_EVENTS which obsoleted all other exception injection
> mechanisms.

OK.

> > 
> >> As a first step towards a fix, enhance the state writeback code with
> >> support for injecting events that are pending in the CPUState. A pending
> >> exception will then be signaled via cpu_interrupt(CPU_INTERRUPT_MCE).
> >> And, just like for TCG, we need to leave the halt state when
> >> CPU_INTERRUPT_MCE is pending (left broken for the to-be-removed old KVM
> >> code).
> >>
> >> This will also allow to unify TCG and KVM injection code.
> >>
> >> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >> CC: Huang Ying <ying.huang@intel.com>
> >> CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
> >> CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
> >> ---
> >>  target-i386/kvm.c |   75 +++++++++++++++++++++++++++++++++++++++++++++++++---
> >>  1 files changed, 70 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> >> index f909661..46f45db 100644
> >> --- a/target-i386/kvm.c
> >> +++ b/target-i386/kvm.c
> >> @@ -467,6 +467,44 @@ void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
> >>  #endif /* !KVM_CAP_MCE*/
> >>  }
> >>  
> >> +static int kvm_inject_mce_oldstyle(CPUState *env)
> >> +{
> >> +#ifdef KVM_CAP_MCE
> >> +    if (kvm_has_vcpu_events()) {
> >> +        return 0;
> >> +    }
> >> +    if (env->interrupt_request & CPU_INTERRUPT_MCE) {
> >> +        unsigned int bank, bank_num = env->mcg_cap & 0xff;
> >> +        struct kvm_x86_mce mce;
> >> +
> >> +        /* We must not raise CPU_INTERRUPT_MCE if it's not supported. */
> >> +        assert(env->mcg_cap);
> >> +
> >> +        env->interrupt_request &= ~CPU_INTERRUPT_MCE;
> >> +
> >> +        /*
> >> +         * There must be at least one bank in use if CPU_INTERRUPT_MCE was set.
> >> +         * Find it and use its values for the event injection.
> >> +         */
> >> +        for (bank = 0; bank < bank_num; bank++) {
> >> +            if (env->mce_banks[bank * 4 + 1] & MCI_STATUS_VAL) {
> >> +                break;
> >> +            }
> >> +        }
> >> +        assert(bank < bank_num);
> >> +
> >> +        mce.bank = bank;
> >> +        mce.status = env->mce_banks[bank * 4 + 1];
> >> +        mce.mcg_status = env->mcg_status;
> >> +        mce.addr = env->mce_banks[bank * 4 + 2];
> >> +        mce.misc = env->mce_banks[bank * 4 + 3];
> >> +
> >> +        return kvm_vcpu_ioctl(env, KVM_X86_SET_MCE, &mce);
> >> +    }
> >> +#endif /* KVM_CAP_MCE */
> >> +    return 0;
> >> +}
> >> +
> >>  static void cpu_update_state(void *opaque, int running, int reason)
> >>  {
> >>      CPUState *env = opaque;
> >> @@ -1375,10 +1413,25 @@ static int kvm_put_vcpu_events(CPUState *env, int level)
> >>          return 0;
> >>      }
> >>  
> >> -    events.exception.injected = (env->exception_injected >= 0);
> >> -    events.exception.nr = env->exception_injected;
> >> -    events.exception.has_error_code = env->has_error_code;
> >> -    events.exception.error_code = env->error_code;
> >> +    if (env->interrupt_request & CPU_INTERRUPT_MCE) {
> >> +        /* We must not raise CPU_INTERRUPT_MCE if it's not supported. */
> >> +        assert(env->mcg_cap);
> >> +
> >> +        env->interrupt_request &= ~CPU_INTERRUPT_MCE;
> >> +        if (env->exception_injected == EXCP08_DBLE) {
> >> +            /* this means triple fault */
> >> +            qemu_system_reset_request();
> >> +            env->exit_request = 1;
> >> +        }
> >> +        events.exception.injected = 1;
> >> +        events.exception.nr = EXCP12_MCHK;
> >> +        events.exception.has_error_code = 0;
> >> +    } else {
> >> +        events.exception.injected = (env->exception_injected >= 0);
> >> +        events.exception.nr = env->exception_injected;
> >> +        events.exception.has_error_code = env->has_error_code;
> >> +        events.exception.error_code = env->error_code;
> >> +    }
> > 
> > IMO it is important to maintain a scope for kvm_put_vcpu_events /
> > kvm_get_vcpu_events: they synchronize state to/from the kernel. Not more
> > than that. Whatever you're trying to do here should be higher in the
> > vcpu loop code.
> 
> We pick up CPU_INTERRUPT_MCE and translate it into the right exception
> that put_vcpu_events is about to sync to the kernel. What should be done
> earlier of those steps? Calculating env->exception_injected?

Everything but writeback. Update env->exception_injected/nr in
process_irqchip_events, or in a separate kvm_arch_update_exceptions.

> >>          return ret;
> >> @@ -1678,10 +1736,17 @@ void kvm_arch_post_run(CPUState *env, struct kvm_run *run)
> >>  int kvm_arch_process_irqchip_events(CPUState *env)
> >>  {
> >>      if (kvm_irqchip_in_kernel()) {
> >> +        if (env->interrupt_request & CPU_INTERRUPT_MCE) {
> >> +            kvm_cpu_synchronize_state(env);
> >> +            if (env->mp_state == KVM_MP_STATE_HALTED) {
> >> +                env->mp_state = KVM_MP_STATE_RUNNABLE;
> >> +            }
> >> +        }
> > 
> > Should not manipulate mp_state of a running vcpu (should only do that
> > for migration when vcpu is stopped), since its managed by the kernel,
> > for irqchip case.
> 
> Not for asynchronously injected MCEs. The target CPU would simply
> oversleep them. MCEs are not in the scope of the in-kernel irqchip.

Pending MCE exception could break out of in-kernel halt emulation.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Qemu-devel] Re: [PATCH 08/13] kvm: x86: Inject pending MCE events on state writeback
  2011-02-17 17:55       ` Marcelo Tosatti
@ 2011-02-17 18:04         ` Jan Kiszka
  2011-02-17 18:17           ` Marcelo Tosatti
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Kiszka @ 2011-02-17 18:04 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Hidetoshi Seto, kvm@vger.kernel.org, qemu-devel@nongnu.org,
	Avi Kivity, Huang Ying, Jin Dongming

On 2011-02-17 18:55, Marcelo Tosatti wrote:
>>>> @@ -1375,10 +1413,25 @@ static int kvm_put_vcpu_events(CPUState *env, int level)
>>>>          return 0;
>>>>      }
>>>>  
>>>> -    events.exception.injected = (env->exception_injected >= 0);
>>>> -    events.exception.nr = env->exception_injected;
>>>> -    events.exception.has_error_code = env->has_error_code;
>>>> -    events.exception.error_code = env->error_code;
>>>> +    if (env->interrupt_request & CPU_INTERRUPT_MCE) {
>>>> +        /* We must not raise CPU_INTERRUPT_MCE if it's not supported. */
>>>> +        assert(env->mcg_cap);
>>>> +
>>>> +        env->interrupt_request &= ~CPU_INTERRUPT_MCE;
>>>> +        if (env->exception_injected == EXCP08_DBLE) {
>>>> +            /* this means triple fault */
>>>> +            qemu_system_reset_request();
>>>> +            env->exit_request = 1;
>>>> +        }
>>>> +        events.exception.injected = 1;
>>>> +        events.exception.nr = EXCP12_MCHK;
>>>> +        events.exception.has_error_code = 0;
>>>> +    } else {
>>>> +        events.exception.injected = (env->exception_injected >= 0);
>>>> +        events.exception.nr = env->exception_injected;
>>>> +        events.exception.has_error_code = env->has_error_code;
>>>> +        events.exception.error_code = env->error_code;
>>>> +    }
>>>
>>> IMO it is important to maintain a scope for kvm_put_vcpu_events /
>>> kvm_get_vcpu_events: they synchronize state to/from the kernel. Not more
>>> than that. Whatever you're trying to do here should be higher in the
>>> vcpu loop code.
>>
>> We pick up CPU_INTERRUPT_MCE and translate it into the right exception
>> that put_vcpu_events is about to sync to the kernel. What should be done
>> earlier of those steps? Calculating env->exception_injected?
> 
> Everything but writeback. Update env->exception_injected/nr in
> process_irqchip_events, or in a separate kvm_arch_update_exceptions.
> 

OK, will rework this.

>>>>          return ret;
>>>> @@ -1678,10 +1736,17 @@ void kvm_arch_post_run(CPUState *env, struct kvm_run *run)
>>>>  int kvm_arch_process_irqchip_events(CPUState *env)
>>>>  {
>>>>      if (kvm_irqchip_in_kernel()) {
>>>> +        if (env->interrupt_request & CPU_INTERRUPT_MCE) {
>>>> +            kvm_cpu_synchronize_state(env);
>>>> +            if (env->mp_state == KVM_MP_STATE_HALTED) {
>>>> +                env->mp_state = KVM_MP_STATE_RUNNABLE;
>>>> +            }
>>>> +        }
>>>
>>> Should not manipulate mp_state of a running vcpu (should only do that
>>> for migration when vcpu is stopped), since its managed by the kernel,
>>> for irqchip case.
>>
>> Not for asynchronously injected MCEs. The target CPU would simply
>> oversleep them. MCEs are not in the scope of the in-kernel irqchip.
> 
> Pending MCE exception could break out of in-kernel halt emulation.

Can't follow. What do you mean? That the kernel already takes care? I
didn't find a trace, so I added that code.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Qemu-devel] Re: [PATCH 09/13] kvm: x86: Consolidate TCG and KVM MCE injection code
  2011-02-15  8:23 ` [Qemu-devel] [PATCH 09/13] kvm: x86: Consolidate TCG and KVM MCE injection code Jan Kiszka
@ 2011-02-17 18:08   ` Marcelo Tosatti
  2011-02-17 18:17     ` Jan Kiszka
  0 siblings, 1 reply; 21+ messages in thread
From: Marcelo Tosatti @ 2011-02-17 18:08 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Hidetoshi Seto, kvm, qemu-devel, Avi Kivity, Huang Ying,
	Jin Dongming

On Tue, Feb 15, 2011 at 09:23:33AM +0100, Jan Kiszka wrote:
> This switches KVM's MCE injection path to cpu_x86_inject_mce, both for
> SIGBUS and monitor initiated events. This means we prepare the MCA MSRs
> in the VCPUState also for KVM.
> 
> We have to drop the MSRs writeback restrictions for this purpose which
> is now safe as every uncoordinated MSR injection is removed with this
> patch.
> 
> We have to execute every per-VCPU event setup on the target VCPU as we
> are performing a read-modify-write on its state.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> CC: Huang Ying <ying.huang@intel.com>
> CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
> CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>

Difficult to review, please split:

- drop writeback restriction
- unify monitor/sigbus injection paths 
- cleanup / refactor

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Qemu-devel] Re: [PATCH 09/13] kvm: x86: Consolidate TCG and KVM MCE injection code
  2011-02-17 18:08   ` [Qemu-devel] " Marcelo Tosatti
@ 2011-02-17 18:17     ` Jan Kiszka
  0 siblings, 0 replies; 21+ messages in thread
From: Jan Kiszka @ 2011-02-17 18:17 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Hidetoshi Seto, kvm@vger.kernel.org, qemu-devel@nongnu.org,
	Avi Kivity, Huang Ying, Jin Dongming

On 2011-02-17 19:08, Marcelo Tosatti wrote:
> On Tue, Feb 15, 2011 at 09:23:33AM +0100, Jan Kiszka wrote:
>> This switches KVM's MCE injection path to cpu_x86_inject_mce, both for
>> SIGBUS and monitor initiated events. This means we prepare the MCA MSRs
>> in the VCPUState also for KVM.
>>
>> We have to drop the MSRs writeback restrictions for this purpose which
>> is now safe as every uncoordinated MSR injection is removed with this
>> patch.
>>
>> We have to execute every per-VCPU event setup on the target VCPU as we
>> are performing a read-modify-write on its state.
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> CC: Huang Ying <ying.huang@intel.com>
>> CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
>> CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
> 
> Difficult to review, please split:
> 
> - drop writeback restriction

That will not work due to mutual dependency (see changelog)...

> - unify monitor/sigbus injection paths 
> - cleanup / refactor
> 

...but I will check again and filter out potential cleanups that could
be done afterwards.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Qemu-devel] Re: [PATCH 08/13] kvm: x86: Inject pending MCE events on state writeback
  2011-02-17 18:04         ` Jan Kiszka
@ 2011-02-17 18:17           ` Marcelo Tosatti
  0 siblings, 0 replies; 21+ messages in thread
From: Marcelo Tosatti @ 2011-02-17 18:17 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Hidetoshi Seto, kvm@vger.kernel.org, qemu-devel@nongnu.org,
	Avi Kivity, Huang Ying, Jin Dongming

On Thu, Feb 17, 2011 at 07:04:51PM +0100, Jan Kiszka wrote:
> >>> Should not manipulate mp_state of a running vcpu (should only do that
> >>> for migration when vcpu is stopped), since its managed by the kernel,
> >>> for irqchip case.
> >>
> >> Not for asynchronously injected MCEs. The target CPU would simply
> >> oversleep them. MCEs are not in the scope of the in-kernel irqchip.
> > 
> > Pending MCE exception could break out of in-kernel halt emulation.
> 
> Can't follow. What do you mean? That the kernel already takes care? I
> didn't find a trace, so I added that code.

Nevermind. This is rare and "halted -> running" transition in userspace 
is harmless.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2011-02-17 18:44 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-02-15  8:23 [Qemu-devel] [PATCH 00/13] [uq/master] Patch queue, part IV (MCE edition) Jan Kiszka
2011-02-15  8:23 ` [Qemu-devel] [PATCH 01/13] x86: Account for MCE in cpu_has_work Jan Kiszka
2011-02-15  8:23 ` [Qemu-devel] [PATCH 02/13] x86: Perform implicit mcg_status reset Jan Kiszka
2011-02-15  8:23 ` [Qemu-devel] [PATCH 03/13] x86: Small cleanups of MCE helpers Jan Kiszka
2011-02-15  8:23 ` [Qemu-devel] [PATCH 04/13] x86: Refine error reporting of MCE injection services Jan Kiszka
2011-02-15  8:23 ` [Qemu-devel] [PATCH 05/13] x86: Optionally avoid injecting AO MCEs while others are pending Jan Kiszka
2011-02-15  8:23 ` [Qemu-devel] [PATCH 06/13] Synchronize VCPU states before reset Jan Kiszka
2011-02-15  8:23 ` [Qemu-devel] [PATCH 07/13] kvm: x86: Move MCE functions together Jan Kiszka
2011-02-15  8:23 ` [Qemu-devel] [PATCH 08/13] kvm: x86: Inject pending MCE events on state writeback Jan Kiszka
2011-02-17 16:35   ` [Qemu-devel] " Marcelo Tosatti
2011-02-17 17:06     ` Jan Kiszka
2011-02-17 17:55       ` Marcelo Tosatti
2011-02-17 18:04         ` Jan Kiszka
2011-02-17 18:17           ` Marcelo Tosatti
2011-02-15  8:23 ` [Qemu-devel] [PATCH 09/13] kvm: x86: Consolidate TCG and KVM MCE injection code Jan Kiszka
2011-02-17 18:08   ` [Qemu-devel] " Marcelo Tosatti
2011-02-17 18:17     ` Jan Kiszka
2011-02-15  8:23 ` [Qemu-devel] [PATCH 10/13] kvm: x86: Clean up kvm_setup_mce Jan Kiszka
2011-02-15  8:23 ` [Qemu-devel] [PATCH 11/13] kvm: x86: Fail kvm_arch_init_vcpu if MCE initialization fails Jan Kiszka
2011-02-15  8:23 ` [Qemu-devel] [PATCH 12/13] Add qemu_ram_remap Jan Kiszka
2011-02-15  8:23 ` [Qemu-devel] [PATCH 13/13] KVM, MCE, unpoison memory address across reboot Jan Kiszka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).