qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v3 00/17] [uq/master] Patch queue, part IV (MCE edition)
@ 2011-03-02  7:56 Jan Kiszka
  2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 01/17] kvm: ppc: Fix breakage of kvm_arch_pre_run/process_irqchip_events Jan Kiszka
                   ` (17 more replies)
  0 siblings, 18 replies; 19+ messages in thread
From: Jan Kiszka @ 2011-03-02  7:56 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Hidetoshi Seto, kvm, qemu-devel, Huang Ying, David Gibson,
	Jin Dongming

This is mostly a rebase of the previous round, just including one
additional ppc build fix (patch 2) for a regression in a previous part.
Please merge.

CC: David Gibson <david@gibson.dropbear.id.au>
CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
CC: Huang Ying <ying.huang@intel.com>
CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>

Huang Ying (2):
  Add qemu_ram_remap
  KVM, MCE, unpoison memory address across reboot

Jan Kiszka (15):
  kvm: ppc: Fix breakage of kvm_arch_pre_run/process_irqchip_events
  kvm: Fix build warning when KVM_CAP_SET_GUEST_DEBUG is lacking
  x86: Account for MCE in cpu_has_work
  x86: Perform implicit mcg_status reset
  x86: Small cleanups of MCE helpers
  x86: Refine error reporting of MCE injection services
  x86: Optionally avoid injecting AO MCEs while others are pending
  Synchronize VCPU states before reset
  kvm: x86: Move MCE functions together
  kvm: Rename kvm_arch_process_irqchip_events to async_events
  kvm: x86: Inject pending MCE events on state writeback
  x86: Run qemu_inject_x86_mce on target VCPU
  kvm: x86: Consolidate TCG and KVM MCE injection code
  kvm: x86: Clean up kvm_setup_mce
  kvm: x86: Fail kvm_arch_init_vcpu if MCE initialization fails

 cpu-all.h             |    8 +-
 cpu-common.h          |    1 +
 exec.c                |   63 +++++++-
 kvm-all.c             |    4 +-
 kvm.h                 |    2 +-
 monitor.c             |   11 +-
 qemu-common.h         |    6 +-
 target-i386/cpu.h     |   11 +-
 target-i386/exec.h    |   15 +-
 target-i386/helper.c  |  185 +++++++++++++-------
 target-i386/kvm.c     |  463 ++++++++++++++++++++-----------------------------
 target-i386/kvm_x86.h |   25 ---
 target-ppc/kvm.c      |    6 +-
 target-s390x/kvm.c    |    2 +-
 vl.c                  |    1 +
 15 files changed, 403 insertions(+), 400 deletions(-)
 delete mode 100644 target-i386/kvm_x86.h

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v3 01/17] kvm: ppc: Fix breakage of kvm_arch_pre_run/process_irqchip_events
  2011-03-02  7:56 [Qemu-devel] [PATCH v3 00/17] [uq/master] Patch queue, part IV (MCE edition) Jan Kiszka
@ 2011-03-02  7:56 ` Jan Kiszka
  2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 02/17] kvm: Fix build warning when KVM_CAP_SET_GUEST_DEBUG is lacking Jan Kiszka
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Jan Kiszka @ 2011-03-02  7:56 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: qemu-devel, kvm

Commit 7a39fe5882 failed to convert the right arch function.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 target-ppc/kvm.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index bd4012a..3924f4b 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -222,7 +222,7 @@ int kvmppc_set_interrupt(CPUState *env, int irq, int level)
 #define PPC_INPUT_INT PPC6xx_INPUT_INT
 #endif
 
-int kvm_arch_pre_run(CPUState *env, struct kvm_run *run)
+void kvm_arch_pre_run(CPUState *env, struct kvm_run *run)
 {
     int r;
     unsigned irq;
@@ -253,15 +253,15 @@ int kvm_arch_pre_run(CPUState *env, struct kvm_run *run)
     /* We don't know if there are more interrupts pending after this. However,
      * the guest will return to userspace in the course of handling this one
      * anyways, so we will get a chance to deliver the rest. */
-    return 0;
 }
 
 void kvm_arch_post_run(CPUState *env, struct kvm_run *run)
 {
 }
 
-void kvm_arch_process_irqchip_events(CPUState *env)
+int kvm_arch_process_irqchip_events(CPUState *env)
 {
+    return 0;
 }
 
 static int kvmppc_handle_halt(CPUState *env)
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v3 02/17] kvm: Fix build warning when KVM_CAP_SET_GUEST_DEBUG is lacking
  2011-03-02  7:56 [Qemu-devel] [PATCH v3 00/17] [uq/master] Patch queue, part IV (MCE edition) Jan Kiszka
  2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 01/17] kvm: ppc: Fix breakage of kvm_arch_pre_run/process_irqchip_events Jan Kiszka
@ 2011-03-02  7:56 ` Jan Kiszka
  2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 03/17] x86: Account for MCE in cpu_has_work Jan Kiszka
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Jan Kiszka @ 2011-03-02  7:56 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: qemu-devel, kvm, David Gibson

Original fix by David Gibson.

CC: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 kvm-all.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index e6a7de4..7753c8a 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -998,7 +998,9 @@ int kvm_cpu_exec(CPUState *env)
     }
     ret = EXCP_INTERRUPT;
 
+#ifdef KVM_CAP_SET_GUEST_DEBUG
 out:
+#endif
     env->exit_request = 0;
     cpu_single_env = NULL;
     return ret;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v3 03/17] x86: Account for MCE in cpu_has_work
  2011-03-02  7:56 [Qemu-devel] [PATCH v3 00/17] [uq/master] Patch queue, part IV (MCE edition) Jan Kiszka
  2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 01/17] kvm: ppc: Fix breakage of kvm_arch_pre_run/process_irqchip_events Jan Kiszka
  2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 02/17] kvm: Fix build warning when KVM_CAP_SET_GUEST_DEBUG is lacking Jan Kiszka
@ 2011-03-02  7:56 ` Jan Kiszka
  2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 04/17] x86: Perform implicit mcg_status reset Jan Kiszka
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Jan Kiszka @ 2011-03-02  7:56 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Hidetoshi Seto, Jin Dongming, qemu-devel, kvm, Huang Ying

MCEs can be injected asynchronously, so they can also terminate the halt
state.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Huang Ying <ying.huang@intel.com>
CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
---
 target-i386/exec.h |   15 ++++++---------
 1 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/target-i386/exec.h b/target-i386/exec.h
index fc8945b..d050dd0 100644
--- a/target-i386/exec.h
+++ b/target-i386/exec.h
@@ -293,15 +293,12 @@ static inline void load_eflags(int eflags, int update_mask)
 
 static inline int cpu_has_work(CPUState *env)
 {
-    int work;
-
-    work = (env->interrupt_request & CPU_INTERRUPT_HARD) &&
-           (env->eflags & IF_MASK);
-    work |= env->interrupt_request & CPU_INTERRUPT_NMI;
-    work |= env->interrupt_request & CPU_INTERRUPT_INIT;
-    work |= env->interrupt_request & CPU_INTERRUPT_SIPI;
-
-    return work;
+    return ((env->interrupt_request & CPU_INTERRUPT_HARD) &&
+            (env->eflags & IF_MASK)) ||
+           (env->interrupt_request & (CPU_INTERRUPT_NMI |
+                                      CPU_INTERRUPT_INIT |
+                                      CPU_INTERRUPT_SIPI |
+                                      CPU_INTERRUPT_MCE));
 }
 
 static inline int cpu_halted(CPUState *env) {
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v3 04/17] x86: Perform implicit mcg_status reset
  2011-03-02  7:56 [Qemu-devel] [PATCH v3 00/17] [uq/master] Patch queue, part IV (MCE edition) Jan Kiszka
                   ` (2 preceding siblings ...)
  2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 03/17] x86: Account for MCE in cpu_has_work Jan Kiszka
@ 2011-03-02  7:56 ` Jan Kiszka
  2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 05/17] x86: Small cleanups of MCE helpers Jan Kiszka
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Jan Kiszka @ 2011-03-02  7:56 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Hidetoshi Seto, Jin Dongming, qemu-devel, kvm, Huang Ying

Reorder mcg_status in CPUState to achieve automatic clearing on reset.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Huang Ying <ying.huang@intel.com>
CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
---
 target-i386/cpu.h    |    3 ++-
 target-i386/helper.c |    2 --
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 5f1df8b..75156e7 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -687,6 +687,8 @@ typedef struct CPUX86State {
 
     uint64_t pat;
 
+    uint64_t mcg_status;
+
     /* exception/interrupt handling */
     int error_code;
     int exception_is_int;
@@ -741,7 +743,6 @@ typedef struct CPUX86State {
     struct DeviceState *apic_state;
 
     uint64_t mcg_cap;
-    uint64_t mcg_status;
     uint64_t mcg_ctl;
     uint64_t mce_banks[MCE_BANKS_DEF*4];
 
diff --git a/target-i386/helper.c b/target-i386/helper.c
index f0c546d..f41416f 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -101,8 +101,6 @@ void cpu_reset(CPUX86State *env)
     env->dr[7] = DR7_FIXED_1;
     cpu_breakpoint_remove_all(env, BP_CPU);
     cpu_watchpoint_remove_all(env, BP_CPU);
-
-    env->mcg_status = 0;
 }
 
 void cpu_x86_close(CPUX86State *env)
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v3 05/17] x86: Small cleanups of MCE helpers
  2011-03-02  7:56 [Qemu-devel] [PATCH v3 00/17] [uq/master] Patch queue, part IV (MCE edition) Jan Kiszka
                   ` (3 preceding siblings ...)
  2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 04/17] x86: Perform implicit mcg_status reset Jan Kiszka
@ 2011-03-02  7:56 ` Jan Kiszka
  2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 06/17] x86: Refine error reporting of MCE injection services Jan Kiszka
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Jan Kiszka @ 2011-03-02  7:56 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Hidetoshi Seto, Jin Dongming, qemu-devel, kvm, Huang Ying

Fix some code style issues, use proper headers, and align to cpu_x86
naming scheme. No functional changes.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Huang Ying <ying.huang@intel.com>
CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
---
 cpu-all.h            |    4 ----
 monitor.c            |    2 +-
 target-i386/cpu.h    |    5 +++++
 target-i386/helper.c |   41 ++++++++++++++++++++++++-----------------
 4 files changed, 30 insertions(+), 22 deletions(-)

diff --git a/cpu-all.h b/cpu-all.h
index 87b0f86..caf5e6c 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -971,8 +971,4 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf);
 int cpu_memory_rw_debug(CPUState *env, target_ulong addr,
                         uint8_t *buf, int len, int is_write);
 
-void cpu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
-                        uint64_t mcg_status, uint64_t addr, uint64_t misc,
-                        int broadcast);
-
 #endif /* CPU_ALL_H */
diff --git a/monitor.c b/monitor.c
index 22ae3bb..45b0cc2 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2713,7 +2713,7 @@ static void do_inject_mce(Monitor *mon, const QDict *qdict)
 
     for (cenv = first_cpu; cenv != NULL; cenv = cenv->next_cpu) {
         if (cenv->cpu_index == cpu_index && cenv->mcg_cap) {
-            cpu_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc,
+            cpu_x86_inject_mce(cenv, bank, status, mcg_status, addr, misc,
                                broadcast);
             break;
         }
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 75156e7..52bb48e 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -986,4 +986,9 @@ static inline void cpu_get_tb_cpu_state(CPUState *env, target_ulong *pc,
 
 void do_cpu_init(CPUState *env);
 void do_cpu_sipi(CPUState *env);
+
+void cpu_x86_inject_mce(CPUState *cenv, int bank, uint64_t status,
+                        uint64_t mcg_status, uint64_t addr, uint64_t misc,
+                        int broadcast);
+
 #endif /* CPU_I386_H */
diff --git a/target-i386/helper.c b/target-i386/helper.c
index f41416f..ba3bed9 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -28,6 +28,9 @@
 #include "qemu-common.h"
 #include "kvm.h"
 #include "kvm_x86.h"
+#ifndef CONFIG_USER_ONLY
+#include "sysemu.h"
+#endif
 
 //#define DEBUG_MMU
 
@@ -1063,11 +1066,9 @@ static void breakpoint_handler(CPUState *env)
         prev_debug_excp_handler(env);
 }
 
-/* This should come from sysemu.h - if we could include it here... */
-void qemu_system_reset_request(void);
-
-static void qemu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
-                        uint64_t mcg_status, uint64_t addr, uint64_t misc)
+static void
+qemu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
+                    uint64_t mcg_status, uint64_t addr, uint64_t misc)
 {
     uint64_t mcg_cap = cenv->mcg_cap;
     uint64_t *banks = cenv->mce_banks;
@@ -1077,15 +1078,17 @@ static void qemu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
      * reporting is disabled
      */
     if ((status & MCI_STATUS_UC) && (mcg_cap & MCG_CTL_P) &&
-        cenv->mcg_ctl != ~(uint64_t)0)
+        cenv->mcg_ctl != ~(uint64_t)0) {
         return;
+    }
     banks += 4 * bank;
     /*
      * if MSR_MCi_CTL is not all 1s, the uncorrected error
      * reporting is disabled for the bank
      */
-    if ((status & MCI_STATUS_UC) && banks[0] != ~(uint64_t)0)
+    if ((status & MCI_STATUS_UC) && banks[0] != ~(uint64_t)0) {
         return;
+    }
     if (status & MCI_STATUS_UC) {
         if ((cenv->mcg_status & MCG_STATUS_MCIP) ||
             !(cenv->cr[4] & CR4_MCE_MASK)) {
@@ -1095,8 +1098,9 @@ static void qemu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
             qemu_system_reset_request();
             return;
         }
-        if (banks[1] & MCI_STATUS_VAL)
+        if (banks[1] & MCI_STATUS_VAL) {
             status |= MCI_STATUS_OVER;
+        }
         banks[2] = addr;
         banks[3] = misc;
         cenv->mcg_status = mcg_status;
@@ -1104,16 +1108,18 @@ static void qemu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
         cpu_interrupt(cenv, CPU_INTERRUPT_MCE);
     } else if (!(banks[1] & MCI_STATUS_VAL)
                || !(banks[1] & MCI_STATUS_UC)) {
-        if (banks[1] & MCI_STATUS_VAL)
+        if (banks[1] & MCI_STATUS_VAL) {
             status |= MCI_STATUS_OVER;
+        }
         banks[2] = addr;
         banks[3] = misc;
         banks[1] = status;
-    } else
+    } else {
         banks[1] |= MCI_STATUS_OVER;
+    }
 }
 
-void cpu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
+void cpu_x86_inject_mce(CPUState *cenv, int bank, uint64_t status,
                         uint64_t mcg_status, uint64_t addr, uint64_t misc,
                         int broadcast)
 {
@@ -1155,15 +1161,16 @@ void cpu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
 
 static void mce_init(CPUX86State *cenv)
 {
-    unsigned int bank, bank_num;
+    unsigned int bank;
 
-    if (((cenv->cpuid_version >> 8)&0xf) >= 6
-        && (cenv->cpuid_features&(CPUID_MCE|CPUID_MCA)) == (CPUID_MCE|CPUID_MCA)) {
+    if (((cenv->cpuid_version >> 8) & 0xf) >= 6
+        && (cenv->cpuid_features & (CPUID_MCE | CPUID_MCA)) ==
+            (CPUID_MCE | CPUID_MCA)) {
         cenv->mcg_cap = MCE_CAP_DEF | MCE_BANKS_DEF;
         cenv->mcg_ctl = ~(uint64_t)0;
-        bank_num = MCE_BANKS_DEF;
-        for (bank = 0; bank < bank_num; bank++)
-            cenv->mce_banks[bank*4] = ~(uint64_t)0;
+        for (bank = 0; bank < MCE_BANKS_DEF; bank++) {
+            cenv->mce_banks[bank * 4] = ~(uint64_t)0;
+        }
     }
 }
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v3 06/17] x86: Refine error reporting of MCE injection services
  2011-03-02  7:56 [Qemu-devel] [PATCH v3 00/17] [uq/master] Patch queue, part IV (MCE edition) Jan Kiszka
                   ` (4 preceding siblings ...)
  2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 05/17] x86: Small cleanups of MCE helpers Jan Kiszka
@ 2011-03-02  7:56 ` Jan Kiszka
  2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 07/17] x86: Optionally avoid injecting AO MCEs while others are pending Jan Kiszka
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Jan Kiszka @ 2011-03-02  7:56 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Hidetoshi Seto, Jin Dongming, qemu-devel, kvm, Huang Ying

As this service is used by the human monitor, make sure that errors get
reported to the right channel, and also raise the verbosity.

This requires to move Monitor typedef in qemu-common.h to resolve the
include dependency.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Huang Ying <ying.huang@intel.com>
CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
---
 monitor.c            |    4 +-
 qemu-common.h        |    6 ++--
 target-i386/cpu.h    |    6 ++--
 target-i386/helper.c |   79 +++++++++++++++++++++++++++++---------------------
 4 files changed, 54 insertions(+), 41 deletions(-)

diff --git a/monitor.c b/monitor.c
index 45b0cc2..662df7c 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2712,8 +2712,8 @@ static void do_inject_mce(Monitor *mon, const QDict *qdict)
     int broadcast = qdict_get_try_bool(qdict, "broadcast", 0);
 
     for (cenv = first_cpu; cenv != NULL; cenv = cenv->next_cpu) {
-        if (cenv->cpu_index == cpu_index && cenv->mcg_cap) {
-            cpu_x86_inject_mce(cenv, bank, status, mcg_status, addr, misc,
+        if (cenv->cpu_index == cpu_index) {
+            cpu_x86_inject_mce(mon, cenv, bank, status, mcg_status, addr, misc,
                                broadcast);
             break;
         }
diff --git a/qemu-common.h b/qemu-common.h
index 40dad52..3ce6b6c 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -18,6 +18,9 @@ typedef struct QEMUFile QEMUFile;
 typedef struct QEMUBH QEMUBH;
 typedef struct DeviceState DeviceState;
 
+struct Monitor;
+typedef struct Monitor Monitor;
+
 /* we put basic includes here to avoid repeating them in device drivers */
 #include <stdlib.h>
 #include <stdio.h>
@@ -326,9 +329,6 @@ void qemu_iovec_memset(QEMUIOVector *qiov, int c, size_t count);
 void qemu_iovec_memset_skip(QEMUIOVector *qiov, int c, size_t count,
                             size_t skip);
 
-struct Monitor;
-typedef struct Monitor Monitor;
-
 /* Convert a byte between binary and BCD.  */
 static inline uint8_t to_bcd(uint8_t val)
 {
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 52bb48e..486af1d 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -987,8 +987,8 @@ static inline void cpu_get_tb_cpu_state(CPUState *env, target_ulong *pc,
 void do_cpu_init(CPUState *env);
 void do_cpu_sipi(CPUState *env);
 
-void cpu_x86_inject_mce(CPUState *cenv, int bank, uint64_t status,
-                        uint64_t mcg_status, uint64_t addr, uint64_t misc,
-                        int broadcast);
+void cpu_x86_inject_mce(Monitor *mon, CPUState *cenv, int bank,
+                        uint64_t status, uint64_t mcg_status, uint64_t addr,
+                        uint64_t misc, int broadcast);
 
 #endif /* CPU_I386_H */
diff --git a/target-i386/helper.c b/target-i386/helper.c
index ba3bed9..462d332 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -30,6 +30,7 @@
 #include "kvm_x86.h"
 #ifndef CONFIG_USER_ONLY
 #include "sysemu.h"
+#include "monitor.h"
 #endif
 
 //#define DEBUG_MMU
@@ -1067,33 +1068,38 @@ static void breakpoint_handler(CPUState *env)
 }
 
 static void
-qemu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
+qemu_inject_x86_mce(Monitor *mon, CPUState *cenv, int bank, uint64_t status,
                     uint64_t mcg_status, uint64_t addr, uint64_t misc)
 {
     uint64_t mcg_cap = cenv->mcg_cap;
-    uint64_t *banks = cenv->mce_banks;
-
-    /*
-     * if MSR_MCG_CTL is not all 1s, the uncorrected error
-     * reporting is disabled
-     */
-    if ((status & MCI_STATUS_UC) && (mcg_cap & MCG_CTL_P) &&
-        cenv->mcg_ctl != ~(uint64_t)0) {
-        return;
-    }
-    banks += 4 * bank;
-    /*
-     * if MSR_MCi_CTL is not all 1s, the uncorrected error
-     * reporting is disabled for the bank
-     */
-    if ((status & MCI_STATUS_UC) && banks[0] != ~(uint64_t)0) {
-        return;
-    }
+    uint64_t *banks = cenv->mce_banks + 4 * bank;
+
     if (status & MCI_STATUS_UC) {
+        /*
+         * if MSR_MCG_CTL is not all 1s, the uncorrected error
+         * reporting is disabled
+         */
+        if ((mcg_cap & MCG_CTL_P) && cenv->mcg_ctl != ~(uint64_t)0) {
+            monitor_printf(mon,
+                           "CPU %d: Uncorrected error reporting disabled\n",
+                           cenv->cpu_index);
+            return;
+        }
+
+        /*
+         * if MSR_MCi_CTL is not all 1s, the uncorrected error
+         * reporting is disabled for the bank
+         */
+        if (banks[0] != ~(uint64_t)0) {
+            monitor_printf(mon, "CPU %d: Uncorrected error reporting disabled "
+                           "for bank %d\n", cenv->cpu_index, bank);
+            return;
+        }
+
         if ((cenv->mcg_status & MCG_STATUS_MCIP) ||
             !(cenv->cr[4] & CR4_MCE_MASK)) {
-            fprintf(stderr, "injects mce exception while previous "
-                    "one is in progress!\n");
+            monitor_printf(mon, "CPU %d: Previous MCE still in progress, "
+                                "raising triple fault\n", cenv->cpu_index);
             qemu_log_mask(CPU_LOG_RESET, "Triple fault\n");
             qemu_system_reset_request();
             return;
@@ -1119,23 +1125,29 @@ qemu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
     }
 }
 
-void cpu_x86_inject_mce(CPUState *cenv, int bank, uint64_t status,
-                        uint64_t mcg_status, uint64_t addr, uint64_t misc,
-                        int broadcast)
+void cpu_x86_inject_mce(Monitor *mon, CPUState *cenv, int bank,
+                        uint64_t status, uint64_t mcg_status, uint64_t addr,
+                        uint64_t misc, int broadcast)
 {
     unsigned bank_num = cenv->mcg_cap & 0xff;
     CPUState *env;
     int flag = 0;
 
-    if (bank >= bank_num || !(status & MCI_STATUS_VAL)) {
+    if (!cenv->mcg_cap) {
+        monitor_printf(mon, "MCE injection not supported\n");
         return;
     }
-
-    if (broadcast) {
-        if (!cpu_x86_support_mca_broadcast(cenv)) {
-            fprintf(stderr, "Current CPU does not support broadcast\n");
-            return;
-        }
+    if (bank >= bank_num) {
+        monitor_printf(mon, "Invalid MCE bank number\n");
+        return;
+    }
+    if (!(status & MCI_STATUS_VAL)) {
+        monitor_printf(mon, "Invalid MCE status code\n");
+        return;
+    }
+    if (broadcast && !cpu_x86_support_mca_broadcast(cenv)) {
+        monitor_printf(mon, "Guest CPU does not support MCA broadcast\n");
+        return;
     }
 
     if (kvm_enabled()) {
@@ -1145,13 +1157,14 @@ void cpu_x86_inject_mce(CPUState *cenv, int bank, uint64_t status,
 
         kvm_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc, flag);
     } else {
-        qemu_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc);
+        qemu_inject_x86_mce(mon, cenv, bank, status, mcg_status, addr, misc);
         if (broadcast) {
             for (env = first_cpu; env != NULL; env = env->next_cpu) {
                 if (cenv == env) {
                     continue;
                 }
-                qemu_inject_x86_mce(env, 1, MCI_STATUS_VAL | MCI_STATUS_UC,
+                qemu_inject_x86_mce(mon, env, 1,
+                                    MCI_STATUS_VAL | MCI_STATUS_UC,
                                     MCG_STATUS_MCIP | MCG_STATUS_RIPV, 0, 0);
             }
         }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v3 07/17] x86: Optionally avoid injecting AO MCEs while others are pending
  2011-03-02  7:56 [Qemu-devel] [PATCH v3 00/17] [uq/master] Patch queue, part IV (MCE edition) Jan Kiszka
                   ` (5 preceding siblings ...)
  2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 06/17] x86: Refine error reporting of MCE injection services Jan Kiszka
@ 2011-03-02  7:56 ` Jan Kiszka
  2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 08/17] Synchronize VCPU states before reset Jan Kiszka
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Jan Kiszka @ 2011-03-02  7:56 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Hidetoshi Seto, Jin Dongming, qemu-devel, kvm, Huang Ying

Allow to tell cpu_x86_inject_mce that it should ignore Action Optional
MCE events when the target VCPU is still processing another one. This
will be used by KVM soon.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Huang Ying <ying.huang@intel.com>
CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
---
 monitor.c            |    7 +++++--
 target-i386/cpu.h    |    5 ++++-
 target-i386/helper.c |   26 +++++++++++++++++++-------
 3 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/monitor.c b/monitor.c
index 662df7c..ae20927 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2709,12 +2709,15 @@ static void do_inject_mce(Monitor *mon, const QDict *qdict)
     uint64_t mcg_status = qdict_get_int(qdict, "mcg_status");
     uint64_t addr = qdict_get_int(qdict, "addr");
     uint64_t misc = qdict_get_int(qdict, "misc");
-    int broadcast = qdict_get_try_bool(qdict, "broadcast", 0);
+    int flags = MCE_INJECT_UNCOND_AO;
 
+    if (qdict_get_try_bool(qdict, "broadcast", 0)) {
+        flags |= MCE_INJECT_BROADCAST;
+    }
     for (cenv = first_cpu; cenv != NULL; cenv = cenv->next_cpu) {
         if (cenv->cpu_index == cpu_index) {
             cpu_x86_inject_mce(mon, cenv, bank, status, mcg_status, addr, misc,
-                               broadcast);
+                               flags);
             break;
         }
     }
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 486af1d..d0eae75 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -987,8 +987,11 @@ static inline void cpu_get_tb_cpu_state(CPUState *env, target_ulong *pc,
 void do_cpu_init(CPUState *env);
 void do_cpu_sipi(CPUState *env);
 
+#define MCE_INJECT_BROADCAST    1
+#define MCE_INJECT_UNCOND_AO    2
+
 void cpu_x86_inject_mce(Monitor *mon, CPUState *cenv, int bank,
                         uint64_t status, uint64_t mcg_status, uint64_t addr,
-                        uint64_t misc, int broadcast);
+                        uint64_t misc, int flags);
 
 #endif /* CPU_I386_H */
diff --git a/target-i386/helper.c b/target-i386/helper.c
index 462d332..e3ef40c 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -1069,11 +1069,20 @@ static void breakpoint_handler(CPUState *env)
 
 static void
 qemu_inject_x86_mce(Monitor *mon, CPUState *cenv, int bank, uint64_t status,
-                    uint64_t mcg_status, uint64_t addr, uint64_t misc)
+                    uint64_t mcg_status, uint64_t addr, uint64_t misc,
+                    int flags)
 {
     uint64_t mcg_cap = cenv->mcg_cap;
     uint64_t *banks = cenv->mce_banks + 4 * bank;
 
+    /*
+     * If there is an MCE exception being processed, ignore this SRAO MCE
+     * unless unconditional injection was requested.
+     */
+    if (!(flags & MCE_INJECT_UNCOND_AO) && !(status & MCI_STATUS_AR)
+        && (cenv->mcg_status & MCG_STATUS_MCIP)) {
+        return;
+    }
     if (status & MCI_STATUS_UC) {
         /*
          * if MSR_MCG_CTL is not all 1s, the uncorrected error
@@ -1127,7 +1136,7 @@ qemu_inject_x86_mce(Monitor *mon, CPUState *cenv, int bank, uint64_t status,
 
 void cpu_x86_inject_mce(Monitor *mon, CPUState *cenv, int bank,
                         uint64_t status, uint64_t mcg_status, uint64_t addr,
-                        uint64_t misc, int broadcast)
+                        uint64_t misc, int flags)
 {
     unsigned bank_num = cenv->mcg_cap & 0xff;
     CPUState *env;
@@ -1145,27 +1154,30 @@ void cpu_x86_inject_mce(Monitor *mon, CPUState *cenv, int bank,
         monitor_printf(mon, "Invalid MCE status code\n");
         return;
     }
-    if (broadcast && !cpu_x86_support_mca_broadcast(cenv)) {
+    if ((flags & MCE_INJECT_BROADCAST)
+        && !cpu_x86_support_mca_broadcast(cenv)) {
         monitor_printf(mon, "Guest CPU does not support MCA broadcast\n");
         return;
     }
 
     if (kvm_enabled()) {
-        if (broadcast) {
+        if (flags & MCE_INJECT_BROADCAST) {
             flag |= MCE_BROADCAST;
         }
 
         kvm_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc, flag);
     } else {
-        qemu_inject_x86_mce(mon, cenv, bank, status, mcg_status, addr, misc);
-        if (broadcast) {
+        qemu_inject_x86_mce(mon, cenv, bank, status, mcg_status, addr, misc,
+                            flags);
+        if (flags & MCE_INJECT_BROADCAST) {
             for (env = first_cpu; env != NULL; env = env->next_cpu) {
                 if (cenv == env) {
                     continue;
                 }
                 qemu_inject_x86_mce(mon, env, 1,
                                     MCI_STATUS_VAL | MCI_STATUS_UC,
-                                    MCG_STATUS_MCIP | MCG_STATUS_RIPV, 0, 0);
+                                    MCG_STATUS_MCIP | MCG_STATUS_RIPV, 0, 0,
+                                    flags);
             }
         }
     }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v3 08/17] Synchronize VCPU states before reset
  2011-03-02  7:56 [Qemu-devel] [PATCH v3 00/17] [uq/master] Patch queue, part IV (MCE edition) Jan Kiszka
                   ` (6 preceding siblings ...)
  2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 07/17] x86: Optionally avoid injecting AO MCEs while others are pending Jan Kiszka
@ 2011-03-02  7:56 ` Jan Kiszka
  2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 09/17] kvm: x86: Move MCE functions together Jan Kiszka
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Jan Kiszka @ 2011-03-02  7:56 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: qemu-devel, kvm

This is required to support keeping VCPU states across a system reset.
If we do not read the current state before the reset,
cpu_synchronize_all_post_reset may write back incorrect state
information.

The first user of this will be MCE MSR synchronization which currently
works around the missing cpu_synchronize_all_states.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 vl.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/vl.c b/vl.c
index b436952..7751843 100644
--- a/vl.c
+++ b/vl.c
@@ -1452,6 +1452,7 @@ static void main_loop(void)
         }
         if (qemu_reset_requested()) {
             pause_all_vcpus();
+            cpu_synchronize_all_states();
             qemu_system_reset();
             resume_all_vcpus();
         }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v3 09/17] kvm: x86: Move MCE functions together
  2011-03-02  7:56 [Qemu-devel] [PATCH v3 00/17] [uq/master] Patch queue, part IV (MCE edition) Jan Kiszka
                   ` (7 preceding siblings ...)
  2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 08/17] Synchronize VCPU states before reset Jan Kiszka
@ 2011-03-02  7:56 ` Jan Kiszka
  2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 10/17] kvm: Rename kvm_arch_process_irqchip_events to async_events Jan Kiszka
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Jan Kiszka @ 2011-03-02  7:56 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Hidetoshi Seto, Jin Dongming, qemu-devel, kvm, Huang Ying

Pure function suffling to avoid multiple #ifdef KVM_CAP_MCE sections,
no functional changes. While at it, annotate some #ifdef sections.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Huang Ying <ying.huang@intel.com>
CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
---
 target-i386/kvm.c |  346 ++++++++++++++++++++++++++---------------------------
 1 files changed, 171 insertions(+), 175 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 0aa0a41..f909661 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -172,7 +172,7 @@ static int get_para_features(CPUState *env)
 #endif
     return features;
 }
-#endif
+#endif /* CONFIG_KVM_PARA */
 
 #ifdef KVM_CAP_MCE
 static int kvm_get_mce_cap_supported(KVMState *s, uint64_t *mce_cap,
@@ -273,8 +273,174 @@ static void kvm_inject_x86_mce_on(CPUState *env, struct kvm_x86_mce *mce,
     run_on_cpu(env, kvm_do_inject_x86_mce, &data);
 }
 
-static void kvm_mce_broadcast_rest(CPUState *env);
-#endif
+static void kvm_mce_broadcast_rest(CPUState *env)
+{
+    struct kvm_x86_mce mce = {
+        .bank = 1,
+        .status = MCI_STATUS_VAL | MCI_STATUS_UC,
+        .mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV,
+        .addr = 0,
+        .misc = 0,
+    };
+    CPUState *cenv;
+
+    /* Broadcast MCA signal for processor version 06H_EH and above */
+    if (cpu_x86_support_mca_broadcast(env)) {
+        for (cenv = first_cpu; cenv != NULL; cenv = cenv->next_cpu) {
+            if (cenv == env) {
+                continue;
+            }
+            kvm_inject_x86_mce_on(cenv, &mce, ABORT_ON_ERROR);
+        }
+    }
+}
+
+static void kvm_mce_inj_srar_dataload(CPUState *env, target_phys_addr_t paddr)
+{
+    struct kvm_x86_mce mce = {
+        .bank = 9,
+        .status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
+                  | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
+                  | MCI_STATUS_AR | 0x134,
+        .mcg_status = MCG_STATUS_MCIP | MCG_STATUS_EIPV,
+        .addr = paddr,
+        .misc = (MCM_ADDR_PHYS << 6) | 0xc,
+    };
+    int r;
+
+    r = kvm_set_mce(env, &mce);
+    if (r < 0) {
+        fprintf(stderr, "kvm_set_mce: %s\n", strerror(errno));
+        abort();
+    }
+    kvm_mce_broadcast_rest(env);
+}
+
+static void kvm_mce_inj_srao_memscrub(CPUState *env, target_phys_addr_t paddr)
+{
+    struct kvm_x86_mce mce = {
+        .bank = 9,
+        .status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
+                  | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
+                  | 0xc0,
+        .mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV,
+        .addr = paddr,
+        .misc = (MCM_ADDR_PHYS << 6) | 0xc,
+    };
+    int r;
+
+    r = kvm_set_mce(env, &mce);
+    if (r < 0) {
+        fprintf(stderr, "kvm_set_mce: %s\n", strerror(errno));
+        abort();
+    }
+    kvm_mce_broadcast_rest(env);
+}
+
+static void kvm_mce_inj_srao_memscrub2(CPUState *env, target_phys_addr_t paddr)
+{
+    struct kvm_x86_mce mce = {
+        .bank = 9,
+        .status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
+                  | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
+                  | 0xc0,
+        .mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV,
+        .addr = paddr,
+        .misc = (MCM_ADDR_PHYS << 6) | 0xc,
+    };
+
+    kvm_inject_x86_mce_on(env, &mce, ABORT_ON_ERROR);
+    kvm_mce_broadcast_rest(env);
+}
+#endif /* KVM_CAP_MCE */
+
+static void hardware_memory_error(void)
+{
+    fprintf(stderr, "Hardware memory error!\n");
+    exit(1);
+}
+
+int kvm_arch_on_sigbus_vcpu(CPUState *env, int code, void *addr)
+{
+#ifdef KVM_CAP_MCE
+    void *vaddr;
+    ram_addr_t ram_addr;
+    target_phys_addr_t paddr;
+
+    if ((env->mcg_cap & MCG_SER_P) && addr
+        && (code == BUS_MCEERR_AR
+            || code == BUS_MCEERR_AO)) {
+        vaddr = (void *)addr;
+        if (qemu_ram_addr_from_host(vaddr, &ram_addr) ||
+            !kvm_physical_memory_addr_from_ram(env->kvm_state, ram_addr, &paddr)) {
+            fprintf(stderr, "Hardware memory error for memory used by "
+                    "QEMU itself instead of guest system!\n");
+            /* Hope we are lucky for AO MCE */
+            if (code == BUS_MCEERR_AO) {
+                return 0;
+            } else {
+                hardware_memory_error();
+            }
+        }
+
+        if (code == BUS_MCEERR_AR) {
+            /* Fake an Intel architectural Data Load SRAR UCR */
+            kvm_mce_inj_srar_dataload(env, paddr);
+        } else {
+            /*
+             * If there is an MCE excpetion being processed, ignore
+             * this SRAO MCE
+             */
+            if (!kvm_mce_in_progress(env)) {
+                /* Fake an Intel architectural Memory scrubbing UCR */
+                kvm_mce_inj_srao_memscrub(env, paddr);
+            }
+        }
+    } else
+#endif /* KVM_CAP_MCE */
+    {
+        if (code == BUS_MCEERR_AO) {
+            return 0;
+        } else if (code == BUS_MCEERR_AR) {
+            hardware_memory_error();
+        } else {
+            return 1;
+        }
+    }
+    return 0;
+}
+
+int kvm_arch_on_sigbus(int code, void *addr)
+{
+#ifdef KVM_CAP_MCE
+    if ((first_cpu->mcg_cap & MCG_SER_P) && addr && code == BUS_MCEERR_AO) {
+        void *vaddr;
+        ram_addr_t ram_addr;
+        target_phys_addr_t paddr;
+
+        /* Hope we are lucky for AO MCE */
+        vaddr = addr;
+        if (qemu_ram_addr_from_host(vaddr, &ram_addr) ||
+            !kvm_physical_memory_addr_from_ram(first_cpu->kvm_state, ram_addr,
+                                               &paddr)) {
+            fprintf(stderr, "Hardware memory error for memory used by "
+                    "QEMU itself instead of guest system!: %p\n", addr);
+            return 0;
+        }
+        kvm_mce_inj_srao_memscrub2(first_cpu, paddr);
+    } else
+#endif /* KVM_CAP_MCE */
+    {
+        if (code == BUS_MCEERR_AO) {
+            return 0;
+        } else if (code == BUS_MCEERR_AR) {
+            hardware_memory_error();
+        } else {
+            return 1;
+        }
+    }
+    return 0;
+}
 
 void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
                         uint64_t mcg_status, uint64_t addr, uint64_t misc,
@@ -294,11 +460,11 @@ void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
     }
 
     kvm_inject_x86_mce_on(cenv, &mce, flag);
-#else
+#else /* !KVM_CAP_MCE*/
     if (flag & ABORT_ON_ERROR) {
         abort();
     }
-#endif
+#endif /* !KVM_CAP_MCE*/
 }
 
 static void cpu_update_state(void *opaque, int running, int reason)
@@ -1783,173 +1949,3 @@ bool kvm_arch_stop_on_emulation_error(CPUState *env)
     return !(env->cr[0] & CR0_PE_MASK) ||
            ((env->segs[R_CS].selector  & 3) != 3);
 }
-
-static void hardware_memory_error(void)
-{
-    fprintf(stderr, "Hardware memory error!\n");
-    exit(1);
-}
-
-#ifdef KVM_CAP_MCE
-static void kvm_mce_broadcast_rest(CPUState *env)
-{
-    struct kvm_x86_mce mce = {
-        .bank = 1,
-        .status = MCI_STATUS_VAL | MCI_STATUS_UC,
-        .mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV,
-        .addr = 0,
-        .misc = 0,
-    };
-    CPUState *cenv;
-
-    /* Broadcast MCA signal for processor version 06H_EH and above */
-    if (cpu_x86_support_mca_broadcast(env)) {
-        for (cenv = first_cpu; cenv != NULL; cenv = cenv->next_cpu) {
-            if (cenv == env) {
-                continue;
-            }
-            kvm_inject_x86_mce_on(cenv, &mce, ABORT_ON_ERROR);
-        }
-    }
-}
-
-static void kvm_mce_inj_srar_dataload(CPUState *env, target_phys_addr_t paddr)
-{
-    struct kvm_x86_mce mce = {
-        .bank = 9,
-        .status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
-                  | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
-                  | MCI_STATUS_AR | 0x134,
-        .mcg_status = MCG_STATUS_MCIP | MCG_STATUS_EIPV,
-        .addr = paddr,
-        .misc = (MCM_ADDR_PHYS << 6) | 0xc,
-    };
-    int r;
-
-    r = kvm_set_mce(env, &mce);
-    if (r < 0) {
-        fprintf(stderr, "kvm_set_mce: %s\n", strerror(errno));
-        abort();
-    }
-    kvm_mce_broadcast_rest(env);
-}
-
-static void kvm_mce_inj_srao_memscrub(CPUState *env, target_phys_addr_t paddr)
-{
-    struct kvm_x86_mce mce = {
-        .bank = 9,
-        .status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
-                  | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
-                  | 0xc0,
-        .mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV,
-        .addr = paddr,
-        .misc = (MCM_ADDR_PHYS << 6) | 0xc,
-    };
-    int r;
-
-    r = kvm_set_mce(env, &mce);
-    if (r < 0) {
-        fprintf(stderr, "kvm_set_mce: %s\n", strerror(errno));
-        abort();
-    }
-    kvm_mce_broadcast_rest(env);
-}
-
-static void kvm_mce_inj_srao_memscrub2(CPUState *env, target_phys_addr_t paddr)
-{
-    struct kvm_x86_mce mce = {
-        .bank = 9,
-        .status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
-                  | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
-                  | 0xc0,
-        .mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV,
-        .addr = paddr,
-        .misc = (MCM_ADDR_PHYS << 6) | 0xc,
-    };
-
-    kvm_inject_x86_mce_on(env, &mce, ABORT_ON_ERROR);
-    kvm_mce_broadcast_rest(env);
-}
-
-#endif
-
-int kvm_arch_on_sigbus_vcpu(CPUState *env, int code, void *addr)
-{
-#if defined(KVM_CAP_MCE)
-    void *vaddr;
-    ram_addr_t ram_addr;
-    target_phys_addr_t paddr;
-
-    if ((env->mcg_cap & MCG_SER_P) && addr
-        && (code == BUS_MCEERR_AR
-            || code == BUS_MCEERR_AO)) {
-        vaddr = (void *)addr;
-        if (qemu_ram_addr_from_host(vaddr, &ram_addr) ||
-            !kvm_physical_memory_addr_from_ram(env->kvm_state, ram_addr, &paddr)) {
-            fprintf(stderr, "Hardware memory error for memory used by "
-                    "QEMU itself instead of guest system!\n");
-            /* Hope we are lucky for AO MCE */
-            if (code == BUS_MCEERR_AO) {
-                return 0;
-            } else {
-                hardware_memory_error();
-            }
-        }
-
-        if (code == BUS_MCEERR_AR) {
-            /* Fake an Intel architectural Data Load SRAR UCR */
-            kvm_mce_inj_srar_dataload(env, paddr);
-        } else {
-            /*
-             * If there is an MCE excpetion being processed, ignore
-             * this SRAO MCE
-             */
-            if (!kvm_mce_in_progress(env)) {
-                /* Fake an Intel architectural Memory scrubbing UCR */
-                kvm_mce_inj_srao_memscrub(env, paddr);
-            }
-        }
-    } else
-#endif
-    {
-        if (code == BUS_MCEERR_AO) {
-            return 0;
-        } else if (code == BUS_MCEERR_AR) {
-            hardware_memory_error();
-        } else {
-            return 1;
-        }
-    }
-    return 0;
-}
-
-int kvm_arch_on_sigbus(int code, void *addr)
-{
-#if defined(KVM_CAP_MCE)
-    if ((first_cpu->mcg_cap & MCG_SER_P) && addr && code == BUS_MCEERR_AO) {
-        void *vaddr;
-        ram_addr_t ram_addr;
-        target_phys_addr_t paddr;
-
-        /* Hope we are lucky for AO MCE */
-        vaddr = addr;
-        if (qemu_ram_addr_from_host(vaddr, &ram_addr) ||
-            !kvm_physical_memory_addr_from_ram(first_cpu->kvm_state, ram_addr, &paddr)) {
-            fprintf(stderr, "Hardware memory error for memory used by "
-                    "QEMU itself instead of guest system!: %p\n", addr);
-            return 0;
-        }
-        kvm_mce_inj_srao_memscrub2(first_cpu, paddr);
-    } else
-#endif
-    {
-        if (code == BUS_MCEERR_AO) {
-            return 0;
-        } else if (code == BUS_MCEERR_AR) {
-            hardware_memory_error();
-        } else {
-            return 1;
-        }
-    }
-    return 0;
-}
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v3 10/17] kvm: Rename kvm_arch_process_irqchip_events to async_events
  2011-03-02  7:56 [Qemu-devel] [PATCH v3 00/17] [uq/master] Patch queue, part IV (MCE edition) Jan Kiszka
                   ` (8 preceding siblings ...)
  2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 09/17] kvm: x86: Move MCE functions together Jan Kiszka
@ 2011-03-02  7:56 ` Jan Kiszka
  2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 11/17] kvm: x86: Inject pending MCE events on state writeback Jan Kiszka
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Jan Kiszka @ 2011-03-02  7:56 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: qemu-devel, kvm

We will broaden the scope of this function on x86 beyond irqchip events.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 kvm-all.c          |    2 +-
 kvm.h              |    2 +-
 target-i386/kvm.c  |    2 +-
 target-ppc/kvm.c   |    2 +-
 target-s390x/kvm.c |    2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 7753c8a..226843c 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -893,7 +893,7 @@ int kvm_cpu_exec(CPUState *env)
 
     DPRINTF("kvm_cpu_exec()\n");
 
-    if (kvm_arch_process_irqchip_events(env)) {
+    if (kvm_arch_process_async_events(env)) {
         env->exit_request = 0;
         return EXCP_HLT;
     }
diff --git a/kvm.h b/kvm.h
index 59b2c29..7bc04e0 100644
--- a/kvm.h
+++ b/kvm.h
@@ -102,7 +102,7 @@ void kvm_arch_post_run(CPUState *env, struct kvm_run *run);
 
 int kvm_arch_handle_exit(CPUState *env, struct kvm_run *run);
 
-int kvm_arch_process_irqchip_events(CPUState *env);
+int kvm_arch_process_async_events(CPUState *env);
 
 int kvm_arch_get_registers(CPUState *env);
 
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index f909661..a416554 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -1675,7 +1675,7 @@ void kvm_arch_post_run(CPUState *env, struct kvm_run *run)
     cpu_set_apic_base(env->apic_state, run->apic_base);
 }
 
-int kvm_arch_process_irqchip_events(CPUState *env)
+int kvm_arch_process_async_events(CPUState *env)
 {
     if (kvm_irqchip_in_kernel()) {
         return 0;
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 3924f4b..6c99a16 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -259,7 +259,7 @@ void kvm_arch_post_run(CPUState *env, struct kvm_run *run)
 {
 }
 
-int kvm_arch_process_irqchip_events(CPUState *env)
+int kvm_arch_process_async_events(CPUState *env)
 {
     return 0;
 }
diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index b349812..5673a95 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -177,7 +177,7 @@ void kvm_arch_post_run(CPUState *env, struct kvm_run *run)
 {
 }
 
-int kvm_arch_process_irqchip_events(CPUState *env)
+int kvm_arch_process_async_events(CPUState *env)
 {
     return 0;
 }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v3 11/17] kvm: x86: Inject pending MCE events on state writeback
  2011-03-02  7:56 [Qemu-devel] [PATCH v3 00/17] [uq/master] Patch queue, part IV (MCE edition) Jan Kiszka
                   ` (9 preceding siblings ...)
  2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 10/17] kvm: Rename kvm_arch_process_irqchip_events to async_events Jan Kiszka
@ 2011-03-02  7:56 ` Jan Kiszka
  2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 12/17] x86: Run qemu_inject_x86_mce on target VCPU Jan Kiszka
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Jan Kiszka @ 2011-03-02  7:56 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Hidetoshi Seto, Jin Dongming, qemu-devel, kvm, Huang Ying

The current way of injecting MCE events without updating of and
synchronizing with the CPUState is broken and causes spurious
corruptions of the MCE-related parts of the CPUState.

As a first step towards a fix, enhance the state writeback code with
support for injecting events that are pending in the CPUState. A pending
exception will then be signaled via cpu_interrupt(CPU_INTERRUPT_MCE).
And, just like for TCG, we need to leave the halt state when
CPU_INTERRUPT_MCE is pending (left broken for the to-be-removed old KVM
code).

This will also allow to unify TCG and KVM injection code.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Huang Ying <ying.huang@intel.com>
CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
---
 target-i386/kvm.c |   60 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 60 insertions(+), 0 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index a416554..939edc8 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -467,6 +467,38 @@ void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
 #endif /* !KVM_CAP_MCE*/
 }
 
+static int kvm_inject_mce_oldstyle(CPUState *env)
+{
+#ifdef KVM_CAP_MCE
+    if (!kvm_has_vcpu_events() && env->exception_injected == EXCP12_MCHK) {
+        unsigned int bank, bank_num = env->mcg_cap & 0xff;
+        struct kvm_x86_mce mce;
+
+        env->exception_injected = -1;
+
+        /*
+         * There must be at least one bank in use if an MCE is pending.
+         * Find it and use its values for the event injection.
+         */
+        for (bank = 0; bank < bank_num; bank++) {
+            if (env->mce_banks[bank * 4 + 1] & MCI_STATUS_VAL) {
+                break;
+            }
+        }
+        assert(bank < bank_num);
+
+        mce.bank = bank;
+        mce.status = env->mce_banks[bank * 4 + 1];
+        mce.mcg_status = env->mcg_status;
+        mce.addr = env->mce_banks[bank * 4 + 2];
+        mce.misc = env->mce_banks[bank * 4 + 3];
+
+        return kvm_vcpu_ioctl(env, KVM_X86_SET_MCE, &mce);
+    }
+#endif /* KVM_CAP_MCE */
+    return 0;
+}
+
 static void cpu_update_state(void *opaque, int running, int reason)
 {
     CPUState *env = opaque;
@@ -1539,6 +1571,11 @@ int kvm_arch_put_registers(CPUState *env, int level)
     if (ret < 0) {
         return ret;
     }
+    /* must be before kvm_put_msrs */
+    ret = kvm_inject_mce_oldstyle(env);
+    if (ret < 0) {
+        return ret;
+    }
     ret = kvm_put_msrs(env, level);
     if (ret < 0) {
         return ret;
@@ -1677,6 +1714,29 @@ void kvm_arch_post_run(CPUState *env, struct kvm_run *run)
 
 int kvm_arch_process_async_events(CPUState *env)
 {
+    if (env->interrupt_request & CPU_INTERRUPT_MCE) {
+        /* We must not raise CPU_INTERRUPT_MCE if it's not supported. */
+        assert(env->mcg_cap);
+
+        env->interrupt_request &= ~CPU_INTERRUPT_MCE;
+
+        kvm_cpu_synchronize_state(env);
+
+        if (env->exception_injected == EXCP08_DBLE) {
+            /* this means triple fault */
+            qemu_system_reset_request();
+            env->exit_request = 1;
+            return 0;
+        }
+        env->exception_injected = EXCP12_MCHK;
+        env->has_error_code = 0;
+
+        env->halted = 0;
+        if (kvm_irqchip_in_kernel() && env->mp_state == KVM_MP_STATE_HALTED) {
+            env->mp_state = KVM_MP_STATE_RUNNABLE;
+        }
+    }
+
     if (kvm_irqchip_in_kernel()) {
         return 0;
     }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v3 12/17] x86: Run qemu_inject_x86_mce on target VCPU
  2011-03-02  7:56 [Qemu-devel] [PATCH v3 00/17] [uq/master] Patch queue, part IV (MCE edition) Jan Kiszka
                   ` (10 preceding siblings ...)
  2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 11/17] kvm: x86: Inject pending MCE events on state writeback Jan Kiszka
@ 2011-03-02  7:56 ` Jan Kiszka
  2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 13/17] kvm: x86: Consolidate TCG and KVM MCE injection code Jan Kiszka
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Jan Kiszka @ 2011-03-02  7:56 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: qemu-devel, kvm

We will use the current TCG-only MCE injection path for KVM as well, and
then this read-modify-write of the target VCPU state has to be performed
synchronously in the corresponding thread.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 target-i386/helper.c |   87 +++++++++++++++++++++++++++++++++----------------
 1 files changed, 58 insertions(+), 29 deletions(-)

diff --git a/target-i386/helper.c b/target-i386/helper.c
index e3ef40c..a32960c 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -1067,29 +1067,42 @@ static void breakpoint_handler(CPUState *env)
         prev_debug_excp_handler(env);
 }
 
-static void
-qemu_inject_x86_mce(Monitor *mon, CPUState *cenv, int bank, uint64_t status,
-                    uint64_t mcg_status, uint64_t addr, uint64_t misc,
-                    int flags)
+typedef struct MCEInjectionParams {
+    Monitor *mon;
+    CPUState *env;
+    int bank;
+    uint64_t status;
+    uint64_t mcg_status;
+    uint64_t addr;
+    uint64_t misc;
+    int flags;
+} MCEInjectionParams;
+
+static void do_inject_x86_mce(void *data)
 {
-    uint64_t mcg_cap = cenv->mcg_cap;
-    uint64_t *banks = cenv->mce_banks + 4 * bank;
+    MCEInjectionParams *params = data;
+    CPUState *cenv = params->env;
+    uint64_t *banks = cenv->mce_banks + 4 * params->bank;
+
+    cpu_synchronize_state(cenv);
 
     /*
      * If there is an MCE exception being processed, ignore this SRAO MCE
      * unless unconditional injection was requested.
      */
-    if (!(flags & MCE_INJECT_UNCOND_AO) && !(status & MCI_STATUS_AR)
+    if (!(params->flags & MCE_INJECT_UNCOND_AO)
+        && !(params->status & MCI_STATUS_AR)
         && (cenv->mcg_status & MCG_STATUS_MCIP)) {
         return;
     }
-    if (status & MCI_STATUS_UC) {
+
+    if (params->status & MCI_STATUS_UC) {
         /*
          * if MSR_MCG_CTL is not all 1s, the uncorrected error
          * reporting is disabled
          */
-        if ((mcg_cap & MCG_CTL_P) && cenv->mcg_ctl != ~(uint64_t)0) {
-            monitor_printf(mon,
+        if ((cenv->mcg_cap & MCG_CTL_P) && cenv->mcg_ctl != ~(uint64_t)0) {
+            monitor_printf(params->mon,
                            "CPU %d: Uncorrected error reporting disabled\n",
                            cenv->cpu_index);
             return;
@@ -1100,35 +1113,39 @@ qemu_inject_x86_mce(Monitor *mon, CPUState *cenv, int bank, uint64_t status,
          * reporting is disabled for the bank
          */
         if (banks[0] != ~(uint64_t)0) {
-            monitor_printf(mon, "CPU %d: Uncorrected error reporting disabled "
-                           "for bank %d\n", cenv->cpu_index, bank);
+            monitor_printf(params->mon,
+                           "CPU %d: Uncorrected error reporting disabled for"
+                           " bank %d\n",
+                           cenv->cpu_index, params->bank);
             return;
         }
 
         if ((cenv->mcg_status & MCG_STATUS_MCIP) ||
             !(cenv->cr[4] & CR4_MCE_MASK)) {
-            monitor_printf(mon, "CPU %d: Previous MCE still in progress, "
-                                "raising triple fault\n", cenv->cpu_index);
+            monitor_printf(params->mon,
+                           "CPU %d: Previous MCE still in progress, raising"
+                           " triple fault\n",
+                           cenv->cpu_index);
             qemu_log_mask(CPU_LOG_RESET, "Triple fault\n");
             qemu_system_reset_request();
             return;
         }
         if (banks[1] & MCI_STATUS_VAL) {
-            status |= MCI_STATUS_OVER;
+            params->status |= MCI_STATUS_OVER;
         }
-        banks[2] = addr;
-        banks[3] = misc;
-        cenv->mcg_status = mcg_status;
-        banks[1] = status;
+        banks[2] = params->addr;
+        banks[3] = params->misc;
+        cenv->mcg_status = params->mcg_status;
+        banks[1] = params->status;
         cpu_interrupt(cenv, CPU_INTERRUPT_MCE);
     } else if (!(banks[1] & MCI_STATUS_VAL)
                || !(banks[1] & MCI_STATUS_UC)) {
         if (banks[1] & MCI_STATUS_VAL) {
-            status |= MCI_STATUS_OVER;
+            params->status |= MCI_STATUS_OVER;
         }
-        banks[2] = addr;
-        banks[3] = misc;
-        banks[1] = status;
+        banks[2] = params->addr;
+        banks[3] = params->misc;
+        banks[1] = params->status;
     } else {
         banks[1] |= MCI_STATUS_OVER;
     }
@@ -1138,6 +1155,16 @@ void cpu_x86_inject_mce(Monitor *mon, CPUState *cenv, int bank,
                         uint64_t status, uint64_t mcg_status, uint64_t addr,
                         uint64_t misc, int flags)
 {
+    MCEInjectionParams params = {
+        .mon = mon,
+        .env = cenv,
+        .bank = bank,
+        .status = status,
+        .mcg_status = mcg_status,
+        .addr = addr,
+        .misc = misc,
+        .flags = flags,
+    };
     unsigned bank_num = cenv->mcg_cap & 0xff;
     CPUState *env;
     int flag = 0;
@@ -1167,17 +1194,19 @@ void cpu_x86_inject_mce(Monitor *mon, CPUState *cenv, int bank,
 
         kvm_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc, flag);
     } else {
-        qemu_inject_x86_mce(mon, cenv, bank, status, mcg_status, addr, misc,
-                            flags);
+        run_on_cpu(cenv, do_inject_x86_mce, &params);
         if (flags & MCE_INJECT_BROADCAST) {
+            params.bank = 1;
+            params.status = MCI_STATUS_VAL | MCI_STATUS_UC;
+            params.mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV;
+            params.addr = 0;
+            params.misc = 0;
             for (env = first_cpu; env != NULL; env = env->next_cpu) {
                 if (cenv == env) {
                     continue;
                 }
-                qemu_inject_x86_mce(mon, env, 1,
-                                    MCI_STATUS_VAL | MCI_STATUS_UC,
-                                    MCG_STATUS_MCIP | MCG_STATUS_RIPV, 0, 0,
-                                    flags);
+                params.env = env;
+                run_on_cpu(cenv, do_inject_x86_mce, &params);
             }
         }
     }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v3 13/17] kvm: x86: Consolidate TCG and KVM MCE injection code
  2011-03-02  7:56 [Qemu-devel] [PATCH v3 00/17] [uq/master] Patch queue, part IV (MCE edition) Jan Kiszka
                   ` (11 preceding siblings ...)
  2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 12/17] x86: Run qemu_inject_x86_mce on target VCPU Jan Kiszka
@ 2011-03-02  7:56 ` Jan Kiszka
  2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 14/17] kvm: x86: Clean up kvm_setup_mce Jan Kiszka
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Jan Kiszka @ 2011-03-02  7:56 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Hidetoshi Seto, Jin Dongming, qemu-devel, kvm, Huang Ying

This switches KVM's MCE injection path to cpu_x86_inject_mce, both for
SIGBUS and monitor initiated events. This means we prepare the MCA MSRs
in the VCPUState also for KVM.

We have to drop the MSRs writeback restrictions for this purpose which
is now safe as every uncoordinated MSR injection is removed with this
patch.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Huang Ying <ying.huang@intel.com>
CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
---
 target-i386/helper.c  |   34 +++-----
 target-i386/kvm.c     |  238 +++++-------------------------------------------
 target-i386/kvm_x86.h |   25 -----
 3 files changed, 37 insertions(+), 260 deletions(-)
 delete mode 100644 target-i386/kvm_x86.h

diff --git a/target-i386/helper.c b/target-i386/helper.c
index a32960c..a08309f 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -27,7 +27,6 @@
 #include "exec-all.h"
 #include "qemu-common.h"
 #include "kvm.h"
-#include "kvm_x86.h"
 #ifndef CONFIG_USER_ONLY
 #include "sysemu.h"
 #include "monitor.h"
@@ -1167,7 +1166,6 @@ void cpu_x86_inject_mce(Monitor *mon, CPUState *cenv, int bank,
     };
     unsigned bank_num = cenv->mcg_cap & 0xff;
     CPUState *env;
-    int flag = 0;
 
     if (!cenv->mcg_cap) {
         monitor_printf(mon, "MCE injection not supported\n");
@@ -1187,27 +1185,19 @@ void cpu_x86_inject_mce(Monitor *mon, CPUState *cenv, int bank,
         return;
     }
 
-    if (kvm_enabled()) {
-        if (flags & MCE_INJECT_BROADCAST) {
-            flag |= MCE_BROADCAST;
-        }
-
-        kvm_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc, flag);
-    } else {
-        run_on_cpu(cenv, do_inject_x86_mce, &params);
-        if (flags & MCE_INJECT_BROADCAST) {
-            params.bank = 1;
-            params.status = MCI_STATUS_VAL | MCI_STATUS_UC;
-            params.mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV;
-            params.addr = 0;
-            params.misc = 0;
-            for (env = first_cpu; env != NULL; env = env->next_cpu) {
-                if (cenv == env) {
-                    continue;
-                }
-                params.env = env;
-                run_on_cpu(cenv, do_inject_x86_mce, &params);
+    run_on_cpu(cenv, do_inject_x86_mce, &params);
+    if (flags & MCE_INJECT_BROADCAST) {
+        params.bank = 1;
+        params.status = MCI_STATUS_VAL | MCI_STATUS_UC;
+        params.mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV;
+        params.addr = 0;
+        params.misc = 0;
+        for (env = first_cpu; env != NULL; env = env->next_cpu) {
+            if (cenv == env) {
+                continue;
             }
+            params.env = env;
+            run_on_cpu(cenv, do_inject_x86_mce, &params);
         }
     }
 }
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 939edc8..be896dd 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -28,7 +28,6 @@
 #include "hw/pc.h"
 #include "hw/apic.h"
 #include "ioport.h"
-#include "kvm_x86.h"
 
 #ifdef CONFIG_KVM_PARA
 #include <linux/kvm_para.h>
@@ -193,164 +192,23 @@ static int kvm_setup_mce(CPUState *env, uint64_t *mcg_cap)
     return kvm_vcpu_ioctl(env, KVM_X86_SETUP_MCE, mcg_cap);
 }
 
-static int kvm_set_mce(CPUState *env, struct kvm_x86_mce *m)
+static void kvm_mce_inject(CPUState *env, target_phys_addr_t paddr, int code)
 {
-    return kvm_vcpu_ioctl(env, KVM_X86_SET_MCE, m);
-}
-
-static int kvm_get_msr(CPUState *env, struct kvm_msr_entry *msrs, int n)
-{
-    struct kvm_msrs *kmsrs = qemu_malloc(sizeof *kmsrs + n * sizeof *msrs);
-    int r;
-
-    kmsrs->nmsrs = n;
-    memcpy(kmsrs->entries, msrs, n * sizeof *msrs);
-    r = kvm_vcpu_ioctl(env, KVM_GET_MSRS, kmsrs);
-    memcpy(msrs, kmsrs->entries, n * sizeof *msrs);
-    free(kmsrs);
-    return r;
-}
-
-/* FIXME: kill this and kvm_get_msr, use env->mcg_status instead */
-static int kvm_mce_in_progress(CPUState *env)
-{
-    struct kvm_msr_entry msr_mcg_status = {
-        .index = MSR_MCG_STATUS,
-    };
-    int r;
-
-    r = kvm_get_msr(env, &msr_mcg_status, 1);
-    if (r == -1 || r == 0) {
-        fprintf(stderr, "Failed to get MCE status\n");
-        return 0;
-    }
-    return !!(msr_mcg_status.data & MCG_STATUS_MCIP);
-}
-
-struct kvm_x86_mce_data
-{
-    CPUState *env;
-    struct kvm_x86_mce *mce;
-    int abort_on_error;
-};
-
-static void kvm_do_inject_x86_mce(void *_data)
-{
-    struct kvm_x86_mce_data *data = _data;
-    int r;
-
-    /* If there is an MCE exception being processed, ignore this SRAO MCE */
-    if ((data->env->mcg_cap & MCG_SER_P) &&
-        !(data->mce->status & MCI_STATUS_AR)) {
-        if (kvm_mce_in_progress(data->env)) {
-            return;
-        }
-    }
-
-    r = kvm_set_mce(data->env, data->mce);
-    if (r < 0) {
-        perror("kvm_set_mce FAILED");
-        if (data->abort_on_error) {
-            abort();
-        }
-    }
-}
-
-static void kvm_inject_x86_mce_on(CPUState *env, struct kvm_x86_mce *mce,
-                                  int flag)
-{
-    struct kvm_x86_mce_data data = {
-        .env = env,
-        .mce = mce,
-        .abort_on_error = (flag & ABORT_ON_ERROR),
-    };
-
-    if (!env->mcg_cap) {
-        fprintf(stderr, "MCE support is not enabled!\n");
-        return;
-    }
-
-    run_on_cpu(env, kvm_do_inject_x86_mce, &data);
-}
+    uint64_t status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN |
+                      MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S;
+    uint64_t mcg_status = MCG_STATUS_MCIP;
 
-static void kvm_mce_broadcast_rest(CPUState *env)
-{
-    struct kvm_x86_mce mce = {
-        .bank = 1,
-        .status = MCI_STATUS_VAL | MCI_STATUS_UC,
-        .mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV,
-        .addr = 0,
-        .misc = 0,
-    };
-    CPUState *cenv;
-
-    /* Broadcast MCA signal for processor version 06H_EH and above */
-    if (cpu_x86_support_mca_broadcast(env)) {
-        for (cenv = first_cpu; cenv != NULL; cenv = cenv->next_cpu) {
-            if (cenv == env) {
-                continue;
-            }
-            kvm_inject_x86_mce_on(cenv, &mce, ABORT_ON_ERROR);
-        }
-    }
-}
-
-static void kvm_mce_inj_srar_dataload(CPUState *env, target_phys_addr_t paddr)
-{
-    struct kvm_x86_mce mce = {
-        .bank = 9,
-        .status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
-                  | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
-                  | MCI_STATUS_AR | 0x134,
-        .mcg_status = MCG_STATUS_MCIP | MCG_STATUS_EIPV,
-        .addr = paddr,
-        .misc = (MCM_ADDR_PHYS << 6) | 0xc,
-    };
-    int r;
-
-    r = kvm_set_mce(env, &mce);
-    if (r < 0) {
-        fprintf(stderr, "kvm_set_mce: %s\n", strerror(errno));
-        abort();
-    }
-    kvm_mce_broadcast_rest(env);
-}
-
-static void kvm_mce_inj_srao_memscrub(CPUState *env, target_phys_addr_t paddr)
-{
-    struct kvm_x86_mce mce = {
-        .bank = 9,
-        .status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
-                  | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
-                  | 0xc0,
-        .mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV,
-        .addr = paddr,
-        .misc = (MCM_ADDR_PHYS << 6) | 0xc,
-    };
-    int r;
-
-    r = kvm_set_mce(env, &mce);
-    if (r < 0) {
-        fprintf(stderr, "kvm_set_mce: %s\n", strerror(errno));
-        abort();
+    if (code == BUS_MCEERR_AR) {
+        status |= MCI_STATUS_AR | 0x134;
+        mcg_status |= MCG_STATUS_EIPV;
+    } else {
+        status |= 0xc0;
+        mcg_status |= MCG_STATUS_RIPV;
     }
-    kvm_mce_broadcast_rest(env);
-}
-
-static void kvm_mce_inj_srao_memscrub2(CPUState *env, target_phys_addr_t paddr)
-{
-    struct kvm_x86_mce mce = {
-        .bank = 9,
-        .status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
-                  | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
-                  | 0xc0,
-        .mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV,
-        .addr = paddr,
-        .misc = (MCM_ADDR_PHYS << 6) | 0xc,
-    };
-
-    kvm_inject_x86_mce_on(env, &mce, ABORT_ON_ERROR);
-    kvm_mce_broadcast_rest(env);
+    cpu_x86_inject_mce(NULL, env, 9, status, mcg_status, paddr,
+                       (MCM_ADDR_PHYS << 6) | 0xc,
+                       cpu_x86_support_mca_broadcast(env) ?
+                       MCE_INJECT_BROADCAST : 0);
 }
 #endif /* KVM_CAP_MCE */
 
@@ -363,16 +221,14 @@ static void hardware_memory_error(void)
 int kvm_arch_on_sigbus_vcpu(CPUState *env, int code, void *addr)
 {
 #ifdef KVM_CAP_MCE
-    void *vaddr;
     ram_addr_t ram_addr;
     target_phys_addr_t paddr;
 
     if ((env->mcg_cap & MCG_SER_P) && addr
-        && (code == BUS_MCEERR_AR
-            || code == BUS_MCEERR_AO)) {
-        vaddr = (void *)addr;
-        if (qemu_ram_addr_from_host(vaddr, &ram_addr) ||
-            !kvm_physical_memory_addr_from_ram(env->kvm_state, ram_addr, &paddr)) {
+        && (code == BUS_MCEERR_AR || code == BUS_MCEERR_AO)) {
+        if (qemu_ram_addr_from_host(addr, &ram_addr) ||
+            !kvm_physical_memory_addr_from_ram(env->kvm_state, ram_addr,
+                                               &paddr)) {
             fprintf(stderr, "Hardware memory error for memory used by "
                     "QEMU itself instead of guest system!\n");
             /* Hope we are lucky for AO MCE */
@@ -382,20 +238,7 @@ int kvm_arch_on_sigbus_vcpu(CPUState *env, int code, void *addr)
                 hardware_memory_error();
             }
         }
-
-        if (code == BUS_MCEERR_AR) {
-            /* Fake an Intel architectural Data Load SRAR UCR */
-            kvm_mce_inj_srar_dataload(env, paddr);
-        } else {
-            /*
-             * If there is an MCE excpetion being processed, ignore
-             * this SRAO MCE
-             */
-            if (!kvm_mce_in_progress(env)) {
-                /* Fake an Intel architectural Memory scrubbing UCR */
-                kvm_mce_inj_srao_memscrub(env, paddr);
-            }
-        }
+        kvm_mce_inject(env, paddr, code);
     } else
 #endif /* KVM_CAP_MCE */
     {
@@ -414,20 +257,18 @@ int kvm_arch_on_sigbus(int code, void *addr)
 {
 #ifdef KVM_CAP_MCE
     if ((first_cpu->mcg_cap & MCG_SER_P) && addr && code == BUS_MCEERR_AO) {
-        void *vaddr;
         ram_addr_t ram_addr;
         target_phys_addr_t paddr;
 
         /* Hope we are lucky for AO MCE */
-        vaddr = addr;
-        if (qemu_ram_addr_from_host(vaddr, &ram_addr) ||
+        if (qemu_ram_addr_from_host(addr, &ram_addr) ||
             !kvm_physical_memory_addr_from_ram(first_cpu->kvm_state, ram_addr,
                                                &paddr)) {
             fprintf(stderr, "Hardware memory error for memory used by "
                     "QEMU itself instead of guest system!: %p\n", addr);
             return 0;
         }
-        kvm_mce_inj_srao_memscrub2(first_cpu, paddr);
+        kvm_mce_inject(first_cpu, paddr, code);
     } else
 #endif /* KVM_CAP_MCE */
     {
@@ -442,31 +283,6 @@ int kvm_arch_on_sigbus(int code, void *addr)
     return 0;
 }
 
-void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
-                        uint64_t mcg_status, uint64_t addr, uint64_t misc,
-                        int flag)
-{
-#ifdef KVM_CAP_MCE
-    struct kvm_x86_mce mce = {
-        .bank = bank,
-        .status = status,
-        .mcg_status = mcg_status,
-        .addr = addr,
-        .misc = misc,
-    };
-
-    if (flag & MCE_BROADCAST) {
-        kvm_mce_broadcast_rest(cenv);
-    }
-
-    kvm_inject_x86_mce_on(cenv, &mce, flag);
-#else /* !KVM_CAP_MCE*/
-    if (flag & ABORT_ON_ERROR) {
-        abort();
-    }
-#endif /* !KVM_CAP_MCE*/
-}
-
 static int kvm_inject_mce_oldstyle(CPUState *env)
 {
 #ifdef KVM_CAP_MCE
@@ -1053,14 +869,10 @@ static int kvm_put_msrs(CPUState *env, int level)
     if (env->mcg_cap) {
         int i;
 
-        if (level == KVM_PUT_RESET_STATE) {
-            kvm_msr_entry_set(&msrs[n++], MSR_MCG_STATUS, env->mcg_status);
-        } else if (level == KVM_PUT_FULL_STATE) {
-            kvm_msr_entry_set(&msrs[n++], MSR_MCG_STATUS, env->mcg_status);
-            kvm_msr_entry_set(&msrs[n++], MSR_MCG_CTL, env->mcg_ctl);
-            for (i = 0; i < (env->mcg_cap & 0xff) * 4; i++) {
-                kvm_msr_entry_set(&msrs[n++], MSR_MC0_CTL + i, env->mce_banks[i]);
-            }
+        kvm_msr_entry_set(&msrs[n++], MSR_MCG_STATUS, env->mcg_status);
+        kvm_msr_entry_set(&msrs[n++], MSR_MCG_CTL, env->mcg_ctl);
+        for (i = 0; i < (env->mcg_cap & 0xff) * 4; i++) {
+            kvm_msr_entry_set(&msrs[n++], MSR_MC0_CTL + i, env->mce_banks[i]);
         }
     }
 #endif
diff --git a/target-i386/kvm_x86.h b/target-i386/kvm_x86.h
deleted file mode 100644
index 9d7b584..0000000
--- a/target-i386/kvm_x86.h
+++ /dev/null
@@ -1,25 +0,0 @@
-/*
- * QEMU KVM support
- *
- * Copyright (C) 2009 Red Hat Inc.
- * Copyright IBM, Corp. 2008
- *
- * Authors:
- *  Anthony Liguori   <aliguori@us.ibm.com>
- *
- * This work is licensed under the terms of the GNU GPL, version 2 or later.
- * See the COPYING file in the top-level directory.
- *
- */
-
-#ifndef __KVM_X86_H__
-#define __KVM_X86_H__
-
-#define ABORT_ON_ERROR  0x01
-#define MCE_BROADCAST   0x02
-
-void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
-                        uint64_t mcg_status, uint64_t addr, uint64_t misc,
-                        int flag);
-
-#endif
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v3 14/17] kvm: x86: Clean up kvm_setup_mce
  2011-03-02  7:56 [Qemu-devel] [PATCH v3 00/17] [uq/master] Patch queue, part IV (MCE edition) Jan Kiszka
                   ` (12 preceding siblings ...)
  2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 13/17] kvm: x86: Consolidate TCG and KVM MCE injection code Jan Kiszka
@ 2011-03-02  7:56 ` Jan Kiszka
  2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 15/17] kvm: x86: Fail kvm_arch_init_vcpu if MCE initialization fails Jan Kiszka
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Jan Kiszka @ 2011-03-02  7:56 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Hidetoshi Seto, Jin Dongming, qemu-devel, kvm, Huang Ying

There is nothing to abstract here. Fold kvm_setup_mce into its caller
and fix up the error reporting (return code of kvm_vcpu_ioctl holds the
error value).

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Huang Ying <ying.huang@intel.com>
CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
---
 target-i386/kvm.c |   11 ++++-------
 1 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index be896dd..486efb9 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -187,11 +187,6 @@ static int kvm_get_mce_cap_supported(KVMState *s, uint64_t *mce_cap,
     return -ENOSYS;
 }
 
-static int kvm_setup_mce(CPUState *env, uint64_t *mcg_cap)
-{
-    return kvm_vcpu_ioctl(env, KVM_X86_SETUP_MCE, mcg_cap);
-}
-
 static void kvm_mce_inject(CPUState *env, target_phys_addr_t paddr, int code)
 {
     uint64_t status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN |
@@ -440,6 +435,7 @@ int kvm_arch_init_vcpu(CPUState *env)
         && kvm_check_extension(env->kvm_state, KVM_CAP_MCE) > 0) {
         uint64_t mcg_cap;
         int banks;
+        int ret;
 
         if (kvm_get_mce_cap_supported(env->kvm_state, &mcg_cap, &banks)) {
             perror("kvm_get_mce_cap_supported FAILED");
@@ -448,8 +444,9 @@ int kvm_arch_init_vcpu(CPUState *env)
                 banks = MCE_BANKS_DEF;
             mcg_cap &= MCE_CAP_DEF;
             mcg_cap |= banks;
-            if (kvm_setup_mce(env, &mcg_cap)) {
-                perror("kvm_setup_mce FAILED");
+            ret = kvm_vcpu_ioctl(env, KVM_X86_SETUP_MCE, &mcg_cap);
+            if (ret < 0) {
+                fprintf(stderr, "KVM_X86_SETUP_MCE: %s", strerror(-ret));
             } else {
                 env->mcg_cap = mcg_cap;
             }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v3 15/17] kvm: x86: Fail kvm_arch_init_vcpu if MCE initialization fails
  2011-03-02  7:56 [Qemu-devel] [PATCH v3 00/17] [uq/master] Patch queue, part IV (MCE edition) Jan Kiszka
                   ` (13 preceding siblings ...)
  2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 14/17] kvm: x86: Clean up kvm_setup_mce Jan Kiszka
@ 2011-03-02  7:56 ` Jan Kiszka
  2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 16/17] Add qemu_ram_remap Jan Kiszka
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Jan Kiszka @ 2011-03-02  7:56 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Hidetoshi Seto, Jin Dongming, qemu-devel, kvm, Huang Ying

There is no reason to continue if the kernel claims to support MCE but
then fails to process our request.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Huang Ying <ying.huang@intel.com>
CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
---
 target-i386/kvm.c |   30 +++++++++++++++++-------------
 1 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 486efb9..44e5504 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -437,20 +437,24 @@ int kvm_arch_init_vcpu(CPUState *env)
         int banks;
         int ret;
 
-        if (kvm_get_mce_cap_supported(env->kvm_state, &mcg_cap, &banks)) {
-            perror("kvm_get_mce_cap_supported FAILED");
-        } else {
-            if (banks > MCE_BANKS_DEF)
-                banks = MCE_BANKS_DEF;
-            mcg_cap &= MCE_CAP_DEF;
-            mcg_cap |= banks;
-            ret = kvm_vcpu_ioctl(env, KVM_X86_SETUP_MCE, &mcg_cap);
-            if (ret < 0) {
-                fprintf(stderr, "KVM_X86_SETUP_MCE: %s", strerror(-ret));
-            } else {
-                env->mcg_cap = mcg_cap;
-            }
+        ret = kvm_get_mce_cap_supported(env->kvm_state, &mcg_cap, &banks);
+        if (ret < 0) {
+            fprintf(stderr, "kvm_get_mce_cap_supported: %s", strerror(-ret));
+            return ret;
         }
+
+        if (banks > MCE_BANKS_DEF) {
+            banks = MCE_BANKS_DEF;
+        }
+        mcg_cap &= MCE_CAP_DEF;
+        mcg_cap |= banks;
+        ret = kvm_vcpu_ioctl(env, KVM_X86_SETUP_MCE, &mcg_cap);
+        if (ret < 0) {
+            fprintf(stderr, "KVM_X86_SETUP_MCE: %s", strerror(-ret));
+            return ret;
+        }
+
+        env->mcg_cap = mcg_cap;
     }
 #endif
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v3 16/17] Add qemu_ram_remap
  2011-03-02  7:56 [Qemu-devel] [PATCH v3 00/17] [uq/master] Patch queue, part IV (MCE edition) Jan Kiszka
                   ` (14 preceding siblings ...)
  2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 15/17] kvm: x86: Fail kvm_arch_init_vcpu if MCE initialization fails Jan Kiszka
@ 2011-03-02  7:56 ` Jan Kiszka
  2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 17/17] KVM, MCE, unpoison memory address across reboot Jan Kiszka
  2011-03-03 18:02 ` [Qemu-devel] Re: [PATCH v3 00/17] [uq/master] Patch queue, part IV (MCE edition) Marcelo Tosatti
  17 siblings, 0 replies; 19+ messages in thread
From: Jan Kiszka @ 2011-03-02  7:56 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: qemu-devel, kvm, Huang Ying

From: Huang Ying <ying.huang@intel.com>

qemu_ram_remap() unmaps the specified RAM pages, then re-maps these
pages again.  This is used by KVM HWPoison support to clear HWPoisoned
page tables across guest rebooting, so that a new page may be
allocated later to recover the memory error.

[ Jan: style fixlets, WIN32 fix ]

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 cpu-all.h    |    4 +++
 cpu-common.h |    1 +
 exec.c       |   63 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 67 insertions(+), 1 deletions(-)

diff --git a/cpu-all.h b/cpu-all.h
index caf5e6c..4f4631d 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -863,10 +863,14 @@ target_phys_addr_t cpu_get_phys_page_debug(CPUState *env, target_ulong addr);
 extern int phys_ram_fd;
 extern ram_addr_t ram_size;
 
+/* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */
+#define RAM_PREALLOC_MASK   (1 << 0)
+
 typedef struct RAMBlock {
     uint8_t *host;
     ram_addr_t offset;
     ram_addr_t length;
+    uint32_t flags;
     char idstr[256];
     QLIST_ENTRY(RAMBlock) next;
 #if defined(__linux__) && !defined(TARGET_S390X)
diff --git a/cpu-common.h b/cpu-common.h
index 54d21d4..ef4e8da 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -50,6 +50,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name,
                         ram_addr_t size, void *host);
 ram_addr_t qemu_ram_alloc(DeviceState *dev, const char *name, ram_addr_t size);
 void qemu_ram_free(ram_addr_t addr);
+void qemu_ram_remap(ram_addr_t addr, ram_addr_t length);
 /* This should only be used for ram local to a device.  */
 void *qemu_get_ram_ptr(ram_addr_t addr);
 /* Same but slower, to use for migration, where the order of
diff --git a/exec.c b/exec.c
index d611100..9308a97 100644
--- a/exec.c
+++ b/exec.c
@@ -2867,6 +2867,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name,
 
     if (host) {
         new_block->host = host;
+        new_block->flags |= RAM_PREALLOC_MASK;
     } else {
         if (mem_path) {
 #if defined (__linux__) && !defined(TARGET_S390X)
@@ -2920,7 +2921,9 @@ void qemu_ram_free(ram_addr_t addr)
     QLIST_FOREACH(block, &ram_list.blocks, next) {
         if (addr == block->offset) {
             QLIST_REMOVE(block, next);
-            if (mem_path) {
+            if (block->flags & RAM_PREALLOC_MASK) {
+                ;
+            } else if (mem_path) {
 #if defined (__linux__) && !defined(TARGET_S390X)
                 if (block->fd) {
                     munmap(block->host, block->length);
@@ -2943,6 +2946,64 @@ void qemu_ram_free(ram_addr_t addr)
 
 }
 
+#ifndef _WIN32
+void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
+{
+    RAMBlock *block;
+    ram_addr_t offset;
+    int flags;
+    void *area, *vaddr;
+
+    QLIST_FOREACH(block, &ram_list.blocks, next) {
+        offset = addr - block->offset;
+        if (offset < block->length) {
+            vaddr = block->host + offset;
+            if (block->flags & RAM_PREALLOC_MASK) {
+                ;
+            } else {
+                flags = MAP_FIXED;
+                munmap(vaddr, length);
+                if (mem_path) {
+#if defined(__linux__) && !defined(TARGET_S390X)
+                    if (block->fd) {
+#ifdef MAP_POPULATE
+                        flags |= mem_prealloc ? MAP_POPULATE | MAP_SHARED :
+                            MAP_PRIVATE;
+#else
+                        flags |= MAP_PRIVATE;
+#endif
+                        area = mmap(vaddr, length, PROT_READ | PROT_WRITE,
+                                    flags, block->fd, offset);
+                    } else {
+                        flags |= MAP_PRIVATE | MAP_ANONYMOUS;
+                        area = mmap(vaddr, length, PROT_READ | PROT_WRITE,
+                                    flags, -1, 0);
+                    }
+#endif
+                } else {
+#if defined(TARGET_S390X) && defined(CONFIG_KVM)
+                    flags |= MAP_SHARED | MAP_ANONYMOUS;
+                    area = mmap(vaddr, length, PROT_EXEC|PROT_READ|PROT_WRITE,
+                                flags, -1, 0);
+#else
+                    flags |= MAP_PRIVATE | MAP_ANONYMOUS;
+                    area = mmap(vaddr, length, PROT_READ | PROT_WRITE,
+                                flags, -1, 0);
+#endif
+                }
+                if (area != vaddr) {
+                    fprintf(stderr, "Could not remap addr: %lx@%lx\n",
+                            length, addr);
+                    exit(1);
+                }
+                qemu_madvise(vaddr, length, QEMU_MADV_MERGEABLE);
+            }
+            return;
+        }
+    }
+}
+#endif /* !_WIN32 */
+
 /* Return a host pointer to ram allocated with qemu_ram_alloc.
    With the exception of the softmmu code in this file, this should
    only be used for local memory (e.g. video ram) that the device owns,
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v3 17/17] KVM, MCE, unpoison memory address across reboot
  2011-03-02  7:56 [Qemu-devel] [PATCH v3 00/17] [uq/master] Patch queue, part IV (MCE edition) Jan Kiszka
                   ` (15 preceding siblings ...)
  2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 16/17] Add qemu_ram_remap Jan Kiszka
@ 2011-03-02  7:56 ` Jan Kiszka
  2011-03-03 18:02 ` [Qemu-devel] Re: [PATCH v3 00/17] [uq/master] Patch queue, part IV (MCE edition) Marcelo Tosatti
  17 siblings, 0 replies; 19+ messages in thread
From: Jan Kiszka @ 2011-03-02  7:56 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: qemu-devel, kvm, Huang Ying

From: Huang Ying <ying.huang@intel.com>

In Linux kernel HWPoison processing implementation, the virtual
address in processes mapping the error physical memory page is marked
as HWPoison.  So that, the further accessing to the virtual
address will kill corresponding processes with SIGBUS.

If the error physical memory page is used by a KVM guest, the SIGBUS
will be sent to QEMU, and QEMU will simulate a MCE to report that
memory error to the guest OS.  If the guest OS can not recover from
the error (for example, the page is accessed by kernel code), guest OS
will reboot the system.  But because the underlying host virtual
address backing the guest physical memory is still poisoned, if the
guest system accesses the corresponding guest physical memory even
after rebooting, the SIGBUS will still be sent to QEMU and MCE will be
simulated.  That is, guest system can not recover via rebooting.

In fact, across rebooting, the contents of guest physical memory page
need not to be kept.  We can allocate a new host physical page to
back the corresponding guest physical address.

This patch fixes this issue in QEMU-KVM via calling qemu_ram_remap()
to clear the corresponding page table entry, so that make it possible
to allocate a new page to recover the issue.

[ Jan: rebasing and tiny cleanups]

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 target-i386/kvm.c |   36 ++++++++++++++++++++++++++++++++++++
 1 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 44e5504..7b7105d 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -173,7 +173,40 @@ static int get_para_features(CPUState *env)
 }
 #endif /* CONFIG_KVM_PARA */
 
+typedef struct HWPoisonPage {
+    ram_addr_t ram_addr;
+    QLIST_ENTRY(HWPoisonPage) list;
+} HWPoisonPage;
+
+static QLIST_HEAD(, HWPoisonPage) hwpoison_page_list =
+    QLIST_HEAD_INITIALIZER(hwpoison_page_list);
+
+static void kvm_unpoison_all(void *param)
+{
+    HWPoisonPage *page, *next_page;
+
+    QLIST_FOREACH_SAFE(page, &hwpoison_page_list, list, next_page) {
+        QLIST_REMOVE(page, list);
+        qemu_ram_remap(page->ram_addr, TARGET_PAGE_SIZE);
+        qemu_free(page);
+    }
+}
+
 #ifdef KVM_CAP_MCE
+static void kvm_hwpoison_page_add(ram_addr_t ram_addr)
+{
+    HWPoisonPage *page;
+
+    QLIST_FOREACH(page, &hwpoison_page_list, list) {
+        if (page->ram_addr == ram_addr) {
+            return;
+        }
+    }
+    page = qemu_malloc(sizeof(HWPoisonPage));
+    page->ram_addr = ram_addr;
+    QLIST_INSERT_HEAD(&hwpoison_page_list, page, list);
+}
+
 static int kvm_get_mce_cap_supported(KVMState *s, uint64_t *mce_cap,
                                      int *max_banks)
 {
@@ -233,6 +266,7 @@ int kvm_arch_on_sigbus_vcpu(CPUState *env, int code, void *addr)
                 hardware_memory_error();
             }
         }
+        kvm_hwpoison_page_add(ram_addr);
         kvm_mce_inject(env, paddr, code);
     } else
 #endif /* KVM_CAP_MCE */
@@ -263,6 +297,7 @@ int kvm_arch_on_sigbus(int code, void *addr)
                     "QEMU itself instead of guest system!: %p\n", addr);
             return 0;
         }
+        kvm_hwpoison_page_add(ram_addr);
         kvm_mce_inject(first_cpu, paddr, code);
     } else
 #endif /* KVM_CAP_MCE */
@@ -571,6 +606,7 @@ int kvm_arch_init(KVMState *s)
         fprintf(stderr, "e820_add_entry() table is full\n");
         return ret;
     }
+    qemu_register_reset(kvm_unpoison_all, NULL);
 
     return 0;
 }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] Re: [PATCH v3 00/17] [uq/master] Patch queue, part IV (MCE edition)
  2011-03-02  7:56 [Qemu-devel] [PATCH v3 00/17] [uq/master] Patch queue, part IV (MCE edition) Jan Kiszka
                   ` (16 preceding siblings ...)
  2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 17/17] KVM, MCE, unpoison memory address across reboot Jan Kiszka
@ 2011-03-03 18:02 ` Marcelo Tosatti
  17 siblings, 0 replies; 19+ messages in thread
From: Marcelo Tosatti @ 2011-03-03 18:02 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Hidetoshi Seto, kvm, qemu-devel, Avi Kivity, Huang Ying,
	Jin Dongming, David Gibson

On Wed, Mar 02, 2011 at 08:56:03AM +0100, Jan Kiszka wrote:
> This is mostly a rebase of the previous round, just including one
> additional ppc build fix (patch 2) for a regression in a previous part.
> Please merge.
> 
> CC: David Gibson <david@gibson.dropbear.id.au>
> CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
> CC: Huang Ying <ying.huang@intel.com>
> CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>

Applied, thanks.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2011-03-03 18:05 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-03-02  7:56 [Qemu-devel] [PATCH v3 00/17] [uq/master] Patch queue, part IV (MCE edition) Jan Kiszka
2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 01/17] kvm: ppc: Fix breakage of kvm_arch_pre_run/process_irqchip_events Jan Kiszka
2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 02/17] kvm: Fix build warning when KVM_CAP_SET_GUEST_DEBUG is lacking Jan Kiszka
2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 03/17] x86: Account for MCE in cpu_has_work Jan Kiszka
2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 04/17] x86: Perform implicit mcg_status reset Jan Kiszka
2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 05/17] x86: Small cleanups of MCE helpers Jan Kiszka
2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 06/17] x86: Refine error reporting of MCE injection services Jan Kiszka
2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 07/17] x86: Optionally avoid injecting AO MCEs while others are pending Jan Kiszka
2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 08/17] Synchronize VCPU states before reset Jan Kiszka
2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 09/17] kvm: x86: Move MCE functions together Jan Kiszka
2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 10/17] kvm: Rename kvm_arch_process_irqchip_events to async_events Jan Kiszka
2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 11/17] kvm: x86: Inject pending MCE events on state writeback Jan Kiszka
2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 12/17] x86: Run qemu_inject_x86_mce on target VCPU Jan Kiszka
2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 13/17] kvm: x86: Consolidate TCG and KVM MCE injection code Jan Kiszka
2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 14/17] kvm: x86: Clean up kvm_setup_mce Jan Kiszka
2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 15/17] kvm: x86: Fail kvm_arch_init_vcpu if MCE initialization fails Jan Kiszka
2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 16/17] Add qemu_ram_remap Jan Kiszka
2011-03-02  7:56 ` [Qemu-devel] [PATCH v3 17/17] KVM, MCE, unpoison memory address across reboot Jan Kiszka
2011-03-03 18:02 ` [Qemu-devel] Re: [PATCH v3 00/17] [uq/master] Patch queue, part IV (MCE edition) Marcelo Tosatti

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).