qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v3 00/10] qemu-kvm: Hook cleanups and yet more use of upstream code
@ 2010-02-24 14:17 Jan Kiszka
  2010-02-24 14:17 ` [Qemu-devel] [PATCH v3 01/10] qemu-kvm: Add KVM_CAP_X86_ROBUST_SINGLESTEP-awareness Jan Kiszka
                   ` (10 more replies)
  0 siblings, 11 replies; 29+ messages in thread
From: Jan Kiszka @ 2010-02-24 14:17 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: Gleb Natapov, qemu-devel, kvm

Let's proceed with cleaning up the VCPU state writeback. The differences
to v2 are:
 - included guest debugging update patch and rebased on top of it
 - renamed KVM_PUT_ASYNC_STATE->KVM_PUT_RUNTIME_STATE and added comments
 - fixed mp_state corruption due to early use of cpu_is_bsp

Some patches target uq/master (fully or partially), but I will post them
separately once these bits are acceptible.

Pull URL is

	git://git.kiszka.org/qemu-kvm.git queues/queues-kvm-merge

Jan Kiszka (10):
  qemu-kvm: Add KVM_CAP_X86_ROBUST_SINGLESTEP-awareness
  qemu-kvm: Rework VCPU state writeback API
  x86: Extend validity of cpu_is_bsp
  qemu-kvm: Clean up mpstate synchronization
  KVM: x86: Restrict writeback of VCPU state
  qemu-kvm: Use VCPU event state for reset and vmsave/load
  qemu-kvm: Cleanup/fix TSC and PV clock writeback
  qemu-kvm: Clean up KVM's APIC hooks
  qemu-kvm: Move kvm_set_boot_cpu_id
  qemu-kvm: Bring qemu_init_vcpu back home

 exec.c                |   17 -----
 hw/apic.c             |   47 ++-------------
 hw/pc.c               |   12 +---
 hw/ppc_newworld.c     |    3 -
 hw/ppc_oldworld.c     |    3 -
 hw/s390-virtio.c      |    1 -
 kvm-all.c             |   31 ++++++++--
 kvm.h                 |   28 ++++++++-
 qemu-kvm-ia64.c       |    6 +-
 qemu-kvm-x86.c        |  158 +++++++++++++++++++++++-------------------------
 qemu-kvm.c            |   46 ++++----------
 qemu-kvm.h            |   27 +-------
 savevm.c              |    4 +
 sysemu.h              |    4 +
 target-i386/helper.c  |    2 +
 target-i386/kvm.c     |   94 +++++++++++++++++------------
 target-i386/machine.c |   27 --------
 target-ia64/machine.c |    5 +-
 target-ppc/kvm.c      |    2 +-
 target-ppc/machine.c  |    4 -
 target-s390x/kvm.c    |    3 +-
 vl.c                  |   29 +++++++++
 22 files changed, 254 insertions(+), 299 deletions(-)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Qemu-devel] [PATCH v3 01/10] qemu-kvm: Add KVM_CAP_X86_ROBUST_SINGLESTEP-awareness
  2010-02-24 14:17 [Qemu-devel] [PATCH v3 00/10] qemu-kvm: Hook cleanups and yet more use of upstream code Jan Kiszka
@ 2010-02-24 14:17 ` Jan Kiszka
  2010-02-24 14:17 ` [Qemu-devel] [PATCH v3 02/10] qemu-kvm: Rework VCPU state writeback API Jan Kiszka
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: Jan Kiszka @ 2010-02-24 14:17 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: Gleb Natapov, qemu-devel, kvm

This add-on patch to recent guest debugging refactorings adds the
requested awareness for KVM_CAP_X86_ROBUST_SINGLESTEP to both the
upstream as well as qemu-kvm's own code. Fortunately, code sharing
increased once again.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 kvm-all.c         |   12 ++++++++++
 kvm.h             |    1 +
 qemu-kvm-x86.c    |   27 +----------------------
 qemu-kvm.h        |    1 +
 target-i386/kvm.c |   60 ++++++++++++++++++++++++++++++----------------------
 5 files changed, 51 insertions(+), 50 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index bed562f..0776bf5 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -65,6 +65,7 @@ struct KVMState
     int broken_set_mem_region;
     int migration_log;
     int vcpu_events;
+    int robust_singlestep;
 #ifdef KVM_CAP_SET_GUEST_DEBUG
     struct kvm_sw_breakpoint_head kvm_sw_breakpoints;
 #endif
@@ -673,6 +674,12 @@ int kvm_init(int smp_cpus)
     s->vcpu_events = kvm_check_extension(s, KVM_CAP_VCPU_EVENTS);
 #endif
 
+    s->robust_singlestep = 0;
+#ifdef KVM_CAP_X86_ROBUST_SINGLESTEP
+    s->robust_singlestep =
+        kvm_check_extension(s, KVM_CAP_X86_ROBUST_SINGLESTEP);
+#endif
+
     ret = kvm_arch_init(s, smp_cpus);
     if (ret < 0)
         goto err;
@@ -932,6 +939,11 @@ int kvm_has_vcpu_events(void)
     return kvm_state->vcpu_events;
 }
 
+int kvm_has_robust_singlestep(void)
+{
+    return kvm_state->robust_singlestep;
+}
+
 void kvm_setup_guest_memory(void *start, size_t size)
 {
     if (!kvm_has_sync_mmu()) {
diff --git a/kvm.h b/kvm.h
index a78a27f..427dad0 100644
--- a/kvm.h
+++ b/kvm.h
@@ -43,6 +43,7 @@ int kvm_log_stop(target_phys_addr_t phys_addr, ram_addr_t size);
 
 int kvm_has_sync_mmu(void);
 int kvm_has_vcpu_events(void);
+int kvm_has_robust_singlestep(void);
 int kvm_put_vcpu_events(CPUState *env);
 int kvm_get_vcpu_events(CPUState *env);
 
diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 5af9ce1..5db95f8 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -922,31 +922,8 @@ void kvm_arch_load_regs(CPUState *env)
     if (rc == -1)
         perror("kvm_set_msrs FAILED");
 
-    /*
-     * Kernels before 2.6.33 (which correlates with !kvm_has_vcpu_events())
-     * overwrote flags.TF injected via SET_GUEST_DEBUG while updating GP regs.
-     * Work around this by updating the debug state once again if
-     * single-stepping is on.
-     * Another reason to call kvm_update_guest_debug here is a pending debug
-     * trap raise by the guest. On kernels without SET_VCPU_EVENTS we have to
-     * reinject them via SET_GUEST_DEBUG.
-     */
-    if (!kvm_has_vcpu_events() &&
-        (env->exception_injected != -1 || env->singlestep_enabled)) {
-        unsigned long reinject_trap = 0;
-
-        if (env->exception_injected == 1) {
-            reinject_trap = KVM_GUESTDBG_INJECT_DB;
-        } else if (env->exception_injected == 3) {
-            reinject_trap = KVM_GUESTDBG_INJECT_BP;
-        }
-        env->exception_injected = -1;
-
-        rc = kvm_update_guest_debug(env, reinject_trap);
-        if (rc < 0) {
-            perror("kvm_update_guest_debug FAILED");
-        }
-    }
+    /* must be last */
+    kvm_guest_debug_workarounds(env);
 }
 
 void kvm_load_tsc(CPUState *env)
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 93b144f..172eba8 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -1027,6 +1027,7 @@ struct KVMState {
     int broken_set_mem_region;
     int migration_log;
     int vcpu_events;
+    int robust_singlestep;
 #ifdef KVM_CAP_SET_GUEST_DEBUG
     QTAILQ_HEAD(, kvm_sw_breakpoint) kvm_sw_breakpoints;
 #endif
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 01963e1..cf64c9a 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -853,6 +853,37 @@ int kvm_get_vcpu_events(CPUState *env)
     return 0;
 }
 
+static int kvm_guest_debug_workarounds(CPUState *env)
+{
+    int ret = 0;
+#ifdef KVM_CAP_SET_GUEST_DEBUG
+    unsigned long reinject_trap = 0;
+
+    if (!kvm_has_vcpu_events()) {
+        if (env->exception_injected == 1) {
+            reinject_trap = KVM_GUESTDBG_INJECT_DB;
+        } else if (env->exception_injected == 3) {
+            reinject_trap = KVM_GUESTDBG_INJECT_BP;
+        }
+        env->exception_injected = -1;
+    }
+
+    /*
+     * Kernels before KVM_CAP_X86_ROBUST_SINGLESTEP overwrote flags.TF
+     * injected via SET_GUEST_DEBUG while updating GP regs. Work around this
+     * by updating the debug state once again if single-stepping is on.
+     * Another reason to call kvm_update_guest_debug here is a pending debug
+     * trap raise by the guest. On kernels without SET_VCPU_EVENTS we have to
+     * reinject them via SET_GUEST_DEBUG.
+     */
+    if (reinject_trap ||
+        (!kvm_has_robust_singlestep() && env->singlestep_enabled)) {
+        ret = kvm_update_guest_debug(env, reinject_trap);
+    }
+#endif /* KVM_CAP_SET_GUEST_DEBUG */
+    return ret;
+}
+
 #ifdef KVM_UPSTREAM
 int kvm_arch_put_registers(CPUState *env)
 {
@@ -882,31 +913,10 @@ int kvm_arch_put_registers(CPUState *env)
     if (ret < 0)
         return ret;
 
-    /*
-     * Kernels before 2.6.33 (which correlates with !kvm_has_vcpu_events())
-     * overwrote flags.TF injected via SET_GUEST_DEBUG while updating GP regs.
-     * Work around this by updating the debug state once again if
-     * single-stepping is on.
-     * Another reason to call kvm_update_guest_debug here is a pending debug
-     * trap raise by the guest. On kernels without SET_VCPU_EVENTS we have to
-     * reinject them via SET_GUEST_DEBUG.
-     */
-    if (!kvm_has_vcpu_events() &&
-        (env->exception_injected != -1 || env->singlestep_enabled)) {
-        unsigned long reinject_trap = 0;
-
-        if (env->exception_injected == 1) {
-            reinject_trap = KVM_GUESTDBG_INJECT_DB;
-        } else if (env->exception_injected == 3) {
-            reinject_trap = KVM_GUESTDBG_INJECT_BP;
-        }
-        env->exception_injected = -1;
-
-        ret = kvm_update_guest_debug(env, reinject_trap);
-        if (ret < 0) {
-            return ret;
-        }
-    }
+    /* must be last */
+    ret = kvm_guest_debug_workarounds(env);
+    if (ret < 0)
+        return ret;
 
     return 0;
 }
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [Qemu-devel] [PATCH v3 02/10] qemu-kvm: Rework VCPU state writeback API
  2010-02-24 14:17 [Qemu-devel] [PATCH v3 00/10] qemu-kvm: Hook cleanups and yet more use of upstream code Jan Kiszka
  2010-02-24 14:17 ` [Qemu-devel] [PATCH v3 01/10] qemu-kvm: Add KVM_CAP_X86_ROBUST_SINGLESTEP-awareness Jan Kiszka
@ 2010-02-24 14:17 ` Jan Kiszka
  2010-02-24 14:17 ` [Qemu-devel] [PATCH v3 03/10] x86: Extend validity of cpu_is_bsp Jan Kiszka
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: Jan Kiszka @ 2010-02-24 14:17 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: Gleb Natapov, qemu-devel, kvm

This grand cleanup drops all reset and vmsave/load related
synchronization points in favor of four(!) generic hooks:

- cpu_synchronize_all_states in qemu_savevm_state_complete
  (initial sync from kernel before vmsave)
- cpu_synchronize_all_post_init in qemu_loadvm_state
  (writeback after vmload)
- cpu_synchronize_all_post_init in main after machine init
- cpu_synchronize_all_post_reset in qemu_system_reset
  (writeback after system reset)

These writeback points + the existing one of VCPU exec after
cpu_synchronize_state map on three levels of writeback:

- KVM_PUT_RUNTIME_STATE (during runtime, other VCPUs continue to run)
- KVM_PUT_RESET_STATE   (on synchronous system reset, all VCPUs stopped)
- KVM_PUT_FULL_STATE    (on init or vmload, all VCPUs stopped as well)

This level is passed to the arch-specific VCPU state writing function
that will decide which concrete substates need to be written. That way,
no writer of load, save or reset functions that interact with in-kernel
KVM states will ever have to worry about synchronization again. That
also means that a lot of reasons for races, segfaults and deadlocks are
eliminated.

cpu_synchronize_state remains untouched, just as Anthony suggested. We
continue to need it before reading or writing of VCPU states that are
also tracked by in-kernel KVM subsystems.

Consequently, this patch removes many cpu_synchronize_state calls that
are now redundant, just like remaining explicit register syncs. It does
not touch qemu-kvm's special hooks for mpstate, vcpu_events, or tsc
loading. They will be cleaned up by individual patches.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 exec.c                |   17 -----------------
 hw/apic.c             |    3 ---
 hw/pc.c               |    1 -
 hw/ppc_newworld.c     |    3 ---
 hw/ppc_oldworld.c     |    3 ---
 hw/s390-virtio.c      |    1 -
 kvm-all.c             |   19 +++++++++++++------
 kvm.h                 |   25 ++++++++++++++++++++++++-
 qemu-kvm-ia64.c       |    2 +-
 qemu-kvm-x86.c        |    3 +--
 qemu-kvm.c            |   16 +++++++++++++---
 qemu-kvm.h            |    2 +-
 savevm.c              |    4 ++++
 sysemu.h              |    4 ++++
 target-i386/kvm.c     |    2 +-
 target-i386/machine.c |   10 ----------
 target-ia64/machine.c |    2 --
 target-ppc/kvm.c      |    2 +-
 target-ppc/machine.c  |    4 ----
 target-s390x/kvm.c    |    3 +--
 vl.c                  |   29 +++++++++++++++++++++++++++++
 21 files changed, 93 insertions(+), 62 deletions(-)

diff --git a/exec.c b/exec.c
index b647512..69003c2 100644
--- a/exec.c
+++ b/exec.c
@@ -530,21 +530,6 @@ void cpu_exec_init_all(unsigned long tb_size)
 
 #if defined(CPU_SAVE_VERSION) && !defined(CONFIG_USER_ONLY)
 
-static void cpu_common_pre_save(void *opaque)
-{
-    CPUState *env = opaque;
-
-    cpu_synchronize_state(env);
-}
-
-static int cpu_common_pre_load(void *opaque)
-{
-    CPUState *env = opaque;
-
-    cpu_synchronize_state(env);
-    return 0;
-}
-
 static int cpu_common_post_load(void *opaque, int version_id)
 {
     CPUState *env = opaque;
@@ -562,8 +547,6 @@ static const VMStateDescription vmstate_cpu_common = {
     .version_id = 1,
     .minimum_version_id = 1,
     .minimum_version_id_old = 1,
-    .pre_save = cpu_common_pre_save,
-    .pre_load = cpu_common_pre_load,
     .post_load = cpu_common_post_load,
     .fields      = (VMStateField []) {
         VMSTATE_UINT32(halted, CPUState),
diff --git a/hw/apic.c b/hw/apic.c
index ae805dc..3e03e10 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -488,7 +488,6 @@ void apic_init_reset(CPUState *env)
     if (!s)
         return;
 
-    cpu_synchronize_state(env);
     s->tpr = 0;
     s->spurious_vec = 0xff;
     s->log_dest = 0;
@@ -1070,8 +1069,6 @@ static void apic_reset(void *opaque)
     APICState *s = opaque;
     int bsp;
 
-    cpu_synchronize_state(s->cpu_env);
-
     bsp = cpu_is_bsp(s->cpu_env);
     s->apicbase = 0xfee00000 |
         (bsp ? MSR_IA32_APICBASE_BSP : 0) | MSR_IA32_APICBASE_ENABLE;
diff --git a/hw/pc.c b/hw/pc.c
index dc935a8..9055866 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -744,7 +744,6 @@ CPUState *pc_new_cpu(const char *cpu_model)
         fprintf(stderr, "Unable to find x86 CPU definition\n");
         exit(1);
     }
-    env->kvm_vcpu_dirty = 1;
     if ((env->cpuid_features & CPUID_APIC) || smp_cpus > 1) {
         env->cpuid_apic_id = env->cpu_index;
         /* APIC reset callback resets cpu */
diff --git a/hw/ppc_newworld.c b/hw/ppc_newworld.c
index a4c714a..9e288bd 100644
--- a/hw/ppc_newworld.c
+++ b/hw/ppc_newworld.c
@@ -139,9 +139,6 @@ static void ppc_core99_init (ram_addr_t ram_size,
         envs[i] = env;
     }
 
-    /* Make sure all register sets take effect */
-    cpu_synchronize_state(env);
-
     /* allocate RAM */
     ram_offset = qemu_ram_alloc(ram_size);
     cpu_register_physical_memory(0, ram_size, ram_offset);
diff --git a/hw/ppc_oldworld.c b/hw/ppc_oldworld.c
index 7ccc6a1..1aa05ed 100644
--- a/hw/ppc_oldworld.c
+++ b/hw/ppc_oldworld.c
@@ -164,9 +164,6 @@ static void ppc_heathrow_init (ram_addr_t ram_size,
         envs[i] = env;
     }
 
-    /* Make sure all register sets take effect */
-    cpu_synchronize_state(env);
-
     /* allocate RAM */
     if (ram_size > (2047 << 20)) {
         fprintf(stderr,
diff --git a/hw/s390-virtio.c b/hw/s390-virtio.c
index 3582728..ad3386f 100644
--- a/hw/s390-virtio.c
+++ b/hw/s390-virtio.c
@@ -185,7 +185,6 @@ static void s390_init(ram_addr_t ram_size,
             exit(1);
         }
 
-        cpu_synchronize_state(env);
         env->psw.addr = KERN_IMAGE_START;
         env->psw.mask = 0x0000000180000000ULL;
     }
diff --git a/kvm-all.c b/kvm-all.c
index 0776bf5..278225e 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -156,10 +156,6 @@ static void kvm_reset_vcpu(void *opaque)
     CPUState *env = opaque;
 
     kvm_arch_reset_vcpu(env);
-    if (kvm_arch_put_registers(env)) {
-        fprintf(stderr, "Fatal: kvm vcpu reset failed\n");
-        abort();
-    }
 }
 #endif
 
@@ -216,7 +212,6 @@ int kvm_init_vcpu(CPUState *env)
     if (ret == 0) {
         qemu_register_reset(kvm_reset_vcpu, env);
         kvm_arch_reset_vcpu(env);
-        ret = kvm_arch_put_registers(env);
     }
 err:
     return ret;
@@ -770,6 +765,18 @@ void kvm_cpu_synchronize_state(CPUState *env)
     }
 }
 
+void kvm_cpu_synchronize_post_reset(CPUState *env)
+{
+    kvm_arch_put_registers(env, KVM_PUT_RESET_STATE);
+    env->kvm_vcpu_dirty = 0;
+}
+
+void kvm_cpu_synchronize_post_init(CPUState *env)
+{
+    kvm_arch_put_registers(env, KVM_PUT_FULL_STATE);
+    env->kvm_vcpu_dirty = 0;
+}
+
 int kvm_cpu_exec(CPUState *env)
 {
     struct kvm_run *run = env->kvm_run;
@@ -785,7 +792,7 @@ int kvm_cpu_exec(CPUState *env)
         }
 
         if (env->kvm_vcpu_dirty) {
-            kvm_arch_put_registers(env);
+            kvm_arch_put_registers(env, KVM_PUT_RUNTIME_STATE);
             env->kvm_vcpu_dirty = 0;
         }
 
diff --git a/kvm.h b/kvm.h
index 427dad0..3ec5b59 100644
--- a/kvm.h
+++ b/kvm.h
@@ -86,7 +86,14 @@ int kvm_arch_pre_run(CPUState *env, struct kvm_run *run);
 
 int kvm_arch_get_registers(CPUState *env);
 
-int kvm_arch_put_registers(CPUState *env);
+/* state subset only touched by the VCPU itself during runtime */
+#define KVM_PUT_RUNTIME_STATE   1
+/* state subset modified during VCPU reset */
+#define KVM_PUT_RESET_STATE     2
+/* full state set, modified during initialization or on vmload */
+#define KVM_PUT_FULL_STATE      3
+
+int kvm_arch_put_registers(CPUState *env, int level);
 
 int kvm_arch_init(KVMState *s, int smp_cpus);
 
@@ -130,6 +137,8 @@ int kvm_check_extension(KVMState *s, unsigned int extension);
 uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function,
                                       int reg);
 void kvm_cpu_synchronize_state(CPUState *env);
+void kvm_cpu_synchronize_post_reset(CPUState *env);
+void kvm_cpu_synchronize_post_init(CPUState *env);
 
 /* generic hooks - to be moved/refactored once there are more users */
 
@@ -140,4 +149,18 @@ static inline void cpu_synchronize_state(CPUState *env)
     }
 }
 
+static inline void cpu_synchronize_post_reset(CPUState *env)
+{
+    if (kvm_enabled()) {
+        kvm_cpu_synchronize_post_reset(env);
+    }
+}
+
+static inline void cpu_synchronize_post_init(CPUState *env)
+{
+    if (kvm_enabled()) {
+        kvm_cpu_synchronize_post_init(env);
+    }
+}
+
 #endif
diff --git a/qemu-kvm-ia64.c b/qemu-kvm-ia64.c
index a11fde8..fc8110e 100644
--- a/qemu-kvm-ia64.c
+++ b/qemu-kvm-ia64.c
@@ -16,7 +16,7 @@ int kvm_arch_qemu_create_context(void)
     return 0;
 }
 
-void kvm_arch_load_regs(CPUState *env)
+void kvm_arch_load_regs(CPUState *env, int level)
 {
 }
 
diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 5db95f8..47667b3 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -803,7 +803,7 @@ static void get_seg(SegmentCache *lhs, const struct kvm_segment *rhs)
 	| (rhs->avl * DESC_AVL_MASK);
 }
 
-void kvm_arch_load_regs(CPUState *env)
+void kvm_arch_load_regs(CPUState *env, int level)
 {
     struct kvm_regs regs;
     struct kvm_fpu fpu;
@@ -1362,7 +1362,6 @@ void kvm_arch_push_nmi(void *opaque)
 void kvm_arch_cpu_reset(CPUState *env)
 {
     kvm_arch_reset_vcpu(env);
-    kvm_arch_load_regs(env);
     kvm_put_vcpu_events(env);
     if (!cpu_is_bsp(env)) {
 	if (kvm_irqchip_in_kernel()) {
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 9790389..f6d3e4d 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -871,7 +871,7 @@ int pre_kvm_run(kvm_context_t kvm, CPUState *env)
     kvm_arch_pre_run(env, env->kvm_run);
 
     if (env->kvm_vcpu_dirty) {
-        kvm_arch_load_regs(env);
+        kvm_arch_load_regs(env, KVM_PUT_RUNTIME_STATE);
         env->kvm_vcpu_dirty = 0;
     }
 
@@ -1546,6 +1546,18 @@ void kvm_cpu_synchronize_state(CPUState *env)
         on_vcpu(env, do_kvm_cpu_synchronize_state, env);
 }
 
+void kvm_cpu_synchronize_post_reset(CPUState *env)
+{
+    kvm_arch_load_regs(env, KVM_PUT_RESET_STATE);
+    env->kvm_vcpu_dirty = 0;
+}
+
+void kvm_cpu_synchronize_post_init(CPUState *env)
+{
+    kvm_arch_load_regs(env, KVM_PUT_FULL_STATE);
+    env->kvm_vcpu_dirty = 0;
+}
+
 static void inject_interrupt(void *data)
 {
     cpu_interrupt(current_env, (long) data);
@@ -1891,8 +1903,6 @@ static void *ap_main_loop(void *_env)
 
     kvm_arch_init_vcpu(env);
 
-    kvm_arch_load_regs(env);
-
     /* signal VCPU creation */
     current_env->created = 1;
     pthread_cond_signal(&qemu_vcpu_cond);
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 172eba8..fab930d 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -908,7 +908,7 @@ int kvm_qemu_destroy_memory_alias(uint64_t phys_start);
 int kvm_arch_qemu_create_context(void);
 
 void kvm_arch_save_regs(CPUState *env);
-void kvm_arch_load_regs(CPUState *env);
+void kvm_arch_load_regs(CPUState *env, int level);
 void kvm_arch_load_mpstate(CPUState *env);
 void kvm_arch_save_mpstate(CPUState *env);
 int kvm_arch_has_work(CPUState *env);
diff --git a/savevm.c b/savevm.c
index 2fd3de6..f1e675c 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1368,6 +1368,8 @@ int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f)
 {
     SaveStateEntry *se;
 
+    cpu_synchronize_all_states();
+
     QTAILQ_FOREACH(se, &savevm_handlers, entry) {
         if (se->save_live_state == NULL)
             continue;
@@ -1568,6 +1570,8 @@ int qemu_loadvm_state(QEMUFile *f)
         }
     }
 
+    cpu_synchronize_all_post_init();
+
     ret = 0;
 
 out:
diff --git a/sysemu.h b/sysemu.h
index ff97786..f92b94a 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -59,6 +59,10 @@ int load_vmstate(Monitor *mon, const char *name);
 void do_delvm(Monitor *mon, const QDict *qdict);
 void do_info_snapshots(Monitor *mon);
 
+void cpu_synchronize_all_states(void);
+void cpu_synchronize_all_post_reset(void);
+void cpu_synchronize_all_post_init(void);
+
 void qemu_announce_self(void);
 
 void main_loop_wait(int timeout);
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index cf64c9a..5f0829b 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -885,7 +885,7 @@ static int kvm_guest_debug_workarounds(CPUState *env)
 }
 
 #ifdef KVM_UPSTREAM
-int kvm_arch_put_registers(CPUState *env)
+int kvm_arch_put_registers(CPUState *env, int level)
 {
     int ret;
 
diff --git a/target-i386/machine.c b/target-i386/machine.c
index 0b8a33a..bebde82 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -322,7 +322,6 @@ static void cpu_pre_save(void *opaque)
     CPUState *env = opaque;
     int i;
 
-    cpu_synchronize_state(env);
     if (kvm_enabled()) {
         kvm_save_mpstate(env);
         kvm_get_vcpu_events(env);
@@ -342,14 +341,6 @@ static void cpu_pre_save(void *opaque)
 #endif
 }
 
-static int cpu_pre_load(void *opaque)
-{
-    CPUState *env = opaque;
-
-    cpu_synchronize_state(env);
-    return 0;
-}
-
 static int cpu_post_load(void *opaque, int version_id)
 {
     CPUState *env = opaque;
@@ -389,7 +380,6 @@ static const VMStateDescription vmstate_cpu = {
     .minimum_version_id = 3,
     .minimum_version_id_old = 3,
     .pre_save = cpu_pre_save,
-    .pre_load = cpu_pre_load,
     .post_load = cpu_post_load,
     .fields      = (VMStateField []) {
         VMSTATE_UINTTL_ARRAY(regs, CPUState, CPU_NB_REGS),
diff --git a/target-ia64/machine.c b/target-ia64/machine.c
index 7d29575..fdbeeef 100644
--- a/target-ia64/machine.c
+++ b/target-ia64/machine.c
@@ -9,7 +9,6 @@ void cpu_save(QEMUFile *f, void *opaque)
     CPUState *env = opaque;
 
     if (kvm_enabled()) {
-        kvm_arch_save_regs(env);
         kvm_arch_save_mpstate(env);
     }
 }
@@ -19,7 +18,6 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id)
     CPUState *env = opaque;
 
     if (kvm_enabled()) {
-        kvm_arch_load_regs(env);
         kvm_arch_load_mpstate(env);
     }
     return 0;
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 0424a78..debe441 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -57,7 +57,7 @@ void kvm_arch_reset_vcpu(CPUState *env)
 {
 }
 
-int kvm_arch_put_registers(CPUState *env)
+int kvm_arch_put_registers(CPUState *env, int level)
 {
     struct kvm_regs regs;
     int ret;
diff --git a/target-ppc/machine.c b/target-ppc/machine.c
index ead38e1..81d3f72 100644
--- a/target-ppc/machine.c
+++ b/target-ppc/machine.c
@@ -8,8 +8,6 @@ void cpu_save(QEMUFile *f, void *opaque)
     CPUState *env = (CPUState *)opaque;
     unsigned int i, j;
 
-    cpu_synchronize_state(env);
-
     for (i = 0; i < 32; i++)
         qemu_put_betls(f, &env->gpr[i]);
 #if !defined(TARGET_PPC64)
@@ -97,8 +95,6 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id)
     CPUState *env = (CPUState *)opaque;
     unsigned int i, j;
 
-    cpu_synchronize_state(env);
-
     for (i = 0; i < 32; i++)
         qemu_get_betls(f, &env->gpr[i]);
 #if !defined(TARGET_PPC64)
diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index 0199a65..72e77b0 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -91,7 +91,7 @@ void kvm_arch_reset_vcpu(CPUState *env)
     /* FIXME: add code to reset vcpu. */
 }
 
-int kvm_arch_put_registers(CPUState *env)
+int kvm_arch_put_registers(CPUState *env, int level)
 {
     struct kvm_regs regs;
     int ret;
@@ -296,7 +296,6 @@ static int handle_hypercall(CPUState *env, struct kvm_run *run)
 
     cpu_synchronize_state(env);
     r = s390_virtio_hypercall(env);
-    kvm_arch_put_registers(env);
 
     return r;
 }
diff --git a/vl.c b/vl.c
index f7f388f..7996da5 100644
--- a/vl.c
+++ b/vl.c
@@ -3066,6 +3066,33 @@ static void nographic_update(void *opaque)
     qemu_mod_timer(nographic_timer, interval + qemu_get_clock(rt_clock));
 }
 
+void cpu_synchronize_all_states(void)
+{
+    CPUState *cpu;
+
+    for (cpu = first_cpu; cpu; cpu = cpu->next_cpu) {
+        cpu_synchronize_state(cpu);
+    }
+}
+
+void cpu_synchronize_all_post_reset(void)
+{
+    CPUState *cpu;
+
+    for (cpu = first_cpu; cpu; cpu = cpu->next_cpu) {
+        cpu_synchronize_post_reset(cpu);
+    }
+}
+
+void cpu_synchronize_all_post_init(void)
+{
+    CPUState *cpu;
+
+    for (cpu = first_cpu; cpu; cpu = cpu->next_cpu) {
+        cpu_synchronize_post_init(cpu);
+    }
+}
+
 struct vm_change_state_entry {
     VMChangeStateHandler *cb;
     void *opaque;
@@ -3214,6 +3241,7 @@ void qemu_system_reset(void)
     QTAILQ_FOREACH_SAFE(re, &reset_handlers, entry, nre) {
         re->func(re->opaque);
     }
+    cpu_synchronize_all_post_reset();
 }
 
 void qemu_system_reset_request(void)
@@ -5985,6 +6013,7 @@ int main(int argc, char **argv, char **envp)
     machine->init(ram_size, boot_devices,
                   kernel_filename, kernel_cmdline, initrd_filename, cpu_model);
 
+    cpu_synchronize_all_post_init();
 
 #ifndef _WIN32
     /* must be after terminal init, SDL library changes signal handlers */
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [Qemu-devel] [PATCH v3 03/10] x86: Extend validity of cpu_is_bsp
  2010-02-24 14:17 [Qemu-devel] [PATCH v3 00/10] qemu-kvm: Hook cleanups and yet more use of upstream code Jan Kiszka
  2010-02-24 14:17 ` [Qemu-devel] [PATCH v3 01/10] qemu-kvm: Add KVM_CAP_X86_ROBUST_SINGLESTEP-awareness Jan Kiszka
  2010-02-24 14:17 ` [Qemu-devel] [PATCH v3 02/10] qemu-kvm: Rework VCPU state writeback API Jan Kiszka
@ 2010-02-24 14:17 ` Jan Kiszka
  2010-02-24 14:17 ` [Qemu-devel] [PATCH v3 04/10] qemu-kvm: Clean up mpstate synchronization Jan Kiszka
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: Jan Kiszka @ 2010-02-24 14:17 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: Gleb Natapov, qemu-devel, kvm

As we hard-wire the BSP to CPU 0 anyway and cpuid_apic_id equals
cpu_index, cpu_is_bsp can also be based on the latter directly. This
will help an early user of it: KVM while initializing mp_state.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/pc.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index 9055866..8b5af35 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -732,7 +732,8 @@ static void pc_init_ne2k_isa(NICInfo *nd)
 
 int cpu_is_bsp(CPUState *env)
 {
-    return env->cpuid_apic_id == 0;
+    /* We hard-wire the BSP to the first CPU. */
+    return env->cpu_index == 0;
 }
 
 CPUState *pc_new_cpu(const char *cpu_model)
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [Qemu-devel] [PATCH v3 04/10] qemu-kvm: Clean up mpstate synchronization
  2010-02-24 14:17 [Qemu-devel] [PATCH v3 00/10] qemu-kvm: Hook cleanups and yet more use of upstream code Jan Kiszka
                   ` (2 preceding siblings ...)
  2010-02-24 14:17 ` [Qemu-devel] [PATCH v3 03/10] x86: Extend validity of cpu_is_bsp Jan Kiszka
@ 2010-02-24 14:17 ` Jan Kiszka
  2010-02-24 22:44   ` [Qemu-devel] " Marcelo Tosatti
  2010-02-25 17:20   ` [Qemu-devel] [PATCH v4 " Jan Kiszka
  2010-02-24 14:17 ` [Qemu-devel] [PATCH v3 05/10] KVM: x86: Restrict writeback of VCPU state Jan Kiszka
                   ` (6 subsequent siblings)
  10 siblings, 2 replies; 29+ messages in thread
From: Jan Kiszka @ 2010-02-24 14:17 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: Gleb Natapov, qemu-devel, kvm

Push mpstate reading/writing into kvm_arch_load/save_regs and, on x86,
properly synchronize with halted in the accessor functions.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/apic.c             |    7 ---
 qemu-kvm-ia64.c       |    4 +-
 qemu-kvm-x86.c        |  102 ++++++++++++++++++++++++++++++-------------------
 qemu-kvm.c            |   30 --------------
 qemu-kvm.h            |   15 -------
 target-i386/machine.c |    6 ---
 target-ia64/machine.c |    3 +
 7 files changed, 69 insertions(+), 98 deletions(-)

diff --git a/hw/apic.c b/hw/apic.c
index 3e03e10..092c61e 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -507,13 +507,6 @@ void apic_init_reset(CPUState *env)
     s->wait_for_sipi = 1;
 
     env->halted = !(s->apicbase & MSR_IA32_APICBASE_BSP);
-#ifdef KVM_CAP_MP_STATE
-    if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-        env->mp_state
-            = env->halted ? KVM_MP_STATE_UNINITIALIZED : KVM_MP_STATE_RUNNABLE;
-        kvm_load_mpstate(env);
-    }
-#endif
 }
 
 static void apic_startup(APICState *s, int vector_num)
diff --git a/qemu-kvm-ia64.c b/qemu-kvm-ia64.c
index fc8110e..39bcbeb 100644
--- a/qemu-kvm-ia64.c
+++ b/qemu-kvm-ia64.c
@@ -124,7 +124,9 @@ void kvm_arch_cpu_reset(CPUState *env)
 {
     if (kvm_irqchip_in_kernel(kvm_context)) {
 #ifdef KVM_CAP_MP_STATE
-	kvm_reset_mpstate(env->kvm_cpu_state.vcpu_ctx);
+        struct kvm_mp_state mp_state = {.mp_state = KVM_MP_STATE_UNINITIALIZED
+        };
+        kvm_set_mpstate(env, &mp_state);
 #endif
     } else {
 	env->interrupt_request &= ~CPU_INTERRUPT_HARD;
diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 47667b3..4e6ae70 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -754,6 +754,56 @@ static int get_msr_entry(struct kvm_msr_entry *entry, CPUState *env)
         return 0;
 }
 
+static void kvm_arch_save_mpstate(CPUState *env)
+{
+#ifdef KVM_CAP_MP_STATE
+    int r;
+    struct kvm_mp_state mp_state;
+
+    r = kvm_get_mpstate(env, &mp_state);
+    if (r < 0) {
+        env->mp_state = -1;
+    } else {
+        env->mp_state = mp_state.mp_state;
+        if (kvm_irqchip_in_kernel()) {
+            env->halted = (env->mp_state == KVM_MP_STATE_HALTED);
+        }
+    }
+#else
+    env->mp_state = -1;
+#endif
+}
+
+static void kvm_arch_load_mpstate(CPUState *env)
+{
+#ifdef KVM_CAP_MP_STATE
+    struct kvm_mp_state mp_state;
+
+    /*
+     * -1 indicates that the host did not support GET_MP_STATE ioctl,
+     *  so don't touch it.
+     */
+    if (env->mp_state != -1) {
+        mp_state.mp_state = env->mp_state;
+        kvm_set_mpstate(env, &mp_state);
+    }
+#endif
+}
+
+static void kvm_reset_mpstate(CPUState *env)
+{
+#ifdef KVM_CAP_MP_STATE
+    if (kvm_check_extension(kvm_state, KVM_CAP_MP_STATE)) {
+        if (kvm_irqchip_in_kernel()) {
+            env->mp_state = cpu_is_bsp(env) ? KVM_MP_STATE_RUNNABLE :
+                                              KVM_MP_STATE_UNINITIALIZED;
+        } else {
+            env->mp_state = KVM_MP_STATE_RUNNABLE;
+        }
+    }
+#endif
+}
+
 static void set_v8086_seg(struct kvm_segment *lhs, const SegmentCache *rhs)
 {
     lhs->selector = rhs->selector;
@@ -922,6 +972,14 @@ void kvm_arch_load_regs(CPUState *env, int level)
     if (rc == -1)
         perror("kvm_set_msrs FAILED");
 
+    if (level >= KVM_PUT_RESET_STATE) {
+        kvm_arch_load_mpstate(env);
+    }
+    if (kvm_irqchip_in_kernel()) {
+        /* Avoid deadlock: no user space IRQ will ever clear it. */
+        env->halted = 0;
+    }
+
     /* must be last */
     kvm_guest_debug_workarounds(env);
 }
@@ -938,36 +996,6 @@ void kvm_load_tsc(CPUState *env)
         perror("kvm_set_tsc FAILED.\n");
 }
 
-void kvm_arch_save_mpstate(CPUState *env)
-{
-#ifdef KVM_CAP_MP_STATE
-    int r;
-    struct kvm_mp_state mp_state;
-
-    r = kvm_get_mpstate(env, &mp_state);
-    if (r < 0)
-        env->mp_state = -1;
-    else
-        env->mp_state = mp_state.mp_state;
-#else
-    env->mp_state = -1;
-#endif
-}
-
-void kvm_arch_load_mpstate(CPUState *env)
-{
-#ifdef KVM_CAP_MP_STATE
-    struct kvm_mp_state mp_state = { .mp_state = env->mp_state };
-
-    /*
-     * -1 indicates that the host did not support GET_MP_STATE ioctl,
-     *  so don't touch it.
-     */
-    if (env->mp_state != -1)
-        kvm_set_mpstate(env, &mp_state);
-#endif
-}
-
 void kvm_arch_save_regs(CPUState *env)
 {
     struct kvm_regs regs;
@@ -1290,6 +1318,7 @@ int kvm_arch_init_vcpu(CPUState *cenv)
 #ifdef KVM_EXIT_TPR_ACCESS
     kvm_tpr_vcpu_start(cenv);
 #endif
+    kvm_reset_mpstate(cenv);
     return 0;
 }
 
@@ -1363,15 +1392,10 @@ void kvm_arch_cpu_reset(CPUState *env)
 {
     kvm_arch_reset_vcpu(env);
     kvm_put_vcpu_events(env);
-    if (!cpu_is_bsp(env)) {
-	if (kvm_irqchip_in_kernel()) {
-#ifdef KVM_CAP_MP_STATE
-	    kvm_reset_mpstate(env);
-#endif
-	} else {
-	    env->interrupt_request &= ~CPU_INTERRUPT_HARD;
-	    env->halted = 1;
-	}
+    kvm_reset_mpstate(env);
+    if (!cpu_is_bsp(env) && !kvm_irqchip_in_kernel()) {
+        env->interrupt_request &= ~CPU_INTERRUPT_HARD;
+        env->halted = 1;
     }
 }
 
diff --git a/qemu-kvm.c b/qemu-kvm.c
index f6d3e4d..103e0d9 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -1590,36 +1590,6 @@ void kvm_update_interrupt_request(CPUState *env)
     }
 }
 
-static void kvm_do_load_mpstate(void *_env)
-{
-    CPUState *env = _env;
-
-    kvm_arch_load_mpstate(env);
-}
-
-void kvm_load_mpstate(CPUState *env)
-{
-    if (kvm_enabled() && qemu_system_ready && kvm_vcpu_inited(env))
-        on_vcpu(env, kvm_do_load_mpstate, env);
-}
-
-static void kvm_do_save_mpstate(void *_env)
-{
-    CPUState *env = _env;
-
-    kvm_arch_save_mpstate(env);
-#ifdef KVM_CAP_MP_STATE
-    if (kvm_irqchip_in_kernel())
-        env->halted = (env->mp_state == KVM_MP_STATE_HALTED);
-#endif
-}
-
-void kvm_save_mpstate(CPUState *env)
-{
-    if (kvm_enabled())
-        on_vcpu(env, kvm_do_save_mpstate, env);
-}
-
 int kvm_cpu_exec(CPUState *env)
 {
     int r;
diff --git a/qemu-kvm.h b/qemu-kvm.h
index fab930d..c6ff652 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -299,16 +299,6 @@ int kvm_get_mpstate(CPUState *env, struct kvm_mp_state *mp_state);
  *
  */
 int kvm_set_mpstate(CPUState *env, struct kvm_mp_state *mp_state);
-/*!
- *  * \brief Reset VCPU MP state
- *
- */
-static inline int kvm_reset_mpstate(CPUState *env)
-{
-    struct kvm_mp_state mp_state = {.mp_state = KVM_MP_STATE_UNINITIALIZED
-    };
-    return kvm_set_mpstate(env, &mp_state);
-}
 #endif
 
 /*!
@@ -874,8 +864,6 @@ static inline void kvm_inject_x86_mce(CPUState *cenv, int bank,
 int kvm_main_loop(void);
 int kvm_init_ap(void);
 int kvm_vcpu_inited(CPUState *env);
-void kvm_load_mpstate(CPUState *env);
-void kvm_save_mpstate(CPUState *env);
 void kvm_apic_init(CPUState *env);
 /* called from vcpu initialization */
 void qemu_kvm_load_lapic(CPUState *env);
@@ -909,8 +897,6 @@ int kvm_arch_qemu_create_context(void);
 
 void kvm_arch_save_regs(CPUState *env);
 void kvm_arch_load_regs(CPUState *env, int level);
-void kvm_arch_load_mpstate(CPUState *env);
-void kvm_arch_save_mpstate(CPUState *env);
 int kvm_arch_has_work(CPUState *env);
 void kvm_arch_process_irqchip_events(CPUState *env);
 int kvm_arch_try_push_interrupts(void *opaque);
@@ -979,7 +965,6 @@ void kvm_load_tsc(CPUState *env);
 #ifdef TARGET_I386
 #define qemu_kvm_has_pit_state2() (0)
 #endif
-#define kvm_save_mpstate(env)   do {} while(0)
 #define qemu_kvm_cpu_stop(env) do {} while(0)
 static inline void kvm_load_tsc(CPUState *env)
 {
diff --git a/target-i386/machine.c b/target-i386/machine.c
index bebde82..61e6a87 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -323,7 +323,6 @@ static void cpu_pre_save(void *opaque)
     int i;
 
     if (kvm_enabled()) {
-        kvm_save_mpstate(env);
         kvm_get_vcpu_events(env);
     }
 
@@ -362,12 +361,7 @@ static int cpu_post_load(void *opaque, int version_id)
     tlb_flush(env, 1);
 
     if (kvm_enabled()) {
-        /* when in-kernel irqchip is used, env->halted causes deadlock
-           because no userspace IRQs will ever clear this flag */
-        env->halted = 0;
-
         kvm_load_tsc(env);
-        kvm_load_mpstate(env);
         kvm_put_vcpu_events(env);
     }
 
diff --git a/target-ia64/machine.c b/target-ia64/machine.c
index fdbeeef..8cf5bdd 100644
--- a/target-ia64/machine.c
+++ b/target-ia64/machine.c
@@ -4,6 +4,9 @@
 #include "exec-all.h"
 #include "qemu-kvm.h"
 
+void kvm_arch_save_mpstate(CPUState *env);
+void kvm_arch_load_mpstate(CPUState *env);
+
 void cpu_save(QEMUFile *f, void *opaque)
 {
     CPUState *env = opaque;
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [Qemu-devel] [PATCH v3 05/10] KVM: x86: Restrict writeback of VCPU state
  2010-02-24 14:17 [Qemu-devel] [PATCH v3 00/10] qemu-kvm: Hook cleanups and yet more use of upstream code Jan Kiszka
                   ` (3 preceding siblings ...)
  2010-02-24 14:17 ` [Qemu-devel] [PATCH v3 04/10] qemu-kvm: Clean up mpstate synchronization Jan Kiszka
@ 2010-02-24 14:17 ` Jan Kiszka
  2010-02-24 22:59   ` [Qemu-devel] " Marcelo Tosatti
  2010-02-24 14:17 ` [Qemu-devel] [PATCH v3 06/10] qemu-kvm: Use VCPU event state for reset and vmsave/load Jan Kiszka
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 29+ messages in thread
From: Jan Kiszka @ 2010-02-24 14:17 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: Gleb Natapov, qemu-devel, kvm

Do not write nmi_pending, sipi_vector, and mpstate unless we at least go
through a reset. And TSC as well as KVM wallclocks should only be
written on full sync, otherwise we risk to drop some time on during
state read-modify-write.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 kvm.h                 |    2 +-
 qemu-kvm-x86.c        |    2 +-
 target-i386/kvm.c     |   32 ++++++++++++++++++++------------
 target-i386/machine.c |    2 +-
 4 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/kvm.h b/kvm.h
index 3ec5b59..3ee307d 100644
--- a/kvm.h
+++ b/kvm.h
@@ -44,7 +44,7 @@ int kvm_log_stop(target_phys_addr_t phys_addr, ram_addr_t size);
 int kvm_has_sync_mmu(void);
 int kvm_has_vcpu_events(void);
 int kvm_has_robust_singlestep(void);
-int kvm_put_vcpu_events(CPUState *env);
+int kvm_put_vcpu_events(CPUState *env, int level);
 int kvm_get_vcpu_events(CPUState *env);
 
 void kvm_cpu_register_phys_memory_client(void);
diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 4e6ae70..b0f9670 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -1391,7 +1391,7 @@ void kvm_arch_push_nmi(void *opaque)
 void kvm_arch_cpu_reset(CPUState *env)
 {
     kvm_arch_reset_vcpu(env);
-    kvm_put_vcpu_events(env);
+    kvm_put_vcpu_events(env, KVM_PUT_RESET_STATE);
     kvm_reset_mpstate(env);
     if (!cpu_is_bsp(env) && !kvm_irqchip_in_kernel()) {
         env->interrupt_request &= ~CPU_INTERRUPT_HARD;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 5f0829b..f1f44d3 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -541,7 +541,7 @@ static void kvm_msr_entry_set(struct kvm_msr_entry *entry,
     entry->data = value;
 }
 
-static int kvm_put_msrs(CPUState *env)
+static int kvm_put_msrs(CPUState *env, int level)
 {
     struct {
         struct kvm_msrs info;
@@ -555,7 +555,6 @@ static int kvm_put_msrs(CPUState *env)
     kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_EIP, env->sysenter_eip);
     if (kvm_has_msr_star(env))
 	kvm_msr_entry_set(&msrs[n++], MSR_STAR, env->star);
-    kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSC, env->tsc);
     kvm_msr_entry_set(&msrs[n++], MSR_VM_HSAVE_PA, env->vm_hsave);
 #ifdef TARGET_X86_64
     /* FIXME if lm capable */
@@ -564,8 +563,12 @@ static int kvm_put_msrs(CPUState *env)
     kvm_msr_entry_set(&msrs[n++], MSR_FMASK, env->fmask);
     kvm_msr_entry_set(&msrs[n++], MSR_LSTAR, env->lstar);
 #endif
-    kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME,  env->system_time_msr);
-    kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK,  env->wall_clock_msr);
+    if (level == KVM_PUT_FULL_STATE) {
+        kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSC, env->tsc);
+        kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME,
+                          env->system_time_msr);
+        kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
+    }
 
     msr_data.info.nmsrs = n;
 
@@ -783,7 +786,7 @@ static int kvm_get_mp_state(CPUState *env)
 }
 #endif
 
-int kvm_put_vcpu_events(CPUState *env)
+int kvm_put_vcpu_events(CPUState *env, int level)
 {
 #ifdef KVM_CAP_VCPU_EVENTS
     struct kvm_vcpu_events events;
@@ -807,8 +810,11 @@ int kvm_put_vcpu_events(CPUState *env)
 
     events.sipi_vector = env->sipi_vector;
 
-    events.flags =
-        KVM_VCPUEVENT_VALID_NMI_PENDING | KVM_VCPUEVENT_VALID_SIPI_VECTOR;
+    events.flags = 0;
+    if (level >= KVM_PUT_RESET_STATE) {
+        events.flags |=
+            KVM_VCPUEVENT_VALID_NMI_PENDING | KVM_VCPUEVENT_VALID_SIPI_VECTOR;
+    }
 
     return kvm_vcpu_ioctl(env, KVM_SET_VCPU_EVENTS, &events);
 #else
@@ -901,15 +907,17 @@ int kvm_arch_put_registers(CPUState *env, int level)
     if (ret < 0)
         return ret;
 
-    ret = kvm_put_msrs(env);
+    ret = kvm_put_msrs(env, level);
     if (ret < 0)
         return ret;
 
-    ret = kvm_put_mp_state(env);
-    if (ret < 0)
-        return ret;
+    if (level >= KVM_PUT_RESET_STATE) {
+        ret = kvm_put_mp_state(env);
+        if (ret < 0)
+            return ret;
+    }
 
-    ret = kvm_put_vcpu_events(env);
+    ret = kvm_put_vcpu_events(env, level);
     if (ret < 0)
         return ret;
 
diff --git a/target-i386/machine.c b/target-i386/machine.c
index 61e6a87..6fca559 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -362,7 +362,7 @@ static int cpu_post_load(void *opaque, int version_id)
 
     if (kvm_enabled()) {
         kvm_load_tsc(env);
-        kvm_put_vcpu_events(env);
+        kvm_put_vcpu_events(env, KVM_PUT_FULL_STATE);
     }
 
     return 0;
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [Qemu-devel] [PATCH v3 06/10] qemu-kvm: Use VCPU event state for reset and vmsave/load
  2010-02-24 14:17 [Qemu-devel] [PATCH v3 00/10] qemu-kvm: Hook cleanups and yet more use of upstream code Jan Kiszka
                   ` (4 preceding siblings ...)
  2010-02-24 14:17 ` [Qemu-devel] [PATCH v3 05/10] KVM: x86: Restrict writeback of VCPU state Jan Kiszka
@ 2010-02-24 14:17 ` Jan Kiszka
  2010-02-24 14:17 ` [Qemu-devel] [PATCH v3 07/10] qemu-kvm: Cleanup/fix TSC and PV clock writeback Jan Kiszka
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: Jan Kiszka @ 2010-02-24 14:17 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: Gleb Natapov, qemu-devel, kvm

Push reading/writing of vcpu_events into kvm_arch_load/save_regs to
avoid KVM-specific hooks in generic code.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 kvm.h                 |    2 --
 qemu-kvm-x86.c        |    6 ++++--
 target-i386/kvm.c     |    4 ++--
 target-i386/machine.c |    6 ------
 4 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/kvm.h b/kvm.h
index 3ee307d..b3c3162 100644
--- a/kvm.h
+++ b/kvm.h
@@ -44,8 +44,6 @@ int kvm_log_stop(target_phys_addr_t phys_addr, ram_addr_t size);
 int kvm_has_sync_mmu(void);
 int kvm_has_vcpu_events(void);
 int kvm_has_robust_singlestep(void);
-int kvm_put_vcpu_events(CPUState *env, int level);
-int kvm_get_vcpu_events(CPUState *env);
 
 void kvm_cpu_register_phys_memory_client(void);
 
diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index b0f9670..840c1c9 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -980,6 +980,8 @@ void kvm_arch_load_regs(CPUState *env, int level)
         env->halted = 0;
     }
 
+    kvm_put_vcpu_events(env, level);
+
     /* must be last */
     kvm_guest_debug_workarounds(env);
 }
@@ -1153,6 +1155,7 @@ void kvm_arch_save_regs(CPUState *env)
         }
     }
     kvm_arch_save_mpstate(env);
+    kvm_get_vcpu_events(env);
 }
 
 static void do_cpuid_ent(struct kvm_cpuid_entry2 *e, uint32_t function,
@@ -1224,7 +1227,7 @@ int kvm_arch_init_vcpu(CPUState *cenv)
 
     qemu_kvm_load_lapic(cenv);
 
-    cenv->interrupt_injected = -1;
+    kvm_arch_reset_vcpu(cenv);
 
 #ifdef KVM_CPUID_SIGNATURE
     /* Paravirtualization CPUIDs */
@@ -1391,7 +1394,6 @@ void kvm_arch_push_nmi(void *opaque)
 void kvm_arch_cpu_reset(CPUState *env)
 {
     kvm_arch_reset_vcpu(env);
-    kvm_put_vcpu_events(env, KVM_PUT_RESET_STATE);
     kvm_reset_mpstate(env);
     if (!cpu_is_bsp(env) && !kvm_irqchip_in_kernel()) {
         env->interrupt_request &= ~CPU_INTERRUPT_HARD;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index f1f44d3..74993de 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -786,7 +786,7 @@ static int kvm_get_mp_state(CPUState *env)
 }
 #endif
 
-int kvm_put_vcpu_events(CPUState *env, int level)
+static int kvm_put_vcpu_events(CPUState *env, int level)
 {
 #ifdef KVM_CAP_VCPU_EVENTS
     struct kvm_vcpu_events events;
@@ -822,7 +822,7 @@ int kvm_put_vcpu_events(CPUState *env, int level)
 #endif
 }
 
-int kvm_get_vcpu_events(CPUState *env)
+static int kvm_get_vcpu_events(CPUState *env)
 {
 #ifdef KVM_CAP_VCPU_EVENTS
     struct kvm_vcpu_events events;
diff --git a/target-i386/machine.c b/target-i386/machine.c
index 6fca559..bcc315b 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -5,7 +5,6 @@
 
 #include "exec-all.h"
 #include "kvm.h"
-#include "qemu-kvm.h"
 
 static const VMStateDescription vmstate_segment = {
     .name = "segment",
@@ -322,10 +321,6 @@ static void cpu_pre_save(void *opaque)
     CPUState *env = opaque;
     int i;
 
-    if (kvm_enabled()) {
-        kvm_get_vcpu_events(env);
-    }
-
     /* FPU */
     env->fpus_vmstate = (env->fpus & ~0x3800) | (env->fpstt & 0x7) << 11;
     env->fptag_vmstate = 0;
@@ -362,7 +357,6 @@ static int cpu_post_load(void *opaque, int version_id)
 
     if (kvm_enabled()) {
         kvm_load_tsc(env);
-        kvm_put_vcpu_events(env, KVM_PUT_FULL_STATE);
     }
 
     return 0;
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [Qemu-devel] [PATCH v3 07/10] qemu-kvm: Cleanup/fix TSC and PV clock writeback
  2010-02-24 14:17 [Qemu-devel] [PATCH v3 00/10] qemu-kvm: Hook cleanups and yet more use of upstream code Jan Kiszka
                   ` (5 preceding siblings ...)
  2010-02-24 14:17 ` [Qemu-devel] [PATCH v3 06/10] qemu-kvm: Use VCPU event state for reset and vmsave/load Jan Kiszka
@ 2010-02-24 14:17 ` Jan Kiszka
  2010-02-24 23:17   ` [Qemu-devel] " Marcelo Tosatti
  2010-02-25 15:56   ` [Qemu-devel] Re: [PATCH v4 " Jan Kiszka
  2010-02-24 14:17 ` [Qemu-devel] [PATCH v3 08/10] qemu-kvm: Clean up KVM's APIC hooks Jan Kiszka
                   ` (3 subsequent siblings)
  10 siblings, 2 replies; 29+ messages in thread
From: Jan Kiszka @ 2010-02-24 14:17 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: Gleb Natapov, qemu-devel, kvm

Drop kvm_load_tsc in favor of level-dependent writeback in
kvm_arch_load_regs. KVM's PV clock MSRs fall in the same category and
should therefore only be written back on full sync.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 qemu-kvm-x86.c        |   19 +++++--------------
 qemu-kvm.h            |    4 ----
 target-i386/machine.c |    5 -----
 3 files changed, 5 insertions(+), 23 deletions(-)

diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 840c1c9..84fd7fa 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -965,8 +965,11 @@ void kvm_arch_load_regs(CPUState *env, int level)
         set_msr_entry(&msrs[n++], MSR_LSTAR  ,           env->lstar);
     }
 #endif
-    set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME,  env->system_time_msr);
-    set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK,  env->wall_clock_msr);
+    if (level == KVM_PUT_FULL_STATE) {
+        set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc);
+        set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr);
+        set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
+    }
 
     rc = kvm_set_msrs(env, msrs, n);
     if (rc == -1)
@@ -986,18 +989,6 @@ void kvm_arch_load_regs(CPUState *env, int level)
     kvm_guest_debug_workarounds(env);
 }
 
-void kvm_load_tsc(CPUState *env)
-{
-    int rc;
-    struct kvm_msr_entry msr;
-
-    set_msr_entry(&msr, MSR_IA32_TSC, env->tsc);
-
-    rc = kvm_set_msrs(env, &msr, 1);
-    if (rc == -1)
-        perror("kvm_set_tsc FAILED.\n");
-}
-
 void kvm_arch_save_regs(CPUState *env)
 {
     struct kvm_regs regs;
diff --git a/qemu-kvm.h b/qemu-kvm.h
index c6ff652..827cac5 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -958,7 +958,6 @@ int handle_tpr_access(void *opaque, CPUState *env, uint64_t rip,
 #ifdef TARGET_I386
 #define qemu_kvm_has_pit_state2() kvm_has_pit_state2(kvm_context)
 #endif
-void kvm_load_tsc(CPUState *env);
 #else
 #define kvm_nested 0
 #define qemu_kvm_has_gsi_routing() (0)
@@ -966,9 +965,6 @@ void kvm_load_tsc(CPUState *env);
 #define qemu_kvm_has_pit_state2() (0)
 #endif
 #define qemu_kvm_cpu_stop(env) do {} while(0)
-static inline void kvm_load_tsc(CPUState *env)
-{
-}
 #endif
 
 void kvm_mutex_unlock(void);
diff --git a/target-i386/machine.c b/target-i386/machine.c
index bcc315b..b547e2a 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -354,11 +354,6 @@ static int cpu_post_load(void *opaque, int version_id)
         hw_breakpoint_insert(env, i);
 
     tlb_flush(env, 1);
-
-    if (kvm_enabled()) {
-        kvm_load_tsc(env);
-    }
-
     return 0;
 }
 
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [Qemu-devel] [PATCH v3 08/10] qemu-kvm: Clean up KVM's APIC hooks
  2010-02-24 14:17 [Qemu-devel] [PATCH v3 00/10] qemu-kvm: Hook cleanups and yet more use of upstream code Jan Kiszka
                   ` (6 preceding siblings ...)
  2010-02-24 14:17 ` [Qemu-devel] [PATCH v3 07/10] qemu-kvm: Cleanup/fix TSC and PV clock writeback Jan Kiszka
@ 2010-02-24 14:17 ` Jan Kiszka
  2010-02-24 14:17 ` [Qemu-devel] [PATCH v3 09/10] qemu-kvm: Move kvm_set_boot_cpu_id Jan Kiszka
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: Jan Kiszka @ 2010-02-24 14:17 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: Gleb Natapov, qemu-devel, kvm

The APIC is part of the VCPU state, so trigger its readout and writeback
from kvm_arch_save/load_regs. Thanks to the transparent sync on reset
and vmsave/load, we can also drop explicit sync code, reducing the diff
to upstream.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/apic.c      |   37 +++++--------------------------------
 qemu-kvm-x86.c |    4 ++--
 qemu-kvm.h     |    5 ++---
 3 files changed, 9 insertions(+), 37 deletions(-)

diff --git a/hw/apic.c b/hw/apic.c
index 092c61e..d8c4f7c 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -24,8 +24,6 @@
 #include "host-utils.h"
 #include "kvm.h"
 
-#include "qemu-kvm.h"
-
 //#define DEBUG_APIC
 
 /* APIC Local Vector Table */
@@ -951,36 +949,22 @@ static void kvm_kernel_lapic_load_from_user(APICState *s)
 
 #endif
 
-void qemu_kvm_load_lapic(CPUState *env)
+void kvm_load_lapic(CPUState *env)
 {
 #ifdef KVM_CAP_IRQCHIP
-    if (kvm_enabled() && kvm_vcpu_inited(env) && kvm_irqchip_in_kernel()) {
-        kvm_kernel_lapic_load_from_user(env->apic_state);
-    }
-#endif
-}
-
-static void apic_pre_save(void *opaque)
-{
-#ifdef KVM_CAP_IRQCHIP
-    APICState *s = (void *)opaque;
-
     if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-        kvm_kernel_lapic_save_to_user(s);
+        kvm_kernel_lapic_load_from_user(env->apic_state);
     }
 #endif
 }
 
-static int apic_post_load(void *opaque, int version_id)
+void kvm_save_lapic(CPUState *env)
 {
 #ifdef KVM_CAP_IRQCHIP
-    APICState *s = opaque;
-
     if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-        kvm_kernel_lapic_load_from_user(s);
+        kvm_kernel_lapic_save_to_user(env->apic_state);
     }
 #endif
-    return 0;
 }
 
 /* This function is only used for old state version 1 and 2 */
@@ -1019,9 +1003,6 @@ static int apic_load_old(QEMUFile *f, void *opaque, int version_id)
 
     if (version_id >= 2)
         qemu_get_timer(f, s->timer);
-
-    qemu_kvm_load_lapic(s->cpu_env);
-
     return 0;
 }
 
@@ -1052,9 +1033,7 @@ static const VMStateDescription vmstate_apic = {
         VMSTATE_INT64(next_time, APICState),
         VMSTATE_TIMER(timer, APICState),
         VMSTATE_END_OF_LIST()
-    },
-    .pre_save = apic_pre_save,
-    .post_load = apic_post_load,
+    }
 };
 
 static void apic_reset(void *opaque)
@@ -1077,7 +1056,6 @@ static void apic_reset(void *opaque)
          */
         s->lvt[APIC_LVT_LINT0] = 0x700;
     }
-    qemu_kvm_load_lapic(s->cpu_env);
 }
 
 static CPUReadMemoryFunc * const apic_mem_read[3] = {
@@ -1121,11 +1099,6 @@ int apic_init(CPUState *env)
     vmstate_register(s->idx, &vmstate_apic, s);
     qemu_register_reset(apic_reset, s);
 
-    /* apic_reset must be called before the vcpu threads are initialized and load
-     * registers, in qemu-kvm.
-     */
-    apic_reset(s);
-
     local_apics[s->idx] = s;
     return 0;
 }
diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 84fd7fa..1ac5dbf 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -977,6 +977,7 @@ void kvm_arch_load_regs(CPUState *env, int level)
 
     if (level >= KVM_PUT_RESET_STATE) {
         kvm_arch_load_mpstate(env);
+        kvm_load_lapic(env);
     }
     if (kvm_irqchip_in_kernel()) {
         /* Avoid deadlock: no user space IRQ will ever clear it. */
@@ -1146,6 +1147,7 @@ void kvm_arch_save_regs(CPUState *env)
         }
     }
     kvm_arch_save_mpstate(env);
+    kvm_save_lapic(env);
     kvm_get_vcpu_events(env);
 }
 
@@ -1216,8 +1218,6 @@ int kvm_arch_init_vcpu(CPUState *cenv)
     CPUState copy;
     uint32_t i, j, limit;
 
-    qemu_kvm_load_lapic(cenv);
-
     kvm_arch_reset_vcpu(cenv);
 
 #ifdef KVM_CPUID_SIGNATURE
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 827cac5..0965152 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -864,9 +864,8 @@ static inline void kvm_inject_x86_mce(CPUState *cenv, int bank,
 int kvm_main_loop(void);
 int kvm_init_ap(void);
 int kvm_vcpu_inited(CPUState *env);
-void kvm_apic_init(CPUState *env);
-/* called from vcpu initialization */
-void qemu_kvm_load_lapic(CPUState *env);
+void kvm_save_lapic(CPUState *env);
+void kvm_load_lapic(CPUState *env);
 
 void kvm_hpet_enable_kpit(void);
 void kvm_hpet_disable_kpit(void);
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [Qemu-devel] [PATCH v3 09/10] qemu-kvm: Move kvm_set_boot_cpu_id
  2010-02-24 14:17 [Qemu-devel] [PATCH v3 00/10] qemu-kvm: Hook cleanups and yet more use of upstream code Jan Kiszka
                   ` (7 preceding siblings ...)
  2010-02-24 14:17 ` [Qemu-devel] [PATCH v3 08/10] qemu-kvm: Clean up KVM's APIC hooks Jan Kiszka
@ 2010-02-24 14:17 ` Jan Kiszka
  2010-02-24 14:17 ` [Qemu-devel] [PATCH v3 10/10] qemu-kvm: Bring qemu_init_vcpu back home Jan Kiszka
  2010-02-24 23:26 ` [Qemu-devel] Re: [PATCH v3 00/10] qemu-kvm: Hook cleanups and yet more use of upstream code Marcelo Tosatti
  10 siblings, 0 replies; 29+ messages in thread
From: Jan Kiszka @ 2010-02-24 14:17 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: Gleb Natapov, qemu-devel, kvm

Setting the boot CPU ID is arch-specific KVM stuff. So push it where it
belongs to.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/pc.c        |    3 ---
 qemu-kvm-x86.c |    3 ++-
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index 8b5af35..971ae70 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -804,9 +804,6 @@ static void pc_init1(ram_addr_t ram_size,
 #endif
     }
 
-    if (kvm_enabled()) {
-        kvm_set_boot_cpu_id(0);
-    }
     for (i = 0; i < smp_cpus; i++) {
         env = pc_new_cpu(cpu_model);
     }
diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 1ac5dbf..b0592ed 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -695,7 +695,8 @@ int kvm_arch_qemu_create_context(void)
     if (kvm_check_extension(kvm_state, KVM_CAP_ADJUST_CLOCK))
         vmstate_register(0, &vmstate_kvmclock, &kvmclock_data);
 #endif
-    return 0;
+
+    return kvm_set_boot_cpu_id(0);
 }
 
 static void set_msr_entry(struct kvm_msr_entry *entry, uint32_t index,
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [Qemu-devel] [PATCH v3 10/10] qemu-kvm: Bring qemu_init_vcpu back home
  2010-02-24 14:17 [Qemu-devel] [PATCH v3 00/10] qemu-kvm: Hook cleanups and yet more use of upstream code Jan Kiszka
                   ` (8 preceding siblings ...)
  2010-02-24 14:17 ` [Qemu-devel] [PATCH v3 09/10] qemu-kvm: Move kvm_set_boot_cpu_id Jan Kiszka
@ 2010-02-24 14:17 ` Jan Kiszka
  2010-02-24 23:26 ` [Qemu-devel] Re: [PATCH v3 00/10] qemu-kvm: Hook cleanups and yet more use of upstream code Marcelo Tosatti
  10 siblings, 0 replies; 29+ messages in thread
From: Jan Kiszka @ 2010-02-24 14:17 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: Gleb Natapov, qemu-devel, kvm

There is no need for the this hack anymore, initialization is now robust
against reordering as it doesn't try to write the VCPU state on its own.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/pc.c              |    5 -----
 target-i386/helper.c |    2 ++
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index 971ae70..a83f96e 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -752,11 +752,6 @@ CPUState *pc_new_cpu(const char *cpu_model)
     } else {
         qemu_register_reset((QEMUResetHandler*)cpu_reset, env);
     }
-
-    /* kvm needs this to run after the apic is initialized. Otherwise,
-     * it can access invalid state and crash.
-     */
-    qemu_init_vcpu(env);
     return env;
 }
 
diff --git a/target-i386/helper.c b/target-i386/helper.c
index f9d63f6..f83e8cc 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -1953,6 +1953,8 @@ CPUX86State *cpu_x86_init(const char *cpu_model)
     }
     mce_init(env);
 
+    qemu_init_vcpu(env);
+
     return env;
 }
 
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [Qemu-devel] Re: [PATCH v3 04/10] qemu-kvm: Clean up mpstate synchronization
  2010-02-24 14:17 ` [Qemu-devel] [PATCH v3 04/10] qemu-kvm: Clean up mpstate synchronization Jan Kiszka
@ 2010-02-24 22:44   ` Marcelo Tosatti
  2010-02-25  0:02     ` Jan Kiszka
  2010-02-25 17:20   ` [Qemu-devel] [PATCH v4 " Jan Kiszka
  1 sibling, 1 reply; 29+ messages in thread
From: Marcelo Tosatti @ 2010-02-24 22:44 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Gleb Natapov, Avi Kivity, kvm, qemu-devel

On Wed, Feb 24, 2010 at 03:17:52PM +0100, Jan Kiszka wrote:
> Push mpstate reading/writing into kvm_arch_load/save_regs and, on x86,
> properly synchronize with halted in the accessor functions.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>

> @@ -1290,6 +1318,7 @@ int kvm_arch_init_vcpu(CPUState *cenv)
>  #ifdef KVM_EXIT_TPR_ACCESS
>      kvm_tpr_vcpu_start(cenv);
>  #endif
> +    kvm_reset_mpstate(cenv);
>      return 0;
>  }
>  
> @@ -1363,15 +1392,10 @@ void kvm_arch_cpu_reset(CPUState *env)
>  {
>      kvm_arch_reset_vcpu(env);
>      kvm_put_vcpu_events(env);
> -    if (!cpu_is_bsp(env)) {
> -	if (kvm_irqchip_in_kernel()) {
> -#ifdef KVM_CAP_MP_STATE
> -	    kvm_reset_mpstate(env);
> -#endif
> -	} else {
> -	    env->interrupt_request &= ~CPU_INTERRUPT_HARD;
> -	    env->halted = 1;
> -	}
> +    kvm_reset_mpstate(env);
> +    if (!cpu_is_bsp(env) && !kvm_irqchip_in_kernel()) {
> +        env->interrupt_request &= ~CPU_INTERRUPT_HARD;
> +        env->halted = 1;
>      }
>  }

Why are these two needed? Now that initialization of mp_state 
happens via synchronize_state(init/reset) -> arch_load_regs?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Qemu-devel] Re: [PATCH v3 05/10] KVM: x86: Restrict writeback of VCPU state
  2010-02-24 14:17 ` [Qemu-devel] [PATCH v3 05/10] KVM: x86: Restrict writeback of VCPU state Jan Kiszka
@ 2010-02-24 22:59   ` Marcelo Tosatti
  2010-02-24 23:51     ` Jan Kiszka
  0 siblings, 1 reply; 29+ messages in thread
From: Marcelo Tosatti @ 2010-02-24 22:59 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Gleb Natapov, Avi Kivity, kvm, qemu-devel

On Wed, Feb 24, 2010 at 03:17:53PM +0100, Jan Kiszka wrote:
> Do not write nmi_pending, sipi_vector, and mpstate unless we at least go
> through a reset. And TSC as well as KVM wallclocks should only be
> written on full sync, otherwise we risk to drop some time on during
> state read-modify-write.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
>  kvm.h                 |    2 +-
>  qemu-kvm-x86.c        |    2 +-
>  target-i386/kvm.c     |   32 ++++++++++++++++++++------------
>  target-i386/machine.c |    2 +-
>  4 files changed, 23 insertions(+), 15 deletions(-)
> 
> diff --git a/kvm.h b/kvm.h
> index 3ec5b59..3ee307d 100644
> --- a/kvm.h
> +++ b/kvm.h
> @@ -44,7 +44,7 @@ int kvm_log_stop(target_phys_addr_t phys_addr, ram_addr_t size);
>  int kvm_has_sync_mmu(void);
>  int kvm_has_vcpu_events(void);
>  int kvm_has_robust_singlestep(void);
> -int kvm_put_vcpu_events(CPUState *env);
> +int kvm_put_vcpu_events(CPUState *env, int level);
>  int kvm_get_vcpu_events(CPUState *env);
>  
>  void kvm_cpu_register_phys_memory_client(void);
> diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
> index 4e6ae70..b0f9670 100644
> --- a/qemu-kvm-x86.c
> +++ b/qemu-kvm-x86.c
> @@ -1391,7 +1391,7 @@ void kvm_arch_push_nmi(void *opaque)
>  void kvm_arch_cpu_reset(CPUState *env)
>  {
>      kvm_arch_reset_vcpu(env);
> -    kvm_put_vcpu_events(env);
> +    kvm_put_vcpu_events(env, KVM_PUT_RESET_STATE);
>      kvm_reset_mpstate(env);
>      if (!cpu_is_bsp(env) && !kvm_irqchip_in_kernel()) {
>          env->interrupt_request &= ~CPU_INTERRUPT_HARD;
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index 5f0829b..f1f44d3 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -541,7 +541,7 @@ static void kvm_msr_entry_set(struct kvm_msr_entry *entry,
>      entry->data = value;
>  }
>  
> -static int kvm_put_msrs(CPUState *env)
> +static int kvm_put_msrs(CPUState *env, int level)
>  {
>      struct {
>          struct kvm_msrs info;
> @@ -555,7 +555,6 @@ static int kvm_put_msrs(CPUState *env)
>      kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_EIP, env->sysenter_eip);
>      if (kvm_has_msr_star(env))
>  	kvm_msr_entry_set(&msrs[n++], MSR_STAR, env->star);
> -    kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSC, env->tsc);
>      kvm_msr_entry_set(&msrs[n++], MSR_VM_HSAVE_PA, env->vm_hsave);
>  #ifdef TARGET_X86_64
>      /* FIXME if lm capable */
> @@ -564,8 +563,12 @@ static int kvm_put_msrs(CPUState *env)
>      kvm_msr_entry_set(&msrs[n++], MSR_FMASK, env->fmask);
>      kvm_msr_entry_set(&msrs[n++], MSR_LSTAR, env->lstar);
>  #endif
> -    kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME,  env->system_time_msr);
> -    kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK,  env->wall_clock_msr);
> +    if (level == KVM_PUT_FULL_STATE) {
> +        kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSC, env->tsc);
> +        kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME,
> +                          env->system_time_msr);
> +        kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
> +    }
>  
>      msr_data.info.nmsrs = n;
>  
> @@ -783,7 +786,7 @@ static int kvm_get_mp_state(CPUState *env)
>  }
>  #endif
>  
> -int kvm_put_vcpu_events(CPUState *env)
> +int kvm_put_vcpu_events(CPUState *env, int level)
>  {
>  #ifdef KVM_CAP_VCPU_EVENTS
>      struct kvm_vcpu_events events;
> @@ -807,8 +810,11 @@ int kvm_put_vcpu_events(CPUState *env)
>  
>      events.sipi_vector = env->sipi_vector;
>  
> -    events.flags =
> -        KVM_VCPUEVENT_VALID_NMI_PENDING | KVM_VCPUEVENT_VALID_SIPI_VECTOR;
> +    events.flags = 0;
> +    if (level >= KVM_PUT_RESET_STATE) {
> +        events.flags |=
> +            KVM_VCPUEVENT_VALID_NMI_PENDING | KVM_VCPUEVENT_VALID_SIPI_VECTOR;
> +    }
>  
>      return kvm_vcpu_ioctl(env, KVM_SET_VCPU_EVENTS, &events);

What is the reason for write-back of any vcpu-event state for RUNTIME 
case again?

The debug workaround?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Qemu-devel] Re: [PATCH v3 07/10] qemu-kvm: Cleanup/fix TSC and PV clock writeback
  2010-02-24 14:17 ` [Qemu-devel] [PATCH v3 07/10] qemu-kvm: Cleanup/fix TSC and PV clock writeback Jan Kiszka
@ 2010-02-24 23:17   ` Marcelo Tosatti
  2010-02-24 23:45     ` Jan Kiszka
  2010-02-25 15:56   ` [Qemu-devel] Re: [PATCH v4 " Jan Kiszka
  1 sibling, 1 reply; 29+ messages in thread
From: Marcelo Tosatti @ 2010-02-24 23:17 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Gleb Natapov, Zachary Amsden, Avi Kivity, kvm, qemu-devel

On Wed, Feb 24, 2010 at 03:17:55PM +0100, Jan Kiszka wrote:
> Drop kvm_load_tsc in favor of level-dependent writeback in
> kvm_arch_load_regs. KVM's PV clock MSRs fall in the same category and
> should therefore only be written back on full sync.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
>  qemu-kvm-x86.c        |   19 +++++--------------
>  qemu-kvm.h            |    4 ----
>  target-i386/machine.c |    5 -----
>  3 files changed, 5 insertions(+), 23 deletions(-)
> 
> diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
> index 840c1c9..84fd7fa 100644
> --- a/qemu-kvm-x86.c
> +++ b/qemu-kvm-x86.c
> @@ -965,8 +965,11 @@ void kvm_arch_load_regs(CPUState *env, int level)
>          set_msr_entry(&msrs[n++], MSR_LSTAR  ,           env->lstar);
>      }
>  #endif
> -    set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME,  env->system_time_msr);
> -    set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK,  env->wall_clock_msr);
> +    if (level == KVM_PUT_FULL_STATE) {
> +        set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc);
> +        set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr);
> +        set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
> +    }

As things stand today, the TSC should only be written on migration. See
53f658b3c33616a4997ee254311b335e59063289 in the kernel.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Qemu-devel] Re: [PATCH v3 00/10] qemu-kvm: Hook cleanups and yet more use of upstream code
  2010-02-24 14:17 [Qemu-devel] [PATCH v3 00/10] qemu-kvm: Hook cleanups and yet more use of upstream code Jan Kiszka
                   ` (9 preceding siblings ...)
  2010-02-24 14:17 ` [Qemu-devel] [PATCH v3 10/10] qemu-kvm: Bring qemu_init_vcpu back home Jan Kiszka
@ 2010-02-24 23:26 ` Marcelo Tosatti
  2010-02-24 23:55   ` Jan Kiszka
  10 siblings, 1 reply; 29+ messages in thread
From: Marcelo Tosatti @ 2010-02-24 23:26 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Gleb Natapov, Avi Kivity, kvm, qemu-devel

On Wed, Feb 24, 2010 at 03:17:48PM +0100, Jan Kiszka wrote:
> Let's proceed with cleaning up the VCPU state writeback. The differences
> to v2 are:
>  - included guest debugging update patch and rebased on top of it
>  - renamed KVM_PUT_ASYNC_STATE->KVM_PUT_RUNTIME_STATE and added comments
>  - fixed mp_state corruption due to early use of cpu_is_bsp
> 
> Some patches target uq/master (fully or partially), but I will post them
> separately once these bits are acceptible.
> 
> Pull URL is
> 
> 	git://git.kiszka.org/qemu-kvm.git queues/queues-kvm-merge
> 
> Jan Kiszka (10):
>   qemu-kvm: Add KVM_CAP_X86_ROBUST_SINGLESTEP-awareness
>   qemu-kvm: Rework VCPU state writeback API
>   x86: Extend validity of cpu_is_bsp
>   qemu-kvm: Clean up mpstate synchronization
>   KVM: x86: Restrict writeback of VCPU state
>   qemu-kvm: Use VCPU event state for reset and vmsave/load
>   qemu-kvm: Cleanup/fix TSC and PV clock writeback
>   qemu-kvm: Clean up KVM's APIC hooks
>   qemu-kvm: Move kvm_set_boot_cpu_id
>   qemu-kvm: Bring qemu_init_vcpu back home

Looks fine in general. Do you plan to submit the upstream parts 
on a separate patchset?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Qemu-devel] Re: [PATCH v3 07/10] qemu-kvm: Cleanup/fix TSC and PV clock writeback
  2010-02-24 23:17   ` [Qemu-devel] " Marcelo Tosatti
@ 2010-02-24 23:45     ` Jan Kiszka
  2010-02-24 23:49       ` Marcelo Tosatti
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Kiszka @ 2010-02-24 23:45 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Gleb Natapov, Zachary Amsden, Avi Kivity, kvm, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1551 bytes --]

Marcelo Tosatti wrote:
> On Wed, Feb 24, 2010 at 03:17:55PM +0100, Jan Kiszka wrote:
>> Drop kvm_load_tsc in favor of level-dependent writeback in
>> kvm_arch_load_regs. KVM's PV clock MSRs fall in the same category and
>> should therefore only be written back on full sync.
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> ---
>>  qemu-kvm-x86.c        |   19 +++++--------------
>>  qemu-kvm.h            |    4 ----
>>  target-i386/machine.c |    5 -----
>>  3 files changed, 5 insertions(+), 23 deletions(-)
>>
>> diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
>> index 840c1c9..84fd7fa 100644
>> --- a/qemu-kvm-x86.c
>> +++ b/qemu-kvm-x86.c
>> @@ -965,8 +965,11 @@ void kvm_arch_load_regs(CPUState *env, int level)
>>          set_msr_entry(&msrs[n++], MSR_LSTAR  ,           env->lstar);
>>      }
>>  #endif
>> -    set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME,  env->system_time_msr);
>> -    set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK,  env->wall_clock_msr);
>> +    if (level == KVM_PUT_FULL_STATE) {
>> +        set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc);
>> +        set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr);
>> +        set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
>> +    }
> 
> As things stand today, the TSC should only be written on migration. See
> 53f658b3c33616a4997ee254311b335e59063289 in the kernel.

Migration and power-up - that's what this patch ensures (=>
KVM_PUT_FULL_STATE). Or where do you see any problem?

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Qemu-devel] Re: [PATCH v3 07/10] qemu-kvm: Cleanup/fix TSC and PV clock writeback
  2010-02-24 23:45     ` Jan Kiszka
@ 2010-02-24 23:49       ` Marcelo Tosatti
  2010-02-24 23:58         ` Jan Kiszka
  0 siblings, 1 reply; 29+ messages in thread
From: Marcelo Tosatti @ 2010-02-24 23:49 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Gleb Natapov, Zachary Amsden, Avi Kivity, kvm, qemu-devel

On Thu, Feb 25, 2010 at 12:45:55AM +0100, Jan Kiszka wrote:
> Marcelo Tosatti wrote:
> > On Wed, Feb 24, 2010 at 03:17:55PM +0100, Jan Kiszka wrote:
> >> Drop kvm_load_tsc in favor of level-dependent writeback in
> >> kvm_arch_load_regs. KVM's PV clock MSRs fall in the same category and
> >> should therefore only be written back on full sync.
> >>
> >> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >> ---
> >>  qemu-kvm-x86.c        |   19 +++++--------------
> >>  qemu-kvm.h            |    4 ----
> >>  target-i386/machine.c |    5 -----
> >>  3 files changed, 5 insertions(+), 23 deletions(-)
> >>
> >> diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
> >> index 840c1c9..84fd7fa 100644
> >> --- a/qemu-kvm-x86.c
> >> +++ b/qemu-kvm-x86.c
> >> @@ -965,8 +965,11 @@ void kvm_arch_load_regs(CPUState *env, int level)
> >>          set_msr_entry(&msrs[n++], MSR_LSTAR  ,           env->lstar);
> >>      }
> >>  #endif
> >> -    set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME,  env->system_time_msr);
> >> -    set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK,  env->wall_clock_msr);
> >> +    if (level == KVM_PUT_FULL_STATE) {
> >> +        set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc);
> >> +        set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr);
> >> +        set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
> >> +    }
> > 
> > As things stand today, the TSC should only be written on migration. See
> > 53f658b3c33616a4997ee254311b335e59063289 in the kernel.
> 
> Migration and power-up - that's what this patch ensures (=>
> KVM_PUT_FULL_STATE). Or where do you see any problem?
> 
> Jan
> 

The problem is it should not write on power up (the kernel attempts
to synchronize the TSCs in that case, see the commit).

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Qemu-devel] Re: [PATCH v3 05/10] KVM: x86: Restrict writeback of VCPU state
  2010-02-24 22:59   ` [Qemu-devel] " Marcelo Tosatti
@ 2010-02-24 23:51     ` Jan Kiszka
  0 siblings, 0 replies; 29+ messages in thread
From: Jan Kiszka @ 2010-02-24 23:51 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Gleb Natapov, Avi Kivity, kvm, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 4547 bytes --]

Marcelo Tosatti wrote:
> On Wed, Feb 24, 2010 at 03:17:53PM +0100, Jan Kiszka wrote:
>> Do not write nmi_pending, sipi_vector, and mpstate unless we at least go
>> through a reset. And TSC as well as KVM wallclocks should only be
>> written on full sync, otherwise we risk to drop some time on during
>> state read-modify-write.
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> ---
>>  kvm.h                 |    2 +-
>>  qemu-kvm-x86.c        |    2 +-
>>  target-i386/kvm.c     |   32 ++++++++++++++++++++------------
>>  target-i386/machine.c |    2 +-
>>  4 files changed, 23 insertions(+), 15 deletions(-)
>>
>> diff --git a/kvm.h b/kvm.h
>> index 3ec5b59..3ee307d 100644
>> --- a/kvm.h
>> +++ b/kvm.h
>> @@ -44,7 +44,7 @@ int kvm_log_stop(target_phys_addr_t phys_addr, ram_addr_t size);
>>  int kvm_has_sync_mmu(void);
>>  int kvm_has_vcpu_events(void);
>>  int kvm_has_robust_singlestep(void);
>> -int kvm_put_vcpu_events(CPUState *env);
>> +int kvm_put_vcpu_events(CPUState *env, int level);
>>  int kvm_get_vcpu_events(CPUState *env);
>>  
>>  void kvm_cpu_register_phys_memory_client(void);
>> diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
>> index 4e6ae70..b0f9670 100644
>> --- a/qemu-kvm-x86.c
>> +++ b/qemu-kvm-x86.c
>> @@ -1391,7 +1391,7 @@ void kvm_arch_push_nmi(void *opaque)
>>  void kvm_arch_cpu_reset(CPUState *env)
>>  {
>>      kvm_arch_reset_vcpu(env);
>> -    kvm_put_vcpu_events(env);
>> +    kvm_put_vcpu_events(env, KVM_PUT_RESET_STATE);
>>      kvm_reset_mpstate(env);
>>      if (!cpu_is_bsp(env) && !kvm_irqchip_in_kernel()) {
>>          env->interrupt_request &= ~CPU_INTERRUPT_HARD;
>> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
>> index 5f0829b..f1f44d3 100644
>> --- a/target-i386/kvm.c
>> +++ b/target-i386/kvm.c
>> @@ -541,7 +541,7 @@ static void kvm_msr_entry_set(struct kvm_msr_entry *entry,
>>      entry->data = value;
>>  }
>>  
>> -static int kvm_put_msrs(CPUState *env)
>> +static int kvm_put_msrs(CPUState *env, int level)
>>  {
>>      struct {
>>          struct kvm_msrs info;
>> @@ -555,7 +555,6 @@ static int kvm_put_msrs(CPUState *env)
>>      kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_EIP, env->sysenter_eip);
>>      if (kvm_has_msr_star(env))
>>  	kvm_msr_entry_set(&msrs[n++], MSR_STAR, env->star);
>> -    kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSC, env->tsc);
>>      kvm_msr_entry_set(&msrs[n++], MSR_VM_HSAVE_PA, env->vm_hsave);
>>  #ifdef TARGET_X86_64
>>      /* FIXME if lm capable */
>> @@ -564,8 +563,12 @@ static int kvm_put_msrs(CPUState *env)
>>      kvm_msr_entry_set(&msrs[n++], MSR_FMASK, env->fmask);
>>      kvm_msr_entry_set(&msrs[n++], MSR_LSTAR, env->lstar);
>>  #endif
>> -    kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME,  env->system_time_msr);
>> -    kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK,  env->wall_clock_msr);
>> +    if (level == KVM_PUT_FULL_STATE) {
>> +        kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSC, env->tsc);
>> +        kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME,
>> +                          env->system_time_msr);
>> +        kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
>> +    }
>>  
>>      msr_data.info.nmsrs = n;
>>  
>> @@ -783,7 +786,7 @@ static int kvm_get_mp_state(CPUState *env)
>>  }
>>  #endif
>>  
>> -int kvm_put_vcpu_events(CPUState *env)
>> +int kvm_put_vcpu_events(CPUState *env, int level)
>>  {
>>  #ifdef KVM_CAP_VCPU_EVENTS
>>      struct kvm_vcpu_events events;
>> @@ -807,8 +810,11 @@ int kvm_put_vcpu_events(CPUState *env)
>>  
>>      events.sipi_vector = env->sipi_vector;
>>  
>> -    events.flags =
>> -        KVM_VCPUEVENT_VALID_NMI_PENDING | KVM_VCPUEVENT_VALID_SIPI_VECTOR;
>> +    events.flags = 0;
>> +    if (level >= KVM_PUT_RESET_STATE) {
>> +        events.flags |=
>> +            KVM_VCPUEVENT_VALID_NMI_PENDING | KVM_VCPUEVENT_VALID_SIPI_VECTOR;
>> +    }
>>  
>>      return kvm_vcpu_ioctl(env, KVM_SET_VCPU_EVENTS, &events);
> 
> What is the reason for write-back of any vcpu-event state for RUNTIME 
> case again?
> 
> The debug workaround?

Consistency and maximum flexibility.

I don't want to start fiddling with this again when we start to
manipulate some VCPU runtime state that may not require writeback yet
(workarounds like the guest debugging stuff can be a reason for that).
Instead, we should now establish a clean concept that only knows those
three types and their well-defined writeback points.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Qemu-devel] Re: [PATCH v3 00/10] qemu-kvm: Hook cleanups and yet more use of upstream code
  2010-02-24 23:26 ` [Qemu-devel] Re: [PATCH v3 00/10] qemu-kvm: Hook cleanups and yet more use of upstream code Marcelo Tosatti
@ 2010-02-24 23:55   ` Jan Kiszka
  0 siblings, 0 replies; 29+ messages in thread
From: Jan Kiszka @ 2010-02-24 23:55 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Gleb Natapov, Avi Kivity, kvm, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1309 bytes --]

Marcelo Tosatti wrote:
> On Wed, Feb 24, 2010 at 03:17:48PM +0100, Jan Kiszka wrote:
>> Let's proceed with cleaning up the VCPU state writeback. The differences
>> to v2 are:
>>  - included guest debugging update patch and rebased on top of it
>>  - renamed KVM_PUT_ASYNC_STATE->KVM_PUT_RUNTIME_STATE and added comments
>>  - fixed mp_state corruption due to early use of cpu_is_bsp
>>
>> Some patches target uq/master (fully or partially), but I will post them
>> separately once these bits are acceptible.
>>
>> Pull URL is
>>
>> 	git://git.kiszka.org/qemu-kvm.git queues/queues-kvm-merge
>>
>> Jan Kiszka (10):
>>   qemu-kvm: Add KVM_CAP_X86_ROBUST_SINGLESTEP-awareness
>>   qemu-kvm: Rework VCPU state writeback API
>>   x86: Extend validity of cpu_is_bsp
>>   qemu-kvm: Clean up mpstate synchronization
>>   KVM: x86: Restrict writeback of VCPU state
>>   qemu-kvm: Use VCPU event state for reset and vmsave/load
>>   qemu-kvm: Cleanup/fix TSC and PV clock writeback
>>   qemu-kvm: Clean up KVM's APIC hooks
>>   qemu-kvm: Move kvm_set_boot_cpu_id
>>   qemu-kvm: Bring qemu_init_vcpu back home
> 
> Looks fine in general. Do you plan to submit the upstream parts 
> on a separate patchset?

Yes, will be sent separately once no more issues with this series exist.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Qemu-devel] Re: [PATCH v3 07/10] qemu-kvm: Cleanup/fix TSC and PV clock writeback
  2010-02-24 23:49       ` Marcelo Tosatti
@ 2010-02-24 23:58         ` Jan Kiszka
  2010-02-25  3:58           ` Marcelo Tosatti
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Kiszka @ 2010-02-24 23:58 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Gleb Natapov, Zachary Amsden, Avi Kivity, kvm, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2216 bytes --]

Marcelo Tosatti wrote:
> On Thu, Feb 25, 2010 at 12:45:55AM +0100, Jan Kiszka wrote:
>> Marcelo Tosatti wrote:
>>> On Wed, Feb 24, 2010 at 03:17:55PM +0100, Jan Kiszka wrote:
>>>> Drop kvm_load_tsc in favor of level-dependent writeback in
>>>> kvm_arch_load_regs. KVM's PV clock MSRs fall in the same category and
>>>> should therefore only be written back on full sync.
>>>>
>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>> ---
>>>>  qemu-kvm-x86.c        |   19 +++++--------------
>>>>  qemu-kvm.h            |    4 ----
>>>>  target-i386/machine.c |    5 -----
>>>>  3 files changed, 5 insertions(+), 23 deletions(-)
>>>>
>>>> diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
>>>> index 840c1c9..84fd7fa 100644
>>>> --- a/qemu-kvm-x86.c
>>>> +++ b/qemu-kvm-x86.c
>>>> @@ -965,8 +965,11 @@ void kvm_arch_load_regs(CPUState *env, int level)
>>>>          set_msr_entry(&msrs[n++], MSR_LSTAR  ,           env->lstar);
>>>>      }
>>>>  #endif
>>>> -    set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME,  env->system_time_msr);
>>>> -    set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK,  env->wall_clock_msr);
>>>> +    if (level == KVM_PUT_FULL_STATE) {
>>>> +        set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc);
>>>> +        set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr);
>>>> +        set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
>>>> +    }
>>> As things stand today, the TSC should only be written on migration. See
>>> 53f658b3c33616a4997ee254311b335e59063289 in the kernel.
>> Migration and power-up - that's what this patch ensures (=>
>> KVM_PUT_FULL_STATE). Or where do you see any problem?
>>
>> Jan
>>
> 
> The problem is it should not write on power up (the kernel attempts
> to synchronize the TSCs in that case, see the commit).
> 

OK, need to read this more carefully.

I do not yet understand the difference from user space POV: it tries to
transfer the identical TSC values to all currently stopped VCPU threads.
That should not be different if we are booting a fresh VM or loading a
complete state of a migrated image. If it does, it looks like a KVM
kernel deficit on first glance.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Qemu-devel] Re: [PATCH v3 04/10] qemu-kvm: Clean up mpstate synchronization
  2010-02-24 22:44   ` [Qemu-devel] " Marcelo Tosatti
@ 2010-02-25  0:02     ` Jan Kiszka
  2010-02-25 11:56       ` Jan Kiszka
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Kiszka @ 2010-02-25  0:02 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Gleb Natapov, Avi Kivity, kvm, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1422 bytes --]

Marcelo Tosatti wrote:
> On Wed, Feb 24, 2010 at 03:17:52PM +0100, Jan Kiszka wrote:
>> Push mpstate reading/writing into kvm_arch_load/save_regs and, on x86,
>> properly synchronize with halted in the accessor functions.
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> 
>> @@ -1290,6 +1318,7 @@ int kvm_arch_init_vcpu(CPUState *cenv)
>>  #ifdef KVM_EXIT_TPR_ACCESS
>>      kvm_tpr_vcpu_start(cenv);
>>  #endif
>> +    kvm_reset_mpstate(cenv);
>>      return 0;
>>  }
>>  
>> @@ -1363,15 +1392,10 @@ void kvm_arch_cpu_reset(CPUState *env)
>>  {
>>      kvm_arch_reset_vcpu(env);
>>      kvm_put_vcpu_events(env);
>> -    if (!cpu_is_bsp(env)) {
>> -	if (kvm_irqchip_in_kernel()) {
>> -#ifdef KVM_CAP_MP_STATE
>> -	    kvm_reset_mpstate(env);
>> -#endif
>> -	} else {
>> -	    env->interrupt_request &= ~CPU_INTERRUPT_HARD;
>> -	    env->halted = 1;
>> -	}
>> +    kvm_reset_mpstate(env);
>> +    if (!cpu_is_bsp(env) && !kvm_irqchip_in_kernel()) {
>> +        env->interrupt_request &= ~CPU_INTERRUPT_HARD;
>> +        env->halted = 1;
>>      }
>>  }
> 
> Why are these two needed? Now that initialization of mp_state 
> happens via synchronize_state(init/reset) -> arch_load_regs?

Maybe correct. env->halted is also reset on load, not sure about the
interrupt_request reset impact yet. This is (or at least was) hairy
stuff, /me has to sleep about it again.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Qemu-devel] Re: [PATCH v3 07/10] qemu-kvm: Cleanup/fix TSC and PV clock writeback
  2010-02-24 23:58         ` Jan Kiszka
@ 2010-02-25  3:58           ` Marcelo Tosatti
  2010-02-25  8:48             ` Jan Kiszka
  0 siblings, 1 reply; 29+ messages in thread
From: Marcelo Tosatti @ 2010-02-25  3:58 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Gleb Natapov, Zachary Amsden, Avi Kivity, kvm, qemu-devel

On Thu, Feb 25, 2010 at 12:58:26AM +0100, Jan Kiszka wrote:
> Marcelo Tosatti wrote:
> > On Thu, Feb 25, 2010 at 12:45:55AM +0100, Jan Kiszka wrote:
> >> Marcelo Tosatti wrote:
> >>> On Wed, Feb 24, 2010 at 03:17:55PM +0100, Jan Kiszka wrote:
> >>>> Drop kvm_load_tsc in favor of level-dependent writeback in
> >>>> kvm_arch_load_regs. KVM's PV clock MSRs fall in the same category and
> >>>> should therefore only be written back on full sync.
> >>>>
> >>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >>>> ---
> >>>>  qemu-kvm-x86.c        |   19 +++++--------------
> >>>>  qemu-kvm.h            |    4 ----
> >>>>  target-i386/machine.c |    5 -----
> >>>>  3 files changed, 5 insertions(+), 23 deletions(-)
> >>>>
> >>>> diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
> >>>> index 840c1c9..84fd7fa 100644
> >>>> --- a/qemu-kvm-x86.c
> >>>> +++ b/qemu-kvm-x86.c
> >>>> @@ -965,8 +965,11 @@ void kvm_arch_load_regs(CPUState *env, int level)
> >>>>          set_msr_entry(&msrs[n++], MSR_LSTAR  ,           env->lstar);
> >>>>      }
> >>>>  #endif
> >>>> -    set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME,  env->system_time_msr);
> >>>> -    set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK,  env->wall_clock_msr);
> >>>> +    if (level == KVM_PUT_FULL_STATE) {
> >>>> +        set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc);
> >>>> +        set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr);
> >>>> +        set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
> >>>> +    }
> >>> As things stand today, the TSC should only be written on migration. See
> >>> 53f658b3c33616a4997ee254311b335e59063289 in the kernel.
> >> Migration and power-up - that's what this patch ensures (=>
> >> KVM_PUT_FULL_STATE). Or where do you see any problem?
> >>
> >> Jan
> >>
> > 
> > The problem is it should not write on power up (the kernel attempts
> > to synchronize the TSCs in that case, see the commit).
> > 
> 
> OK, need to read this more carefully.
> 
> I do not yet understand the difference from user space POV: it tries to
> transfer the identical TSC values to all currently stopped VCPU threads.

guest tsc = host tsc + offset

So at the time you set_msr(TSC), the guest visible TSC starts ticking. 
For SMP guests, this does not happen exactly at the same time for all
vcpus.

> That should not be different if we are booting a fresh VM or loading a
> complete state of a migrated image. If it does, it looks like a KVM
> kernel deficit on first glance.

Yes it is a deficit. After migration TSCs of SMP guests go out of sync.
Zachary is working on that.

> 
> Jan
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Qemu-devel] Re: [PATCH v3 07/10] qemu-kvm: Cleanup/fix TSC and PV clock writeback
  2010-02-25  3:58           ` Marcelo Tosatti
@ 2010-02-25  8:48             ` Jan Kiszka
  2010-02-25 15:07               ` Marcelo Tosatti
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Kiszka @ 2010-02-25  8:48 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Gleb Natapov, Zachary Amsden, Avi Kivity, kvm, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 3763 bytes --]

Marcelo Tosatti wrote:
> On Thu, Feb 25, 2010 at 12:58:26AM +0100, Jan Kiszka wrote:
>> Marcelo Tosatti wrote:
>>> On Thu, Feb 25, 2010 at 12:45:55AM +0100, Jan Kiszka wrote:
>>>> Marcelo Tosatti wrote:
>>>>> On Wed, Feb 24, 2010 at 03:17:55PM +0100, Jan Kiszka wrote:
>>>>>> Drop kvm_load_tsc in favor of level-dependent writeback in
>>>>>> kvm_arch_load_regs. KVM's PV clock MSRs fall in the same category and
>>>>>> should therefore only be written back on full sync.
>>>>>>
>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>>>> ---
>>>>>>  qemu-kvm-x86.c        |   19 +++++--------------
>>>>>>  qemu-kvm.h            |    4 ----
>>>>>>  target-i386/machine.c |    5 -----
>>>>>>  3 files changed, 5 insertions(+), 23 deletions(-)
>>>>>>
>>>>>> diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
>>>>>> index 840c1c9..84fd7fa 100644
>>>>>> --- a/qemu-kvm-x86.c
>>>>>> +++ b/qemu-kvm-x86.c
>>>>>> @@ -965,8 +965,11 @@ void kvm_arch_load_regs(CPUState *env, int level)
>>>>>>          set_msr_entry(&msrs[n++], MSR_LSTAR  ,           env->lstar);
>>>>>>      }
>>>>>>  #endif
>>>>>> -    set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME,  env->system_time_msr);
>>>>>> -    set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK,  env->wall_clock_msr);
>>>>>> +    if (level == KVM_PUT_FULL_STATE) {
>>>>>> +        set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc);
>>>>>> +        set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr);
>>>>>> +        set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
>>>>>> +    }
>>>>> As things stand today, the TSC should only be written on migration. See
>>>>> 53f658b3c33616a4997ee254311b335e59063289 in the kernel.
>>>> Migration and power-up - that's what this patch ensures (=>
>>>> KVM_PUT_FULL_STATE). Or where do you see any problem?
>>>>
>>>> Jan
>>>>
>>> The problem is it should not write on power up (the kernel attempts
>>> to synchronize the TSCs in that case, see the commit).
>>>
>> OK, need to read this more carefully.
>>
>> I do not yet understand the difference from user space POV: it tries to
>> transfer the identical TSC values to all currently stopped VCPU threads.
> 
> guest tsc = host tsc + offset
> 
> So at the time you set_msr(TSC), the guest visible TSC starts ticking. 
> For SMP guests, this does not happen exactly at the same time for all
> vcpus.

Ouch.

> 
>> That should not be different if we are booting a fresh VM or loading a
>> complete state of a migrated image. If it does, it looks like a KVM
>> kernel deficit on first glance.
> 
> Yes it is a deficit. After migration TSCs of SMP guests go out of sync.
> Zachary is working on that.
> 

OK, so we need a workaround, ideally without reintroducing hooks. Is
this one acceptable?


diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 84fd7fa..285c05a 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -966,7 +966,15 @@ void kvm_arch_load_regs(CPUState *env, int level)
     }
 #endif
     if (level == KVM_PUT_FULL_STATE) {
-        set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc);
+        /*
+         * KVM is yet unable to synchronize TSC values of multiple VCPUs on
+         * writeback. Until this is fixed, we only write the offset to SMP
+         * guests after migration, desynchronizing the VCPUs, but avoiding
+         * huge jump-backs that would occur without any writeback at all.
+         */
+        if (smp_cpus == 1 || env->tsc != 0) {
+            set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc);
+        }
         set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr);
         set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
     }


Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [Qemu-devel] Re: [PATCH v3 04/10] qemu-kvm: Clean up mpstate synchronization
  2010-02-25  0:02     ` Jan Kiszka
@ 2010-02-25 11:56       ` Jan Kiszka
  0 siblings, 0 replies; 29+ messages in thread
From: Jan Kiszka @ 2010-02-25 11:56 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Gleb Natapov, Avi Kivity, kvm, qemu-devel

Jan Kiszka wrote:
> Marcelo Tosatti wrote:
>> On Wed, Feb 24, 2010 at 03:17:52PM +0100, Jan Kiszka wrote:
>>> Push mpstate reading/writing into kvm_arch_load/save_regs and, on x86,
>>> properly synchronize with halted in the accessor functions.
>>>
>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>> @@ -1290,6 +1318,7 @@ int kvm_arch_init_vcpu(CPUState *cenv)
>>>  #ifdef KVM_EXIT_TPR_ACCESS
>>>      kvm_tpr_vcpu_start(cenv);
>>>  #endif
>>> +    kvm_reset_mpstate(cenv);
>>>      return 0;
>>>  }
>>>  
>>> @@ -1363,15 +1392,10 @@ void kvm_arch_cpu_reset(CPUState *env)
>>>  {
>>>      kvm_arch_reset_vcpu(env);
>>>      kvm_put_vcpu_events(env);
>>> -    if (!cpu_is_bsp(env)) {
>>> -	if (kvm_irqchip_in_kernel()) {
>>> -#ifdef KVM_CAP_MP_STATE
>>> -	    kvm_reset_mpstate(env);
>>> -#endif
>>> -	} else {
>>> -	    env->interrupt_request &= ~CPU_INTERRUPT_HARD;
>>> -	    env->halted = 1;
>>> -	}
>>> +    kvm_reset_mpstate(env);
>>> +    if (!cpu_is_bsp(env) && !kvm_irqchip_in_kernel()) {
>>> +        env->interrupt_request &= ~CPU_INTERRUPT_HARD;
>>> +        env->halted = 1;
>>>      }
>>>  }
>> Why are these two needed? Now that initialization of mp_state 
>> happens via synchronize_state(init/reset) -> arch_load_regs?
> 
> Maybe correct. env->halted is also reset on load, not sure about the
> interrupt_request reset impact yet. This is (or at least was) hairy
> stuff, /me has to sleep about it again.

Regarding the !irqchip bits: env->halted is actually already reset in
apic_init_reset and can be dropped. I still wonder why I (was surprised
to find this out) once introduced the interrupt_request reset,
specifically why we may need it with KVM not not with TCG. Maybe we do,
and this should better be addressed generically.

But did you also ask about the need for kvm_reset_mpstate? That function
solely brings env->mp_state into the proper state after reset, something
that does not happen elsewhere.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Qemu-devel] Re: [PATCH v3 07/10] qemu-kvm: Cleanup/fix TSC and PV clock writeback
  2010-02-25  8:48             ` Jan Kiszka
@ 2010-02-25 15:07               ` Marcelo Tosatti
  2010-02-25 15:17                 ` Jan Kiszka
  0 siblings, 1 reply; 29+ messages in thread
From: Marcelo Tosatti @ 2010-02-25 15:07 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Gleb Natapov, Zachary Amsden, Avi Kivity, kvm, qemu-devel

On Thu, Feb 25, 2010 at 09:48:47AM +0100, Jan Kiszka wrote:
> Marcelo Tosatti wrote:
> > On Thu, Feb 25, 2010 at 12:58:26AM +0100, Jan Kiszka wrote:
> >> Marcelo Tosatti wrote:
> >>> On Thu, Feb 25, 2010 at 12:45:55AM +0100, Jan Kiszka wrote:
> >>>> Marcelo Tosatti wrote:
> >>>>> On Wed, Feb 24, 2010 at 03:17:55PM +0100, Jan Kiszka wrote:
> >>>>>> Drop kvm_load_tsc in favor of level-dependent writeback in
> >>>>>> kvm_arch_load_regs. KVM's PV clock MSRs fall in the same category and
> >>>>>> should therefore only be written back on full sync.
> >>>>>>
> >>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >>>>>> ---
> >>>>>>  qemu-kvm-x86.c        |   19 +++++--------------
> >>>>>>  qemu-kvm.h            |    4 ----
> >>>>>>  target-i386/machine.c |    5 -----
> >>>>>>  3 files changed, 5 insertions(+), 23 deletions(-)
> >>>>>>
> >>>>>> diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
> >>>>>> index 840c1c9..84fd7fa 100644
> >>>>>> --- a/qemu-kvm-x86.c
> >>>>>> +++ b/qemu-kvm-x86.c
> >>>>>> @@ -965,8 +965,11 @@ void kvm_arch_load_regs(CPUState *env, int level)
> >>>>>>          set_msr_entry(&msrs[n++], MSR_LSTAR  ,           env->lstar);
> >>>>>>      }
> >>>>>>  #endif
> >>>>>> -    set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME,  env->system_time_msr);
> >>>>>> -    set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK,  env->wall_clock_msr);
> >>>>>> +    if (level == KVM_PUT_FULL_STATE) {
> >>>>>> +        set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc);
> >>>>>> +        set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr);
> >>>>>> +        set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
> >>>>>> +    }
> >>>>> As things stand today, the TSC should only be written on migration. See
> >>>>> 53f658b3c33616a4997ee254311b335e59063289 in the kernel.
> >>>> Migration and power-up - that's what this patch ensures (=>
> >>>> KVM_PUT_FULL_STATE). Or where do you see any problem?
> >>>>
> >>>> Jan
> >>>>
> >>> The problem is it should not write on power up (the kernel attempts
> >>> to synchronize the TSCs in that case, see the commit).
> >>>
> >> OK, need to read this more carefully.
> >>
> >> I do not yet understand the difference from user space POV: it tries to
> >> transfer the identical TSC values to all currently stopped VCPU threads.
> > 
> > guest tsc = host tsc + offset
> > 
> > So at the time you set_msr(TSC), the guest visible TSC starts ticking. 
> > For SMP guests, this does not happen exactly at the same time for all
> > vcpus.
> 
> Ouch.
> 
> > 
> >> That should not be different if we are booting a fresh VM or loading a
> >> complete state of a migrated image. If it does, it looks like a KVM
> >> kernel deficit on first glance.
> > 
> > Yes it is a deficit. After migration TSCs of SMP guests go out of sync.
> > Zachary is working on that.
> > 
> 
> OK, so we need a workaround, ideally without reintroducing hooks. Is
> this one acceptable?
> 
> 
> diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
> index 84fd7fa..285c05a 100644
> --- a/qemu-kvm-x86.c
> +++ b/qemu-kvm-x86.c
> @@ -966,7 +966,15 @@ void kvm_arch_load_regs(CPUState *env, int level)
>      }
>  #endif
>      if (level == KVM_PUT_FULL_STATE) {
> -        set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc);
> +        /*
> +         * KVM is yet unable to synchronize TSC values of multiple VCPUs on
> +         * writeback. Until this is fixed, we only write the offset to SMP
> +         * guests after migration, desynchronizing the VCPUs, but avoiding
> +         * huge jump-backs that would occur without any writeback at all.
> +         */
> +        if (smp_cpus == 1 || env->tsc != 0) {
> +            set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc);
> +        }
>          set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr);
>          set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
>      }

Well, migration of the SMP TSCs is not precise, but it is better than
zeroing the TSCs on migration. So something like a new state (migration
only), or a hack that mimicks that behaviour, is needed :(

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Qemu-devel] Re: [PATCH v3 07/10] qemu-kvm: Cleanup/fix TSC and PV clock writeback
  2010-02-25 15:07               ` Marcelo Tosatti
@ 2010-02-25 15:17                 ` Jan Kiszka
  2010-02-25 15:48                   ` Marcelo Tosatti
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Kiszka @ 2010-02-25 15:17 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Gleb Natapov, Zachary Amsden, Avi Kivity, kvm, qemu-devel

Marcelo Tosatti wrote:
> On Thu, Feb 25, 2010 at 09:48:47AM +0100, Jan Kiszka wrote:
>> Marcelo Tosatti wrote:
>>> On Thu, Feb 25, 2010 at 12:58:26AM +0100, Jan Kiszka wrote:
>>>> Marcelo Tosatti wrote:
>>>>> On Thu, Feb 25, 2010 at 12:45:55AM +0100, Jan Kiszka wrote:
>>>>>> Marcelo Tosatti wrote:
>>>>>>> On Wed, Feb 24, 2010 at 03:17:55PM +0100, Jan Kiszka wrote:
>>>>>>>> Drop kvm_load_tsc in favor of level-dependent writeback in
>>>>>>>> kvm_arch_load_regs. KVM's PV clock MSRs fall in the same category and
>>>>>>>> should therefore only be written back on full sync.
>>>>>>>>
>>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>>>>>> ---
>>>>>>>>  qemu-kvm-x86.c        |   19 +++++--------------
>>>>>>>>  qemu-kvm.h            |    4 ----
>>>>>>>>  target-i386/machine.c |    5 -----
>>>>>>>>  3 files changed, 5 insertions(+), 23 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
>>>>>>>> index 840c1c9..84fd7fa 100644
>>>>>>>> --- a/qemu-kvm-x86.c
>>>>>>>> +++ b/qemu-kvm-x86.c
>>>>>>>> @@ -965,8 +965,11 @@ void kvm_arch_load_regs(CPUState *env, int level)
>>>>>>>>          set_msr_entry(&msrs[n++], MSR_LSTAR  ,           env->lstar);
>>>>>>>>      }
>>>>>>>>  #endif
>>>>>>>> -    set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME,  env->system_time_msr);
>>>>>>>> -    set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK,  env->wall_clock_msr);
>>>>>>>> +    if (level == KVM_PUT_FULL_STATE) {
>>>>>>>> +        set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc);
>>>>>>>> +        set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr);
>>>>>>>> +        set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
>>>>>>>> +    }
>>>>>>> As things stand today, the TSC should only be written on migration. See
>>>>>>> 53f658b3c33616a4997ee254311b335e59063289 in the kernel.
>>>>>> Migration and power-up - that's what this patch ensures (=>
>>>>>> KVM_PUT_FULL_STATE). Or where do you see any problem?
>>>>>>
>>>>>> Jan
>>>>>>
>>>>> The problem is it should not write on power up (the kernel attempts
>>>>> to synchronize the TSCs in that case, see the commit).
>>>>>
>>>> OK, need to read this more carefully.
>>>>
>>>> I do not yet understand the difference from user space POV: it tries to
>>>> transfer the identical TSC values to all currently stopped VCPU threads.
>>> guest tsc = host tsc + offset
>>>
>>> So at the time you set_msr(TSC), the guest visible TSC starts ticking. 
>>> For SMP guests, this does not happen exactly at the same time for all
>>> vcpus.
>> Ouch.
>>
>>>> That should not be different if we are booting a fresh VM or loading a
>>>> complete state of a migrated image. If it does, it looks like a KVM
>>>> kernel deficit on first glance.
>>> Yes it is a deficit. After migration TSCs of SMP guests go out of sync.
>>> Zachary is working on that.
>>>
>> OK, so we need a workaround, ideally without reintroducing hooks. Is
>> this one acceptable?
>>
>>
>> diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
>> index 84fd7fa..285c05a 100644
>> --- a/qemu-kvm-x86.c
>> +++ b/qemu-kvm-x86.c
>> @@ -966,7 +966,15 @@ void kvm_arch_load_regs(CPUState *env, int level)
>>      }
>>  #endif
>>      if (level == KVM_PUT_FULL_STATE) {
>> -        set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc);
>> +        /*
>> +         * KVM is yet unable to synchronize TSC values of multiple VCPUs on
>> +         * writeback. Until this is fixed, we only write the offset to SMP
>> +         * guests after migration, desynchronizing the VCPUs, but avoiding
>> +         * huge jump-backs that would occur without any writeback at all.
>> +         */
>> +        if (smp_cpus == 1 || env->tsc != 0) {
>> +            set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc);
>> +        }
>>          set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr);
>>          set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
>>      }
> 
> Well, migration of the SMP TSCs is not precise, but it is better than
> zeroing the TSCs on migration. So something like a new state (migration
> only), or a hack that mimicks that behaviour, is needed :(

It is not precise, but the above code shouldn't behave differently
compared to the existing one: tcp != 0 => we are loading values from
some VM that ran before, i.e. we are migrating.

If somehow possible, I do not want to introduce a new writeback level
for an x86-only kernel quirk.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Qemu-devel] Re: [PATCH v3 07/10] qemu-kvm: Cleanup/fix TSC and PV clock writeback
  2010-02-25 15:17                 ` Jan Kiszka
@ 2010-02-25 15:48                   ` Marcelo Tosatti
  0 siblings, 0 replies; 29+ messages in thread
From: Marcelo Tosatti @ 2010-02-25 15:48 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Gleb Natapov, Zachary Amsden, Avi Kivity, kvm, qemu-devel

On Thu, Feb 25, 2010 at 04:17:22PM +0100, Jan Kiszka wrote:
> Marcelo Tosatti wrote:
> > On Thu, Feb 25, 2010 at 09:48:47AM +0100, Jan Kiszka wrote:
> >> Marcelo Tosatti wrote:
> >>> On Thu, Feb 25, 2010 at 12:58:26AM +0100, Jan Kiszka wrote:
> >>>> Marcelo Tosatti wrote:
> >>>>> On Thu, Feb 25, 2010 at 12:45:55AM +0100, Jan Kiszka wrote:
> >>>>>> Marcelo Tosatti wrote:
> >>>>>>> On Wed, Feb 24, 2010 at 03:17:55PM +0100, Jan Kiszka wrote:
> >>>>>>>> Drop kvm_load_tsc in favor of level-dependent writeback in
> >>>>>>>> kvm_arch_load_regs. KVM's PV clock MSRs fall in the same category and
> >>>>>>>> should therefore only be written back on full sync.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >>>>>>>> ---
> >>>>>>>>  qemu-kvm-x86.c        |   19 +++++--------------
> >>>>>>>>  qemu-kvm.h            |    4 ----
> >>>>>>>>  target-i386/machine.c |    5 -----
> >>>>>>>>  3 files changed, 5 insertions(+), 23 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
> >>>>>>>> index 840c1c9..84fd7fa 100644
> >>>>>>>> --- a/qemu-kvm-x86.c
> >>>>>>>> +++ b/qemu-kvm-x86.c
> >>>>>>>> @@ -965,8 +965,11 @@ void kvm_arch_load_regs(CPUState *env, int level)
> >>>>>>>>          set_msr_entry(&msrs[n++], MSR_LSTAR  ,           env->lstar);
> >>>>>>>>      }
> >>>>>>>>  #endif
> >>>>>>>> -    set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME,  env->system_time_msr);
> >>>>>>>> -    set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK,  env->wall_clock_msr);
> >>>>>>>> +    if (level == KVM_PUT_FULL_STATE) {
> >>>>>>>> +        set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc);
> >>>>>>>> +        set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr);
> >>>>>>>> +        set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
> >>>>>>>> +    }
> >>>>>>> As things stand today, the TSC should only be written on migration. See
> >>>>>>> 53f658b3c33616a4997ee254311b335e59063289 in the kernel.
> >>>>>> Migration and power-up - that's what this patch ensures (=>
> >>>>>> KVM_PUT_FULL_STATE). Or where do you see any problem?
> >>>>>>
> >>>>>> Jan
> >>>>>>
> >>>>> The problem is it should not write on power up (the kernel attempts
> >>>>> to synchronize the TSCs in that case, see the commit).
> >>>>>
> >>>> OK, need to read this more carefully.
> >>>>
> >>>> I do not yet understand the difference from user space POV: it tries to
> >>>> transfer the identical TSC values to all currently stopped VCPU threads.
> >>> guest tsc = host tsc + offset
> >>>
> >>> So at the time you set_msr(TSC), the guest visible TSC starts ticking. 
> >>> For SMP guests, this does not happen exactly at the same time for all
> >>> vcpus.
> >> Ouch.
> >>
> >>>> That should not be different if we are booting a fresh VM or loading a
> >>>> complete state of a migrated image. If it does, it looks like a KVM
> >>>> kernel deficit on first glance.
> >>> Yes it is a deficit. After migration TSCs of SMP guests go out of sync.
> >>> Zachary is working on that.
> >>>
> >> OK, so we need a workaround, ideally without reintroducing hooks. Is
> >> this one acceptable?
> >>
> >>
> >> diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
> >> index 84fd7fa..285c05a 100644
> >> --- a/qemu-kvm-x86.c
> >> +++ b/qemu-kvm-x86.c
> >> @@ -966,7 +966,15 @@ void kvm_arch_load_regs(CPUState *env, int level)
> >>      }
> >>  #endif
> >>      if (level == KVM_PUT_FULL_STATE) {
> >> -        set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc);
> >> +        /*
> >> +         * KVM is yet unable to synchronize TSC values of multiple VCPUs on
> >> +         * writeback. Until this is fixed, we only write the offset to SMP
> >> +         * guests after migration, desynchronizing the VCPUs, but avoiding
> >> +         * huge jump-backs that would occur without any writeback at all.
> >> +         */
> >> +        if (smp_cpus == 1 || env->tsc != 0) {
> >> +            set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc);
> >> +        }
> >>          set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr);
> >>          set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
> >>      }
> > 
> > Well, migration of the SMP TSCs is not precise, but it is better than
> > zeroing the TSCs on migration. So something like a new state (migration
> > only), or a hack that mimicks that behaviour, is needed :(
> 
> It is not precise, but the above code shouldn't behave differently
> compared to the existing one: tcp != 0 => we are loading values from
> some VM that ran before, i.e. we are migrating.
> 
> If somehow possible, I do not want to introduce a new writeback level
> for an x86-only kernel quirk.
> 
> Jan

I can't read. Looks OK.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Qemu-devel] Re: [PATCH v4 07/10] qemu-kvm: Cleanup/fix TSC and PV clock writeback
  2010-02-24 14:17 ` [Qemu-devel] [PATCH v3 07/10] qemu-kvm: Cleanup/fix TSC and PV clock writeback Jan Kiszka
  2010-02-24 23:17   ` [Qemu-devel] " Marcelo Tosatti
@ 2010-02-25 15:56   ` Jan Kiszka
  1 sibling, 0 replies; 29+ messages in thread
From: Jan Kiszka @ 2010-02-25 15:56 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: Gleb Natapov, qemu-devel, kvm

Drop kvm_load_tsc in favor of level-dependent writeback in
kvm_arch_load_regs. KVM's PV clock MSRs fall in the same category and
should therefore only be written back on full sync.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---

Changes in v4:
 - only write TSC MSRs of SMP guests if they are non-zero, ie. after
   migration (behavior of original code)

 qemu-kvm-x86.c        |   27 +++++++++++++--------------
 qemu-kvm.h            |    4 ----
 target-i386/machine.c |    5 -----
 3 files changed, 13 insertions(+), 23 deletions(-)

diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 840c1c9..285c05a 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -965,8 +965,19 @@ void kvm_arch_load_regs(CPUState *env, int level)
         set_msr_entry(&msrs[n++], MSR_LSTAR  ,           env->lstar);
     }
 #endif
-    set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME,  env->system_time_msr);
-    set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK,  env->wall_clock_msr);
+    if (level == KVM_PUT_FULL_STATE) {
+        /*
+         * KVM is yet unable to synchronize TSC values of multiple VCPUs on
+         * writeback. Until this is fixed, we only write the offset to SMP
+         * guests after migration, desynchronizing the VCPUs, but avoiding
+         * huge jump-backs that would occur without any writeback at all.
+         */
+        if (smp_cpus == 1 || env->tsc != 0) {
+            set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc);
+        }
+        set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr);
+        set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
+    }
 
     rc = kvm_set_msrs(env, msrs, n);
     if (rc == -1)
@@ -986,18 +997,6 @@ void kvm_arch_load_regs(CPUState *env, int level)
     kvm_guest_debug_workarounds(env);
 }
 
-void kvm_load_tsc(CPUState *env)
-{
-    int rc;
-    struct kvm_msr_entry msr;
-
-    set_msr_entry(&msr, MSR_IA32_TSC, env->tsc);
-
-    rc = kvm_set_msrs(env, &msr, 1);
-    if (rc == -1)
-        perror("kvm_set_tsc FAILED.\n");
-}
-
 void kvm_arch_save_regs(CPUState *env)
 {
     struct kvm_regs regs;
diff --git a/qemu-kvm.h b/qemu-kvm.h
index c6ff652..827cac5 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -958,7 +958,6 @@ int handle_tpr_access(void *opaque, CPUState *env, uint64_t rip,
 #ifdef TARGET_I386
 #define qemu_kvm_has_pit_state2() kvm_has_pit_state2(kvm_context)
 #endif
-void kvm_load_tsc(CPUState *env);
 #else
 #define kvm_nested 0
 #define qemu_kvm_has_gsi_routing() (0)
@@ -966,9 +965,6 @@ void kvm_load_tsc(CPUState *env);
 #define qemu_kvm_has_pit_state2() (0)
 #endif
 #define qemu_kvm_cpu_stop(env) do {} while(0)
-static inline void kvm_load_tsc(CPUState *env)
-{
-}
 #endif
 
 void kvm_mutex_unlock(void);
diff --git a/target-i386/machine.c b/target-i386/machine.c
index bcc315b..b547e2a 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -354,11 +354,6 @@ static int cpu_post_load(void *opaque, int version_id)
         hw_breakpoint_insert(env, i);
 
     tlb_flush(env, 1);
-
-    if (kvm_enabled()) {
-        kvm_load_tsc(env);
-    }
-
     return 0;
 }
 

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [Qemu-devel] [PATCH v4 04/10] qemu-kvm: Clean up mpstate synchronization
  2010-02-24 14:17 ` [Qemu-devel] [PATCH v3 04/10] qemu-kvm: Clean up mpstate synchronization Jan Kiszka
  2010-02-24 22:44   ` [Qemu-devel] " Marcelo Tosatti
@ 2010-02-25 17:20   ` Jan Kiszka
  1 sibling, 0 replies; 29+ messages in thread
From: Jan Kiszka @ 2010-02-25 17:20 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: Gleb Natapov, qemu-devel, kvm

Push mpstate reading/writing into kvm_arch_load/save_regs and, on x86,
properly synchronize with halted in the accessor functions.

At this chance, drop the special reset of interrupt_request and halted
in kvm_arch_cpu_reset. The former is done via memset in cpu_reset, the
latter in apic_init_reset anyway.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---

Changes in v4:
 - drop redundant reset code

 hw/apic.c             |    7 ---
 qemu-kvm-ia64.c       |    4 +-
 qemu-kvm-x86.c        |  100 +++++++++++++++++++++++++++++--------------------
 qemu-kvm.c            |   30 ---------------
 qemu-kvm.h            |   15 -------
 target-i386/machine.c |    6 ---
 target-ia64/machine.c |    3 +
 7 files changed, 66 insertions(+), 99 deletions(-)

diff --git a/hw/apic.c b/hw/apic.c
index 3e03e10..092c61e 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -507,13 +507,6 @@ void apic_init_reset(CPUState *env)
     s->wait_for_sipi = 1;
 
     env->halted = !(s->apicbase & MSR_IA32_APICBASE_BSP);
-#ifdef KVM_CAP_MP_STATE
-    if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-        env->mp_state
-            = env->halted ? KVM_MP_STATE_UNINITIALIZED : KVM_MP_STATE_RUNNABLE;
-        kvm_load_mpstate(env);
-    }
-#endif
 }
 
 static void apic_startup(APICState *s, int vector_num)
diff --git a/qemu-kvm-ia64.c b/qemu-kvm-ia64.c
index fc8110e..39bcbeb 100644
--- a/qemu-kvm-ia64.c
+++ b/qemu-kvm-ia64.c
@@ -124,7 +124,9 @@ void kvm_arch_cpu_reset(CPUState *env)
 {
     if (kvm_irqchip_in_kernel(kvm_context)) {
 #ifdef KVM_CAP_MP_STATE
-	kvm_reset_mpstate(env->kvm_cpu_state.vcpu_ctx);
+        struct kvm_mp_state mp_state = {.mp_state = KVM_MP_STATE_UNINITIALIZED
+        };
+        kvm_set_mpstate(env, &mp_state);
 #endif
     } else {
 	env->interrupt_request &= ~CPU_INTERRUPT_HARD;
diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 47667b3..61b3dde 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -754,6 +754,56 @@ static int get_msr_entry(struct kvm_msr_entry *entry, CPUState *env)
         return 0;
 }
 
+static void kvm_arch_save_mpstate(CPUState *env)
+{
+#ifdef KVM_CAP_MP_STATE
+    int r;
+    struct kvm_mp_state mp_state;
+
+    r = kvm_get_mpstate(env, &mp_state);
+    if (r < 0) {
+        env->mp_state = -1;
+    } else {
+        env->mp_state = mp_state.mp_state;
+        if (kvm_irqchip_in_kernel()) {
+            env->halted = (env->mp_state == KVM_MP_STATE_HALTED);
+        }
+    }
+#else
+    env->mp_state = -1;
+#endif
+}
+
+static void kvm_arch_load_mpstate(CPUState *env)
+{
+#ifdef KVM_CAP_MP_STATE
+    struct kvm_mp_state mp_state;
+
+    /*
+     * -1 indicates that the host did not support GET_MP_STATE ioctl,
+     *  so don't touch it.
+     */
+    if (env->mp_state != -1) {
+        mp_state.mp_state = env->mp_state;
+        kvm_set_mpstate(env, &mp_state);
+    }
+#endif
+}
+
+static void kvm_reset_mpstate(CPUState *env)
+{
+#ifdef KVM_CAP_MP_STATE
+    if (kvm_check_extension(kvm_state, KVM_CAP_MP_STATE)) {
+        if (kvm_irqchip_in_kernel()) {
+            env->mp_state = cpu_is_bsp(env) ? KVM_MP_STATE_RUNNABLE :
+                                              KVM_MP_STATE_UNINITIALIZED;
+        } else {
+            env->mp_state = KVM_MP_STATE_RUNNABLE;
+        }
+    }
+#endif
+}
+
 static void set_v8086_seg(struct kvm_segment *lhs, const SegmentCache *rhs)
 {
     lhs->selector = rhs->selector;
@@ -922,6 +972,14 @@ void kvm_arch_load_regs(CPUState *env, int level)
     if (rc == -1)
         perror("kvm_set_msrs FAILED");
 
+    if (level >= KVM_PUT_RESET_STATE) {
+        kvm_arch_load_mpstate(env);
+    }
+    if (kvm_irqchip_in_kernel()) {
+        /* Avoid deadlock: no user space IRQ will ever clear it. */
+        env->halted = 0;
+    }
+
     /* must be last */
     kvm_guest_debug_workarounds(env);
 }
@@ -938,36 +996,6 @@ void kvm_load_tsc(CPUState *env)
         perror("kvm_set_tsc FAILED.\n");
 }
 
-void kvm_arch_save_mpstate(CPUState *env)
-{
-#ifdef KVM_CAP_MP_STATE
-    int r;
-    struct kvm_mp_state mp_state;
-
-    r = kvm_get_mpstate(env, &mp_state);
-    if (r < 0)
-        env->mp_state = -1;
-    else
-        env->mp_state = mp_state.mp_state;
-#else
-    env->mp_state = -1;
-#endif
-}
-
-void kvm_arch_load_mpstate(CPUState *env)
-{
-#ifdef KVM_CAP_MP_STATE
-    struct kvm_mp_state mp_state = { .mp_state = env->mp_state };
-
-    /*
-     * -1 indicates that the host did not support GET_MP_STATE ioctl,
-     *  so don't touch it.
-     */
-    if (env->mp_state != -1)
-        kvm_set_mpstate(env, &mp_state);
-#endif
-}
-
 void kvm_arch_save_regs(CPUState *env)
 {
     struct kvm_regs regs;
@@ -1290,6 +1318,7 @@ int kvm_arch_init_vcpu(CPUState *cenv)
 #ifdef KVM_EXIT_TPR_ACCESS
     kvm_tpr_vcpu_start(cenv);
 #endif
+    kvm_reset_mpstate(cenv);
     return 0;
 }
 
@@ -1363,16 +1392,7 @@ void kvm_arch_cpu_reset(CPUState *env)
 {
     kvm_arch_reset_vcpu(env);
     kvm_put_vcpu_events(env);
-    if (!cpu_is_bsp(env)) {
-	if (kvm_irqchip_in_kernel()) {
-#ifdef KVM_CAP_MP_STATE
-	    kvm_reset_mpstate(env);
-#endif
-	} else {
-	    env->interrupt_request &= ~CPU_INTERRUPT_HARD;
-	    env->halted = 1;
-	}
-    }
+    kvm_reset_mpstate(env);
 }
 
 #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
diff --git a/qemu-kvm.c b/qemu-kvm.c
index f6d3e4d..103e0d9 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -1590,36 +1590,6 @@ void kvm_update_interrupt_request(CPUState *env)
     }
 }
 
-static void kvm_do_load_mpstate(void *_env)
-{
-    CPUState *env = _env;
-
-    kvm_arch_load_mpstate(env);
-}
-
-void kvm_load_mpstate(CPUState *env)
-{
-    if (kvm_enabled() && qemu_system_ready && kvm_vcpu_inited(env))
-        on_vcpu(env, kvm_do_load_mpstate, env);
-}
-
-static void kvm_do_save_mpstate(void *_env)
-{
-    CPUState *env = _env;
-
-    kvm_arch_save_mpstate(env);
-#ifdef KVM_CAP_MP_STATE
-    if (kvm_irqchip_in_kernel())
-        env->halted = (env->mp_state == KVM_MP_STATE_HALTED);
-#endif
-}
-
-void kvm_save_mpstate(CPUState *env)
-{
-    if (kvm_enabled())
-        on_vcpu(env, kvm_do_save_mpstate, env);
-}
-
 int kvm_cpu_exec(CPUState *env)
 {
     int r;
diff --git a/qemu-kvm.h b/qemu-kvm.h
index fab930d..c6ff652 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -299,16 +299,6 @@ int kvm_get_mpstate(CPUState *env, struct kvm_mp_state *mp_state);
  *
  */
 int kvm_set_mpstate(CPUState *env, struct kvm_mp_state *mp_state);
-/*!
- *  * \brief Reset VCPU MP state
- *
- */
-static inline int kvm_reset_mpstate(CPUState *env)
-{
-    struct kvm_mp_state mp_state = {.mp_state = KVM_MP_STATE_UNINITIALIZED
-    };
-    return kvm_set_mpstate(env, &mp_state);
-}
 #endif
 
 /*!
@@ -874,8 +864,6 @@ static inline void kvm_inject_x86_mce(CPUState *cenv, int bank,
 int kvm_main_loop(void);
 int kvm_init_ap(void);
 int kvm_vcpu_inited(CPUState *env);
-void kvm_load_mpstate(CPUState *env);
-void kvm_save_mpstate(CPUState *env);
 void kvm_apic_init(CPUState *env);
 /* called from vcpu initialization */
 void qemu_kvm_load_lapic(CPUState *env);
@@ -909,8 +897,6 @@ int kvm_arch_qemu_create_context(void);
 
 void kvm_arch_save_regs(CPUState *env);
 void kvm_arch_load_regs(CPUState *env, int level);
-void kvm_arch_load_mpstate(CPUState *env);
-void kvm_arch_save_mpstate(CPUState *env);
 int kvm_arch_has_work(CPUState *env);
 void kvm_arch_process_irqchip_events(CPUState *env);
 int kvm_arch_try_push_interrupts(void *opaque);
@@ -979,7 +965,6 @@ void kvm_load_tsc(CPUState *env);
 #ifdef TARGET_I386
 #define qemu_kvm_has_pit_state2() (0)
 #endif
-#define kvm_save_mpstate(env)   do {} while(0)
 #define qemu_kvm_cpu_stop(env) do {} while(0)
 static inline void kvm_load_tsc(CPUState *env)
 {
diff --git a/target-i386/machine.c b/target-i386/machine.c
index bebde82..61e6a87 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -323,7 +323,6 @@ static void cpu_pre_save(void *opaque)
     int i;
 
     if (kvm_enabled()) {
-        kvm_save_mpstate(env);
         kvm_get_vcpu_events(env);
     }
 
@@ -362,12 +361,7 @@ static int cpu_post_load(void *opaque, int version_id)
     tlb_flush(env, 1);
 
     if (kvm_enabled()) {
-        /* when in-kernel irqchip is used, env->halted causes deadlock
-           because no userspace IRQs will ever clear this flag */
-        env->halted = 0;
-
         kvm_load_tsc(env);
-        kvm_load_mpstate(env);
         kvm_put_vcpu_events(env);
     }
 
diff --git a/target-ia64/machine.c b/target-ia64/machine.c
index fdbeeef..8cf5bdd 100644
--- a/target-ia64/machine.c
+++ b/target-ia64/machine.c
@@ -4,6 +4,9 @@
 #include "exec-all.h"
 #include "qemu-kvm.h"
 
+void kvm_arch_save_mpstate(CPUState *env);
+void kvm_arch_load_mpstate(CPUState *env);
+
 void cpu_save(QEMUFile *f, void *opaque)
 {
     CPUState *env = opaque;

^ permalink raw reply related	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2010-02-25 17:20 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-02-24 14:17 [Qemu-devel] [PATCH v3 00/10] qemu-kvm: Hook cleanups and yet more use of upstream code Jan Kiszka
2010-02-24 14:17 ` [Qemu-devel] [PATCH v3 01/10] qemu-kvm: Add KVM_CAP_X86_ROBUST_SINGLESTEP-awareness Jan Kiszka
2010-02-24 14:17 ` [Qemu-devel] [PATCH v3 02/10] qemu-kvm: Rework VCPU state writeback API Jan Kiszka
2010-02-24 14:17 ` [Qemu-devel] [PATCH v3 03/10] x86: Extend validity of cpu_is_bsp Jan Kiszka
2010-02-24 14:17 ` [Qemu-devel] [PATCH v3 04/10] qemu-kvm: Clean up mpstate synchronization Jan Kiszka
2010-02-24 22:44   ` [Qemu-devel] " Marcelo Tosatti
2010-02-25  0:02     ` Jan Kiszka
2010-02-25 11:56       ` Jan Kiszka
2010-02-25 17:20   ` [Qemu-devel] [PATCH v4 " Jan Kiszka
2010-02-24 14:17 ` [Qemu-devel] [PATCH v3 05/10] KVM: x86: Restrict writeback of VCPU state Jan Kiszka
2010-02-24 22:59   ` [Qemu-devel] " Marcelo Tosatti
2010-02-24 23:51     ` Jan Kiszka
2010-02-24 14:17 ` [Qemu-devel] [PATCH v3 06/10] qemu-kvm: Use VCPU event state for reset and vmsave/load Jan Kiszka
2010-02-24 14:17 ` [Qemu-devel] [PATCH v3 07/10] qemu-kvm: Cleanup/fix TSC and PV clock writeback Jan Kiszka
2010-02-24 23:17   ` [Qemu-devel] " Marcelo Tosatti
2010-02-24 23:45     ` Jan Kiszka
2010-02-24 23:49       ` Marcelo Tosatti
2010-02-24 23:58         ` Jan Kiszka
2010-02-25  3:58           ` Marcelo Tosatti
2010-02-25  8:48             ` Jan Kiszka
2010-02-25 15:07               ` Marcelo Tosatti
2010-02-25 15:17                 ` Jan Kiszka
2010-02-25 15:48                   ` Marcelo Tosatti
2010-02-25 15:56   ` [Qemu-devel] Re: [PATCH v4 " Jan Kiszka
2010-02-24 14:17 ` [Qemu-devel] [PATCH v3 08/10] qemu-kvm: Clean up KVM's APIC hooks Jan Kiszka
2010-02-24 14:17 ` [Qemu-devel] [PATCH v3 09/10] qemu-kvm: Move kvm_set_boot_cpu_id Jan Kiszka
2010-02-24 14:17 ` [Qemu-devel] [PATCH v3 10/10] qemu-kvm: Bring qemu_init_vcpu back home Jan Kiszka
2010-02-24 23:26 ` [Qemu-devel] Re: [PATCH v3 00/10] qemu-kvm: Hook cleanups and yet more use of upstream code Marcelo Tosatti
2010-02-24 23:55   ` Jan Kiszka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).