qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [RFC] use fine-grained locks to replace qemu_mutex_lock_iothread
@ 2012-06-21 15:06 Liu Ping Fan
  2012-06-21 15:06 ` [Qemu-devel] [PATCH 1/2] CPUArchState: introduce per-cpu lock Liu Ping Fan
  2012-06-21 15:06 ` [Qemu-devel] [PATCH 2/2] kvm: use per-cpu lock to free vcpu thread out of the big lock Liu Ping Fan
  0 siblings, 2 replies; 18+ messages in thread
From: Liu Ping Fan @ 2012-06-21 15:06 UTC (permalink / raw)
  To: qemu-devel

Nowadays, we use qemu_mutex_lock_iothread()/qemu_mutex_unlock_iothread() to
protect emulated devices against racing accesses from vcpu threads and the iothread.

But this lock is too big. We can break it down.
These patches separate the protection of CPUArchState from that of the other
devices, so each CPUArchState can have its own per-cpu lock instead of the big lock.
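
The locking pattern changes roughly like this (illustrative sketch, not
the exact hunks in the patches):

    /* today: everything runs under the big lock */
    qemu_mutex_lock_iothread();
    /* ... touch env as well as device state ... */
    qemu_mutex_unlock_iothread();

    /* with this series: CPUArchState fields get a per-cpu lock */
    qemu_mutex_lock_cpu(env);
    /* ... touch vcpu-local state in env only ... */
    qemu_mutex_unlock_cpu(env);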


* [Qemu-devel] [PATCH 1/2] CPUArchState: introduce per-cpu lock
  2012-06-21 15:06 [Qemu-devel] [RFC] use fine-grained locks to replace qemu_mutex_lock_iothread Liu Ping Fan
@ 2012-06-21 15:06 ` Liu Ping Fan
  2012-06-22 12:36   ` Stefan Hajnoczi
  2012-06-22 12:55   ` Andreas Färber
  2012-06-21 15:06 ` [Qemu-devel] [PATCH 2/2] kvm: use per-cpu lock to free vcpu thread out of the big lock Liu Ping Fan
  1 sibling, 2 replies; 18+ messages in thread
From: Liu Ping Fan @ 2012-06-21 15:06 UTC (permalink / raw)
  To: qemu-devel

introduce a per-cpu lock to protect CPUArchState against accesses
from other vcpu threads.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 cpu-defs.h  |    2 ++
 cpus.c      |   17 +++++++++++++++++
 main-loop.h |    3 +++
 3 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/cpu-defs.h b/cpu-defs.h
index f49e950..7305822 100644
--- a/cpu-defs.h
+++ b/cpu-defs.h
@@ -30,6 +30,7 @@
 #include "osdep.h"
 #include "qemu-queue.h"
 #include "targphys.h"
+#include "qemu-thread-posix.h"
 
 #ifndef TARGET_LONG_BITS
 #error TARGET_LONG_BITS must be defined before including this header
@@ -220,6 +221,7 @@ typedef struct CPUWatchpoint {
     CPU_COMMON_THREAD                                                   \
     struct QemuCond *halt_cond;                                         \
     int thread_kicked;                                                  \
+    struct QemuMutex *cpu_lock;                                         \
     struct qemu_work_item *queued_work_first, *queued_work_last;        \
     const char *cpu_model_str;                                          \
     struct KVMState *kvm_state;                                         \
diff --git a/cpus.c b/cpus.c
index b182b3d..554f7bc 100644
--- a/cpus.c
+++ b/cpus.c
@@ -735,6 +735,7 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
     env->thread_id = qemu_get_thread_id();
     cpu_single_env = env;
 
+
     r = kvm_init_vcpu(env);
     if (r < 0) {
         fprintf(stderr, "kvm_init_vcpu failed: %s\n", strerror(-r));
@@ -891,6 +892,20 @@ int qemu_cpu_is_self(void *_env)
     return qemu_thread_is_self(env->thread);
 }
 
+void qemu_mutex_lock_cpu(void *_env)
+{
+    CPUArchState *env = _env;
+
+    qemu_mutex_lock(env->cpu_lock);
+}
+
+void qemu_mutex_unlock_cpu(void *_env)
+{
+    CPUArchState *env = _env;
+
+    qemu_mutex_unlock(env->cpu_lock);
+}
+
 void qemu_mutex_lock_iothread(void)
 {
     if (!tcg_enabled()) {
@@ -1027,6 +1042,8 @@ void qemu_init_vcpu(void *_env)
     env->nr_cores = smp_cores;
     env->nr_threads = smp_threads;
     env->stopped = 1;
+    env->cpu_lock = g_malloc0(sizeof(QemuMutex));
+    qemu_mutex_init(env->cpu_lock);
     if (kvm_enabled()) {
         qemu_kvm_start_vcpu(env);
     } else if (tcg_enabled()) {
diff --git a/main-loop.h b/main-loop.h
index dce1cd9..d8d44a4 100644
--- a/main-loop.h
+++ b/main-loop.h
@@ -323,6 +323,9 @@ void qemu_bh_delete(QEMUBH *bh);
 int qemu_add_child_watch(pid_t pid);
 #endif
 
+void qemu_mutex_lock_cpu(void *_env);
+void qemu_mutex_unlock_cpu(void *_env);
+
 /**
  * qemu_mutex_lock_iothread: Lock the main loop mutex.
  *
-- 
1.7.4.4


* [Qemu-devel] [PATCH 2/2] kvm: use per-cpu lock to free vcpu thread out of the big lock
  2012-06-21 15:06 [Qemu-devel] [RFC] use fine-grained locks to replace qemu_mutex_lock_iothread Liu Ping Fan
  2012-06-21 15:06 ` [Qemu-devel] [PATCH 1/2] CPUArchState: introduce per-cpu lock Liu Ping Fan
@ 2012-06-21 15:06 ` Liu Ping Fan
  2012-06-22  2:29   ` 陳韋任 (Wei-Ren Chen)
  1 sibling, 1 reply; 18+ messages in thread
From: Liu Ping Fan @ 2012-06-21 15:06 UTC (permalink / raw)
  To: qemu-devel

In order to break the big lock, use the per-cpu lock in kvm_cpu_exec()
to protect against other vcpus' accesses to env->apic_state and related
fields in env.
We also need to protect against run_on_cpu().

Race conditions can arise like this:
1. vcpu-1 sends an IPI to vcpu-2, while vcpu-3 also sends an IPI to
   vcpu-2; an open window exists for concurrent access to vcpu-2's
   apic_state and related fields in env.

2. run_on_cpu() writes env->queued_work_last while flush_queued_work()
   reads it.
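
Sketched in code, race 2 is roughly the following (nothing orders the
two threads today):

    /* another thread, in run_on_cpu(): */
    env->queued_work_last->next = &wi;
    env->queued_work_last = &wi;

    /* target vcpu thread, in flush_queued_work(), can observe the
     * work list half-updated: */
    for (wi = env->queued_work_first; wi; wi = wi->next) {
        wi->func(wi->data);
    }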

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 cpus.c    |    6 ++++--
 hw/apic.c |   58 ++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 hw/pc.c   |    8 +++++++-
 kvm-all.c |   13 +++++++++++--
 4 files changed, 76 insertions(+), 9 deletions(-)

diff --git a/cpus.c b/cpus.c
index 554f7bc..ac99afe 100644
--- a/cpus.c
+++ b/cpus.c
@@ -649,6 +649,7 @@ void run_on_cpu(CPUArchState *env, void (*func)(void *data), void *data)
 
     wi.func = func;
     wi.data = data;
+    qemu_mutex_lock(env->cpu_lock);
     if (!env->queued_work_first) {
         env->queued_work_first = &wi;
     } else {
@@ -657,6 +658,7 @@ void run_on_cpu(CPUArchState *env, void (*func)(void *data), void *data)
     env->queued_work_last = &wi;
     wi.next = NULL;
     wi.done = false;
+    qemu_mutex_unlock(env->cpu_lock);
 
     qemu_cpu_kick(env);
     while (!wi.done) {
@@ -718,7 +720,7 @@ static void qemu_tcg_wait_io_event(void)
 static void qemu_kvm_wait_io_event(CPUArchState *env)
 {
     while (cpu_thread_is_idle(env)) {
-        qemu_cond_wait(env->halt_cond, &qemu_global_mutex);
+        qemu_cond_wait(env->halt_cond, env->cpu_lock);
     }
 
     qemu_kvm_eat_signals(env);
@@ -729,8 +731,8 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
 {
     CPUArchState *env = arg;
     int r;
+    qemu_mutex_lock_cpu(env);
 
-    qemu_mutex_lock(&qemu_global_mutex);
     qemu_thread_get_self(env->thread);
     env->thread_id = qemu_get_thread_id();
     cpu_single_env = env;
diff --git a/hw/apic.c b/hw/apic.c
index 4eeaf88..b999a40 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -22,6 +22,7 @@
 #include "host-utils.h"
 #include "trace.h"
 #include "pc.h"
+#include "qemu-thread.h"
 
 #define MAX_APIC_WORDS 8
 
@@ -94,6 +95,7 @@ static int get_highest_priority_int(uint32_t *tab)
     return -1;
 }
 
+/* Caller must hold the lock */
 static void apic_sync_vapic(APICCommonState *s, int sync_type)
 {
     VAPICState vapic_state;
@@ -141,11 +143,13 @@ static void apic_sync_vapic(APICCommonState *s, int sync_type)
     }
 }
 
+/* Caller must hold lock */
 static void apic_vapic_base_update(APICCommonState *s)
 {
     apic_sync_vapic(s, SYNC_TO_VAPIC);
 }
 
+/* Caller must hold the lock */
 static void apic_local_deliver(APICCommonState *s, int vector)
 {
     uint32_t lvt = s->lvt[vector];
@@ -175,9 +179,11 @@ static void apic_local_deliver(APICCommonState *s, int vector)
             (lvt & APIC_LVT_LEVEL_TRIGGER))
             trigger_mode = APIC_TRIGGER_LEVEL;
         apic_set_irq(s, lvt & 0xff, trigger_mode);
+        break;
     }
 }
 
+/* Caller must hold the lock */
 void apic_deliver_pic_intr(DeviceState *d, int level)
 {
     APICCommonState *s = DO_UPCAST(APICCommonState, busdev.qdev, d);
@@ -200,9 +206,12 @@ void apic_deliver_pic_intr(DeviceState *d, int level)
     }
 }
 
+/* Must hold lock */
 static void apic_external_nmi(APICCommonState *s)
 {
+    qemu_mutex_lock_cpu(s->cpu_env);
     apic_local_deliver(s, APIC_LVT_LINT1);
+    qemu_mutex_unlock_cpu(s->cpu_env);
 }
 
 #define foreach_apic(apic, deliver_bitmask, code) \
@@ -215,7 +224,9 @@ static void apic_external_nmi(APICCommonState *s)
                 if (__mask & (1 << __j)) {\
                     apic = local_apics[__i * 32 + __j];\
                     if (apic) {\
+                        qemu_mutex_lock_cpu(apic->cpu_env);\
                         code;\
+                        qemu_mutex_unlock_cpu(apic->cpu_env);\
                     }\
                 }\
             }\
@@ -244,7 +255,9 @@ static void apic_bus_deliver(const uint32_t *deliver_bitmask,
                 if (d >= 0) {
                     apic_iter = local_apics[d];
                     if (apic_iter) {
+                        qemu_mutex_lock_cpu(apic_iter->cpu_env);
                         apic_set_irq(apic_iter, vector_num, trigger_mode);
+                        qemu_mutex_unlock_cpu(apic_iter->cpu_env);
                     }
                 }
             }
@@ -293,6 +306,7 @@ void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode,
     apic_bus_deliver(deliver_bitmask, delivery_mode, vector_num, trigger_mode);
 }
 
+/* Caller must hold lock */
 static void apic_set_base(APICCommonState *s, uint64_t val)
 {
     s->apicbase = (val & 0xfffff000) |
@@ -305,6 +319,7 @@ static void apic_set_base(APICCommonState *s, uint64_t val)
     }
 }
 
+/* caller must hold lock */
 static void apic_set_tpr(APICCommonState *s, uint8_t val)
 {
     /* Updates from cr8 are ignored while the VAPIC is active */
@@ -314,12 +329,14 @@ static void apic_set_tpr(APICCommonState *s, uint8_t val)
     }
 }
 
+/* caller must hold lock */
 static uint8_t apic_get_tpr(APICCommonState *s)
 {
     apic_sync_vapic(s, SYNC_FROM_VAPIC);
     return s->tpr >> 4;
 }
 
+/* Caller must hold the lock */
 static int apic_get_ppr(APICCommonState *s)
 {
     int tpr, isrv, ppr;
@@ -348,6 +365,7 @@ static int apic_get_arb_pri(APICCommonState *s)
  * 0  - no interrupt,
  * >0 - interrupt number
  */
+/* Caller must hold cpu_lock */
 static int apic_irq_pending(APICCommonState *s)
 {
     int irrv, ppr;
@@ -363,6 +381,7 @@ static int apic_irq_pending(APICCommonState *s)
     return irrv;
 }
 
+/* caller must ensure the lock has been taken */
 /* signal the CPU if an irq is pending */
 static void apic_update_irq(APICCommonState *s)
 {
@@ -380,11 +399,13 @@ static void apic_update_irq(APICCommonState *s)
 void apic_poll_irq(DeviceState *d)
 {
     APICCommonState *s = APIC_COMMON(d);
-
+    qemu_mutex_lock_cpu(s->cpu_env);
     apic_sync_vapic(s, SYNC_FROM_VAPIC);
     apic_update_irq(s);
+    qemu_mutex_unlock_cpu(s->cpu_env);
 }
 
+/* caller must hold the lock */
 static void apic_set_irq(APICCommonState *s, int vector_num, int trigger_mode)
 {
     apic_report_irq_delivered(!get_bit(s->irr, vector_num));
@@ -407,6 +428,7 @@ static void apic_set_irq(APICCommonState *s, int vector_num, int trigger_mode)
     apic_update_irq(s);
 }
 
+/* caller must hold the lock */
 static void apic_eoi(APICCommonState *s)
 {
     int isrv;
@@ -415,6 +437,7 @@ static void apic_eoi(APICCommonState *s)
         return;
     reset_bit(s->isr, isrv);
     if (!(s->spurious_vec & APIC_SV_DIRECTED_IO) && get_bit(s->tmr, isrv)) {
+        /* FIXME: do we need to take an extra lock for the bus? */
         ioapic_eoi_broadcast(isrv);
     }
     apic_sync_vapic(s, SYNC_FROM_VAPIC | SYNC_TO_VAPIC);
@@ -480,18 +503,23 @@ static void apic_get_delivery_bitmask(uint32_t *deliver_bitmask,
 static void apic_startup(APICCommonState *s, int vector_num)
 {
     s->sipi_vector = vector_num;
+    qemu_mutex_lock_cpu(s->cpu_env);
     cpu_interrupt(s->cpu_env, CPU_INTERRUPT_SIPI);
+    qemu_mutex_unlock_cpu(s->cpu_env);
 }
 
 void apic_sipi(DeviceState *d)
 {
     APICCommonState *s = DO_UPCAST(APICCommonState, busdev.qdev, d);
-
+    qemu_mutex_lock_cpu(s->cpu_env);
     cpu_reset_interrupt(s->cpu_env, CPU_INTERRUPT_SIPI);
 
-    if (!s->wait_for_sipi)
+    if (!s->wait_for_sipi) {
+        qemu_mutex_unlock_cpu(s->cpu_env);
         return;
+    }
     cpu_x86_load_seg_cache_sipi(s->cpu_env, s->sipi_vector);
+    qemu_mutex_unlock_cpu(s->cpu_env);
     s->wait_for_sipi = 0;
 }
 
@@ -543,6 +571,7 @@ static void apic_deliver(DeviceState *d, uint8_t dest, uint8_t dest_mode,
     apic_bus_deliver(deliver_bitmask, delivery_mode, vector_num, trigger_mode);
 }
 
+/* Caller must take lock */
 int apic_get_interrupt(DeviceState *d)
 {
     APICCommonState *s = DO_UPCAST(APICCommonState, busdev.qdev, d);
@@ -572,6 +601,7 @@ int apic_get_interrupt(DeviceState *d)
     return intno;
 }
 
+/* Caller must hold the cpu_lock */
 int apic_accept_pic_intr(DeviceState *d)
 {
     APICCommonState *s = DO_UPCAST(APICCommonState, busdev.qdev, d);
@@ -589,6 +619,7 @@ int apic_accept_pic_intr(DeviceState *d)
     return 0;
 }
 
+/* Caller hold lock */
 static uint32_t apic_get_current_count(APICCommonState *s)
 {
     int64_t d;
@@ -619,9 +650,10 @@ static void apic_timer_update(APICCommonState *s, int64_t current_time)
 static void apic_timer(void *opaque)
 {
     APICCommonState *s = opaque;
-
+    qemu_mutex_lock_cpu(s->cpu_env);
     apic_local_deliver(s, APIC_LVT_TIMER);
     apic_timer_update(s, s->next_time);
+    qemu_mutex_unlock_cpu(s->cpu_env);
 }
 
 static uint32_t apic_mem_readb(void *opaque, target_phys_addr_t addr)
@@ -664,18 +696,22 @@ static uint32_t apic_mem_readl(void *opaque, target_phys_addr_t addr)
         val = 0x11 | ((APIC_LVT_NB - 1) << 16); /* version 0x11 */
         break;
     case 0x08:
+        qemu_mutex_lock_cpu(s->cpu_env);
         apic_sync_vapic(s, SYNC_FROM_VAPIC);
         if (apic_report_tpr_access) {
             cpu_report_tpr_access(s->cpu_env, TPR_ACCESS_READ);
         }
         val = s->tpr;
+        qemu_mutex_unlock_cpu(s->cpu_env);
         break;
     case 0x09:
         val = apic_get_arb_pri(s);
         break;
     case 0x0a:
         /* ppr */
+        qemu_mutex_lock_cpu(s->cpu_env);
         val = apic_get_ppr(s);
+        qemu_mutex_unlock_cpu(s->cpu_env);
         break;
     case 0x0b:
         val = 0;
@@ -712,7 +748,9 @@ static uint32_t apic_mem_readl(void *opaque, target_phys_addr_t addr)
         val = s->initial_count;
         break;
     case 0x39:
+        qemu_mutex_lock_cpu(s->cpu_env);
         val = apic_get_current_count(s);
+        qemu_mutex_unlock_cpu(s->cpu_env);
         break;
     case 0x3e:
         val = s->divide_conf;
@@ -767,18 +805,22 @@ static void apic_mem_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
     case 0x03:
         break;
     case 0x08:
+        qemu_mutex_lock_cpu(s->cpu_env);
         if (apic_report_tpr_access) {
             cpu_report_tpr_access(s->cpu_env, TPR_ACCESS_WRITE);
         }
         s->tpr = val;
         apic_sync_vapic(s, SYNC_TO_VAPIC);
         apic_update_irq(s);
+        qemu_mutex_unlock_cpu(s->cpu_env);
         break;
     case 0x09:
     case 0x0a:
         break;
     case 0x0b: /* EOI */
+        qemu_mutex_lock_cpu(s->cpu_env);
         apic_eoi(s);
+        qemu_mutex_unlock_cpu(s->cpu_env);
         break;
     case 0x0d:
         s->log_dest = val >> 24;
@@ -787,8 +829,10 @@ static void apic_mem_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
         s->dest_mode = val >> 28;
         break;
     case 0x0f:
+        qemu_mutex_lock_cpu(s->cpu_env);
         s->spurious_vec = val & 0x1ff;
         apic_update_irq(s);
+        qemu_mutex_unlock_cpu(s->cpu_env);
         break;
     case 0x10 ... 0x17:
     case 0x18 ... 0x1f:
@@ -807,24 +851,30 @@ static void apic_mem_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
     case 0x32 ... 0x37:
         {
             int n = index - 0x32;
+            qemu_mutex_lock_cpu(s->cpu_env);
             s->lvt[n] = val;
             if (n == APIC_LVT_TIMER)
                 apic_timer_update(s, qemu_get_clock_ns(vm_clock));
+            qemu_mutex_unlock_cpu(s->cpu_env);
         }
         break;
     case 0x38:
+        qemu_mutex_lock_cpu(s->cpu_env);
         s->initial_count = val;
         s->initial_count_load_time = qemu_get_clock_ns(vm_clock);
         apic_timer_update(s, s->initial_count_load_time);
+        qemu_mutex_unlock_cpu(s->cpu_env);
         break;
     case 0x39:
         break;
     case 0x3e:
         {
             int v;
+            qemu_mutex_lock_cpu(s->cpu_env);
             s->divide_conf = val & 0xb;
             v = (s->divide_conf & 3) | ((s->divide_conf >> 1) & 4);
             s->count_shift = (v + 1) & 7;
+            qemu_mutex_unlock_cpu(s->cpu_env);
         }
         break;
     default:
diff --git a/hw/pc.c b/hw/pc.c
index 4d34a33..5e7350c 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -147,7 +147,7 @@ void cpu_smm_update(CPUX86State *env)
         smm_set(!!(env->hflags & HF_SMM_MASK), smm_arg);
 }
 
-
+/* Called by kvm_cpu_exec(), already with lock protection */
 /* IRQ handling */
 int cpu_get_pic_interrupt(CPUX86State *env)
 {
@@ -173,9 +173,15 @@ static void pic_irq_request(void *opaque, int irq, int level)
     DPRINTF("pic_irqs: %s irq %d\n", level? "raise" : "lower", irq);
     if (env->apic_state) {
         while (env) {
+            if (!qemu_cpu_is_self(env)) {
+                qemu_mutex_lock_cpu(env);
+            }
             if (apic_accept_pic_intr(env->apic_state)) {
                 apic_deliver_pic_intr(env->apic_state, level);
             }
+            if (!qemu_cpu_is_self(env)) {
+                qemu_mutex_unlock_cpu(env);
+            }
             env = env->next_cpu;
         }
     } else {
diff --git a/kvm-all.c b/kvm-all.c
index 9b73ccf..dc9aa54 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -29,6 +29,7 @@
 #include "bswap.h"
 #include "memory.h"
 #include "exec-memory.h"
+#include "qemu-thread.h"
 
 /* This check must be after config-host.h is included */
 #ifdef CONFIG_EVENTFD
@@ -1252,12 +1253,15 @@ int kvm_cpu_exec(CPUArchState *env)
              */
             qemu_cpu_kick_self();
         }
-        qemu_mutex_unlock_iothread();
+        qemu_mutex_unlock(env->cpu_lock);
 
         run_ret = kvm_vcpu_ioctl(env, KVM_RUN, 0);
 
-        qemu_mutex_lock_iothread();
+        qemu_mutex_lock(env->cpu_lock);
         kvm_arch_post_run(env, run);
+        qemu_mutex_unlock_cpu(env);
+
+        qemu_mutex_lock_iothread();
 
         kvm_flush_coalesced_mmio_buffer();
 
@@ -1265,6 +1269,8 @@ int kvm_cpu_exec(CPUArchState *env)
             if (run_ret == -EINTR || run_ret == -EAGAIN) {
                 DPRINTF("io window exit\n");
                 ret = EXCP_INTERRUPT;
+                qemu_mutex_unlock_iothread();
+                qemu_mutex_lock_cpu(env);
                 break;
             }
             fprintf(stderr, "error: kvm run failed %s\n",
@@ -1312,6 +1318,9 @@ int kvm_cpu_exec(CPUArchState *env)
             ret = kvm_arch_handle_exit(env, run);
             break;
         }
+
+        qemu_mutex_unlock_iothread();
+        qemu_mutex_lock_cpu(env);
     } while (ret == 0);
 
     if (ret < 0) {
-- 
1.7.4.4


* Re: [Qemu-devel] [RFC] use fine-grained locks to replace qemu_mutex_lock_iothread
       [not found] <1340290158-11036-1-git-send-email-qemulist@gmail.com>
@ 2012-06-21 15:23 ` Jan Kiszka
  2012-06-22 10:24   ` liu ping fan
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Kiszka @ 2012-06-21 15:23 UTC (permalink / raw)
  To: Liu Ping Fan; +Cc: qemu-devel, Anthony Liguori

On 2012-06-21 16:49, Liu Ping Fan wrote:
> Nowadays, we use qemu_mutex_lock_iothread()/qemu_mutex_unlock_iothread() to
> protect emulated devices against racing accesses from vcpu threads and the iothread.
> 
> But this lock is too big. We can break it down.
> These patches separate the protection of CPUArchState from that of the other
> devices, so each CPUArchState can have its own per-cpu lock instead of the big lock.

Anything that reduces lock dependencies is generally welcome. But can
you specify in more details what you gain, and under which conditions?

I'm skeptical if this is the right area to start. With the in-kernel
irqchip enabled, no CPUArchState field is touched during normal
operations (unless I missed something subtle in the past). At the same
time, this locking is unfortunately fairly complex and invasive, so not
"cheap" to integrate.

IMO more interesting is breaking out some I/O path, e.g. from a NIC to a
network backend, and get this processed in a separate thread without
touching the BQL (Big QEMU Lock). I've experimental patches for this
here, but they need rebasing and polishing.

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


* Re: [Qemu-devel] [PATCH 2/2] kvm: use per-cpu lock to free vcpu thread out of the big lock
  2012-06-21 15:06 ` [Qemu-devel] [PATCH 2/2] kvm: use per-cpu lock to free vcpu thread out of the big lock Liu Ping Fan
@ 2012-06-22  2:29   ` 陳韋任 (Wei-Ren Chen)
  2012-06-22 10:26     ` liu ping fan
  0 siblings, 1 reply; 18+ messages in thread
From: 陳韋任 (Wei-Ren Chen) @ 2012-06-22  2:29 UTC (permalink / raw)
  To: Liu Ping Fan; +Cc: qemu-devel

Hi Liu,

On Thu, Jun 21, 2012 at 11:06:58PM +0800, Liu Ping Fan wrote:
> In order to break the big lock, use the per-cpu lock in kvm_cpu_exec()
> to protect against other vcpus' accesses to env->apic_state and related
> fields in env.

  Can this also be applied to tcg_cpu_exec()?

Regards,
chenwj

-- 
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667
Homepage: http://people.cs.nctu.edu.tw/~chenwj


* Re: [Qemu-devel] [RFC] use fine-grained locks to replace qemu_mutex_lock_iothread
  2012-06-21 15:23 ` [Qemu-devel] [RFC] use fine-grained locks to replace qemu_mutex_lock_iothread Jan Kiszka
@ 2012-06-22 10:24   ` liu ping fan
  2012-06-22 10:37     ` Jan Kiszka
  0 siblings, 1 reply; 18+ messages in thread
From: liu ping fan @ 2012-06-22 10:24 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: qemu-devel, Anthony Liguori

On Thu, Jun 21, 2012 at 11:23 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2012-06-21 16:49, Liu Ping Fan wrote:
>> Nowadays, we use qemu_mutex_lock_iothread()/qemu_mutex_unlock_iothread() to
>> protect emulated devices against racing accesses from vcpu threads and the iothread.
>>
>> But this lock is too big. We can break it down.
>> These patches separate the protection of CPUArchState from that of the other
>> devices, so each CPUArchState can have its own per-cpu lock instead of the big lock.
>
> Anything that reduces lock dependencies is generally welcome. But can
> you specify in more details what you gain, and under which conditions?
>
In fact, there are several steps to break down the QEMU big lock. This
step just aims to shrink the code area protected by
qemu_mutex_lock_iothread()/qemu_mutex_unlock_iothread(). And I am
working on the following steps, which focus on breaking down the big
lock when calling handle_{io,mmio}

Thanks and regards,
pingfan

> I'm skeptical if this is the right area to start. With the in-kernel
> irqchip enabled, no CPUArchState field is touched during normal
> operations (unless I missed something subtle in the past). At the same
> time, this locking is unfortunately fairly complex and invasive, so not
> "cheap" to integrate.
>
> IMO more interesting is breaking out some I/O path, e.g. from a NIC to a
> network backend, and get this processed in a separate thread without
> touching the BQL (Big QEMU Lock). I've experimental patches for this
> here, but they need rebasing and polishing.
>
> Thanks,
> Jan
>
> --
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux


* Re: [Qemu-devel] [PATCH 2/2] kvm: use per-cpu lock to free vcpu thread out of the big lock
  2012-06-22  2:29   ` 陳韋任 (Wei-Ren Chen)
@ 2012-06-22 10:26     ` liu ping fan
  0 siblings, 0 replies; 18+ messages in thread
From: liu ping fan @ 2012-06-22 10:26 UTC (permalink / raw)
  To: 陳韋任 (Wei-Ren Chen)
  Cc: Jan Kiszka, qemu-devel, Anthony Liguori

On Fri, Jun 22, 2012 at 10:29 AM, 陳韋任 (Wei-Ren Chen)
<chenwj@iis.sinica.edu.tw> wrote:
> Hi Liu,
>
> On Thu, Jun 21, 2012 at 11:06:58PM +0800, Liu Ping Fan wrote:
>> In order to break the big lock, use the per-cpu lock in kvm_cpu_exec()
>> to protect against other vcpus' accesses to env->apic_state and related
>> fields in env.
>
>  Can this also be applied to tcg_cpu_exec()?
>
No, currently I am focusing on the kvm branch.

Regards,
pingfan

> Regards,
> chenwj
>
> --
> Wei-Ren Chen (陳韋任)
> Computer Systems Lab, Institute of Information Science,
> Academia Sinica, Taiwan (R.O.C.)
> Tel:886-2-2788-3799 #1667
> Homepage: http://people.cs.nctu.edu.tw/~chenwj


* Re: [Qemu-devel] [RFC] use fine-grained locks to replace qemu_mutex_lock_iothread
  2012-06-22 10:24   ` liu ping fan
@ 2012-06-22 10:37     ` Jan Kiszka
  2012-06-22 20:11       ` Anthony Liguori
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Kiszka @ 2012-06-22 10:37 UTC (permalink / raw)
  To: liu ping fan; +Cc: qemu-devel@nongnu.org, Anthony Liguori

On 2012-06-22 12:24, liu ping fan wrote:
> On Thu, Jun 21, 2012 at 11:23 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> On 2012-06-21 16:49, Liu Ping Fan wrote:
>>> Nowadays, we use qemu_mutex_lock_iothread()/qemu_mutex_unlock_iothread() to
>>> protect emulated devices against racing accesses from vcpu threads and the iothread.
>>>
>>> But this lock is too big. We can break it down.
>>> These patches separate the protection of CPUArchState from that of the other
>>> devices, so each CPUArchState can have its own per-cpu lock instead of the big lock.
>>
>> Anything that reduces lock dependencies is generally welcome. But can
>> you specify in more details what you gain, and under which conditions?
>>
> In fact, there are several steps to break down the QEMU big lock. This
> step just aims to shrink the code area protected by
> qemu_mutex_lock_iothread()/qemu_mutex_unlock_iothread(). And I am
> working on the following steps, which focus on breaking down the big
> lock when calling handle_{io,mmio}

Then let us discuss the strategy. This is important as it is unrealistic
to break up the lock for all code paths. We really need to focus on
goals that provide benefits for relevant use cases.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


* Re: [Qemu-devel] [PATCH 1/2] CPUArchState: introduce per-cpu lock
  2012-06-21 15:06 ` [Qemu-devel] [PATCH 1/2] CPUArchState: introduce per-cpu lock Liu Ping Fan
@ 2012-06-22 12:36   ` Stefan Hajnoczi
  2012-06-24 14:05     ` liu ping fan
  2012-06-22 12:55   ` Andreas Färber
  1 sibling, 1 reply; 18+ messages in thread
From: Stefan Hajnoczi @ 2012-06-22 12:36 UTC (permalink / raw)
  To: Liu Ping Fan; +Cc: qemu-devel

On Thu, Jun 21, 2012 at 4:06 PM, Liu Ping Fan <qemulist@gmail.com> wrote:
> diff --git a/cpu-defs.h b/cpu-defs.h
> index f49e950..7305822 100644
> --- a/cpu-defs.h
> +++ b/cpu-defs.h
> @@ -30,6 +30,7 @@
>  #include "osdep.h"
>  #include "qemu-queue.h"
>  #include "targphys.h"
> +#include "qemu-thread-posix.h"

This breaks Windows, you need qemu-thread.h.

>
>  #ifndef TARGET_LONG_BITS
>  #error TARGET_LONG_BITS must be defined before including this header
> @@ -220,6 +221,7 @@ typedef struct CPUWatchpoint {
>     CPU_COMMON_THREAD                                                   \
>     struct QemuCond *halt_cond;                                         \
>     int thread_kicked;                                                  \
> +    struct QemuMutex *cpu_lock;                                         \

It would be nicer to declare it QemuMutex cpu_lock (no pointer) so
that you don't need to worry about malloc/free.
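
I.e. something like this (untested sketch):

    /* in CPU_COMMON: */
    QemuMutex cpu_lock;                 /* embedded, no allocation */

    /* in qemu_init_vcpu(): */
    qemu_mutex_init(&env->cpu_lock);    /* no g_malloc0()/g_free() pair */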

>     struct qemu_work_item *queued_work_first, *queued_work_last;        \
>     const char *cpu_model_str;                                          \
>     struct KVMState *kvm_state;                                         \
> diff --git a/cpus.c b/cpus.c
> index b182b3d..554f7bc 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -735,6 +735,7 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
>     env->thread_id = qemu_get_thread_id();
>     cpu_single_env = env;
>
> +
>     r = kvm_init_vcpu(env);
>     if (r < 0) {
>         fprintf(stderr, "kvm_init_vcpu failed: %s\n", strerror(-r));

Spurious whitespace change, this should be dropped from the patch.

> diff --git a/main-loop.h b/main-loop.h
> index dce1cd9..d8d44a4 100644
> --- a/main-loop.h
> +++ b/main-loop.h
> @@ -323,6 +323,9 @@ void qemu_bh_delete(QEMUBH *bh);
>  int qemu_add_child_watch(pid_t pid);
>  #endif
>
> +void qemu_mutex_lock_cpu(void *_env);
> +void qemu_mutex_unlock_cpu(void *_env);

Why void* instead of CPUArchState*?

Stefan


* Re: [Qemu-devel] [PATCH 1/2] CPUArchState: introduce per-cpu lock
  2012-06-21 15:06 ` [Qemu-devel] [PATCH 1/2] CPUArchState: introduce per-cpu lock Liu Ping Fan
  2012-06-22 12:36   ` Stefan Hajnoczi
@ 2012-06-22 12:55   ` Andreas Färber
  2012-06-24 14:02     ` liu ping fan
  1 sibling, 1 reply; 18+ messages in thread
From: Andreas Färber @ 2012-06-22 12:55 UTC (permalink / raw)
  To: Liu Ping Fan; +Cc: Jan Kiszka, qemu-devel, Anthony Liguori

On 21.06.2012 17:06, Liu Ping Fan wrote:
> introduce a per-cpu lock to protect CPUArchState against accesses
> from other vcpu threads.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  cpu-defs.h  |    2 ++
>  cpus.c      |   17 +++++++++++++++++
>  main-loop.h |    3 +++
>  3 files changed, 22 insertions(+), 0 deletions(-)
> 
> diff --git a/cpu-defs.h b/cpu-defs.h
> index f49e950..7305822 100644
> --- a/cpu-defs.h
> +++ b/cpu-defs.h
> @@ -30,6 +30,7 @@
>  #include "osdep.h"
>  #include "qemu-queue.h"
>  #include "targphys.h"
> +#include "qemu-thread-posix.h"
>  
>  #ifndef TARGET_LONG_BITS
>  #error TARGET_LONG_BITS must be defined before including this header
> @@ -220,6 +221,7 @@ typedef struct CPUWatchpoint {
>      CPU_COMMON_THREAD                                                   \
>      struct QemuCond *halt_cond;                                         \
>      int thread_kicked;                                                  \
> +    struct QemuMutex *cpu_lock;                                         \
>      struct qemu_work_item *queued_work_first, *queued_work_last;        \
>      const char *cpu_model_str;                                          \
>      struct KVMState *kvm_state;                                         \

Please don't add stuff to CPU_COMMON. Instead add to CPUState in
qom/cpu.c. The QOM CPUState part 4 series moves many of those fields there.
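
I.e. (sketch; surrounding fields elided):

    /* in the QOM CPUState struct: */
    struct CPUState {
        Object parent_obj;
        /* ... */
        QemuMutex lock;    /* protects the per-cpu state */
    };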

> diff --git a/cpus.c b/cpus.c
> index b182b3d..554f7bc 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -735,6 +735,7 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
>      env->thread_id = qemu_get_thread_id();
>      cpu_single_env = env;
>  
> +

Stray whitespace addition.

>      r = kvm_init_vcpu(env);
>      if (r < 0) {
>          fprintf(stderr, "kvm_init_vcpu failed: %s\n", strerror(-r));
> @@ -891,6 +892,20 @@ int qemu_cpu_is_self(void *_env)
>      return qemu_thread_is_self(env->thread);
>  }
>  
> +void qemu_mutex_lock_cpu(void *_env)
> +{
> +    CPUArchState *env = _env;
> +
> +    qemu_mutex_lock(env->cpu_lock);
> +}
> +
> +void qemu_mutex_unlock_cpu(void *_env)
> +{
> +    CPUArchState *env = _env;
> +
> +    qemu_mutex_unlock(env->cpu_lock);
> +}
> +

I don't like these helpers. For one, you are using void * arguments and
casting them; for another, you are using CPUArchState at all. With my
suggestion above these can take a CPUState *cpu.
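
E.g. (sketch, assuming an embedded QemuMutex in CPUState as suggested
above):

    void qemu_mutex_lock_cpu(CPUState *cpu)
    {
        qemu_mutex_lock(&cpu->lock);
    }

    void qemu_mutex_unlock_cpu(CPUState *cpu)
    {
        qemu_mutex_unlock(&cpu->lock);
    }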

>  void qemu_mutex_lock_iothread(void)
>  {
>      if (!tcg_enabled()) {
> @@ -1027,6 +1042,8 @@ void qemu_init_vcpu(void *_env)
>      env->nr_cores = smp_cores;
>      env->nr_threads = smp_threads;
>      env->stopped = 1;
> +    env->cpu_lock = g_malloc0(sizeof(QemuMutex));
> +    qemu_mutex_init(env->cpu_lock);

Are you sure this is not needed for linux-user/bsd-user? If not needed,
then the field should be #ifdef'ed in the struct to ensure that.
Otherwise this function is never called and you need to move the
initialization to the initfn in qom/cpu.c and then should also clean it
up in a finalizer.
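
Roughly (sketch; hook names illustrative):

    static void cpu_common_initfn(Object *obj)
    {
        CPUState *cpu = CPU(obj);

        qemu_mutex_init(&cpu->lock);
    }

    static void cpu_common_finalizefn(Object *obj)
    {
        CPUState *cpu = CPU(obj);

        qemu_mutex_destroy(&cpu->lock);
    }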

Andreas

>      if (kvm_enabled()) {
>          qemu_kvm_start_vcpu(env);
>      } else if (tcg_enabled()) {
> diff --git a/main-loop.h b/main-loop.h
> index dce1cd9..d8d44a4 100644
> --- a/main-loop.h
> +++ b/main-loop.h
> @@ -323,6 +323,9 @@ void qemu_bh_delete(QEMUBH *bh);
>  int qemu_add_child_watch(pid_t pid);
>  #endif
>  
> +void qemu_mutex_lock_cpu(void *_env);
> +void qemu_mutex_unlock_cpu(void *_env);
> +
>  /**
>   * qemu_mutex_lock_iothread: Lock the main loop mutex.
>   *

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg


* Re: [Qemu-devel] [RFC] use fine-grained locks to replace qemu_mutex_lock_iothread
  2012-06-22 10:37     ` Jan Kiszka
@ 2012-06-22 20:11       ` Anthony Liguori
  2012-06-22 21:14         ` Jan Kiszka
  0 siblings, 1 reply; 18+ messages in thread
From: Anthony Liguori @ 2012-06-22 20:11 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: liu ping fan, Stefan Hajnoczi, qemu-devel@nongnu.org

On 06/22/2012 05:37 AM, Jan Kiszka wrote:
> On 2012-06-22 12:24, liu ping fan wrote:
>> On Thu, Jun 21, 2012 at 11:23 PM, Jan Kiszka<jan.kiszka@siemens.com>  wrote:
>>> On 2012-06-21 16:49, Liu Ping Fan wrote:
>>>> Nowadays, we use qemu_mutex_lock_iothread()/qemu_mutex_unlock_iothread() to
>>>> protect emulated devices against racing accesses from vcpu threads and the iothread.
>>>>
>>>> But this lock is too big. We can break it down.
>>>> These patches separate the protection of CPUArchState from that of the other
>>>> devices, so each CPUArchState can have its own per-cpu lock instead of the big lock.
>>>
>>> Anything that reduces lock dependencies is generally welcome. But can
>>> you specify in more details what you gain, and under which conditions?
>>>
>> In fact, there are several steps to break down the QEMU big lock. This
>> step just aims to shrink the code area protected by
>> qemu_mutex_lock_iothread()/qemu_mutex_unlock_iothread(). And I am
>> working on the following steps, which focus on breaking down the big
>> lock when calling handle_{io,mmio}
>
> Then let us discuss the strategy. This is important as it is unrealistic
> to break up the lock for all code paths. We really need to focus on
> goals that provide benefits for relevant use cases.

Stefan put together a proof of concept that implemented the data-plane portion 
of virtio-blk in a separate thread.  This is possible because of I/O eventfd (we 
were able to select() on that fd in a separate thread).
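
The heart of it is just a dedicated thread blocking on the ioeventfd
(sketch, not Stefan's actual code; the DataPlane type and process_vq()
are made up for illustration):

    static void *data_plane_thread(void *opaque)
    {
        DataPlane *s = opaque;
        uint64_t cnt;

        for (;;) {
            /* blocks until the guest kicks the virtqueue */
            if (read(s->ioeventfd, &cnt, sizeof(cnt)) != sizeof(cnt)) {
                continue;
            }
            process_vq(s);    /* handle requests, no BQL taken */
        }
        return NULL;
    }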

The performance difference between virtio-blk-pci and virtio-blk-data-plane is 
staggering when dealing with a very large storage system.

So we'd like to get the infrastructure in place where we can start 
multithreading devices in QEMU so we can integrate this work.

The basic plan is to introduce granular locking starting at the KVM dispatch level 
until we can get to MemoryRegion dispatch.  We'll then have some way to indicate 
that a MemoryRegion's callbacks should be invoked without holding the qemu 
global mutex.

We can then convert devices one at a time.
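
For instance, the end state could look like this (sketch; the .unlocked
flag and the callbacks are hypothetical):

    static const MemoryRegionOps fast_mmio_ops = {
        .read = fast_mmio_read,
        .write = fast_mmio_write,
        .endianness = DEVICE_LITTLE_ENDIAN,
        .unlocked = true,    /* dispatch without the qemu global mutex */
    };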

While the threading in the KVM code is certainly complex, it's also relatively 
isolated from the rest of QEMU.  So we don't have to worry about auditing large 
subsystems for re-entrancy safety.

Once we have unlocked MemoryRegions, we can start writing some synthetic test 
cases to really stress the locking code too.

Regards,

Anthony Liguori

>
> Jan
>


* Re: [Qemu-devel] [RFC] use fine-grained locks to replace qemu_mutex_lock_iothread
  2012-06-22 20:11       ` Anthony Liguori
@ 2012-06-22 21:14         ` Jan Kiszka
  2012-06-22 21:44           ` Anthony Liguori
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Kiszka @ 2012-06-22 21:14 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel@nongnu.org, liu ping fan, Stefan Hajnoczi


On 2012-06-22 22:11, Anthony Liguori wrote:
> On 06/22/2012 05:37 AM, Jan Kiszka wrote:
>> On 2012-06-22 12:24, liu ping fan wrote:
>>> On Thu, Jun 21, 2012 at 11:23 PM, Jan Kiszka<jan.kiszka@siemens.com> 
>>> wrote:
>>>> On 2012-06-21 16:49, Liu Ping Fan wrote:
>>>>> Nowadays, we use
>>>>> qemu_mutex_lock_iothread()/qemu_mutex_unlock_iothread() to
>>>>> protect emulated devices against racing accesses from vcpu
>>>>> threads and the iothread.
>>>>>
>>>>> But this lock is too big. We can break it down.
>>>>> These patches separate the protection of CPUArchState from that of
>>>>> the other devices, so each CPUArchState can have its own per-cpu
>>>>> lock instead of the big lock.
>>>>
>>>> Anything that reduces lock dependencies is generally welcome. But can
>>>> you specify in more details what you gain, and under which conditions?
>>>>
>>> In fact, there are several steps to break down the QEMU big lock. This
>>> step just aims to shrink the code area protected by
>>> qemu_mutex_lock_iothread()/qemu_mutex_unlock_iothread(). And I am
>>> working on the following steps, which focus on breaking down the big
>>> lock when calling handle_{io,mmio}
>>
>> Then let us discuss the strategy. This is important as it is unrealistic
>> to break up the lock for all code paths. We really need to focus on
>> goals that provide benefits for relevant use cases.
> 
> Stefan put together a proof of concept that implemented the data-plane
> portion of virtio-blk in a separate thread.  This is possible because of
> I/O eventfd (we were able to select() on that fd in a separate thread).
> 
> The performance difference between virtio-blk-pci and
> virtio-blk-data-plane is staggering when dealing with a very large
> storage system.
> 
> So we'd like to get the infrastructure in place where we can start
> multithreading devices in QEMU so we can integrate this work.

Can you name the primary bits? We really need to see the whole picture
before adding new locks. They alone are not the solution.

> 
> The basic plan is to introduce granular locking starting at the KVM
> dispatch level until we can get to MemoryRegion dispatch.  We'll then
> have some way to indicate that a MemoryRegion's callbacks should be
> invoked without holding the qemu global mutex.

I don't disagree, but this end really looks like starting at the wrong
edge. The changes are not isolated and surely not yet correct
(run_on_cpu is broken for tcg e.g.).

Then, none of this locking should be needed for in-kernel irqchips. All
touched states are thread local or should be modifiable atomically - if
not let's fix *that*, it's more beneficial.

Actually, cpu_lock is counterproductive as it adds locking ops to a path
where we will not need them later on in the normal configuration. User
space irqchip is a slow path and perfectly fine to handle under BQL. So
is VCPU control (pause/resume/run-on). It's better to focus on the fast
path first.

Jan




* Re: [Qemu-devel] [RFC] use fine-grained locks to replace qemu_mutex_lock_iothread
  2012-06-22 21:14         ` Jan Kiszka
@ 2012-06-22 21:44           ` Anthony Liguori
  2012-06-22 22:27             ` Jan Kiszka
  0 siblings, 1 reply; 18+ messages in thread
From: Anthony Liguori @ 2012-06-22 21:44 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: qemu-devel@nongnu.org, liu ping fan, Stefan Hajnoczi

On 06/22/2012 04:14 PM, Jan Kiszka wrote:
> On 2012-06-22 22:11, Anthony Liguori wrote:
>> On 06/22/2012 05:37 AM, Jan Kiszka wrote:
>>> On 2012-06-22 12:24, liu ping fan wrote:
>>>> On Thu, Jun 21, 2012 at 11:23 PM, Jan Kiszka<jan.kiszka@siemens.com>
>>>> wrote:
>>>>> On 2012-06-21 16:49, Liu Ping Fan wrote:
>>>>>> Nowadays, we use
>>>>>> qemu_mutex_lock_iothread()/qemu_mutex_unlock_iothread() to
>>>>>> protect the race to access the emulated dev launched by vcpu
>>>>>> threads&   iothread.
>>>>>>
>>>>>> But this lock is too big. We can break it down.
>>>>>> These patches separate the CPUArchState's protection from the other
>>>>>> devices, so we
>>>>>> can have a per-cpu lock for each CPUArchState, not the big lock any
>>>>>> longer.
>>>>>
>>>>> Anything that reduces lock dependencies is generally welcome. But can
>>>>> you specify in more details what you gain, and under which conditions?
>>>>>
>>>> In fact, there are several steps to break down the QEMU big lock. This
>>>> step just aims to shrink the code area protected by
>>>> qemu_mutex_lock_iothread()/qemu_mutex_unlock_iothread(). And I am
>>>> working on the following steps, which focus on breaking down the big
>>>> lock when calling handle_{io,mmio}
>>>
>>> Then let us discuss the strategy. This is important as it is unrealistic
>>> to break up the lock for all code paths. We really need to focus on
>>> goals that provide benefits for relevant use cases.
>>
>> Stefan put together a proof of concept that implemented the data-plane
>> portion of virtio-blk in a separate thread.  This is possible because of
>> I/O eventfd (we were able to select() on that fd in a separate thread).
>>
>> The performance difference between virtio-blk-pci and
>> virtio-blk-data-plane is staggering when dealing with a very large
>> storage system.
>>
>> So we'd like to get the infrastructure in place where we can start
>> multithreading devices in QEMU so we can integrate this work.
>
> Can you name the primary bits? We really need to see the whole picture
> before adding new locks. They alone are not the solution.

Sorry, not sure what you mean by "the primary bits".

At a high level, the plan is to:

1) unlock iothread before entering the do {} loop in kvm_cpu_exec()
    a) reacquire the lock after the loop
    b) reacquire the lock in kvm_handle_io()
    c) introduce an unlocked memory accessor that for now, just requires the 
iothread lock() and calls cpu_physical_memory_rw()

2) focus initially on killing the lock in kvm_handle_io()
    a) the ioport table is pretty simplistic so adding fine grain locking won't 
be hard.
    b) reacquire lock right before ioport dispatch

3) allow for registering ioport handlers w/o the dispatch function carrying the iothread lock
    a) this is mostly memory API plumbing

4) focus on going back and adding fine grain locking to the 
cpu_physical_memory_rw() accessor

Note that whenever possible, we should be using rwlocks instead of a normal 
mutex.  In particular, for the ioport data structures, a rwlock seems pretty 
obvious.
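
For the ioport tables that could look like this (sketch, using pthread
rwlocks directly; ioport_read_table/ioport_opaque are the existing
arrays in ioport.c):

    static pthread_rwlock_t ioport_table_lock = PTHREAD_RWLOCK_INITIALIZER;

    /* dispatch path: many concurrent readers */
    static uint32_t ioport_read(int index, uint32_t address)
    {
        IOPortReadFunc *func;
        uint32_t val = 0xffffffff;

        pthread_rwlock_rdlock(&ioport_table_lock);
        func = ioport_read_table[index][address];
        if (func) {
            val = func(ioport_opaque[address], address);
        }
        pthread_rwlock_unlock(&ioport_table_lock);
        return val;
    }

    /* register_ioport_read() and friends would take
     * pthread_rwlock_wrlock() around their table updates. */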

>
>>
>> The basic plan is to introduce granular locking starting at the KVM
>> dispatch level until we can get to MemoryRegion dispatch.  We'll then
>> have some way to indicate that a MemoryRegion's callbacks should be
>> invoked without holding the qemu global mutex.
>
> I don't disagree, but this end really looks like starting at the wrong
> edge. The changes are not isolated and surely not yet correct
> (run_on_cpu is broken for tcg e.g.).
>
> Then, none of this locking should be needed for in-kernel irqchips. All
> touched states are thread local or should be modifiable atomically - if
> not let's fix *that*, it's more beneficial.
>
> Actually, cpu_lock is counterproductive as it adds locking ops to a path
> where we will not need them later on in the normal configuration. User
> space irqchip is a slow path and perfectly fine to handle under BQL. So
> is VCPU control (pause/resume/run-on). It's better to focus on the fast
> path first.

To be clear, I'm not advocating introducing cpu_lock.  We should do whatever 
makes the most sense to not have to hold iothread lock while processing an exit 
from KVM.

Note that this is an RFC, the purpose of this series is to have this discussion :-)

Regards,

Anthony Liguori

>
> Jan
>


* Re: [Qemu-devel] [RFC] use fine-grained locks to replace qemu_mutex_lock_iothread
  2012-06-22 21:44           ` Anthony Liguori
@ 2012-06-22 22:27             ` Jan Kiszka
  2012-06-22 22:56               ` Anthony Liguori
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Kiszka @ 2012-06-22 22:27 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel@nongnu.org, liu ping fan, Stefan Hajnoczi


On 2012-06-22 23:44, Anthony Liguori wrote:
> On 06/22/2012 04:14 PM, Jan Kiszka wrote:
>> On 2012-06-22 22:11, Anthony Liguori wrote:
>>> On 06/22/2012 05:37 AM, Jan Kiszka wrote:
>>>> On 2012-06-22 12:24, liu ping fan wrote:
>>>>> On Thu, Jun 21, 2012 at 11:23 PM, Jan Kiszka<jan.kiszka@siemens.com>
>>>>> wrote:
>>>>>> On 2012-06-21 16:49, Liu Ping Fan wrote:
>>>>>>> Nowadays, we use
>>>>>>> qemu_mutex_lock_iothread()/qemu_mutex_unlock_iothread() to
>>>>>>> protect emulated devices against racing accesses from vcpu
>>>>>>> threads and the iothread.
>>>>>>>
>>>>>>> But this lock is too big. We can break it down.
>>>>>>> These patches separate the protection of CPUArchState from that of
>>>>>>> the other devices, so each CPUArchState can have its own per-cpu
>>>>>>> lock instead of the big lock.
>>>>>>
>>>>>> Anything that reduces lock dependencies is generally welcome. But can
>>>>>> you specify in more details what you gain, and under which
>>>>>> conditions?
>>>>>>
>>>>> In fact, there are several steps to break down the QEMU big lock. This
>>>>> step just aims to shrink the code area protected by
>>>>> qemu_mutex_lock_iothread()/qemu_mutex_unlock_iothread(). And I am
>>>>> working on the following steps, which focus on breaking down the big
>>>>> lock when calling handle_{io,mmio}
>>>>
>>>> Then let us discuss the strategy. This is important as it is
>>>> unrealistic
>>>> to break up the lock for all code paths. We really need to focus on
>>>> goals that provide benefits for relevant use cases.
>>>
>>> Stefan put together a proof of concept that implemented the data-plane
>>> portion of virtio-blk in a separate thread.  This is possible because of
>>> I/O eventfd (we were able to select() on that fd in a separate thread).
>>>
>>> The performance difference between virtio-blk-pci and
>>> virtio-blk-data-plane is staggering when dealing with a very large
>>> storage system.
>>>
>>> So we'd like to get the infrastructure in place where we can start
>>> multithreading devices in QEMU so we can integrate this work.
>>
>> Can you name the primary bits? We really need to see the whole picture
>> before adding new locks. They alone are not the solution.
> 
> Sorry, not sure what you mean by "the primary bits".
> 
> At a high level, the plan is to:
> 
> 1) unlock iothread before entering the do {} loop in kvm_cpu_exec()
>    a) reacquire the lock after the loop
>    b) reacquire the lock in kvm_handle_io()
>    c) introduce an unlocked memory accessor that for now, just requires
> the iothread lock() and calls cpu_physical_memory_rw()

Right, that's what we have here as well. The latter is modeled as a
so-called "I/O pathway", a thread-based execution context for
frontend/backend pairs with some logic to transfer certain I/O requests
asynchronously to the pathway thread.

The tricky part was to get nested requests right, i.e. when a request
triggers another one from within the device model. This is where things
get ugly. In theory, you can end up with a vm deadlock if you just apply
per-device locking. I'm currently trying to rebase our patches, review
and document the logic behind it.

> 
> 2) focus initially on killing the lock in kvm_handle_io()
>    a) the ioport table is pretty simplistic so adding fine grain locking
> won't be hard.
>    b) reacquire lock right before ioport dispatch
> 
> 3) allow for registering ioport handlers w/o the dispatch function carrying
> the iothread lock
>    a) this is mostly memory API plumbing

We skipped this as our NICs didn't do PIO, but you clearly need it for
virtio.

> 
> 4) focus on going back and adding fine grain locking to the
> cpu_physical_memory_rw() accessor

In the end, PIO and MMIO should use the same patterns - and will face
the same challenges. Ideally, we model things very similar right from
the start.

And then there is also

5) provide direct IRQ delivery from the device model to the IRQ chip.
That's much like what we need for VFIO and KVM device assignment. But
here we won't be able to cheat and ignore correct generation of vmstates
of the bypassed PCI host bridges etc... Which leads me to that other
thread about how to handle this for PCI device pass-through.
Contributions to that discussion are welcome as well.

> 
> Note that whenever possible, we should be using rwlocks instead of a
> normal mutex.  In particular, for the ioport data structures, a rwlock
> seems pretty obvious.

I think we should mostly be fine with a "big hammer" rwlock: unlocked
read access from VCPUs and iothreads, and vmstop/resume around
modifications of fast path data structures (like the memory region
hierarchy or the PIO table). Where that's not sufficient, RCU will be
needed. Sleeping rwlocks have horrible semantics (specifically when
thread priorities come into play) and are performance-wise inferior. We
should avoid them completely.
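
I.e. (sketch; the update step is hypothetical):

    /* writer side: quiesce all readers before touching fast-path data */
    vm_stop(RUN_STATE_PAUSED);     /* no VCPU or iothread dispatch */
    rebuild_dispatch_tables();     /* hypothetical update step */
    vm_start();

    /* reader side (VCPUs, iothreads): dispatch without any lock */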

> 
>>
>>>
>>> The basic plan is to introduce granular locking starting at the KVM
>>> dispatch level until we can get to MemoryRegion dispatch.  We'll then
>>> have some way to indicate that a MemoryRegion's callbacks should be
>>> invoked without holding the qemu global mutex.
>>
>> I don't disagree, but this end really looks like starting at the wrong
>> edge. The changes are not isolated and surely not yet correct
>> (run_on_cpu is broken for tcg e.g.).
>>
>> Then, none of this locking should be needed for in-kernel irqchips. All
>> touched states are thread local or should be modifiable atomically - if
>> not let's fix *that*, it's more beneficial.
>>
>> Actually, cpu_lock is counterproductive as it adds locking ops to a path
>> where we will not need them later on in the normal configuration. User
>> space irqchip is a slow path and perfectly fine to handle under BQL. So
>> is VCPU control (pause/resume/run-on). It's better to focus on the fast
>> path first.
> 
> To be clear, I'm not advocating introducing cpu_lock.  We should do
> whatever makes the most sense to not have to hold iothread lock while
> processing an exit from KVM.

Good that we agree. :)

> 
> Note that this is an RFC, the purpose of this series is to have this
> discussion :-)

Yep, I think we have it now ;). Hope I can contribute some code bits to
it soon, though I didn't schedule this task for the next week.

Jan




* Re: [Qemu-devel] [RFC] use fine-grained locks to replace qemu_mutex_lock_iothread
  2012-06-22 22:27             ` Jan Kiszka
@ 2012-06-22 22:56               ` Anthony Liguori
  2012-06-23  9:10                 ` Jan Kiszka
  0 siblings, 1 reply; 18+ messages in thread
From: Anthony Liguori @ 2012-06-22 22:56 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: qemu-devel@nongnu.org, liu ping fan, Stefan Hajnoczi

On 06/22/2012 05:27 PM, Jan Kiszka wrote:
> On 2012-06-22 23:44, Anthony Liguori wrote:
>> 1) unlock iothread before entering the do {} loop in kvm_cpu_exec()
>>     a) reacquire the lock after the loop
>>     b) reacquire the lock in kvm_handle_io()
>>     c) introduce an unlocked memory accessor that for now, just requires
>> the iothread lock() and calls cpu_physical_memory_rw()
>
> Right, that's what we have here as well. The latter is modeled as a
> so-called "I/O pathway", a thread-based execution context for
> frontend/backend pairs with some logic to transfer certain I/O requests
> asynchronously to the pathway thread.

Interesting, so the VCPU threads always hold the iothread mutex but some 
requests are routed to other threads?

I hadn't considered a design like that.  I've been thinking about a long term 
architecture that's a bit more invasive.

What we think of the I/O thread today wouldn't be special.  It would be one of N 
I/O threads all running separate copies of the main loop.  All of the functions 
that defer dispatch to a main loop would take a context as an argument and 
devices would essentially have a "vector" array of main loops as input.

So virtio-net probably would have two main loop "vectors" since it would like to 
schedule tx and rx independently.  There's nothing that says that you can't pass 
the same main loop context for each vector but that's a configuration choice.

Dispatch from VCPU context would behave the same it does today but obviously 
per-device locking is needed.

> The tricky part was to get nested requests right, i.e. when a request
> triggers another one from within the device model. This is where things
> get ugly. In theory, you can end up with a vm deadlock if you just apply
> per-device locking. I'm currently trying to rebase our patches, review
> and document the logic behind it.

I really think the only way to solve this is to separate map()'d DMA access 
(where the device really wants to deal with RAM only) and copy-based access 
(where devices map DMA to other devices).

For copy-based access, we really ought to move to a callback based API.  It adds 
quite a bit of complexity but it's really the only way to solve the problem 
robustly.
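
Roughly (sketch of such an API; names hypothetical):

    typedef void DMACompletionFunc(void *opaque, int ret);

    /* completes asynchronously once the target (RAM or another device)
     * has processed the transfer, instead of nesting the dispatch
     * inside the caller's lock */
    void cpu_physical_memory_rw_async(target_phys_addr_t addr, uint8_t *buf,
                                      int len, bool is_write,
                                      DMACompletionFunc *cb, void *opaque);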

>> 2) focus initially on killing the lock in kvm_handle_io()
>>     a) the ioport table is pretty simplistic so adding fine grain locking
>> won't be hard.
>>     b) reacquire lock right before ioport dispatch
>>
>> 3) allow for registering ioport handlers w/o the dispatch function carrying
>> the iothread lock
>>     a) this is mostly memory API plumbing
>
> We skipped this as our NICs didn't do PIO, but you clearly need it for
> virtio.

Right.

>> 4) focus on going back and adding fine grain locking to the
>> cpu_physical_memory_rw() accessor
>
> In the end, PIO and MMIO should use the same patterns - and will face
> the same challenges. Ideally, we model things very similar right from
> the start.

Yes.

> And then there is also
>
> 5) provide direct IRQ delivery from the device model to the IRQ chip.
> That's much like what we need for VFIO and KVM device assignment. But
> here we won't be able to cheat and ignore correct generation of vmstates
> of the bypassed PCI host bridges etc... Which leads me to that other
> thread about how to handle this for PCI device pass-through.
> Contributions to that discussion are welcome as well.

I think you mean to the in-kernel IRQ chip.  I'm thinking about this still so I 
don't have a plan yet that I'm ready to share.  I have some ideas though.

>
>>
>> Note that whenever possible, we should be using rwlocks instead of a
>> normal mutex.  In particular, for the ioport data structures, a rwlock
>> seems pretty obvious.
>
> I think we should mostly be fine with a "big hammer" rwlock: unlocked
> read access from VCPUs and iothreads, and vmstop/resume around
> modifications of fast path data structures (like the memory region
> hierarchy or the PIO table).

Ack.

> Where that's not sufficient, RCU will be
> needed. Sleeping rwlocks have horrible semantics (specifically when
> thread priorities come into play) and are performance-wise inferior. We
> should avoid them completely.

Yes, I think RCU is inevitable here but I think starting with rwlocks will help 
with the big refactoring.

>>
>> To be clear, I'm not advocating introducing cpu_lock.  We should do
>> whatever makes the most sense to not have to hold iothread lock while
>> processing an exit from KVM.
>
> Good that we agree. :)
>
>>
>> Note that this is an RFC, the purpose of this series is to have this
>> discussion :-)
>
> Yep, I think we have it now ;). Hope I can contribute some code bits to
> it soon, though I didn't schedule this task for the next week.

Great!  If you have something you can share, I'd be eager to look at it 
regardless of the condition of the code.

Regards,

Anthony Liguori

>
> Jan
>


* Re: [Qemu-devel] [RFC] use fine-grained locks to replace qemu_mutex_lock_iothread
  2012-06-22 22:56               ` Anthony Liguori
@ 2012-06-23  9:10                 ` Jan Kiszka
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Kiszka @ 2012-06-23  9:10 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel@nongnu.org, liu ping fan, Stefan Hajnoczi


On 2012-06-23 00:56, Anthony Liguori wrote:
> On 06/22/2012 05:27 PM, Jan Kiszka wrote:
>> On 2012-06-22 23:44, Anthony Liguori wrote:
>>> 1) unlock iothread before entering the do {} loop in kvm_cpu_exec()
>>>     a) reacquire the lock after the loop
>>>     b) reacquire the lock in kvm_handle_io()
>>>     c) introduce an unlocked memory accessor that, for now, just takes
>>> the iothread lock and calls cpu_physical_memory_rw()
>>
>> Right, that's what we have here as well. The latter is modeled as a
>> so-called "I/O pathway", a thread-based execution context for
>> frontend/backend pairs with some logic to transfer certain I/O requests
>> asynchronously to the pathway thread.
> 
> Interesting, so the VCPU threads always hold the iothread mutex but some
> requests are routed to other threads?

The VCPUs only acquire the iothread lock if the request can neither be
handled directly nor forwarded to a pathway thread. In the latter case,
pathway-specific locks will be taken. One big advantage of this model is
that you do not need to worry about locks in the device models
themselves. That helps with migrating existing models but should also be
sufficient for quite a few use cases.
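
A sketch of 1) and 1a) from the list above, heavily simplified from the
real kvm_cpu_exec():

int kvm_cpu_exec(CPUArchState *env)
{
    int ret;

    qemu_mutex_unlock_iothread();       /* 1) run loop is lock-free */
    do {
        ret = kvm_vcpu_ioctl(env, KVM_RUN, 0);
        if (ret == 0) {
            /* exit handlers such as kvm_handle_io() reacquire the big
             * lock themselves, only while they dispatch (1b/1c) */
            ret = kvm_arch_handle_exit(env, env->kvm_run);
        }
    } while (ret == 0);
    qemu_mutex_lock_iothread();         /* 1a) reacquire after the loop */

    return ret;
}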

> 
> I hadn't considered a design like that.  I've been thinking about a
> long-term architecture that's a bit more invasive.
> 
> What we think of the I/O thread today wouldn't be special.  It would be
> one of N I/O threads all running separate copies of the main loop.  All
> of the functions that defer dispatch to a main loop would take a context
> as an argument and devices would essentially have a "vector" array of
> main loops as input.
> 
> So virtio-net probably would have two main loop "vectors" since it would
> like to schedule tx and rx independently.  There's nothing that says
> that you can't pass the same main loop context for each vector but
> that's a configuration choice.
> 
> Dispatch from VCPU context would behave the same as it does today, but
> obviously per-device locking is needed.

And every backend would run in its own thread - I guess this is
conceptually close to what we have. However, the devil is in the detail.
E.g., we will also need per-iothread timer services (we skipped this so
far). And the device-to-device request problem needs to be solved (see
below).
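
A purely hypothetical shape for that "vector of main loops" idea,
including the per-context timer/BH services mentioned above (none of
these declarations exist anywhere yet):

typedef struct MainLoopCtx MainLoopCtx;

typedef struct VirtIONetCtx {
    MainLoopCtx *tx_ctx;   /* may point at the same context as rx_ctx */
    MainLoopCtx *rx_ctx;
} VirtIONetCtx;

/* per-context replacements for today's global main-loop services */
QEMUBH *ctx_bh_new(MainLoopCtx *ctx, QEMUBHFunc *cb, void *opaque);
QEMUTimer *ctx_new_timer(MainLoopCtx *ctx, QEMUClock *clock,
                         QEMUTimerCB *cb, void *opaque);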

> 
>> The tricky part was to get nested requests right, i.e. when a request
>> triggers another one from within the device model. This is where things
>> get ugly. In theory, you can end up with a vm deadlock if you just apply
>> per-device locking. I'm currently trying to rebase our patches, review
>> and document the logic behind it.
> 
> I really think the only way to solve this is to separate map()'d DMA
> access (where the device really wants to deal with RAM only) and
> copy-based access (where devices map DMA to other devices).
> 
> For copy-based access, we really ought to move to a callback-based API.
> It adds quite a bit of complexity, but it's really the only way to solve
> the problem robustly.

Maybe we are talking about the same thing: What we need is a mechanism
to queue MMIO requests for execution over some iothread / pathway
context in case we are about to get trapped by lock recursion. Then we
also have to make sure that queued requests are not overtaken by
requests issued afterward. This is an important part of our approach.
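
A sketch of such a queue, using the existing qemu-queue.h macros;
DeferredReq and the helper are invented, and the FIFO order is what
prevents later requests from overtaking queued ones:

typedef struct DeferredReq {
    target_phys_addr_t addr;
    uint64_t val;
    unsigned size;
    QTAILQ_ENTRY(DeferredReq) entry;
} DeferredReq;

static QTAILQ_HEAD(, DeferredReq) pathway_queue =
    QTAILQ_HEAD_INITIALIZER(pathway_queue);

static void pathway_defer_write(target_phys_addr_t addr, uint64_t val,
                                unsigned size)
{
    DeferredReq *req = g_malloc(sizeof(*req));

    req->addr = addr;
    req->val = val;
    req->size = size;
    QTAILQ_INSERT_TAIL(&pathway_queue, req, entry);
    /* kick the pathway thread, which pops requests from the head */
}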

> 
>>> 2) focus initially on killing the lock in kvm_handle_io()
>>>     a) the ioport table is pretty simplistic, so adding fine-grained
>>> locking won't be hard.
>>>     b) reacquire lock right before ioport dispatch
>>>
>>> 3) allow for registering ioport handlers w/o the dispatch function carrying
>>> the iothread lock
>>>     a) this is mostly memory API plumbing
>>
>> We skipped this as our NICs didn't do PIO, but you clearly need it for
>> virtio.
> 
> Right.
> 
>>> 4) focus on going back and adding fine-grained locking to the
>>> cpu_physical_memory_rw() accessor
>>
>> In the end, PIO and MMIO should use the same patterns - and will face
>> the same challenges. Ideally, we model things very similar right from
>> the start.
> 
> Yes.
> 
>> And then there is also
>>
>> 5) provide direct IRQ delivery from the device model to the IRQ chip.
>> That's much like what we need for VFIO and KVM device assignment. But
>> here we won't be able to cheat and ignore correct generation of vmstates
>> of the bypassed PCI host bridges etc... Which leads me to that other
>> thread about how to handle this for PCI device pass-through.
>> Contributions to that discussion are welcome as well.
> 
> I think you mean delivery to the in-kernel IRQ chip.  I'm still thinking
> about this, so I don't have a plan yet that I'm ready to share.  I have
> some ideas, though.
> 
>>
>>>
>>> Note that whenever possible, we should be using rwlocks instead of a
>>> normal mutex.  In particular, for the ioport data structures, a rwlock
>>> seems pretty obvious.
>>
>> I think we should mostly be fine with a "big hammer" rwlock: unlocked
>> read access from VCPUs and iothreads, and vmstop/resume around
>> modifications of fast path data structures (like the memory region
>> hierarchy or the PIO table).
> 
> Ack.
> 
>> Where that's not sufficient, RCU will be
>> needed. Sleeping rwlocks have horrible semantics (specifically when
>> thread priorities come into play) and are performance-wise inferior. We
>> should avoid them completely.
> 
> Yes, I think RCU is inevitable here but I think starting with rwlocks
> will help with the big refactoring.

Let's wait for the first patches... :)

> 
>>>
>>> To be clear, I'm not advocating introducing cpu_lock.  We should do
>>> whatever makes the most sense to not have to hold iothread lock while
>>> processing an exit from KVM.
>>
>> Good that we agree. :)
>>
>>>
>>> Note that this is an RFC, the purpose of this series is to have this
>>> discussion :-)
>>
>> Yep, I think we have it now ;). Hope I can contribute some code bits to
>> it soon, though I didn't schedule this task for the next week.
> 
> Great!  If you have something you can share, I'd be eager to look at it
> regardless of the condition of the code.

Let me just finish the rebasing. The completed switch to memory region
abstractions makes the code cleaner in some important parts.

Jan



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] CPUArchState: introduce per-cpu lock
  2012-06-22 12:55   ` Andreas Färber
@ 2012-06-24 14:02     ` liu ping fan
  0 siblings, 0 replies; 18+ messages in thread
From: liu ping fan @ 2012-06-24 14:02 UTC (permalink / raw)
  To: Andreas Färber; +Cc: Jan Kiszka, qemu-devel, Anthony Liguori

On Fri, Jun 22, 2012 at 8:55 PM, Andreas Färber <afaerber@suse.de> wrote:
> Am 21.06.2012 17:06, schrieb Liu Ping Fan:
>> introduce a per-cpu lock to protect against accesses from
>> other vcpu threads.
>>
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> ---
>>  cpu-defs.h  |    2 ++
>>  cpus.c      |   17 +++++++++++++++++
>>  main-loop.h |    3 +++
>>  3 files changed, 22 insertions(+), 0 deletions(-)
>>
>> diff --git a/cpu-defs.h b/cpu-defs.h
>> index f49e950..7305822 100644
>> --- a/cpu-defs.h
>> +++ b/cpu-defs.h
>> @@ -30,6 +30,7 @@
>>  #include "osdep.h"
>>  #include "qemu-queue.h"
>>  #include "targphys.h"
>> +#include "qemu-thread-posix.h"
>>
>>  #ifndef TARGET_LONG_BITS
>>  #error TARGET_LONG_BITS must be defined before including this header
>> @@ -220,6 +221,7 @@ typedef struct CPUWatchpoint {
>>      CPU_COMMON_THREAD                                                   \
>>      struct QemuCond *halt_cond;                                         \
>>      int thread_kicked;                                                  \
>> +    struct QemuMutex *cpu_lock;                                         \
>>      struct qemu_work_item *queued_work_first, *queued_work_last;        \
>>      const char *cpu_model_str;                                          \
>>      struct KVMState *kvm_state;                                         \
>
> Please don't add stuff to CPU_COMMON. Instead add to CPUState in
> qom/cpu.c. The QOM CPUState part 4 series moves many of those fields there.
>
Ok, thanks.
>> diff --git a/cpus.c b/cpus.c
>> index b182b3d..554f7bc 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -735,6 +735,7 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
>>      env->thread_id = qemu_get_thread_id();
>>      cpu_single_env = env;
>>
>> +
>
> Stray whitespace addition.
>
>>      r = kvm_init_vcpu(env);
>>      if (r < 0) {
>>          fprintf(stderr, "kvm_init_vcpu failed: %s\n", strerror(-r));
>> @@ -891,6 +892,20 @@ int qemu_cpu_is_self(void *_env)
>>      return qemu_thread_is_self(env->thread);
>>  }
>>
>> +void qemu_mutex_lock_cpu(void *_env)
>> +{
>> +    CPUArchState *env = _env;
>> +
>> +    qemu_mutex_lock(env->cpu_lock);
>> +}
>> +
>> +void qemu_mutex_unlock_cpu(void *_env)
>> +{
>> +    CPUArchState *env = _env;
>> +
>> +    qemu_mutex_unlock(env->cpu_lock);
>> +}
>> +
>
> I don't like these helpers. For one, you are using void * arguments and
> casting them; for another, you are using CPUArchState at all. With my
> suggestion above these can be CPUState *cpu.
>
For using it in apic.c, we need to hide the CPUArchState structure.
>>  void qemu_mutex_lock_iothread(void)
>>  {
>>      if (!tcg_enabled()) {
>> @@ -1027,6 +1042,8 @@ void qemu_init_vcpu(void *_env)
>>      env->nr_cores = smp_cores;
>>      env->nr_threads = smp_threads;
>>      env->stopped = 1;
>> +    env->cpu_lock = g_malloc0(sizeof(QemuMutex));
>> +    qemu_mutex_init(env->cpu_lock);
>
> Are you sure this is not needed for linux-user/bsd-user? If not needed,
> then the field should be #ifdef'ed in the struct to ensure that.
> Otherwise this function is never called and you need to move the
> initialization to the initfn in qom/cpu.c and then should also clean it
> up in a finalizer.
>
It is not needed for linux-user/bsd-user. I will use the macro.
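
A sketch of that direction - the names follow the QOM CPUState series
but are not committed code:

/* qom/cpu.h */
struct CPUState {
    Object parent_obj;
#ifndef CONFIG_USER_ONLY
    QemuMutex lock;        /* embedded, so no extra malloc/free */
#endif
    /* ... */
};

/* qom/cpu.c */
static void cpu_common_initfn(Object *obj)
{
#ifndef CONFIG_USER_ONLY
    CPUState *cpu = CPU(obj);

    qemu_mutex_init(&cpu->lock);
#endif
}

static void cpu_common_finalizefn(Object *obj)
{
#ifndef CONFIG_USER_ONLY
    qemu_mutex_destroy(&CPU(obj)->lock);
#endif
}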

Thanks and regards,
pingfan

> Andreas
>
>>      if (kvm_enabled()) {
>>          qemu_kvm_start_vcpu(env);
>>      } else if (tcg_enabled()) {
>> diff --git a/main-loop.h b/main-loop.h
>> index dce1cd9..d8d44a4 100644
>> --- a/main-loop.h
>> +++ b/main-loop.h
>> @@ -323,6 +323,9 @@ void qemu_bh_delete(QEMUBH *bh);
>>  int qemu_add_child_watch(pid_t pid);
>>  #endif
>>
>> +void qemu_mutex_lock_cpu(void *_env);
>> +void qemu_mutex_unlock_cpu(void *_env);
>> +
>>  /**
>>   * qemu_mutex_lock_iothread: Lock the main loop mutex.
>>   *
>
> --
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
> GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] CPUArchState: introduce per-cpu lock
  2012-06-22 12:36   ` Stefan Hajnoczi
@ 2012-06-24 14:05     ` liu ping fan
  0 siblings, 0 replies; 18+ messages in thread
From: liu ping fan @ 2012-06-24 14:05 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel

On Fri, Jun 22, 2012 at 8:36 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Thu, Jun 21, 2012 at 4:06 PM, Liu Ping Fan <qemulist@gmail.com> wrote:
>> diff --git a/cpu-defs.h b/cpu-defs.h
>> index f49e950..7305822 100644
>> --- a/cpu-defs.h
>> +++ b/cpu-defs.h
>> @@ -30,6 +30,7 @@
>>  #include "osdep.h"
>>  #include "qemu-queue.h"
>>  #include "targphys.h"
>> +#include "qemu-thread-posix.h"
>
> This breaks Windows; you need qemu-thread.h.
>
>>
>>  #ifndef TARGET_LONG_BITS
>>  #error TARGET_LONG_BITS must be defined before including this header
>> @@ -220,6 +221,7 @@ typedef struct CPUWatchpoint {
>>     CPU_COMMON_THREAD                                                   \
>>     struct QemuCond *halt_cond;                                         \
>>     int thread_kicked;                                                  \
>> +    struct QemuMutex *cpu_lock;                                         \
>
> It would be nicer to declare it QemuMutex cpu_lock (no pointer) so
> that you don't need to worry about malloc/free.
>
Yes :), I have considered it, but I am not quite sure.  I figure the
per-device locks used to break up the BQL will need to take this form
for some reason.
In any case, I have not decided yet.
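
For reference, the no-pointer variant Stefan suggests would look roughly
like this (a sketch, not a decision; field name as in the patch):

/* in the CPU state structure */
QemuMutex cpu_lock;             /* embedded: no g_malloc/g_free needed */

/* in qemu_init_vcpu() */
qemu_mutex_init(&env->cpu_lock);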
>>     struct qemu_work_item *queued_work_first, *queued_work_last;        \
>>     const char *cpu_model_str;                                          \
>>     struct KVMState *kvm_state;                                         \
>> diff --git a/cpus.c b/cpus.c
>> index b182b3d..554f7bc 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -735,6 +735,7 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
>>     env->thread_id = qemu_get_thread_id();
>>     cpu_single_env = env;
>>
>> +
>>     r = kvm_init_vcpu(env);
>>     if (r < 0) {
>>         fprintf(stderr, "kvm_init_vcpu failed: %s\n", strerror(-r));
>
> Spurious whitespace change, this should be dropped from the patch.
>
>> diff --git a/main-loop.h b/main-loop.h
>> index dce1cd9..d8d44a4 100644
>> --- a/main-loop.h
>> +++ b/main-loop.h
>> @@ -323,6 +323,9 @@ void qemu_bh_delete(QEMUBH *bh);
>>  int qemu_add_child_watch(pid_t pid);
>>  #endif
>>
>> +void qemu_mutex_lock_cpu(void *_env);
>> +void qemu_mutex_unlock_cpu(void *_env);
>
> Why void* instead of CPUArchState*?
>
Because we can hide the CPUArchState from apic.c.

Thanks and regards,
pingfan
> Stefan

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2012-06-24 14:05 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-06-21 15:06 [Qemu-devel] [RFC] use little granularity lock to substitue qemu_mutex_lock_iothread Liu Ping Fan
2012-06-21 15:06 ` [Qemu-devel] [PATCH 1/2] CPUArchState: introduce per-cpu lock Liu Ping Fan
2012-06-22 12:36   ` Stefan Hajnoczi
2012-06-24 14:05     ` liu ping fan
2012-06-22 12:55   ` Andreas Färber
2012-06-24 14:02     ` liu ping fan
2012-06-21 15:06 ` [Qemu-devel] [PATCH 2/2] kvm: use per-cpu lock to free vcpu thread out of the big lock Liu Ping Fan
2012-06-22  2:29   ` 陳韋任 (Wei-Ren Chen)
2012-06-22 10:26     ` liu ping fan
     [not found] <1340290158-11036-1-git-send-email-qemulist@gmail.com>
2012-06-21 15:23 ` [Qemu-devel] [RFC] use little granularity lock to substitue qemu_mutex_lock_iothread Jan Kiszka
2012-06-22 10:24   ` liu ping fan
2012-06-22 10:37     ` Jan Kiszka
2012-06-22 20:11       ` Anthony Liguori
2012-06-22 21:14         ` Jan Kiszka
2012-06-22 21:44           ` Anthony Liguori
2012-06-22 22:27             ` Jan Kiszka
2012-06-22 22:56               ` Anthony Liguori
2012-06-23  9:10                 ` Jan Kiszka
