* [PATCH v3 0/2] accel/tcg/tcg-accel-ops-rr: ensure fairness with icount
@ 2023-04-27 2:09 Jamie Iles
2023-04-27 2:09 ` [PATCH v3 1/2] cpu: expose qemu_cpu_list_lock for lock-guard use Jamie Iles
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Jamie Iles @ 2023-04-27 2:09 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson, pbonzini, philmd, peter.maydell
From: Jamie Iles <jiles@qti.qualcomm.com>
The round-robin scheduler iterates over the CPU list with an assigned
budget until the next timer expiry and may exit early because of a TB
exit. This is fine under normal operation, but with icount enabled and
SMP it is possible for a CPU to be starved of run time and for the
system to live-lock.

For example, booting a riscv64 platform with '-icount
shift=0,align=off,sleep=on -smp 2' we observe a livelock once the kernel
has timers enabled and starts performing TLB shootdowns. In this case
we have CPU 0 in M-mode with interrupts disabled sending an IPI to CPU
1. As we enter the TCG loop, we assign the whole icount budget up to
the next timer interrupt to CPU 0 and begin executing. The guest sits
in a busy loop and exhausts all of the budget before we try to execute
CPU 1, the target of the IPI, which is left with no budget to execute,
and the process repeats.

Add some fairness by splitting the budget evenly across all of the
CPUs on the thread before entering each one. The CPU count is cached
against the CPU list generation ID to avoid iterating the list on each
loop iteration. With this change it is possible to boot an SMP rv64
guest with icount enabled and no hangs.
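The live-lock described above can be illustrated with a toy model of the two grant policies (a hypothetical sketch, not QEMU code): when the first runnable CPU is granted the whole budget, the IPI target never executes; with an even split it runs in the very first round.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model of the starvation scenario (not QEMU code).  CPU 0
 * busy-waits and consumes every instruction it is granted; CPU 1 only
 * needs a single instruction to take the IPI.  Returns the round in
 * which CPU 1 first gets to run, or -1 if it is starved for
 * max_rounds rounds.
 */
static int rounds_until_ipi_handled(int64_t budget, int split_evenly,
                                    int max_rounds)
{
    for (int round = 1; round <= max_rounds; round++) {
        int64_t remaining = budget;

        /* CPU 0: granted either half the budget or whatever is left. */
        int64_t grant0 = split_evenly ? budget / 2 : remaining;
        remaining -= grant0;            /* the busy loop eats all of it */

        /* CPU 1: same policy; it runs iff its grant is non-zero. */
        int64_t grant1 = split_evenly ? budget / 2 : remaining;
        if (grant1 > 0) {
            return round;               /* IPI finally handled */
        }
    }
    return -1;                          /* live-locked */
}
```

With the all-to-the-first-CPU policy the loop never grants CPU 1 anything, which is exactly the repeating pattern seen in the riscv64 boot.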
New in v3 (address feedback from Richard Henderson):
- Additional patch to use QEMU_LOCK_GUARD with qemu_cpu_list_lock where
appropriate
- Move rr_cpu_count() call to be conditional on icount_enabled()
- Initialize cpu_budget to 0
Jamie Iles (2):
cpu: expose qemu_cpu_list_lock for lock-guard use
accel/tcg/tcg-accel-ops-rr: ensure fairness with icount
accel/tcg/tcg-accel-ops-icount.c | 17 +++++++++++++--
accel/tcg/tcg-accel-ops-icount.h | 3 ++-
accel/tcg/tcg-accel-ops-rr.c | 37 +++++++++++++++++++++++++++++++-
cpus-common.c | 2 +-
include/exec/cpu-common.h | 1 +
linux-user/elfload.c | 12 +++++------
migration/dirtyrate.c | 26 +++++++++++-----------
trace/control-target.c | 9 ++++----
8 files changed, 78 insertions(+), 29 deletions(-)
--
2.25.1
* [PATCH v3 1/2] cpu: expose qemu_cpu_list_lock for lock-guard use
2023-04-27 2:09 [PATCH v3 0/2] accel/tcg/tcg-accel-ops-rr: ensure fairness with icount Jamie Iles
@ 2023-04-27 2:09 ` Jamie Iles
2023-04-27 7:44 ` Richard Henderson
2023-04-28 23:05 ` Philippe Mathieu-Daudé
2023-04-27 2:09 ` [PATCH v3 2/2] accel/tcg/tcg-accel-ops-rr: ensure fairness with icount Jamie Iles
2023-04-29 9:28 ` [PATCH v3 0/2] " Richard Henderson
2 siblings, 2 replies; 10+ messages in thread
From: Jamie Iles @ 2023-04-27 2:09 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson, pbonzini, philmd, peter.maydell
Expose qemu_cpu_list_lock globally so that we can use
WITH_QEMU_LOCK_GUARD and QEMU_LOCK_GUARD to simplify a few code paths
now and in the future.
Signed-off-by: Jamie Iles <quic_jiles@quicinc.com>
---
cpus-common.c | 2 +-
include/exec/cpu-common.h | 1 +
linux-user/elfload.c | 12 ++++++------
migration/dirtyrate.c | 26 +++++++++++++-------------
trace/control-target.c | 9 ++++-----
5 files changed, 25 insertions(+), 25 deletions(-)
diff --git a/cpus-common.c b/cpus-common.c
index b0047e456f93..82d439add5c1 100644
--- a/cpus-common.c
+++ b/cpus-common.c
@@ -25,7 +25,7 @@
#include "qemu/lockable.h"
#include "trace/trace-root.h"
-static QemuMutex qemu_cpu_list_lock;
+QemuMutex qemu_cpu_list_lock;
static QemuCond exclusive_cond;
static QemuCond exclusive_resume;
static QemuCond qemu_work_cond;
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 6feaa40ca7b0..0c833d6ac9c6 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -32,6 +32,7 @@ extern intptr_t qemu_host_page_mask;
#define REAL_HOST_PAGE_ALIGN(addr) ROUND_UP((addr), qemu_real_host_page_size())
/* The CPU list lock nests outside page_(un)lock or mmap_(un)lock */
+extern QemuMutex qemu_cpu_list_lock;
void qemu_init_cpu_list(void);
void cpu_list_lock(void);
void cpu_list_unlock(void);
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 1dbc1f0f9baa..3ff16b163382 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -4238,14 +4238,14 @@ static int fill_note_info(struct elf_note_info *info,
info->notes_size += note_size(&info->notes[i]);
/* read and fill status of all threads */
- cpu_list_lock();
- CPU_FOREACH(cpu) {
- if (cpu == thread_cpu) {
- continue;
+ WITH_QEMU_LOCK_GUARD(&qemu_cpu_list_lock) {
+ CPU_FOREACH(cpu) {
+ if (cpu == thread_cpu) {
+ continue;
+ }
+ fill_thread_info(info, cpu->env_ptr);
}
- fill_thread_info(info, cpu->env_ptr);
}
- cpu_list_unlock();
return (0);
}
diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index 180ba38c7a80..388337a33249 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -150,25 +150,25 @@ int64_t vcpu_calculate_dirtyrate(int64_t calc_time_ms,
retry:
init_time_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
- cpu_list_lock();
- gen_id = cpu_list_generation_id_get();
- records = vcpu_dirty_stat_alloc(stat);
- vcpu_dirty_stat_collect(stat, records, true);
- cpu_list_unlock();
+ WITH_QEMU_LOCK_GUARD(&qemu_cpu_list_lock) {
+ gen_id = cpu_list_generation_id_get();
+ records = vcpu_dirty_stat_alloc(stat);
+ vcpu_dirty_stat_collect(stat, records, true);
+ }
duration = dirty_stat_wait(calc_time_ms, init_time_ms);
global_dirty_log_sync(flag, one_shot);
- cpu_list_lock();
- if (gen_id != cpu_list_generation_id_get()) {
- g_free(records);
- g_free(stat->rates);
- cpu_list_unlock();
- goto retry;
+ WITH_QEMU_LOCK_GUARD(&qemu_cpu_list_lock) {
+ if (gen_id != cpu_list_generation_id_get()) {
+ g_free(records);
+ g_free(stat->rates);
+ goto retry;
+ }
+ vcpu_dirty_stat_collect(stat, records, false);
}
- vcpu_dirty_stat_collect(stat, records, false);
- cpu_list_unlock();
for (i = 0; i < stat->nvcpu; i++) {
dirtyrate = do_calculate_dirtyrate(records[i], duration);
diff --git a/trace/control-target.c b/trace/control-target.c
index 232c97a4a183..c0c1e2310a51 100644
--- a/trace/control-target.c
+++ b/trace/control-target.c
@@ -8,6 +8,7 @@
*/
#include "qemu/osdep.h"
+#include "qemu/lockable.h"
#include "cpu.h"
#include "trace/trace-root.h"
#include "trace/control.h"
@@ -116,11 +117,9 @@ static bool adding_first_cpu1(void)
static bool adding_first_cpu(void)
{
- bool res;
- cpu_list_lock();
- res = adding_first_cpu1();
- cpu_list_unlock();
- return res;
+ QEMU_LOCK_GUARD(&qemu_cpu_list_lock);
+
+ return adding_first_cpu1();
}
void trace_init_vcpu(CPUState *vcpu)
--
2.25.1
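QEMU's QEMU_LOCK_GUARD and WITH_QEMU_LOCK_GUARD (qemu/lockable.h) are built on the compiler's `cleanup` variable attribute, which is why the early-`continue` and early-`return` paths in the conversions above stay balanced without explicit unlock calls. A minimal standalone sketch of the mechanism (toy names, GCC/Clang only; this is not QEMU's actual macro expansion):

```c
#include <assert.h>

/* Stand-in mutex that counts lock/unlock pairs so balance is testable. */
typedef struct {
    int locks;
    int unlocks;
} toy_mutex;

static void toy_mutex_lock(toy_mutex *m)   { m->locks++; }
static void toy_mutex_unlock(toy_mutex *m) { m->unlocks++; }

/* Cleanup handler: runs when the guard variable goes out of scope. */
static void toy_guard_cleanup(toy_mutex **p)
{
    toy_mutex_unlock(*p);
}

/* Scoped guard: lock now, unlock automatically at end of scope --
 * including via early return, goto, or continue out of the scope. */
#define TOY_LOCK_GUARD(m)                                       \
    __attribute__((cleanup(toy_guard_cleanup), unused))         \
    toy_mutex *toy_guard__ = (toy_mutex_lock(m), (m))

static int guarded_read(toy_mutex *m, int value)
{
    TOY_LOCK_GUARD(m);
    if (value < 0) {
        return -1;          /* early return still unlocks */
    }
    return value;
}
```

After any number of calls, every lock has a matching unlock regardless of which exit path was taken, which is the property the patch relies on.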
* [PATCH v3 2/2] accel/tcg/tcg-accel-ops-rr: ensure fairness with icount
2023-04-27 2:09 [PATCH v3 0/2] accel/tcg/tcg-accel-ops-rr: ensure fairness with icount Jamie Iles
2023-04-27 2:09 ` [PATCH v3 1/2] cpu: expose qemu_cpu_list_lock for lock-guard use Jamie Iles
@ 2023-04-27 2:09 ` Jamie Iles
2023-04-27 7:46 ` Richard Henderson
2023-04-29 9:28 ` [PATCH v3 0/2] " Richard Henderson
2 siblings, 1 reply; 10+ messages in thread
From: Jamie Iles @ 2023-04-27 2:09 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson, pbonzini, philmd, peter.maydell
The round-robin scheduler iterates over the CPU list with an assigned
budget until the next timer expiry and may exit early because of a TB
exit. This is fine under normal operation, but with icount enabled and
SMP it is possible for a CPU to be starved of run time and for the
system to live-lock.

For example, booting a riscv64 platform with '-icount
shift=0,align=off,sleep=on -smp 2' we observe a livelock once the kernel
has timers enabled and starts performing TLB shootdowns. In this case
we have CPU 0 in M-mode with interrupts disabled sending an IPI to CPU
1. As we enter the TCG loop, we assign the whole icount budget up to
the next timer interrupt to CPU 0 and begin executing. The guest sits
in a busy loop and exhausts all of the budget before we try to execute
CPU 1, the target of the IPI, which is left with no budget to execute,
and the process repeats.

Add some fairness by splitting the budget evenly across all of the
CPUs on the thread before entering each one. The CPU count is cached
against the CPU list generation ID to avoid iterating the list on each
loop iteration. With this change it is possible to boot an SMP rv64
guest with icount enabled and no hangs.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jamie Iles <quic_jiles@quicinc.com>
---
accel/tcg/tcg-accel-ops-icount.c | 17 +++++++++++++--
accel/tcg/tcg-accel-ops-icount.h | 3 ++-
accel/tcg/tcg-accel-ops-rr.c | 37 +++++++++++++++++++++++++++++++-
3 files changed, 53 insertions(+), 4 deletions(-)
diff --git a/accel/tcg/tcg-accel-ops-icount.c b/accel/tcg/tcg-accel-ops-icount.c
index 84cc7421be88..e1e8afaf2f99 100644
--- a/accel/tcg/tcg-accel-ops-icount.c
+++ b/accel/tcg/tcg-accel-ops-icount.c
@@ -89,7 +89,20 @@ void icount_handle_deadline(void)
}
}
-void icount_prepare_for_run(CPUState *cpu)
+/* Distribute the budget evenly across all CPUs */
+int64_t icount_percpu_budget(int cpu_count)
+{
+ int64_t limit = icount_get_limit();
+ int64_t timeslice = limit / cpu_count;
+
+ if (timeslice == 0) {
+ timeslice = limit;
+ }
+
+ return timeslice;
+}
+
+void icount_prepare_for_run(CPUState *cpu, int64_t cpu_budget)
{
int insns_left;
@@ -101,7 +114,7 @@ void icount_prepare_for_run(CPUState *cpu)
g_assert(cpu_neg(cpu)->icount_decr.u16.low == 0);
g_assert(cpu->icount_extra == 0);
- cpu->icount_budget = icount_get_limit();
+ cpu->icount_budget = MIN(icount_get_limit(), cpu_budget);
insns_left = MIN(0xffff, cpu->icount_budget);
cpu_neg(cpu)->icount_decr.u16.low = insns_left;
cpu->icount_extra = cpu->icount_budget - insns_left;
diff --git a/accel/tcg/tcg-accel-ops-icount.h b/accel/tcg/tcg-accel-ops-icount.h
index 1b6fd9c60751..16a301b6dc0b 100644
--- a/accel/tcg/tcg-accel-ops-icount.h
+++ b/accel/tcg/tcg-accel-ops-icount.h
@@ -11,7 +11,8 @@
#define TCG_ACCEL_OPS_ICOUNT_H
void icount_handle_deadline(void);
-void icount_prepare_for_run(CPUState *cpu);
+void icount_prepare_for_run(CPUState *cpu, int64_t cpu_budget);
+int64_t icount_percpu_budget(int cpu_count);
void icount_process_data(CPUState *cpu);
void icount_handle_interrupt(CPUState *cpu, int mask);
diff --git a/accel/tcg/tcg-accel-ops-rr.c b/accel/tcg/tcg-accel-ops-rr.c
index 290833a37fb2..5788efa5ff4d 100644
--- a/accel/tcg/tcg-accel-ops-rr.c
+++ b/accel/tcg/tcg-accel-ops-rr.c
@@ -24,6 +24,7 @@
*/
#include "qemu/osdep.h"
+#include "qemu/lockable.h"
#include "sysemu/tcg.h"
#include "sysemu/replay.h"
#include "sysemu/cpu-timers.h"
@@ -139,6 +140,33 @@ static void rr_force_rcu(Notifier *notify, void *data)
rr_kick_next_cpu();
}
+/*
+ * Calculate the number of CPUs that we will process in a single iteration of
+ * the main CPU thread loop so that we can fairly distribute the instruction
+ * count across CPUs.
+ *
+ * The CPU count is cached based on the CPU list generation ID to avoid
+ * iterating the list every time.
+ */
+static int rr_cpu_count(void)
+{
+ static unsigned int last_gen_id = ~0;
+ static int cpu_count;
+ CPUState *cpu;
+
+ QEMU_LOCK_GUARD(&qemu_cpu_list_lock);
+
+ if (cpu_list_generation_id_get() != last_gen_id) {
+ cpu_count = 0;
+ CPU_FOREACH(cpu) {
+ ++cpu_count;
+ }
+ last_gen_id = cpu_list_generation_id_get();
+ }
+
+ return cpu_count;
+}
+
/*
* In the single-threaded case each vCPU is simulated in turn. If
* there is more than a single vCPU we create a simple timer to kick
@@ -185,11 +213,16 @@ static void *rr_cpu_thread_fn(void *arg)
cpu->exit_request = 1;
while (1) {
+ /* Only used for icount_enabled() */
+ int64_t cpu_budget = 0;
+
qemu_mutex_unlock_iothread();
replay_mutex_lock();
qemu_mutex_lock_iothread();
if (icount_enabled()) {
+ int cpu_count = rr_cpu_count();
+
/* Account partial waits to QEMU_CLOCK_VIRTUAL. */
icount_account_warp_timer();
/*
@@ -197,6 +230,8 @@ static void *rr_cpu_thread_fn(void *arg)
* waking up the I/O thread and waiting for completion.
*/
icount_handle_deadline();
+
+ cpu_budget = icount_percpu_budget(cpu_count);
}
replay_mutex_unlock();
@@ -218,7 +253,7 @@ static void *rr_cpu_thread_fn(void *arg)
qemu_mutex_unlock_iothread();
if (icount_enabled()) {
- icount_prepare_for_run(cpu);
+ icount_prepare_for_run(cpu, cpu_budget);
}
r = tcg_cpus_exec(cpu);
if (icount_enabled()) {
--
2.25.1
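The budget accounting in the patch above can be sketched as a standalone toy (field names shortened from QEMU's; this mirrors the shape of icount_percpu_budget() and icount_prepare_for_run(), not their exact bodies): the per-CPU slice is an even division falling back to the whole limit when there are more CPUs than instructions, and the chosen budget is split into a 16-bit fast-path decrementer plus an "extra" remainder.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MIN(a, b) ((a) < (b) ? (a) : (b))

typedef struct {
    int64_t  budget;    /* stands in for cpu->icount_budget */
    int64_t  extra;     /* stands in for cpu->icount_extra */
    uint16_t decr_low;  /* stands in for icount_decr.u16.low */
} toy_icount;

/* Even split of the global limit; if the limit is smaller than the
 * CPU count, hand the whole limit to each CPU rather than zero. */
static int64_t toy_percpu_budget(int64_t limit, int cpu_count)
{
    int64_t timeslice = limit / cpu_count;
    return timeslice ? timeslice : limit;
}

/* Clamp to the limit, then bank at most 0xffff instructions in the
 * fast-path decrementer and the rest in "extra". */
static void toy_prepare_for_run(toy_icount *ti, int64_t limit,
                                int64_t cpu_budget)
{
    ti->budget = TOY_MIN(limit, cpu_budget);
    int64_t insns_left = TOY_MIN(0xffff, ti->budget);
    ti->decr_low = (uint16_t)insns_left;
    ti->extra = ti->budget - insns_left;
}
```

Note that decr_low plus extra always reconstructs the clamped budget, so no instructions are lost in the split.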
* Re: [PATCH v3 1/2] cpu: expose qemu_cpu_list_lock for lock-guard use
2023-04-27 2:09 ` [PATCH v3 1/2] cpu: expose qemu_cpu_list_lock for lock-guard use Jamie Iles
@ 2023-04-27 7:44 ` Richard Henderson
2023-04-28 23:05 ` Philippe Mathieu-Daudé
1 sibling, 0 replies; 10+ messages in thread
From: Richard Henderson @ 2023-04-27 7:44 UTC (permalink / raw)
To: Jamie Iles, qemu-devel; +Cc: pbonzini, philmd, peter.maydell
On 4/27/23 03:09, Jamie Iles wrote:
> Expose qemu_cpu_list_lock globally so that we can use
> WITH_QEMU_LOCK_GUARD and QEMU_LOCK_GUARD to simplify a few code paths
> now and in future.
>
> Signed-off-by: Jamie Iles<quic_jiles@quicinc.com>
> ---
> cpus-common.c | 2 +-
> include/exec/cpu-common.h | 1 +
> linux-user/elfload.c | 12 ++++++------
> migration/dirtyrate.c | 26 +++++++++++++-------------
> trace/control-target.c | 9 ++++-----
> 5 files changed, 25 insertions(+), 25 deletions(-)
Thanks,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* Re: [PATCH v3 2/2] accel/tcg/tcg-accel-ops-rr: ensure fairness with icount
2023-04-27 2:09 ` [PATCH v3 2/2] accel/tcg/tcg-accel-ops-rr: ensure fairness with icount Jamie Iles
@ 2023-04-27 7:46 ` Richard Henderson
0 siblings, 0 replies; 10+ messages in thread
From: Richard Henderson @ 2023-04-27 7:46 UTC (permalink / raw)
To: Jamie Iles, qemu-devel; +Cc: pbonzini, philmd, peter.maydell
On 4/27/23 03:09, Jamie Iles wrote:
> The round-robin scheduler will iterate over the CPU list with an
> assigned budget until the next timer expiry and may exit early because
> of a TB exit. This is fine under normal operation but with icount
> enabled and SMP it is possible for a CPU to be starved of run time and
> the system live-locks.
>
> For example, booting a riscv64 platform with '-icount
> shift=0,align=off,sleep=on -smp 2' we observe a livelock once the kernel
> has timers enabled and starts performing TLB shootdowns. In this case
> we have CPU 0 in M-mode with interrupts disabled sending an IPI to CPU
> 1. As we enter the TCG loop, we assign the icount budget to next timer
> interrupt to CPU 0 and begin executing where the guest is sat in a busy
> loop exhausting all of the budget before we try to execute CPU 1 which
> is the target of the IPI but CPU 1 is left with no budget with which to
> execute and the process repeats.
>
> We try here to add some fairness by splitting the budget across all of
> the CPUs on the thread fairly before entering each one. The CPU count
> is cached on CPU list generation ID to avoid iterating the list on each
> loop iteration. With this change it is possible to boot an SMP rv64
> guest with icount enabled and no hangs.
>
> Reviewed-by: Philippe Mathieu-Daudé<philmd@linaro.org>
> Tested-by: Peter Maydell<peter.maydell@linaro.org>
> Signed-off-by: Jamie Iles<quic_jiles@quicinc.com>
> ---
> accel/tcg/tcg-accel-ops-icount.c | 17 +++++++++++++--
> accel/tcg/tcg-accel-ops-icount.h | 3 ++-
> accel/tcg/tcg-accel-ops-rr.c | 37 +++++++++++++++++++++++++++++++-
> 3 files changed, 53 insertions(+), 4 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* Re: [PATCH v3 1/2] cpu: expose qemu_cpu_list_lock for lock-guard use
2023-04-27 2:09 ` [PATCH v3 1/2] cpu: expose qemu_cpu_list_lock for lock-guard use Jamie Iles
2023-04-27 7:44 ` Richard Henderson
@ 2023-04-28 23:05 ` Philippe Mathieu-Daudé
1 sibling, 0 replies; 10+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-04-28 23:05 UTC (permalink / raw)
To: Jamie Iles, qemu-devel; +Cc: richard.henderson, pbonzini, peter.maydell
On 27/4/23 04:09, Jamie Iles wrote:
> Expose qemu_cpu_list_lock globally so that we can use
> WITH_QEMU_LOCK_GUARD and QEMU_LOCK_GUARD to simplify a few code paths
> now and in future.
>
> Signed-off-by: Jamie Iles <quic_jiles@quicinc.com>
> ---
> cpus-common.c | 2 +-
> include/exec/cpu-common.h | 1 +
> linux-user/elfload.c | 12 ++++++------
> migration/dirtyrate.c | 26 +++++++++++++-------------
> trace/control-target.c | 9 ++++-----
> 5 files changed, 25 insertions(+), 25 deletions(-)
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
* Re: [PATCH v3 0/2] accel/tcg/tcg-accel-ops-rr: ensure fairness with icount
2023-04-27 2:09 [PATCH v3 0/2] accel/tcg/tcg-accel-ops-rr: ensure fairness with icount Jamie Iles
2023-04-27 2:09 ` [PATCH v3 1/2] cpu: expose qemu_cpu_list_lock for lock-guard use Jamie Iles
2023-04-27 2:09 ` [PATCH v3 2/2] accel/tcg/tcg-accel-ops-rr: ensure fairness with icount Jamie Iles
@ 2023-04-29 9:28 ` Richard Henderson
2023-04-29 12:23 ` Philippe Mathieu-Daudé
2023-05-03 9:44 ` Jamie Iles
2 siblings, 2 replies; 10+ messages in thread
From: Richard Henderson @ 2023-04-29 9:28 UTC (permalink / raw)
To: Jamie Iles, qemu-devel; +Cc: pbonzini, philmd, peter.maydell
On 4/27/23 03:09, Jamie Iles wrote:
> From: Jamie Iles <jiles@qti.qualcomm.com>
>
> The round-robin scheduler will iterate over the CPU list with an
> assigned budget until the next timer expiry and may exit early because
> of a TB exit. This is fine under normal operation but with icount
> enabled and SMP it is possible for a CPU to be starved of run time and
> the system live-locks.
>
> For example, booting a riscv64 platform with '-icount
> shift=0,align=off,sleep=on -smp 2' we observe a livelock once the kernel
> has timers enabled and starts performing TLB shootdowns. In this case
> we have CPU 0 in M-mode with interrupts disabled sending an IPI to CPU
> 1. As we enter the TCG loop, we assign the icount budget to next timer
> interrupt to CPU 0 and begin executing where the guest is sat in a busy
> loop exhausting all of the budget before we try to execute CPU 1 which
> is the target of the IPI but CPU 1 is left with no budget with which to
> execute and the process repeats.
>
> We try here to add some fairness by splitting the budget across all of
> the CPUs on the thread fairly before entering each one. The CPU count
> is cached on CPU list generation ID to avoid iterating the list on each
> loop iteration. With this change it is possible to boot an SMP rv64
> guest with icount enabled and no hangs.
>
> New in v3 (address feedback from Richard Henderson):
> - Additional patch to use QEMU_LOCK_GUARD with qemu_cpu_list_lock where
> appropriate
> - Move rr_cpu_count() call to be conditional on icount_enabled()
> - Initialize cpu_budget to 0
>
> Jamie Iles (2):
> cpu: expose qemu_cpu_list_lock for lock-guard use
> accel/tcg/tcg-accel-ops-rr: ensure fairness with icount
It appears as if one of these two patches causes a failure in replay, e.g.
https://gitlab.com/rth7680/qemu/-/jobs/4200609234#L4162
Would you have a look, please?
r~
* Re: [PATCH v3 0/2] accel/tcg/tcg-accel-ops-rr: ensure fairness with icount
2023-04-29 9:28 ` [PATCH v3 0/2] " Richard Henderson
@ 2023-04-29 12:23 ` Philippe Mathieu-Daudé
2023-05-03 9:44 ` Jamie Iles
1 sibling, 0 replies; 10+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-04-29 12:23 UTC (permalink / raw)
To: Richard Henderson, Jamie Iles, qemu-devel; +Cc: pbonzini, peter.maydell
On 29/4/23 11:28, Richard Henderson wrote:
> On 4/27/23 03:09, Jamie Iles wrote:
>> From: Jamie Iles <jiles@qti.qualcomm.com>
>>
>> The round-robin scheduler will iterate over the CPU list with an
>> assigned budget until the next timer expiry and may exit early because
>> of a TB exit. This is fine under normal operation but with icount
>> enabled and SMP it is possible for a CPU to be starved of run time and
>> the system live-locks.
>>
>> For example, booting a riscv64 platform with '-icount
>> shift=0,align=off,sleep=on -smp 2' we observe a livelock once the kernel
>> has timers enabled and starts performing TLB shootdowns. In this case
>> we have CPU 0 in M-mode with interrupts disabled sending an IPI to CPU
>> 1. As we enter the TCG loop, we assign the icount budget to next timer
>> interrupt to CPU 0 and begin executing where the guest is sat in a busy
>> loop exhausting all of the budget before we try to execute CPU 1 which
>> is the target of the IPI but CPU 1 is left with no budget with which to
>> execute and the process repeats.
>>
>> We try here to add some fairness by splitting the budget across all of
>> the CPUs on the thread fairly before entering each one. The CPU count
>> is cached on CPU list generation ID to avoid iterating the list on each
>> loop iteration. With this change it is possible to boot an SMP rv64
>> guest with icount enabled and no hangs.
>>
>> New in v3 (address feedback from Richard Henderson):
>> - Additional patch to use QEMU_LOCK_GUARD with qemu_cpu_list_lock where
>> appropriate
>> - Move rr_cpu_count() call to be conditional on icount_enabled()
>> - Initialize cpu_budget to 0
>>
>> Jamie Iles (2):
>> cpu: expose qemu_cpu_list_lock for lock-guard use
>> accel/tcg/tcg-accel-ops-rr: ensure fairness with icount
>
> It appears as if one of these two patches causes a failure in replay, e.g.
>
> https://gitlab.com/rth7680/qemu/-/jobs/4200609234#L4162
>
> Would you have a look, please?
cpu count only going forward in rr_cpu_count()?
* Re: [PATCH v3 0/2] accel/tcg/tcg-accel-ops-rr: ensure fairness with icount
2023-04-29 9:28 ` [PATCH v3 0/2] " Richard Henderson
2023-04-29 12:23 ` Philippe Mathieu-Daudé
@ 2023-05-03 9:44 ` Jamie Iles
2023-05-10 15:01 ` Richard Henderson
1 sibling, 1 reply; 10+ messages in thread
From: Jamie Iles @ 2023-05-03 9:44 UTC (permalink / raw)
To: Richard Henderson; +Cc: Jamie Iles, qemu-devel, pbonzini, philmd, peter.maydell
Hi Richard,
On Sat, Apr 29, 2023 at 10:28:03AM +0100, Richard Henderson wrote:
> On 4/27/23 03:09, Jamie Iles wrote:
> > From: Jamie Iles <jiles@qti.qualcomm.com>
> >
> > The round-robin scheduler will iterate over the CPU list with an
> > assigned budget until the next timer expiry and may exit early because
> > of a TB exit. This is fine under normal operation but with icount
> > enabled and SMP it is possible for a CPU to be starved of run time and
> > the system live-locks.
> >
> > For example, booting a riscv64 platform with '-icount
> > shift=0,align=off,sleep=on -smp 2' we observe a livelock once the kernel
> > has timers enabled and starts performing TLB shootdowns. In this case
> > we have CPU 0 in M-mode with interrupts disabled sending an IPI to CPU
> > 1. As we enter the TCG loop, we assign the icount budget to next timer
> > interrupt to CPU 0 and begin executing where the guest is sat in a busy
> > loop exhausting all of the budget before we try to execute CPU 1 which
> > is the target of the IPI but CPU 1 is left with no budget with which to
> > execute and the process repeats.
> >
> > We try here to add some fairness by splitting the budget across all of
> > the CPUs on the thread fairly before entering each one. The CPU count
> > is cached on CPU list generation ID to avoid iterating the list on each
> > loop iteration. With this change it is possible to boot an SMP rv64
> > guest with icount enabled and no hangs.
> >
> > New in v3 (address feedback from Richard Henderson):
> > - Additional patch to use QEMU_LOCK_GUARD with qemu_cpu_list_lock where
> > appropriate
> > - Move rr_cpu_count() call to be conditional on icount_enabled()
> > - Initialize cpu_budget to 0
> >
> > Jamie Iles (2):
> > cpu: expose qemu_cpu_list_lock for lock-guard use
> > accel/tcg/tcg-accel-ops-rr: ensure fairness with icount
>
> It appears as if one of these two patches causes a failure in replay, e.g.
>
> https://gitlab.com/rth7680/qemu/-/jobs/4200609234#L4162
>
> Would you have a look, please?
I was out on vacation and it looks like this job got cleaned up, but was
this a mutex check failing for the replay mutex and/or iothread mutex?
If so then the following patch fixes it for me, which I can include in
a v4.
Thanks,
Jamie
diff --git i/accel/tcg/tcg-accel-ops-icount.c w/accel/tcg/tcg-accel-ops-icount.c
index e1e8afaf2f99..3d2cfbbc9778 100644
--- i/accel/tcg/tcg-accel-ops-icount.c
+++ w/accel/tcg/tcg-accel-ops-icount.c
@@ -114,13 +114,13 @@ void icount_prepare_for_run(CPUState *cpu, int64_t cpu_budget)
g_assert(cpu_neg(cpu)->icount_decr.u16.low == 0);
g_assert(cpu->icount_extra == 0);
+ replay_mutex_lock();
+
cpu->icount_budget = MIN(icount_get_limit(), cpu_budget);
insns_left = MIN(0xffff, cpu->icount_budget);
cpu_neg(cpu)->icount_decr.u16.low = insns_left;
cpu->icount_extra = cpu->icount_budget - insns_left;
- replay_mutex_lock();
-
if (cpu->icount_budget == 0) {
/*
* We're called without the iothread lock, so must take it while
diff --git i/replay/replay.c w/replay/replay.c
index c39156c52221..0f7d766efe81 100644
--- i/replay/replay.c
+++ w/replay/replay.c
@@ -74,7 +74,7 @@ uint64_t replay_get_current_icount(void)
int replay_get_instructions(void)
{
int res = 0;
- replay_mutex_lock();
+ g_assert(replay_mutex_locked());
if (replay_next_event_is(EVENT_INSTRUCTION)) {
res = replay_state.instruction_count;
if (replay_break_icount != -1LL) {
@@ -85,7 +85,6 @@ int replay_get_instructions(void)
}
}
}
- replay_mutex_unlock();
return res;
}
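The shape of this fix is a common lock-discipline refactor: the caller takes the mutex once, up front, and the helper asserts the caller-holds-lock contract instead of locking and unlocking around its own body. A toy illustration of that contract (hypothetical names, not QEMU code):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock that records ownership, standing in for the replay mutex. */
static bool toy_replay_locked;

static void toy_replay_mutex_lock(void)
{
    assert(!toy_replay_locked);     /* a second lock would be a bug */
    toy_replay_locked = true;
}

static void toy_replay_mutex_unlock(void)
{
    assert(toy_replay_locked);
    toy_replay_locked = false;
}

/* After the fix: the helper asserts the precondition rather than
 * taking the lock itself, as replay_get_instructions() now does. */
static int toy_get_instructions(int count)
{
    assert(toy_replay_locked);      /* caller-holds-lock contract */
    return count;
}

/* Caller, standing in for icount_prepare_for_run(): lock first, then
 * do the budget accounting and the query under the same critical
 * section. */
static int toy_prepare_and_get(int count)
{
    toy_replay_mutex_lock();
    int res = toy_get_instructions(count);
    toy_replay_mutex_unlock();
    return res;
}
```

Moving the lock before the budget computation keeps the icount state and the replay event stream updated under one critical section, which is what the squashed fix achieves.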
* Re: [PATCH v3 0/2] accel/tcg/tcg-accel-ops-rr: ensure fairness with icount
2023-05-03 9:44 ` Jamie Iles
@ 2023-05-10 15:01 ` Richard Henderson
0 siblings, 0 replies; 10+ messages in thread
From: Richard Henderson @ 2023-05-10 15:01 UTC (permalink / raw)
To: Jamie Iles; +Cc: qemu-devel, pbonzini, philmd, peter.maydell
On 5/3/23 10:44, Jamie Iles wrote:
> Hi Richard,
>
> On Sat, Apr 29, 2023 at 10:28:03AM +0100, Richard Henderson wrote:
>> On 4/27/23 03:09, Jamie Iles wrote:
>>> From: Jamie Iles <jiles@qti.qualcomm.com>
>>>
>>> The round-robin scheduler will iterate over the CPU list with an
>>> assigned budget until the next timer expiry and may exit early because
>>> of a TB exit. This is fine under normal operation but with icount
>>> enabled and SMP it is possible for a CPU to be starved of run time and
>>> the system live-locks.
>>>
>>> For example, booting a riscv64 platform with '-icount
>>> shift=0,align=off,sleep=on -smp 2' we observe a livelock once the kernel
>>> has timers enabled and starts performing TLB shootdowns. In this case
>>> we have CPU 0 in M-mode with interrupts disabled sending an IPI to CPU
>>> 1. As we enter the TCG loop, we assign the icount budget to next timer
>>> interrupt to CPU 0 and begin executing where the guest is sat in a busy
>>> loop exhausting all of the budget before we try to execute CPU 1 which
>>> is the target of the IPI but CPU 1 is left with no budget with which to
>>> execute and the process repeats.
>>>
>>> We try here to add some fairness by splitting the budget across all of
>>> the CPUs on the thread fairly before entering each one. The CPU count
>>> is cached on CPU list generation ID to avoid iterating the list on each
>>> loop iteration. With this change it is possible to boot an SMP rv64
>>> guest with icount enabled and no hangs.
>>>
>>> New in v3 (address feedback from Richard Henderson):
>>> - Additional patch to use QEMU_LOCK_GUARD with qemu_cpu_list_lock where
>>> appropriate
>>> - Move rr_cpu_count() call to be conditional on icount_enabled()
>>> - Initialize cpu_budget to 0
>>>
>>> Jamie Iles (2):
>>> cpu: expose qemu_cpu_list_lock for lock-guard use
>>> accel/tcg/tcg-accel-ops-rr: ensure fairness with icount
>>
>> It appears as if one of these two patches causes a failure in replay, e.g.
>>
>> https://gitlab.com/rth7680/qemu/-/jobs/4200609234#L4162
>>
>> Would you have a look, please?
>
> I was out on vacation and it looks like this job got cleaned up, but was
> this a mutex check failing for the replay mutex and/or iothread mutex?
> If so then the following patch fixes it for me which I can include in a
> v4
That was it. I'll squash this fix and re-queue.
r~
>
> Thanks,
>
> Jamie
>
>
> diff --git i/accel/tcg/tcg-accel-ops-icount.c w/accel/tcg/tcg-accel-ops-icount.c
> index e1e8afaf2f99..3d2cfbbc9778 100644
> --- i/accel/tcg/tcg-accel-ops-icount.c
> +++ w/accel/tcg/tcg-accel-ops-icount.c
> @@ -114,13 +114,13 @@ void icount_prepare_for_run(CPUState *cpu, int64_t cpu_budget)
> g_assert(cpu_neg(cpu)->icount_decr.u16.low == 0);
> g_assert(cpu->icount_extra == 0);
>
> + replay_mutex_lock();
> +
> cpu->icount_budget = MIN(icount_get_limit(), cpu_budget);
> insns_left = MIN(0xffff, cpu->icount_budget);
> cpu_neg(cpu)->icount_decr.u16.low = insns_left;
> cpu->icount_extra = cpu->icount_budget - insns_left;
>
> - replay_mutex_lock();
> -
> if (cpu->icount_budget == 0) {
> /*
> * We're called without the iothread lock, so must take it while
> diff --git i/replay/replay.c w/replay/replay.c
> index c39156c52221..0f7d766efe81 100644
> --- i/replay/replay.c
> +++ w/replay/replay.c
> @@ -74,7 +74,7 @@ uint64_t replay_get_current_icount(void)
> int replay_get_instructions(void)
> {
> int res = 0;
> - replay_mutex_lock();
> + g_assert(replay_mutex_locked());
> if (replay_next_event_is(EVENT_INSTRUCTION)) {
> res = replay_state.instruction_count;
> if (replay_break_icount != -1LL) {
> @@ -85,7 +85,6 @@ int replay_get_instructions(void)
> }
> }
> }
> - replay_mutex_unlock();
> return res;
> }
>