Re: Lockup with --accel tcg,thread=single

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Alex Bennée" <alex.bennee@linaro.org>
To: Peter Maydell <peter.maydell@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Doug Gale <doug16k@gmail.com>, qemu-devel <qemu-devel@nongnu.org>
Subject: Re: Lockup with --accel tcg,thread=single
Date: Mon, 30 Sep 2019 17:38:15 +0100	[thread overview]
Message-ID: <87h84tloy0.fsf@linaro.org> (raw)
In-Reply-To: <CAFEAcA9nz9S4R+O9fwa0k38dB3r1smguG4bQRzwm1s0zJCvfDA@mail.gmail.com>


Peter Maydell <peter.maydell@linaro.org> writes:

> On Mon, 30 Sep 2019 at 14:17, Doug Gale <doug16k@gmail.com> wrote:
>>
>> I found a lockup in single threaded TCG, with OVMF bios, 16 CPUs.
>>
>> qemu-system-x86_64 --accel tcg,thread=single -smp cpus=16 -bios
>> /usr/share/ovmf/OVMF.fd
>>
>> Using Ubuntu 18.04 LTS, default gnome desktop. There is no guest OS,
>> there is no hard drive, just the OVMF firmware locks it up. (I
>> narrowed it down to this from a much larger repro)
>
>> Peter Maydell helped me bisect it in IRC.
>>  Works fine at commit 1e8a98b53867f61
>>  Fails at commit 9458a9a1df1a4c7
>>
>> MTTCG works fine all the way up to master.
>
> Thanks for this bug report. I've reproduced it and think
> I have figured out what is going on here.
>
> Commit 9458a9a1df1a4c719e245 is Paolo's commit that fixes the
> TCG-vs-dirty-bitmap race by having the thread which is
> doing a memory access wait for the vCPU thread to finish
> its current TB using a no-op run_on_cpu() operation.
>
> In the case of the hang the thread doing the memory access
> is the iothread, like this:
>
> #14 0x000055c150c0a98c in run_on_cpu (cpu=0x55c153801c60,
> func=0x55c150bbb542 <do_nothing>, data=...)
>     at /home/petmay01/linaro/qemu-from-laptop/qemu/cpus.c:1205
> #15 0x000055c150bbb58c in tcg_log_global_after_sync
> (listener=0x55c1538410c8) at
> /home/petmay01/linaro/qemu-from-laptop/qemu/exec.c:2963
> #16 0x000055c150c1fe18 in memory_global_after_dirty_log_sync () at
> /home/petmay01/linaro/qemu-from-laptop/qemu/memory.c:2598
> #17 0x000055c150c1e6b8 in memory_region_snapshot_and_clear_dirty
> (mr=0x55c1541e82b0, addr=0, size=1920000, client=0)
>     at /home/petmay01/linaro/qemu-from-laptop/qemu/memory.c:2106
> #18 0x000055c150c76c05 in vga_draw_graphic (s=0x55c1541e82a0, full_update=0)
>     at /home/petmay01/linaro/qemu-from-laptop/qemu/hw/display/vga.c:1661
> #19 0x000055c150c771c4 in vga_update_display (opaque=0x55c1541e82a0)
> at /home/petmay01/linaro/qemu-from-laptop/qemu/hw/display/vga.c:1785
> #20 0x000055c151052a83 in graphic_hw_update (con=0x55c1536dfaa0) at
> /home/petmay01/linaro/qemu-from-laptop/qemu/ui/console.c:268
> #21 0x000055c151091490 in gd_refresh (dcl=0x55c1549af090) at
> /home/petmay01/linaro/qemu-from-laptop/qemu/ui/gtk.c:484
> #22 0x000055c151056571 in dpy_refresh (s=0x55c1542f9d90) at
> /home/petmay01/linaro/qemu-from-laptop/qemu/ui/console.c:1629
> #23 0x000055c1510527f0 in gui_update (opaque=0x55c1542f9d90) at
> /home/petmay01/linaro/qemu-from-laptop/qemu/ui/console.c:206
> #24 0x000055c1511ee67c in timerlist_run_timers
> (timer_list=0x55c15370c280) at
> /home/petmay01/linaro/qemu-from-laptop/qemu/util/qemu-timer.c:592
> #25 0x000055c1511ee726 in qemu_clock_run_timers
> (type=QEMU_CLOCK_REALTIME) at
> /home/petmay01/linaro/qemu-from-laptop/qemu/util/qemu-timer.c:606
> #26 0x000055c1511ee9e5 in qemu_clock_run_all_timers () at
> /home/petmay01/linaro/qemu-from-laptop/qemu/util/qemu-timer.c:692
> #27 0x000055c1511ef181 in main_loop_wait (nonblocking=0) at
> /home/petmay01/linaro/qemu-from-laptop/qemu/util/main-loop.c:524
> #28 0x000055c150db03fe in main_loop () at
> /home/petmay01/linaro/qemu-from-laptop/qemu/vl.c:1806
>
> run_on_cpu() adds the do_nothing function to a cpu queued-work list.
>
> However, the single-threaded TCG runloop qemu_tcg_rr_cpu_thread_fn()
> has the basic structure:
>
>    while (1) {
>        for (each vCPU) {
>            run this vCPU for a timeslice;
>        }
>        qemu_tcg_rr_wait_io_event();
>    }
>
> and it processes cpu work queues only in qemu_tcg_rr_wait_io_event().
> This is a problem, because the thing which causes us to stop running
> one vCPU when its timeslice ends and move on to the next is the
> tcg_kick_vcpu_timer -- and the iothread will never process that timer
> and kick the vcpu because it is currently blocked in run_on_cpu() !
>
> Not sure currently what the best fix is here.

We seem to be repeating ourselves because:

  a8efa60633575a2ee4dbf807a71cb44d44b0e0f8
  Author:     Paolo Bonzini <pbonzini@redhat.com>
  AuthorDate: Wed Nov 14 12:36:57 2018 +0100
  cpus: run work items for all vCPUs if single-threaded

However looking at the commit I can still see we have the problem of not
advancing to the next vCPU if the kick timer (or some other event)
doesn't bring the execution to an exit. I suspect you could get this in
Linux but it's probably sufficiently busy to ensure vCPUs are always
exiting for some reason or another.

So options I can think of so far are:

1. Kick 'em all when not inter-vCPU

Something like this untested patch...

--8<---------------cut here---------------start------------->8---
1 file changed, 17 insertions(+), 5 deletions(-)
cpus.c | 22 +++++++++++++++++-----

modified   cpus.c
@@ -949,8 +949,8 @@ static inline int64_t qemu_tcg_next_kick(void)
     return qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + TCG_KICK_PERIOD;
 }

-/* Kick the currently round-robin scheduled vCPU */
-static void qemu_cpu_kick_rr_cpu(void)
+/* Kick the currently round-robin scheduled vCPU to next */
+static void qemu_cpu_kick_rr_next_cpu(void)
 {
     CPUState *cpu;
     do {
@@ -961,6 +961,16 @@ static void qemu_cpu_kick_rr_cpu(void)
     } while (cpu != atomic_mb_read(&tcg_current_rr_cpu));
 }

+/* Kick all RR vCPUs */
+static void qemu_cpu_kick_rr_cpus(void)
+{
+    CPUState *cpu;
+
+    CPU_FOREACH(cpu) {
+        cpu_exit(cpu);
+    };
+}
+
 static void do_nothing(CPUState *cpu, run_on_cpu_data unused)
 {
 }
@@ -993,7 +1003,7 @@ void qemu_timer_notify_cb(void *opaque, QEMUClockType type)
 static void kick_tcg_thread(void *opaque)
 {
     timer_mod(tcg_kick_vcpu_timer, qemu_tcg_next_kick());
-    qemu_cpu_kick_rr_cpu();
+    qemu_cpu_kick_rr_next_cpu();
 }

 static void start_tcg_kick_timer(void)
@@ -1828,9 +1838,11 @@ void qemu_cpu_kick(CPUState *cpu)
 {
     qemu_cond_broadcast(cpu->halt_cond);
     if (tcg_enabled()) {
+        if (qemu_tcg_mttcg_enabled()) {
         cpu_exit(cpu);
-        /* NOP unless doing single-thread RR */
-        qemu_cpu_kick_rr_cpu();
+        } else {
+            qemu_cpu_kick_rr_cpus();
+        }
     } else {
         if (hax_enabled()) {
             /*
--8<---------------cut here---------------end--------------->8---

2. Add handling of kicking all VCPUs to do_run_on_cpu when current_cpu
== NULL

Which would basically kick all vCPUs when events come from outside the
emulation.

3. Figure out multi-threaded icount and record/replay and drop the
special RR case.

This might take a while.

> Side note -- this use of run_on_cpu() means that we now drop
> the iothread lock within memory_region_snapshot_and_clear_dirty(),
> which we didn't before. This means that a vCPU thread can now
> get in and execute an access to the device registers of
> hw/display/vga.c, updating its state in a way I suspect that the
> device model code is not expecting... So maybe the right answer
> here should be to come up with a fix for the race that 9458a9a1
> addresses that doesn't require us to drop the iothread lock in
> memory_region_snapshot_and_clear_dirty() ? Alternatively we need
> to audit the callers and flag in the documentation that this
> function has the unexpected side effect of briefly dropping the
> iothread lock.

There was a series Emilio posted to get rid of more of the BQL in place
of per-CPU locks which IIRC also stopped some of the bouncing we do in
the *_on_cpu functions. Each time we have to do a lock shuffle to get
things moving is a bit of a red flag.

>
> thanks
> -- PMM


--
Alex Bennée

next prev parent reply	other threads:[~2019-09-30 16:39 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-30 13:15 Lockup with --accel tcg,thread=single Doug Gale
2019-09-30 15:37 ` Peter Maydell
2019-09-30 16:38   ` Alex Bennée [this message]
2019-09-30 17:47   ` Paolo Bonzini
2019-09-30 19:20     ` Alex Bennée
2019-09-30 20:40       ` Paolo Bonzini
2019-10-01  8:42         ` Alex Bennée
2019-10-01  9:16           ` Paolo Bonzini
2019-10-01 15:28             ` Alex Bennée
2019-10-01 12:10     ` Peter Maydell
  -- strict thread matches above, loose matches on Subject: below --
2019-09-30 13:12 Doug Gale

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87h84tloy0.fsf@linaro.org \
    --to=alex.bennee@linaro.org \
    --cc=doug16k@gmail.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.