From: "Alex Bennée" <alex.bennee@linaro.org>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 5/5] icount: process QEMU_CLOCK_VIRTUAL timers in vCPU thread
Date: Mon, 13 Mar 2017 18:15:32 +0000 [thread overview]
Message-ID: <87y3w9oxff.fsf@linaro.org> (raw)
In-Reply-To: <834de521-e5cc-8857-afeb-8734037f95f3@redhat.com>
Paolo Bonzini <pbonzini@redhat.com> writes:
> On 13/03/2017 17:53, Alex Bennée wrote:
>>
>> Paolo Bonzini <pbonzini@redhat.com> writes:
>>
>>> icount has become much slower after tcg_cpu_exec has stopped
>>> using the BQL. There is also a latent bug that is masked by
>>> the slowness.
>>>
>>> The slowness happens because every occurrence of a QEMU_CLOCK_VIRTUAL
>>> timer now has to wake up the I/O thread and wait for it. The rendez-vous
>>> is mediated by the BQL QemuMutex:
>>>
>>> - handle_icount_deadline wakes up the I/O thread with BQL taken
>>> - the I/O thread wakes up and waits on the BQL
>>> - the VCPU thread releases the BQL a little later
>>> - the I/O thread raises an interrupt, which calls qemu_cpu_kick
>>> - the VCPU thread notices the interrupt, takes the BQL to
>>> process it and waits on it
>>>
>>> All this back and forth is extremely expensive, causing a 6 to 8-fold
>>> slowdown when icount is turned on.
>>>
>>> One may think that the issue is that the VCPU thread is too dependent
>>> on the BQL, but then the latent bug comes in. I first tried removing
>>> the BQL completely from the x86 cpu_exec, only to see everything break.
>>> The only way to fix it (and make everything slow again) was to add a dummy
>>> BQL lock/unlock pair.
>>>
>>> This is because in -icount mode you really have to process the events
>>> before the CPU restarts executing the next instruction. Therefore, this
>>> series moves the processing of QEMU_CLOCK_VIRTUAL timers straight into
>>> the vCPU thread when running in icount mode.
>>>
>>> The required changes include:
>>>
>>> - make the timer notification callback wake up TCG's single vCPU thread
>>> when run from another thread. By using async_run_on_cpu, the callback
>>> can override all_cpu_threads_idle() when the CPU is halted.
>>>
>>> - move handle_icount_deadline after qemu_tcg_wait_io_event, so that
>>> the timer notification callback is invoked after the dummy work item
>>> wakes up the vCPU thread
>>>
>>> - make handle_icount_deadline run the timers instead of just waking the
>>> I/O thread.
>>>
>>> - stop processing the timers in the main loop
>>>
>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>> ---
>>> cpus.c | 26 +++++++++++++++++++++++---
>>> include/qemu/timer.h | 24 ++++++++++++++++++++++++
>>> util/qemu-timer.c | 4 +++-
>>> 3 files changed, 50 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/cpus.c b/cpus.c
>>> index 747addd..209c196 100644
>>> --- a/cpus.c
>>> +++ b/cpus.c
>>> @@ -804,9 +804,23 @@ static void qemu_cpu_kick_rr_cpu(void)
>>> } while (cpu != atomic_mb_read(&tcg_current_rr_cpu));
>>> }
>>>
>>> +static void do_nothing(CPUState *cpu, run_on_cpu_data unused)
>>> +{
>>> +}
>>> +
>>> void qemu_timer_notify_cb(void *opaque, QEMUClockType type)
>>> {
>>> - qemu_notify_event();
>>> + if (!use_icount || type != QEMU_CLOCK_VIRTUAL) {
>>> + qemu_notify_event();
>>> + return;
>>> + }
>>> +
>>> + if (!qemu_in_vcpu_thread() && first_cpu) {
>>> + /* run_on_cpu will also kick the CPU out of halt so that
>>> + * handle_icount_deadline runs
>>> + */
>>> + async_run_on_cpu(first_cpu, do_nothing, RUN_ON_CPU_NULL);
>>> + }
>>
>> This doesn't read quite right - otherwise you could get the same effect
>> by calling qemu_cpu_kick(), or even by modifying and calling
>> qemu_cpu_kick_rr_cpu()?
>>
>> The only real effect of having something in the work queue is that you
>> run the outside of the loop before returning to tcg_cpu_exec.
>
> Yes, and "the outside of the loop" here means handle_icount_deadline.
>
> But qemu_cpu_kick_rr_cpu won't signal condition variables, and
> qemu_cpu_kick does, but it won't make cpu_thread_is_idle return false.
> Therefore, qemu_tcg_wait_io_event keeps looping.
How about:

  Scheduling work with async_run_on_cpu ensures the vCPU thread exits
  qemu_tcg_wait_io_event even when it has halted, so that
  handle_icount_deadline can run.
>
>>> }
>>>
>>> static void kick_tcg_thread(void *opaque)
>>> @@ -1220,12 +1234,15 @@ static int64_t tcg_get_icount_limit(void)
>>>
>>> static void handle_icount_deadline(void)
>>> {
>>> + assert(qemu_in_vcpu_thread());
>>> if (use_icount) {
>>> int64_t deadline =
>>> qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
>>>
>>> if (deadline == 0) {
>>> + /* Wake up other AioContexts. */
>>> qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
>>> + qemu_clock_run_timers(QEMU_CLOCK_VIRTUAL);
>>> }
>>> }
>>> }
>>> @@ -1338,6 +1355,11 @@ static void *qemu_tcg_rr_cpu_thread_fn(void *arg)
>>> /* Account partial waits to QEMU_CLOCK_VIRTUAL. */
>>> qemu_account_warp_timer();
>>>
>>> + /* Run the timers here. This is much more efficient than
>>> + * waking up the I/O thread and waiting for completion.
>>> + */
>>> + handle_icount_deadline();
>>> +
>>> if (!cpu) {
>>> cpu = first_cpu;
>>> }
>>> @@ -1379,8 +1401,6 @@ static void *qemu_tcg_rr_cpu_thread_fn(void *arg)
>>> atomic_mb_set(&cpu->exit_request, 0);
>>> }
>>>
>>> - handle_icount_deadline();
>>> -
>>
>> I guess we could just as easily move the handling to after
>> qemu_tcg_wait_io_event(cpu ? cpu : QTAILQ_FIRST(&cpus))?
>>
>> I noticed we still call handle_icount_deadline in the multi-thread case
>> which is probably superfluous unless/until we figure out a way for it to
>> work with MTTCG.
>
> Should I remove the call? Add an assert(!use_icount)?
Yes, pretty much that, as an independent patch.
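i.e. something like this on top of the series (untested, just to show
what I mean for the multi-threaded function):

```
--- a/cpus.c
+++ b/cpus.c
@@ static void *qemu_tcg_cpu_thread_fn(void *arg)
-        handle_icount_deadline();
+        /* icount is not supported under MTTCG (yet) */
+        g_assert(!use_icount);
```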
>
>>> qemu_tcg_wait_io_event(cpu ? cpu : QTAILQ_FIRST(&cpus));
>>> deal_with_unplugged_cpus();
>>> }
>>> diff --git a/include/qemu/timer.h b/include/qemu/timer.h
>>> index 1441b42..e1742f2 100644
>>> --- a/include/qemu/timer.h
>>> +++ b/include/qemu/timer.h
>>> @@ -533,6 +533,12 @@ static inline QEMUTimer *timer_new_tl(QEMUTimerList *timer_list,
>>> * Create a new timer and associate it with the default
>>> * timer list for the clock type @type.
>>> *
>>> + * The default timer list has one special feature: in icount mode,
>>> + * %QEMU_CLOCK_VIRTUAL timers are run in the vCPU thread. This is
>>> + * not true of other timer lists, which are typically associated
>>> + * with an AioContext---each of them runs its timer callbacks in its own
>>> + * AioContext thread.
>>> + *
>>> * Returns: a pointer to the timer
>>> */
>>> static inline QEMUTimer *timer_new(QEMUClockType type, int scale,
>>> @@ -550,6 +556,12 @@ static inline QEMUTimer *timer_new(QEMUClockType type, int scale,
>>> * Create a new timer with nanosecond scale on the default timer list
>>> * associated with the clock.
>>> *
>>> + * The default timer list has one special feature: in icount mode,
>>> + * %QEMU_CLOCK_VIRTUAL timers are run in the vCPU thread. This is
>>> + * not true of other timer lists, which are typically associated
>>> + * with an AioContext---each of them runs its timer callbacks in its own
>>> + * AioContext thread.
>>> + *
>>> * Returns: a pointer to the newly created timer
>>> */
>>> static inline QEMUTimer *timer_new_ns(QEMUClockType type, QEMUTimerCB *cb,
>>> @@ -564,6 +576,12 @@ static inline QEMUTimer *timer_new_ns(QEMUClockType type, QEMUTimerCB *cb,
>>> * @cb: the callback to call when the timer expires
>>> * @opaque: the opaque pointer to pass to the callback
>>> *
>>> + * The default timer list has one special feature: in icount mode,
>>> + * %QEMU_CLOCK_VIRTUAL timers are run in the vCPU thread. This is
>>> + * not true of other timer lists, which are typically associated
>>> + * with an AioContext---each of them runs its timer callbacks in its own
>>> + * AioContext thread.
>>> + *
>>> * Create a new timer with microsecond scale on the default timer list
>>> * associated with the clock.
>>> *
>>> @@ -581,6 +599,12 @@ static inline QEMUTimer *timer_new_us(QEMUClockType type, QEMUTimerCB *cb,
>>> * @cb: the callback to call when the timer expires
>>> * @opaque: the opaque pointer to pass to the callback
>>> *
>>> + * The default timer list has one special feature: in icount mode,
>>> + * %QEMU_CLOCK_VIRTUAL timers are run in the vCPU thread. This is
>>> + * not true of other timer lists, which are typically associated
>>> + * with an AioContext---each of them runs its timer callbacks in its own
>>> + * AioContext thread.
>>> + *
>>> * Create a new timer with millisecond scale on the default timer list
>>> * associated with the clock.
>>> *
>>> diff --git a/util/qemu-timer.c b/util/qemu-timer.c
>>> index dc3181e..82d5650 100644
>>> --- a/util/qemu-timer.c
>>> +++ b/util/qemu-timer.c
>>> @@ -658,7 +658,9 @@ bool qemu_clock_run_all_timers(void)
>>> QEMUClockType type;
>>>
>>> for (type = 0; type < QEMU_CLOCK_MAX; type++) {
>>> - progress |= qemu_clock_run_timers(type);
>>> + if (qemu_clock_use_for_deadline(type)) {
>>> + progress |= qemu_clock_run_timers(type);
>>> + }
>>
>> minor nit: it's not really qemu_clock_run_all_timers() now, is it? We run
>> all but the virtual timers (if icount is enabled).
>
> Well yeah, it's all those that pass qemu_clock_use_for_deadline.
>
> Paolo
Have you done any testing with record/replay? So far I have one
reproducible run and one failure. However, it is not entirely clear to me
how I am meant to cleanly halt and stop a machine so that I don't corrupt
the replay log.
--
Alex Bennée