From: "Alex Bennée" <alex.bennee@linaro.org>
To: Pavel Dovgalyuk <dovgaluk@ispras.ru>
Cc: peter.maydell@linaro.org, rth@twiddle.net, pbonzini@redhat.com,
qemu-devel@nongnu.org, mttcg@listserver.greensocs.com,
fred.konrad@greensocs.com, a.rigo@virtualopensystems.com,
cota@braap.org, bobby.prani@gmail.com, nikunj@linux.vnet.ibm.com
Subject: Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9
Date: Fri, 31 Mar 2017 12:21:13 +0100 [thread overview]
Message-ID: <87fuht4rpy.fsf@linaro.org> (raw)
In-Reply-To: <000901d2a9ff$89fcdad0$9df69070$@ru>
Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:
>> From: mttcg-request@listserver.greensocs.com [mailto:mttcg-request@listserver.greensocs.com]
>> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:
>> >> From: mttcg-request@listserver.greensocs.com [mailto:mttcg-
>> request@listserver.greensocs.com]
>> >> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:
>> >> >> From: mttcg-request@listserver.greensocs.com [mailto:mttcg-
>> >> request@listserver.greensocs.com]
>> >> >> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:
>> <snip>
>> >> >> > I tested on vexpress-a9 platform with Debian wheezy.
>> >> >>
>> >> >> Thanks for that. I now have a test case that I can reproduce failures on
>> >> >> without needing graphics.
>> >> >>
>> >> >> I've been investigating if there are any problems with the timer
>> >> >> processing now they have been moved into the TCG thread. The record
>> >> >> stage seems to work fine but I'm having difficulty figuring out why
>> >> >> playback freezes. It seems we get to a point where we are stuck waiting
>> >> >> for a suspiciously exact timer deadline:
>> >> >
>> >> > I've encountered the following behavior at replay stage:
>> >> > - replay takes some instructions to execute (qemu_icount += counter)
>> >> > - virtual timer is fired
>> >>
>> >> This is the virtual timer based on icount not the virtual rt timer? So
>> >
>> > Regular virtual timer. It's value is based on icount.
>> > virtual_rt is used for internal icount purposes.
>>
>> And this is where the clock warps come in? The bias brings the
>> virtual_rt time forward because execution is waiting for some external
>> event to fire (e.g. a timer IRQ)?
>
> I guess so. But bias is not updated when the vCPU works.
> vCPU thread updates only qemu_icount which is used for virtual clock calculation.
>
>> >> under the new scheme of being processed in the vCPU loop we should only
>> >> fire this one once all execution is done (although you may exit the loop
>> >> early before icount is exhausted).
>> >
>> > We should stop the vCPU before processing virtual timers because:
>> > - virtual clock depends on instruction counter
>> > - virtual timers may change virtual hardware state
>>
>> If we do the processing in the top of main vCPU loop this should be
>> equivalent. The instruction counter cannot run because we haven't
>> entered tcg_exec_cpu. We also process virtual timers in this thread
>> outside the loop so nothing else can be poking the hardware state.
>
> This is how qemu worked in older version - it switched between
> processing tasks (vCPU and timers) in one thread.
The only real difference is the sequencing in the old case was protected
by the BQL - now its my program order.
> But how we'll join this behavior with the current design and MTTCG?
Ignore MTTCG for now. We don't even try and run multiple-threads when
icount is enabled. However the change in the BQL locking is what has
triggered the failures.
Anyway I think I'm getting closer to narrowing it down. On record I see
replay_current_step jump backwards with this:
/* A common event print, called when reading or saving an event */
static void print_event(uint8_t event)
{
static int event_count;
static uint64_t last_step;
uint64_t this_step = replay_get_current_step();
fprintf(stderr, "replay: event %d is %d @ step=%#" PRIx64 "\n",
event_count, event, this_step);
if (this_step < last_step) {
fprintf(stderr,"%s: !!! step %d went backwards %#"PRIx64"=>%#"PRIx64"!!!\n",
__func__, event_count, last_step, this_step);
abort();
}
last_step = this_step;
event_count++;
}
void replay_put_event(uint8_t event)
{
assert(event < EVENT_COUNT);
replay_put_byte(event);
print_event(event);
}
The jump back is fairly consistent in my case (event 66 in the vexpress
run) but I'm fairly sure that is the bug. I can't see any reason for the
step count to go backwards - indeed that breaks the premise of .
Unfortunately when I start putting break and watchpoints in to see how
this jump back occurs the problem goes away until I disable them. So
this very much looks like a race condition corrupting the icount data.
>> >> > - replay puts back unexecuted instructions (qemu_icount -= counter)
>> >> >
>> >> > But virtual timer cannot take in account non-executed instructions (counter) and
>> >> > therefore it reads only qemu_icount, which provides incorrect time.
>> >> >
>> >> >> But the timers are all enabled:
>> >> >>
>> >> >> (gdb) qemu timers
>> >> >> Processing Realtime timers
>> >> >> clock QEMU_CLOCK_REALTIME is enabled:true, last:-9223372036854775808
>> >> >> Processing Virtual timers
>> >> >> clock QEMU_CLOCK_VIRTUAL is enabled:true, last:-9223372036854775808
>> >> >> timer 34297350016/1 (cb:0x555555a2e952 <ptimer_tick>)
>> >> >> timer 503290000000/1000000 (cb:0x555555bd4d1d <ra_timer_handler>)
>> >> >> Processing Host timers
>> >> >> clock QEMU_CLOCK_HOST is enabled:true, last:1490191319270134000
>> >> >> Processing Virtual RT timers
>> >> >> clock QEMU_CLOCK_VIRTUAL_RT is enabled:true, last:-9223372036854775808
>> >> >
>> >> > Timers are processed only at checkpoints recorded in the log.
>> >> > When replay is stuck, probably there is a pending checkpoint in the log
>> >> > and pending instructions in CPU (because iothread breaks its
>> >> > synchronization).
>> >>
>> >> How does this work for instructions that sleep (like WFI)? The WFI will
>> >> cause us to exit the loop but I assume we need a "real" timer to trigger
>> >> a firing of whatever the WFI is waiting for as time according to icount
>> >> isn't advancing.
>> >
>> > This is virtual_rt for. It advances even when vCPU is stopped.
>>
>> So I'm not seeing this happen:
>>
>> process_icount_data: icount not increased 117302332749=>117302332749 RT:-1
>>
>> I wonder if the advance of virtual_rt has been deadlocked? The -1
>> suggests its not even setup.
>
> qemu_start_warp_timer says that it is setup only when there are active virtual timers.
>
>
>> > By the way, I made small description of the replay log format
>> > to be later included into docs/replay.txt:
>> >
>> Thanks. That's really useful.
>>
>> Do you think it would be worth including the dumper script I wrote in
>> ./scripts/? I mainly wrote it to aid my understanding of the code flow
>> but it might be useful in future investigations?
>
> Usually I get human-readable logs in record and replay modes to find
> divergence between them, but I think your script might be useful in some cases.
Cool. I'll include it in the icount series once I have something to post
(fingers crossed I'm getting close).
--
Alex Bennée
next prev parent reply other threads:[~2017-03-31 11:21 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-03-07 15:50 [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9 Alex Bennée
2017-03-07 15:50 ` [Qemu-devel] [PATCH v3 01/11] vl/cpus: be smarter with icount and MTTCG Alex Bennée
2017-03-07 15:50 ` [Qemu-devel] [PATCH v3 02/11] target/i386/cpu.h: declare TCG_GUEST_DEFAULT_MO Alex Bennée
2017-03-07 15:50 ` [Qemu-devel] [PATCH v3 03/11] cpus.c: add additional error_report when !TARGET_SUPPORT_MTTCG Alex Bennée
2017-03-07 17:48 ` Philippe Mathieu-Daudé
2017-03-07 15:50 ` [Qemu-devel] [PATCH v3 04/11] sparc/sparc64: grab BQL before calling cpu_check_irqs Alex Bennée
2017-03-07 15:50 ` [Qemu-devel] [PATCH v3 05/11] s390x/misc_helper.c: wrap IO instructions in BQL Alex Bennée
2017-03-07 17:46 ` Philippe Mathieu-Daudé
2017-03-07 15:50 ` [Qemu-devel] [PATCH v3 06/11] target/xtensa: hold BQL for interrupt processing Alex Bennée
2017-03-07 15:50 ` [Qemu-devel] [PATCH v3 07/11] translate-all: exit cpu_restore_state early if translating Alex Bennée
2017-03-07 19:20 ` Richard Henderson
2017-03-07 15:50 ` [Qemu-devel] [PATCH v3 08/11] target/mips: hold BQL for timer interrupts Alex Bennée
2017-03-07 15:50 ` [Qemu-devel] [PATCH v3 09/11] target-i386: defer VMEXIT to do_interrupt Alex Bennée
2017-03-07 19:23 ` Richard Henderson
2017-03-07 15:50 ` [PATCH v3 10/11] target/arm/helper: make it clear the EC field is also in hex Alex Bennée
2017-03-07 15:50 ` [Qemu-devel] " Alex Bennée
2017-03-07 17:49 ` [Qemu-arm] " Philippe Mathieu-Daudé
2017-03-07 17:49 ` [Qemu-devel] " Philippe Mathieu-Daudé
2017-03-07 15:50 ` [PATCH v3 11/11] hw/intc/arm_gic: modernise the DPRINTF Alex Bennée
2017-03-07 15:50 ` [Qemu-devel] " Alex Bennée
2017-03-07 17:53 ` [Qemu-arm] " Philippe Mathieu-Daudé
2017-03-07 17:53 ` [Qemu-devel] " Philippe Mathieu-Daudé
2017-03-07 20:25 ` [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9 Pranith Kumar
[not found] ` <877f40i5e3.fsf@linaro.org>
2017-03-08 14:20 ` Pranith Kumar
2017-03-13 12:32 ` Pavel Dovgalyuk
2017-03-13 13:16 ` Alex Bennée
2017-03-14 12:15 ` Pavel Dovgalyuk
2017-03-14 15:18 ` Alex Bennée
2017-03-16 8:34 ` Pavel Dovgalyuk
2017-03-16 13:06 ` Alex Bennée
2017-03-16 14:46 ` Pavel Dovgalyuk
2017-03-22 14:17 ` Alex Bennée
2017-03-29 6:06 ` Pavel Dovgalyuk
2017-03-29 9:42 ` Alex Bennée
2017-03-30 11:44 ` Pavel Dovgalyuk
2017-03-30 12:42 ` Alex Bennée
2017-03-31 9:16 ` Pavel Dovgalyuk
2017-03-31 10:16 ` Paolo Bonzini
2017-03-31 11:21 ` Alex Bennée [this message]
2017-03-31 11:31 ` Paolo Bonzini
2017-03-31 19:49 ` Alex Bennée
2017-03-31 13:14 ` Alex Bennée
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87fuht4rpy.fsf@linaro.org \
--to=alex.bennee@linaro.org \
--cc=a.rigo@virtualopensystems.com \
--cc=bobby.prani@gmail.com \
--cc=cota@braap.org \
--cc=dovgaluk@ispras.ru \
--cc=fred.konrad@greensocs.com \
--cc=mttcg@listserver.greensocs.com \
--cc=nikunj@linux.vnet.ibm.com \
--cc=pbonzini@redhat.com \
--cc=peter.maydell@linaro.org \
--cc=qemu-devel@nongnu.org \
--cc=rth@twiddle.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.