From: "Alex Bennée" <alex.bennee@linaro.org>
To: Pavel Dovgalyuk <dovgaluk@ispras.ru>
Cc: peter.maydell@linaro.org, rth@twiddle.net, pbonzini@redhat.com,
qemu-devel@nongnu.org, mttcg@listserver.greensocs.com,
fred.konrad@greensocs.com, a.rigo@virtualopensystems.com,
cota@braap.org, bobby.prani@gmail.com, nikunj@linux.vnet.ibm.com
Subject: Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9
Date: Fri, 31 Mar 2017 12:21:13 +0100 [thread overview]
Message-ID: <87fuht4rpy.fsf@linaro.org> (raw)
In-Reply-To: <000901d2a9ff$89fcdad0$9df69070$@ru>
Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:
>> From: mttcg-request@listserver.greensocs.com [mailto:mttcg-request@listserver.greensocs.com]
>> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:
>> >> From: mttcg-request@listserver.greensocs.com [mailto:mttcg-
>> request@listserver.greensocs.com]
>> >> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:
>> >> >> From: mttcg-request@listserver.greensocs.com [mailto:mttcg-
>> >> request@listserver.greensocs.com]
>> >> >> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:
>> <snip>
>> >> >> > I tested on vexpress-a9 platform with Debian wheezy.
>> >> >>
>> >> >> Thanks for that. I now have a test case that I can reproduce failures on
>> >> >> without needing graphics.
>> >> >>
>> >> >> I've been investigating if there are any problems with the timer
>> >> >> processing now they have been moved into the TCG thread. The record
>> >> >> stage seems to work fine but I'm having difficulty figuring out why
>> >> >> playback freezes. It seems we get to a point where we are stuck waiting
>> >> >> for a suspiciously exact timer deadline:
>> >> >
>> >> > I've encountered the following behavior at replay stage:
>> >> > - replay takes some instructions to execute (qemu_icount += counter)
>> >> > - virtual timer is fired
>> >>
>> >> This is the virtual timer based on icount not the virtual rt timer? So
>> >
>> > Regular virtual timer. It's value is based on icount.
>> > virtual_rt is used for internal icount purposes.
>>
>> And this is where the clock warps come in? The bias brings the
>> virtual_rt time forward because execution is waiting for some external
>> event to fire (e.g. a timer IRQ)?
>
> I guess so. But bias is not updated when the vCPU works.
> vCPU thread updates only qemu_icount which is used for virtual clock calculation.
>
>> >> under the new scheme of being processed in the vCPU loop we should only
>> >> fire this one once all execution is done (although you may exit the loop
>> >> early before icount is exhausted).
>> >
>> > We should stop the vCPU before processing virtual timers because:
>> > - virtual clock depends on instruction counter
>> > - virtual timers may change virtual hardware state
>>
>> If we do the processing in the top of main vCPU loop this should be
>> equivalent. The instruction counter cannot run because we haven't
>> entered tcg_exec_cpu. We also process virtual timers in this thread
>> outside the loop so nothing else can be poking the hardware state.
>
> This is how qemu worked in older version - it switched between
> processing tasks (vCPU and timers) in one thread.
The only real difference is the sequencing in the old case was protected
by the BQL - now its my program order.
> But how we'll join this behavior with the current design and MTTCG?
Ignore MTTCG for now. We don't even try and run multiple-threads when
icount is enabled. However the change in the BQL locking is what has
triggered the failures.
Anyway I think I'm getting closer to narrowing it down. On record I see
replay_current_step jump backwards with this:
/* A common event print, called when reading or saving an event */
static void print_event(uint8_t event)
{
static int event_count;
static uint64_t last_step;
uint64_t this_step = replay_get_current_step();
fprintf(stderr, "replay: event %d is %d @ step=%#" PRIx64 "\n",
event_count, event, this_step);
if (this_step < last_step) {
fprintf(stderr,"%s: !!! step %d went backwards %#"PRIx64"=>%#"PRIx64"!!!\n",
__func__, event_count, last_step, this_step);
abort();
}
last_step = this_step;
event_count++;
}
void replay_put_event(uint8_t event)
{
assert(event < EVENT_COUNT);
replay_put_byte(event);
print_event(event);
}
The jump back is fairly consistent in my case (event 66 in the vexpress
run) but I'm fairly sure that is the bug. I can't see any reason for the
step count to go backwards - indeed that breaks the premise of .
Unfortunately when I start putting break and watchpoints in to see how
this jump back occurs the problem goes away until I disable them. So
this very much looks like a race condition corrupting the icount data.
>> >> > - replay puts back unexecuted instructions (qemu_icount -= counter)
>> >> >
>> >> > But virtual timer cannot take in account non-executed instructions (counter) and
>> >> > therefore it reads only qemu_icount, which provides incorrect time.
>> >> >
>> >> >> But the timers are all enabled:
>> >> >>
>> >> >> (gdb) qemu timers
>> >> >> Processing Realtime timers
>> >> >> clock QEMU_CLOCK_REALTIME is enabled:true, last:-9223372036854775808
>> >> >> Processing Virtual timers
>> >> >> clock QEMU_CLOCK_VIRTUAL is enabled:true, last:-9223372036854775808
>> >> >> timer 34297350016/1 (cb:0x555555a2e952 <ptimer_tick>)
>> >> >> timer 503290000000/1000000 (cb:0x555555bd4d1d <ra_timer_handler>)
>> >> >> Processing Host timers
>> >> >> clock QEMU_CLOCK_HOST is enabled:true, last:1490191319270134000
>> >> >> Processing Virtual RT timers
>> >> >> clock QEMU_CLOCK_VIRTUAL_RT is enabled:true, last:-9223372036854775808
>> >> >
>> >> > Timers are processed only at checkpoints recorded in the log.
>> >> > When replay is stuck, probably there is a pending checkpoint in the log
>> >> > and pending instructions in CPU (because iothread breaks its
>> >> > synchronization).
>> >>
>> >> How does this work for instructions that sleep (like WFI)? The WFI will
>> >> cause us to exit the loop but I assume we need a "real" timer to trigger
>> >> a firing of whatever the WFI is waiting for as time according to icount
>> >> isn't advancing.
>> >
>> > This is virtual_rt for. It advances even when vCPU is stopped.
>>
>> So I'm not seeing this happen:
>>
>> process_icount_data: icount not increased 117302332749=>117302332749 RT:-1
>>
>> I wonder if the advance of virtual_rt has been deadlocked? The -1
>> suggests its not even setup.
>
> qemu_start_warp_timer says that it is setup only when there are active virtual timers.
>
>
>> > By the way, I made small description of the replay log format
>> > to be later included into docs/replay.txt:
>> >
>> Thanks. That's really useful.
>>
>> Do you think it would be worth including the dumper script I wrote in
>> ./scripts/? I mainly wrote it to aid my understanding of the code flow
>> but it might be useful in future investigations?
>
> Usually I get human-readable logs in record and replay modes to find
> divergence between them, but I think your script might be useful in some cases.
Cool. I'll include it in the icount series once I have something to post
(fingers crossed I'm getting close).
--
Alex Bennée
next prev parent reply other threads:[~2017-03-31 11:21 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-03-07 15:50 [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9 Alex Bennée
2017-03-07 15:50 ` [Qemu-devel] [PATCH v3 01/11] vl/cpus: be smarter with icount and MTTCG Alex Bennée
2017-03-07 15:50 ` [Qemu-devel] [PATCH v3 02/11] target/i386/cpu.h: declare TCG_GUEST_DEFAULT_MO Alex Bennée
2017-03-07 15:50 ` [Qemu-devel] [PATCH v3 03/11] cpus.c: add additional error_report when !TARGET_SUPPORT_MTTCG Alex Bennée
2017-03-07 17:48 ` Philippe Mathieu-Daudé
2017-03-07 15:50 ` [Qemu-devel] [PATCH v3 04/11] sparc/sparc64: grab BQL before calling cpu_check_irqs Alex Bennée
2017-03-07 15:50 ` [Qemu-devel] [PATCH v3 05/11] s390x/misc_helper.c: wrap IO instructions in BQL Alex Bennée
2017-03-07 17:46 ` Philippe Mathieu-Daudé
2017-03-07 15:50 ` [Qemu-devel] [PATCH v3 06/11] target/xtensa: hold BQL for interrupt processing Alex Bennée
2017-03-07 15:50 ` [Qemu-devel] [PATCH v3 07/11] translate-all: exit cpu_restore_state early if translating Alex Bennée
2017-03-07 19:20 ` Richard Henderson
2017-03-07 15:50 ` [Qemu-devel] [PATCH v3 08/11] target/mips: hold BQL for timer interrupts Alex Bennée
2017-03-07 15:50 ` [Qemu-devel] [PATCH v3 09/11] target-i386: defer VMEXIT to do_interrupt Alex Bennée
2017-03-07 19:23 ` Richard Henderson
2017-03-07 15:50 ` [Qemu-devel] [PATCH v3 10/11] target/arm/helper: make it clear the EC field is also in hex Alex Bennée
2017-03-07 17:49 ` [Qemu-devel] [Qemu-arm] " Philippe Mathieu-Daudé
2017-03-07 15:50 ` [Qemu-devel] [PATCH v3 11/11] hw/intc/arm_gic: modernise the DPRINTF Alex Bennée
2017-03-07 17:53 ` [Qemu-devel] [Qemu-arm] " Philippe Mathieu-Daudé
2017-03-07 20:25 ` [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9 Pranith Kumar
[not found] ` <877f40i5e3.fsf@linaro.org>
2017-03-08 14:20 ` Pranith Kumar
2017-03-13 12:32 ` Pavel Dovgalyuk
2017-03-13 13:16 ` Alex Bennée
2017-03-14 12:15 ` Pavel Dovgalyuk
2017-03-14 15:18 ` Alex Bennée
2017-03-16 8:34 ` Pavel Dovgalyuk
2017-03-16 13:06 ` Alex Bennée
2017-03-16 14:46 ` Pavel Dovgalyuk
2017-03-22 14:17 ` Alex Bennée
2017-03-29 6:06 ` Pavel Dovgalyuk
2017-03-29 9:42 ` Alex Bennée
2017-03-30 11:44 ` Pavel Dovgalyuk
2017-03-30 12:42 ` Alex Bennée
2017-03-31 9:16 ` Pavel Dovgalyuk
2017-03-31 10:16 ` Paolo Bonzini
2017-03-31 11:21 ` Alex Bennée [this message]
2017-03-31 11:31 ` Paolo Bonzini
2017-03-31 19:49 ` Alex Bennée
2017-03-31 13:14 ` Alex Bennée
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87fuht4rpy.fsf@linaro.org \
--to=alex.bennee@linaro.org \
--cc=a.rigo@virtualopensystems.com \
--cc=bobby.prani@gmail.com \
--cc=cota@braap.org \
--cc=dovgaluk@ispras.ru \
--cc=fred.konrad@greensocs.com \
--cc=mttcg@listserver.greensocs.com \
--cc=nikunj@linux.vnet.ibm.com \
--cc=pbonzini@redhat.com \
--cc=peter.maydell@linaro.org \
--cc=qemu-devel@nongnu.org \
--cc=rth@twiddle.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).