From: "Alex Bennée" <alex.bennee@linaro.org>
To: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Thomas Huth <thuth@redhat.com>,
qemu-devel@nongnu.org, qemu-s390x@nongnu.org
Subject: Re: [RFC PATCH] tests/functional/s390x: Add reverse debugging test for s390x
Date: Mon, 01 Dec 2025 12:43:10 +0000 [thread overview]
Message-ID: <87a502v0dd.fsf@draig.linaro.org> (raw)
In-Reply-To: <f5a9796601bb90f754be75b9366149aafa2a9bb0.camel@linux.ibm.com> (Ilya Leoshkevich's message of "Mon, 01 Dec 2025 12:17:13 +0100")
Ilya Leoshkevich <iii@linux.ibm.com> writes:
> On Mon, 2025-12-01 at 10:36 +0000, Alex Bennée wrote:
>> Ilya Leoshkevich <iii@linux.ibm.com> writes:
>>
>> > On Sun, 2025-11-30 at 20:03 +0100, Ilya Leoshkevich wrote:
>> > > On Sun, 2025-11-30 at 19:32 +0100, Ilya Leoshkevich wrote:
>> > > > On Sun, 2025-11-30 at 16:47 +0000, Alex Bennée wrote:
>> > > > > Ilya Leoshkevich <iii@linux.ibm.com> writes:
>> > > > >
>> > > > > > On Fri, 2025-11-28 at 18:25 +0100, Ilya Leoshkevich wrote:
>> > > > > > > On Fri, 2025-11-28 at 14:39 +0100, Thomas Huth wrote:
>> > > > > > > > From: Thomas Huth <thuth@redhat.com>
<snip>
>> > > > > The the async_run_on_cpu is called from the vcpu thread in
>> > > > > response
>> > > > > to a
>> > > > > deterministic event at a known point in time it should be
>> > > > > fine.
>> > > > > If
>> > > > > it
>> > > > > came from another thread that is not synchronised via
>> > > > > replay_lock
>> > > > > then
>> > > > > things will go wrong.
>> > > > >
>> > > > > But this is a VM load save helper?
>> > > >
>> > > > Yes, and it's called from the main thread. Either during
>> > > > initialization, or as a reaction to GDB packets.
>> > > >
>> > > > Here is the call stack:
>> > > >
>> > > > qemu_loadvm_state()
>> > > > qemu_loadvm_state_main()
>> > > > qemu_loadvm_section_start_full()
>> > > > vmstate_load()
>> > > > vmstate_load_state()
>> > > > cpu_post_load()
>> > > > tcg_s390_tod_updated()
>> > > > update_ckc_timer()
>> > > > timer_mod()
>> > > > s390_tod_load()
>> > > > qemu_s390_tod_set() # via tdc->set()
>> > > > async_run_on_cpu(tcg_s390_tod_updated)
>> > > >
>> > > > So you think we may have to take the replay lock around
>> > > > load_snapshot()? So that all async_run_on_cpu() calls it makes
>> > > > end
>> > > > up
>> > > > being handled by the vCPU thread deterministically.
<snip>
>> >
>> > I believe now I at least understand what the race is about:
>> >
>> > - cpu_post_load() fires the TOD timer immediately.
>> >
>> > - s390_tod_load() schedules work for firing the TOD timer.
>>
>> Is this a duplicate of work then? Could we just rely on one or the
>> other? If you drop the cpu_post_load() tweak then the vmstate load
>> helper should still ensure everything works right?
>
> Getting rid of it fixes the problem and makes sense anyway.
>
>> > - If rr loop sees work and then timer, we get one timer callback.
>> >
>> > - If rr loop sees timer and then work, we get two timer callbacks.
>>
>> If the timer is armed we should expect at least to execute a few
>> instructions before triggering the timer, unless it was armed ready
>> expired.
>
> Yes, it is armed expired.
>
> Isn't it a deficiency in record-replay that work and timers are not
> ordered relative to each other? Can't it bite us somewhere else?
They normally should be although I notice:
void icount_handle_deadline(void)
{
assert(qemu_in_vcpu_thread());
int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL,
QEMU_TIMER_ATTR_ALL);
/*
* Instructions, interrupts, and exceptions are processed in cpu-exec.
* Don't interrupt cpu thread, when these events are waiting
* (i.e., there is no checkpoint)
*/
if (deadline == 0) {
icount_notify_aio_contexts();
}
}
should run the pre-expired timers before we exec the current TB. But the
comment suggests it is not expecting any checkpoint related activity. I
wonder if we can assert that is the case to catch future issues.
>> > - Record and replay may diverge due to this race.
>> >
>> > - In this particular case divergence makes rr loop spin: it sees
>> > that
>> > TOD timer has expired, but cannot invoke its callback, because
>> > there
>> > is no recorded CHECKPOINT_CLOCK_VIRTUAL.
>> >
>> > - The order in which rr loop sees work and timer depends on whether
>> > and when rr loop wakes up during load_snapshot().
>> >
>> > - rr loop may wake up after the main thread kicks the CPU and drops
>> > the BQL, which may happen if it calls, e.g.,
>> > qemu_cond_wait_bql().
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
prev parent reply other threads:[~2025-12-01 12:44 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-28 13:39 [RFC PATCH] tests/functional/s390x: Add reverse debugging test for s390x Thomas Huth
2025-11-28 17:25 ` Ilya Leoshkevich
2025-11-29 21:33 ` Ilya Leoshkevich
2025-11-30 16:47 ` Alex Bennée
2025-11-30 18:32 ` Ilya Leoshkevich
2025-11-30 19:03 ` Ilya Leoshkevich
2025-11-30 22:59 ` Ilya Leoshkevich
2025-12-01 10:36 ` Alex Bennée
2025-12-01 11:17 ` Ilya Leoshkevich
2025-12-01 11:57 ` Ilya Leoshkevich
2025-12-01 12:43 ` Alex Bennée [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87a502v0dd.fsf@draig.linaro.org \
--to=alex.bennee@linaro.org \
--cc=iii@linux.ibm.com \
--cc=qemu-devel@nongnu.org \
--cc=qemu-s390x@nongnu.org \
--cc=thuth@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).