All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Alex Bennée" <alex.bennee@linaro.org>
To: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Thomas Huth <thuth@redhat.com>,
	 qemu-devel@nongnu.org, qemu-s390x@nongnu.org
Subject: Re: [RFC PATCH] tests/functional/s390x: Add reverse debugging test for s390x
Date: Mon, 01 Dec 2025 12:43:10 +0000	[thread overview]
Message-ID: <87a502v0dd.fsf@draig.linaro.org> (raw)
In-Reply-To: <f5a9796601bb90f754be75b9366149aafa2a9bb0.camel@linux.ibm.com> (Ilya Leoshkevich's message of "Mon, 01 Dec 2025 12:17:13 +0100")

Ilya Leoshkevich <iii@linux.ibm.com> writes:

> On Mon, 2025-12-01 at 10:36 +0000, Alex Bennée wrote:
>> Ilya Leoshkevich <iii@linux.ibm.com> writes:
>> 
>> > On Sun, 2025-11-30 at 20:03 +0100, Ilya Leoshkevich wrote:
>> > > On Sun, 2025-11-30 at 19:32 +0100, Ilya Leoshkevich wrote:
>> > > > On Sun, 2025-11-30 at 16:47 +0000, Alex Bennée wrote:
>> > > > > Ilya Leoshkevich <iii@linux.ibm.com> writes:
>> > > > > 
>> > > > > > On Fri, 2025-11-28 at 18:25 +0100, Ilya Leoshkevich wrote:
>> > > > > > > On Fri, 2025-11-28 at 14:39 +0100, Thomas Huth wrote:
>> > > > > > > > From: Thomas Huth <thuth@redhat.com>
<snip>
>> > > > > The the async_run_on_cpu is called from the vcpu thread in
>> > > > > response
>> > > > > to a
>> > > > > deterministic event at a known point in time it should be
>> > > > > fine.
>> > > > > If
>> > > > > it
>> > > > > came from another thread that is not synchronised via
>> > > > > replay_lock
>> > > > > then
>> > > > > things will go wrong.
>> > > > > 
>> > > > > But this is a VM load save helper?
>> > > > 
>> > > > Yes, and it's called from the main thread. Either during
>> > > > initialization, or as a reaction to GDB packets.
>> > > > 
>> > > > Here is the call stack:
>> > > > 
>> > > >   qemu_loadvm_state()
>> > > >     qemu_loadvm_state_main()
>> > > >       qemu_loadvm_section_start_full()
>> > > >         vmstate_load()
>> > > >           vmstate_load_state()
>> > > >             cpu_post_load()
>> > > >               tcg_s390_tod_updated()
>> > > >                 update_ckc_timer()
>> > > >                   timer_mod()
>> > > >           s390_tod_load()
>> > > >             qemu_s390_tod_set()  # via tdc->set()
>> > > >               async_run_on_cpu(tcg_s390_tod_updated)
>> > > > 
>> > > > So you think we may have to take the replay lock around
>> > > > load_snapshot()? So that all async_run_on_cpu() calls it makes
>> > > > end
>> > > > up
>> > > > being handled by the vCPU thread deterministically.
<snip>
>> > 
>> > I believe now I at least understand what the race is about:
>> > 
>> > - cpu_post_load() fires the TOD timer immediately.
>> > 
>> > - s390_tod_load() schedules work for firing the TOD timer.
>> 
>> Is this a duplicate of work then? Could we just rely on one or the
>> other? If you drop the cpu_post_load() tweak then the vmstate load
>> helper should still ensure everything works right?
>
> Getting rid of it fixes the problem and makes sense anyway.
>
>> > - If rr loop sees work and then timer, we get one timer callback.
>> > 
>> > - If rr loop sees timer and then work, we get two timer callbacks.
>> 
>> If the timer is armed we should expect at least to execute a few
>> instructions before triggering the timer, unless it was armed ready
>> expired.
>
> Yes, it is armed expired.
>
> Isn't it a deficiency in record-replay that work and timers are not
> ordered relative to each other? Can't it bite us somewhere else?

They normally should be although I notice:

  void icount_handle_deadline(void)
  {
      assert(qemu_in_vcpu_thread());
      int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL,
                                                    QEMU_TIMER_ATTR_ALL);

      /*
       * Instructions, interrupts, and exceptions are processed in cpu-exec.
       * Don't interrupt cpu thread, when these events are waiting
       * (i.e., there is no checkpoint)
       */
      if (deadline == 0) {
          icount_notify_aio_contexts();
      }
  }

should run the pre-expired timers before we exec the current TB. But the
comment suggests it is not expecting any checkpoint related activity. I
wonder if we can assert that is the case to catch future issues.

>> > - Record and replay may diverge due to this race.
>> > 
>> > - In this particular case divergence makes rr loop spin: it sees
>> > that
>> >   TOD timer has expired, but cannot invoke its callback, because
>> > there
>> >   is no recorded CHECKPOINT_CLOCK_VIRTUAL.
>> > 
>> > - The order in which rr loop sees work and timer depends on whether
>> >   and when rr loop wakes up during load_snapshot().
>> > 
>> > - rr loop may wake up after the main thread kicks the CPU and drops
>> >   the BQL, which may happen if it calls, e.g.,
>> > qemu_cond_wait_bql().

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


      parent reply	other threads:[~2025-12-01 12:44 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-28 13:39 [RFC PATCH] tests/functional/s390x: Add reverse debugging test for s390x Thomas Huth
2025-11-28 17:25 ` Ilya Leoshkevich
2025-11-29 21:33   ` Ilya Leoshkevich
2025-11-30 16:47     ` Alex Bennée
2025-11-30 18:32       ` Ilya Leoshkevich
2025-11-30 19:03         ` Ilya Leoshkevich
2025-11-30 22:59           ` Ilya Leoshkevich
2025-12-01 10:36             ` Alex Bennée
2025-12-01 11:17               ` Ilya Leoshkevich
2025-12-01 11:57                 ` Ilya Leoshkevich
2025-12-01 12:43                 ` Alex Bennée [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87a502v0dd.fsf@draig.linaro.org \
    --to=alex.bennee@linaro.org \
    --cc=iii@linux.ibm.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-s390x@nongnu.org \
    --cc=thuth@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.