From: Thomas Huth <thuth@redhat.com>
To: "Ilya Leoshkevich" <iii@linux.ibm.com>,
"Alex Bennée" <alex.bennee@linaro.org>
Cc: qemu-s390x@nongnu.org, qemu-devel@nongnu.org,
Pavel Dovgalyuk <pavel.dovgalyuk@ispras.ru>
Subject: Re: [PATCH RFC] target/s390x: Fix infinite loop during replay
Date: Thu, 4 Dec 2025 10:37:13 +0100
Message-ID: <01700ae1-b5ef-43f2-af2b-2eb648c5a147@redhat.com>
In-Reply-To: <20251201215514.1751994-1-iii@linux.ibm.com>

On 01/12/2025 22.49, Ilya Leoshkevich wrote:
> Hi,
>
> Here is my attempt to fix [1] based on the discussion in [2].
>
> I'm sending this as an RFC, because I have definitely misunderstood a
> thing or two about record-replay, missed some timer bookkeeping
> intricacies, and haven't split arch-dependent and independent parts
> into different patches.
>
> This survives "make check" and "make check-tcg" with the test from [2],
> both with and without extra load in background.
>
> Please let me know what you think about the approach.
>
> Best regards,
> Ilya
>
> [1] https://lore.kernel.org/qemu-devel/a0accce9-6042-4a7b-a7c7-218212818891@redhat.com/
> [2] https://lore.kernel.org/qemu-devel/20251128133949.181828-1-thuth@redhat.com/
>
> ---
>
> Replaying even trivial s390x kernels hangs, because:
>
> - cpu_post_load() fires the TOD timer immediately.
>
> - s390_tod_load() schedules work for firing the TOD timer.
>
> - If rr loop sees work and then timer, we get one timer expiration.
>
> - If rr loop sees timer and then work, we get two timer expirations.
>
> - Record and replay may diverge due to this race.
>
> - In this particular case divergence makes replay loop spin: it sees that
> TOD timer has expired, but cannot invoke its callback, because there
> is no recorded CHECKPOINT_CLOCK_VIRTUAL.
>
> - The order in which rr loop sees work and timer depends on whether
> and when rr loop wakes up during load_snapshot().
>
> - rr loop may wake up after the main thread kicks the CPU and drops
> the BQL, which may happen if it calls, e.g., qemu_cond_wait_bql().
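
To illustrate the ordering problem listed above, here is a minimal toy
model in C. It is only a sketch under stated assumptions: none of the
names below are QEMU APIs, and the model assumes that the queued CPU work
re-arms the timer, while an armed, already-expired timer fires as soon as
the loop looks at it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not QEMU code. */
struct toy_state {
    bool timer_armed;   /* timer is armed and already expired */
    bool work_pending;  /* queued CPU work that re-arms the timer */
    int expirations;    /* number of times the timer callback ran */
};

/* Assumption: the work item re-arms the (already expired) timer. */
static void toy_run_work(struct toy_state *s)
{
    if (s->work_pending) {
        s->work_pending = false;
        s->timer_armed = true;
    }
}

/* An armed, expired timer fires once and is then disarmed. */
static void toy_run_timer(struct toy_state *s)
{
    if (s->timer_armed) {
        s->timer_armed = false;
        s->expirations++;
    }
}

/*
 * Start with both an expired timer and pending work, as after a snapshot
 * load in this model. work_first selects which of the two the loop
 * notices first; afterwards everything is drained.
 */
int toy_count_expirations(bool work_first)
{
    struct toy_state s = {
        .timer_armed = true,
        .work_pending = true,
        .expirations = 0,
    };

    if (work_first) {
        toy_run_work(&s);
    }
    while (s.timer_armed || s.work_pending) {
        toy_run_timer(&s);
        toy_run_work(&s);
    }
    assert(!s.timer_armed && !s.work_pending);
    return s.expirations;
}
```

In this model toy_count_expirations(true) returns 1 and
toy_count_expirations(false) returns 2, so the number of timer callbacks
depends only on which event the loop happens to see first.
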
>
> Firing TOD timer twice is duplicate work, but it was introduced
> intentionally in commit 7c12f710bad6 ("s390x/tcg: rearm the CKC timer
> during migration") in order to avoid dependency on migration order.
>
> The key culprits here are timers that are armed already expired. They
> break the ordering between timers and CPU work, because they are not
> constrained by instruction execution, thus introducing non-determinism
> and record-replay divergence.
>
> Fix by converting such timer callbacks to CPU work. Also add TOD clock
> updates to the save path, mirroring the load path, in order to have the
> same CHECKPOINT_CLOCK_VIRTUAL during recording and replaying.
>
> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> ---
> hw/s390x/tod.c | 5 +++++
> stubs/async-run-on-cpu.c | 7 +++++++
> stubs/cpus-queue.c | 4 ++++
> stubs/meson.build | 2 ++
> target/s390x/machine.c | 4 ++++
> util/qemu-timer.c | 30 ++++++++++++++++++++++++++++++
> 6 files changed, 52 insertions(+)
> create mode 100644 stubs/async-run-on-cpu.c
> create mode 100644 stubs/cpus-queue.c

Thanks, this indeed fixes the test for me, so:
Tested-by: Thomas Huth <thuth@redhat.com>