From: "Nicholas Piggin" <npiggin@gmail.com>
To: "Alex Bennée" <alex.bennee@linaro.org>
Cc: qemu-devel@nongnu.org,
"Pavel Dovgalyuk" <Pavel.Dovgalyuk@ispras.ru>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>,
"Richard Henderson" <richard.henderson@linaro.org>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"John Snow" <jsnow@redhat.com>, "Cleber Rosa" <crosa@redhat.com>,
"Wainer dos Santos Moschetta" <wainersm@redhat.com>,
"Beraldo Leal" <bleal@redhat.com>,
"Michael Tokarev" <mjt@tls.msk.ru>
Subject: Re: [PATCH v4 05/24] Revert "replay: stop us hanging in rr_wait_io_event"
Date: Wed, 13 Mar 2024 00:03:43 +1000 [thread overview]
Message-ID: <CZRTTCV604Z3.3MT5MR0P5FWQK@wheely> (raw)
In-Reply-To: <87v85ro9qt.fsf@draig.linaro.org>
On Tue Mar 12, 2024 at 11:33 PM AEST, Alex Bennée wrote:
> Nicholas Piggin <npiggin@gmail.com> writes:
>
> > This reverts commit 1f881ea4a444ef36a8b6907b0b82be4b3af253a2.
> >
> > That commit causes reverse_debugging.py test failures, and does
> > not seem to solve the root cause of the problem x86-64 still
> > hangs in record/replay tests.
>
> I'm still finding the reverse debugging tests failing with this series.
:(
In gitlab CI or your own testing? What are you running exactly?
I found 440fx must have some bug becaues it's quite bad, but
q35 was more stable.
>
> > The problem with short-cutting the iowait that was taken during
> > record phase is that related events will not get consumed at the
> > same points (e.g., reading the clock).
> >
> > A hang with zero icount always seems to be a symptom of an earlier
> > problem that has caused the recording to become out of synch with
> > the execution and consumption of events by replay.
>
> Would it be possible to still detect the failure mode rather than a full
> revert?
I'm not actually sure exactly how this patch causes test failures
(or how it was fixing the ones you saw). If I can reproduce the
ones you see I could look a bit deeper at it.
I have been looking at some ways to try detect/report record/replay
errors more sanely because it can be very hard to debug them at
the moment. No patches quite ready to post yet, but yes if there's
something we could do here to help we should.
Thanks,
Nick
>
> >
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > ---
> > include/sysemu/replay.h | 5 -----
> > accel/tcg/tcg-accel-ops-rr.c | 2 +-
> > replay/replay.c | 21 ---------------------
> > 3 files changed, 1 insertion(+), 27 deletions(-)
> >
> > diff --git a/include/sysemu/replay.h b/include/sysemu/replay.h
> > index f229b2109c..8102fa54f0 100644
> > --- a/include/sysemu/replay.h
> > +++ b/include/sysemu/replay.h
> > @@ -73,11 +73,6 @@ int replay_get_instructions(void);
> > /*! Updates instructions counter in replay mode. */
> > void replay_account_executed_instructions(void);
> >
> > -/**
> > - * replay_can_wait: check if we should pause for wait-io
> > - */
> > -bool replay_can_wait(void);
> > -
> > /* Processing clocks and other time sources */
> >
> > /*! Save the specified clock */
> > diff --git a/accel/tcg/tcg-accel-ops-rr.c b/accel/tcg/tcg-accel-ops-rr.c
> > index 894e73e52c..a942442a33 100644
> > --- a/accel/tcg/tcg-accel-ops-rr.c
> > +++ b/accel/tcg/tcg-accel-ops-rr.c
> > @@ -109,7 +109,7 @@ static void rr_wait_io_event(void)
> > {
> > CPUState *cpu;
> >
> > - while (all_cpu_threads_idle() && replay_can_wait()) {
> > + while (all_cpu_threads_idle()) {
> > rr_stop_kick_timer();
> > qemu_cond_wait_bql(first_cpu->halt_cond);
> > }
> > diff --git a/replay/replay.c b/replay/replay.c
> > index b8564a4813..895fa6b67a 100644
> > --- a/replay/replay.c
> > +++ b/replay/replay.c
> > @@ -451,27 +451,6 @@ void replay_start(void)
> > replay_enable_events();
> > }
> >
> > -/*
> > - * For none/record the answer is yes.
> > - */
> > -bool replay_can_wait(void)
> > -{
> > - if (replay_mode == REPLAY_MODE_PLAY) {
> > - /*
> > - * For playback we shouldn't ever be at a point we wait. If
> > - * the instruction count has reached zero and we have an
> > - * unconsumed event we should go around again and consume it.
> > - */
> > - if (replay_state.instruction_count == 0 && replay_state.has_unread_data) {
> > - return false;
> > - } else {
> > - replay_sync_error("Playback shouldn't have to iowait");
> > - }
> > - }
> > - return true;
> > -}
> > -
> > -
> > void replay_finish(void)
> > {
> > if (replay_mode == REPLAY_MODE_NONE) {
next prev parent reply other threads:[~2024-03-12 14:04 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-11 17:40 [PATCH v4 00/24] replay: fixes and new test cases Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 01/24] scripts/replay-dump.py: Update to current rr record format Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 02/24] scripts/replay-dump.py: rejig decoders in event number order Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 03/24] tests/avocado: excercise scripts/replay-dump.py in replay tests Nicholas Piggin
2024-03-12 13:25 ` Alex Bennée
2024-03-11 17:40 ` [PATCH v4 04/24] replay: allow runstate shutdown->running when replaying trace Nicholas Piggin
2024-03-12 13:26 ` Alex Bennée
2024-03-11 17:40 ` [PATCH v4 05/24] Revert "replay: stop us hanging in rr_wait_io_event" Nicholas Piggin
2024-03-12 13:33 ` Alex Bennée
2024-03-12 14:03 ` Nicholas Piggin [this message]
2024-03-12 21:03 ` Alex Bennée
2024-03-13 5:27 ` Nicholas Piggin
2024-03-14 5:19 ` Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 06/24] chardev: set record/replay on the base device of a muxed device Nicholas Piggin
2024-03-12 12:39 ` Marc-André Lureau
2024-03-12 14:11 ` Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 07/24] replay: Fix migration use of clock Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 08/24] replay: Fix migration replay_mutex locking Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 09/24] virtio-net: Use replay_schedule_bh_event for bhs that affect machine state Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 10/24] virtio-net: Use virtual time for RSC timers Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 11/24] net: Use virtual time for net announce Nicholas Piggin
2024-03-12 9:09 ` Pavel Dovgalyuk
2024-03-12 11:05 ` Nicholas Piggin
2024-03-12 11:12 ` Pavel Dovgalyuk
2024-03-13 5:38 ` Nicholas Piggin
2024-03-13 7:09 ` Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 12/24] savevm: Fix load_snapshot error path crash Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 13/24] tests/avocado: replay_linux.py remove the timeout expected guards Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 14/24] tests/avocado/reverse_debugging.py: mark aarch64 and pseries as not flaky Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 15/24] tests/avocado: reverse_debugging.py add test for x86-64 q35 machine Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 16/24] tests/avocado: reverse_debugging.py verify addresses between record and replay Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 17/24] tests/avocado: reverse_debugging.py stop VM before sampling icount Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 18/24] tests/avocado: reverse_debugging reverse-step at the end of the trace Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 19/24] tests/avocado: reverse_debugging.py add snapshot testing Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 20/24] replay: simple auto-snapshot mode for record Nicholas Piggin
2024-03-12 9:00 ` Pavel Dovgalyuk
2024-03-12 10:43 ` Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 21/24] tests/avocado: reverse_debugging.py test auto-snapshot mode Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 22/24] target/ppc: fix timebase register reset state Nicholas Piggin
2024-03-12 13:24 ` Alex Bennée
2024-03-12 13:47 ` Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 23/24] spapr: Fix vpa dispatch count for record-replay Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 24/24] tests/avocado: replay_linux.py add ppc64 pseries test Nicholas Piggin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CZRTTCV604Z3.3MT5MR0P5FWQK@wheely \
--to=npiggin@gmail.com \
--cc=Pavel.Dovgalyuk@ispras.ru \
--cc=alex.bennee@linaro.org \
--cc=bleal@redhat.com \
--cc=crosa@redhat.com \
--cc=jsnow@redhat.com \
--cc=mjt@tls.msk.ru \
--cc=pbonzini@redhat.com \
--cc=philmd@linaro.org \
--cc=qemu-devel@nongnu.org \
--cc=richard.henderson@linaro.org \
--cc=wainersm@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).