* Record/replay thread determinism
From: Jim MacArthur @ 2026-02-20 14:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, richard.henderson

It looks like we have a solution to the RCU patch which was causing problems with the func-alpha-replay test (see 20260217-alpha-v1-0-0dcc708c9db3@rsg.ci.i.u-tokyo.ac.jp).
While this was going on I spent a bit of time investigating repeatability in record/replay and I think there may be broader problems with record & replay.

While running the func-alpha-replay test we have two threads reading or writing the replay event log; the "main" thread running qemu_main_loop and the "RR" (round robin) thread running rr_cpu_thread_fn. Both of these use replay_mutex_lock() and bql_lock() to synchronize some actions. There's a third thread running RCU maintenance which also uses bql_lock(), but not replay_mutex_lock().

replay_mutex_lock() has some extra logic to improve fairness of locking. This means that the first caller of replay_mutex_lock() should obtain the lock first. However, so far as I can see, this doesn't make the scheduling of the Main and RR threads deterministic.
I have observed times when neither of those threads holds the lock, and as such, there's no way to predict which will call replay_mutex_lock() first. This means the ordering of events during either recording or replay is not deterministic.
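As a minimal illustration of the point (a toy Python model, not QEMU code; `FifoLock` and `record_events` are hypothetical names invented for this sketch): a first-come-first-served lock hands out ownership in arrival order, so the resulting event log simply mirrors whichever arrival order the host scheduler happened to produce.

```python
from collections import deque


class FifoLock:
    """Toy model of a fair lock: waiters are served strictly in arrival order."""

    def __init__(self):
        self.waiters = deque()

    def request(self, thread_name):
        # A thread arrives and queues up for the lock.
        self.waiters.append(thread_name)

    def grant_next(self):
        # Fairness: the earliest waiter gets the lock next.
        return self.waiters.popleft()


def record_events(arrival_order):
    """Return the event log produced when threads arrive in the given order."""
    lock = FifoLock()
    for name in arrival_order:
        lock.request(name)
    log = []
    while lock.waiters:
        log.append(lock.grant_next())  # one logged event per lock hold
    return log


# Same workload, two arrival orders the host scheduler could plausibly produce.
run_a = record_events(["main", "rr", "main", "rr"])
run_b = record_events(["rr", "main", "main", "rr"])
print(run_a)
print(run_b)
print(run_a == run_b)
```

The lock is perfectly fair in both runs, yet the two logs differ, because fairness constrains who wins among current waiters and says nothing about who arrives first.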

It is possible to alter the lock function such that the two threads will run in lockstep; see https://gitlab.com/jmacarthur/qemu-jmac-development/-/commits/jmac/replay-tick-tock for a rough demonstration. Adding this significantly reduced timeouts on func-alpha-replay; I can also see that the replay recordings are much more consistent from one recording to the next; typically diverging around the 380000th event, rather than the 20th event without this hack.
This is not a good fix since it slows QEMU down significantly and may be prone to deadlocks, but I think this demonstrates that the current system is not perfect.
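The lockstep idea can be sketched in a few lines of Python. This is only a model of strict alternation under my own assumptions, not a copy of the code on the branch linked above; `TickTockLock` and `run` are hypothetical names.

```python
import threading


class TickTockLock:
    """Toy lock that forces two named threads to take strictly alternating turns."""

    def __init__(self, first, second):
        self._cond = threading.Condition()
        self._turn = first
        self._other = {first: second, second: first}

    def acquire(self, name):
        with self._cond:
            # Block until it is this thread's turn, regardless of who arrived first.
            self._cond.wait_for(lambda: self._turn == name)

    def release(self, name):
        with self._cond:
            # Hand the turn to the other thread and wake it up.
            self._turn = self._other[name]
            self._cond.notify_all()


def run(iterations=5):
    lock = TickTockLock("main", "rr")
    log = []

    def worker(name):
        for _ in range(iterations):
            lock.acquire(name)
            log.append(name)  # stands in for writing one replay event
            lock.release(name)

    # Start "rr" first on purpose: start order must not affect the log.
    threads = [threading.Thread(target=worker, args=(n,)) for n in ("rr", "main")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


print(run())
```

The log alternates main, rr, main, rr on every run, whatever the host scheduler does. The sketch also shows both costs mentioned above: each thread stalls until the other has taken its turn, and if one thread stops taking turns the other waits forever.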

Do you agree with my analysis above? Is there something I've missed which is meant to deterministically schedule these two threads?

Jim MacArthur



* Re: Record/replay thread determinism
From: Alex Bennée @ 2026-02-20 15:06 UTC (permalink / raw)
  To: Jim MacArthur; +Cc: qemu-devel, richard.henderson

Jim MacArthur <jim.macarthur@linaro.org> writes:

> It looks like we have a solution to the RCU patch which was causing problems with the func-alpha-replay test (see 20260217-alpha-v1-0-0dcc708c9db3@rsg.ci.i.u-tokyo.ac.jp).
> While this was going on I spent a bit of time investigating repeatability in record/replay and I think there may be broader problems with record & replay.
>
> While running the func-alpha-replay test we have two threads reading
> or writing the replay event log; the "main" thread running
> qemu_main_loop and the "RR" (round robin) thread running
> rr_cpu_thread_fn. Both of these use replay_mutex_lock() and bql_lock()
> to synchronize some actions. There's a third thread running RCU
> maintenance which also uses bql_lock(), but not replay_mutex_lock().
>
> replay_mutex_lock() has some extra logic to improve fairness of
> locking. This means that the first caller of replay_mutex_lock()
> should obtain the lock first. However, so far as I can see, this
> doesn't make the scheduling of the Main and RR threads deterministic.
> I have observed times when neither of those threads holds the lock,
> and as such, there's no way to predict which will call
> replay_mutex_lock() first. This means the ordering of events during
> either recording or replay is not deterministic.

The replay_lock was a kludge we added when we did the original
transition to multi-threaded TCG which involved nailing down the BQL
calls that had previously kept everything in sync.

However if we could keep all replay events in the single RR thread we
could get rid of replay lock because everything should behave serially.

> It is possible to alter the lock function such that the two threads
> will run in lockstep; see
> https://gitlab.com/jmacarthur/qemu-jmac-development/-/commits/jmac/replay-tick-tock
> for a rough demonstration. Adding this significantly reduced timeouts
> on func-alpha-replay; I can also see that the replay recordings are
> much more consistent from one recording to the next; typically
> diverging around the 380000th event, rather than the 20th event
> without this hack.
> This is not a good fix since it slows QEMU down significantly and may be prone to deadlocks, but I think this demonstrates that the current system is not perfect.
>
> Do you agree with my analysis above? Is there something I've missed which is meant to deterministically schedule these two threads?
>
> Jim MacArthur

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



* Re: Record/replay thread determinism
From: Jim MacArthur @ 2026-02-27 14:02 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, richard.henderson

On Fri, Feb 20, 2026 at 03:06:20PM +0000, Alex Bennée wrote:
> Jim MacArthur <jim.macarthur@linaro.org> writes:
> 
> > It looks like we have a solution to the RCU patch which was causing problems with the func-alpha-replay test (see 20260217-alpha-v1-0-0dcc708c9db3@rsg.ci.i.u-tokyo.ac.jp).
> > While this was going on I spent a bit of time investigating repeatability in record/replay and I think there may be broader problems with record & replay.
> >
> > While running the func-alpha-replay test we have two threads reading
> > or writing the replay event log; the "main" thread running
> > qemu_main_loop and the "RR" (round robin) thread running
> > rr_cpu_thread_fn. Both of these use replay_mutex_lock() and bql_lock()
> > to synchronize some actions. There's a third thread running RCU
> > maintenance which also uses bql_lock(), but not replay_mutex_lock().
> >
> > replay_mutex_lock() has some extra logic to improve fairness of
> > locking. This means that the first caller of replay_mutex_lock()
> > should obtain the lock first. However, so far as I can see, this
> > doesn't make the scheduling of the Main and RR threads deterministic.
> > I have observed times when neither of those threads holds the lock,
> > and as such, there's no way to predict which will call
> > replay_mutex_lock() first. This means the ordering of events during
> > either recording or replay is not deterministic.
> 
> The replay_lock was a kludge we added when we did the original
> transition to multi-threaded TCG which involved nailing down the BQL
> calls that had previously kept everything in sync.
> 
> However if we could keep all replay events in the single RR thread we
> could get rid of replay lock because everything should behave serially.

With these modifications:
  * Move qemu_clock_run_all_timers into the RR thread
  * Disable calling qemu_soonest_timeout in the main thread

... the number of record/replay events generated by the main thread falls drastically, and also the remaining events generated by the main thread are always at the very start and end of the log, so should not affect the ordering much.

Rather than removing calls to qemu_soonest_timeout altogether, another option is to modify its calls to qemu_clock_get_ns such that they do not record the clock times in the log. Since these functions only affect how long main waits while polling FDs, I would *guess* that they do not need to be recorded.
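A rough model of that second option, in Python rather than QEMU's C: `ReplayClock`, its `record` flag and `soonest_timeout_ns` are hypothetical names for this sketch and are not QEMU's API. It only shows the recording side, where a read used purely to size a poll wait leaves no entry in the log.

```python
import time


class ReplayClock:
    """Toy clock that logs each read while recording, unless told not to."""

    def __init__(self, source=time.monotonic_ns):
        self.source = source
        self.log = []

    def get_ns(self, record=True):
        now = self.source()
        if record:
            self.log.append(now)  # this read becomes a replay event
        return now


def soonest_timeout_ns(clock, deadline_ns):
    """Time left until the deadline; only sizes a poll wait, so it is not logged."""
    return max(0, deadline_ns - clock.get_ns(record=False))


# A scripted time source keeps the example repeatable.
ticks = iter([100, 250, 400])
clock = ReplayClock(source=lambda: next(ticks))

clock.get_ns()                                         # logged
timeout = soonest_timeout_ns(clock, deadline_ns=1000)  # read but not logged
clock.get_ns()                                         # logged

print(clock.log, timeout)
```

Only the two ordinary reads appear in the log; the timeout calculation still sees the clock but adds no event. Whether an unlogged read is actually safe during replay in QEMU is the part I am guessing at.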

I have no idea how safe these modifications are yet, only that they remove the occasional errors we used to see while running the func-alpha-replay test.

Jim




