From: Pavel Dovgalyuk <pavel.dovgalyuk@ispras.ru>
To: Nicholas Piggin <npiggin@gmail.com>,
Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Cc: qemu-devel@nongnu.org, qemu-ppc@nongnu.org,
Richard Henderson <richard.henderson@linaro.org>,
Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [RFC v2 PATCH] record-replay: support SMP target machine
Date: Tue, 22 Aug 2023 07:44:43 +0300 [thread overview]
Message-ID: <48d32058-dd1b-a2ed-42f2-e21c099bc0e3@ispras.ru> (raw)
In-Reply-To: <20230811014700.39172-1-npiggin@gmail.com>
On 11.08.2023 04:47, Nicholas Piggin wrote:
> RR CPU switching is driven by timers and events so it is deterministic
> like everything else. Record a CPU switch event and use that to drive
> the CPU switch on replay.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> This is still in RFC phase because so far I've only really testd ppc
> pseries, and only with patches that are not yet upstream (but posted
> to list).
>
> It works with smp 2, can step, reverse-step, reverse-continue, etc.
> throughout a Linux boot.
I still didn't have time to test it, but here are some comments.
>
> One issue is reverse-step on one gdb thread (vCPU) only steps back one
> icount, so if another thread is currently running then it is that one
> which goes back one instruction and the selected thread doesn't move. I
> would call this a separate issue from the record-replay mechanism, which
> is in the replay-debugging policy. I think we could record in each vCPU
> an icount of the last instruction it executed before switching, then
> reverse step for that vCPU could replay to there. I think that's not so
> important yet until this mechanism is solid. But if you test and rsi is
> not going backwards, then check your other threads.
>
> Thanks,
> Nick
>
>
> accel/tcg/tcg-accel-ops-icount.c | 9 +++-
> accel/tcg/tcg-accel-ops-rr.c | 73 +++++++++++++++++++++++++++++---
> include/exec/replay-core.h | 3 ++
> replay/replay-internal.h | 1 +
> replay/replay.c | 34 ++++++++++++++-
> scripts/replay-dump.py | 5 +++
> softmmu/vl.c | 4 --
> 7 files changed, 115 insertions(+), 14 deletions(-)
>
> diff --git a/accel/tcg/tcg-accel-ops-icount.c b/accel/tcg/tcg-accel-ops-icount.c
> index 3d2cfbbc97..c26782a56a 100644
> --- a/accel/tcg/tcg-accel-ops-icount.c
> +++ b/accel/tcg/tcg-accel-ops-icount.c
> @@ -93,10 +93,15 @@ void icount_handle_deadline(void)
> int64_t icount_percpu_budget(int cpu_count)
> {
> int64_t limit = icount_get_limit();
> - int64_t timeslice = limit / cpu_count;
> + int64_t timeslice;
>
> - if (timeslice == 0) {
> + if (replay_mode == REPLAY_MODE_PLAY) {
> timeslice = limit;
> + } else {
> + timeslice = limit / cpu_count;
> + if (timeslice == 0) {
> + timeslice = limit;
> + }
> }
>
> return timeslice;
> diff --git a/accel/tcg/tcg-accel-ops-rr.c b/accel/tcg/tcg-accel-ops-rr.c
> index 212d6f8df4..ce040a687e 100644
> --- a/accel/tcg/tcg-accel-ops-rr.c
> +++ b/accel/tcg/tcg-accel-ops-rr.c
> @@ -27,6 +27,7 @@
> #include "qemu/lockable.h"
> #include "sysemu/tcg.h"
> #include "sysemu/replay.h"
> +#include "sysemu/reset.h"
> #include "sysemu/cpu-timers.h"
> #include "qemu/main-loop.h"
> #include "qemu/notify.h"
> @@ -61,6 +62,22 @@ void rr_kick_vcpu_thread(CPUState *unused)
>
> static QEMUTimer *rr_kick_vcpu_timer;
> static CPUState *rr_current_cpu;
> +static CPUState *rr_next_cpu;
> +static CPUState *rr_last_cpu;
> +
> +/*
> + * Reset the vCPU scheduler to the initial state.
> + */
> +static void record_replay_reset(void *param)
> +{
> + if (rr_kick_vcpu_timer) {
> + timer_del(rr_kick_vcpu_timer);
> + }
> + g_assert(!rr_current_cpu);
> + rr_next_cpu = NULL;
> + rr_last_cpu = first_cpu;
> + current_cpu = NULL;
> +}
>
> static inline int64_t rr_next_kick_time(void)
> {
> @@ -184,6 +201,8 @@ static void *rr_cpu_thread_fn(void *arg)
> Notifier force_rcu;
> CPUState *cpu = arg;
>
> + qemu_register_reset(record_replay_reset, NULL);
> +
> assert(tcg_enabled());
> rcu_register_thread();
> force_rcu.notify = rr_force_rcu;
> @@ -238,14 +257,20 @@ static void *rr_cpu_thread_fn(void *arg)
> cpu_budget = icount_percpu_budget(cpu_count);
> }
>
> + if (!rr_next_cpu) {
> + qatomic_set_mb(&rr_next_cpu, first_cpu);
> + }
> + cpu = rr_next_cpu;
> +
> + if (cpu != rr_last_cpu) {
> + replay_switch_cpu();
> + qatomic_set_mb(&rr_last_cpu, cpu);
> + }
> +
> rr_start_kick_timer();
>
> replay_mutex_unlock();
>
> - if (!cpu) {
> - cpu = first_cpu;
> - }
> -
> while (cpu && cpu_work_list_empty(cpu) && !cpu->exit_request) {
> /* Store rr_current_cpu before evaluating cpu_can_run(). */
> qatomic_set_mb(&rr_current_cpu, cpu);
> @@ -284,7 +309,34 @@ static void *rr_cpu_thread_fn(void *arg)
> break;
> }
>
> - cpu = CPU_NEXT(cpu);
> + if (replay_mode == REPLAY_MODE_NONE) {
> + cpu = CPU_NEXT(cpu);
> + } else if (replay_mode == REPLAY_MODE_RECORD) {
> + /*
> + * Exit the loop immediately so CPU switch events can be
> + * recorded. This may be able to be improved to record
> + * switch events here.
> + */
> + cpu = CPU_NEXT(cpu);
> + break;
> + } else if (replay_mode == REPLAY_MODE_PLAY) {
> + /*
> + * Play can exit from tcg_cpus_exec at different times than
> + * record, because icount budget is set to next non-insn
> + * event which could be an exception or something that
> + * tcg_cpus_exec can process internally if icount limit was
> + * not reached. So we don't necessarily switch CPU here,
> + * only if there is a switch in the trace.
> + */
> + qemu_mutex_unlock_iothread();
> + replay_mutex_lock();
> + qemu_mutex_lock_iothread();
> + if (replay_has_switch_cpu()) {
> + cpu = CPU_NEXT(cpu);
> + }
> + replay_mutex_unlock();
> + break;
> + }
> } /* while (cpu && !cpu->exit_request).. */
>
> /* Does not need a memory barrier because a spurious wakeup is okay. */
> @@ -294,9 +346,9 @@ static void *rr_cpu_thread_fn(void *arg)
> qatomic_set_mb(&cpu->exit_request, 0);
> }
>
> - if (all_cpu_threads_idle()) {
> - rr_stop_kick_timer();
> + qatomic_set(&rr_next_cpu, cpu);
This does not seem to be in the mainline.
>
> + if (all_cpu_threads_idle()) {
> if (icount_enabled()) {
> /*
> * When all cpus are sleeping (e.g in WFI), to avoid a deadlock
> @@ -304,6 +356,13 @@ static void *rr_cpu_thread_fn(void *arg)
> * timer.
> */
> qemu_notify_event();
> + } else {
> + /*
> + * icount timer won't expire when not executing instructions
> + * so no need to stop it. Avoiding restarts is also important
> + * for record-replay to avoid clock events.
> + */
> + rr_stop_kick_timer();
> }
> }
>
> diff --git a/include/exec/replay-core.h b/include/exec/replay-core.h
> index 244c77acce..543c129a1d 100644
> --- a/include/exec/replay-core.h
> +++ b/include/exec/replay-core.h
> @@ -52,6 +52,9 @@ void replay_gdb_attached(void);
>
> /* Interrupts and exceptions */
>
> +bool replay_switch_cpu(void);
> +bool replay_has_switch_cpu(void);
> +
> /* Called by exception handler to write or read exception processing events */
> bool replay_exception(void);
> /*
> diff --git a/replay/replay-internal.h b/replay/replay-internal.h
> index b6836354ac..95849e7461 100644
> --- a/replay/replay-internal.h
> +++ b/replay/replay-internal.h
> @@ -58,6 +58,7 @@ enum ReplayEvents {
> /* some of greater codes are reserved for checkpoints */
> EVENT_CHECKPOINT,
> EVENT_CHECKPOINT_LAST = EVENT_CHECKPOINT + CHECKPOINT_COUNT - 1,
> + EVENT_SWITCH_CPU,
> /* end of log event */
> EVENT_END,
> EVENT_COUNT
> diff --git a/replay/replay.c b/replay/replay.c
> index 0f7d766efe..e2d77ab644 100644
> --- a/replay/replay.c
> +++ b/replay/replay.c
> @@ -98,9 +98,41 @@ void replay_account_executed_instructions(void)
> }
> }
>
> -bool replay_exception(void)
> +bool replay_switch_cpu(void)
> +{
> + if (replay_mode == REPLAY_MODE_RECORD) {
> + g_assert(replay_mutex_locked());
> + replay_save_instructions();
> + replay_put_event(EVENT_SWITCH_CPU);
> + return true;
> + } else if (replay_mode == REPLAY_MODE_PLAY) {
> + bool res = replay_has_switch_cpu();
> + if (res) {
> + replay_finish_event();
> + } else {
> + g_assert_not_reached();
> + }
> + return res;
> + }
> +
> + return true;
> +}
> +
> +bool replay_has_switch_cpu(void)
Is this function really needed?
> {
> + bool res = false;
> + if (replay_mode == REPLAY_MODE_PLAY) {
> + g_assert(replay_mutex_locked());
> + replay_account_executed_instructions();
> + res = replay_next_event_is(EVENT_SWITCH_CPU);
> + }
> +
> + return res;
> +}
>
> +
> +bool replay_exception(void)
> +{
> if (replay_mode == REPLAY_MODE_RECORD) {
> g_assert(replay_mutex_locked());
> replay_save_instructions();
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index b0b96f67fa..946b22d1de 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -1869,10 +1869,6 @@ static void qemu_apply_machine_options(QDict *qdict)
> /* fall back to the -kernel/-append */
> semihosting_arg_fallback(current_machine->kernel_filename, current_machine->kernel_cmdline);
> }
> -
> - if (current_machine->smp.cpus > 1) {
> - replay_add_blocker("smp");
> - }
> }
>
> static void qemu_create_early_backends(void)
next prev parent reply other threads:[~2023-08-22 4:45 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-11 1:47 [RFC v2 PATCH] record-replay: support SMP target machine Nicholas Piggin
2023-08-22 4:44 ` Pavel Dovgalyuk [this message]
2023-08-25 8:22 ` Nicholas Piggin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=48d32058-dd1b-a2ed-42f2-e21c099bc0e3@ispras.ru \
--to=pavel.dovgalyuk@ispras.ru \
--cc=npiggin@gmail.com \
--cc=pavel.dovgaluk@ispras.ru \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=qemu-ppc@nongnu.org \
--cc=richard.henderson@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).