All of lore.kernel.org
 help / color / mirror / Atom feed
From: Fabiano Rosas <farosas@suse.de>
To: Nicholas Piggin <npiggin@gmail.com>, qemu-devel@nongnu.org
Cc: "Nicholas Piggin" <npiggin@gmail.com>,
	qemu-block@nongnu.org, "Alex Bennée" <alex.bennee@linaro.org>,
	"Kevin Wolf" <kwolf@redhat.com>,
	"Hanna Reitz" <hreitz@redhat.com>,
	"Pavel Dovgalyuk" <pavel.dovgaluk@ispras.ru>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"Fam Zheng" <fam@euphon.net>,
	"Ronnie Sahlberg" <ronniesahlberg@gmail.com>,
	"John Snow" <jsnow@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Jason Wang" <jasowang@redhat.com>,
	"Vladimir Sementsov-Ogievskiy" <vsementsov@yandex-team.ru>,
	"Peter Xu" <peterx@redhat.com>,
	"Dr. David Alan Gilbert" <dave@treblig.org>,
	"Markus Armbruster" <armbru@redhat.com>,
	"Michael Roth" <michael.roth@amd.com>,
	"Wainer dos Santos Moschetta" <wainersm@redhat.com>
Subject: Re: [PATCH 02/17] replay: Fix migration replay_mutex locking
Date: Fri, 20 Dec 2024 10:08:10 -0300	[thread overview]
Message-ID: <87r062tscl.fsf@suse.de> (raw)
In-Reply-To: <20241220104220.2007786-3-npiggin@gmail.com>

Nicholas Piggin <npiggin@gmail.com> writes:

Hi Nick,

I'm ignorant about replay, but we try to know why were taking the BQL in
the migration code, we move it around sometimes, etc. Can we be a bit
more strict with documentation here so we don't get stuck with a lock
that can't be changed?

> Migration causes a number of events that need to go in the replay
> trace, such as vm state transitions. The replay_mutex lock needs to
> be held for these.
>

Is it practical to explicitly list which events are those?

Are there any tests that exercise this that we could use to validate
changes around this area?

> The simplest approach seems to be just take it up-front when taking
> the bql.

But also the thing asserts if taken inside the BQL, so is the actual
matter here that we _cannot_ take the lock around the proper places?

I also see the replay lock around the main loop, so is it basically bql2
from the perspective of most of QEMU?

>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  migration/migration.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 2eb9e50a263..277fca954c1 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -24,6 +24,7 @@
>  #include "socket.h"
>  #include "sysemu/runstate.h"
>  #include "sysemu/sysemu.h"
> +#include "sysemu/replay.h"
>  #include "sysemu/cpu-throttle.h"
>  #include "rdma.h"
>  #include "ram.h"
> @@ -2518,6 +2519,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
>      }
>  
>      trace_postcopy_start();
> +    replay_mutex_lock();
>      bql_lock();
>      trace_postcopy_start_set_run();
>  
> @@ -2629,6 +2631,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
>      migration_downtime_end(ms);
>  
>      bql_unlock();
> +    replay_mutex_unlock();
>  
>      if (migrate_postcopy_ram()) {
>          /*
> @@ -2670,6 +2673,7 @@ fail:
>      }
>      migration_call_notifiers(ms, MIG_EVENT_PRECOPY_FAILED, NULL);
>      bql_unlock();
> +    replay_mutex_unlock();
>      return -1;
>  }
>  
> @@ -2721,6 +2725,7 @@ static int migration_completion_precopy(MigrationState *s,
>  {
>      int ret;
>  
> +    replay_mutex_lock();
>      bql_lock();
>  
>      if (!migrate_mode_is_cpr(s)) {
> @@ -2746,6 +2751,7 @@ static int migration_completion_precopy(MigrationState *s,
>                                               s->block_inactive);
>  out_unlock:
>      bql_unlock();
> +    replay_mutex_unlock();
>      return ret;
>  }
>  
> @@ -3633,6 +3639,7 @@ static void *bg_migration_thread(void *opaque)
>  
>      trace_migration_thread_setup_complete();
>  
> +    replay_mutex_lock();
>      bql_lock();
>  
>      if (migration_stop_vm(s, RUN_STATE_PAUSED)) {
> @@ -3666,6 +3673,7 @@ static void *bg_migration_thread(void *opaque)
>       */
>      migration_bh_schedule(bg_migration_vm_start_bh, s);
>      bql_unlock();
> +    replay_mutex_unlock();
>  
>      while (migration_is_active()) {
>          MigIterateState iter_state = bg_migration_iteration_run(s);
> @@ -3695,6 +3703,7 @@ fail:
>          migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
>                  MIGRATION_STATUS_FAILED);
>          bql_unlock();
> +        replay_mutex_unlock();
>      }
>  
>  fail_setup:


  reply	other threads:[~2024-12-20 13:09 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-20 10:42 [PATCH 00/17] replay: Fixes and avocado test updates Nicholas Piggin
2024-12-20 10:42 ` [PATCH 01/17] replay: Fix migration use of clock for statistics Nicholas Piggin
2024-12-20 16:31   ` Peter Xu
2024-12-21  3:02     ` Nicholas Piggin
2024-12-23 17:26       ` Peter Xu
2024-12-24  7:24         ` Pavel Dovgalyuk
2024-12-24 15:19           ` Peter Xu
2024-12-20 10:42 ` [PATCH 02/17] replay: Fix migration replay_mutex locking Nicholas Piggin
2024-12-20 13:08   ` Fabiano Rosas [this message]
2024-12-21  2:54     ` Nicholas Piggin
2024-12-20 10:42 ` [PATCH 03/17] async: rework async event API for replay Nicholas Piggin
2024-12-20 10:42 ` [PATCH 04/17] util/main-loop: Convert to new bh API Nicholas Piggin
2024-12-20 10:42 ` [PATCH 05/17] util/thread-pool: " Nicholas Piggin
2024-12-20 10:42 ` [PATCH 06/17] util/aio-wait: " Nicholas Piggin
2024-12-20 10:42 ` [PATCH 07/17] async/coroutine: " Nicholas Piggin
2024-12-20 10:42 ` [PATCH 08/17] migration: " Nicholas Piggin
2024-12-20 10:42 ` [PATCH 09/17] monitor: " Nicholas Piggin
2024-12-20 10:42 ` [PATCH 10/17] qmp: " Nicholas Piggin
2024-12-20 10:42 ` [PATCH 11/17] block: " Nicholas Piggin
2024-12-20 10:42 ` [PATCH 12/17] hw/ide: Fix record-replay and convert " Nicholas Piggin
2024-12-20 10:42 ` [PATCH 13/17] hw/scsi: Convert " Nicholas Piggin
2024-12-20 23:54   ` Paolo Bonzini
2024-12-21  3:17     ` Nicholas Piggin
2024-12-20 10:42 ` [PATCH 14/17] async: add debugging assertions for record/replay in bh APIs Nicholas Piggin
2024-12-20 10:42 ` [PATCH 15/17] tests/avocado/replay_linux: Fix compile error Nicholas Piggin
2024-12-20 10:42 ` [PATCH 16/17] tests/avocado/replay_linux: Fix cdrom device setup Nicholas Piggin
2024-12-20 10:42 ` [PATCH 17/17] tests/avocado/replay_linux: remove the timeout expected guards Nicholas Piggin
2024-12-20 11:42 ` [PATCH 00/17] replay: Fixes and avocado test updates Pavel Dovgalyuk

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87r062tscl.fsf@suse.de \
    --to=farosas@suse.de \
    --cc=alex.bennee@linaro.org \
    --cc=armbru@redhat.com \
    --cc=dave@treblig.org \
    --cc=fam@euphon.net \
    --cc=hreitz@redhat.com \
    --cc=jasowang@redhat.com \
    --cc=jsnow@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=michael.roth@amd.com \
    --cc=mst@redhat.com \
    --cc=npiggin@gmail.com \
    --cc=pavel.dovgaluk@ispras.ru \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=ronniesahlberg@gmail.com \
    --cc=stefanha@redhat.com \
    --cc=vsementsov@yandex-team.ru \
    --cc=wainersm@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.