qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Fabiano Rosas <farosas@suse.de>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org
Subject: Re: [PATCH] migration: Fix qmp_query_migrate mbps value
Date: Thu, 22 Feb 2024 10:49:12 -0300	[thread overview]
Message-ID: <87ttm0a9t3.fsf@suse.de> (raw)
In-Reply-To: <ZddMRroqCeuRWGFA@x1n>

Peter Xu <peterx@redhat.com> writes:

> On Thu, Feb 22, 2024 at 05:40:41PM +0800, Peter Xu wrote:
>> On Wed, Feb 21, 2024 at 09:56:36AM -0300, Fabiano Rosas wrote:
>> > Peter Xu <peterx@redhat.com> writes:
>> > 
>> > > On Mon, Feb 19, 2024 at 04:44:57PM -0300, Fabiano Rosas wrote:
>> > >> The QMP command query_migrate might see incorrect throughput numbers
>> > >> if it runs after we've set the migration completion status but before
>> > >> migration_calculate_complete() has updated s->total_time and s->mbps.
>> > >> 
>> > >> The migration status would show COMPLETED, but the throughput value
>> > >> would be the one from the last iteration and not the one from the
>> > >> whole migration. This will usually be a larger value due to the time
>> > >> period being smaller (one iteration).
>> > >> 
>> > >> Move migration_calculate_complete() earlier so that the status
>> > >> MIGRATION_STATUS_COMPLETED is only emitted after the final counters
>> > >> update.
>> > >> 
>> > >> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> > >> ---
>> > >> CI run: https://gitlab.com/farosas/qemu/-/pipelines/1182405776
>> > >> ---
>> > >>  migration/migration.c | 10 ++++++----
>> > >>  1 file changed, 6 insertions(+), 4 deletions(-)
>> > >> 
>> > >> diff --git a/migration/migration.c b/migration/migration.c
>> > >> index ab21de2cad..7486d59da0 100644
>> > >> --- a/migration/migration.c
>> > >> +++ b/migration/migration.c
>> > >> @@ -102,6 +102,7 @@ static int migration_maybe_pause(MigrationState *s,
>> > >>                                   int new_state);
>> > >>  static void migrate_fd_cancel(MigrationState *s);
>> > >>  static bool close_return_path_on_source(MigrationState *s);
>> > >> +static void migration_calculate_complete(MigrationState *s);
>> > >>  
>> > >>  static void migration_downtime_start(MigrationState *s)
>> > >>  {
>> > >> @@ -2746,6 +2747,7 @@ static void migration_completion(MigrationState *s)
>> > >>          migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
>> > >>                            MIGRATION_STATUS_COLO);
>> > >>      } else {
>> > >> +        migration_calculate_complete(s);
>> > >>          migrate_set_state(&s->state, current_active_state,
>> > >>                            MIGRATION_STATUS_COMPLETED);
>> > >>      }
>> > >> @@ -2784,6 +2786,7 @@ static void bg_migration_completion(MigrationState *s)
>> > >>          goto fail;
>> > >>      }
>> > >>  
>> > >> +    migration_calculate_complete(s);
>> > >>      migrate_set_state(&s->state, current_active_state,
>> > >>                        MIGRATION_STATUS_COMPLETED);
>> > >>      return;
>> > >> @@ -2993,12 +2996,15 @@ static void migration_calculate_complete(MigrationState *s)
>> > >>      int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>> > >>      int64_t transfer_time;
>> > >>  
>> > >> +    /* QMP could read from these concurrently */
>> > >> +    bql_lock();
>> > >>      migration_downtime_end(s);
>> > >>      s->total_time = end_time - s->start_time;
>> > >>      transfer_time = s->total_time - s->setup_time;
>> > >>      if (transfer_time) {
>> > >>          s->mbps = ((double) bytes * 8.0) / transfer_time / 1000;
>> > >>      }
>> > >> +    bql_unlock();
>> > >
>> > > The lock is not needed?
>> > >
>> > > AFAIU that was needed because of things like runstate_set() rather than
>> > > setting of these fields.
>> > >
>> > 
>> > Don't we need to keep the total_time and mbps update atomic? Otherwise
>> > query-migrate might see (say) total_time=0 and mbps=<correct value> or
>> > total_time=<correct value> and mbps=<previous value>.
>> 
>> I thought it wasn't a major concern, but what you said makes sense; taking
>> it one more time doesn't really hurt after all to provide such benefit.
>> 
>> > 
>> > Also, what orders s->mbps update before the s->state update? I'd say we
>> > should probably hold the lock around the whole total_time,mbps,state
>> > update.
>> 
>> IMHO that's fine; mutex unlock implies a RELEASE.  See atomic.rst:
>> 
>> - ``pthread_mutex_lock`` has acquire semantics, ``pthread_mutex_unlock`` has
>>   release semantics and synchronizes with a ``pthread_mutex_lock`` for the
>>   same mutex.
>
> Hmm perhaps I wrote too soon.. it should only guarantee the ordering of the
> update on the lock variable itself v.s. any previous R&Ws, nothing else.
> Only if the other side uses bql_lock() will it guarantee proper ordering.
>
> Put them in bql should work, but I hesitate such use to start using bql
> to protect state updates.

Well, on the other hand that's a major use-case of the BQL: protecting
state that's used by QMP.

>
> How about we drop the lock, but use an explicit smp_mb_release()?  We may
> also want to use smb_load_acquire() in fill_source_migration_info() to use
> on reading &s->state (all will need some comment).  To me, making sure the
> total mbps is valid seems more important; while the other races are less
> harmful, and may not be a major concern?

That more closely reflects the problem we're trying to solve, which is
just an ordering one. However, the QMP code already holds the BQL, we
could just take benefit of that instead of adding more complex
synchronization primitives.

May I suggest we keep it simple and move that last migrate_set_state
into the BQL as well?

>
> PS: logically I think smp_mb_release() is not needed either, because state
> is updated using qatomic_cmpxchg(), which implies a full __ATOMIC_SEQ_CST.
>
>> 
>> > 
>> > I'm not entirely sure, what do you think?
>> > 
>> > > See migration_update_counters() where it also updates mbps without holding
>> > > a lock.
>> > 
>> > Here it might be less important since it's the middle of the migration,
>> > there will proabably be more than one query-migrate which would see the
>> > correct values.
>> 
>> Yep.  I queued this.
>> 
>> Thanks,
>> 
>> -- 
>> Peter Xu


  reply	other threads:[~2024-02-22 13:50 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-19 19:44 [PATCH] migration: Fix qmp_query_migrate mbps value Fabiano Rosas
2024-02-21  2:52 ` Peter Xu
2024-02-21 12:56   ` Fabiano Rosas
2024-02-22  9:40     ` Peter Xu
2024-02-22 13:29       ` Peter Xu
2024-02-22 13:49         ` Fabiano Rosas [this message]
2024-02-23  0:08           ` Peter Xu
2024-02-23 12:39             ` Fabiano Rosas
2024-02-26  1:44               ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87ttm0a9t3.fsf@suse.de \
    --to=farosas@suse.de \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).