All of lore.kernel.org
 help / color / mirror / Atom feed
From: Fabiano Rosas <farosas@suse.de>
To: Peter Xu <peterx@redhat.com>, qemu-devel@nongnu.org
Cc: "Dr . David Alan Gilbert" <dave@treblig.org>,
	peterx@redhat.com, Alexey Perevalov <a.perevalov@samsung.com>,
	Juraj Marcin <jmarcin@redhat.com>
Subject: Re: [PATCH 02/13] migration/postcopy: Push blocktime start/end into page req mutex
Date: Mon, 02 Jun 2025 14:46:09 -0300	[thread overview]
Message-ID: <87seki3ula.fsf@suse.de> (raw)
In-Reply-To: <20250527231248.1279174-3-peterx@redhat.com>

Peter Xu <peterx@redhat.com> writes:

> The postcopy blocktime feature was tricky that it used quite some atomic
> operations over quite a few arrays and vars, without explaining how that
> would be thread safe.  The thread safety here is about concurrency between
> the fault thread and the fault resolution threads, possible to access the
> same chunk of data.  All these atomic ops can be expensive too before
> knowing clearly how it works.
>
> OTOH, postcopy has one page_request_mutex used to serialize the received
> bitmap updates.  So far it's ok - we don't yet have a lot of threads
> contending the lock.  It might change after multifd will be supported, but
> that's a separate story.  What is important is, with that mutex, it's
> pretty lightweight to move all the blocktime maintenance into the mutex
> critical section.  It's because the blocktime layer is lightweighted:
> almost "remember which vcpu faulted on which address", and "ok we get some
> fault resolved, calculate how long it takes".  It's also an optional
> feature for now (but I have thought of changing that, maybe in the future).
>
> Let's push the blocktime layer into the mutex, so that it's always
> thread-safe even without any atomic ops.
>
> To achieve that, I'll need to add a tid parameter on fault path so that
> it'll start to pass the faulted thread ID into deeper the stack, but not
> too deep.  When at it, add a comment for the shared fault handler (for
> example, vhost-user devices running with postcopy), to mention a TODO.  One
> reason it might not be trivial is that vhost-user's userfaultfds should be
> opened by vhost-user process, so it's pretty hard to control making sure
> the TID feature will be around.  It wasn't supported before, so keep it
> like that for now.
>
> Now we should be as ease when everything is protected by a mutex that we
> always take anyway.
>
> One side effect: we can finally remove one ramblock_recv_bitmap_test() in
> mark_postcopy_blocktime_begin(), which was pretty weird and which also
> includes a weird (but maybe necessary.. but maybe not?) operation to inject
> a blocktime entry then quickly erase it..  When we're with the mutex, and
> when we make sure it's invoked after checking the receive bitmap, it's not
> needed anymore.  Instead, we assert.
>
> As another side effect, this paves way for removing all atomic ops in all
> the mem accesses in blocktime layer.
>
> Note that we need a stub for mark_postcopy_blocktime_begin() for Windows
> builds.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>


  reply	other threads:[~2025-06-02 17:46 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-27 23:12 [PATCH 00/13] migration/postcopy: Blocktime tracking overhaul Peter Xu
2025-05-27 23:12 ` [PATCH 01/13] migration: Add option to set postcopy-blocktime Peter Xu
2025-06-02 16:50   ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 02/13] migration/postcopy: Push blocktime start/end into page req mutex Peter Xu
2025-06-02 17:46   ` Fabiano Rosas [this message]
2025-05-27 23:12 ` [PATCH 03/13] migration/postcopy: Drop all atomic ops in blocktime feature Peter Xu
2025-06-02 18:00   ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 04/13] migration/postcopy: Make all blocktime vars 64bits Peter Xu
2025-06-02 20:42   ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 05/13] migration/postcopy: Drop PostcopyBlocktimeContext.start_time Peter Xu
2025-06-03 15:52   ` Fabiano Rosas
2025-06-03 15:55     ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 06/13] migration/postcopy: Bring blocktime layer to us level Peter Xu
2025-06-03 15:54   ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 07/13] migration/postcopy: Add blocktime fault counts per-vcpu Peter Xu
2025-06-03 15:59   ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 08/13] migration/postcopy: Report fault latencies in blocktime Peter Xu
2025-06-02  9:26   ` Markus Armbruster
2025-06-02 16:29     ` Peter Xu
2025-06-03 16:07   ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 09/13] migration/postcopy: Initialize blocktime context only until listen Peter Xu
2025-06-03 16:08   ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 10/13] migration/postcopy: Cache the tid->vcpu mapping for blocktime Peter Xu
2025-06-03 16:20   ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 11/13] migration/postcopy: Cleanup the total blocktime accounting Peter Xu
2025-06-03 16:22   ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 12/13] migration/postcopy: Optimize blocktime fault tracking with hashtable Peter Xu
2025-06-03 16:44   ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 13/13] migration/postcopy: blocktime allows track / report non-vCPU faults Peter Xu
2025-06-03 16:50   ` Fabiano Rosas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87seki3ula.fsf@suse.de \
    --to=farosas@suse.de \
    --cc=a.perevalov@samsung.com \
    --cc=dave@treblig.org \
    --cc=jmarcin@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.