From: Fabiano Rosas <farosas@suse.de>
To: Peter Xu <peterx@redhat.com>, qemu-devel@nongnu.org
Cc: "Dr . David Alan Gilbert" <dave@treblig.org>,
peterx@redhat.com, Alexey Perevalov <a.perevalov@samsung.com>,
Juraj Marcin <jmarcin@redhat.com>
Subject: Re: [PATCH 12/13] migration/postcopy: Optimize blocktime fault tracking with hashtable
Date: Tue, 03 Jun 2025 13:44:40 -0300 [thread overview]
Message-ID: <87tt4w3hc7.fsf@suse.de> (raw)
In-Reply-To: <20250527231248.1279174-13-peterx@redhat.com>
Peter Xu <peterx@redhat.com> writes:
> Currently, the postcopy blocktime feature maintains vCPU fault information
> using an array (vcpu_addr[]). It has two issues.
>
> Issue 1: Performance Concern
> ============================
>
> The old algorithm was almost OK and fast on inserts, except that the lookup
> is slow and won't scale if there are a lot of vCPUs: when a page is copied
> during postcopy, mark_postcopy_blocktime_end() will walk the whole array
> trying to find which vCPUs are blocked by the address. So it needs
> constant O(N) walk for each page resolution.
>
> Alexey (the author of postcopy blocktime) mentioned the perf issue and how
> to optimize it in a piece of comment in the page resolution path. The
> comment was (interestingly..) not complete, but it's relatively clear what
> he wanted to say about this perf issue.
>
> Issue 2: Wrong Accounting on re-entrancies
> ==========================================
>
> People might think that each vCPU should only and always get one fault at a
> time, so that when the blocktime layer captured one fault on one vCPU, we
> should never see another fault message on this vCPU.
>
> It's almost correct, except in some extreme rare cases.
>
> Case 1: it's possible the fault thread processes the userfaultfd messages
> too fast so it can see >1 messages on one vCPU before the previous one was
> resolved.
>
> Case 2: it's theoretically also possible one vCPU can get even more than
> one message on the same fault address if a fault is retried by the
> kernel (e.g., handle_userfault() got interrupted before page resolution).
>
> As this info might be important, instead of using commit message, I put
> more details into the code as comment, when introducing an array
> maintaining concurrent faults on one vCPU. Please refer to the comments
> for details on both cases, especially case 1 which can be tricky.
>
> Case 1 sounds rare, but it can be easily reproduced locally for me when we
> run blocktime together with the migration-test on the vanilla postcopy.
>
> New Design
> ==========
>
> This patch should do almost what Alexey mentioned, but slightly
> differently: instead of having an array to maintain vCPU fault addresses,
> for each of the fault message we push a message into a hash, indexed by the
> fault address.
>
> With the hash, it can replace the old two structs: both the vcpu_addr[]
> array, and also the array to store the start time of the fault. However
> due to above we need one more counter array to account concurrent faults on
> the same vCPU - that should even be needed in the old code, it's just that
> the old code was buggy and it will blindly overwrite an existing
> entry.. now we'll start to really track everything.
>
> The hash structure might be more efficient than tree to maintain such
> addr->(cpu, fault_time) information, so that the insert() and lookup()
> paths should ideally both be ~O(1). After all, we do not need to sort.
>
> Here we need to do one remove() though after the lookup(). It could be
> slow but only if many vCPUs faulted exactly on the same address (so when
> the list of cpu entries is long), which should be unlikely. Even with that,
> it's still a worst case O(N) (consider 400 vCPUs faulted on the same
> address and how likely is it..) rather than a constant O(N) complexity.
>
> When at it, touch up the tracepoints to make them slightly more useful.
> One tracepoint is added when walking all the fault entries.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
next prev parent reply other threads:[~2025-06-03 16:45 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-27 23:12 [PATCH 00/13] migration/postcopy: Blocktime tracking overhaul Peter Xu
2025-05-27 23:12 ` [PATCH 01/13] migration: Add option to set postcopy-blocktime Peter Xu
2025-06-02 16:50 ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 02/13] migration/postcopy: Push blocktime start/end into page req mutex Peter Xu
2025-06-02 17:46 ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 03/13] migration/postcopy: Drop all atomic ops in blocktime feature Peter Xu
2025-06-02 18:00 ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 04/13] migration/postcopy: Make all blocktime vars 64bits Peter Xu
2025-06-02 20:42 ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 05/13] migration/postcopy: Drop PostcopyBlocktimeContext.start_time Peter Xu
2025-06-03 15:52 ` Fabiano Rosas
2025-06-03 15:55 ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 06/13] migration/postcopy: Bring blocktime layer to us level Peter Xu
2025-06-03 15:54 ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 07/13] migration/postcopy: Add blocktime fault counts per-vcpu Peter Xu
2025-06-03 15:59 ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 08/13] migration/postcopy: Report fault latencies in blocktime Peter Xu
2025-06-02 9:26 ` Markus Armbruster
2025-06-02 16:29 ` Peter Xu
2025-06-03 16:07 ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 09/13] migration/postcopy: Initialize blocktime context only until listen Peter Xu
2025-06-03 16:08 ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 10/13] migration/postcopy: Cache the tid->vcpu mapping for blocktime Peter Xu
2025-06-03 16:20 ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 11/13] migration/postcopy: Cleanup the total blocktime accounting Peter Xu
2025-06-03 16:22 ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 12/13] migration/postcopy: Optimize blocktime fault tracking with hashtable Peter Xu
2025-06-03 16:44 ` Fabiano Rosas [this message]
2025-05-27 23:12 ` [PATCH 13/13] migration/postcopy: blocktime allows track / report non-vCPU faults Peter Xu
2025-06-03 16:50 ` Fabiano Rosas
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87tt4w3hc7.fsf@suse.de \
--to=farosas@suse.de \
--cc=a.perevalov@samsung.com \
--cc=dave@treblig.org \
--cc=jmarcin@redhat.com \
--cc=peterx@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).