From: "Gupta, Pankaj" <pankaj.gupta@amd.com>
To: Axel Rasmussen <axelrasmussen@google.com>,
Anish Moorthy <amoorthy@google.com>
Cc: seanjc@google.com, oliver.upton@linux.dev, maz@kernel.org,
kvm@vger.kernel.org, kvmarm@lists.linux.dev,
robert.hoo.linux@gmail.com, jthoughton@google.com,
dmatlack@google.com, peterx@redhat.com, nadav.amit@gmail.com,
isaku.yamahata@gmail.com, kconsul@linux.vnet.ibm.com
Subject: Re: [PATCH v7 00/14] Improve KVM + userfaultfd performance via KVM_EXIT_MEMORY_FAULTs on stage-2 faults
Date: Wed, 21 Feb 2024 08:35:49 +0100
Message-ID: <f21d4fb1-3bd2-b129-287d-ecd959dbb72e@amd.com>
In-Reply-To: <CAJHvVcjmg8i8ebMEK2gE2hMg9c98zyUr_xPCrsDKvY-3fUZTUQ@mail.gmail.com>
>>> On 2/16/2024 12:53 AM, Anish Moorthy wrote:
>>>> This series adds an option to cause stage-2 fault handlers to exit
>>>> with KVM_EXIT_MEMORY_FAULT when they would otherwise be required to
>>>> fault in the userspace mappings. Doing so allows userspace to receive
>>>> stage-2 faults directly from KVM_RUN instead of through userfaultfd,
>>>> which suffers from serious contention issues as the number of vCPUs
>>>> scales.
>>>
>>> Thanks for your work!
>>
>> :D
>>
>>>
>>> So, this is an alternative approach for userspace such as QEMU to do
>>> post-copy live migration using KVM_EXIT_MEMORY_FAULT instead of
>>> userfaultfd, which seems slower with more vCPUs.
>>>
>>> Maybe I am missing something here; I am just curious how a userspace
>>> VMM, e.g. QEMU, would copy the memory with this approach once the page
>>> is available from the remote host, which was done with UFFDIO_COPY
>>> earlier?
>>
>> This new capability is meant to be used *alongside* userfaultfd during
>> post-copy: it's not a replacement. KVM_RUN can generate page faults
>> from outside the stage-2 fault handlers (IIUC instruction emulation is
>> one source), and these paths are unchanged, so it's important that
>> userspace still UFFDIO_REGISTERs KVM's mapping and reads from the UFFD
>> to catch these guest accesses. But with the new
>> KVM_MEM_EXIT_ON_MISSING memslot flag set, the stage-2 handlers will
>> report needing to fault in memory via KVM_EXIT_MEMORY_FAULT instead of
>> queuing onto the UFFD.
>>
>> In the workloads I've tested, the vast majority of guest-generated
>> page faults (99%+) come from the stage-2 handlers. So this series
>> "solves" the issue of contention on the UFFD file descriptor by
>> (mostly) sidestepping it.
>>
>> As for how userspace actually uses the new functionality: when a vCPU
>> thread receives a KVM_EXIT_MEMORY_FAULT for an unfetched page during
>> post-copy, it might
>>
>> (a) Fetch the page
>> (b) Install the page into KVM's mapping via UFFDIO_COPY (don't
>> necessarily need to UFFDIO_WAKE!)
>> (c) Call KVM_RUN to re-enter the guest and retry the access. The
>> stage-2 fault handler will fire again but almost certainly won't exit
>> with KVM_EXIT_MEMORY_FAULT now (since the UFFDIO_COPY will have mapped
>> the page), so the guest can continue.
>>
>> and userspace can continue using some thread(s) to
>>
>> (a) Read page faults from the UFFD.
>> (b) Install the page using UFFDIO_COPY + UFFDIO_WAKE
>> (c) goto (a)
>>
>> to make sure it catches everything. The combination of these two things
>> adds up to more performant "uffd-based" postcopy.
>>
>> I'm of course skimming over some details (e.g.: when two vCPU threads
>> race to fetch a page one of them should probably MADV_POPULATE_WRITE
>> somehow), but I hope this is helpful. My patch to the KVM demand
>> paging self test might also clarify things a bit [1].
>
> One other small detail is, you can equally use UFFDIO_CONTINUE,
> depending on how the rest of the live migration implementation works.
>
> Really briefly, this series should be viewed as an alternate (and more
> scalable) mechanism to find out that a fault occurred. The way
> userspace then *resolves* the fault (whether via UFFDIO_COPY or
> UFFDIO_CONTINUE) can remain the same as before.
>
That clarifies it. Thank you!

Best regards,
Pankaj
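
For anyone following along, here is a minimal sketch of step (b) of the
vCPU-thread path quoted above, as I understand it. It is not taken from the
series. The names slot_map, fault_gpa_to_hva() and install_page() are
hypothetical, the faulting GPA is assumed to come from the memory_fault.gpa
field of struct kvm_run on a KVM_EXIT_MEMORY_FAULT exit, and hva_base is
assumed to be page-aligned. Fetching the page and the KVM_RUN loop itself are
left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Hypothetical per-memslot bookkeeping kept by the VMM; not a KVM type. */
struct slot_map {
	uint64_t gpa_base;	/* guest physical address where the slot starts */
	uint64_t size;		/* slot size in bytes */
	uint64_t hva_base;	/* userspace address backing gpa_base (page-aligned) */
};

/*
 * Translate the faulting GPA reported by the exit into the page-aligned
 * userspace address to populate. Returns 0 if the GPA is outside the slot.
 */
uint64_t fault_gpa_to_hva(const struct slot_map *s, uint64_t gpa,
			  uint64_t page_size)
{
	/* page_size must be a non-zero power of two for the mask below. */
	assert(page_size && (page_size & (page_size - 1)) == 0);

	if (gpa < s->gpa_base || gpa - s->gpa_base >= s->size)
		return 0;

	return (s->hva_base + (gpa - s->gpa_base)) & ~(page_size - 1);
}

/*
 * Install one already-fetched page with UFFDIO_COPY.
 * wake == 0 sets UFFDIO_COPY_MODE_DONTWAKE, matching the "don't necessarily
 * need to UFFDIO_WAKE" remark for the vCPU-thread path.
 * Returns 0 on success or a negative errno value.
 */
int install_page(int uffd, uint64_t hva, const void *src,
		 uint64_t page_size, int wake)
{
	struct uffdio_copy copy;

	memset(&copy, 0, sizeof(copy));
	copy.dst = hva;
	copy.src = (uint64_t)(uintptr_t)src;
	copy.len = page_size;
	copy.mode = wake ? 0 : UFFDIO_COPY_MODE_DONTWAKE;

	if (ioctl(uffd, UFFDIO_COPY, &copy) == 0)
		return 0;

	/* EEXIST: the page is already mapped, e.g. another thread won the race. */
	return errno == EEXIST ? 0 : -errno;
}
```

A vCPU thread would call fault_gpa_to_hva() on the exit, fetch the page, call
install_page() with wake set to 0 and then call KVM_RUN again. The UFFD reader
thread(s) could reuse install_page() with wake set to 1, since UFFDIO_COPY
wakes waiters by default when the DONTWAKE mode bit is clear.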