From: Oliver Upton <oliver.upton@linux.dev>
To: Anish Moorthy <amoorthy@google.com>
Cc: seanjc@google.com, jthoughton@google.com, kvm@vger.kernel.org
Subject: Re: [WIP Patch v2 09/14] KVM: Introduce KVM_CAP_MEMORY_FAULT_NOWAIT without implementation
Date: Fri, 17 Mar 2023 18:59:47 +0000 [thread overview]
Message-ID: <ZBS4o75PVHL4FQqw@linux.dev> (raw)
In-Reply-To: <20230315021738.1151386-10-amoorthy@google.com>
On Wed, Mar 15, 2023 at 02:17:33AM +0000, Anish Moorthy wrote:
> Add documentation, memslot flags, useful helper functions, and the
> actual new capability itself.
>
> Memory fault exits on absent mappings are particularly useful for
> userfaultfd-based live migration postcopy. When many vCPUs fault upon a
> single userfaultfd the faults can take a while to surface to userspace
> due to having to contend for uffd wait queue locks. Bypassing the uffd
> entirely by triggering a vCPU exit avoids this contention and can improve
> the fault rate by as much as 10x.
> ---
> Documentation/virt/kvm/api.rst | 37 +++++++++++++++++++++++++++++++---
> include/linux/kvm_host.h | 6 ++++++
> include/uapi/linux/kvm.h | 3 +++
> tools/include/uapi/linux/kvm.h | 2 ++
> virt/kvm/kvm_main.c | 7 ++++++-
> 5 files changed, 51 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index f9ca18bbec879..4932c0f62eb3d 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -1312,6 +1312,7 @@ yet and must be cleared on entry.
> /* for kvm_userspace_memory_region::flags */
> #define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0)
> #define KVM_MEM_READONLY (1UL << 1)
> + #define KVM_MEM_ABSENT_MAPPING_FAULT (1UL << 2)
call it KVM_MEM_EXIT_ABSENT_MAPPING
>
> This ioctl allows the user to create, modify or delete a guest physical
> memory slot. Bits 0-15 of "slot" specify the slot id and this value
> @@ -1342,12 +1343,15 @@ It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr
> be identical. This allows large pages in the guest to be backed by large
> pages in the host.
>
> -The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and
> -KVM_MEM_READONLY. The former can be set to instruct KVM to keep track of
> +The flags field supports three flags
> +
> +1. KVM_MEM_LOG_DIRTY_PAGES: can be set to instruct KVM to keep track of
> writes to memory within the slot. See KVM_GET_DIRTY_LOG ioctl to know how to
> -use it. The latter can be set, if KVM_CAP_READONLY_MEM capability allows it,
> +use it.
> +2. KVM_MEM_READONLY: can be set, if KVM_CAP_READONLY_MEM capability allows it,
> to make a new slot read-only. In this case, writes to this memory will be
> posted to userspace as KVM_EXIT_MMIO exits.
> +3. KVM_MEM_ABSENT_MAPPING_FAULT: see KVM_CAP_MEMORY_FAULT_NOWAIT for details.
>
> When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of
> the memory region are automatically reflected into the guest. For example, an
> @@ -7702,10 +7706,37 @@ Through args[0], the capability can be set on a per-exit-reason basis.
> Currently, the only exit reasons supported are
>
> 1. KVM_MEMFAULT_REASON_UNKNOWN (1 << 0)
> +2. KVM_MEMFAULT_REASON_ABSENT_MAPPING (1 << 1)
>
> Memory fault exits with a reason of UNKNOWN should not be depended upon: they
> may be added, removed, or reclassified under a stable reason.
>
> +7.35 KVM_CAP_MEMORY_FAULT_NOWAIT
> +--------------------------------
> +
> +:Architectures: x86, arm64
> +:Returns: -EINVAL.
> +
> +The presence of this capability indicates that userspace may pass the
> +KVM_MEM_ABSENT_MAPPING_FAULT flag to KVM_SET_USER_MEMORY_REGION to cause KVM_RUN
> +to exit to populate 'kvm_run.memory_fault' and exit to userspace (*) in response
> +to page faults for which the userspace page tables do not contain present
> +mappings. Attempting to enable the capability directly will fail.
> +
> +The 'gpa' and 'len' fields of kvm_run.memory_fault will be set to the starting
> +address and length (in bytes) of the faulting page. 'flags' will be set to
> +KVM_MEMFAULT_REASON_ABSENT_MAPPING.
> +
> +Userspace should determine how best to make the mapping present, then take
> +appropriate action. For instance, in the case of absent mappings this might
> +involve establishing the mapping for the first time via UFFDIO_COPY/CONTINUE or
> +faulting the mapping in using MADV_POPULATE_READ/WRITE. After establishing the
> +mapping, userspace can return to KVM to retry the previous memory access.
> +
> +(*) NOTE: On x86, KVM_CAP_X86_MEMORY_FAULT_EXIT must be enabled for the
> +KVM_MEMFAULT_REASON_ABSENT_MAPPING_reason: otherwise userspace will only receive
> +a -EFAULT from KVM_RUN without any useful information.
I'm not a fan of this architecture-specific dependency. Userspace is already
explicitly opting in to this behavior by way of the memslot flag. These sort
of exits are entirely orthogonal to the -EFAULT conversion earlier in the
series.
> 8. Other capabilities.
> ======================
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index d3ccfead73e42..c28330f25526f 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -593,6 +593,12 @@ static inline bool kvm_slot_dirty_track_enabled(const struct kvm_memory_slot *sl
> return slot->flags & KVM_MEM_LOG_DIRTY_PAGES;
> }
>
> +static inline bool kvm_slot_fault_on_absent_mapping(
> + const struct kvm_memory_slot *slot)
Style again...
I'd strongly recommend using 'exit' instead of 'fault' in the verbiage of the
KVM implementation. I understand we're giving userspace the illusion of a page
fault mechanism, but the term is then overloaded in KVM since we handle
literal faults from hardware.
--
Thanks,
Oliver
next prev parent reply other threads:[~2023-03-17 19:00 UTC|newest]
Thread overview: 60+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-15 2:17 [WIP Patch v2 00/14] Avoiding slow get-user-pages via memory fault exit Anish Moorthy
2023-03-15 2:17 ` [WIP Patch v2 01/14] KVM: selftests: Allow many vCPUs and reader threads per UFFD in demand paging test Anish Moorthy
2023-03-15 2:17 ` [WIP Patch v2 02/14] KVM: selftests: Use EPOLL in userfaultfd_util reader threads and signal errors via TEST_ASSERT Anish Moorthy
2023-03-15 2:17 ` [WIP Patch v2 03/14] KVM: Allow hva_pfn_fast to resolve read-only faults Anish Moorthy
2023-03-15 2:17 ` [WIP Patch v2 04/14] KVM: x86: Add KVM_CAP_X86_MEMORY_FAULT_EXIT and associated kvm_run field Anish Moorthy
2023-03-17 0:02 ` Isaku Yamahata
2023-03-17 18:33 ` Anish Moorthy
2023-03-17 19:30 ` Oliver Upton
2023-03-17 21:50 ` Sean Christopherson
2023-03-17 22:44 ` Anish Moorthy
2023-03-20 15:53 ` Sean Christopherson
2023-03-20 18:19 ` Anish Moorthy
2023-03-20 22:11 ` Anish Moorthy
2023-03-21 15:21 ` Sean Christopherson
2023-03-21 18:01 ` Anish Moorthy
2023-03-21 19:43 ` Sean Christopherson
2023-03-22 21:06 ` Anish Moorthy
2023-03-22 23:17 ` Sean Christopherson
2023-03-28 22:19 ` Anish Moorthy
2023-04-04 19:34 ` Sean Christopherson
2023-04-04 20:40 ` Anish Moorthy
2023-04-04 22:07 ` Sean Christopherson
2023-04-05 20:21 ` Anish Moorthy
2023-03-17 18:35 ` Oliver Upton
2023-03-15 2:17 ` [WIP Patch v2 05/14] KVM: x86: Implement memory fault exit for direct_map Anish Moorthy
2023-03-15 2:17 ` [WIP Patch v2 06/14] KVM: x86: Implement memory fault exit for kvm_handle_page_fault Anish Moorthy
2023-03-15 2:17 ` [WIP Patch v2 07/14] KVM: x86: Implement memory fault exit for setup_vmgexit_scratch Anish Moorthy
2023-03-15 2:17 ` [WIP Patch v2 08/14] KVM: x86: Implement memory fault exit for FNAME(fetch) Anish Moorthy
2023-03-15 2:17 ` [WIP Patch v2 09/14] KVM: Introduce KVM_CAP_MEMORY_FAULT_NOWAIT without implementation Anish Moorthy
2023-03-17 18:59 ` Oliver Upton [this message]
2023-03-17 20:15 ` Anish Moorthy
2023-03-17 20:54 ` Sean Christopherson
2023-03-17 23:42 ` Anish Moorthy
2023-03-20 15:13 ` Sean Christopherson
2023-03-20 19:53 ` Anish Moorthy
2023-03-17 20:17 ` Sean Christopherson
2023-03-20 22:22 ` Oliver Upton
2023-03-21 14:50 ` Sean Christopherson
2023-03-21 20:23 ` Oliver Upton
2023-03-21 21:01 ` Sean Christopherson
2023-03-15 2:17 ` [WIP Patch v2 10/14] KVM: x86: Implement KVM_CAP_MEMORY_FAULT_NOWAIT Anish Moorthy
2023-03-17 0:32 ` Isaku Yamahata
2023-03-15 2:17 ` [WIP Patch v2 11/14] KVM: arm64: Allow user_mem_abort to return 0 to signal a 'normal' exit Anish Moorthy
2023-03-17 18:18 ` Oliver Upton
2023-03-15 2:17 ` [WIP Patch v2 12/14] KVM: arm64: Implement KVM_CAP_MEMORY_FAULT_NOWAIT Anish Moorthy
2023-03-17 18:27 ` Oliver Upton
2023-03-17 19:00 ` Anish Moorthy
2023-03-17 19:03 ` Oliver Upton
2023-03-17 19:24 ` Sean Christopherson
2023-03-15 2:17 ` [WIP Patch v2 13/14] KVM: selftests: Add memslot_flags parameter to memstress_create_vm Anish Moorthy
2023-03-15 2:17 ` [WIP Patch v2 14/14] KVM: selftests: Handle memory fault exits in demand_paging_test Anish Moorthy
2023-03-17 17:43 ` [WIP Patch v2 00/14] Avoiding slow get-user-pages via memory fault exit Oliver Upton
2023-03-17 18:13 ` Sean Christopherson
2023-03-17 18:46 ` David Matlack
2023-03-17 18:54 ` Oliver Upton
2023-03-17 18:59 ` David Matlack
2023-03-17 19:53 ` Anish Moorthy
2023-03-17 22:03 ` Sean Christopherson
2023-03-20 15:56 ` Sean Christopherson
2023-03-17 20:35 ` Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZBS4o75PVHL4FQqw@linux.dev \
--to=oliver.upton@linux.dev \
--cc=amoorthy@google.com \
--cc=jthoughton@google.com \
--cc=kvm@vger.kernel.org \
--cc=seanjc@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).