From: Oliver Upton <oliver.upton@linux.dev>
To: Sean Christopherson <seanjc@google.com>
Cc: Anish Moorthy <amoorthy@google.com>,
jthoughton@google.com, kvm@vger.kernel.org
Subject: Re: [WIP Patch v2 09/14] KVM: Introduce KVM_CAP_MEMORY_FAULT_NOWAIT without implementation
Date: Mon, 20 Mar 2023 22:22:08 +0000 [thread overview]
Message-ID: <ZBjckKb6eWx2vSin@linux.dev> (raw)
In-Reply-To: <ZBTK0vzAoWqY1hDh@google.com>
Sean,
On Fri, Mar 17, 2023 at 01:17:22PM -0700, Sean Christopherson wrote:
> On Fri, Mar 17, 2023, Oliver Upton wrote:
> > On Wed, Mar 15, 2023 at 02:17:33AM +0000, Anish Moorthy wrote:
> > > Add documentation, memslot flags, useful helper functions, and the
> > > actual new capability itself.
> > >
> > > Memory fault exits on absent mappings are particularly useful for
> > > userfaultfd-based live migration postcopy. When many vCPUs fault upon a
> > > single userfaultfd the faults can take a while to surface to userspace
> > > due to having to contend for uffd wait queue locks. Bypassing the uffd
> > > entirely by triggering a vCPU exit avoids this contention and can improve
> > > the fault rate by as much as 10x.
> > > ---
> > > Documentation/virt/kvm/api.rst | 37 +++++++++++++++++++++++++++++++---
> > > include/linux/kvm_host.h | 6 ++++++
> > > include/uapi/linux/kvm.h | 3 +++
> > > tools/include/uapi/linux/kvm.h | 2 ++
> > > virt/kvm/kvm_main.c | 7 ++++++-
> > > 5 files changed, 51 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > > index f9ca18bbec879..4932c0f62eb3d 100644
> > > --- a/Documentation/virt/kvm/api.rst
> > > +++ b/Documentation/virt/kvm/api.rst
> > > @@ -1312,6 +1312,7 @@ yet and must be cleared on entry.
> > > /* for kvm_userspace_memory_region::flags */
> > > #define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0)
> > > #define KVM_MEM_READONLY (1UL << 1)
> > > + #define KVM_MEM_ABSENT_MAPPING_FAULT (1UL << 2)
> >
> > call it KVM_MEM_EXIT_ABSENT_MAPPING
>
> Ooh, look, a bikeshed! :-)
Couldn't help myself :)
> I don't think it should have "EXIT" in the name. The exit to userspace is a side
> effect, e.g. KVM already exits to userspace on unresolved userfaults. The only
> thing this knob _directly_ controls is whether or not KVM attempts the slow path.
> If we give the flag a name like "exit on absent userspace mappings", then KVM will
> appear to do the wrong thing when KVM exits on a truly absent userspace mapping.
>
> And as I argued in the last version[*], I am _strongly_ opposed to KVM speculating
> on why KVM is exiting to userspace. I.e. KVM should not set a special flag if
> the memslot has "fast only" behavior. The only thing the flag should do is control
> whether or not KVM tries slow paths, what KVM does in response to an unresolved
> fault should be an orthogonal thing.
>
> E.g. If KVM encounters an unmapped page while prefetching SPTEs, KVM will (correctly)
> not exit to userspace and instead simply terminate the prefetch. Obviously we
> could solve that through documentation, but I don't see any benefit in making this
> more complex than it needs to be.
I couldn't care less about what the user-facing portion of this thing is
called, TBH. We could just refer to it as KVM_MEM_BIT_2 /s
The only bit I wanted to avoid is having a collision in the kernel between
literal faults arising from hardware and exits to userspace that we are also
calling 'faults'.
> [*] https://lkml.kernel.org/r/Y%2B0RYMfw6pHrSLX4%40google.com
>
> > > +7.35 KVM_CAP_MEMORY_FAULT_NOWAIT
> > > +--------------------------------
> > > +
> > > +:Architectures: x86, arm64
> > > +:Returns: -EINVAL.
> > > +
> > > +The presence of this capability indicates that userspace may pass the
> > > +KVM_MEM_ABSENT_MAPPING_FAULT flag to KVM_SET_USER_MEMORY_REGION to cause KVM_RUN
> > > +to exit to populate 'kvm_run.memory_fault' and exit to userspace (*) in response
> > > +to page faults for which the userspace page tables do not contain present
> > > +mappings. Attempting to enable the capability directly will fail.
> > > +
> > > +The 'gpa' and 'len' fields of kvm_run.memory_fault will be set to the starting
> > > +address and length (in bytes) of the faulting page. 'flags' will be set to
> > > +KVM_MEMFAULT_REASON_ABSENT_MAPPING.
> > > +
> > > +Userspace should determine how best to make the mapping present, then take
> > > +appropriate action. For instance, in the case of absent mappings this might
> > > +involve establishing the mapping for the first time via UFFDIO_COPY/CONTINUE or
> > > +faulting the mapping in using MADV_POPULATE_READ/WRITE. After establishing the
> > > +mapping, userspace can return to KVM to retry the previous memory access.
> > > +
> > > +(*) NOTE: On x86, KVM_CAP_X86_MEMORY_FAULT_EXIT must be enabled for the
> > > +KVM_MEMFAULT_REASON_ABSENT_MAPPING_reason: otherwise userspace will only receive
> > > +a -EFAULT from KVM_RUN without any useful information.
> >
> > I'm not a fan of this architecture-specific dependency. Userspace is already
> > explicitly opting in to this behavior by way of the memslot flag. These sort
> > of exits are entirely orthogonal to the -EFAULT conversion earlier in the
> > series.
>
> Ya, yet another reason not to speculate on why KVM wasn't able to resolve a fault.
Regardless of what we name this memslot flag, we're already getting explicit
opt-in from userspace for new behavior. There seems to be zero value in
supporting memslot_flag && !MEMORY_FAULT_EXIT (i.e. returning EFAULT),
so why even bother?
Requiring two levels of opt-in to have the intended outcome for a single
architecture seems nauseating from a userspace perspective.
--
Thanks,
Oliver
next prev parent reply other threads:[~2023-03-20 22:24 UTC|newest]
Thread overview: 60+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-15 2:17 [WIP Patch v2 00/14] Avoiding slow get-user-pages via memory fault exit Anish Moorthy
2023-03-15 2:17 ` [WIP Patch v2 01/14] KVM: selftests: Allow many vCPUs and reader threads per UFFD in demand paging test Anish Moorthy
2023-03-15 2:17 ` [WIP Patch v2 02/14] KVM: selftests: Use EPOLL in userfaultfd_util reader threads and signal errors via TEST_ASSERT Anish Moorthy
2023-03-15 2:17 ` [WIP Patch v2 03/14] KVM: Allow hva_pfn_fast to resolve read-only faults Anish Moorthy
2023-03-15 2:17 ` [WIP Patch v2 04/14] KVM: x86: Add KVM_CAP_X86_MEMORY_FAULT_EXIT and associated kvm_run field Anish Moorthy
2023-03-17 0:02 ` Isaku Yamahata
2023-03-17 18:33 ` Anish Moorthy
2023-03-17 19:30 ` Oliver Upton
2023-03-17 21:50 ` Sean Christopherson
2023-03-17 22:44 ` Anish Moorthy
2023-03-20 15:53 ` Sean Christopherson
2023-03-20 18:19 ` Anish Moorthy
2023-03-20 22:11 ` Anish Moorthy
2023-03-21 15:21 ` Sean Christopherson
2023-03-21 18:01 ` Anish Moorthy
2023-03-21 19:43 ` Sean Christopherson
2023-03-22 21:06 ` Anish Moorthy
2023-03-22 23:17 ` Sean Christopherson
2023-03-28 22:19 ` Anish Moorthy
2023-04-04 19:34 ` Sean Christopherson
2023-04-04 20:40 ` Anish Moorthy
2023-04-04 22:07 ` Sean Christopherson
2023-04-05 20:21 ` Anish Moorthy
2023-03-17 18:35 ` Oliver Upton
2023-03-15 2:17 ` [WIP Patch v2 05/14] KVM: x86: Implement memory fault exit for direct_map Anish Moorthy
2023-03-15 2:17 ` [WIP Patch v2 06/14] KVM: x86: Implement memory fault exit for kvm_handle_page_fault Anish Moorthy
2023-03-15 2:17 ` [WIP Patch v2 07/14] KVM: x86: Implement memory fault exit for setup_vmgexit_scratch Anish Moorthy
2023-03-15 2:17 ` [WIP Patch v2 08/14] KVM: x86: Implement memory fault exit for FNAME(fetch) Anish Moorthy
2023-03-15 2:17 ` [WIP Patch v2 09/14] KVM: Introduce KVM_CAP_MEMORY_FAULT_NOWAIT without implementation Anish Moorthy
2023-03-17 18:59 ` Oliver Upton
2023-03-17 20:15 ` Anish Moorthy
2023-03-17 20:54 ` Sean Christopherson
2023-03-17 23:42 ` Anish Moorthy
2023-03-20 15:13 ` Sean Christopherson
2023-03-20 19:53 ` Anish Moorthy
2023-03-17 20:17 ` Sean Christopherson
2023-03-20 22:22 ` Oliver Upton [this message]
2023-03-21 14:50 ` Sean Christopherson
2023-03-21 20:23 ` Oliver Upton
2023-03-21 21:01 ` Sean Christopherson
2023-03-15 2:17 ` [WIP Patch v2 10/14] KVM: x86: Implement KVM_CAP_MEMORY_FAULT_NOWAIT Anish Moorthy
2023-03-17 0:32 ` Isaku Yamahata
2023-03-15 2:17 ` [WIP Patch v2 11/14] KVM: arm64: Allow user_mem_abort to return 0 to signal a 'normal' exit Anish Moorthy
2023-03-17 18:18 ` Oliver Upton
2023-03-15 2:17 ` [WIP Patch v2 12/14] KVM: arm64: Implement KVM_CAP_MEMORY_FAULT_NOWAIT Anish Moorthy
2023-03-17 18:27 ` Oliver Upton
2023-03-17 19:00 ` Anish Moorthy
2023-03-17 19:03 ` Oliver Upton
2023-03-17 19:24 ` Sean Christopherson
2023-03-15 2:17 ` [WIP Patch v2 13/14] KVM: selftests: Add memslot_flags parameter to memstress_create_vm Anish Moorthy
2023-03-15 2:17 ` [WIP Patch v2 14/14] KVM: selftests: Handle memory fault exits in demand_paging_test Anish Moorthy
2023-03-17 17:43 ` [WIP Patch v2 00/14] Avoiding slow get-user-pages via memory fault exit Oliver Upton
2023-03-17 18:13 ` Sean Christopherson
2023-03-17 18:46 ` David Matlack
2023-03-17 18:54 ` Oliver Upton
2023-03-17 18:59 ` David Matlack
2023-03-17 19:53 ` Anish Moorthy
2023-03-17 22:03 ` Sean Christopherson
2023-03-20 15:56 ` Sean Christopherson
2023-03-17 20:35 ` Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZBjckKb6eWx2vSin@linux.dev \
--to=oliver.upton@linux.dev \
--cc=amoorthy@google.com \
--cc=jthoughton@google.com \
--cc=kvm@vger.kernel.org \
--cc=seanjc@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.