From: Peter Xu <peterx@redhat.com>
To: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Matlack <dmatlack@google.com>,
	Anish Moorthy <amoorthy@google.com>,
	Sean Christopherson <seanjc@google.com>,
	Nadav Amit <nadav.amit@gmail.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	maz@kernel.org, oliver.upton@linux.dev,
	James Houghton <jthoughton@google.com>,
	bgardon@google.com, ricarkol@google.com,
	kvm <kvm@vger.kernel.org>,
	kvmarm@lists.linux.dev
Subject: Re: [PATCH v3 00/22] Improve scalability of KVM + userfaultfd live migration via annotated memory faults.
Date: Mon, 15 May 2023 11:05:46 -0400
Message-ID: <ZGJKSl7mb5t30zwN@x1n>
In-Reply-To: <CAJHvVci4VuQ_vdpRKczg4ic6x7eZRXE4+ZUvzO-xU_9VJ1Vqvg@mail.gmail.com>

On Thu, May 11, 2023 at 10:33:24AM -0700, Axel Rasmussen wrote:
> On Thu, May 11, 2023 at 10:18 AM David Matlack <dmatlack@google.com> wrote:
> >
> > On Wed, May 10, 2023 at 2:50 PM Peter Xu <peterx@redhat.com> wrote:
> > > On Tue, May 09, 2023 at 01:52:05PM -0700, Anish Moorthy wrote:
> > > > On Sun, May 7, 2023 at 6:23 PM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > What I wanted to do was understand whether there's still a chance to
> > > provide a generic solution.  I don't know why you had a bunch of pmu
> > > stack frames showing up in the graph; perhaps you forgot to disable some
> > > of the perf events when running the test?  Let me know if you figure out
> > > why that happened (so far I haven't seen why), but I feel guilty about
> > > continuing to overload you with such questions.
> > >
> > > The major problem I have with this series is that it's definitely not a
> > > clean approach.  Even if you rely entirely on the userspace app, you'll
> > > still need userfaultfd for kernel traps in corner cases, or it just
> > > won't work.  IIUC that's also Nadav's concern.
> >
> > This is a long thread, so apologies if the following has already been discussed.
> >
> > Would per-tid userfaultfd support be a generic solution? i.e. Allow
> > userspace to create a userfaultfd that is tied to a specific task. Any
> > userfaults encountered by that task use that fd, rather than the
> > process-wide fd. I'm making the assumption here that each of these fds
> > would have independent signaling mechanisms/queues and so this would
> > solve the scaling problem.
> >
> > A VMM could use this to create 1 userfaultfd per vCPU and 1 thread per
> > vCPU for handling userfault requests. This seems like it'd have
> > roughly the same scalability characteristics as the KVM -EFAULT
> > approach.
> 
> I think this would work in principle, but it's significantly different
> from what exists today.
> 
> The splitting of userfaultfds Peter is describing is splitting up the
> HVA address space, not splitting per-thread.

[sorry mostly travel last week]

No, my idea was actually to split per thread, but since there's currently no
way to do that, I was thinking we should start testing with a per-VMA split
so that it "emulates", as closely as we can, a per-thread split.
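
To make that concrete, here's a minimal sketch of the per-VMA setup I had in
mind for the test (error handling omitted; NR_VCPUS and SLICE_SIZE are just
placeholders, and it assumes each vCPU mostly faults within its own slice,
like in the demand paging selftest): each vCPU gets its own mapping, hence
its own VMA and its own uffd, with one dedicated reader thread per uffd.

#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#define NR_VCPUS	8
#define SLICE_SIZE	(1UL << 30)	/* guest memory handled per vCPU */

static int uffds[NR_VCPUS];

/* One slice == one mapping == one VMA == one uffd == one reader thread. */
static void setup_slice(int i)
{
	struct uffdio_api api = { .api = UFFD_API };
	struct uffdio_register reg = { 0 };
	void *slice;

	slice = mmap(NULL, SLICE_SIZE, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	uffds[i] = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
	ioctl(uffds[i], UFFDIO_API, &api);

	reg.range.start = (unsigned long)slice;
	reg.range.len = SLICE_SIZE;
	reg.mode = UFFDIO_REGISTER_MODE_MISSING;
	ioctl(uffds[i], UFFDIO_REGISTER, &reg);

	/* A per-slice reader thread then polls uffds[i] and resolves its
	 * faults with UFFDIO_COPY, never contending with the other slices. */
}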

> 
> I think for this design, we'd need to change UFFD registration so
> multiple UFFDs can register the same VMA, but can be filtered so they
> only receive fault events caused by some particular tid(s).

Having multiple real uffds per VMA is challenging: as you mentioned,
enqueuing becomes more work, and it's also hard to know which attributes
apply to the uffd covering a given VMA because each uffd has its own
feature list.

What we may need here is only the "logical queue" part of the uffd, so I was
considering supporting multi-queue for a _single_ userfaultfd.

I actually mentioned some of it in the very initial reply to Anish:

https://lore.kernel.org/all/ZEGuogfbtxPNUq7t@x1n/

        If the real problem lies in a bunch of threads queuing, is it
        possible that we can simply provide more queues for the events?  The
        readers would just need to go over all the queues.

        How to decide "which thread uses which queue" is another problem;
        what comes to mind quickly is "hash(tid) % n_queues", but maybe it
        can be done better.  Each vcpu thread will have a different tid, so
        they can hopefully spread out across the queues.

The queues may also need to be created as sub-uffds, each supporting only
part of the uffd interface (read/poll, COPY/CONTINUE/ZEROPAGE) but not all
of it (e.g. UFFDIO_API shouldn't be supported there).
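
None of this exists today, so just to sketch the shape I'm imagining (the
sub-fd in the struct below is hypothetical, and pick_queue() only stands in
for whatever the kernel would actually do): the kernel spreads fault events
over N queues by hashing the faulting thread's tid, and each reader thread
drains exactly one queue through its own sub-fd.

#include <poll.h>
#include <stdint.h>
#include <sys/types.h>

#define N_QUEUES	16

/* Conceptually the kernel-side choice of queue for a faulting thread. */
static inline unsigned int pick_queue(pid_t tid)
{
	return (uint32_t)tid % N_QUEUES;	/* i.e. hash(tid) % n_queues */
}

/* Userspace side: one reader per queue, each polling only its own sub-fd. */
struct uffd_reader {
	int sub_fd;		/* hypothetical sub-uffd: read/poll plus
				 * COPY/CONTINUE/ZEROPAGE, but no UFFDIO_API */
	unsigned int queue;	/* which of the N_QUEUES this reader drains */
};

static void reader_loop(struct uffd_reader *r)
{
	struct pollfd pfd = { .fd = r->sub_fd, .events = POLLIN };

	for (;;) {
		poll(&pfd, 1, -1);
		/* read() a struct uffd_msg from r->sub_fd and resolve it,
		 * e.g. by issuing UFFDIO_COPY on the same sub-fd. */
	}
}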

> This might also incur some (small?) overhead, because in the fault path
> we now need to maintain some data structure so we can look up which UFFD
> to notify based on a combination of the address and our tid. Today, since
> VMAs and UFFDs are 1:1, this lookup is trivial.  I think it's worth
> keeping in mind that a selling point of Anish's approach is that it's a
> very small change. It's plausible we can come up with some alternative
> way to scale, but it seems to me everything suggested so far is likely to
> require a lot more code, complexity, and effort vs. Anish's approach.

Yes, I think that's also the reason I felt I was overloading this work too
much.  If Anish really wants this and can make it useful, then I'm totally
fine with it, because maintaining the 2nd cap seems trivial assuming the
maintainer would already accept the 1st cap.

I just hope it'll be thoroughly tested, even with Google's private userspace
hypervisor, so that the kernel interface (even if not immediately obvious to
a new user seeing it) is solid and serves the goal of the problem Anish is
tackling.

Thanks,

-- 
Peter Xu

