Re: [PATCH v3 00/22] Improve scalability of KVM + userfaultfd live migration via annotated memory faults.

Linux KVM/arm64 development list

From: Peter Xu <peterx@redhat.com>
To: Anish Moorthy <amoorthy@google.com>
Cc: Nadav Amit <nadav.amit@gmail.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	maz@kernel.org, oliver.upton@linux.dev,
	Sean Christopherson <seanjc@google.com>,
	James Houghton <jthoughton@google.com>,
	bgardon@google.com, dmatlack@google.com, ricarkol@google.com,
	kvm <kvm@vger.kernel.org>,
	kvmarm@lists.linux.dev
Subject: Re: [PATCH v3 00/22] Improve scalability of KVM + userfaultfd live migration via annotated memory faults.
Date: Wed, 3 May 2023 17:18:13 -0400
Message-ID: <ZFLPlRReglM/Vgfu@x1n>
In-Reply-To: <CAF7b7mqaxk6w90+9+5UkEAE13vDTmBMmCO_ZdAEo6pD8_--fZA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 5133 bytes --]

On Wed, May 03, 2023 at 12:45:07PM -0700, Anish Moorthy wrote:
> On Thu, Apr 27, 2023 at 1:26 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > Thanks (for doing this test, and also to Nadav for all his inputs), and
> > sorry for a late response.
> 
> No need to apologize: anyways, I've got you comfortably beat on being
> late at this point :)
> 
> > These numbers caught my eye, and I'm very curious why even 2 vcpus can
> > scale that badly.
> >
> > I gave it a shot on a test machine and I got something slightly different:
> >
> >   Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (20 cores, 40 threads)
> >   $ ./demand_paging_test -b 512M -u MINOR -s shmem -v N
> >   |-------+----------+--------|
> >   | n_thr | per-vcpu | total  |
> >   |-------+----------+--------|
> >   |     1 | 39.5K    | 39.5K  |
> >   |     2 | 33.8K    | 67.6K  |
> >   |     4 | 31.8K    | 127.2K |
> >   |     8 | 30.8K    | 246.1K |
> >   |    16 | 21.9K    | 351.0K |
> >   |-------+----------+--------|
> >
> > I used more RAM because I have fewer cores.  I didn't try 32+ vcpus, to
> > make sure I don't already have two threads contending for a core/thread,
> > since I only have 40 hardware threads there, but we can still compare with
> > your lower half.
> >
> > While testing I noticed bad numbers, plus another bug where NSEC_PER_SEC
> > wasn't used properly, so I did this before the test:
> >
> > https://lore.kernel.org/all/20230427201112.2164776-1-peterx@redhat.com/
> >
> > I think it means it still doesn't scale that well, but it's not that bad
> > either: there's no obvious 1/2 drop when using 2 vcpus.  There are still a
> > bunch of paths triggered in the test, so I don't expect it to scale fully
> > linearly either.  My numbers just don't show a drop as drastic as yours.
> > I'm not sure whether it's simply broken test numbers, parameter differences
> > (e.g. you used only 64M per vcpu), or hardware differences.
> 
> Hmm, I suspect we're dealing with hardware differences here. I
> rebased my changes onto those two patches you sent up, taking care not
> to clobber them, but even with the repro command you provided my
> results look very different than yours (at least on 1-4 vcpus) on the
> machine I've been testing on (4x AMD EPYC 7B13 64-Core, 2.2GHz).
> 
> (n=20)
> n_thr   per_vcpu   total
> 1       154K       154K
> 2        92K       184K
> 4        71K       285K
> 8        36K       291K
> 16       19K       310K
> 
> Out of interest I tested on another machine (Intel(R) Xeon(R)
> Platinum 8273CL CPU @ 2.20GHz) as well, and the results are a bit
> different again:
> 
> (n=20)
> n_thr   per_vcpu   total
> 1       115K       115K
> 2       103K       206K
> 4        65K       262K
> 8        39K       319K
> 16       19K       398K

Interesting.
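
To make the three tables quoted above easier to compare, here is a minimal
sketch that normalizes each machine's per-vCPU rate to its own 1-vCPU rate.
The numbers are copied from the tables; the dictionary keys and the
function name are only illustrative labels, not anything from the selftest.

```python
# Per-vCPU demand paging rates (in thousands of pages/sec), copied from the
# three tables quoted above. The machine labels are shorthand for readability.
PER_VCPU_K = {
    "Xeon E5-2630 v4": {1: 39.5, 2: 33.8, 4: 31.8, 8: 30.8, 16: 21.9},
    "EPYC 7B13": {1: 154, 2: 92, 4: 71, 8: 36, 16: 19},
    "Xeon Platinum 8273CL": {1: 115, 2: 103, 4: 65, 8: 39, 16: 19},
}


def scaling_efficiency(per_vcpu):
    """Map n_thr -> (per-vCPU rate at n_thr) / (per-vCPU rate at 1 thread)."""
    base = per_vcpu[1]
    return {n: rate / base for n, rate in sorted(per_vcpu.items())}


if __name__ == "__main__":
    for machine, rates in PER_VCPU_K.items():
        eff = scaling_efficiency(rates)
        row = "  ".join(f"{n:>2}: {e:4.0%}" for n, e in eff.items())
        print(f"{machine:<22} {row}")
```

Perfect linear scaling would keep every entry at 100%. At 16 vCPUs the
E5-2630 v4 keeps roughly 55% of its single-vCPU rate, while the EPYC 7B13
keeps roughly 12%, mostly because its single-vCPU rate starts so much
higher.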

> 
> It is interesting how all three sets of numbers start off different
> but seem to converge around 16 vCPUs. I did check to make sure the
> memory fault exits sped things up in all cases, and that at least
> stays true.
> 
> By the way, I've got a little helper script that I've been using to
> run/average the selftest results (which can vary quite a bit). I've
> attached it below; hopefully it doesn't bounce from the mailing list.
> Just for reference, the invocation to test the command you provided is
> 
> > python dp_runner.py --num_runs 20 --max_cores 16 --percpu_mem 512M
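
The attached script is not reproduced in this message. As a rough idea of
what a run-and-average helper of this kind could look like, here is a
hypothetical sketch. It is NOT the actual dp_runner.py: the option names
come from the invocation above and the selftest flags from the repro
command quoted earlier, but the `--binary` option, the doubling of vCPU
counts, and the output line matched by `RATE_RE` are all assumptions.

```python
import argparse
import re
import statistics
import subprocess

# ASSUMPTION: the selftest prints a line such as
#   "Overall demand paging rate: 12345.67 pgs/sec"
# Adjust the pattern if your build prints something different.
RATE_RE = re.compile(r"Overall demand paging rate:\s*([0-9.]+)\s*pgs/sec")


def build_cmd(binary, percpu_mem, vcpus):
    """Build the selftest command line, mirroring the quoted repro command."""
    return [binary, "-b", percpu_mem, "-u", "MINOR", "-s", "shmem",
            "-v", str(vcpus)]


def parse_rate(output):
    """Extract the overall paging rate (pages/sec) from selftest output."""
    match = RATE_RE.search(output)
    if match is None:
        raise ValueError("no paging rate found in selftest output")
    return float(match.group(1))


def vcpu_counts(max_cores):
    """Powers of two up to max_cores: 1, 2, 4, ... (an assumed sweep)."""
    counts, n = [], 1
    while n <= max_cores:
        counts.append(n)
        n *= 2
    return counts


def run_once(cmd):
    """Run the selftest once and return its reported overall rate."""
    done = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return parse_rate(done.stdout + done.stderr)


def main(argv=None):
    ap = argparse.ArgumentParser()
    ap.add_argument("--num_runs", type=int, default=20)
    ap.add_argument("--max_cores", type=int, default=16)
    ap.add_argument("--percpu_mem", default="512M")
    ap.add_argument("--binary", default="./demand_paging_test")  # hypothetical
    args = ap.parse_args(argv)

    print("n_thr  per_vcpu  total")
    for n in vcpu_counts(args.max_cores):
        cmd = build_cmd(args.binary, args.percpu_mem, n)
        total = statistics.mean(run_once(cmd) for _ in range(args.num_runs))
        print(f"{n:>5}  {total / n:>8.0f}  {total:>8.0f}")
```

Calling `main()` from a `__main__` guard turns this into a script that can
be invoked the same way as the command above.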

I found that indeed I shouldn't have stopped at 16 vcpus since that's
exactly where it starts to bottleneck. :)

So out of curiosity I profiled the 32-vcpu case on my system with this test
case, trying both of these setups:

  - 1 uffd + 8 readers
  - 32 uffds (so 32 readers)

I've got the flamegraphs attached for both.

It seems that when using >1 uffds the bottleneck is not the spinlock
anymore but something else.

From what I got there, vmx_vcpu_load() shows up more prominently than the
spinlocks.  I think that's the TLB flush broadcast.

OTOH, when using 1 uffd we can clearly see the overhead of spinlock
contention on both the fault() path and read()/poll(), as you and James
rightly pointed out.

I'm not sure whether my numbers are skewed by my particular setup, though.
After all, I only had 40 hardware threads and I started 32 vcpus + 8
readers, so there'll already be contention between the workloads.

IMHO this means there's still a chance to provide a more generic
userfaultfd scaling solution, as long as we can remove the single-spinlock
contention on the fault/fault_pending queues.  I'll see whether I can
explore this possibility a bit further and keep you guys updated.  The
general idea, to me, is still to make a multi-queue out of 1 uffd.
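
As a purely conceptual illustration of the multi-queue idea (userspace
Python, not kernel code, and not a proposed design), the sketch below
replaces one lock-protected queue with several independently locked
shards. The class name, the choice to shard by page number, and the
4096-byte page size are all assumptions made for the example.

```python
import threading
from collections import deque


class ShardedFaultQueue:
    """Toy multi-queue: one lock per shard instead of one global lock."""

    def __init__(self, n_shards, page_size=4096):
        # Each shard has its own lock, so pushes and pops that land on
        # different shards never contend with each other.
        self._shards = [(threading.Lock(), deque()) for _ in range(n_shards)]
        self._page_size = page_size

    def shard_for(self, addr):
        # ASSUMPTION: shard by page number; other keys (e.g. the faulting
        # CPU) would work just as well for this illustration.
        return (addr // self._page_size) % len(self._shards)

    def push(self, addr):
        lock, queue = self._shards[self.shard_for(addr)]
        with lock:
            queue.append(addr)

    def pop(self, shard):
        """Pop the oldest pending address from one shard, or None if empty."""
        lock, queue = self._shards[shard]
        with lock:
            return queue.popleft() if queue else None
```

With one reader per shard, each reader only ever takes its own shard's
lock, which is the property a single shared queue lacks.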

I _think_ this might also be a positive result for your work: if the
bottleneck is not userfaultfd (since we can scale it by creating multiple
uffds, ignoring the split-vma effect), then it cannot be resolved by
scaling userfaultfd alone anyway.  So a general solution, even if one
existed, may not work here for kvm, because we'd already be stuck
somewhere else.

-- 
Peter Xu

[-- Attachment #2: uffd-1-reader-8.svg --]
[-- Type: image/svg+xml, Size: 427017 bytes --]

[-- Attachment #3: uffd-32-reader-32.svg --]
[-- Type: image/svg+xml, Size: 220618 bytes --]
