From: David Matlack <dmatlack@google.com>
To: Ben Gardon <bgardon@google.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>,
Sean Christopherson <seanjc@google.com>,
Jim Mattson <jmattson@google.com>,
David Dunn <daviddunn@google.com>,
Jing Zhang <jingzhangos@google.com>,
Junaid Shahid <junaids@google.com>
Subject: Re: [PATCH v2 0/9] KVM: x86/MMU: Optimize disabling dirty logging
Date: Mon, 28 Mar 2022 17:49:17 +0000
Message-ID: <YkH1HbuUcY4JH5tT@google.com>
In-Reply-To: <20220321224358.1305530-1-bgardon@google.com>
On Mon, Mar 21, 2022 at 03:43:49PM -0700, Ben Gardon wrote:
> Currently disabling dirty logging with the TDP MMU is extremely slow.
> On a 96 vCPU / 96G VM it takes ~256 seconds to disable dirty logging
> with the TDP MMU, as opposed to ~4 seconds with the legacy MMU. This
> series optimizes TLB flushes and introduces in-place large page
> promotion, to bring the disable dirty log time down to ~3 seconds.
>
> Testing:
> Ran KVM selftests and kvm-unit-tests on an Intel Haswell. This
> series introduced no new failures.
>
> Performance:
>
> Without this series, TDP MMU:
> > ./dirty_log_perf_test -v 96 -s anonymous_hugetlb_1gb
> Test iterations: 2
> Testing guest mode: PA-bits:ANY, VA-bits:48, 4K pages
> guest physical test memory offset: 0x3fe7c0000000
> Populate memory time: 4.972184425s
> Enabling dirty logging time: 0.001943807s
>
> Iteration 1 dirty memory time: 0.061862112s
> Iteration 1 get dirty log time: 0.001416413s
> Iteration 1 clear dirty log time: 1.417428057s
> Iteration 2 dirty memory time: 0.664103656s
> Iteration 2 get dirty log time: 0.000676724s
> Iteration 2 clear dirty log time: 1.149387201s
> Disabling dirty logging time: 256.682188868s
> Get dirty log over 2 iterations took 0.002093137s. (Avg 0.001046568s/iteration)
> Clear dirty log over 2 iterations took 2.566815258s. (Avg 1.283407629s/iteration)
>
> Without this series, Legacy MMU:
> > ./dirty_log_perf_test -v 96 -s anonymous_hugetlb_1gb
> Test iterations: 2
> Testing guest mode: PA-bits:ANY, VA-bits:48, 4K pages
> guest physical test memory offset: 0x3fe7c0000000
> Populate memory time: 4.892940915s
> Enabling dirty logging time: 0.001864603s
>
> Iteration 1 dirty memory time: 0.060490391s
> Iteration 1 get dirty log time: 0.001416277s
> Iteration 1 clear dirty log time: 0.323548614s
> Iteration 2 dirty memory time: 29.217064826s
> Iteration 2 get dirty log time: 0.000696202s
> Iteration 2 clear dirty log time: 0.907089084s
> Disabling dirty logging time: 4.246216551s
> Get dirty log over 2 iterations took 0.002112479s. (Avg 0.001056239s/iteration)
> Clear dirty log over 2 iterations took 1.230637698s. (Avg 0.615318849s/iteration)
>
> With this series, TDP MMU:
> (Updated since RFC. Pulling out patches 1-4 could have a performance impact.)
> > ./dirty_log_perf_test -v 96 -s anonymous_hugetlb_1gb
> Test iterations: 2
> Testing guest mode: PA-bits:ANY, VA-bits:48, 4K pages
> guest physical test memory offset: 0x3fe7c0000000
> Populate memory time: 4.878083336s
> Enabling dirty logging time: 0.001874340s
>
> Iteration 1 dirty memory time: 0.054867383s
> Iteration 1 get dirty log time: 0.001368377s
> Iteration 1 clear dirty log time: 1.406960856s
> Iteration 2 dirty memory time: 0.679301083s
> Iteration 2 get dirty log time: 0.000662905s
> Iteration 2 clear dirty log time: 1.138263359s
> Disabling dirty logging time: 3.169381810s
> Get dirty log over 2 iterations took 0.002031282s. (Avg 0.001015641s/iteration)
> Clear dirty log over 2 iterations took 2.545224215s. (Avg 1.272612107s/iteration)
>
> Patch breakdown:
> Patches 1-4 remove the need for a vCPU pointer to make_spte
> Patches 5-8 are small refactors in preparation for in-place lpage promotion
> Patch 9 implements in-place largepage promotion when disabling dirty logging
>
> Changelog:
> RFC -> v1:
> Dropped the first 4 patches from the series. Patch 1 was sent
> separately, patches 2-4 will be taken over by Sean Christopherson.
> Incorporated David Matlack's Reviewed-by.
> v1 -> v2:
> Several patches were queued and dropped from this revision.
> Incorporated feedback from Peter Xu on the last patch in the series.
> Refreshed performance data
> Between versions 1 and 2 of this series, disable time without
> the TDP MMU went from 45s to 256, a major regression. I was
> testing on a skylake before and haswell this time, but that
> does not explain the huge performance loss.
>
> Ben Gardon (9):
> KVM: x86/mmu: Move implementation of make_spte to a helper
> KVM: x86/mmu: Factor mt_mask out of __make_spte
> KVM: x86/mmu: Factor shadow_zero_check out of __make_spte
> KVM: x86/mmu: Replace vcpu argument with kvm pointer in make_spte
> KVM: x86/mmu: Factor out the meat of reset_tdp_shadow_zero_bits_mask
> KVM: x86/mmu: Factor out part of vmx_get_mt_mask which does not depend
> on vcpu
> KVM: x86/mmu: Add try_get_mt_mask to x86_ops
> KVM: x86/mmu: Make kvm_is_mmio_pfn usable outside of spte.c
> KVM: x86/mmu: Promote pages in-place when disabling dirty logging

Use () after function names to make it clear you are referring to a
function and not something else, e.g.

  KVM: x86/mmu: Move implementation of make_spte to a helper

becomes

  KVM: x86/mmu: Move implementation of make_spte() to a helper

This applies throughout the series, in commit messages and comments.

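
If it helps when respinning, the convention can be checked mechanically
before sending. The following is only a rough sketch, not an existing
kernel tool: the helper name missing_parens() and the heuristic itself
(treat any lowercase snake_case token as a candidate function name) are
assumptions made for illustration.

```python
import re
from typing import List

# Heuristic (an assumption, not a kernel rule): any lowercase snake_case
# token is a candidate function name. It is reported when it is not
# immediately followed by "()".
FUNC_TOKEN = re.compile(r"\b(_*[a-z0-9]+(?:_[a-z0-9]+)+)\b(?!\(\))")


def missing_parens(subject: str) -> List[str]:
    """Return snake_case tokens in `subject` that lack a trailing "()"."""
    return FUNC_TOKEN.findall(subject)


if __name__ == "__main__":
    subjects = [
        "KVM: x86/mmu: Move implementation of make_spte to a helper",
        "KVM: x86/mmu: Move implementation of make_spte() to a helper",
    ]
    for subject in subjects:
        # Print each subject alongside the tokens the heuristic flags.
        print(subject, "->", missing_parens(subject))
```

The heuristic cannot tell functions from other identifiers, so it will
also flag names such as mt_mask or shadow_zero_check. Treat its output as
a list of candidates to review by hand, not as a list of errors.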
>
> arch/x86/include/asm/kvm-x86-ops.h | 1 +
> arch/x86/include/asm/kvm_host.h | 2 +
> arch/x86/kvm/mmu/mmu.c | 21 +++++----
> arch/x86/kvm/mmu/mmu_internal.h | 6 +++
> arch/x86/kvm/mmu/spte.c | 39 +++++++++++-----
> arch/x86/kvm/mmu/spte.h | 6 +++
> arch/x86/kvm/mmu/tdp_mmu.c | 73 +++++++++++++++++++++++++++++-
> arch/x86/kvm/svm/svm.c | 9 ++++
> arch/x86/kvm/vmx/vmx.c | 25 ++++++++--
> 9 files changed, 155 insertions(+), 27 deletions(-)
>
> --
> 2.35.1.894.gb6a874cedc-goog
>