From: David Matlack <dmatlack@google.com>
To: Sean Christopherson <seanjc@google.com>
Cc: Venkatesh Srinivas <venkateshs@chromium.org>,
kvm@vger.kernel.org, Junaid Shahid <junaids@google.com>,
Ben Gardon <bgardon@google.com>
Subject: Re: [PATCH] KVM: mmu: spte_write_protect optimization
Date: Thu, 26 May 2022 17:18:51 +0000
Message-ID: <Yo+2e2wYUnDbpcTQ@google.com>
In-Reply-To: <Yo6MacceCSi9zSmU@google.com>

On Wed, May 25, 2022 at 08:07:05PM +0000, Sean Christopherson wrote:
> +Ben and David
>
> On Wed, May 25, 2022, Venkatesh Srinivas wrote:
> > From: Junaid Shahid <junaids@google.com>
> >
> > This change uses a lighter-weight function instead of mmu_spte_update()
> > in the common case in spte_write_protect(). This helps speed up the
> > get_dirty_log IOCTL.
>
> When upstreaming our internal patches, please rewrite the changelogs to meet
> upstream standards. If that requires a lot of effort, then by all means take
> credit with a Co-developed-by, or even make yourself the sole author with a
> Suggested-by for the original author.
>
> There is subtly a _lot_ going on in this patch, and practically none of it is
> explained.
>
> > Performance: dirty_log_perf_test with 32 GB VM size
> > Avg IOCTL time over 10 passes
> > Haswell: ~0.23s vs ~0.4s
> > IvyBridge: ~0.8s vs 1s
> >
> > Signed-off-by: Venkatesh Srinivas <venkateshs@chromium.org>
> > Signed-off-by: Junaid Shahid <junaids@google.com>
> > ---
> > arch/x86/kvm/mmu/mmu.c | 26 +++++++++++++++++++++-----
> > 1 file changed, 21 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index efe5a3dca1e0..a6db9dfaf7c3 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -1151,6 +1151,22 @@ static void drop_large_spte(struct kvm_vcpu *vcpu, u64 *sptep)
> > }
> > }
> >
> > +static bool spte_test_and_clear_writable(u64 *sptep)
> > +{
> > + u64 spte = *sptep;
> > +
> > + if (spte & PT_WRITABLE_MASK) {
>
> This is redundant, the caller has already verified that the spte is writable.
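+1. For reference, spte_write_protect() already has this check at the top
(paraphrasing from memory, the exact helper names may differ by version):

	u64 spte = *sptep;

	if (!is_writable_pte(spte) &&
	    !(pt_protect && spte_can_locklessly_be_made_writable(spte)))
		return false;

i.e. on the !pt_protect path the spte is guaranteed to be writable before
the new helper is ever called.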
>
> > + clear_bit(PT_WRITABLE_SHIFT, (unsigned long *)sptep);
>
> Is an unconditional atomic op faster than checking spte_has_volatile_bits()? If
> so, that should be documented with a comment.
>
> This also needs a comment explaining the mmu_lock is held for write and so
> WRITABLE can't be cleared, i.e. using clear_bit() instead of test_and_clear_bit()
> is acceptable.
>
> > + if (!spte_ad_enabled(spte))
>
> This absolutely needs a comment explaining what's going on. Specifically that if
> A/D bits are disabled, the WRITABLE bit doubles as the dirty flag, whereas hardware's
> DIRTY bit is untouched when A/D bits are in use.
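Maybe something along these lines for both comments (just a sketch, exact
wording TBD):

	/*
	 * mmu_lock is held for write, so no other task can clear WRITABLE
	 * out from under us; a plain clear_bit() (rather than
	 * test_and_clear_bit()) is safe and saves an extra atomic.
	 */
	clear_bit(PT_WRITABLE_SHIFT, (unsigned long *)sptep);

	/*
	 * When A/D bits are disabled, the WRITABLE bit doubles as the dirty
	 * flag, so the pfn has to be marked dirty here. When A/D bits are
	 * enabled, hardware's DIRTY bit tracks this instead.
	 */
	if (!spte_ad_enabled(spte))
		kvm_set_pfn_dirty(spte_to_pfn(spte));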
>
> > + kvm_set_pfn_dirty(spte_to_pfn(spte));
> > +
> > + return true;
> > + }
> > +
> > + return false;
> > +}
> > +
> > /*
> > * Write-protect on the specified @sptep, @pt_protect indicates whether
> > * spte write-protection is caused by protecting shadow page table.
> > @@ -1174,11 +1190,11 @@ static bool spte_write_protect(u64 *sptep, bool pt_protect)
> >
> > rmap_printk("spte %p %llx\n", sptep, *sptep);
> >
> > - if (pt_protect)
> > - spte &= ~shadow_mmu_writable_mask;
> > - spte = spte & ~PT_WRITABLE_MASK;
> > -
> > - return mmu_spte_update(sptep, spte);
> > + if (pt_protect) {
> > + spte &= ~(shadow_mmu_writable_mask | PT_WRITABLE_MASK);
> > + return mmu_spte_update(sptep, spte);
> > + }
> > + return spte_test_and_clear_writable(sptep);
>
> I'd rather open code the logic instead of adding a helper. Without even more
> comments, it's not at all obvious when it's safe to use spte_test_and_clear_writable().
> And although this patch raises the question of whether we should do a similar
> thing for the TDP MMU's clear_dirty_gfn_range(), the TDP MMU and legacy MMU can't
> really share a helper at this time because the TDP MMU only holds mmu_lock for read.
>
> So minus all the comments that need to be added, I think this can just be:
>
> ---
> arch/x86/kvm/mmu/mmu.c | 13 +++++++++----
> 1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index efe5a3dca1e0..caf5db7f1dce 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -1174,11 +1174,16 @@ static bool spte_write_protect(u64 *sptep, bool pt_protect)
>
> rmap_printk("spte %p %llx\n", sptep, *sptep);
>
> - if (pt_protect)
> - spte &= ~shadow_mmu_writable_mask;
> - spte = spte & ~PT_WRITABLE_MASK;
> + if (pt_protect) {
> + spte &= ~(shadow_mmu_writable_mask | PT_WRITABLE_MASK);
> + return mmu_spte_update(sptep, spte);
> + }
>
> - return mmu_spte_update(sptep, spte);
> + clear_bit(PT_WRITABLE_SHIFT, (unsigned long *)sptep);
> + if (!spte_ad_enabled(spte))
> + kvm_set_pfn_dirty(spte_to_pfn(spte));
> +
> + return true;
> }
>
> static bool rmap_write_protect(struct kvm_rmap_head *rmap_head,
>
> base-commit: 9fd329c0c3309a87821817ca553b449936e318a9
> --
>
>
> And then on the TDP MMU side, I _think_ we can do the following (again, minus
> comments). The logic being that if something else races and clears the bit or
> zaps the SPTE, there's no point in "retrying" because the !PRESENT check
> and/or the WRITABLE/DIRTY bit check will fail.
>
> The only hiccup would be clearing a REMOVED_SPTE bit, but by sheer dumb luck, past
> me chose a semi-arbitrary value for REMOVED_SPTE that doesn't use either dirty bit.
>
> This would need to be done over two patches, e.g. to make the EPT definitions
> available to the mmu.
>
> Ben, David, any holes in my idea?

I think the idea will work. But I also wonder if we should just change
clear_dirty_gfn_range() to run under the write lock [1]. Then we could
share the code with the shadow MMU and avoid the REMOVED_SPTE
complexity (rough sketch below, after your diff).

[1] As a general rule, it seems like large (e.g. slot-wide)
single-threaded batch operations should run under the write lock.
They will be more efficient by avoiding atomic instructions, and can
still avoid lock contention via tdp_mmu_iter_cond_resched(). e.g. Ben
and I have been talking about changing eager page splitting to always
run under the write lock.
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 841feaa48be5..c28a0b9e2af1 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -1600,6 +1600,8 @@ void kvm_tdp_mmu_try_split_huge_pages(struct kvm *kvm,
> }
> }
>
> +#define VMX_EPT_DIRTY_BIT BIT_ULL(9)
> +
> /*
> * Clear the dirty status of all the SPTEs mapping GFNs in the memslot. If
> * AD bits are enabled, this will involve clearing the dirty bit on each SPTE.
> @@ -1610,35 +1612,32 @@ void kvm_tdp_mmu_try_split_huge_pages(struct kvm *kvm,
> static bool clear_dirty_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
> gfn_t start, gfn_t end)
> {
> - struct tdp_iter iter;
> - u64 new_spte;
> + const int dirty_shift = ilog2(shadow_dirty_mask);
> bool spte_set = false;
> + struct tdp_iter iter;
> + int bit_nr;
> +
> + BUILD_BUG_ON(REMOVED_SPTE & PT_DIRTY_MASK);
> + BUILD_BUG_ON(REMOVED_SPTE & VMX_EPT_DIRTY_BIT);
>
> rcu_read_lock();
>
> tdp_root_for_each_leaf_pte(iter, root, start, end) {
> -retry:
> if (tdp_mmu_iter_cond_resched(kvm, &iter, false, true))
> continue;
>
> if (!is_shadow_present_pte(iter.old_spte))
> continue;
>
> - if (spte_ad_need_write_protect(iter.old_spte)) {
> - if (is_writable_pte(iter.old_spte))
> - new_spte = iter.old_spte & ~PT_WRITABLE_MASK;
> - else
> - continue;
> - } else {
> - if (iter.old_spte & shadow_dirty_mask)
> - new_spte = iter.old_spte & ~shadow_dirty_mask;
> - else
> - continue;
> - }
> + if (spte_ad_need_write_protect(iter.old_spte))
> + bit_nr = PT_WRITABLE_SHIFT;
> + else
> + bit_nr = dirty_shift;
>
> - if (tdp_mmu_set_spte_atomic(kvm, &iter, new_spte))
> - goto retry;
> + if (!test_and_clear_bit(bit_nr, (unsigned long *)iter.sptep))
> + continue;
>
> + kvm_set_pfn_dirty(spte_to_pfn(iter.old_spte));
> spte_set = true;
> }
>
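To illustrate [1], here's a rough (and untested) sketch of what a write-lock
version of clear_dirty_gfn_range() could look like, assuming
tdp_mmu_set_spte() is still the right way to do a non-atomic update with
mmu_lock held for write:

	tdp_root_for_each_leaf_pte(iter, root, start, end) {
		/* Not shared: mmu_lock is held for write in this version. */
		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, false))
			continue;

		if (!is_shadow_present_pte(iter.old_spte))
			continue;

		if (spte_ad_need_write_protect(iter.old_spte)) {
			if (!is_writable_pte(iter.old_spte))
				continue;
			new_spte = iter.old_spte & ~PT_WRITABLE_MASK;
		} else {
			if (!(iter.old_spte & shadow_dirty_mask))
				continue;
			new_spte = iter.old_spte & ~shadow_dirty_mask;
		}

		/* No atomic/retry dance needed, and no REMOVED_SPTE worry. */
		tdp_mmu_set_spte(kvm, &iter, new_spte);
		spte_set = true;
	}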