From: David Matlack <dmatlack@google.com>
To: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
kvm@vger.kernel.org, Bibo Mao <maobibo@loongson.cn>
Subject: Re: [PATCH v3] KVM: x86/mmu: Always drop mmu_lock to allocate TDP MMU SPs for eager splitting
Date: Tue, 11 Jun 2024 08:29:38 -0700
Message-ID: <ZmhtYqtAou031wjV@google.com>
In-Reply-To: <Zl4H0xVkkq5p507k@google.com>
On 2024-06-03 11:13 AM, Sean Christopherson wrote:
> On Thu, May 09, 2024, David Matlack wrote:
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > index aaa2369a9479..2089d696e3c6 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > @@ -1385,11 +1385,11 @@ bool kvm_tdp_mmu_wrprot_slot(struct kvm *kvm,
> > return spte_set;
> > }
> >
> > -static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_split(gfp_t gfp)
> > +static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_split(void)
> > {
> > + gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO;
> > struct kvm_mmu_page *sp;
> >
> > - gfp |= __GFP_ZERO;
> >
> > sp = kmem_cache_alloc(mmu_page_header_cache, gfp);
>
> This can more simply and clearly be:
>
> sp = kmem_cache_zalloc(mmu_page_header_cache, GFP_KERNEL_ACCOUNT);
Will do. And I assume you'd prefer get_zeroed_page(GFP_KERNEL_ACCOUNT)
as well below?
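i.e. something along these lines (untested sketch, naming aside since you
also suggest collapsing the two helpers below):

static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_split(void)
{
	struct kvm_mmu_page *sp;

	/* kmem_cache_zalloc() zeroes the header, so no explicit __GFP_ZERO. */
	sp = kmem_cache_zalloc(mmu_page_header_cache, GFP_KERNEL_ACCOUNT);
	if (!sp)
		return NULL;

	/* get_zeroed_page() likewise folds in the zeroing for sp->spt. */
	sp->spt = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
	if (!sp->spt) {
		kmem_cache_free(mmu_page_header_cache, sp);
		return NULL;
	}

	return sp;
}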
>
> > if (!sp)
> > @@ -1412,19 +1412,6 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm,
> >
> > kvm_lockdep_assert_mmu_lock_held(kvm, shared);
> >
> > - /*
> > - * Since we are allocating while under the MMU lock we have to be
> > - * careful about GFP flags. Use GFP_NOWAIT to avoid blocking on direct
> > - * reclaim and to avoid making any filesystem callbacks (which can end
> > - * up invoking KVM MMU notifiers, resulting in a deadlock).
> > - *
> > - * If this allocation fails we drop the lock and retry with reclaim
> > - * allowed.
> > - */
> > - sp = __tdp_mmu_alloc_sp_for_split(GFP_NOWAIT | __GFP_ACCOUNT);
> > - if (sp)
> > - return sp;
> > -
> > rcu_read_unlock();
> >
> > if (shared)
> > @@ -1433,7 +1420,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm,
> > write_unlock(&kvm->mmu_lock);
> >
> > iter->yielded = true;
>
> Now that yielding is unconditional, this really should be in the loop itself.
> The bare continue looks weird, and it's unnecessarily hard to see that "yielded"
> is being set.
>
> And there's definitely no reason to have two helpers.
>
> Not sure how many patches it'll take, but I think we should end up with:
>
> static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(void)
> {
> struct kvm_mmu_page *sp;
>
> sp = kmem_cache_zalloc(mmu_page_header_cache, GFP_KERNEL_ACCOUNT);
> if (!sp)
> return NULL;
>
> sp->spt = (void *)__get_free_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> if (!sp->spt) {
> kmem_cache_free(mmu_page_header_cache, sp);
> return NULL;
> }
>
> return sp;
> }
>
> static int tdp_mmu_split_huge_pages_root(struct kvm *kvm,
> struct kvm_mmu_page *root,
> gfn_t start, gfn_t end,
> int target_level, bool shared)
> {
> struct kvm_mmu_page *sp = NULL;
> struct tdp_iter iter;
>
> rcu_read_lock();
>
> /*
> * Traverse the page table splitting all huge pages above the target
> * level into one lower level. For example, if we encounter a 1GB page
> * we split it into 512 2MB pages.
> *
> * Since the TDP iterator uses a pre-order traversal, we are guaranteed
> * to visit an SPTE before ever visiting its children, which means we
> * will correctly recursively split huge pages that are more than one
> * level above the target level (e.g. splitting a 1GB to 512 2MB pages,
> * and then splitting each of those to 512 4KB pages).
> */
> for_each_tdp_pte_min_level(iter, root, target_level + 1, start, end) {
> retry:
> if (tdp_mmu_iter_cond_resched(kvm, &iter, false, shared))
> continue;
>
> if (!is_shadow_present_pte(iter.old_spte) || !is_large_pte(iter.old_spte))
> continue;
>
> if (!sp) {
> rcu_read_unlock();
>
> if (shared)
> read_unlock(&kvm->mmu_lock);
> else
> write_unlock(&kvm->mmu_lock);
>
> sp = tdp_mmu_alloc_sp_for_split();
>
> if (shared)
> read_lock(&kvm->mmu_lock);
> else
> write_lock(&kvm->mmu_lock);
>
> if (!sp) {
> trace_kvm_mmu_split_huge_page(iter.gfn,
> iter.old_spte,
> iter.level, -ENOMEM);
> return -ENOMEM;
> }
>
> rcu_read_lock();
>
> iter.yielded = true;
> continue;
> }
>
> tdp_mmu_init_child_sp(sp, &iter);
>
> if (tdp_mmu_split_huge_page(kvm, &iter, sp, shared))
> goto retry;
>
> sp = NULL;
> }
>
> rcu_read_unlock();
>
> /*
> * It's possible to exit the loop having never used the last sp if, for
> * example, a vCPU doing HugePage NX splitting wins the race and
> * installs its own sp in place of the last sp we tried to split.
> */
> if (sp)
> tdp_mmu_free_sp(sp);
>
> return 0;
> }
Ack, will do.