From: David Matlack <dmatlack@google.com>
To: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
kvm@vger.kernel.org, Bibo Mao <maobibo@loongson.cn>
Subject: Re: [PATCH v3] KVM: x86/mmu: Always drop mmu_lock to allocate TDP MMU SPs for eager splitting
Date: Tue, 11 Jun 2024 08:29:38 -0700 [thread overview]
Message-ID: <ZmhtYqtAou031wjV@google.com> (raw)
In-Reply-To: <Zl4H0xVkkq5p507k@google.com>
On 2024-06-03 11:13 AM, Sean Christopherson wrote:
> On Thu, May 09, 2024, David Matlack wrote:
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > index aaa2369a9479..2089d696e3c6 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > @@ -1385,11 +1385,11 @@ bool kvm_tdp_mmu_wrprot_slot(struct kvm *kvm,
> > return spte_set;
> > }
> >
> > -static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_split(gfp_t gfp)
> > +static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_split(void)
> > {
> > + gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO;
> > struct kvm_mmu_page *sp;
> >
> > - gfp |= __GFP_ZERO;
> >
> > sp = kmem_cache_alloc(mmu_page_header_cache, gfp);
>
> This can more simply and cleary be:
>
> sp = kmem_cache_zalloc(mmu_page_header_cache, GFP_KERNEL_ACCOUNT);
Will do. And I assume you'd prefer get_zeroed_page(GFP_KERNEL_ACCOUNT)
as well below?
>
> > if (!sp)
> > @@ -1412,19 +1412,6 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm,
> >
> > kvm_lockdep_assert_mmu_lock_held(kvm, shared);
> >
> > - /*
> > - * Since we are allocating while under the MMU lock we have to be
> > - * careful about GFP flags. Use GFP_NOWAIT to avoid blocking on direct
> > - * reclaim and to avoid making any filesystem callbacks (which can end
> > - * up invoking KVM MMU notifiers, resulting in a deadlock).
> > - *
> > - * If this allocation fails we drop the lock and retry with reclaim
> > - * allowed.
> > - */
> > - sp = __tdp_mmu_alloc_sp_for_split(GFP_NOWAIT | __GFP_ACCOUNT);
> > - if (sp)
> > - return sp;
> > -
> > rcu_read_unlock();
> >
> > if (shared)
> > @@ -1433,7 +1420,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm,
> > write_unlock(&kvm->mmu_lock);
> >
> > iter->yielded = true;
>
> Now that yielding is unconditional, this really should be in the loop itself.
> The bare continue looks weird, and it's unnecessarily hard to see that "yielded"
> is being set.
>
> And there's definitely no reason to have two helpers.
>
> Not sure how many patches it'll take, but I think we should end up with:
>
> static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(void)
> {
> struct kvm_mmu_page *sp;
>
> sp = kmem_cache_zalloc(mmu_page_header_cache, GFP_KERNEL_ACCOUNT);
> if (!sp)
> return NULL;
>
> sp->spt = (void *)__get_free_page(gfp);
> if (!sp->spt) {
> kmem_cache_free(mmu_page_header_cache, sp);
> return NULL;
> }
>
> return sp;
> }
>
> static int tdp_mmu_split_huge_pages_root(struct kvm *kvm,
> struct kvm_mmu_page *root,
> gfn_t start, gfn_t end,
> int target_level, bool shared)
> {
> struct kvm_mmu_page *sp = NULL;
> struct tdp_iter iter;
>
> rcu_read_lock();
>
> /*
> * Traverse the page table splitting all huge pages above the target
> * level into one lower level. For example, if we encounter a 1GB page
> * we split it into 512 2MB pages.
> *
> * Since the TDP iterator uses a pre-order traversal, we are guaranteed
> * to visit an SPTE before ever visiting its children, which means we
> * will correctly recursively split huge pages that are more than one
> * level above the target level (e.g. splitting a 1GB to 512 2MB pages,
> * and then splitting each of those to 512 4KB pages).
> */
> for_each_tdp_pte_min_level(iter, root, target_level + 1, start, end) {
> retry:
> if (tdp_mmu_iter_cond_resched(kvm, &iter, false, shared))
> continue;
>
> if (!is_shadow_present_pte(iter.old_spte) || !is_large_pte(iter.old_spte))
> continue;
>
> if (!sp) {
> rcu_read_unlock();
>
> if (shared)
> read_unlock(&kvm->mmu_lock);
> else
> write_unlock(&kvm->mmu_lock);
>
> sp = tdp_mmu_alloc_sp_for_split(kvm, &iter, shared);
>
> if (shared)
> read_lock(&kvm->mmu_lock);
> else
> write_lock(&kvm->mmu_lock);
>
> if (!sp) {
> trace_kvm_mmu_split_huge_page(iter.gfn,
> iter.old_spte,
> iter.level, -ENOMEM);
> return -ENOMEM;
> }
>
> rcu_read_lock();
>
> iter->yielded = true;
> continue;
> }
>
> tdp_mmu_init_child_sp(sp, &iter);
>
> if (tdp_mmu_split_huge_page(kvm, &iter, sp, shared))
> goto retry;
>
> sp = NULL;
> }
>
> rcu_read_unlock();
>
> /*
> * It's possible to exit the loop having never used the last sp if, for
> * example, a vCPU doing HugePage NX splitting wins the race and
> * installs its own sp in place of the last sp we tried to split.
> */
> if (sp)
> tdp_mmu_free_sp(sp);
>
> return 0;
> }
Ack, will do.
next prev parent reply other threads:[~2024-06-11 15:29 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-05-09 18:11 [PATCH v3] KVM: x86/mmu: Always drop mmu_lock to allocate TDP MMU SPs for eager splitting David Matlack
2024-06-03 18:13 ` Sean Christopherson
2024-06-11 15:29 ` David Matlack [this message]
2024-06-11 16:35 ` Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZmhtYqtAou031wjV@google.com \
--to=dmatlack@google.com \
--cc=kvm@vger.kernel.org \
--cc=maobibo@loongson.cn \
--cc=pbonzini@redhat.com \
--cc=seanjc@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.