Re: [PATCH] KVM: x86/mmu: Make page tables for eager page splitting NUMA aware

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: David Matlack <dmatlack@google.com>
Cc: Vipin Sharma <vipinsh@google.com>,
	pbonzini@redhat.com, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] KVM: x86/mmu: Make page tables for eager page splitting NUMA aware
Date: Mon, 1 Aug 2022 23:56:21 +0000	[thread overview]
Message-ID: <YuhoJUoPBOu5eMz8@google.com> (raw)
In-Reply-To: <YuhPT2drgqL+osLl@google.com>

On Mon, Aug 01, 2022, David Matlack wrote:
> On Mon, Aug 01, 2022 at 08:19:28AM -0700, Vipin Sharma wrote:
> That being said, KVM currently has a gap where a guest doing a lot of
> remote memory accesses when touching memory for the first time will
> cause KVM to allocate the TDP page tables on the arguably wrong node.

Userspace can solve this by setting the NUMA policy on a VMA or shared-object
basis.  E.g. create dedicated memslots for each NUMA node, then bind each of the
backing stores to the appropriate host node.

If there is a gap, e.g. a backing store we want to use doesn't properly support
mempolicy for shared mappings, then we should enhance the backing store.

> > We can improve TDP MMU eager page splitting by making
> > tdp_mmu_alloc_sp_for_split() NUMA-aware. Specifically, when splitting a
> > huge page, allocate the new lower level page tables on the same node as the
> > huge page.
> > 
> > __get_free_page() is replaced by alloc_page_nodes(). This introduces two
> > functional changes.
> > 
> >   1. __get_free_page() removes gfp flag __GFP_HIGHMEM via call to
> >   __get_free_pages(). This should not be an issue  as __GFP_HIGHMEM flag is
> >   not passed in tdp_mmu_alloc_sp_for_split() anyway.
> > 
> >   2. __get_free_page() calls alloc_pages() and use thread's mempolicy for
> >   the NUMA node allocation. From this commit, thread's mempolicy will not
> >   be used and first preference will be to allocate on the node where huge
> >   page was present.
> 
> It would be worth noting that userspace could change the mempolicy of
> the thread doing eager splitting to prefer allocating from the target
> NUMA node, as an alternative approach.
> 
> I don't prefer the alternative though since it bleeds details from KVM
> into userspace, such as the fact that enabling dirty logging does eager
> page splitting, which allocates page tables. 

As above, if userspace is cares about vNUMA, then it already needs to be aware of
some of KVM/kernel details.  Separate memslots aren't strictly necessary, e.g.
userspace could stitch together contiguous VMAs to create a single mega-memslot,
but that seems like it'd be more work than just creating separate memslots.

And because eager page splitting for dirty logging runs with mmu_lock held for,
userspace might also benefit from per-node memslots as it can do the splitting on
multiple tasks/CPUs.

Regardless of what we do, the behavior needs to be document, i.e. KVM details will
bleed into userspace.  E.g. if KVM is overriding the per-task NUMA policy, then
that should be documented.

> It's also unnecessary since KVM can infer an appropriate NUMA placement
> without the help of userspace, and I can't think of a reason for userspace to
> prefer a different policy.

I can't think of a reason why userspace would want to have a different policy for
the task that's enabling dirty logging, but I also can't think of a reason why
KVM should go out of its way to ignore that policy.

IMO this is a "bug" in dirty_log_perf_test, though it's probably a good idea to
document how to effectively configure vNUMA-aware memslots.

next prev parent reply	other threads:[~2022-08-01 23:56 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-01 15:19 [PATCH] KVM: x86/mmu: Make page tables for eager page splitting NUMA aware Vipin Sharma
2022-08-01 22:10 ` David Matlack
2022-08-01 23:56   ` Sean Christopherson [this message]
2022-08-02 16:31     ` David Matlack
2022-08-02 17:22       ` Sean Christopherson
2022-08-02 17:42         ` Paolo Bonzini
2022-08-02 19:12           ` Sean Christopherson
2022-08-03 15:08             ` Paolo Bonzini
2022-08-05 23:30               ` Vipin Sharma
2022-08-09 16:52                 ` David Matlack
2022-08-09 23:51                   ` Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YuhoJUoPBOu5eMz8@google.com \
    --to=seanjc@google.com \
    --cc=dmatlack@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=vipinsh@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.