public inbox for linux-kernel@vger.kernel.org
From: David Matlack <dmatlack@google.com>
To: Vipin Sharma <vipinsh@google.com>
Cc: seanjc@google.com, pbonzini@redhat.com, bgardon@google.com,
	jmattson@google.com, mizhang@google.com, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [Patch v4 11/18] KVM: x86/mmu: Add documentation of NUMA aware page table capability
Date: Thu, 23 Mar 2023 14:59:48 -0700
Message-ID: <ZBzL1Awe7S00dPUP@google.com>
In-Reply-To: <20230306224127.1689967-12-vipinsh@google.com>

On Mon, Mar 06, 2023 at 02:41:20PM -0800, Vipin Sharma wrote:
> Add documentation for KVM_CAP_NUMA_AWARE_PAGE_TABLE capability and
> explain why it is needed.
> 
> Signed-off-by: Vipin Sharma <vipinsh@google.com>
> ---
>  Documentation/virt/kvm/api.rst | 29 +++++++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 62de0768d6aa..7e3a1299ca8e 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -7669,6 +7669,35 @@ This capability is aimed to mitigate the threat that malicious VMs can
>  cause CPU stuck (due to event windows don't open up) and make the CPU
>  unavailable to host or other VMs.
>  
> +7.34 KVM_CAP_NUMA_AWARE_PAGE_TABLE
> +----------------------------------
> +
> +:Architectures: x86
> +:Target: VM
> +:Returns: 0 on success, -EINVAL if vCPUs are already created.
> +
> +This capability allows userspace to enable NUMA aware page tables allocations.

Call out that this capability overrides task mempolicies. e.g.

  This capability causes KVM to use a custom NUMA memory policy when
  allocating page tables. Specifically, KVM will attempt to co-locate
  page tables pages with the memory that they map, rather than following
  the mempolicy of the current task.
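
To make the intended policy concrete, here's a rough userspace model of
the node-selection logic described above (the function name and the
NUMA_NO_NODE fallback convention are just for illustration, not lifted
from the series):

```c
#include <assert.h>

#define NUMA_NO_NODE (-1)

/*
 * Toy model of the allocation policy: when the capability is enabled
 * and the node of the mapped (leaf) page is known, allocate the page
 * table from that node; otherwise defer to the current task's
 * mempolicy (modeled here as returning NUMA_NO_NODE).
 */
static int pick_page_table_node(int numa_aware, int leaf_node)
{
	if (!numa_aware || leaf_node == NUMA_NO_NODE)
		return NUMA_NO_NODE;	/* follow task mempolicy */
	return leaf_node;		/* co-locate with the mapped page */
}
```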

> +NUMA aware page tables are disabled by default. Once enabled, prior to vCPU
> +creation, any page table allocated during the life of a VM will be allocated

The "prior to vCPU creation" part here is confusing because it sounds
like you're talking about any page tables allocated before vCPU
creation. Just delete that part and put it in a separate paragraph.

 KVM_CAP_NUMA_AWARE_PAGE_TABLE must be enabled before any vCPU is
 created, otherwise KVM will return -EINVAL.
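
The ordering rule could be modeled like this (a sketch; the struct and
function names are made up for illustration, not KVM's):

```c
#include <assert.h>
#include <errno.h>

/*
 * Minimal model of the enablement ordering: enabling the capability
 * after any vCPU has been created fails with -EINVAL.
 */
struct toy_vm {
	int created_vcpus;
	int numa_aware_page_tables;
};

static int toy_enable_numa_cap(struct toy_vm *vm)
{
	if (vm->created_vcpus)
		return -EINVAL;
	vm->numa_aware_page_tables = 1;
	return 0;
}

static void toy_create_vcpu(struct toy_vm *vm)
{
	vm->created_vcpus++;
}
```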

> +preferably from the NUMA node of the leaf page.
> +
> +Without this capability, default feature is to use current thread mempolicy and

s/default feature is to/KVM will/

> +allocate page table based on that.

s/and allocate page table based on that./to allocate page tables./

> +
> +This capability is useful to improve page accesses by a guest. For example, an

nit: Be more specific about how.

 This capability aims to minimize the cost of TLB misses when a vCPU is
 accessing NUMA-local memory, by reducing the number of remote memory
 accesses needed to walk KVM's page tables.
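
Back-of-the-envelope illustration of that cost (the latency numbers are
invented for the example, not measurements):

```c
#include <assert.h>

/*
 * A 4-level page walk with illustrative latencies: if all four page
 * table pages sit on a remote node, the walk costs four remote
 * accesses; if they are co-located with the (local) memory being
 * mapped, it costs four local ones.
 */
#define WALK_LEVELS	4
#define LOCAL_NS	80	/* assumed local DRAM access */
#define REMOTE_NS	140	/* assumed remote-node access */

static int walk_cost_ns(int remote_levels)
{
	int local_levels = WALK_LEVELS - remote_levels;

	return local_levels * LOCAL_NS + remote_levels * REMOTE_NS;
}
```

With these made-up numbers a fully-remote walk costs 560ns versus 320ns
for a fully-local one; that per-TLB-miss gap is what the feature is
trying to close.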

> +initialization thread which access lots of remote memory and ends up creating
> +page tables on local NUMA node, or some service thread allocates memory on
> +remote NUMA nodes and later worker/background threads accessing that memory
> +will end up accessing remote NUMA node page tables.

It's not clear if these examples are talking about what happens when
KVM_CAP_NUMA_AWARE_PAGE_TABLE is enabled or disabled.

Also it's important to distinguish virtual NUMA nodes from physical NUMA
nodes and where these "threads" are running. How about this:

 For example, when KVM_CAP_NUMA_AWARE_PAGE_TABLE is disabled and a vCPU
 accesses memory on a remote NUMA node and triggers a KVM page fault,
 KVM will allocate page tables to handle that fault on the node where
 the vCPU is running rather than the node where the memory is allocated.
 When KVM_CAP_NUMA_AWARE_PAGE_TABLE is enabled, KVM will allocate the
 page tables on the node where the memory is located.

 This is intended to be used in VM configurations that properly
 virtualize NUMA. i.e. VMs with one or more virtual NUMA nodes, each of
 which is mapped to a physical NUMA node. With this capability enabled
 on such VMs, any guest memory access to virtually-local memory will be
 translated through mostly[*] physically-local page tables, regardless
 of how the memory was faulted in.

 [*] KVM will fall back to allocating from remote NUMA nodes if the
 preferred node is out of memory. Also, in VMs with 2 or more NUMA
 nodes, higher level page tables will necessarily map memory across
 multiple physical nodes.
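
The [*] fallback could look roughly like this (a sketch; the free-page
array is a stand-in for the real page allocator's per-node state):

```c
#include <assert.h>

#define MAX_NODES 4

/*
 * Model of the fallback in [*]: try the preferred node first, then
 * fall back to any other node that still has free pages. Returns the
 * node allocated from, or -1 if every node is exhausted.
 */
static int alloc_node_with_fallback(int preferred, int free_pages[])
{
	int node;

	if (free_pages[preferred] > 0) {
		free_pages[preferred]--;
		return preferred;
	}
	for (node = 0; node < MAX_NODES; node++) {
		if (free_pages[node] > 0) {
			free_pages[node]--;
			return node;
		}
	}
	return -1;
}
```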

> So, a multi NUMA node
> +guest, can with high confidence access local memory faster instead of going
> +through remote page tables first.
> +
> +This capability is also helpful for host to reduce live migration impact when
> +splitting huge pages during dirty log operations. If the thread splitting huge
> +page is on remote NUMA node it will create page tables on remote node. Even if
> +guest is careful in making sure that it only access local memory they will end
> +up accessing remote page tables.

Please also cover the limitations of this feature:

 - Impact on remote memory accesses (more expensive).
 - How KVM handles NUMA node exhaustion.
 - How high-level page tables can span multiple nodes.
 - What KVM does if it can't determine the NUMA node of the pfn.
 - What KVM does for faults on GPAs that aren't backed by a pfn.

> +
>  8. Other capabilities.
>  ======================
>  
> -- 
> 2.40.0.rc0.216.gc4246ad0f0-goog
> 

