public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Ackerley Tng <ackerleytng@google.com>
To: Sean Christopherson <seanjc@google.com>, Shivank Garg <shivankg@amd.com>
Cc: Marc Zyngier <maz@kernel.org>,
	Oliver Upton <oliver.upton@linux.dev>,
	 Paolo Bonzini <pbonzini@redhat.com>,
	linux-arm-kernel@lists.infradead.org,  kvmarm@lists.linux.dev,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	 David Hildenbrand <david@redhat.com>,
	Fuad Tabba <tabba@google.com>,
	Ashish Kalra <ashish.kalra@amd.com>,
	 Vlastimil Babka <vbabka@suse.cz>
Subject: Re: [PATCH v12 05/12] KVM: guest_memfd: Enforce NUMA mempolicy using shared policy
Date: Fri, 10 Oct 2025 14:57:19 -0700	[thread overview]
Message-ID: <diqzv7kmfmio.fsf@google.com> (raw)
In-Reply-To: <aOltikRvKzCy1DXN@google.com>

Sean Christopherson <seanjc@google.com> writes:

> On Fri, Oct 10, 2025, Shivank Garg wrote:
>> >> @@ -112,6 +114,19 @@ static int kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot,
>> >>  	return r;
>> >>  }
>> >>  
>> >> +static struct mempolicy *kvm_gmem_get_folio_policy(struct gmem_inode *gi,
>> >> +						   pgoff_t index)
>> > 
>> > How about kvm_gmem_get_index_policy() instead, since the policy is keyed
>> > by index?
>
> But isn't the policy tied to the folio?  I assume/hope that something will split
> folios if they have different policies for their indices when a folio contains
> more than one page.  In other words, how will this work when hugepage support
> comes along?
>
> So yeah, I agree that the lookup is keyed on the index, but conceptually aren't
> we getting the policy for the folio?  The index is a means to an end.
>

I think the policy is tied to the index.

When we mmap(), there may not be a folio at this index yet, so any folio
that gets allocated for this index then is taken from the right NUMA
node based on the policy.

If the folio is later truncated, the folio just goes back to the NUMA
node, but the memory policy remains for the next folio to be allocated
at this index.

>> >> +{
>> >> +#ifdef CONFIG_NUMA
>> >> +	struct mempolicy *mpol;
>> >> +
>> >> +	mpol = mpol_shared_policy_lookup(&gi->policy, index);
>> >> +	return mpol ? mpol : get_task_policy(current);
>> > 
>> > Should we be returning NULL if no shared policy was defined?
>> > 
>> > By returning NULL, __filemap_get_folio_mpol() can handle the case where
>> > cpuset_do_page_mem_spread().
>> > 
>> > If we always return current's task policy, what if the user wants to use
>> > cpuset_do_page_mem_spread()?
>> > 
>> 
>> I initially followed shmem's approach here.
>> I agree that returning NULL maintains consistency with the current default
>> behavior of cpuset_do_page_mem_spread(), regardless of CONFIG_NUMA.
>> 
>> I'm curious what could be the practical implications of cpuset_do_page_mem_spread()
>> v/s get_task_policy() as the fallback?
>
> Userspace could enable page spreading on the task that triggers guest_memfd
> allocation.  I can't conjure up a reason to do that, but I've been surprised
> more than once by KVM setups.
>
>> Which is more appropriate for guest_memfd when no policy is explicitly set
>> via mbind()?
>
> I don't think we need to answer that question?  Userspace _has_ set a policy,
> just through cpuset, not via mbind().  So while I can't imagine there's a sane
> use case for cpuset_do_page_mem_spread() with guest_memfd, I also don't see a
> reason why KVM should effectively disallow it.
>
> And unless I'm missing something, allocation will eventually fallback to
> get_task_policy() (in alloc_frozen_pages_noprof()), so by explicitly getting the
> task policy in guest_memfd, KVM is doing _more_ work than necessary _and_ is
> unnecessarily restricting usersepace.
>
> Add in that returning NULL would align this code with the ->get_policy hook (and
> could be shared again, I assume), and my vote is definitely to return NULL and
> not get in the way.

... although if we are going to return NULL then we can directly use
mpol_shared_policy_lookup(), so the first discussion is moot.


Though looking slightly into the future, shareability (aka memory
attributes or shared/private state within guest_memfd inodes) are also
keyed by index, and is a property of the index and not the folio (since
shared/private state is defined even before folios are allocated for a
given index.

  reply	other threads:[~2025-10-10 21:57 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-07 22:14 [PATCH v12 00/12] KVM: guest_memfd: Add NUMA mempolicy support Sean Christopherson
2025-10-07 22:14 ` [PATCH v12 01/12] KVM: guest_memfd: Rename "struct kvm_gmem" to "struct gmem_file" Sean Christopherson
2025-10-08  5:25   ` Garg, Shivank
2025-10-09 21:08     ` Ackerley Tng
2025-10-10 15:07     ` David Hildenbrand
2025-10-07 22:14 ` [PATCH v12 02/12] KVM: guest_memfd: Add macro to iterate over gmem_files for a mapping/inode Sean Christopherson
2025-10-08  5:30   ` Garg, Shivank
2025-10-09 21:27   ` Ackerley Tng
2025-10-07 22:14 ` [PATCH v12 03/12] KVM: guest_memfd: Use guest mem inodes instead of anonymous inodes Sean Christopherson
2025-10-07 22:14 ` [PATCH v12 04/12] KVM: guest_memfd: Add slab-allocated inode cache Sean Christopherson
2025-10-09 21:39   ` Ackerley Tng
2025-10-09 22:16   ` Ackerley Tng
2025-10-07 22:14 ` [PATCH v12 05/12] KVM: guest_memfd: Enforce NUMA mempolicy using shared policy Sean Christopherson
2025-10-09 22:15   ` Ackerley Tng
2025-10-10  7:57     ` Garg, Shivank
2025-10-10 20:33       ` Sean Christopherson
2025-10-10 21:57         ` Ackerley Tng [this message]
2025-10-12 20:00           ` Garg, Shivank
2025-10-15 16:56           ` Sean Christopherson
2025-10-07 22:14 ` [PATCH v12 06/12] KVM: selftests: Define wrappers for common syscalls to assert success Sean Christopherson
2025-10-09 21:44   ` Ackerley Tng
2025-10-07 22:14 ` [PATCH v12 07/12] KVM: selftests: Report stacktraces SIGBUS, SIGSEGV, SIGILL, and SIGFPE by default Sean Christopherson
2025-10-09 22:31   ` Ackerley Tng
2025-10-07 22:14 ` [PATCH v12 08/12] KVM: selftests: Add additional equivalents to libnuma APIs in KVM's numaif.h Sean Christopherson
2025-10-09 22:34   ` Ackerley Tng
2025-10-07 22:14 ` [PATCH v12 09/12] KVM: selftests: Use proper uAPI headers to pick up mempolicy.h definitions Sean Christopherson
2025-10-10 17:59   ` Ackerley Tng
2025-10-07 22:14 ` [PATCH v12 10/12] KVM: selftests: Add helpers to probe for NUMA support, and multi-node systems Sean Christopherson
2025-10-07 22:14 ` [PATCH v12 11/12] KVM: selftests: Add guest_memfd tests for mmap and NUMA policy support Sean Christopherson
2025-10-09 23:08   ` Ackerley Tng
2025-10-07 22:14 ` [PATCH v12 12/12] KVM: guest_memfd: Add gmem_inode.flags field instead of using i_private Sean Christopherson
2025-10-09 20:58 ` [PATCH v12 00/12] KVM: guest_memfd: Add NUMA mempolicy support Ackerley Tng
2025-10-10  4:59   ` Garg, Shivank
2025-10-10 17:56     ` Ackerley Tng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=diqzv7kmfmio.fsf@google.com \
    --to=ackerleytng@google.com \
    --cc=ashish.kalra@amd.com \
    --cc=david@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=oliver.upton@linux.dev \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=shivankg@amd.com \
    --cc=tabba@google.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox