linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Shivank Garg <shivankg@amd.com>
Cc: willy@infradead.org, akpm@linux-foundation.org, david@redhat.com,
	 pbonzini@redhat.com, shuah@kernel.org, vbabka@suse.cz,
	brauner@kernel.org,  viro@zeniv.linux.org.uk, dsterba@suse.com,
	xiang@kernel.org, chao@kernel.org,  jaegeuk@kernel.org,
	clm@fb.com, josef@toxicpanda.com,  kent.overstreet@linux.dev,
	zbestahu@gmail.com, jefflexu@linux.alibaba.com,
	 dhavale@google.com, lihongbo22@huawei.com,
	lorenzo.stoakes@oracle.com,  Liam.Howlett@oracle.com,
	rppt@kernel.org, surenb@google.com, mhocko@suse.com,
	 ziy@nvidia.com, matthew.brost@intel.com,
	joshua.hahnjy@gmail.com,  rakie.kim@sk.com, byungchul@sk.com,
	gourry@gourry.net,  ying.huang@linux.alibaba.com,
	apopple@nvidia.com, tabba@google.com,  ackerleytng@google.com,
	paul@paul-moore.com, jmorris@namei.org,  serge@hallyn.com,
	pvorel@suse.cz, bfoster@redhat.com, vannapurve@google.com,
	 chao.gao@intel.com, bharata@amd.com, nikunj@amd.com,
	michael.day@amd.com,  shdhiman@amd.com, yan.y.zhao@intel.com,
	Neeraj.Upadhyay@amd.com,  thomas.lendacky@amd.com,
	michael.roth@amd.com, aik@amd.com, jgg@nvidia.com,
	 kalyazin@amazon.com, peterx@redhat.com, jack@suse.cz,
	hch@infradead.org,  cgzones@googlemail.com, ira.weiny@intel.com,
	rientjes@google.com,  roypat@amazon.co.uk, chao.p.peng@intel.com,
	amit@infradead.org,  ddutile@redhat.com,
	dan.j.williams@intel.com, ashish.kalra@amd.com,
	 gshan@redhat.com, jgowans@amazon.com, pankaj.gupta@amd.com,
	papaluri@amd.com,  yuzhao@google.com, suzuki.poulose@arm.com,
	quic_eberman@quicinc.com,  linux-bcachefs@vger.kernel.org,
	linux-btrfs@vger.kernel.org,  linux-erofs@lists.ozlabs.org,
	linux-f2fs-devel@lists.sourceforge.net,
	 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	 linux-kernel@vger.kernel.org,
	linux-security-module@vger.kernel.org,  kvm@vger.kernel.org,
	linux-kselftest@vger.kernel.org,  linux-coco@lists.linux.dev
Subject: Re: [PATCH kvm-next V11 6/7] KVM: guest_memfd: Enforce NUMA mempolicy using shared policy
Date: Fri, 26 Sep 2025 12:36:27 -0700	[thread overview]
Message-ID: <aNbrO7A7fSjb4W84@google.com> (raw)
In-Reply-To: <aNVQJqYLX17v-fsf@google.com>

On Thu, Sep 25, 2025, Sean Christopherson wrote:
> On Wed, Aug 27, 2025, Shivank Garg wrote:
> > @@ -26,6 +28,9 @@ static inline struct kvm_gmem_inode_info *KVM_GMEM_I(struct inode *inode)
> >  	return container_of(inode, struct kvm_gmem_inode_info, vfs_inode);
> >  }
> >  
> > +static struct mempolicy *kvm_gmem_get_pgoff_policy(struct kvm_gmem_inode_info *info,
> > +						   pgoff_t index);
> > +
> >  /**
> >   * folio_file_pfn - like folio_file_page, but return a pfn.
> >   * @folio: The folio which contains this index.
> > @@ -112,7 +117,25 @@ static int kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot,
> >  static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
> >  {
> >  	/* TODO: Support huge pages. */
> > -	return filemap_grab_folio(inode->i_mapping, index);
> > +	struct mempolicy *policy;
> > +	struct folio *folio;
> > +
> > +	/*
> > +	 * Fast-path: See if folio is already present in mapping to avoid
> > +	 * policy_lookup.
> > +	 */
> > +	folio = __filemap_get_folio(inode->i_mapping, index,
> > +				    FGP_LOCK | FGP_ACCESSED, 0);
> > +	if (!IS_ERR(folio))
> > +		return folio;
> > +
> > +	policy = kvm_gmem_get_pgoff_policy(KVM_GMEM_I(inode), index);
> > +	folio = __filemap_get_folio_mpol(inode->i_mapping, index,
> > +					 FGP_LOCK | FGP_ACCESSED | FGP_CREAT,
> > +					 mapping_gfp_mask(inode->i_mapping), policy);
> > +	mpol_cond_put(policy);
> > +
> > +	return folio;
> >  }
> >  
> >  static void kvm_gmem_invalidate_begin(struct kvm_gmem *gmem, pgoff_t start,
> > @@ -372,8 +395,45 @@ static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
> >  	return ret;
> >  }
> >  
> > +#ifdef CONFIG_NUMA
> > +static int kvm_gmem_set_policy(struct vm_area_struct *vma, struct mempolicy *mpol)
> > +{
> > +	struct inode *inode = file_inode(vma->vm_file);
> > +
> > +	return mpol_set_shared_policy(&KVM_GMEM_I(inode)->policy, vma, mpol);
> > +}
> > +
> > +static struct mempolicy *kvm_gmem_get_policy(struct vm_area_struct *vma,
> > +					     unsigned long addr, pgoff_t *pgoff)
> > +{
> > +	struct inode *inode = file_inode(vma->vm_file);
> > +
> > +	*pgoff = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
> > +	return mpol_shared_policy_lookup(&KVM_GMEM_I(inode)->policy, *pgoff);
> > +}
> > +
> > +static struct mempolicy *kvm_gmem_get_pgoff_policy(struct kvm_gmem_inode_info *info,
> > +						   pgoff_t index)
> 
> I keep reading this is "page offset policy", as opposed to "policy given a page
> offset".  Another oddity that is confusing is that this helper explicitly does
> get_task_policy(current), while kvm_gmem_get_policy() lets the caller do that.
> The end result is the same, but I think it would be helpful for gmem to be
> internally consistent.
> 
> If we have kvm_gmem_get_policy() use this helper, then we can kill two birds with
> one stone:
> 
> static struct mempolicy *__kvm_gmem_get_policy(struct gmem_inode *gi,
> 					       pgoff_t index)
> {
> 	struct mempolicy *mpol;
> 
> 	mpol = mpol_shared_policy_lookup(&gi->policy, index);
> 	return mpol ? mpol : get_task_policy(current);
> }
> 
> static struct mempolicy *kvm_gmem_get_policy(struct vm_area_struct *vma,
> 					     unsigned long addr, pgoff_t *pgoff)
> {
> 	*pgoff = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
> 
> 	return __kvm_gmem_get_policy(GMEM_I(file_inode(vma->vm_file)), *pgoff);

Argh!!!!!  This breaks the selftest because do_get_mempolicy() very specifically
falls back to the default_policy, NOT to the current task's policy.  That is
*exactly* the type of subtle detail that needs to be commented, because there's
no way some random KVM developer is going to know that returning NULL here is
important with respect to get_mempolicy() ABI.

On a happier note, I'm very glad you wrote a testcase :-)

I've got this as fixup-to-the-fixup:

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index e796cc552a96..61130a52553f 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -114,8 +114,8 @@ static int kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot,
        return r;
 }
 
-static struct mempolicy *__kvm_gmem_get_policy(struct gmem_inode *gi,
-                                              pgoff_t index)
+static struct mempolicy *kvm_gmem_get_folio_policy(struct gmem_inode *gi,
+                                                  pgoff_t index)
 {
 #ifdef CONFIG_NUMA
        struct mempolicy *mpol;
@@ -151,7 +151,7 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
        if (!IS_ERR(folio))
                return folio;
 
-       policy = __kvm_gmem_get_policy(GMEM_I(inode), index);
+       policy = kvm_gmem_get_folio_policy(GMEM_I(inode), index);
        folio = __filemap_get_folio_mpol(inode->i_mapping, index,
                                         FGP_LOCK | FGP_ACCESSED | FGP_CREAT,
                                         mapping_gfp_mask(inode->i_mapping), policy);
@@ -431,9 +431,18 @@ static int kvm_gmem_set_policy(struct vm_area_struct *vma, struct mempolicy *mpo
 static struct mempolicy *kvm_gmem_get_policy(struct vm_area_struct *vma,
                                              unsigned long addr, pgoff_t *pgoff)
 {
+       struct inode *inode = file_inode(vma->vm_file);
+
         *pgoff = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
 
-        return __kvm_gmem_get_policy(GMEM_I(file_inode(vma->vm_file)), *pgoff);
+       /*
+        * Note!  Directly return whatever the lookup returns, do NOT return
+        * the current task's policy as is done when looking up the policy for
+        * a specific folio.  Kernel ABI for get_mempolicy() is to return
+        * MPOL_DEFAULT when there is no defined policy, not whatever the
+        * default policy resolves to.
+        */
+        return mpol_shared_policy_lookup(&GMEM_I(inode)->policy, *pgoff);
 }
 #endif /* CONFIG_NUMA */
 


  reply	other threads:[~2025-09-26 19:36 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-27 17:52 [PATCH kvm-next V11 0/7] Add NUMA mempolicy support for KVM guest-memfd Shivank Garg
2025-08-27 17:52 ` [PATCH kvm-next V11 1/7] mm/filemap: Add NUMA mempolicy support to filemap_alloc_folio() Shivank Garg
2025-08-27 17:52 ` [PATCH kvm-next V11 2/7] mm/filemap: Extend __filemap_get_folio() to support NUMA memory policies Shivank Garg
2025-08-27 17:52 ` [PATCH kvm-next V11 3/7] mm/mempolicy: Export memory policy symbols Shivank Garg
2025-08-27 17:52 ` [PATCH kvm-next V11 4/7] KVM: guest_memfd: Use guest mem inodes instead of anonymous inodes Shivank Garg
2025-08-27 22:43   ` Ackerley Tng
2025-08-28  5:49     ` Garg, Shivank
2025-08-28 10:06     ` David Hildenbrand
2025-09-25  2:50     ` Sean Christopherson
2025-09-25 11:44       ` Garg, Shivank
2025-09-25 11:55         ` David Hildenbrand
2025-09-25 13:41           ` Sean Christopherson
2025-09-25 13:44             ` Fuad Tabba
2025-09-25 14:26             ` David Hildenbrand
2025-09-25 15:06               ` Sean Christopherson
2025-08-27 17:52 ` [PATCH kvm-next V11 5/7] KVM: guest_memfd: Add slab-allocated inode cache Shivank Garg
2025-09-25 14:05   ` Sean Christopherson
2025-09-25 14:17     ` Sean Christopherson
2025-08-27 17:52 ` [PATCH kvm-next V11 6/7] KVM: guest_memfd: Enforce NUMA mempolicy using shared policy Shivank Garg
2025-09-25 14:22   ` Sean Christopherson
2025-09-26 19:36     ` Sean Christopherson [this message]
2025-10-15 21:45       ` [f2fs-dev] " Gregory Price
2025-10-15 22:48         ` Sean Christopherson
2025-10-16 12:58           ` Garg, Shivank
2025-10-16 14:17           ` Gregory Price
2025-08-27 17:52 ` [PATCH kvm-next V11 7/7] KVM: guest_memfd: selftests: Add tests for mmap and NUMA policy support Shivank Garg
2025-09-25 21:35   ` Sean Christopherson
2025-09-25 23:03     ` Sean Christopherson
2025-09-25 23:04     ` Jason Gunthorpe
2025-09-25 23:12       ` Sean Christopherson
2025-09-26  7:32       ` David Hildenbrand
2025-09-26  7:31     ` David Hildenbrand
2025-09-26  7:37       ` Garg, Shivank
2025-08-28 12:44 ` [PATCH kvm-next V11 0/7] Add NUMA mempolicy support for KVM guest-memfd David Hildenbrand
2025-09-24 18:19 ` David Hildenbrand
2025-09-24 20:35   ` Kalra, Ashish
2025-10-15 18:02 ` Sean Christopherson
2025-10-20 15:52   ` Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aNbrO7A7fSjb4W84@google.com \
    --to=seanjc@google.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=Neeraj.Upadhyay@amd.com \
    --cc=ackerleytng@google.com \
    --cc=aik@amd.com \
    --cc=akpm@linux-foundation.org \
    --cc=amit@infradead.org \
    --cc=apopple@nvidia.com \
    --cc=ashish.kalra@amd.com \
    --cc=bfoster@redhat.com \
    --cc=bharata@amd.com \
    --cc=brauner@kernel.org \
    --cc=byungchul@sk.com \
    --cc=cgzones@googlemail.com \
    --cc=chao.gao@intel.com \
    --cc=chao.p.peng@intel.com \
    --cc=chao@kernel.org \
    --cc=clm@fb.com \
    --cc=dan.j.williams@intel.com \
    --cc=david@redhat.com \
    --cc=ddutile@redhat.com \
    --cc=dhavale@google.com \
    --cc=dsterba@suse.com \
    --cc=gourry@gourry.net \
    --cc=gshan@redhat.com \
    --cc=hch@infradead.org \
    --cc=ira.weiny@intel.com \
    --cc=jack@suse.cz \
    --cc=jaegeuk@kernel.org \
    --cc=jefflexu@linux.alibaba.com \
    --cc=jgg@nvidia.com \
    --cc=jgowans@amazon.com \
    --cc=jmorris@namei.org \
    --cc=josef@toxicpanda.com \
    --cc=joshua.hahnjy@gmail.com \
    --cc=kalyazin@amazon.com \
    --cc=kent.overstreet@linux.dev \
    --cc=kvm@vger.kernel.org \
    --cc=lihongbo22@huawei.com \
    --cc=linux-bcachefs@vger.kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-coco@lists.linux.dev \
    --cc=linux-erofs@lists.ozlabs.org \
    --cc=linux-f2fs-devel@lists.sourceforge.net \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=matthew.brost@intel.com \
    --cc=mhocko@suse.com \
    --cc=michael.day@amd.com \
    --cc=michael.roth@amd.com \
    --cc=nikunj@amd.com \
    --cc=pankaj.gupta@amd.com \
    --cc=papaluri@amd.com \
    --cc=paul@paul-moore.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=pvorel@suse.cz \
    --cc=quic_eberman@quicinc.com \
    --cc=rakie.kim@sk.com \
    --cc=rientjes@google.com \
    --cc=roypat@amazon.co.uk \
    --cc=rppt@kernel.org \
    --cc=serge@hallyn.com \
    --cc=shdhiman@amd.com \
    --cc=shivankg@amd.com \
    --cc=shuah@kernel.org \
    --cc=surenb@google.com \
    --cc=suzuki.poulose@arm.com \
    --cc=tabba@google.com \
    --cc=thomas.lendacky@amd.com \
    --cc=vannapurve@google.com \
    --cc=vbabka@suse.cz \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    --cc=xiang@kernel.org \
    --cc=yan.y.zhao@intel.com \
    --cc=ying.huang@linux.alibaba.com \
    --cc=yuzhao@google.com \
    --cc=zbestahu@gmail.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).