Re: [PATCH] KVM: guest_memfd: fix NUMA interleave index double-counting

Kernel KVM virtualization development
 help / color / mirror / Atom feed

From: "Michael S. Tsirkin" <mst@redhat.com>
To: sashiko-reviews@lists.linux.dev
Cc: kvm@vger.kernel.org, Sean Christopherson <seanjc@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	David Hildenbrand <david@kernel.org>,
	Vlastimil Babka <vbabka@kernel.org>
Subject: Re: [PATCH] KVM: guest_memfd: fix NUMA interleave index double-counting
Date: Thu, 4 Jun 2026 03:53:09 -0400	[thread overview]
Message-ID: <20260604034539-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <20260603161525.D9E241F00893@smtp.kernel.org>

On Wed, Jun 03, 2026 at 04:15:25PM +0000, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> 
> Pre-existing issues:
> - [High] guest_memfd memory allocation ignores page offset for NUMA interleaving, falling back to the task's global interleave counter.
> --
> 
> commit ce4f65d7ee89d8194749cc615f30619f101248db
> Author: Michael S. Tsirkin <mst@redhat.com>
> 
> KVM: guest_memfd: fix NUMA interleave index double-counting
> 
> This commit fixes a bug where the NUMA interleave index was being
> double-counted by setting *ilx to the inode number instead of the full page
> offset in kvm_gmem_get_policy(). This allows get_vma_policy() to correctly
> add the page offset without duplicating it.
> 
> > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > index bf9659a7b0f6d..65daef23c8390 100644
> > --- a/virt/kvm/guest_memfd.c
> > +++ b/virt/kvm/guest_memfd.c
> > @@ -438,11 +438,12 @@ static int kvm_gmem_set_policy(struct vm_area_struct *vma, struct mempolicy *mpo
> >  }
> >  
> >  static struct mempolicy *kvm_gmem_get_policy(struct vm_area_struct *vma,
> > -					     unsigned long addr, pgoff_t *pgoff)
> > +					     unsigned long addr, pgoff_t *ilx)
> >  {
> >  	struct inode *inode = file_inode(vma->vm_file);
> > +	pgoff_t pgoff = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
> >  
> > -	*pgoff = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
> > +	*ilx = inode->i_ino;
> >  
> >  	/*
> >  	 * Return the memory policy for this index, or NULL if none is set.
> > @@ -453,7 +454,7 @@ static struct mempolicy *kvm_gmem_get_policy(struct vm_area_struct *vma,
> >  	 * can then replace NULL with the default memory policy instead of the
> >  	 * current task's memory policy.
> >  	 */
> > -	return mpol_shared_policy_lookup(&GMEM_I(inode)->policy, *pgoff);
> > +	return mpol_shared_policy_lookup(&GMEM_I(inode)->policy, pgoff);
> >  }
> 
> [Severity: High]
> This isn't a bug introduced by this patch, but does guest_memfd memory
> allocation ignore the page offset for NUMA interleaving and fall back to
> the task's global interleave counter instead?
> 
> While this patch fixes the index calculation for the policy lookup, it seems
> the index is completely dropped later during the actual memory allocation.
> 
> In virt/kvm/guest_memfd.c kvm_gmem_get_folio(), KVM calls into the generic
> filemap allocator:
> 
> folio = __filemap_get_folio_mpol(inode->i_mapping, index,
>                                  FGP_LOCK | FGP_CREAT,
>                                  mapping_gfp_mask(inode->i_mapping), policy);
> 
> But the generic allocator hardcodes NO_INTERLEAVE_INDEX in mm/filemap.c
> filemap_alloc_folio_noprof():
> 
> if (policy)
>     return folio_alloc_mpol_noprof(gfp, order, policy,
>                                    NO_INTERLEAVE_INDEX, numa_node_id());
> 
> For policies like MPOL_INTERLEAVE, using NO_INTERLEAVE_INDEX causes the
> allocation to use the faulting task's global il_prev counter rather than the
> file index-based deterministic interleaving.
> 
> Doesn't this defeat the purpose of the shared policy, causing interleaving
> to be randomized by the chronological order of vCPU page faults rather than
> deterministically spread based on the guest physical address?

Does not seem to be caused by this patch.
And I've no idea whether changing this will break anything
users rely on - might be prudent to have userspace opt-in,
if we are changing this?

At least kvm is the only user passing a non NULL policy, so
fixing filemap_alloc_folio_noprof should not break anyone else.
Anyway, I'll leave this one to the kvm maintainers.

> 
> -- 
> Sashiko AI review · https://sashiko.dev/#/patchset/0eff0a90667b900bee837d06b5db5025e1f304b5.1780501924.git.mst@redhat.com?part=1

next prev parent reply	other threads:[~2026-06-04  7:53 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-03 15:57 [PATCH] KVM: guest_memfd: fix NUMA interleave index double-counting Michael S. Tsirkin
2026-06-03 16:15 ` sashiko-bot
2026-06-04  7:53   ` Michael S. Tsirkin [this message]
2026-06-03 18:51 ` Garg, Shivank
2026-06-04 23:46   ` Michael S. Tsirkin
2026-06-05 13:01     ` Garg, Shivank
2026-06-05 14:55       ` Michael S. Tsirkin
2026-06-06 13:02         ` Garg, Shivank
2026-06-06 13:12           ` Michael S. Tsirkin
2026-06-05  9:26 ` David Hildenbrand (Arm)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260604034539-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=david@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=sashiko-reviews@lists.linux.dev \
    --cc=seanjc@google.com \
    --cc=vbabka@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox