Re: [PATCH] KVM: guest_memfd: fix NUMA interleave index double-counting

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Michael S. Tsirkin" <mst@redhat.com>
To: sashiko-reviews@lists.linux.dev
Cc: kvm@vger.kernel.org, Sean Christopherson <seanjc@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	David Hildenbrand <david@kernel.org>,
	Vlastimil Babka <vbabka@kernel.org>
Subject: Re: [PATCH] KVM: guest_memfd: fix NUMA interleave index double-counting
Date: Thu, 4 Jun 2026 03:53:09 -0400	[thread overview]
Message-ID: <20260604034539-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <20260603161525.D9E241F00893@smtp.kernel.org>

On Wed, Jun 03, 2026 at 04:15:25PM +0000, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> 
> Pre-existing issues:
> - [High] guest_memfd memory allocation ignores page offset for NUMA interleaving, falling back to the task's global interleave counter.
> --
> 
> commit ce4f65d7ee89d8194749cc615f30619f101248db
> Author: Michael S. Tsirkin <mst@redhat.com>
> 
> KVM: guest_memfd: fix NUMA interleave index double-counting
> 
> This commit fixes a bug where the NUMA interleave index was being
> double-counted by setting *ilx to the inode number instead of the full page
> offset in kvm_gmem_get_policy(). This allows get_vma_policy() to correctly
> add the page offset without duplicating it.
> 
> > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > index bf9659a7b0f6d..65daef23c8390 100644
> > --- a/virt/kvm/guest_memfd.c
> > +++ b/virt/kvm/guest_memfd.c
> > @@ -438,11 +438,12 @@ static int kvm_gmem_set_policy(struct vm_area_struct *vma, struct mempolicy *mpo
> >  }
> >  
> >  static struct mempolicy *kvm_gmem_get_policy(struct vm_area_struct *vma,
> > -					     unsigned long addr, pgoff_t *pgoff)
> > +					     unsigned long addr, pgoff_t *ilx)
> >  {
> >  	struct inode *inode = file_inode(vma->vm_file);
> > +	pgoff_t pgoff = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
> >  
> > -	*pgoff = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
> > +	*ilx = inode->i_ino;
> >  
> >  	/*
> >  	 * Return the memory policy for this index, or NULL if none is set.
> > @@ -453,7 +454,7 @@ static struct mempolicy *kvm_gmem_get_policy(struct vm_area_struct *vma,
> >  	 * can then replace NULL with the default memory policy instead of the
> >  	 * current task's memory policy.
> >  	 */
> > -	return mpol_shared_policy_lookup(&GMEM_I(inode)->policy, *pgoff);
> > +	return mpol_shared_policy_lookup(&GMEM_I(inode)->policy, pgoff);
> >  }
> 
> [Severity: High]
> This isn't a bug introduced by this patch, but does guest_memfd memory
> allocation ignore the page offset for NUMA interleaving and fall back to
> the task's global interleave counter instead?
> 
> While this patch fixes the index calculation for the policy lookup, it seems
> the index is completely dropped later during the actual memory allocation.
> 
> In virt/kvm/guest_memfd.c kvm_gmem_get_folio(), KVM calls into the generic
> filemap allocator:
> 
> folio = __filemap_get_folio_mpol(inode->i_mapping, index,
>                                  FGP_LOCK | FGP_CREAT,
>                                  mapping_gfp_mask(inode->i_mapping), policy);
> 
> But the generic allocator hardcodes NO_INTERLEAVE_INDEX in mm/filemap.c
> filemap_alloc_folio_noprof():
> 
> if (policy)
>     return folio_alloc_mpol_noprof(gfp, order, policy,
>                                    NO_INTERLEAVE_INDEX, numa_node_id());
> 
> For policies like MPOL_INTERLEAVE, using NO_INTERLEAVE_INDEX causes the
> allocation to use the faulting task's global il_prev counter rather than the
> file index-based deterministic interleaving.
> 
> Doesn't this defeat the purpose of the shared policy, causing interleaving
> to be randomized by the chronological order of vCPU page faults rather than
> deterministically spread based on the guest physical address?

Does not seem to be caused by this patch.
And I've no idea whether changing this will break anything
users rely on - might be prudent to have userspace opt-in,
if we are changing this?

At least kvm is the only user passing a non NULL policy, so
fixing filemap_alloc_folio_noprof should not break anyone else.
Anyway, I'll leave this one to the kvm maintainers.

> 
> -- 
> Sashiko AI review · https://sashiko.dev/#/patchset/0eff0a90667b900bee837d06b5db5025e1f304b5.1780501924.git.mst@redhat.com?part=1

next prev parent reply	other threads:[~2026-06-04  7:53 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-03 15:57 [PATCH] KVM: guest_memfd: fix NUMA interleave index double-counting Michael S. Tsirkin
2026-06-03 16:15 ` sashiko-bot
2026-06-04  7:53   ` Michael S. Tsirkin [this message]
2026-06-03 18:51 ` Garg, Shivank
2026-06-04 23:46   ` Michael S. Tsirkin
2026-06-05 13:01     ` Garg, Shivank
2026-06-05 14:55       ` Michael S. Tsirkin
2026-06-05  9:26 ` David Hildenbrand (Arm)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260604034539-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=david@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=sashiko-reviews@lists.linux.dev \
    --cc=seanjc@google.com \
    --cc=vbabka@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.