Kernel KVM virtualization development
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Sean Christopherson <seanjc@google.com>
Cc: linux-kernel@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
	David Hildenbrand <david@kernel.org>,
	Vlastimil Babka <vbabka@kernel.org>,
	Shivank Garg <shivankg@amd.com>,
	kvm@vger.kernel.org
Subject: Re: [PATCH] KVM: guest_memfd: fix NUMA interleave index double-counting
Date: Tue, 9 Jun 2026 15:54:02 -0400
Message-ID: <20260609154046-mutt-send-email-mst@kernel.org>
In-Reply-To: <178102235481.2735841.1203781071933134475.b4-ty@google.com>

On Tue, Jun 09, 2026 at 09:31:29AM -0700, Sean Christopherson wrote:
> On Wed, 03 Jun 2026 11:57:33 -0400, Michael S. Tsirkin wrote:
> > kvm_gmem_get_policy() sets *ilx to the full page offset
> > (vm_pgoff + vma offset).  But get_vma_policy() adds the page
> > offset on top of *ilx, so the offset is counted twice.  This
> > causes NUMA interleaving to skip nodes: for order-0 pages the
> > effective index jumps by 2 for each consecutive page.
> > 
> > The get_policy vm_op should return only a per-file bias in *ilx
> > (like shmem_get_policy does with inode->i_ino), letting
> > get_vma_policy() add the page-offset component.
> > 
> > [...]
> 
> Applied to kvm-x86 gmem, with a heavily massaged changelog to explicitly spell
> out that ilx == interleave index, and to try and explain the role of the index
> (it wasn't at all obvious to me why using the inode number was "correct").
> 
> Thanks!
> 
> [1/1] KVM: guest_memfd: fix NUMA interleave index double-counting
>       https://github.com/kvm-x86/linux/commit/48dbe4732198

Thanks!

Sean, what is your take on interleaving for guest_memfd?

To the best of my understanding:

Right now, KVM calls __filemap_get_folio_mpol(), which does not pass the
index down to filemap_alloc_folio(). The allocation therefore uses
NO_INTERLEAVE_INDEX, so MPOL_INTERLEAVE falls back to the task's global
counter, and placement is effectively unpredictable. This looks like an
oversight (the index was available but never threaded down), but it has
been shipping since 6.19.

Should we fix it to use the file offset instead? Or GPA? And if so,
should that be the default or does userspace need a way to opt out of
NO_INTERLEAVE_INDEX?

Thanks,
MST


Thread overview: 13+ messages
2026-06-03 15:57 [PATCH] KVM: guest_memfd: fix NUMA interleave index double-counting Michael S. Tsirkin
2026-06-03 16:15 ` sashiko-bot
2026-06-04  7:53   ` Michael S. Tsirkin
2026-06-03 18:51 ` Garg, Shivank
2026-06-04 23:46   ` Michael S. Tsirkin
2026-06-05 13:01     ` Garg, Shivank
2026-06-05 14:55       ` Michael S. Tsirkin
2026-06-06 13:02         ` Garg, Shivank
2026-06-06 13:12           ` Michael S. Tsirkin
2026-06-05  9:26 ` David Hildenbrand (Arm)
2026-06-09 16:31 ` Sean Christopherson
2026-06-09 19:54   ` Michael S. Tsirkin [this message]
2026-06-09 21:14     ` Sean Christopherson