Linux virtualization list
From: Lorenzo Stoakes <ljs@kernel.org>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: linux-kernel@vger.kernel.org,
	"David Hildenbrand (Arm)" <david@kernel.org>,
	"Jason Wang" <jasowang@redhat.com>,
	"Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
	"Eugenio Pérez" <eperezma@redhat.com>,
	"Muchun Song" <muchun.song@linux.dev>,
	"Oscar Salvador" <osalvador@suse.de>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Liam R. Howlett" <liam@infradead.org>,
	"Vlastimil Babka" <vbabka@kernel.org>,
	"Mike Rapoport" <rppt@kernel.org>,
	"Suren Baghdasaryan" <surenb@google.com>,
	"Michal Hocko" <mhocko@suse.com>,
	"Brendan Jackman" <jackmanb@google.com>,
	"Johannes Weiner" <hannes@cmpxchg.org>, "Zi Yan" <ziy@nvidia.com>,
	"Baolin Wang" <baolin.wang@linux.alibaba.com>,
	"Nico Pache" <npache@redhat.com>,
	"Ryan Roberts" <ryan.roberts@arm.com>,
	"Dev Jain" <dev.jain@arm.com>, "Barry Song" <baohua@kernel.org>,
	"Lance Yang" <lance.yang@linux.dev>,
	"Hugh Dickins" <hughd@google.com>,
	"Matthew Brost" <matthew.brost@intel.com>,
	"Joshua Hahn" <joshua.hahnjy@gmail.com>,
	"Rakie Kim" <rakie.kim@sk.com>,
	"Byungchul Park" <byungchul@sk.com>,
	"Gregory Price" <gourry@gourry.net>,
	"Ying Huang" <ying.huang@linux.alibaba.com>,
	"Alistair Popple" <apopple@nvidia.com>,
	"Christoph Lameter" <cl@gentwo.org>,
	"David Rientjes" <rientjes@google.com>,
	"Roman Gushchin" <roman.gushchin@linux.dev>,
	"Harry Yoo" <harry.yoo@oracle.com>,
	"Axel Rasmussen" <axelrasmussen@google.com>,
	"Yuanchu Xie" <yuanchu@google.com>, "Wei Xu" <weixugc@google.com>,
	"Chris Li" <chrisl@kernel.org>,
	"Kairui Song" <kasong@tencent.com>,
	"Kemeng Shi" <shikemeng@huaweicloud.com>,
	"Nhat Pham" <nphamcs@gmail.com>, "Baoquan He" <bhe@redhat.com>,
	virtualization@lists.linux.dev, linux-mm@kvack.org,
	"Andrea Arcangeli" <aarcange@redhat.com>
Subject: Re: [PATCH v10 01/37] mm: mempolicy: fix interleave index calculation
Date: Mon, 8 Jun 2026 10:43:58 +0100
Message-ID: <aiaI4Of97oH5YTbi@lucifer>
In-Reply-To: <f5b1dcc97baae7183e96743873b943315d93cdb6.1780906288.git.mst@redhat.com>

On Mon, Jun 08, 2026 at 04:34:06AM -0400, Michael S. Tsirkin wrote:
> The NUMA interleave index was computed as two separate terms:
>
>     *ilx += vma->vm_pgoff >> order;
>     *ilx += (addr - vma->vm_start) >> (PAGE_SHIFT + order);
>
> This has two problems:
>
> 1. When vm_start is not aligned to the folio size, the
>    subtraction before the shift lets low bits affect the
>    result via borrows.

This feels really vague. Do you have examples of where the calculation has
been impacted like this?

How is this a problem that affects things in practice?

>
> 2. For file-backed VMAs, shifting vm_pgoff and the VMA
>    offset independently loses carries between them, giving
>    wrong chunk indices when vm_pgoff is not aligned to order.

Similar comments to the above. An example would be helpful here.
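
For illustration, the kind of example being asked for could look something
like the userspace sketch below. It is only a model: PAGE_SHIFT is assumed to
be 12, the helpers old_ilx(), new_ilx() and carry_example() are made-up names,
and the values are invented. It says nothing about whether this case shows up
in practice.

```c
#include <assert.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Old calculation: vm_pgoff and the VMA offset are shifted separately. */
static unsigned long old_ilx(unsigned long vm_pgoff, unsigned long vm_start,
			     unsigned long addr, unsigned int order)
{
	unsigned long ilx = 0;

	ilx += vm_pgoff >> order;
	ilx += (addr - vm_start) >> (PAGE_SHIFT + order);
	return ilx;
}

/* Calculation from the patch: sum in page units first, then shift once. */
static unsigned long new_ilx(unsigned long vm_pgoff, unsigned long vm_start,
			     unsigned long addr, unsigned int order)
{
	return (vm_pgoff + (addr >> PAGE_SHIFT) -
		(vm_start >> PAGE_SHIFT)) >> order;
}

/* Hypothetical file-backed VMA: vm_pgoff = 3, order = 2, one page in. */
static void carry_example(void)
{
	const unsigned long vm_start = 0x100000UL;
	const unsigned long addr = vm_start + (1UL << PAGE_SHIFT);

	/* (3 >> 2) + (1 >> 2) = 0, but file page 4 is in chunk 4 >> 2 = 1. */
	assert(old_ilx(3, vm_start, addr, 2) == 0);
	assert(new_ilx(3, vm_start, addr, 2) == 1);
}
```

With these values the two-term form drops the carry from the low bits of
vm_pgoff (3) plus the one-page offset, which is the case item 2 describes.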

>
> Combine into a single expression that adds vm_pgoff and

Combining this kind of thing into a single expression is the complete
opposite of what we want to do when refactoring code. No thanks.

> the page-granularity VMA offset first, then shifts once:
>
>     *ilx += (vma->vm_pgoff +
>             (addr >> PAGE_SHIFT) -
>             (vma->vm_start >> PAGE_SHIFT)) >> order;
>
> For anonymous VMAs, vm_pgoff equals vm_start >> PAGE_SHIFT,

This is completely incorrect.

For anonymous VMAs:

vm_pgoff = vm_start_at_first_fault >> PAGE_SHIFT.

So if you remap a _faulted_ VMA, the vm_start changes, vm_pgoff does
not. The two terms can be COMPLETELY independent.
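
A rough userspace model of this, as a sketch only: struct vma_model,
patch_ilx() and the addresses are invented, PAGE_SHIFT is assumed to be 12,
and the move is simulated by editing the struct rather than by calling
mremap().

```c
#include <assert.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the two vm_area_struct fields that matter. */
struct vma_model {
	unsigned long vm_start;
	unsigned long vm_pgoff;
};

/* The single expression from the patch. */
static unsigned long patch_ilx(const struct vma_model *vma,
			       unsigned long addr, unsigned int order)
{
	return (vma->vm_pgoff + (addr >> PAGE_SHIFT) -
		(vma->vm_start >> PAGE_SHIFT)) >> order;
}

static void moved_anon_vma_example(void)
{
	/* Anon VMA faulted at 0x200000, so vm_pgoff records that address. */
	struct vma_model vma = {
		.vm_start = 0x200000UL,
		.vm_pgoff = 0x200000UL >> PAGE_SHIFT,
	};
	const unsigned int order = 0;
	unsigned long addr = vma.vm_start;

	/* Before the move the terms cancel, as the commit message says. */
	assert(patch_ilx(&vma, addr, order) == addr >> (PAGE_SHIFT + order));

	/* Simulate moving the faulted VMA: vm_start changes, vm_pgoff stays. */
	vma.vm_start = 0x700000UL;
	addr = vma.vm_start;

	/* Now the result is 0x200, not addr >> PAGE_SHIFT == 0x700. */
	assert(patch_ilx(&vma, addr, order) == 0x200UL);
	assert(patch_ilx(&vma, addr, order) != addr >> (PAGE_SHIFT + order));
}
```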

> so the vm_pgoff and vm_start terms cancel and the result

No, they do not.

> reduces to addr >> (PAGE_SHIFT + order), same as before.

No, it doesn't.

>
> For file-backed VMAs, the sum vm_pgoff + (addr >> PAGE_SHIFT)
> - (vm_start >> PAGE_SHIFT) gives the file page offset of addr.

That's the page offset in a VMA for both anon and file-backed?

(addr - vm_start) >> PAGE_SHIFT is the page offset into a VMA (canonically
determined by linear_page_index())

> Shifting by order gives the correct file chunk index.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

You're claiming we have an incorrect calculation here, but are not
providing a Fixes: tag or Cc: stable, or sending this separately as a fix?

> Assisted-by: Claude:claude-opus-4-6
> Reviewed-by: Gregory Price <gourry@gourry.net>
> ---
>  mm/mempolicy.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 4e4421b22b59..d139b074a599 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -2048,8 +2048,9 @@ struct mempolicy *get_vma_policy(struct vm_area_struct *vma,
>  		pol = get_task_policy(current);
>  	if (pol->mode == MPOL_INTERLEAVE ||
>  	    pol->mode == MPOL_WEIGHTED_INTERLEAVE) {
> -		*ilx += vma->vm_pgoff >> order;
> -		*ilx += (addr - vma->vm_start) >> (PAGE_SHIFT + order);
> +		*ilx += (vma->vm_pgoff +
> +			(addr >> PAGE_SHIFT) -
> +			(vma->vm_start >> PAGE_SHIFT)) >> order;

This is horrible code. Not only are you doing everything in a single
expression for some reason, you're also making the parens confusing and not
explaining what you're doing here at all.

The code before was at least tractable, this is objectively making it
worse.

And anyway, the canonical way to find the page offset into a VMA is
linear_page_index():

static inline pgoff_t linear_page_index(const struct vm_area_struct *vma,
					const unsigned long address)
{
	pgoff_t pgoff;
	pgoff = (address - vma->vm_start) >> PAGE_SHIFT;
	pgoff += vma->vm_pgoff;
	return pgoff;
}

So isn't a far better solution therefore:

	const pgoff_t index = linear_page_index(vma, addr);

	*ilx += index >> order;

Which has the benefit of being readable, uses the canonical method for
determining page offset in the VMA, and eliminates the open-coded stuff?
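
As a quick sanity check of that suggestion, here is a userspace sketch, not
kernel code: struct vma_model, linear_page_index_model(), suggested_ilx(),
patch_ilx() and compare_forms() are stand-ins invented for illustration, and
PAGE_SHIFT is assumed to be 12. It compares the linear_page_index()-based form
with the open-coded expression from the patch over a made-up VMA.

```c
#include <assert.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

typedef unsigned long pgoff_t;

/* Hypothetical stand-in for the vm_area_struct fields used here. */
struct vma_model {
	unsigned long vm_start;
	pgoff_t vm_pgoff;
};

/* Userspace copy of the linear_page_index() logic quoted above. */
static pgoff_t linear_page_index_model(const struct vma_model *vma,
				       unsigned long address)
{
	pgoff_t pgoff;

	pgoff = (address - vma->vm_start) >> PAGE_SHIFT;
	pgoff += vma->vm_pgoff;
	return pgoff;
}

/* The suggested form: compute the index first, then shift once by order. */
static unsigned long suggested_ilx(const struct vma_model *vma,
				   unsigned long addr, unsigned int order)
{
	const pgoff_t index = linear_page_index_model(vma, addr);

	return index >> order;
}

/* The open-coded single expression from the patch, for comparison. */
static unsigned long patch_ilx(const struct vma_model *vma,
			       unsigned long addr, unsigned int order)
{
	return (vma->vm_pgoff + (addr >> PAGE_SHIFT) -
		(vma->vm_start >> PAGE_SHIFT)) >> order;
}

/* Sweep 64 pages of a made-up VMA and check that both forms agree. */
static void compare_forms(void)
{
	const struct vma_model vma = { .vm_start = 0x300000UL, .vm_pgoff = 5 };
	unsigned int order;
	unsigned long i, addr;

	for (order = 0; order <= 4; order++) {
		for (i = 0; i < 64; i++) {
			/* 0x123 models a raw, non-page-aligned fault address. */
			addr = vma.vm_start + (i << PAGE_SHIFT) + 0x123;
			assert(suggested_ilx(&vma, addr, order) ==
			       patch_ilx(&vma, addr, order));
		}
	}
}
```

The two forms agree in this model because vm_start is page aligned, so the
helper-based version would not change the values the patch computes, only how
they are expressed.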

>  	}
>  	return pol;
>  }
> --
> MST
>

Thanks, Lorenzo
