* [PATCH 16/30] mm/vma: use vma_start_pgoff(), linear_page_index() in mm code
From: Lorenzo Stoakes @ 2026-06-29 12:23 UTC
To: Andrew Morton
Cc: Russell King, Dinh Nguyen, Simon Schuster,
James E . J . Bottomley, Helge Deller, Jarkko Sakkinen,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
Ian Abbott, H Hartley Sweeten, Lucas Stach, David Airlie,
Simona Vetter, Patrik Jakobsson, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Rob Clark, Dmitry Baryshkov, Tomi Valkeinen,
Thierry Reding, Mikko Perttunen, Jonathan Hunter,
Christian Koenig, Huang Rui, Ankit Agrawal, Alex Williamson,
Alexander Viro, Christian Brauner, Dan Williams, Muchun Song,
Oscar Salvador, David Hildenbrand, Suren Baghdasaryan,
Liam R . Howlett, Matthew Wilcox, Marek Szyprowski,
Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim,
Masami Hiramatsu, Oleg Nesterov, Steven Rostedt, SeongJae Park,
Miaohe Lin, Hugh Dickins, Mike Rapoport, Kees Cook, Paolo Bonzini,
linux-kernel, linux-arm-kernel, linux-parisc, linux-sgx, etnaviv,
dri-devel, linux-arm-msm, freedreno, linux-tegra, kvm,
linux-fsdevel, nvdimm, linux-mm, iommu, linux-perf-users,
linux-trace-kernel, kasan-dev, damon, Pedro Falcato, Rik van Riel,
Harry Yoo, Jann Horn
In-Reply-To: <cover.1782735110.git.ljs@kernel.org>
There are many instances in which linear_page_index() (as well as
linear_page_delta()) is open-coded, which is confusing and inconsistent.
Additionally, an open-coded vma->vm_pgoff doesn't necessarily make it clear
that this is the page offset of the start of the VMA range.
Using a named helper instead also aids greppability.
So use vma_start_pgoff() in favour of directly accessing vma->vm_pgoff, and
linear_page_index() where we can.
This also lays the groundwork for future changes which will add an anonymous
page offset in order to be able to index MAP_PRIVATE file-backed anon
folios in terms of their virtual page offset.
No functional change intended.
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
---
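A minimal user-space sketch of why the hugetlb conversions below can fold two
shifts into one. The old form shifts the in-VMA offset and vm_pgoff
separately; the new form shifts their sum. The two agree whenever vm_pgoff is
a multiple of the huge page size in pages, which this sketch assumes holds
for hugetlb mappings. struct vma_model, MODEL_PAGE_SHIFT and the function
names are hypothetical stand-ins, not kernel definitions.

```c
#include <assert.h>

#define MODEL_PAGE_SHIFT 12 /* assumption: 4 KiB base pages */

/* Hypothetical stand-in for the VMA fields the helpers read. */
struct vma_model {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_pgoff;
};

/* Models linear_page_index(): pages into the VMA plus the start offset. */
static inline unsigned long
model_linear_page_index(const struct vma_model *vma, unsigned long address)
{
	return ((address - vma->vm_start) >> MODEL_PAGE_SHIFT) + vma->vm_pgoff;
}

/* The open-coded form removed by the patch: shift each term separately. */
static inline unsigned long
hugetlb_index_old(const struct vma_model *vma, unsigned long address,
		  unsigned int order)
{
	return ((address - vma->vm_start) >> (MODEL_PAGE_SHIFT + order)) +
	       (vma->vm_pgoff >> order);
}

/* The replacement form: shift the summed base-page index once. */
static inline unsigned long
hugetlb_index_new(const struct vma_model *vma, unsigned long address,
		  unsigned int order)
{
	return model_linear_page_index(vma, address) >> order;
}

static inline void hugetlb_index_self_check(void)
{
	/* vm_pgoff = 3 huge pages of order 9, i.e. aligned. */
	const struct vma_model aligned = {
		.vm_start = 0x40000000UL,
		.vm_end = 0x40000000UL + (16UL << 21),
		.vm_pgoff = 3UL << 9,
	};
	/* vm_pgoff deliberately NOT aligned to the huge page order. */
	const struct vma_model unaligned = {
		.vm_start = 0x40000000UL,
		.vm_end = 0x40000000UL + (16UL << 21),
		.vm_pgoff = 511,
	};
	const unsigned long addr = aligned.vm_start + (5UL << 21) + 0x3000UL;

	/* Aligned start offset: both forms agree. */
	assert(hugetlb_index_old(&aligned, addr, 9) ==
	       hugetlb_index_new(&aligned, addr, 9));
	/* Unaligned start offset: the carry makes the forms diverge. */
	assert(hugetlb_index_old(&unaligned, unaligned.vm_start + 0x1000UL, 9) !=
	       hugetlb_index_new(&unaligned, unaligned.vm_start + 0x1000UL, 9));
}
```

The unaligned case is only there to show what the conversion relies on; it is
not meant to suggest such a VMA can exist for hugetlb.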
include/linux/huge_mm.h | 1 +
include/linux/hugetlb.h | 3 +--
include/linux/pagemap.h | 2 +-
mm/damon/vaddr.c | 5 +++--
mm/debug.c | 2 +-
mm/filemap.c | 7 ++++---
mm/huge_memory.c | 2 +-
mm/hugetlb.c | 11 ++++-------
mm/internal.h | 24 ++++++++++++++----------
mm/khugepaged.c | 3 ++-
mm/madvise.c | 6 +++---
mm/mapping_dirty_helpers.c | 2 +-
mm/memory.c | 25 +++++++++++++------------
mm/mempolicy.c | 13 +++++++------
mm/mremap.c | 12 ++++--------
mm/msync.c | 4 ++--
mm/nommu.c | 7 ++++---
mm/pagewalk.c | 2 +-
mm/shmem.c | 9 +++++----
mm/userfaultfd.c | 4 ++--
mm/util.c | 4 ++--
mm/vma.c | 15 +++++++--------
mm/vma_exec.c | 4 ++--
mm/vma_init.c | 2 +-
24 files changed, 86 insertions(+), 83 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index ad20f7f8c179..653b81d08fe7 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -230,6 +230,7 @@ static inline bool thp_vma_suitable_order(struct vm_area_struct *vma,
/* Don't have to check pgoff for anonymous vma */
if (!vma_is_anonymous(vma)) {
+ /* vma_start_pgoff() in mm.h so not available. */
if (!IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
hpage_size >> PAGE_SHIFT))
return false;
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 2abaf99321e9..8390f50604d6 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -792,8 +792,7 @@ static inline pgoff_t hugetlb_linear_page_index(struct vm_area_struct *vma,
{
struct hstate *h = hstate_vma(vma);
- return ((address - vma->vm_start) >> huge_page_shift(h)) +
- (vma->vm_pgoff >> huge_page_order(h));
+ return linear_page_index(vma, address) >> huge_page_order(h);
}
static inline bool order_is_gigantic(unsigned int order)
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 644c0f25ae73..68a88d34a468 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -1101,7 +1101,7 @@ static inline pgoff_t linear_page_index(const struct vm_area_struct *vma,
pgoff_t pgoff;
pgoff = linear_page_delta(vma, address);
- pgoff += vma->vm_pgoff;
+ pgoff += vma_start_pgoff(vma);
return pgoff;
}
diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c
index d27147603564..faa44aa3219b 100644
--- a/mm/damon/vaddr.c
+++ b/mm/damon/vaddr.c
@@ -12,6 +12,7 @@
#include <linux/mman.h>
#include <linux/mmu_notifier.h>
#include <linux/page_idle.h>
+#include <linux/pagemap.h>
#include <linux/pagewalk.h>
#include <linux/sched/mm.h>
@@ -627,8 +628,8 @@ static void damos_va_migrate_dests_add(struct folio *folio,
}
order = folio_order(folio);
- ilx = vma->vm_pgoff >> order;
- ilx += (addr - vma->vm_start) >> (PAGE_SHIFT + order);
+ ilx = vma_start_pgoff(vma) >> order;
+ ilx += linear_page_delta(vma, addr) >> order;
for (i = 0; i < dests->nr_dests; i++)
weight_total += dests->weight_arr[i];
diff --git a/mm/debug.c b/mm/debug.c
index 77fa8fe1d641..497654b36f1a 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -163,7 +163,7 @@ void dump_vma(const struct vm_area_struct *vma)
"flags: %#lx(%pGv)\n",
vma, (void *)vma->vm_start, (void *)vma->vm_end, vma->vm_mm,
(unsigned long)pgprot_val(vma->vm_page_prot),
- vma->anon_vma, vma->vm_ops, vma->vm_pgoff,
+ vma->anon_vma, vma->vm_ops, vma_start_pgoff(vma),
vma->vm_file, vma->vm_private_data,
#ifdef CONFIG_PER_VMA_LOCK
refcount_read(&vma->vm_refcnt),
diff --git a/mm/filemap.c b/mm/filemap.c
index 5af62e6abca5..bcb07b21a685 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3402,8 +3402,8 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
* of memory.
*/
struct vm_area_struct *vma = vmf->vma;
- unsigned long start = vma->vm_pgoff;
- unsigned long end = start + vma_pages(vma);
+ const unsigned long start = vma_start_pgoff(vma);
+ const unsigned long end = vma_end_pgoff(vma);
unsigned long ra_end;
ra->order = exec_folio_order();
@@ -3921,7 +3921,8 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
goto out;
}
- addr = vma->vm_start + ((start_pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+ addr = vma->vm_start +
+ ((start_pgoff - vma_start_pgoff(vma)) << PAGE_SHIFT);
vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
if (!vmf->pte) {
folio_unlock(folio);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2bccb0a53a0a..e94f56487225 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -180,7 +180,7 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
*/
if (!in_pf && shmem_file(vma->vm_file))
return orders & shmem_allowable_huge_orders(file_inode(vma->vm_file),
- vma, vma->vm_pgoff, 0,
+ vma, vma_start_pgoff(vma), 0,
forced_collapse);
if (!vma_is_anonymous(vma)) {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f45000149a78..d44a3ac5ee0a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1011,8 +1011,7 @@ static long region_count(struct resv_map *resv, long f, long t)
static pgoff_t vma_hugecache_offset(struct hstate *h,
struct vm_area_struct *vma, unsigned long address)
{
- return ((address - vma->vm_start) >> huge_page_shift(h)) +
- (vma->vm_pgoff >> huge_page_order(h));
+ return linear_page_index(vma, address) >> huge_page_order(h);
}
/*
@@ -5372,8 +5371,7 @@ static void unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
* from page cache lookup which is in HPAGE_SIZE units.
*/
address = address & huge_page_mask(h);
- pgoff = ((address - vma->vm_start) >> PAGE_SHIFT) +
- vma->vm_pgoff;
+ pgoff = linear_page_index(vma, address);
mapping = vma->vm_file->f_mapping;
/*
@@ -6771,7 +6769,7 @@ static unsigned long page_table_shareable(struct vm_area_struct *svma,
struct vm_area_struct *vma,
unsigned long addr, pgoff_t idx)
{
- unsigned long saddr = ((idx - svma->vm_pgoff) << PAGE_SHIFT) +
+ unsigned long saddr = ((idx - vma_start_pgoff(svma)) << PAGE_SHIFT) +
svma->vm_start;
unsigned long sbase = saddr & PUD_MASK;
unsigned long s_end = sbase + PUD_SIZE;
@@ -6856,8 +6854,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long addr, pud_t *pud)
{
struct address_space *mapping = vma->vm_file->f_mapping;
- pgoff_t idx = ((addr - vma->vm_start) >> PAGE_SHIFT) +
- vma->vm_pgoff;
+ const pgoff_t idx = linear_page_index(vma, addr);
struct vm_area_struct *svma;
unsigned long saddr;
pte_t *spte = NULL;
diff --git a/mm/internal.h b/mm/internal.h
index 181e79f1d6a2..89e5b7efe256 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1143,26 +1143,28 @@ static inline bool
folio_within_range(struct folio *folio, struct vm_area_struct *vma,
unsigned long start, unsigned long end)
{
- pgoff_t pgoff, addr;
- unsigned long vma_pglen = vma_pages(vma);
+ const unsigned long vma_pglen = vma_pages(vma);
+ pgoff_t pgoff_folio, pgoff_vma_start;
+ unsigned long addr;
VM_WARN_ON_FOLIO(folio_test_ksm(folio), folio);
if (start > end)
return false;
+ pgoff_folio = folio_pgoff(folio);
+ pgoff_vma_start = vma_start_pgoff(vma);
+
if (start < vma->vm_start)
start = vma->vm_start;
if (end > vma->vm_end)
end = vma->vm_end;
- pgoff = folio_pgoff(folio);
-
/* if folio start address is not in vma range */
- if (!in_range(pgoff, vma->vm_pgoff, vma_pglen))
+ if (!in_range(pgoff_folio, pgoff_vma_start, vma_pglen))
return false;
- addr = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+ addr = vma->vm_start + ((pgoff_folio - pgoff_vma_start) << PAGE_SHIFT);
return !(addr < start || end - addr < folio_size(folio));
}
@@ -1234,15 +1236,16 @@ extern pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma);
static inline unsigned long vma_address(const struct vm_area_struct *vma,
pgoff_t pgoff, unsigned long nr_pages)
{
+ const pgoff_t pgoff_start = vma_start_pgoff(vma);
unsigned long address;
- if (pgoff >= vma->vm_pgoff) {
+ if (pgoff >= pgoff_start) {
address = vma->vm_start +
- ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+ ((pgoff - pgoff_start) << PAGE_SHIFT);
/* Check for address beyond vma (or wrapped through 0?) */
if (address < vma->vm_start || address >= vma->vm_end)
address = -EFAULT;
- } else if (pgoff + nr_pages - 1 >= vma->vm_pgoff) {
+ } else if (pgoff + nr_pages - 1 >= pgoff_start) {
/* Test above avoids possibility of wrap to 0 on 32-bit */
address = vma->vm_start;
} else {
@@ -1266,7 +1269,8 @@ static inline unsigned long vma_address_end(struct page_vma_mapped_walk *pvmw)
return pvmw->address + PAGE_SIZE;
pgoff = pvmw->pgoff + pvmw->nr_pages;
- address = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+ address = vma->vm_start +
+ ((pgoff - vma_start_pgoff(vma)) << PAGE_SHIFT);
/* Check for address beyond vma (or wrapped through 0?) */
if (address < vma->vm_start || address > vma->vm_end)
address = vma->vm_end;
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index bd5f86cf4bd8..ffef738d826c 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2145,7 +2145,8 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
spinlock_t *ptl;
bool success = false;
- addr = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+ addr = vma->vm_start +
+ ((pgoff - vma_start_pgoff(vma)) << PAGE_SHIFT);
if (addr & ~HPAGE_PMD_MASK ||
vma->vm_end < addr + HPAGE_PMD_SIZE)
continue;
diff --git a/mm/madvise.c b/mm/madvise.c
index cd9bb077072c..6730c4200a93 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -253,7 +253,7 @@ static void shmem_swapin_range(struct vm_area_struct *vma,
continue;
addr = vma->vm_start +
- ((xas.xa_index - vma->vm_pgoff) << PAGE_SHIFT);
+ ((xas.xa_index - vma_start_pgoff(vma)) << PAGE_SHIFT);
xas_pause(&xas);
rcu_read_unlock();
@@ -318,7 +318,7 @@ static long madvise_willneed(struct madvise_behavior *madv_behavior)
mark_mmap_lock_dropped(madv_behavior);
get_file(file);
offset = (loff_t)(start - vma->vm_start)
- + ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
+ + ((loff_t)vma_start_pgoff(vma) << PAGE_SHIFT);
mmap_read_unlock(mm);
vfs_fadvise(file, offset, end - start, POSIX_FADV_WILLNEED);
fput(file);
@@ -1023,7 +1023,7 @@ static long madvise_remove(struct madvise_behavior *madv_behavior)
return -EACCES;
offset = (loff_t)(start - vma->vm_start)
- + ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
+ + ((loff_t)vma_start_pgoff(vma) << PAGE_SHIFT);
/*
* Filesystem's fallocate may need to take i_rwsem. We need to
diff --git a/mm/mapping_dirty_helpers.c b/mm/mapping_dirty_helpers.c
index 737c407f4081..e0efa36e0a07 100644
--- a/mm/mapping_dirty_helpers.c
+++ b/mm/mapping_dirty_helpers.c
@@ -95,7 +95,7 @@ static int clean_record_pte(pte_t *pte, unsigned long addr,
if (pte_dirty(ptent)) {
pgoff_t pgoff = ((addr - walk->vma->vm_start) >> PAGE_SHIFT) +
- walk->vma->vm_pgoff - cwalk->bitmap_pgoff;
+ vma_start_pgoff(walk->vma) - cwalk->bitmap_pgoff;
pte_t old_pte = ptep_modify_prot_start(walk->vma, addr, pte);
ptent = pte_mkclean(old_pte);
diff --git a/mm/memory.c b/mm/memory.c
index 98c1a245f45a..f5eb06544ba4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -725,10 +725,10 @@ static inline struct page *__vm_normal_page(struct vm_area_struct *vma,
if (!pfn_valid(pfn))
return NULL;
} else {
- unsigned long off = (addr - vma->vm_start) >> PAGE_SHIFT;
+ const pgoff_t index = linear_page_index(vma, addr);
/* Only CoW'ed anon folios are "normal". */
- if (pfn == vma->vm_pgoff + off)
+ if (pfn == index)
return NULL;
if (!is_cow_mapping(vma->vm_flags))
return NULL;
@@ -2643,7 +2643,7 @@ static int __vm_map_pages(struct vm_area_struct *vma, struct page **pages,
int vm_map_pages(struct vm_area_struct *vma, struct page **pages,
unsigned long num)
{
- return __vm_map_pages(vma, pages, num, vma->vm_pgoff);
+ return __vm_map_pages(vma, pages, num, vma_start_pgoff(vma));
}
EXPORT_SYMBOL(vm_map_pages);
@@ -3298,7 +3298,8 @@ int vm_iomap_memory(struct vm_area_struct *vma, phys_addr_t start, unsigned long
unsigned long pfn;
int err;
- err = __simple_ioremap_prep(vm_len, vma->vm_pgoff, start, len, &pfn);
+ err = __simple_ioremap_prep(vm_len, vma_start_pgoff(vma), start, len,
+ &pfn);
if (err)
return err;
@@ -4342,15 +4343,15 @@ static inline void unmap_mapping_range_tree(struct address_space *mapping,
struct zap_details *details)
{
struct vm_area_struct *vma;
- unsigned long start, size;
struct mmu_gather tlb;
mapping_interval_tree_foreach(vma, mapping, first_index, last_index) {
- const pgoff_t start_idx = max(first_index, vma->vm_pgoff);
+ const pgoff_t start_idx = max(first_index, vma_start_pgoff(vma));
const pgoff_t end_idx = min(last_index, vma_last_pgoff(vma)) + 1;
-
- start = vma->vm_start + ((start_idx - vma->vm_pgoff) << PAGE_SHIFT);
- size = (end_idx - start_idx) << PAGE_SHIFT;
+ const pgoff_t offset = start_idx - vma_start_pgoff(vma);
+ const unsigned long offset_bytes = offset << PAGE_SHIFT;
+ const unsigned long start = vma->vm_start + offset_bytes;
+ const unsigned long size = (end_idx - start_idx) << PAGE_SHIFT;
tlb_gather_mmu(&tlb, vma->vm_mm);
zap_vma_range_batched(&tlb, vma, start, size, details);
@@ -5684,7 +5685,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
} else if (nr_pages > 1) {
pgoff_t idx = folio_page_idx(folio, page);
/* The page offset of vmf->address within the VMA. */
- pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
+ pgoff_t vma_off = vmf->pgoff - vma_start_pgoff(vmf->vma);
/* The index of the entry in the pagetable for fault page. */
pgoff_t pte_off = pte_index(vmf->address);
@@ -5796,7 +5797,7 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
pgoff_t nr_pages = READ_ONCE(fault_around_pages);
pgoff_t pte_off = pte_index(vmf->address);
/* The page offset of vmf->address within the VMA. */
- pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
+ pgoff_t vma_off = vmf->pgoff - vma_start_pgoff(vmf->vma);
pgoff_t from_pte, to_pte;
vm_fault_t ret;
@@ -7274,7 +7275,7 @@ void print_vma_addr(char *prefix, unsigned long ip)
if (vma && vma->vm_file) {
struct file *f = vma->vm_file;
ip -= vma->vm_start;
- ip += vma->vm_pgoff << PAGE_SHIFT;
+ ip += vma_start_pgoff(vma) << PAGE_SHIFT;
printk("%s%pD[%lx,%lx+%lx]", prefix, f, ip,
vma->vm_start,
vma->vm_end - vma->vm_start);
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 36699fabd3c2..650cdb23354a 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2048,8 +2048,8 @@ struct mempolicy *get_vma_policy(struct vm_area_struct *vma,
pol = get_task_policy(current);
if (pol->mode == MPOL_INTERLEAVE ||
pol->mode == MPOL_WEIGHTED_INTERLEAVE) {
- *ilx += vma->vm_pgoff >> order;
- *ilx += (addr - vma->vm_start) >> (PAGE_SHIFT + order);
+ *ilx += vma_start_pgoff(vma) >> order;
+ *ilx += linear_page_delta(vma, addr) >> order;
}
return pol;
}
@@ -3250,16 +3250,17 @@ EXPORT_SYMBOL_FOR_MODULES(mpol_shared_policy_init, "kvm");
int mpol_set_shared_policy(struct shared_policy *sp,
struct vm_area_struct *vma, struct mempolicy *pol)
{
- int err;
+ const pgoff_t pgoff = vma_start_pgoff(vma);
+ const pgoff_t pgoff_end = vma_end_pgoff(vma);
struct sp_node *new = NULL;
- unsigned long sz = vma_pages(vma);
+ int err;
if (pol) {
- new = sp_alloc(vma->vm_pgoff, vma->vm_pgoff + sz, pol);
+ new = sp_alloc(pgoff, pgoff_end, pol);
if (!new)
return -ENOMEM;
}
- err = shared_policy_replace(sp, vma->vm_pgoff, vma->vm_pgoff + sz, new);
+ err = shared_policy_replace(sp, pgoff, pgoff_end, new);
if (err && new)
sp_free(new);
return err;
diff --git a/mm/mremap.c b/mm/mremap.c
index e9c8b1d05832..079a0ba0c4a7 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -948,8 +948,7 @@ static unsigned long vrm_set_new_addr(struct vma_remap_struct *vrm)
struct vm_area_struct *vma = vrm->vma;
unsigned long map_flags = 0;
/* Page Offset _into_ the VMA. */
- pgoff_t internal_pgoff = (vrm->addr - vma->vm_start) >> PAGE_SHIFT;
- pgoff_t pgoff = vma->vm_pgoff + internal_pgoff;
+ const pgoff_t pgoff = linear_page_index(vma, vrm->addr);
unsigned long new_addr = vrm_implies_new_addr(vrm) ? vrm->new_addr : 0;
unsigned long res;
@@ -1255,12 +1254,10 @@ static void unmap_source_vma(struct vma_remap_struct *vrm)
static int copy_vma_and_data(struct vma_remap_struct *vrm,
struct vm_area_struct **new_vma_ptr)
{
- unsigned long internal_offset = vrm->addr - vrm->vma->vm_start;
- unsigned long internal_pgoff = internal_offset >> PAGE_SHIFT;
- unsigned long new_pgoff = vrm->vma->vm_pgoff + internal_pgoff;
- unsigned long moved_len;
+ const unsigned long new_pgoff = linear_page_index(vrm->vma, vrm->addr);
struct vm_area_struct *vma = vrm->vma;
struct vm_area_struct *new_vma;
+ unsigned long moved_len;
int err = 0;
PAGETABLE_MOVE(pmc, NULL, NULL, vrm->addr, vrm->new_addr, vrm->old_len);
@@ -1802,8 +1799,7 @@ static int check_prep_vma(struct vma_remap_struct *vrm)
vrm->populate_expand = true;
/* Need to be careful about a growing mapping */
- pgoff = (addr - vma->vm_start) >> PAGE_SHIFT;
- pgoff += vma->vm_pgoff;
+ pgoff = linear_page_index(vma, addr);
if (pgoff + (new_len >> PAGE_SHIFT) < pgoff)
return -EINVAL;
diff --git a/mm/msync.c b/mm/msync.c
index ac4c9bfea2e7..90b491a27a14 100644
--- a/mm/msync.c
+++ b/mm/msync.c
@@ -12,6 +12,7 @@
#include <linux/mm.h>
#include <linux/mman.h>
#include <linux/file.h>
+#include <linux/pagemap.h>
#include <linux/syscalls.h>
#include <linux/sched.h>
@@ -85,8 +86,7 @@ SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags)
goto out_unlock;
}
file = vma->vm_file;
- fstart = (start - vma->vm_start) +
- ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
+ fstart = (loff_t)linear_page_index(vma, start) << PAGE_SHIFT;
fend = fstart + (min(end, vma->vm_end) - start) - 1;
start = vma->vm_end;
if ((flags & MS_SYNC) && file &&
diff --git a/mm/nommu.c b/mm/nommu.c
index 6d168f69763f..60560b2c457e 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -975,7 +975,7 @@ static int do_mmap_private(struct vm_area_struct *vma,
/* read the contents of a file into the copy */
loff_t fpos;
- fpos = vma->vm_pgoff;
+ fpos = vma_start_pgoff(vma);
fpos <<= PAGE_SHIFT;
ret = kernel_read(vma->vm_file, base, len, &fpos);
@@ -1355,7 +1355,8 @@ static int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
delete_nommu_region(vma->vm_region);
if (new_below) {
vma->vm_region->vm_start = vma->vm_start = addr;
- vma->vm_region->vm_pgoff = vma->vm_pgoff += npages;
+ vma->vm_pgoff += npages;
+ vma->vm_region->vm_pgoff = vma_start_pgoff(vma);
} else {
vma->vm_region->vm_end = vma->vm_end = addr;
vma->vm_region->vm_top = addr;
@@ -1603,7 +1604,7 @@ int vm_iomap_memory(struct vm_area_struct *vma, phys_addr_t start, unsigned long
unsigned long pfn = start >> PAGE_SHIFT;
unsigned long vm_len = vma->vm_end - vma->vm_start;
- pfn += vma->vm_pgoff;
+ pfn += vma_start_pgoff(vma);
return io_remap_pfn_range(vma, vma->vm_start, pfn, vm_len, vma->vm_page_prot);
}
EXPORT_SYMBOL(vm_iomap_memory);
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 98d090ede077..0a3bbff57d46 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -813,7 +813,7 @@ int walk_page_mapping(struct address_space *mapping, pgoff_t first_index,
mapping_interval_tree_foreach(vma, mapping, first_index,
first_index + nr - 1) {
/* Clip to the vma */
- vba = vma->vm_pgoff;
+ vba = vma_start_pgoff(vma);
vea = vba + vma_pages(vma);
cba = first_index;
cba = max(cba, vba);
diff --git a/mm/shmem.c b/mm/shmem.c
index b51f83c970bb..4e7f6bc7a389 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1032,6 +1032,8 @@ unsigned long shmem_swap_usage(struct vm_area_struct *vma)
struct inode *inode = file_inode(vma->vm_file);
struct shmem_inode_info *info = SHMEM_I(inode);
struct address_space *mapping = inode->i_mapping;
+ const pgoff_t pgoff = vma_start_pgoff(vma);
+ const pgoff_t pgoff_end = vma_end_pgoff(vma);
unsigned long swapped;
/* Be careful as we don't hold info->lock */
@@ -1045,12 +1047,11 @@ unsigned long shmem_swap_usage(struct vm_area_struct *vma)
if (!swapped)
return 0;
- if (!vma->vm_pgoff && vma->vm_end - vma->vm_start >= inode->i_size)
+ if (!pgoff && vma->vm_end - vma->vm_start >= inode->i_size)
return swapped << PAGE_SHIFT;
/* Here comes the more involved part */
- return shmem_partial_swap_usage(mapping, vma->vm_pgoff,
- vma->vm_pgoff + vma_pages(vma));
+ return shmem_partial_swap_usage(mapping, pgoff, pgoff_end);
}
/*
@@ -2839,7 +2840,7 @@ static struct mempolicy *shmem_get_policy(struct vm_area_struct *vma,
* by page order, as in shmem_get_pgoff_policy() and get_vma_policy()).
*/
*ilx = inode->i_ino;
- index = ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
+ index = linear_page_index(vma, addr);
return mpol_shared_policy_lookup(&SHMEM_I(inode)->policy, index);
}
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 246af12bf801..bf4518f4449d 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -481,7 +481,7 @@ static void mfill_retry_state_save(struct mfill_retry_state *s,
{
s->flags = vma_flags_and_mask(&vma->flags, MFILL_RETRY_STATE_VMA_FLAGS);
s->ops = vma_uffd_ops(vma);
- s->pgoff = vma->vm_pgoff;
+ s->pgoff = vma_start_pgoff(vma);
if (vma->vm_file)
s->file = get_file(vma->vm_file);
@@ -507,7 +507,7 @@ static bool mfill_retry_state_changed(struct mfill_retry_state *state,
/* VMA was file backed, but file, inode or offset has changed */
if (!vma->vm_file || vma->vm_file->f_inode != state->file->f_inode ||
- state->file != vma->vm_file || vma->vm_pgoff != state->pgoff)
+ state->file != vma->vm_file || vma_start_pgoff(vma) != state->pgoff)
return true;
return false;
diff --git a/mm/util.c b/mm/util.c
index af2c2103f0d9..61e6d32b2c16 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1188,7 +1188,7 @@ void compat_set_desc_from_vma(struct vm_area_desc *desc,
desc->start = vma->vm_start;
desc->end = vma->vm_end;
- desc->pgoff = vma->vm_pgoff;
+ desc->pgoff = vma_start_pgoff(vma);
desc->vm_file = vma->vm_file;
desc->vma_flags = vma->flags;
desc->page_prot = vma->vm_page_prot;
@@ -1379,7 +1379,7 @@ static int call_vma_mapped(struct vm_area_struct *vma)
if (!vm_ops || !vm_ops->mapped)
return 0;
- err = vm_ops->mapped(vma->vm_start, vma->vm_end, vma->vm_pgoff,
+ err = vm_ops->mapped(vma->vm_start, vma->vm_end, vma_start_pgoff(vma),
vma->vm_file, &vm_private_data);
if (err)
return err;
diff --git a/mm/vma.c b/mm/vma.c
index dc4c2c1077f4..ee3a8ca13d07 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -967,10 +967,9 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
* prev middle next
* extend delete delete
*/
-
vmg->start = prev->vm_start;
vmg->end = next->vm_end;
- vmg->pgoff = prev->vm_pgoff;
+ vmg->pgoff = vma_start_pgoff(prev);
/*
* We already ensured anon_vma compatibility above, so now it's
@@ -987,9 +986,8 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
* prev middle
* extend shrink/delete
*/
-
vmg->start = prev->vm_start;
- vmg->pgoff = prev->vm_pgoff;
+ vmg->pgoff = vma_start_pgoff(prev);
if (!vmg->__remove_middle)
vmg->__adjust_middle_start = true;
@@ -1011,13 +1009,13 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
if (vmg->__remove_middle) {
vmg->end = next->vm_end;
- vmg->pgoff = next->vm_pgoff - pglen;
+ vmg->pgoff = vma_start_pgoff(next) - pglen;
} else {
/* We shrink middle and expand next. */
vmg->__adjust_next_start = true;
vmg->start = middle->vm_start;
vmg->end = start;
- vmg->pgoff = middle->vm_pgoff;
+ vmg->pgoff = vma_start_pgoff(middle);
}
err = dup_anon_vma(next, middle, &anon_dup);
@@ -1126,7 +1124,7 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
if (can_merge_left) {
vmg->start = prev->vm_start;
vmg->target = prev;
- vmg->pgoff = prev->vm_pgoff;
+ vmg->pgoff = vma_start_pgoff(prev);
/*
* If this merge would result in removal of the next VMA but we
@@ -1957,7 +1955,8 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
VM_BUG_ON_VMA(faulted_in_anon_vma, new_vma);
*vmap = vma = new_vma;
}
- *need_rmap_locks = (new_vma->vm_pgoff <= vma->vm_pgoff);
+ *need_rmap_locks =
+ (vma_start_pgoff(new_vma) <= vma_start_pgoff(vma));
} else {
new_vma = vm_area_dup(vma);
if (!new_vma)
diff --git a/mm/vma_exec.c b/mm/vma_exec.c
index 5cee8b7efa0f..e3644a3042e2 100644
--- a/mm/vma_exec.c
+++ b/mm/vma_exec.c
@@ -37,7 +37,7 @@ int relocate_vma_down(struct vm_area_struct *vma, unsigned long shift)
unsigned long new_end = old_end - shift;
VMA_ITERATOR(vmi, mm, new_start);
VMG_STATE(vmg, mm, &vmi, new_start, old_end, EMPTY_VMA_FLAGS,
- vma->vm_pgoff);
+ vma_start_pgoff(vma));
struct vm_area_struct *next;
struct mmu_gather tlb;
PAGETABLE_MOVE(pmc, vma, vma, old_start, new_start, length);
@@ -89,7 +89,7 @@ int relocate_vma_down(struct vm_area_struct *vma, unsigned long shift)
vma_prev(&vmi);
/* Shrink the vma to just the new range */
- return vma_shrink(&vmi, vma, new_start, new_end, vma->vm_pgoff);
+ return vma_shrink(&vmi, vma, new_start, new_end, vma_start_pgoff(vma));
}
/*
diff --git a/mm/vma_init.c b/mm/vma_init.c
index 3c0b65950510..a459669a1654 100644
--- a/mm/vma_init.c
+++ b/mm/vma_init.c
@@ -46,7 +46,7 @@ static void vm_area_init_from(const struct vm_area_struct *src,
dest->vm_start = src->vm_start;
dest->vm_end = src->vm_end;
dest->anon_vma = src->anon_vma;
- dest->vm_pgoff = src->vm_pgoff;
+ dest->vm_pgoff = vma_start_pgoff(src);
dest->vm_file = src->vm_file;
dest->vm_private_data = src->vm_private_data;
vm_flags_init(dest, src->vm_flags);
--
2.54.0
* [PATCH 15/30] mm: introduce and use linear_page_delta()
From: Lorenzo Stoakes @ 2026-06-29 12:23 UTC
To: Andrew Morton
Cc: Russell King, Dinh Nguyen, Simon Schuster,
James E . J . Bottomley, Helge Deller, Jarkko Sakkinen,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
Ian Abbott, H Hartley Sweeten, Lucas Stach, David Airlie,
Simona Vetter, Patrik Jakobsson, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Rob Clark, Dmitry Baryshkov, Tomi Valkeinen,
Thierry Reding, Mikko Perttunen, Jonathan Hunter,
Christian Koenig, Huang Rui, Ankit Agrawal, Alex Williamson,
Alexander Viro, Christian Brauner, Dan Williams, Muchun Song,
Oscar Salvador, David Hildenbrand, Suren Baghdasaryan,
Liam R . Howlett, Matthew Wilcox, Marek Szyprowski,
Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim,
Masami Hiramatsu, Oleg Nesterov, Steven Rostedt, SeongJae Park,
Miaohe Lin, Hugh Dickins, Mike Rapoport, Kees Cook, Paolo Bonzini,
linux-kernel, linux-arm-kernel, linux-parisc, linux-sgx, etnaviv,
dri-devel, linux-arm-msm, freedreno, linux-tegra, kvm,
linux-fsdevel, nvdimm, linux-mm, iommu, linux-perf-users,
linux-trace-kernel, kasan-dev, damon, Pedro Falcato, Rik van Riel,
Harry Yoo, Jann Horn
In-Reply-To: <cover.1782735110.git.ljs@kernel.org>
It's often useful to obtain the number of pages by which a given address is
offset from the start of its VMA.
Add linear_page_delta() to determine this and update linear_page_index() to
make use of it.
Add comments to describe both linear_page_delta() and linear_page_index().
No functional change intended.
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
---
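A minimal user-space sketch of the relationship between the two helpers,
with a worked example. It mirrors the arithmetic in the diff below, but
struct vma_model, MODEL_PAGE_SHIFT and the model_* names are hypothetical
stand-ins rather than kernel definitions, and 4 KiB pages are assumed.

```c
#include <assert.h>

#define MODEL_PAGE_SHIFT 12 /* assumption: 4 KiB base pages */

/* Hypothetical stand-in for the VMA fields the helpers read. */
struct vma_model {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_pgoff;
};

/* Relative offset: whole pages between vm_start and address. */
static inline unsigned long
model_linear_page_delta(const struct vma_model *vma, unsigned long address)
{
	return (address - vma->vm_start) >> MODEL_PAGE_SHIFT;
}

/* Absolute offset: the relative offset plus the VMA's starting offset. */
static inline unsigned long
model_linear_page_index(const struct vma_model *vma, unsigned long address)
{
	return model_linear_page_delta(vma, address) + vma->vm_pgoff;
}

static inline void linear_page_self_check(void)
{
	/* A 16-page VMA that maps its backing object from page 16 onwards. */
	const struct vma_model vma = {
		.vm_start = 0x10000000UL,
		.vm_end = 0x10010000UL,
		.vm_pgoff = 16,
	};

	/* 0x3abc bytes in: the shift discards the sub-page part. */
	assert(model_linear_page_delta(&vma, 0x10003abcUL) == 3);
	assert(model_linear_page_index(&vma, 0x10003abcUL) == 19);
	/* The first byte of the VMA sits exactly at vm_pgoff. */
	assert(model_linear_page_index(&vma, vma.vm_start) == vma.vm_pgoff);
}
```

The delta depends only on the address and vm_start, so it is the same for
file-backed and anonymous mappings; only the index involves vm_pgoff.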
include/linux/pagemap.h | 37 +++++++++++++++++++++++++++++++++++--
1 file changed, 35 insertions(+), 2 deletions(-)
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 2c3718d592d6..644c0f25ae73 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -1063,11 +1063,44 @@ static inline pgoff_t folio_pgoff(const struct folio *folio)
return folio->index;
}
+/**
+ * linear_page_delta() - Determine the relative page offset of @address within
+ * @vma.
+ * @vma: The VMA in which @address resides.
+ * @address: The address whose relative page offset is required.
+ *
+ * The result is identical for both file-backed and anonymous mappings and
+ * simply determines how many pages @address lies from @vma->vm_start.
+ *
+ * Returns: The number of pages @address is offset by within @vma.
+ */
+static inline pgoff_t linear_page_delta(const struct vm_area_struct *vma,
+ const unsigned long address)
+{
+ return (address - vma->vm_start) >> PAGE_SHIFT;
+}
+
+/**
+ * linear_page_index() - Determine the absolute page offset of @address within
+ * @vma.
+ * @vma: The VMA in which @address resides.
+ * @address: The address whose absolute page offset is required.
+ *
+ * For file-backed mappings, this returns the page offset of @address within the
+ * file.
+ *
+ * For anonymous mappings, this returns the virtual page offset of @address,
+ * which is the page offset the address possessed at the time the VMA was first
+ * faulted.
+ *
+ * Returns: The absolute page offset of @address within @vma.
+ */
static inline pgoff_t linear_page_index(const struct vm_area_struct *vma,
const unsigned long address)
{
pgoff_t pgoff;
- pgoff = (address - vma->vm_start) >> PAGE_SHIFT;
+
+ pgoff = linear_page_delta(vma, address);
pgoff += vma->vm_pgoff;
return pgoff;
}
@@ -1219,7 +1252,7 @@ static inline vm_fault_t folio_lock_or_retry(struct folio *folio,
void folio_wait_bit(struct folio *folio, int bit_nr);
int folio_wait_bit_killable(struct folio *folio, int bit_nr);
-/*
+/*
* Wait for a folio to be unlocked.
*
* This must be called with the caller "holding" the folio,
--
2.54.0
* [PATCH 14/30] mm/vma: minor cleanup of expand_[upwards, downwards]()
From: Lorenzo Stoakes @ 2026-06-29 12:23 UTC
To: Andrew Morton
Cc: Russell King, Dinh Nguyen, Simon Schuster,
James E . J . Bottomley, Helge Deller, Jarkko Sakkinen,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
Ian Abbott, H Hartley Sweeten, Lucas Stach, David Airlie,
Simona Vetter, Patrik Jakobsson, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Rob Clark, Dmitry Baryshkov, Tomi Valkeinen,
Thierry Reding, Mikko Perttunen, Jonathan Hunter,
Christian Koenig, Huang Rui, Ankit Agrawal, Alex Williamson,
Alexander Viro, Christian Brauner, Dan Williams, Muchun Song,
Oscar Salvador, David Hildenbrand, Suren Baghdasaryan,
Liam R . Howlett, Matthew Wilcox, Marek Szyprowski,
Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim,
Masami Hiramatsu, Oleg Nesterov, Steven Rostedt, SeongJae Park,
Miaohe Lin, Hugh Dickins, Mike Rapoport, Kees Cook, Paolo Bonzini,
linux-kernel, linux-arm-kernel, linux-parisc, linux-sgx, etnaviv,
dri-devel, linux-arm-msm, freedreno, linux-tegra, kvm,
linux-fsdevel, nvdimm, linux-mm, iommu, linux-perf-users,
linux-trace-kernel, kasan-dev, damon, Pedro Falcato, Rik van Riel,
Harry Yoo, Jann Horn
In-Reply-To: <cover.1782735110.git.ljs@kernel.org>
Adjust the stack expansion functions expand_upwards() and
expand_downwards() such that they are expressed in terms of named constant
values, and make use of vma_start_pgoff().
This clearly documents that we are referencing the page offset of the start
of the VMA.
Additionally, this cleans up the overflow check in expand_upwards().
No functional change intended.
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
---
mm/vma.c | 17 +++++++----------
1 file changed, 7 insertions(+), 10 deletions(-)
diff --git a/mm/vma.c b/mm/vma.c
index 1e99fe8aa6ef..dc4c2c1077f4 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -3216,13 +3216,12 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
/* Somebody else might have raced and expanded it already */
if (address > vma->vm_end) {
- unsigned long size, grow;
-
- size = address - vma->vm_start;
- grow = (address - vma->vm_end) >> PAGE_SHIFT;
+ const unsigned long size = address - vma->vm_start;
+ const unsigned long grow = (address - vma->vm_end) >> PAGE_SHIFT;
+ const pgoff_t pgoff = vma_start_pgoff(vma);
error = -ENOMEM;
- if (vma->vm_pgoff + (size >> PAGE_SHIFT) >= vma->vm_pgoff) {
+ if (pgoff + (size >> PAGE_SHIFT) >= pgoff) {
error = acct_stack_growth(vma, size, grow);
if (!error) {
if (vma_test(vma, VMA_LOCKED_BIT))
@@ -3295,13 +3294,11 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
/* Somebody else might have raced and expanded it already */
if (address < vma->vm_start) {
- unsigned long size, grow;
-
- size = vma->vm_end - address;
- grow = (vma->vm_start - address) >> PAGE_SHIFT;
+ const unsigned long size = vma->vm_end - address;
+ const unsigned long grow = (vma->vm_start - address) >> PAGE_SHIFT;
error = -ENOMEM;
- if (grow <= vma->vm_pgoff) {
+ if (grow <= vma_start_pgoff(vma)) {
error = acct_stack_growth(vma, size, grow);
if (!error) {
if (vma_test(vma, VMA_LOCKED_BIT))
--
2.54.0
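
The two page offset checks in the hunks above can be illustrated in isolation.
This is a minimal userspace sketch under stated assumptions: the function
names upward_growth_fits() and downward_growth_fits() are hypothetical, and
PAGE_SHIFT is fixed at 12 for illustration only.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumption: 4 KiB pages; the real value is architecture-dependent. */
#define PAGE_SHIFT 12

typedef unsigned long pgoff_t;

/*
 * Upward growth: the new size in pages is added to the start page offset.
 * Unsigned addition wraps on overflow, so a wrapped sum compares lower than
 * pgoff and the growth is rejected.
 */
static bool upward_growth_fits(pgoff_t pgoff, unsigned long size)
{
	return pgoff + (size >> PAGE_SHIFT) >= pgoff;
}

/*
 * Downward growth: the start page offset moves down by grow pages, so it
 * must not be asked to go below zero.
 */
static bool downward_growth_fits(pgoff_t pgoff, unsigned long grow)
{
	return grow <= pgoff;
}
```

Both checks guard against the page offset leaving the range representable by
pgoff_t, one at the top of the range and one at the bottom.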
* [PATCH 13/30] mm/vma: refactor vmg_adjust_set_range() for clarity
From: Lorenzo Stoakes @ 2026-06-29 12:23 UTC (permalink / raw)
To: Andrew Morton
In-Reply-To: <cover.1782735110.git.ljs@kernel.org>
Add comments with ASCII diagrams to describe what we're doing, avoid
dubious use of PHYS_PFN(), and use vma_start_pgoff().
The most complicated scenario represented here is vmg->__adjust_next_start:
when this is set, vmg->[start, end] actually indicates the range to be
retained, so take special care to describe this accurately.
No functional change intended.
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
---
mm/vma.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 47 insertions(+), 4 deletions(-)
diff --git a/mm/vma.c b/mm/vma.c
index 6296acecf3b7..1e99fe8aa6ef 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -704,11 +704,54 @@ static void vmg_adjust_set_range(struct vma_merge_struct *vmg)
pgoff_t pgoff;
if (vmg->__adjust_middle_start) {
- adjust = vmg->middle;
- pgoff = adjust->vm_pgoff + PHYS_PFN(vmg->end - adjust->vm_start);
+ /*
+ * vmg->start vmg->end
+ * | |
+ * v merge v
+ * <------------->
+ * delta
+ * <------>
+ * |------|----------------|
+ * | prev | middle |
+ * |------|----------------|
+ * ^
+ * |
+ * middle->vm_start
+ */
+ struct vm_area_struct *middle = vmg->middle;
+ const unsigned long delta = vmg->end - middle->vm_start;
+
+ pgoff = vma_start_pgoff(middle) + (delta >> PAGE_SHIFT);
+ adjust = middle;
} else if (vmg->__adjust_next_start) {
- adjust = vmg->next;
- pgoff = adjust->vm_pgoff - PHYS_PFN(adjust->vm_start - vmg->end);
+ /*
+ * Originally:
+ *
+ * vmg->start vmg->end
+ * | |
+ * v merge v
+ * <------------>
+ * . .
+ * merge_existing_range() updates to:
+ * . .
+ * vmg->start vmg->end .
+ * | | .
+ * v retain v .
+ * <----------> .
+ * delta .
+ * <-----> .
+ * |----------------|------|
+ * | middle | next |
+ * |----------------|------|
+ * ^
+ * |
+ * next->vm_start
+ */
+ struct vm_area_struct *next = vmg->next;
+ const unsigned long delta = next->vm_start - vmg->end;
+
+ pgoff = vma_start_pgoff(next) - (delta >> PAGE_SHIFT);
+ adjust = next;
} else {
return;
}
--
2.54.0
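
The two page offset adjustments described by the diagrams above reduce to
simple arithmetic. The following is a minimal userspace sketch, assuming
4 KiB pages; the function names and their flat parameter lists are
hypothetical and stand in for the vmg and VMA fields used by the patch.

```c
#include <assert.h>

/* Assumption: 4 KiB pages; the real value is architecture-dependent. */
#define PAGE_SHIFT 12

typedef unsigned long pgoff_t;

/*
 * __adjust_middle_start case: the merge swallows the front of middle, so
 * middle's start moves up to merge_end and its page offset grows by delta.
 */
static pgoff_t adjusted_middle_pgoff(pgoff_t middle_pgoff,
				     unsigned long middle_start,
				     unsigned long merge_end)
{
	const unsigned long delta = merge_end - middle_start;

	return middle_pgoff + (delta >> PAGE_SHIFT);
}

/*
 * __adjust_next_start case: next extends backwards to retain_end (the end
 * of the retained range), so its page offset shrinks by delta.
 */
static pgoff_t adjusted_next_pgoff(pgoff_t next_pgoff,
				   unsigned long next_start,
				   unsigned long retain_end)
{
	const unsigned long delta = next_start - retain_end;

	return next_pgoff - (delta >> PAGE_SHIFT);
}
```

For example, a middle VMA at 0x3000 with page offset 3 whose first two pages
are merged away ends up with page offset 5, while a next VMA at 0x8000 with
page offset 8 that extends back by two pages ends up with page offset 6.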
* [PATCH 12/30] mm/vma: clean up anon_vma_compatible()
From: Lorenzo Stoakes @ 2026-06-29 12:23 UTC (permalink / raw)
To: Andrew Morton
In-Reply-To: <cover.1782735110.git.ljs@kernel.org>
Break up the existing very large conditional, add comments and use
vma_[start/end]_pgoff() to make clearer what we're doing here.
No functional change intended.
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
---
mm/vma.c | 21 ++++++++++++++++-----
1 file changed, 16 insertions(+), 5 deletions(-)
diff --git a/mm/vma.c b/mm/vma.c
index b60375c6c5c3..6296acecf3b7 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -1967,14 +1967,25 @@ static int anon_vma_compatible(struct vm_area_struct *a, struct vm_area_struct *
{
vma_flags_t diff = vma_flags_diff_pair(&a->flags, &b->flags);
+ /* Ignore flags that mprotect() can change. */
vma_flags_clear_mask(&diff, VMA_ACCESS_FLAGS);
+ /* Ignore flags that do not impact merging. */
vma_flags_clear_mask(&diff, VMA_IGNORE_MERGE_FLAGS);
- return a->vm_end == b->vm_start &&
- mpol_equal(vma_policy(a), vma_policy(b)) &&
- a->vm_file == b->vm_file &&
- vma_flags_empty(&diff) &&
- b->vm_pgoff == a->vm_pgoff + ((b->vm_start - a->vm_start) >> PAGE_SHIFT);
+ /* Must be adjacent. */
+ if (a->vm_end != b->vm_start)
+ return false;
+ /* Must have matching policy. */
+ if (!mpol_equal(vma_policy(a), vma_policy(b)))
+ return false;
+ /* Must both be anon or map the same file (MAP_PRIVATE case). */
+ if (a->vm_file != b->vm_file)
+ return false;
+ /* Flags must be equivalent modulo mprotect(). */
+ if (!vma_flags_empty(&diff))
+ return false;
+ /* Page offset must align. */
+ return vma_end_pgoff(a) == vma_start_pgoff(b);
}
/*
--
2.54.0
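
To see why the final comparison in the hunk above can replace the open-coded
expression, the two forms can be compared side by side. This is a minimal
userspace sketch: struct vma_model, the 4 KiB PAGE_SHIFT and the names
pgoff_contiguous_old() and pgoff_contiguous_new() are assumptions for
illustration, and the equivalence relies on the adjacency check
(a->vm_end == b->vm_start) having already passed.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 4 KiB pages; the real value is architecture-dependent. */
#define PAGE_SHIFT 12

typedef unsigned long pgoff_t;

/* Hypothetical stand-in for the vm_area_struct fields used here. */
struct vma_model {
	unsigned long vm_start;
	unsigned long vm_end;	/* exclusive */
	pgoff_t vm_pgoff;
};

static unsigned long vma_pages(const struct vma_model *vma)
{
	return (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
}

static pgoff_t vma_start_pgoff(const struct vma_model *vma)
{
	return vma->vm_pgoff;
}

static pgoff_t vma_end_pgoff(const struct vma_model *vma)
{
	return vma_start_pgoff(vma) + vma_pages(vma);
}

/* The expression removed by the patch. */
static bool pgoff_contiguous_old(const struct vma_model *a,
				 const struct vma_model *b)
{
	return b->vm_pgoff ==
		a->vm_pgoff + ((b->vm_start - a->vm_start) >> PAGE_SHIFT);
}

/* The expression added by the patch; valid once a->vm_end == b->vm_start. */
static bool pgoff_contiguous_new(const struct vma_model *a,
				 const struct vma_model *b)
{
	return vma_end_pgoff(a) == vma_start_pgoff(b);
}
```

When the VMAs are adjacent, b->vm_start - a->vm_start is exactly the size of
a, so both forms compare b's start page offset against the end page offset
of a.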
* [PATCH 11/30] mm/vma: introduce and use vmg_pages(), vmg_[start, end]_pgoff()
From: Lorenzo Stoakes @ 2026-06-29 12:23 UTC (permalink / raw)
To: Andrew Morton
In-Reply-To: <cover.1782735110.git.ljs@kernel.org>
In the VMA logic we often need to determine the number of pages in the
specified merge range, as well as the start and end page offsets of that
range.
Introduce and use helpers for these purposes.
No functional change intended.
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
---
mm/vma.c | 11 ++++-------
mm/vma.h | 17 +++++++++++++++++
tools/testing/vma/include/dup.h | 10 ++++++++++
3 files changed, 31 insertions(+), 7 deletions(-)
diff --git a/mm/vma.c b/mm/vma.c
index 2be0dbd7bb7b..b60375c6c5c3 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -197,11 +197,9 @@ static void init_multi_vma_prep(struct vma_prepare *vp,
*/
static bool can_vma_merge_before(struct vma_merge_struct *vmg)
{
- pgoff_t pglen = PHYS_PFN(vmg->end - vmg->start);
-
if (is_mergeable_vma(vmg, /* merge_next = */ true) &&
is_mergeable_anon_vma(vmg, /* merge_next = */ true)) {
- if (vmg->next->vm_pgoff == vmg->pgoff + pglen)
+ if (vmg_end_pgoff(vmg) == vma_start_pgoff(vmg->next))
return true;
}
@@ -221,7 +219,7 @@ static bool can_vma_merge_after(struct vma_merge_struct *vmg)
{
if (is_mergeable_vma(vmg, /* merge_next = */ false) &&
is_mergeable_anon_vma(vmg, /* merge_next = */ false)) {
- if (vmg->prev->vm_pgoff + vma_pages(vmg->prev) == vmg->pgoff)
+ if (vma_end_pgoff(vmg->prev) == vmg_start_pgoff(vmg))
return true;
}
return false;
@@ -759,7 +757,7 @@ static int commit_merge(struct vma_merge_struct *vmg)
*/
vma_adjust_trans_huge(vma, vmg->start, vmg->end,
vmg->__adjust_middle_start ? vmg->middle : NULL);
- vma_set_range(vma, vmg->start, vmg->end, vmg->pgoff);
+ vma_set_range(vma, vmg->start, vmg->end, vmg_start_pgoff(vmg));
vmg_adjust_set_range(vmg);
vma_iter_store_overwrite(vmg->vmi, vmg->target);
@@ -962,8 +960,7 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
* middle next
* shrink/delete extend
*/
-
- pgoff_t pglen = PHYS_PFN(vmg->end - vmg->start);
+ const pgoff_t pglen = vmg_pages(vmg);
VM_WARN_ON_VMG(!merge_right, vmg);
/* If we are offset into a VMA, then prev must be middle. */
diff --git a/mm/vma.h b/mm/vma.h
index 8e4b61a7304c..527716c8739d 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -230,6 +230,23 @@ static inline bool vmg_nomem(struct vma_merge_struct *vmg)
return vmg->state == VMA_MERGE_ERROR_NOMEM;
}
+static inline pgoff_t vmg_start_pgoff(const struct vma_merge_struct *vmg)
+{
+ return vmg->pgoff;
+}
+
+static inline pgoff_t vmg_pages(const struct vma_merge_struct *vmg)
+{
+ const unsigned long size = vmg->end - vmg->start;
+
+ return size >> PAGE_SHIFT;
+}
+
+static inline pgoff_t vmg_end_pgoff(const struct vma_merge_struct *vmg)
+{
+ return vmg_start_pgoff(vmg) + vmg_pages(vmg);
+}
+
/* Assumes addr >= vma->vm_start. */
static inline pgoff_t vma_pgoff_offset(struct vm_area_struct *vma,
unsigned long addr)
diff --git a/tools/testing/vma/include/dup.h b/tools/testing/vma/include/dup.h
index bf26b3f48d3a..535747d7fee4 100644
--- a/tools/testing/vma/include/dup.h
+++ b/tools/testing/vma/include/dup.h
@@ -1301,6 +1301,16 @@ static inline unsigned long vma_pages(const struct vm_area_struct *vma)
return (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
}
+static inline pgoff_t vma_start_pgoff(const struct vm_area_struct *vma)
+{
+ return vma->vm_pgoff;
+}
+
+static inline pgoff_t vma_end_pgoff(const struct vm_area_struct *vma)
+{
+ return vma_start_pgoff(vma) + vma_pages(vma);
+}
+
static inline int vfs_mmap_prepare(struct file *file, struct vm_area_desc *desc)
{
return file->f_op->mmap_prepare(desc);
--
2.54.0
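
The helpers introduced above, and the way the two merge checks use them, can
be exercised in userspace. This is a minimal sketch under stated assumptions:
struct vmg_model carries only the three fields the helpers read, PAGE_SHIFT is
fixed at 12, and the merge checks take bare page offsets rather than VMAs and
omit the is_mergeable_*() tests entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 4 KiB pages; the real value is architecture-dependent. */
#define PAGE_SHIFT 12

typedef unsigned long pgoff_t;

/* Hypothetical stand-in for the vma_merge_struct fields used here. */
struct vmg_model {
	unsigned long start;
	unsigned long end;	/* exclusive */
	pgoff_t pgoff;		/* page offset corresponding to start */
};

static pgoff_t vmg_start_pgoff(const struct vmg_model *vmg)
{
	return vmg->pgoff;
}

static pgoff_t vmg_pages(const struct vmg_model *vmg)
{
	const unsigned long size = vmg->end - vmg->start;

	return size >> PAGE_SHIFT;
}

static pgoff_t vmg_end_pgoff(const struct vmg_model *vmg)
{
	return vmg_start_pgoff(vmg) + vmg_pages(vmg);
}

/* Page offset part of can_vma_merge_before(): range ends where next starts. */
static bool pgoff_merges_before(const struct vmg_model *vmg,
				pgoff_t next_start_pgoff)
{
	return vmg_end_pgoff(vmg) == next_start_pgoff;
}

/* Page offset part of can_vma_merge_after(): prev ends where range starts. */
static bool pgoff_merges_after(const struct vmg_model *vmg,
			       pgoff_t prev_end_pgoff)
{
	return prev_end_pgoff == vmg_start_pgoff(vmg);
}
```

A two-page merge range at page offset 4 therefore merges with a following VMA
only if that VMA starts at page offset 6, and with a preceding VMA only if
that VMA ends at page offset 4.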
* [PATCH 10/30] MAINTAINERS: Move mm/interval_tree.c to rmap section
From: Lorenzo Stoakes @ 2026-06-29 12:23 UTC (permalink / raw)
To: Andrew Morton
In-Reply-To: <cover.1782735110.git.ljs@kernel.org>
This file implements code for the interval trees used by the file and anon
rmap implementation, so belongs in the rmap section.
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
---
MAINTAINERS | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index 15011f5752a9..c46fee04a516 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17208,6 +17208,7 @@ R: Jann Horn <jannh@google.com>
L: linux-mm@kvack.org
S: Maintained
F: include/linux/rmap.h
+F: mm/interval_tree.c
F: mm/page_vma_mapped.c
F: mm/rmap.c
F: tools/testing/selftests/mm/rmap.c
@@ -17313,7 +17314,6 @@ T: git git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
F: include/trace/events/mmap.h
F: fs/proc/task_mmu.c
F: fs/proc/task_nommu.c
-F: mm/interval_tree.c
F: mm/mincore.c
F: mm/mlock.c
F: mm/mmap.c
--
2.54.0
* [PATCH 09/30] mm/rmap: parameterise anon_vma_interval_tree_*() by anon_vma
From: Lorenzo Stoakes @ 2026-06-29 12:23 UTC (permalink / raw)
To: Andrew Morton
In-Reply-To: <cover.1782735110.git.ljs@kernel.org>
Similar to what we did with mapping_interval_tree*(), let's declare
anon_vma_interval_tree*() in terms of anon_vma rather than rb_root_cached.
In each case the rb tree referenced is &anon_vma->rb_root, so just pass
anon_vma and the functions can figure this out themselves.
Additionally, rename 'node' to 'avc', 'index' to 'pgoff_start', and 'last'
to 'pgoff_last' to make clear what is being passed.
Finally, express page offsets in terms of pgoff_t for consistency.
No functional change intended.
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
---
include/linux/mm.h | 27 +++++++++++---------
mm/interval_tree.c | 41 ++++++++++++++++---------------
mm/ksm.c | 7 ++----
mm/memory-failure.c | 3 +--
mm/rmap.c | 11 ++++-----
mm/vma.c | 4 +--
tools/testing/vma/include/stubs.h | 4 +--
7 files changed, 48 insertions(+), 49 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 703e07ff7d12..cf2d42747064 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4081,22 +4081,25 @@ mapping_interval_tree_iter_next(struct vm_area_struct *vma,
vma; vma = mapping_interval_tree_iter_next(vma, pgoff_start, \
pgoff_last))
-void anon_vma_interval_tree_insert(struct anon_vma_chain *node,
- struct rb_root_cached *root);
-void anon_vma_interval_tree_remove(struct anon_vma_chain *node,
- struct rb_root_cached *root);
+void anon_vma_interval_tree_insert(struct anon_vma_chain *avc,
+ struct anon_vma *anon_vma);
+void anon_vma_interval_tree_remove(struct anon_vma_chain *avc,
+ struct anon_vma *anon_vma);
struct anon_vma_chain *
-anon_vma_interval_tree_iter_first(struct rb_root_cached *root,
- unsigned long start, unsigned long last);
-struct anon_vma_chain *anon_vma_interval_tree_iter_next(
- struct anon_vma_chain *node, unsigned long start, unsigned long last);
+anon_vma_interval_tree_iter_first(struct anon_vma *anon_vma,
+ pgoff_t pgoff_start, pgoff_t pgoff_last);
+struct anon_vma_chain *
+anon_vma_interval_tree_iter_next(struct anon_vma_chain *avc,
+ pgoff_t pgoff_start, pgoff_t pgoff_last);
#ifdef CONFIG_DEBUG_VM_RB
-void anon_vma_interval_tree_verify(struct anon_vma_chain *node);
+void anon_vma_interval_tree_verify(struct anon_vma_chain *avc);
#endif
-#define anon_vma_interval_tree_foreach(avc, root, start, last) \
- for (avc = anon_vma_interval_tree_iter_first(root, start, last); \
- avc; avc = anon_vma_interval_tree_iter_next(avc, start, last))
+#define anon_vma_interval_tree_foreach(avc, anon_vma, pgoff_start, pgoff_last) \
+ for (avc = anon_vma_interval_tree_iter_first(anon_vma, pgoff_start, \
+ pgoff_last); \
+ avc; avc = anon_vma_interval_tree_iter_next(avc, pgoff_start, \
+ pgoff_last))
/* mmap.c */
extern int __vm_enough_memory(const struct mm_struct *mm, long pages, int cap_sys_admin);
diff --git a/mm/interval_tree.c b/mm/interval_tree.c
index cbd3038e46a9..d90e962b28f7 100644
--- a/mm/interval_tree.c
+++ b/mm/interval_tree.c
@@ -81,54 +81,55 @@ mapping_interval_tree_iter_next(struct vm_area_struct *vma,
/* Anonymous interval tree (anon_vma->rb_root) */
-static unsigned long avc_start_pgoff(struct anon_vma_chain *avc)
+static pgoff_t avc_start_pgoff(struct anon_vma_chain *avc)
{
return vma_start_pgoff(avc->vma);
}
-static unsigned long avc_last_pgoff(struct anon_vma_chain *avc)
+static pgoff_t avc_last_pgoff(struct anon_vma_chain *avc)
{
return vma_last_pgoff(avc->vma);
}
-INTERVAL_TREE_DEFINE(struct anon_vma_chain, rb, unsigned long, rb_subtree_last,
+INTERVAL_TREE_DEFINE(struct anon_vma_chain, rb, pgoff_t, rb_subtree_last,
avc_start_pgoff, avc_last_pgoff,
static, __anon_vma_interval_tree)
-void anon_vma_interval_tree_insert(struct anon_vma_chain *node,
- struct rb_root_cached *root)
+void anon_vma_interval_tree_insert(struct anon_vma_chain *avc,
+ struct anon_vma *anon_vma)
{
#ifdef CONFIG_DEBUG_VM_RB
- node->cached_vma_start = avc_start_pgoff(node);
- node->cached_vma_last = avc_last_pgoff(node);
+ avc->cached_vma_start = avc_start_pgoff(avc);
+ avc->cached_vma_last = avc_last_pgoff(avc);
#endif
- __anon_vma_interval_tree_insert(node, root);
+ __anon_vma_interval_tree_insert(avc, &anon_vma->rb_root);
}
-void anon_vma_interval_tree_remove(struct anon_vma_chain *node,
- struct rb_root_cached *root)
+void anon_vma_interval_tree_remove(struct anon_vma_chain *avc,
+ struct anon_vma *anon_vma)
{
- __anon_vma_interval_tree_remove(node, root);
+ __anon_vma_interval_tree_remove(avc, &anon_vma->rb_root);
}
struct anon_vma_chain *
-anon_vma_interval_tree_iter_first(struct rb_root_cached *root,
- unsigned long first, unsigned long last)
+anon_vma_interval_tree_iter_first(struct anon_vma *anon_vma,
+ pgoff_t pgoff_start, pgoff_t pgoff_last)
{
- return __anon_vma_interval_tree_iter_first(root, first, last);
+ return __anon_vma_interval_tree_iter_first(&anon_vma->rb_root,
+ pgoff_start, pgoff_last);
}
struct anon_vma_chain *
-anon_vma_interval_tree_iter_next(struct anon_vma_chain *node,
- unsigned long first, unsigned long last)
+anon_vma_interval_tree_iter_next(struct anon_vma_chain *avc,
+ pgoff_t pgoff_start, pgoff_t pgoff_last)
{
- return __anon_vma_interval_tree_iter_next(node, first, last);
+ return __anon_vma_interval_tree_iter_next(avc, pgoff_start, pgoff_last);
}
#ifdef CONFIG_DEBUG_VM_RB
-void anon_vma_interval_tree_verify(struct anon_vma_chain *node)
+void anon_vma_interval_tree_verify(struct anon_vma_chain *avc)
{
- WARN_ON_ONCE(node->cached_vma_start != avc_start_pgoff(node));
- WARN_ON_ONCE(node->cached_vma_last != avc_last_pgoff(node));
+ WARN_ON_ONCE(avc->cached_vma_start != avc_start_pgoff(avc));
+ WARN_ON_ONCE(avc->cached_vma_last != avc_last_pgoff(avc));
}
#endif
diff --git a/mm/ksm.c b/mm/ksm.c
index 7d5b76478f0b..c6a6e1ef581d 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -3186,8 +3186,7 @@ void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
anon_vma_lock_read(anon_vma);
}
- anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
- 0, ULONG_MAX) {
+ anon_vma_interval_tree_foreach(vmac, anon_vma, 0, ULONG_MAX) {
cond_resched();
vma = vmac->vma;
@@ -3248,9 +3247,7 @@ void collect_procs_ksm(const struct folio *folio, const struct page *page,
task_early_kill(tsk, force_early);
if (!t)
continue;
- anon_vma_interval_tree_foreach(vmac, &av->rb_root, 0,
- ULONG_MAX)
- {
+ anon_vma_interval_tree_foreach(vmac, av, 0, ULONG_MAX) {
vma = vmac->vma;
if (vma->vm_mm == t->mm) {
addr = rmap_item->address & PAGE_MASK;
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 5b97d26ee9b6..cbdec52b6d23 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -552,8 +552,7 @@ static void collect_procs_anon(const struct folio *folio,
if (!t)
continue;
- anon_vma_interval_tree_foreach(vmac, &av->rb_root,
- pgoff, pgoff) {
+ anon_vma_interval_tree_foreach(vmac, av, pgoff, pgoff) {
vma = vmac->vma;
if (vma->vm_mm != t->mm)
continue;
diff --git a/mm/rmap.c b/mm/rmap.c
index 567e46799c64..183603813255 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -211,7 +211,7 @@ int __anon_vma_prepare(struct vm_area_struct *vma)
if (likely(!vma->anon_vma)) {
vma->anon_vma = anon_vma;
anon_vma_chain_assign(vma, avc, anon_vma);
- anon_vma_interval_tree_insert(avc, &anon_vma->rb_root);
+ anon_vma_interval_tree_insert(avc, anon_vma);
anon_vma->num_active_vmas++;
allocated = NULL;
avc = NULL;
@@ -354,7 +354,7 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src,
list_for_each_entry_reverse(avc, &dst->anon_vma_chain, same_vma) {
struct anon_vma *anon_vma = avc->anon_vma;
- anon_vma_interval_tree_insert(avc, &anon_vma->rb_root);
+ anon_vma_interval_tree_insert(avc, anon_vma);
if (operation == VMA_OP_FORK)
maybe_reuse_anon_vma(dst, anon_vma);
}
@@ -434,7 +434,7 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
anon_vma_chain_assign(vma, avc, anon_vma);
/* Now let rmap see it. */
anon_vma_lock_write(anon_vma);
- anon_vma_interval_tree_insert(avc, &anon_vma->rb_root);
+ anon_vma_interval_tree_insert(avc, anon_vma);
anon_vma->parent->num_children++;
anon_vma_unlock_write(anon_vma);
@@ -499,7 +499,7 @@ void unlink_anon_vmas(struct vm_area_struct *vma)
list_for_each_entry_safe(avc, next, &vma->anon_vma_chain, same_vma) {
struct anon_vma *anon_vma = avc->anon_vma;
- anon_vma_interval_tree_remove(avc, &anon_vma->rb_root);
+ anon_vma_interval_tree_remove(avc, anon_vma);
/*
* Leave empty anon_vmas on the list - we'll need
@@ -2986,8 +2986,7 @@ static void rmap_walk_anon(struct folio *folio,
pgoff_start = folio_pgoff(folio);
pgoff_end = pgoff_start + folio_nr_pages(folio) - 1;
- anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root,
- pgoff_start, pgoff_end) {
+ anon_vma_interval_tree_foreach(avc, anon_vma, pgoff_start, pgoff_end) {
struct vm_area_struct *vma = avc->vma;
unsigned long address = vma_address(vma, pgoff_start,
folio_nr_pages(folio));
diff --git a/mm/vma.c b/mm/vma.c
index 7dc9d087c2c7..2be0dbd7bb7b 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -272,7 +272,7 @@ anon_vma_interval_tree_pre_update_vma(struct vm_area_struct *vma)
struct anon_vma_chain *avc;
list_for_each_entry(avc, &vma->anon_vma_chain, same_vma)
- anon_vma_interval_tree_remove(avc, &avc->anon_vma->rb_root);
+ anon_vma_interval_tree_remove(avc, avc->anon_vma);
}
static void
@@ -281,7 +281,7 @@ anon_vma_interval_tree_post_update_vma(struct vm_area_struct *vma)
struct anon_vma_chain *avc;
list_for_each_entry(avc, &vma->anon_vma_chain, same_vma)
- anon_vma_interval_tree_insert(avc, &avc->anon_vma->rb_root);
+ anon_vma_interval_tree_insert(avc, avc->anon_vma);
}
/*
diff --git a/tools/testing/vma/include/stubs.h b/tools/testing/vma/include/stubs.h
index 9c151b860f36..51d03e9c23c5 100644
--- a/tools/testing/vma/include/stubs.h
+++ b/tools/testing/vma/include/stubs.h
@@ -272,12 +272,12 @@ static inline void flush_dcache_mmap_unlock(struct address_space *mapping)
}
static inline void anon_vma_interval_tree_insert(struct anon_vma_chain *avc,
- struct rb_root_cached *rb)
+ struct anon_vma *anon_vma)
{
}
static inline void anon_vma_interval_tree_remove(struct anon_vma_chain *avc,
- struct rb_root_cached *rb)
+ struct anon_vma *anon_vma)
{
}
--
2.54.0
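
The shape of the reworked interface, in which callers pass the owning object
and an inclusive [pgoff_start, pgoff_last] range, can be modelled in
userspace. This is only a sketch: a singly linked list stands in for the
interval tree, and every name here (avc_model, anon_vma_model, model_*(),
count_overlaps()) is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

typedef unsigned long pgoff_t;

/* Hypothetical stand-in for anon_vma_chain: one inclusive pgoff interval. */
struct avc_model {
	struct avc_model *next;
	pgoff_t start;
	pgoff_t last;	/* inclusive */
};

/* Hypothetical stand-in for anon_vma: owns the container of intervals. */
struct anon_vma_model {
	struct avc_model *head;
};

/* Find the first entry from avc onwards overlapping [start, last]. */
static struct avc_model *next_overlap(struct avc_model *avc,
				      pgoff_t start, pgoff_t last)
{
	for (; avc; avc = avc->next)
		if (avc->start <= last && start <= avc->last)
			return avc;
	return NULL;
}

/* Callers pass the owner; the container is looked up internally. */
static void model_insert(struct avc_model *avc, struct anon_vma_model *anon_vma)
{
	avc->next = anon_vma->head;
	anon_vma->head = avc;
}

static struct avc_model *model_iter_first(struct anon_vma_model *anon_vma,
					  pgoff_t pgoff_start, pgoff_t pgoff_last)
{
	return next_overlap(anon_vma->head, pgoff_start, pgoff_last);
}

static struct avc_model *model_iter_next(struct avc_model *avc,
					 pgoff_t pgoff_start, pgoff_t pgoff_last)
{
	return next_overlap(avc->next, pgoff_start, pgoff_last);
}

#define model_foreach(avc, anon_vma, pgoff_start, pgoff_last)		\
	for (avc = model_iter_first(anon_vma, pgoff_start, pgoff_last);	\
	     avc; avc = model_iter_next(avc, pgoff_start, pgoff_last))

/* Count the entries overlapping the inclusive range. */
static unsigned int count_overlaps(struct anon_vma_model *anon_vma,
				   pgoff_t pgoff_start, pgoff_t pgoff_last)
{
	struct avc_model *avc;
	unsigned int count = 0;

	model_foreach(avc, anon_vma, pgoff_start, pgoff_last)
		count++;
	return count;
}
```

Passing 0 and ULONG_MAX visits every entry, as the KSM callers in the patch
do, while passing the same value twice selects only the entries covering that
single page offset.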
* [PATCH 08/30] mm/rmap: rename vma_interval_tree_*() to mapping_interval_tree_*()
From: Lorenzo Stoakes @ 2026-06-29 12:23 UTC (permalink / raw)
To: Andrew Morton
In-Reply-To: <cover.1782735110.git.ljs@kernel.org>
The vma_interval_tree_*() family of functions manipulates the
address_space (generally referred to as 'mapping') reverse mapping, yet the
functions are named after a 'VMA' interval tree.
VMAs may be mapped by an anon_vma, an address_space, or both. Therefore
calling the mapping interval tree a 'VMA' interval tree is rather
confusing.
This is also inconsistent with the anon_vma_interval_tree_*() functions
which explicitly reference the rmap object to which they pertain.
Rename the vma_interval_tree_*() functions to mapping_interval_tree_*() to
correct this.
No functional change intended.
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
---
arch/arm/mm/fault-armv.c | 2 +-
arch/arm/mm/flush.c | 2 +-
arch/nios2/mm/cacheflush.c | 2 +-
arch/parisc/kernel/cache.c | 2 +-
fs/dax.c | 2 +-
fs/hugetlbfs/inode.c | 6 +++---
include/linux/mm.h | 34 +++++++++++++++----------------
kernel/events/uprobes.c | 2 +-
mm/hugetlb.c | 4 ++--
mm/interval_tree.c | 22 ++++++++++----------
mm/khugepaged.c | 4 ++--
mm/memory-failure.c | 6 +++---
mm/memory.c | 2 +-
mm/mmap.c | 2 +-
mm/nommu.c | 8 ++++----
mm/pagewalk.c | 4 ++--
mm/rmap.c | 2 +-
mm/vma.c | 12 +++++------
tools/testing/vma/include/stubs.h | 8 ++++----
19 files changed, 63 insertions(+), 63 deletions(-)
diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
index cd52cf7f8874..bd1ad4181a53 100644
--- a/arch/arm/mm/fault-armv.c
+++ b/arch/arm/mm/fault-armv.c
@@ -140,7 +140,7 @@ make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
* cache coherency.
*/
flush_dcache_mmap_lock(mapping);
- vma_interval_tree_foreach(mpnt, mapping, pgoff, pgoff) {
+ mapping_interval_tree_foreach(mpnt, mapping, pgoff, pgoff) {
/*
* If we are using split PTE locks, then we need to take the pte
* lock. Otherwise we are using shared mm->page_table_lock which
diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 8c593e9898ee..153132eaa120 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -251,7 +251,7 @@ static void __flush_dcache_aliases(struct address_space *mapping, struct folio *
pgoff_end = pgoff + folio_nr_pages(folio) - 1;
flush_dcache_mmap_lock(mapping);
- vma_interval_tree_foreach(vma, mapping, pgoff, pgoff_end) {
+ mapping_interval_tree_foreach(vma, mapping, pgoff, pgoff_end) {
unsigned long start, offset, pfn;
unsigned int nr;
diff --git a/arch/nios2/mm/cacheflush.c b/arch/nios2/mm/cacheflush.c
index 42e3bf892316..f73406365e8b 100644
--- a/arch/nios2/mm/cacheflush.c
+++ b/arch/nios2/mm/cacheflush.c
@@ -82,7 +82,7 @@ static void flush_aliases(struct address_space *mapping, struct folio *folio)
pgoff = folio->index;
flush_dcache_mmap_lock_irqsave(mapping, flags);
- vma_interval_tree_foreach(vma, mapping, pgoff, pgoff + nr - 1) {
+ mapping_interval_tree_foreach(vma, mapping, pgoff, pgoff + nr - 1) {
unsigned long start;
if (vma->vm_mm != mm)
diff --git a/arch/parisc/kernel/cache.c b/arch/parisc/kernel/cache.c
index f28aa7884cbf..3c25adc2379e 100644
--- a/arch/parisc/kernel/cache.c
+++ b/arch/parisc/kernel/cache.c
@@ -503,7 +503,7 @@ void flush_dcache_folio(struct folio *folio)
* on machines that support equivalent aliasing
*/
flush_dcache_mmap_lock_irqsave(mapping, flags);
- vma_interval_tree_foreach(vma, mapping, pgoff, pgoff + nr - 1) {
+ mapping_interval_tree_foreach(vma, mapping, pgoff, pgoff + nr - 1) {
unsigned long offset = pgoff - vma->vm_pgoff;
unsigned long pfn = folio_pfn(folio);
diff --git a/fs/dax.c b/fs/dax.c
index 2f0818a68a7f..91943fb43c92 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1201,7 +1201,7 @@ static int dax_writeback_one(struct xa_state *xas, struct dax_device *dax_dev,
/* Walk all mappings of a given index of a file and writeprotect them */
i_mmap_lock_read(mapping);
- vma_interval_tree_foreach(vma, mapping, index, end) {
+ mapping_interval_tree_foreach(vma, mapping, index, end) {
pfn_mkclean_range(pfn, count, index, vma);
cond_resched();
}
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 4ea1798f1ffb..894d02e73302 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -393,7 +393,7 @@ static void hugetlb_unmap_file_folio(struct hstate *h,
i_mmap_lock_write(mapping);
retry:
vma_lock = NULL;
- vma_interval_tree_foreach(vma, mapping, start, end - 1) {
+ mapping_interval_tree_foreach(vma, mapping, start, end - 1) {
v_start = vma_offset_start(vma, start);
v_end = vma_offset_end(vma, end);
@@ -469,8 +469,8 @@ hugetlb_vmdelete_list(struct address_space *mapping, pgoff_t start,
* unmapped. Note, end is exclusive, whereas the interval tree takes
* an inclusive "last".
*/
- vma_interval_tree_foreach(vma, mapping, start,
- end ? end - 1 : ULONG_MAX) {
+ mapping_interval_tree_foreach(vma, mapping, start,
+ end ? end - 1 : ULONG_MAX) {
unsigned long v_start;
unsigned long v_end;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index bdba25491b0e..703e07ff7d12 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4061,25 +4061,25 @@ extern atomic_long_t mmap_pages_allocated;
extern int nommu_shrink_inode_mappings(struct inode *, size_t, size_t);
/* interval_tree.c */
-void vma_interval_tree_insert(struct vm_area_struct *vma,
- struct address_space *mapping);
-void vma_interval_tree_insert_after(struct vm_area_struct *vma,
- struct vm_area_struct *prev,
- struct address_space *mapping);
-void vma_interval_tree_remove(struct vm_area_struct *vma,
- struct address_space *mapping);
+void mapping_interval_tree_insert(struct vm_area_struct *vma,
+ struct address_space *mapping);
+void mapping_interval_tree_insert_after(struct vm_area_struct *vma,
+ struct vm_area_struct *prev,
+ struct address_space *mapping);
+void mapping_interval_tree_remove(struct vm_area_struct *vma,
+ struct address_space *mapping);
struct vm_area_struct *
-vma_interval_tree_iter_first(struct address_space *mapping,
- pgoff_t pgoff_start, pgoff_t pgoff_last);
+mapping_interval_tree_iter_first(struct address_space *mapping,
+ pgoff_t pgoff_start, pgoff_t pgoff_last);
struct vm_area_struct *
-vma_interval_tree_iter_next(struct vm_area_struct *vma,
- pgoff_t pgoff_start, pgoff_t pgoff_last);
-
-#define vma_interval_tree_foreach(vma, mapping, pgoff_start, pgoff_last) \
- for (vma = vma_interval_tree_iter_first(mapping, pgoff_start, \
- pgoff_last); \
- vma; vma = vma_interval_tree_iter_next(vma, pgoff_start, \
- pgoff_last))
+mapping_interval_tree_iter_next(struct vm_area_struct *vma,
+ pgoff_t pgoff_start, pgoff_t pgoff_last);
+
+#define mapping_interval_tree_foreach(vma, mapping, pgoff_start, pgoff_last) \
+ for (vma = mapping_interval_tree_iter_first(mapping, pgoff_start, \
+ pgoff_last); \
+ vma; vma = mapping_interval_tree_iter_next(vma, pgoff_start, \
+ pgoff_last))
void anon_vma_interval_tree_insert(struct anon_vma_chain *node,
struct rb_root_cached *root);
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 50a96a4d812d..f23cebacbc6d 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1210,7 +1210,7 @@ build_map_info(struct address_space *mapping, loff_t offset, bool is_register)
again:
i_mmap_lock_read(mapping);
- vma_interval_tree_foreach(vma, mapping, pgoff, pgoff) {
+ mapping_interval_tree_foreach(vma, mapping, pgoff, pgoff) {
if (!valid_vma(vma, is_register))
continue;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1e1fbf348c51..f45000149a78 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5382,7 +5382,7 @@ static void unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
* __unmap_hugepage_range() is called as the lock is already held
*/
i_mmap_lock_write(mapping);
- vma_interval_tree_foreach(iter_vma, mapping, pgoff, pgoff) {
+ mapping_interval_tree_foreach(iter_vma, mapping, pgoff, pgoff) {
/* Do not unmap the current VMA */
if (iter_vma == vma)
continue;
@@ -6864,7 +6864,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
pte_t *pte;
i_mmap_lock_read(mapping);
- vma_interval_tree_foreach(svma, mapping, idx, idx) {
+ mapping_interval_tree_foreach(svma, mapping, idx, idx) {
if (svma == vma)
continue;
diff --git a/mm/interval_tree.c b/mm/interval_tree.c
index b387d39e0547..cbd3038e46a9 100644
--- a/mm/interval_tree.c
+++ b/mm/interval_tree.c
@@ -18,16 +18,16 @@ INTERVAL_TREE_DEFINE(struct vm_area_struct, shared.rb,
vma_start_pgoff, vma_last_pgoff, static,
__vma_interval_tree)
-void vma_interval_tree_insert(struct vm_area_struct *vma,
- struct address_space *mapping)
+void mapping_interval_tree_insert(struct vm_area_struct *vma,
+ struct address_space *mapping)
{
__vma_interval_tree_insert(vma, &mapping->i_mmap);
}
/* Insert vma immediately after prev in the interval tree */
-void vma_interval_tree_insert_after(struct vm_area_struct *vma,
- struct vm_area_struct *prev,
- struct address_space *mapping)
+void mapping_interval_tree_insert_after(struct vm_area_struct *vma,
+ struct vm_area_struct *prev,
+ struct address_space *mapping)
{
struct rb_node **link;
struct vm_area_struct *parent;
@@ -58,23 +58,23 @@ void vma_interval_tree_insert_after(struct vm_area_struct *vma,
&__vma_interval_tree_augment);
}
-void vma_interval_tree_remove(struct vm_area_struct *vma,
- struct address_space *mapping)
+void mapping_interval_tree_remove(struct vm_area_struct *vma,
+ struct address_space *mapping)
{
__vma_interval_tree_remove(vma, &mapping->i_mmap);
}
struct vm_area_struct *
-vma_interval_tree_iter_first(struct address_space *mapping,
- pgoff_t pgoff_start, pgoff_t pgoff_last)
+mapping_interval_tree_iter_first(struct address_space *mapping,
+ pgoff_t pgoff_start, pgoff_t pgoff_last)
{
return __vma_interval_tree_iter_first(&mapping->i_mmap,
pgoff_start, pgoff_last);
}
struct vm_area_struct *
-vma_interval_tree_iter_next(struct vm_area_struct *vma,
- pgoff_t pgoff_start, pgoff_t pgoff_last)
+mapping_interval_tree_iter_next(struct vm_area_struct *vma,
+ pgoff_t pgoff_start, pgoff_t pgoff_last)
{
return __vma_interval_tree_iter_next(vma, pgoff_start, pgoff_last);
}
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 9dcf38dc0f8c..bd5f86cf4bd8 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2136,7 +2136,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
struct vm_area_struct *vma;
i_mmap_lock_read(mapping);
- vma_interval_tree_foreach(vma, mapping, pgoff, pgoff) {
+ mapping_interval_tree_foreach(vma, mapping, pgoff, pgoff) {
struct mmu_notifier_range range;
struct mm_struct *mm;
unsigned long addr;
@@ -2568,7 +2568,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
* not be able to observe any missing pages due to the
* previously inserted retry entries.
*/
- vma_interval_tree_foreach(vma, mapping, start, end) {
+ mapping_interval_tree_foreach(vma, mapping, start, end) {
if (userfaultfd_missing(vma)) {
result = SCAN_EXCEED_NONE_PTE;
goto immap_locked;
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 3c842b472a75..5b97d26ee9b6 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -586,7 +586,7 @@ static void collect_procs_file(const struct folio *folio,
if (!t)
continue;
- vma_interval_tree_foreach(vma, mapping, pgoff, pgoff) {
+ mapping_interval_tree_foreach(vma, mapping, pgoff, pgoff) {
/*
* Send early kill signal to tasks where a vma covers
* the page but the corrupted page is not necessarily
@@ -637,7 +637,7 @@ static void collect_procs_fsdax(const struct page *page,
t = task_early_kill(tsk, true);
if (!t)
continue;
- vma_interval_tree_foreach(vma, mapping, pgoff, pgoff) {
+ mapping_interval_tree_foreach(vma, mapping, pgoff, pgoff) {
if (vma->vm_mm == t->mm)
add_to_kill_fsdax(t, page, vma, to_kill, pgoff);
}
@@ -2238,7 +2238,7 @@ static void collect_procs_pfn(struct pfn_address_space *pfn_space,
t = task_early_kill(tsk, true);
if (!t)
continue;
- vma_interval_tree_foreach(vma, mapping, 0, ULONG_MAX) {
+ mapping_interval_tree_foreach(vma, mapping, 0, ULONG_MAX) {
pgoff_t pgoff;
if (vma->vm_mm == t->mm &&
diff --git a/mm/memory.c b/mm/memory.c
index 1cf59041600c..98c1a245f45a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4345,7 +4345,7 @@ static inline void unmap_mapping_range_tree(struct address_space *mapping,
unsigned long start, size;
struct mmu_gather tlb;
- vma_interval_tree_foreach(vma, mapping, first_index, last_index) {
+ mapping_interval_tree_foreach(vma, mapping, first_index, last_index) {
const pgoff_t start_idx = max(first_index, vma->vm_pgoff);
const pgoff_t end_idx = min(last_index, vma_last_pgoff(vma)) + 1;
diff --git a/mm/mmap.c b/mm/mmap.c
index 2f22fb0d068d..2d09a57e3620 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1830,7 +1830,7 @@ __latent_entropy int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
mapping_allow_writable(mapping);
flush_dcache_mmap_lock(mapping);
/* insert tmp into the share list, just after mpnt */
- vma_interval_tree_insert_after(tmp, mpnt, mapping);
+ mapping_interval_tree_insert_after(tmp, mpnt, mapping);
flush_dcache_mmap_unlock(mapping);
i_mmap_unlock_write(mapping);
}
diff --git a/mm/nommu.c b/mm/nommu.c
index 9a01b01ba8ed..6d168f69763f 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -569,7 +569,7 @@ static void setup_vma_to_mm(struct vm_area_struct *vma, struct mm_struct *mm)
i_mmap_lock_write(mapping);
flush_dcache_mmap_lock(mapping);
- vma_interval_tree_insert(vma, mapping);
+ mapping_interval_tree_insert(vma, mapping);
flush_dcache_mmap_unlock(mapping);
i_mmap_unlock_write(mapping);
}
@@ -585,7 +585,7 @@ static void cleanup_vma_from_mm(struct vm_area_struct *vma)
i_mmap_lock_write(mapping);
flush_dcache_mmap_lock(mapping);
- vma_interval_tree_remove(vma, mapping);
+ mapping_interval_tree_remove(vma, mapping);
flush_dcache_mmap_unlock(mapping);
i_mmap_unlock_write(mapping);
}
@@ -1816,7 +1816,7 @@ int nommu_shrink_inode_mappings(struct inode *inode, size_t size,
i_mmap_lock_read(inode->i_mapping);
/* search for VMAs that fall within the dead zone */
- vma_interval_tree_foreach(vma, inode->i_mapping, low, high) {
+ mapping_interval_tree_foreach(vma, inode->i_mapping, low, high) {
/* found one - only interested if it's shared out of the page
* cache */
if (vma->vm_flags & VM_SHARED) {
@@ -1832,7 +1832,7 @@ int nommu_shrink_inode_mappings(struct inode *inode, size_t size,
* we don't check for any regions that start beyond the EOF as there
* shouldn't be any
*/
- vma_interval_tree_foreach(vma, inode->i_mapping, 0, ULONG_MAX) {
+ mapping_interval_tree_foreach(vma, inode->i_mapping, 0, ULONG_MAX) {
if (!(vma->vm_flags & VM_SHARED))
continue;
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 490a14691660..98d090ede077 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -810,8 +810,8 @@ int walk_page_mapping(struct address_space *mapping, pgoff_t first_index,
return -EINVAL;
lockdep_assert_held(&mapping->i_mmap_rwsem);
- vma_interval_tree_foreach(vma, mapping, first_index,
- first_index + nr - 1) {
+ mapping_interval_tree_foreach(vma, mapping, first_index,
+ first_index + nr - 1) {
/* Clip to the vma */
vba = vma->vm_pgoff;
vea = vba + vma_pages(vma);
diff --git a/mm/rmap.c b/mm/rmap.c
index 13ffa71bd20d..567e46799c64 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -3051,7 +3051,7 @@ static void __rmap_walk_file(struct folio *folio, struct address_space *mapping,
i_mmap_lock_read(mapping);
}
lookup:
- vma_interval_tree_foreach(vma, mapping, pgoff_start, pgoff_end) {
+ mapping_interval_tree_foreach(vma, mapping, pgoff_start, pgoff_end) {
unsigned long address = vma_address(vma, pgoff_start, nr_pages);
VM_BUG_ON_VMA(address == -EFAULT, vma);
diff --git a/mm/vma.c b/mm/vma.c
index ce4ec4b71138..7dc9d087c2c7 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -234,7 +234,7 @@ static void __vma_link_file(struct vm_area_struct *vma,
mapping_allow_writable(mapping);
flush_dcache_mmap_lock(mapping);
- vma_interval_tree_insert(vma, mapping);
+ mapping_interval_tree_insert(vma, mapping);
flush_dcache_mmap_unlock(mapping);
}
@@ -248,7 +248,7 @@ static void __remove_shared_vm_struct(struct vm_area_struct *vma,
mapping_unmap_writable(mapping);
flush_dcache_mmap_lock(mapping);
- vma_interval_tree_remove(vma, mapping);
+ mapping_interval_tree_remove(vma, mapping);
flush_dcache_mmap_unlock(mapping);
}
@@ -319,9 +319,9 @@ static void vma_prepare(struct vma_prepare *vp)
if (vp->file) {
flush_dcache_mmap_lock(vp->mapping);
- vma_interval_tree_remove(vp->vma, vp->mapping);
+ mapping_interval_tree_remove(vp->vma, vp->mapping);
if (vp->adj_next)
- vma_interval_tree_remove(vp->adj_next, vp->mapping);
+ mapping_interval_tree_remove(vp->adj_next, vp->mapping);
}
}
@@ -339,8 +339,8 @@ static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
{
if (vp->file) {
if (vp->adj_next)
- vma_interval_tree_insert(vp->adj_next, vp->mapping);
- vma_interval_tree_insert(vp->vma, vp->mapping);
+ mapping_interval_tree_insert(vp->adj_next, vp->mapping);
+ mapping_interval_tree_insert(vp->vma, vp->mapping);
flush_dcache_mmap_unlock(vp->mapping);
}
diff --git a/tools/testing/vma/include/stubs.h b/tools/testing/vma/include/stubs.h
index 94442b29458d..9c151b860f36 100644
--- a/tools/testing/vma/include/stubs.h
+++ b/tools/testing/vma/include/stubs.h
@@ -257,13 +257,13 @@ static inline void vm_acct_memory(long pages)
{
}
-static inline void vma_interval_tree_insert(struct vm_area_struct *vma,
- struct address_space *mapping)
+static inline void mapping_interval_tree_insert(struct vm_area_struct *vma,
+ struct address_space *mapping)
{
}
-static inline void vma_interval_tree_remove(struct vm_area_struct *vma,
- struct address_space *mapping)
+static inline void mapping_interval_tree_remove(struct vm_area_struct *vma,
+ struct address_space *mapping)
{
}
--
2.54.0
* [PATCH 07/30] mm/rmap: elide unnecessary static inline's in interval_tree.c
From: Lorenzo Stoakes @ 2026-06-29 12:23 UTC (permalink / raw)
To: Andrew Morton
Cc: Russell King, Dinh Nguyen, Simon Schuster,
James E . J . Bottomley, Helge Deller, Jarkko Sakkinen,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
Ian Abbott, H Hartley Sweeten, Lucas Stach, David Airlie,
Simona Vetter, Patrik Jakobsson, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Rob Clark, Dmitry Baryshkov, Tomi Valkeinen,
Thierry Reding, Mikko Perttunen, Jonathan Hunter,
Christian Koenig, Huang Rui, Ankit Agrawal, Alex Williamson,
Alexander Viro, Christian Brauner, Dan Williams, Muchun Song,
Oscar Salvador, David Hildenbrand, Suren Baghdasaryan,
Liam R . Howlett, Matthew Wilcox, Marek Szyprowski,
Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim,
Masami Hiramatsu, Oleg Nesterov, Steven Rostedt, SeongJae Park,
Miaohe Lin, Hugh Dickins, Mike Rapoport, Kees Cook, Paolo Bonzini,
linux-kernel, linux-arm-kernel, linux-parisc, linux-sgx, etnaviv,
dri-devel, linux-arm-msm, freedreno, linux-tegra, kvm,
linux-fsdevel, nvdimm, linux-mm, iommu, linux-perf-users,
linux-trace-kernel, kasan-dev, damon, Pedro Falcato, Rik van Riel,
Harry Yoo, Jann Horn
In-Reply-To: <cover.1782735110.git.ljs@kernel.org>
It's not necessary to declare these functions static inline as they are
contained within a single compilation unit.

This makes the anonymous interval tree code consistent with the newly
updated file-backed interval tree code.

No functional change intended.

Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
---
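As a minimal userspace sketch of the point made in the commit message,
assuming simplified stand-in types (vma_sketch and avc_sketch are
hypothetical names, not the kernel's structures): helpers that are only
used within one compilation unit can be plain static, and the compiler
remains free to inline them where it sees fit.

```c
/* Illustrative stand-ins only: these are not the kernel's definitions. */
struct vma_sketch {
	unsigned long vm_pgoff;	/* page offset of the start of the range */
	unsigned long nr_pages;	/* length of the range in pages */
};

struct avc_sketch {
	struct vma_sketch *vma;
};

/*
 * Plain 'static' gives internal linkage. No 'inline' keyword is needed
 * for the compiler to inline these small helpers into their callers.
 */
static unsigned long avc_sketch_start_pgoff(struct avc_sketch *avc)
{
	return avc->vma->vm_pgoff;
}

static unsigned long avc_sketch_last_pgoff(struct avc_sketch *avc)
{
	/* The last page offset is inclusive, hence the -1. */
	return avc->vma->vm_pgoff + avc->vma->nr_pages - 1;
}
```

For a range starting at page offset 10 and spanning 4 pages, the two
helpers return 10 and 13 respectively.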
mm/interval_tree.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/mm/interval_tree.c b/mm/interval_tree.c
index ff36fd14ef37..b387d39e0547 100644
--- a/mm/interval_tree.c
+++ b/mm/interval_tree.c
@@ -81,19 +81,19 @@ vma_interval_tree_iter_next(struct vm_area_struct *vma,
/* Anonymous interval tree (anon_vma->rb_root) */
-static inline unsigned long avc_start_pgoff(struct anon_vma_chain *avc)
+static unsigned long avc_start_pgoff(struct anon_vma_chain *avc)
{
return vma_start_pgoff(avc->vma);
}
-static inline unsigned long avc_last_pgoff(struct anon_vma_chain *avc)
+static unsigned long avc_last_pgoff(struct anon_vma_chain *avc)
{
return vma_last_pgoff(avc->vma);
}
INTERVAL_TREE_DEFINE(struct anon_vma_chain, rb, unsigned long, rb_subtree_last,
avc_start_pgoff, avc_last_pgoff,
- static inline, __anon_vma_interval_tree)
+ static, __anon_vma_interval_tree)
void anon_vma_interval_tree_insert(struct anon_vma_chain *node,
struct rb_root_cached *root)
--
2.54.0
* [PATCH 06/30] mm/rmap: parameterise vma_interval_tree_*() by address_space
From: Lorenzo Stoakes @ 2026-06-29 12:23 UTC (permalink / raw)
To: Andrew Morton
Cc: Russell King, Dinh Nguyen, Simon Schuster,
James E . J . Bottomley, Helge Deller, Jarkko Sakkinen,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
Ian Abbott, H Hartley Sweeten, Lucas Stach, David Airlie,
Simona Vetter, Patrik Jakobsson, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Rob Clark, Dmitry Baryshkov, Tomi Valkeinen,
Thierry Reding, Mikko Perttunen, Jonathan Hunter,
Christian Koenig, Huang Rui, Ankit Agrawal, Alex Williamson,
Alexander Viro, Christian Brauner, Dan Williams, Muchun Song,
Oscar Salvador, David Hildenbrand, Suren Baghdasaryan,
Liam R . Howlett, Matthew Wilcox, Marek Szyprowski,
Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim,
Masami Hiramatsu, Oleg Nesterov, Steven Rostedt, SeongJae Park,
Miaohe Lin, Hugh Dickins, Mike Rapoport, Kees Cook, Paolo Bonzini,
linux-kernel, linux-arm-kernel, linux-parisc, linux-sgx, etnaviv,
dri-devel, linux-arm-msm, freedreno, linux-tegra, kvm,
linux-fsdevel, nvdimm, linux-mm, iommu, linux-perf-users,
linux-trace-kernel, kasan-dev, damon, Pedro Falcato, Rik van Riel,
Harry Yoo, Jann Horn
In-Reply-To: <cover.1782735110.git.ljs@kernel.org>
The file-backed mapping interval tree functions vma_interval_tree_*()
accept a raw rb_root_cached pointer to determine the tree on which they
operate.

However, in each case, this root is always the i_mmap tree of an
address_space.

So simply pass a pointer to the address_space instead. This simplifies the
code and more clearly differentiates these operations from those
concerning anonymous mappings.

While we're here, make the generated interval tree functions static as they
do not need to be used externally (any previously existing external users
have now been removed).

We also rename VMA parameters from 'node' to 'vma', as calling a VMA a node
is simply confusing, change the input index types to pgoff_t since they
are page offsets, and rename the index parameters to pgoff_start and
pgoff_last.

No functional change intended.

Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
---
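The wrapper pattern described in the commit message can be sketched in
userspace as follows. This is only an illustration under stated
assumptions: every *_sketch name is hypothetical, and a singly linked list
stands in for the kernel's augmented rbtree. What it shows is the shape of
the change: the root-keyed helpers become static, and the public entry
points take the mapping and derive the root themselves.

```c
#include <stddef.h>

typedef unsigned long pgoff_t;

/* Hypothetical stand-ins; a list replaces the kernel's interval tree. */
struct vma_sketch {
	pgoff_t start_pgoff;		/* first page offset covered */
	pgoff_t last_pgoff;		/* last page offset covered, inclusive */
	struct vma_sketch *next;
};

struct root_sketch {
	struct vma_sketch *head;
};

struct mapping_sketch {
	struct root_sketch i_mmap;
};

/* Stand-ins for the generated helpers: static, keyed by the raw root. */
static struct vma_sketch *gen_sketch_match(struct vma_sketch *vma,
					   pgoff_t start, pgoff_t last)
{
	/* Return the first entry overlapping the inclusive [start, last]. */
	for (; vma; vma = vma->next)
		if (vma->start_pgoff <= last && start <= vma->last_pgoff)
			return vma;
	return NULL;
}

static struct vma_sketch *gen_sketch_iter_first(struct root_sketch *root,
						pgoff_t start, pgoff_t last)
{
	return gen_sketch_match(root->head, start, last);
}

static struct vma_sketch *gen_sketch_iter_next(struct vma_sketch *vma,
					       pgoff_t start, pgoff_t last)
{
	return gen_sketch_match(vma->next, start, last);
}

/* Public wrappers: callers pass the mapping, never &mapping->i_mmap. */
struct vma_sketch *sketch_iter_first(struct mapping_sketch *mapping,
				     pgoff_t pgoff_start, pgoff_t pgoff_last)
{
	return gen_sketch_iter_first(&mapping->i_mmap, pgoff_start,
				     pgoff_last);
}

struct vma_sketch *sketch_iter_next(struct vma_sketch *vma,
				    pgoff_t pgoff_start, pgoff_t pgoff_last)
{
	return gen_sketch_iter_next(vma, pgoff_start, pgoff_last);
}

#define sketch_foreach(vma, mapping, pgoff_start, pgoff_last)		\
	for (vma = sketch_iter_first(mapping, pgoff_start, pgoff_last);	\
	     vma; vma = sketch_iter_next(vma, pgoff_start, pgoff_last))

/* Count the entries overlapping the inclusive [pgoff_start, pgoff_last]. */
int sketch_count(struct mapping_sketch *mapping,
		 pgoff_t pgoff_start, pgoff_t pgoff_last)
{
	struct vma_sketch *vma;
	int nr = 0;

	sketch_foreach(vma, mapping, pgoff_start, pgoff_last)
		nr++;
	return nr;
}
```

Note that, as with the real iterators, the upper bound is an inclusive
"last" rather than an exclusive "end", so a caller holding an exclusive
end must subtract one before passing it in.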
arch/arm/mm/fault-armv.c | 2 +-
arch/arm/mm/flush.c | 2 +-
arch/nios2/mm/cacheflush.c | 2 +-
arch/parisc/kernel/cache.c | 2 +-
fs/dax.c | 2 +-
fs/hugetlbfs/inode.c | 15 ++++----
include/linux/mm.h | 34 +++++++++---------
kernel/events/uprobes.c | 2 +-
mm/hugetlb.c | 4 +--
mm/interval_tree.c | 58 +++++++++++++++++++++++--------
mm/khugepaged.c | 4 +--
mm/memory-failure.c | 7 ++--
mm/memory.c | 8 ++---
mm/mmap.c | 3 +-
mm/nommu.c | 8 ++---
mm/pagewalk.c | 2 +-
mm/rmap.c | 3 +-
mm/vma.c | 14 ++++----
tools/testing/vma/include/stubs.h | 4 +--
19 files changed, 100 insertions(+), 76 deletions(-)
diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
index 91e488767783..cd52cf7f8874 100644
--- a/arch/arm/mm/fault-armv.c
+++ b/arch/arm/mm/fault-armv.c
@@ -140,7 +140,7 @@ make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
* cache coherency.
*/
flush_dcache_mmap_lock(mapping);
- vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
+ vma_interval_tree_foreach(mpnt, mapping, pgoff, pgoff) {
/*
* If we are using split PTE locks, then we need to take the pte
* lock. Otherwise we are using shared mm->page_table_lock which
diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 4d7ef5cc36b6..8c593e9898ee 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -251,7 +251,7 @@ static void __flush_dcache_aliases(struct address_space *mapping, struct folio *
pgoff_end = pgoff + folio_nr_pages(folio) - 1;
flush_dcache_mmap_lock(mapping);
- vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff_end) {
+ vma_interval_tree_foreach(vma, mapping, pgoff, pgoff_end) {
unsigned long start, offset, pfn;
unsigned int nr;
diff --git a/arch/nios2/mm/cacheflush.c b/arch/nios2/mm/cacheflush.c
index 8321182eb927..42e3bf892316 100644
--- a/arch/nios2/mm/cacheflush.c
+++ b/arch/nios2/mm/cacheflush.c
@@ -82,7 +82,7 @@ static void flush_aliases(struct address_space *mapping, struct folio *folio)
pgoff = folio->index;
flush_dcache_mmap_lock_irqsave(mapping, flags);
- vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff + nr - 1) {
+ vma_interval_tree_foreach(vma, mapping, pgoff, pgoff + nr - 1) {
unsigned long start;
if (vma->vm_mm != mm)
diff --git a/arch/parisc/kernel/cache.c b/arch/parisc/kernel/cache.c
index 0170b69a21d3..f28aa7884cbf 100644
--- a/arch/parisc/kernel/cache.c
+++ b/arch/parisc/kernel/cache.c
@@ -503,7 +503,7 @@ void flush_dcache_folio(struct folio *folio)
* on machines that support equivalent aliasing
*/
flush_dcache_mmap_lock_irqsave(mapping, flags);
- vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff + nr - 1) {
+ vma_interval_tree_foreach(vma, mapping, pgoff, pgoff + nr - 1) {
unsigned long offset = pgoff - vma->vm_pgoff;
unsigned long pfn = folio_pfn(folio);
diff --git a/fs/dax.c b/fs/dax.c
index 6d175cd47a99..2f0818a68a7f 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1201,7 +1201,7 @@ static int dax_writeback_one(struct xa_state *xas, struct dax_device *dax_dev,
/* Walk all mappings of a given index of a file and writeprotect them */
i_mmap_lock_read(mapping);
- vma_interval_tree_foreach(vma, &mapping->i_mmap, index, end) {
+ vma_interval_tree_foreach(vma, mapping, index, end) {
pfn_mkclean_range(pfn, count, index, vma);
cond_resched();
}
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 216e1a0dd0b2..4ea1798f1ffb 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -380,7 +380,6 @@ static void hugetlb_unmap_file_folio(struct hstate *h,
struct address_space *mapping,
struct folio *folio, pgoff_t index)
{
- struct rb_root_cached *root = &mapping->i_mmap;
struct hugetlb_vma_lock *vma_lock;
unsigned long pfn = folio_pfn(folio);
struct vm_area_struct *vma;
@@ -394,7 +393,7 @@ static void hugetlb_unmap_file_folio(struct hstate *h,
i_mmap_lock_write(mapping);
retry:
vma_lock = NULL;
- vma_interval_tree_foreach(vma, root, start, end - 1) {
+ vma_interval_tree_foreach(vma, mapping, start, end - 1) {
v_start = vma_offset_start(vma, start);
v_end = vma_offset_end(vma, end);
@@ -460,8 +459,8 @@ static void hugetlb_unmap_file_folio(struct hstate *h,
}
static void
-hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
- zap_flags_t zap_flags)
+hugetlb_vmdelete_list(struct address_space *mapping, pgoff_t start,
+ pgoff_t end, zap_flags_t zap_flags)
{
struct vm_area_struct *vma;
@@ -470,7 +469,8 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
* unmapped. Note, end is exclusive, whereas the interval tree takes
* an inclusive "last".
*/
- vma_interval_tree_foreach(vma, root, start, end ? end - 1 : ULONG_MAX) {
+ vma_interval_tree_foreach(vma, mapping, start,
+ end ? end - 1 : ULONG_MAX) {
unsigned long v_start;
unsigned long v_end;
@@ -615,8 +615,7 @@ static void hugetlb_vmtruncate(struct inode *inode, loff_t offset)
i_size_write(inode, offset);
i_mmap_lock_write(mapping);
if (mapping_mapped(mapping))
- hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0,
- ZAP_FLAG_DROP_MARKER);
+ hugetlb_vmdelete_list(mapping, pgoff, 0, ZAP_FLAG_DROP_MARKER);
i_mmap_unlock_write(mapping);
remove_inode_hugepages(inode, offset, LLONG_MAX);
}
@@ -676,7 +675,7 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
/* Unmap users of full pages in the hole. */
if (hole_end > hole_start) {
if (mapping_mapped(mapping))
- hugetlb_vmdelete_list(&mapping->i_mmap,
+ hugetlb_vmdelete_list(mapping,
hole_start >> PAGE_SHIFT,
hole_end >> PAGE_SHIFT, 0);
}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index e7ee315d5ba2..bdba25491b0e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4061,23 +4061,25 @@ extern atomic_long_t mmap_pages_allocated;
extern int nommu_shrink_inode_mappings(struct inode *, size_t, size_t);
/* interval_tree.c */
-void vma_interval_tree_insert(struct vm_area_struct *node,
- struct rb_root_cached *root);
-void vma_interval_tree_insert_after(struct vm_area_struct *node,
+void vma_interval_tree_insert(struct vm_area_struct *vma,
+ struct address_space *mapping);
+void vma_interval_tree_insert_after(struct vm_area_struct *vma,
struct vm_area_struct *prev,
- struct rb_root_cached *root);
-void vma_interval_tree_remove(struct vm_area_struct *node,
- struct rb_root_cached *root);
-struct vm_area_struct *vma_interval_tree_subtree_search(struct vm_area_struct *node,
- unsigned long start, unsigned long last);
-struct vm_area_struct *vma_interval_tree_iter_first(struct rb_root_cached *root,
- unsigned long start, unsigned long last);
-struct vm_area_struct *vma_interval_tree_iter_next(struct vm_area_struct *node,
- unsigned long start, unsigned long last);
-
-#define vma_interval_tree_foreach(vma, root, start, last) \
- for (vma = vma_interval_tree_iter_first(root, start, last); \
- vma; vma = vma_interval_tree_iter_next(vma, start, last))
+ struct address_space *mapping);
+void vma_interval_tree_remove(struct vm_area_struct *vma,
+ struct address_space *mapping);
+struct vm_area_struct *
+vma_interval_tree_iter_first(struct address_space *mapping,
+ pgoff_t pgoff_start, pgoff_t pgoff_last);
+struct vm_area_struct *
+vma_interval_tree_iter_next(struct vm_area_struct *vma,
+ pgoff_t pgoff_start, pgoff_t pgoff_last);
+
+#define vma_interval_tree_foreach(vma, mapping, pgoff_start, pgoff_last) \
+ for (vma = vma_interval_tree_iter_first(mapping, pgoff_start, \
+ pgoff_last); \
+ vma; vma = vma_interval_tree_iter_next(vma, pgoff_start, \
+ pgoff_last))
void anon_vma_interval_tree_insert(struct anon_vma_chain *node,
struct rb_root_cached *root);
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 4084e926e284..50a96a4d812d 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1210,7 +1210,7 @@ build_map_info(struct address_space *mapping, loff_t offset, bool is_register)
again:
i_mmap_lock_read(mapping);
- vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
+ vma_interval_tree_foreach(vma, mapping, pgoff, pgoff) {
if (!valid_vma(vma, is_register))
continue;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 571212b80835..1e1fbf348c51 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5382,7 +5382,7 @@ static void unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
* __unmap_hugepage_range() is called as the lock is already held
*/
i_mmap_lock_write(mapping);
- vma_interval_tree_foreach(iter_vma, &mapping->i_mmap, pgoff, pgoff) {
+ vma_interval_tree_foreach(iter_vma, mapping, pgoff, pgoff) {
/* Do not unmap the current VMA */
if (iter_vma == vma)
continue;
@@ -6864,7 +6864,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
pte_t *pte;
i_mmap_lock_read(mapping);
- vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) {
+ vma_interval_tree_foreach(svma, mapping, idx, idx) {
if (svma == vma)
continue;
diff --git a/mm/interval_tree.c b/mm/interval_tree.c
index 2d50bc6228c4..ff36fd14ef37 100644
--- a/mm/interval_tree.c
+++ b/mm/interval_tree.c
@@ -14,19 +14,26 @@
/* File-backed interval tree (address_space->i_mmap) */
INTERVAL_TREE_DEFINE(struct vm_area_struct, shared.rb,
- unsigned long, shared.rb_subtree_last,
- vma_start_pgoff, vma_last_pgoff, /* empty */, vma_interval_tree)
+ pgoff_t, shared.rb_subtree_last,
+ vma_start_pgoff, vma_last_pgoff, static,
+ __vma_interval_tree)
-/* Insert node immediately after prev in the interval tree */
-void vma_interval_tree_insert_after(struct vm_area_struct *node,
+void vma_interval_tree_insert(struct vm_area_struct *vma,
+ struct address_space *mapping)
+{
+ __vma_interval_tree_insert(vma, &mapping->i_mmap);
+}
+
+/* Insert vma immediately after prev in the interval tree */
+void vma_interval_tree_insert_after(struct vm_area_struct *vma,
struct vm_area_struct *prev,
- struct rb_root_cached *root)
+ struct address_space *mapping)
{
struct rb_node **link;
struct vm_area_struct *parent;
- unsigned long last = vma_last_pgoff(node);
+ const pgoff_t pgoff_last = vma_last_pgoff(vma);
- VM_WARN_ON_ONCE_VMA(vma_start_pgoff(node) != vma_start_pgoff(prev), node);
+ VM_WARN_ON_ONCE_VMA(vma_start_pgoff(vma) != vma_start_pgoff(prev), vma);
if (!prev->shared.rb.rb_right) {
parent = prev;
@@ -34,21 +41,42 @@ void vma_interval_tree_insert_after(struct vm_area_struct *node,
} else {
parent = rb_entry(prev->shared.rb.rb_right,
struct vm_area_struct, shared.rb);
- if (parent->shared.rb_subtree_last < last)
- parent->shared.rb_subtree_last = last;
+ if (parent->shared.rb_subtree_last < pgoff_last)
+ parent->shared.rb_subtree_last = pgoff_last;
while (parent->shared.rb.rb_left) {
parent = rb_entry(parent->shared.rb.rb_left,
struct vm_area_struct, shared.rb);
- if (parent->shared.rb_subtree_last < last)
- parent->shared.rb_subtree_last = last;
+ if (parent->shared.rb_subtree_last < pgoff_last)
+ parent->shared.rb_subtree_last = pgoff_last;
}
link = &parent->shared.rb.rb_left;
}
- node->shared.rb_subtree_last = last;
- rb_link_node(&node->shared.rb, &parent->shared.rb, link);
- rb_insert_augmented(&node->shared.rb, &root->rb_root,
- &vma_interval_tree_augment);
+ vma->shared.rb_subtree_last = pgoff_last;
+ rb_link_node(&vma->shared.rb, &parent->shared.rb, link);
+ rb_insert_augmented(&vma->shared.rb, &mapping->i_mmap.rb_root,
+ &__vma_interval_tree_augment);
+}
+
+void vma_interval_tree_remove(struct vm_area_struct *vma,
+ struct address_space *mapping)
+{
+ __vma_interval_tree_remove(vma, &mapping->i_mmap);
+}
+
+struct vm_area_struct *
+vma_interval_tree_iter_first(struct address_space *mapping,
+ pgoff_t pgoff_start, pgoff_t pgoff_last)
+{
+ return __vma_interval_tree_iter_first(&mapping->i_mmap,
+ pgoff_start, pgoff_last);
+}
+
+struct vm_area_struct *
+vma_interval_tree_iter_next(struct vm_area_struct *vma,
+ pgoff_t pgoff_start, pgoff_t pgoff_last)
+{
+ return __vma_interval_tree_iter_next(vma, pgoff_start, pgoff_last);
}
/* Anonymous interval tree (anon_vma->rb_root) */
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 617bca76db49..9dcf38dc0f8c 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2136,7 +2136,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
struct vm_area_struct *vma;
i_mmap_lock_read(mapping);
- vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
+ vma_interval_tree_foreach(vma, mapping, pgoff, pgoff) {
struct mmu_notifier_range range;
struct mm_struct *mm;
unsigned long addr;
@@ -2568,7 +2568,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
* not be able to observe any missing pages due to the
* previously inserted retry entries.
*/
- vma_interval_tree_foreach(vma, &mapping->i_mmap, start, end) {
+ vma_interval_tree_foreach(vma, mapping, start, end) {
if (userfaultfd_missing(vma)) {
result = SCAN_EXCEED_NONE_PTE;
goto immap_locked;
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 51508a55c405..3c842b472a75 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -586,8 +586,7 @@ static void collect_procs_file(const struct folio *folio,
if (!t)
continue;
- vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff,
- pgoff) {
+ vma_interval_tree_foreach(vma, mapping, pgoff, pgoff) {
/*
* Send early kill signal to tasks where a vma covers
* the page but the corrupted page is not necessarily
@@ -638,7 +637,7 @@ static void collect_procs_fsdax(const struct page *page,
t = task_early_kill(tsk, true);
if (!t)
continue;
- vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
+ vma_interval_tree_foreach(vma, mapping, pgoff, pgoff) {
if (vma->vm_mm == t->mm)
add_to_kill_fsdax(t, page, vma, to_kill, pgoff);
}
@@ -2239,7 +2238,7 @@ static void collect_procs_pfn(struct pfn_address_space *pfn_space,
t = task_early_kill(tsk, true);
if (!t)
continue;
- vma_interval_tree_foreach(vma, &mapping->i_mmap, 0, ULONG_MAX) {
+ vma_interval_tree_foreach(vma, mapping, 0, ULONG_MAX) {
pgoff_t pgoff;
if (vma->vm_mm == t->mm &&
diff --git a/mm/memory.c b/mm/memory.c
index ff338c2abe92..1cf59041600c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4336,7 +4336,7 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
return wp_page_copy(vmf);
}
-static inline void unmap_mapping_range_tree(struct rb_root_cached *root,
+static inline void unmap_mapping_range_tree(struct address_space *mapping,
pgoff_t first_index,
pgoff_t last_index,
struct zap_details *details)
@@ -4345,7 +4345,7 @@ static inline void unmap_mapping_range_tree(struct rb_root_cached *root,
unsigned long start, size;
struct mmu_gather tlb;
- vma_interval_tree_foreach(vma, root, first_index, last_index) {
+ vma_interval_tree_foreach(vma, mapping, first_index, last_index) {
const pgoff_t start_idx = max(first_index, vma->vm_pgoff);
const pgoff_t end_idx = min(last_index, vma_last_pgoff(vma)) + 1;
@@ -4387,7 +4387,7 @@ void unmap_mapping_folio(struct folio *folio)
i_mmap_lock_read(mapping);
if (unlikely(mapping_mapped(mapping)))
- unmap_mapping_range_tree(&mapping->i_mmap, first_index,
+ unmap_mapping_range_tree(mapping, first_index,
last_index, &details);
i_mmap_unlock_read(mapping);
}
@@ -4417,7 +4417,7 @@ void unmap_mapping_pages(struct address_space *mapping, pgoff_t start,
i_mmap_lock_read(mapping);
if (unlikely(mapping_mapped(mapping)))
- unmap_mapping_range_tree(&mapping->i_mmap, first_index,
+ unmap_mapping_range_tree(mapping, first_index,
last_index, &details);
i_mmap_unlock_read(mapping);
}
diff --git a/mm/mmap.c b/mm/mmap.c
index 2311ae7c2ff4..2f22fb0d068d 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1830,8 +1830,7 @@ __latent_entropy int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
mapping_allow_writable(mapping);
flush_dcache_mmap_lock(mapping);
/* insert tmp into the share list, just after mpnt */
- vma_interval_tree_insert_after(tmp, mpnt,
- &mapping->i_mmap);
+ vma_interval_tree_insert_after(tmp, mpnt, mapping);
flush_dcache_mmap_unlock(mapping);
i_mmap_unlock_write(mapping);
}
diff --git a/mm/nommu.c b/mm/nommu.c
index ed3934bc2de4..9a01b01ba8ed 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -569,7 +569,7 @@ static void setup_vma_to_mm(struct vm_area_struct *vma, struct mm_struct *mm)
i_mmap_lock_write(mapping);
flush_dcache_mmap_lock(mapping);
- vma_interval_tree_insert(vma, &mapping->i_mmap);
+ vma_interval_tree_insert(vma, mapping);
flush_dcache_mmap_unlock(mapping);
i_mmap_unlock_write(mapping);
}
@@ -585,7 +585,7 @@ static void cleanup_vma_from_mm(struct vm_area_struct *vma)
i_mmap_lock_write(mapping);
flush_dcache_mmap_lock(mapping);
- vma_interval_tree_remove(vma, &mapping->i_mmap);
+ vma_interval_tree_remove(vma, mapping);
flush_dcache_mmap_unlock(mapping);
i_mmap_unlock_write(mapping);
}
@@ -1816,7 +1816,7 @@ int nommu_shrink_inode_mappings(struct inode *inode, size_t size,
i_mmap_lock_read(inode->i_mapping);
/* search for VMAs that fall within the dead zone */
- vma_interval_tree_foreach(vma, &inode->i_mapping->i_mmap, low, high) {
+ vma_interval_tree_foreach(vma, inode->i_mapping, low, high) {
/* found one - only interested if it's shared out of the page
* cache */
if (vma->vm_flags & VM_SHARED) {
@@ -1832,7 +1832,7 @@ int nommu_shrink_inode_mappings(struct inode *inode, size_t size,
* we don't check for any regions that start beyond the EOF as there
* shouldn't be any
*/
- vma_interval_tree_foreach(vma, &inode->i_mapping->i_mmap, 0, ULONG_MAX) {
+ vma_interval_tree_foreach(vma, inode->i_mapping, 0, ULONG_MAX) {
if (!(vma->vm_flags & VM_SHARED))
continue;
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 3ae2586ff45b..490a14691660 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -810,7 +810,7 @@ int walk_page_mapping(struct address_space *mapping, pgoff_t first_index,
return -EINVAL;
lockdep_assert_held(&mapping->i_mmap_rwsem);
- vma_interval_tree_foreach(vma, &mapping->i_mmap, first_index,
+ vma_interval_tree_foreach(vma, mapping, first_index,
first_index + nr - 1) {
/* Clip to the vma */
vba = vma->vm_pgoff;
diff --git a/mm/rmap.c b/mm/rmap.c
index 1c77d5dc06e9..13ffa71bd20d 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -3051,8 +3051,7 @@ static void __rmap_walk_file(struct folio *folio, struct address_space *mapping,
i_mmap_lock_read(mapping);
}
lookup:
- vma_interval_tree_foreach(vma, &mapping->i_mmap,
- pgoff_start, pgoff_end) {
+ vma_interval_tree_foreach(vma, mapping, pgoff_start, pgoff_end) {
unsigned long address = vma_address(vma, pgoff_start, nr_pages);
VM_BUG_ON_VMA(address == -EFAULT, vma);
diff --git a/mm/vma.c b/mm/vma.c
index 9eea2850818a..ce4ec4b71138 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -234,7 +234,7 @@ static void __vma_link_file(struct vm_area_struct *vma,
mapping_allow_writable(mapping);
flush_dcache_mmap_lock(mapping);
- vma_interval_tree_insert(vma, &mapping->i_mmap);
+ vma_interval_tree_insert(vma, mapping);
flush_dcache_mmap_unlock(mapping);
}
@@ -248,7 +248,7 @@ static void __remove_shared_vm_struct(struct vm_area_struct *vma,
mapping_unmap_writable(mapping);
flush_dcache_mmap_lock(mapping);
- vma_interval_tree_remove(vma, &mapping->i_mmap);
+ vma_interval_tree_remove(vma, mapping);
flush_dcache_mmap_unlock(mapping);
}
@@ -319,10 +319,9 @@ static void vma_prepare(struct vma_prepare *vp)
if (vp->file) {
flush_dcache_mmap_lock(vp->mapping);
- vma_interval_tree_remove(vp->vma, &vp->mapping->i_mmap);
+ vma_interval_tree_remove(vp->vma, vp->mapping);
if (vp->adj_next)
- vma_interval_tree_remove(vp->adj_next,
- &vp->mapping->i_mmap);
+ vma_interval_tree_remove(vp->adj_next, vp->mapping);
}
}
@@ -340,9 +339,8 @@ static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
{
if (vp->file) {
if (vp->adj_next)
- vma_interval_tree_insert(vp->adj_next,
- &vp->mapping->i_mmap);
- vma_interval_tree_insert(vp->vma, &vp->mapping->i_mmap);
+ vma_interval_tree_insert(vp->adj_next, vp->mapping);
+ vma_interval_tree_insert(vp->vma, vp->mapping);
flush_dcache_mmap_unlock(vp->mapping);
}
diff --git a/tools/testing/vma/include/stubs.h b/tools/testing/vma/include/stubs.h
index 64164e25658f..94442b29458d 100644
--- a/tools/testing/vma/include/stubs.h
+++ b/tools/testing/vma/include/stubs.h
@@ -258,12 +258,12 @@ static inline void vm_acct_memory(long pages)
}
static inline void vma_interval_tree_insert(struct vm_area_struct *vma,
- struct rb_root_cached *rb)
+ struct address_space *mapping)
{
}
static inline void vma_interval_tree_remove(struct vm_area_struct *vma,
- struct rb_root_cached *rb)
+ struct address_space *mapping)
{
}
--
2.54.0
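For readers following the call-site conversions above, here is a minimal userspace sketch of what a lookup parameterised by the mapping looks like. Everything prefixed mock_ is hypothetical and is not kernel API: a linear scan stands in for the i_mmap interval tree, and the overlap test assumes the inclusive [first, last] page offset bounds that the foreach call sites pass.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long pgoff_t;

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct mock_vma {
	pgoff_t start_pgoff;	/* first page offset covered */
	pgoff_t last_pgoff;	/* last page offset covered (inclusive) */
};

struct mock_mapping {
	struct mock_vma *vmas;	/* stands in for the i_mmap interval tree */
	size_t nr_vmas;
};

/* Two inclusive ranges overlap if each starts no later than the other ends. */
static int mock_vma_overlaps(const struct mock_vma *vma,
			     pgoff_t first, pgoff_t last)
{
	return vma->start_pgoff <= last && first <= vma->last_pgoff;
}

/*
 * The caller hands over the mapping itself rather than a pointer to its
 * tree root, mirroring the shape of the converted call sites. A linear
 * scan replaces the tree walk purely for illustration.
 */
static size_t mock_mapping_count_overlaps(const struct mock_mapping *mapping,
					  pgoff_t first, pgoff_t last)
{
	size_t i, count = 0;

	assert(first <= last);
	for (i = 0; i < mapping->nr_vmas; i++)
		if (mock_vma_overlaps(&mapping->vmas[i], first, last))
			count++;
	return count;
}
```

Passing the mapping keeps the choice of tree root in one place, so a caller can no longer hand in the wrong root by accident.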
* [PATCH 05/30] mm/rmap: update mm/interval_tree.c comments
From: Lorenzo Stoakes @ 2026-06-29 12:23 UTC (permalink / raw)
To: Andrew Morton
In-Reply-To: <cover.1782735110.git.ljs@kernel.org>
Update the file comment to clarify that both file-backed and anonymous
interval trees are provided, referencing the relevant data types for
clarity.
Also add comments to indicate which parts of the file apply to each.
While we're here, convert the VM_BUG_ON_VMA() to VM_WARN_ON_ONCE_VMA().
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
---
mm/interval_tree.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/mm/interval_tree.c b/mm/interval_tree.c
index 344d1f5946c7..2d50bc6228c4 100644
--- a/mm/interval_tree.c
+++ b/mm/interval_tree.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
- * mm/interval_tree.c - interval tree for mapping->i_mmap
+ * mm/interval_tree.c - interval tree for address_space->i_mmap and
+ * anon_vma->rb_root
*
* Copyright (C) 2012, Michel Lespinasse <walken@google.com>
*/
@@ -10,6 +11,8 @@
#include <linux/rmap.h>
#include <linux/interval_tree_generic.h>
+/* File-backed interval tree (address_space->i_mmap) */
+
INTERVAL_TREE_DEFINE(struct vm_area_struct, shared.rb,
unsigned long, shared.rb_subtree_last,
vma_start_pgoff, vma_last_pgoff, /* empty */, vma_interval_tree)
@@ -23,7 +26,7 @@ void vma_interval_tree_insert_after(struct vm_area_struct *node,
struct vm_area_struct *parent;
unsigned long last = vma_last_pgoff(node);
- VM_BUG_ON_VMA(vma_start_pgoff(node) != vma_start_pgoff(prev), node);
+ VM_WARN_ON_ONCE_VMA(vma_start_pgoff(node) != vma_start_pgoff(prev), node);
if (!prev->shared.rb.rb_right) {
parent = prev;
@@ -48,6 +51,8 @@ void vma_interval_tree_insert_after(struct vm_area_struct *node,
&vma_interval_tree_augment);
}
+/* Anonymous interval tree (anon_vma->rb_root) */
+
static inline unsigned long avc_start_pgoff(struct anon_vma_chain *avc)
{
return vma_start_pgoff(avc->vma);
--
2.54.0
* [PATCH 04/30] mm: introduce and use vma_end_pgoff()
From: Lorenzo Stoakes @ 2026-06-29 12:23 UTC (permalink / raw)
To: Andrew Morton
In-Reply-To: <cover.1782735110.git.ljs@kernel.org>
We already have vma_last_pgoff() which retrieves the last page offset
within a VMA.
However, code often wishes to span a page offset range, which requires the
exclusive end of this range.
So provide this in vma_end_pgoff() and update vma_last_pgoff() to use this
function.
No functional change intended.
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
---
include/linux/mm.h | 19 ++++++++++++++++++-
1 file changed, 18 insertions(+), 1 deletion(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2f00c75e66bd..e7ee315d5ba2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4298,6 +4298,23 @@ static inline pgoff_t vma_start_pgoff(const struct vm_area_struct *vma)
return vma->vm_pgoff;
}
+/**
+ * vma_end_pgoff() - Get the page offset of the exclusive end of @vma
+ * @vma: The VMA whose end page offset is required.
+ *
+ * This returns the exclusive end page offset of @vma, which is useful for
+ * expressing page offset ranges.
+ *
+ * See the description of vma_start_pgoff() for a description of VMA page
+ * offsets.
+ *
+ * Returns: The exclusive end page offset of @vma.
+ */
+static inline pgoff_t vma_end_pgoff(const struct vm_area_struct *vma)
+{
+ return vma_start_pgoff(vma) + vma_pages(vma);
+}
+
/**
* vma_last_pgoff() - Get the page offset of the last page in @vma
* @vma: The VMA whose last page offset is required.
@@ -4311,7 +4328,7 @@ static inline pgoff_t vma_start_pgoff(const struct vm_area_struct *vma)
*/
static inline pgoff_t vma_last_pgoff(const struct vm_area_struct *vma)
{
- return vma_start_pgoff(vma) + vma_pages(vma) - 1;
+ return vma_end_pgoff(vma) - 1;
}
static inline unsigned long vma_desc_size(const struct vm_area_desc *desc)
--
2.54.0
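As an illustration of why an exclusive end is convenient, the following userspace sketch mirrors the three helpers and then clips an inclusive [first, last] query to a VMA, in the style of the start_idx/end_idx computation in unmap_mapping_range_tree(). The mock_ names, the struct layout and the 4 KiB page size are assumptions made for the example, and the sketch ignores overflow when the last page offset is ULONG_MAX.

```c
#include <assert.h>

#define MOCK_PAGE_SHIFT 12	/* assumes 4 KiB pages */

typedef unsigned long pgoff_t;

/* Hypothetical cut-down VMA holding only the fields used here. */
struct mock_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	pgoff_t vm_pgoff;
};

static pgoff_t mock_vma_pages(const struct mock_vma *vma)
{
	return (vma->vm_end - vma->vm_start) >> MOCK_PAGE_SHIFT;
}

static pgoff_t mock_vma_start_pgoff(const struct mock_vma *vma)
{
	return vma->vm_pgoff;
}

/* Exclusive end: one past the last page offset in the VMA. */
static pgoff_t mock_vma_end_pgoff(const struct mock_vma *vma)
{
	return mock_vma_start_pgoff(vma) + mock_vma_pages(vma);
}

/* Inclusive last page offset, expressed in terms of the exclusive end. */
static pgoff_t mock_vma_last_pgoff(const struct mock_vma *vma)
{
	return mock_vma_end_pgoff(vma) - 1;
}

/* Number of pages of @vma that fall within the inclusive [first, last]. */
static pgoff_t mock_clipped_pages(const struct mock_vma *vma,
				  pgoff_t first, pgoff_t last)
{
	pgoff_t vma_start = mock_vma_start_pgoff(vma);
	pgoff_t vma_last = mock_vma_last_pgoff(vma);
	pgoff_t start_idx = first > vma_start ? first : vma_start;
	pgoff_t end_idx = (last < vma_last ? last : vma_last) + 1;

	return end_idx > start_idx ? end_idx - start_idx : 0;
}
```

With a half-open [start_idx, end_idx) range the page count is a plain subtraction, which is the kind of arithmetic an exclusive end helper saves callers from open-coding.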
* [PATCH 03/30] tools/testing/vma: use vma_start_pgoff() in merge tests
From: Lorenzo Stoakes @ 2026-06-29 12:23 UTC (permalink / raw)
To: Andrew Morton
In-Reply-To: <cover.1782735110.git.ljs@kernel.org>
Now that we have the vma_start_pgoff() helper, update the merge tests to
make use of it for consistency.
No functional change intended.
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
---
tools/testing/vma/tests/merge.c | 38 ++++++++++++++++-----------------
1 file changed, 19 insertions(+), 19 deletions(-)
diff --git a/tools/testing/vma/tests/merge.c b/tools/testing/vma/tests/merge.c
index 03b6f9820e0a..f8666a755749 100644
--- a/tools/testing/vma/tests/merge.c
+++ b/tools/testing/vma/tests/merge.c
@@ -118,7 +118,7 @@ static bool test_simple_merge(void)
ASSERT_EQ(vma->vm_start, 0);
ASSERT_EQ(vma->vm_end, 0x3000);
- ASSERT_EQ(vma->vm_pgoff, 0);
+ ASSERT_EQ(vma_start_pgoff(vma), 0);
ASSERT_FLAGS_SAME_MASK(&vma->flags, vma_flags);
detach_free_vma(vma);
@@ -150,7 +150,7 @@ static bool test_simple_modify(void)
ASSERT_EQ(vma->vm_start, 0x1000);
ASSERT_EQ(vma->vm_end, 0x2000);
- ASSERT_EQ(vma->vm_pgoff, 1);
+ ASSERT_EQ(vma_start_pgoff(vma), 1);
/*
* Now walk through the three split VMAs and make sure they are as
@@ -162,7 +162,7 @@ static bool test_simple_modify(void)
ASSERT_EQ(vma->vm_start, 0);
ASSERT_EQ(vma->vm_end, 0x1000);
- ASSERT_EQ(vma->vm_pgoff, 0);
+ ASSERT_EQ(vma_start_pgoff(vma), 0);
detach_free_vma(vma);
vma_iter_clear(&vmi);
@@ -171,7 +171,7 @@ static bool test_simple_modify(void)
ASSERT_EQ(vma->vm_start, 0x1000);
ASSERT_EQ(vma->vm_end, 0x2000);
- ASSERT_EQ(vma->vm_pgoff, 1);
+ ASSERT_EQ(vma_start_pgoff(vma), 1);
detach_free_vma(vma);
vma_iter_clear(&vmi);
@@ -180,7 +180,7 @@ static bool test_simple_modify(void)
ASSERT_EQ(vma->vm_start, 0x2000);
ASSERT_EQ(vma->vm_end, 0x3000);
- ASSERT_EQ(vma->vm_pgoff, 2);
+ ASSERT_EQ(vma_start_pgoff(vma), 2);
detach_free_vma(vma);
mtree_destroy(&mm.mm_mt);
@@ -209,7 +209,7 @@ static bool test_simple_expand(void)
ASSERT_EQ(vma->vm_start, 0);
ASSERT_EQ(vma->vm_end, 0x3000);
- ASSERT_EQ(vma->vm_pgoff, 0);
+ ASSERT_EQ(vma_start_pgoff(vma), 0);
detach_free_vma(vma);
mtree_destroy(&mm.mm_mt);
@@ -231,7 +231,7 @@ static bool test_simple_shrink(void)
ASSERT_EQ(vma->vm_start, 0);
ASSERT_EQ(vma->vm_end, 0x1000);
- ASSERT_EQ(vma->vm_pgoff, 0);
+ ASSERT_EQ(vma_start_pgoff(vma), 0);
detach_free_vma(vma);
mtree_destroy(&mm.mm_mt);
@@ -324,7 +324,7 @@ static bool __test_merge_new(bool is_sticky, bool a_is_sticky, bool b_is_sticky,
ASSERT_TRUE(merged);
ASSERT_EQ(vma->vm_start, 0);
ASSERT_EQ(vma->vm_end, 0x4000);
- ASSERT_EQ(vma->vm_pgoff, 0);
+ ASSERT_EQ(vma_start_pgoff(vma), 0);
ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
ASSERT_TRUE(vma_write_started(vma));
ASSERT_EQ(mm.map_count, 3);
@@ -343,7 +343,7 @@ static bool __test_merge_new(bool is_sticky, bool a_is_sticky, bool b_is_sticky,
ASSERT_TRUE(merged);
ASSERT_EQ(vma->vm_start, 0);
ASSERT_EQ(vma->vm_end, 0x5000);
- ASSERT_EQ(vma->vm_pgoff, 0);
+ ASSERT_EQ(vma_start_pgoff(vma), 0);
ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
ASSERT_TRUE(vma_write_started(vma));
ASSERT_EQ(mm.map_count, 3);
@@ -364,7 +364,7 @@ static bool __test_merge_new(bool is_sticky, bool a_is_sticky, bool b_is_sticky,
ASSERT_TRUE(merged);
ASSERT_EQ(vma->vm_start, 0x6000);
ASSERT_EQ(vma->vm_end, 0x9000);
- ASSERT_EQ(vma->vm_pgoff, 6);
+ ASSERT_EQ(vma_start_pgoff(vma), 6);
ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
ASSERT_TRUE(vma_write_started(vma));
ASSERT_EQ(mm.map_count, 3);
@@ -384,7 +384,7 @@ static bool __test_merge_new(bool is_sticky, bool a_is_sticky, bool b_is_sticky,
ASSERT_TRUE(merged);
ASSERT_EQ(vma->vm_start, 0);
ASSERT_EQ(vma->vm_end, 0x9000);
- ASSERT_EQ(vma->vm_pgoff, 0);
+ ASSERT_EQ(vma_start_pgoff(vma), 0);
ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
ASSERT_TRUE(vma_write_started(vma));
ASSERT_EQ(mm.map_count, 2);
@@ -404,7 +404,7 @@ static bool __test_merge_new(bool is_sticky, bool a_is_sticky, bool b_is_sticky,
ASSERT_TRUE(merged);
ASSERT_EQ(vma->vm_start, 0xa000);
ASSERT_EQ(vma->vm_end, 0xc000);
- ASSERT_EQ(vma->vm_pgoff, 0xa);
+ ASSERT_EQ(vma_start_pgoff(vma), 0xa);
ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
ASSERT_TRUE(vma_write_started(vma));
ASSERT_EQ(mm.map_count, 2);
@@ -423,7 +423,7 @@ static bool __test_merge_new(bool is_sticky, bool a_is_sticky, bool b_is_sticky,
ASSERT_TRUE(merged);
ASSERT_EQ(vma->vm_start, 0);
ASSERT_EQ(vma->vm_end, 0xc000);
- ASSERT_EQ(vma->vm_pgoff, 0);
+ ASSERT_EQ(vma_start_pgoff(vma), 0);
ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
ASSERT_TRUE(vma_write_started(vma));
ASSERT_EQ(mm.map_count, 1);
@@ -443,7 +443,7 @@ static bool __test_merge_new(bool is_sticky, bool a_is_sticky, bool b_is_sticky,
ASSERT_NE(vma, NULL);
ASSERT_EQ(vma->vm_start, 0);
ASSERT_EQ(vma->vm_end, 0xc000);
- ASSERT_EQ(vma->vm_pgoff, 0);
+ ASSERT_EQ(vma_start_pgoff(vma), 0);
ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
detach_free_vma(vma);
@@ -805,7 +805,7 @@ static bool test_vma_merge_new_with_close(void)
ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
ASSERT_EQ(vma->vm_start, 0);
ASSERT_EQ(vma->vm_end, 0x5000);
- ASSERT_EQ(vma->vm_pgoff, 0);
+ ASSERT_EQ(vma_start_pgoff(vma), 0);
ASSERT_EQ(vma->vm_ops, &vm_ops);
ASSERT_TRUE(vma_write_started(vma));
ASSERT_EQ(mm.map_count, 2);
@@ -865,7 +865,7 @@ static bool __test_merge_existing(bool prev_is_sticky, bool middle_is_sticky, bo
ASSERT_EQ(vma_next->anon_vma, &dummy_anon_vma);
ASSERT_EQ(vma->vm_start, 0x2000);
ASSERT_EQ(vma->vm_end, 0x3000);
- ASSERT_EQ(vma->vm_pgoff, 2);
+ ASSERT_EQ(vma_start_pgoff(vma), 2);
ASSERT_TRUE(vma_write_started(vma));
ASSERT_TRUE(vma_write_started(vma_next));
ASSERT_EQ(mm.map_count, 2);
@@ -931,7 +931,7 @@ static bool __test_merge_existing(bool prev_is_sticky, bool middle_is_sticky, bo
ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
ASSERT_EQ(vma->vm_start, 0x6000);
ASSERT_EQ(vma->vm_end, 0x7000);
- ASSERT_EQ(vma->vm_pgoff, 6);
+ ASSERT_EQ(vma_start_pgoff(vma), 6);
ASSERT_TRUE(vma_write_started(vma_prev));
ASSERT_TRUE(vma_write_started(vma));
ASSERT_EQ(mm.map_count, 2);
@@ -1416,7 +1416,7 @@ static bool test_merge_extend(void)
ASSERT_EQ(vma_merge_extend(&vmi, vma, 0x2000), vma);
ASSERT_EQ(vma->vm_start, 0);
ASSERT_EQ(vma->vm_end, 0x4000);
- ASSERT_EQ(vma->vm_pgoff, 0);
+ ASSERT_EQ(vma_start_pgoff(vma), 0);
ASSERT_TRUE(vma_write_started(vma));
ASSERT_EQ(mm.map_count, 1);
@@ -1456,7 +1456,7 @@ static bool test_expand_only_mode(void)
ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
ASSERT_EQ(vma->vm_start, 0x3000);
ASSERT_EQ(vma->vm_end, 0x9000);
- ASSERT_EQ(vma->vm_pgoff, 3);
+ ASSERT_EQ(vma_start_pgoff(vma), 3);
ASSERT_TRUE(vma_write_started(vma));
ASSERT_EQ(vma_iter_addr(&vmi), 0x3000);
vma_assert_attached(vma);
--
2.54.0
* [PATCH 02/30] mm: add kdoc comments for vma_start/last_pgoff()
From: Lorenzo Stoakes @ 2026-06-29 12:23 UTC (permalink / raw)
To: Andrew Morton
In-Reply-To: <cover.1782735110.git.ljs@kernel.org>
Describe what vma_start_pgoff() and vma_last_pgoff() actually provide in
detail.
This is so that we can differentiate these from functions, to be added in a
subsequent patch, which provide a different page offset.
No functional change intended.
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
---
include/linux/mm.h | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 059144435729..2f00c75e66bd 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4278,11 +4278,37 @@ static inline unsigned long vma_pages(const struct vm_area_struct *vma)
return (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
}
+/**
+ * vma_start_pgoff() - Get the page offset of the start of @vma
+ * @vma: The VMA whose page offset is required.
+ *
+ * If the VMA is file-backed, this is the page offset into the file.
+ *
+ * If the VMA is anonymous, this is the virtual page offset of the start of the
+ * VMA - if unfaulted, then vma->vm_start >> PAGE_SHIFT, if faulted then the
+ * virtual page offset at the time of first fault.
+ *
+ * Note that if @vma is a MAP_PRIVATE file-backed mapping, then this returns the
+ * file offset.
+ *
+ * Returns: The page offset of the start of @vma.
+ */
static inline pgoff_t vma_start_pgoff(const struct vm_area_struct *vma)
{
return vma->vm_pgoff;
}
+/**
+ * vma_last_pgoff() - Get the page offset of the last page in @vma
+ * @vma: The VMA whose last page offset is required.
+ *
+ * This returns the last page offset contained within @vma.
+ *
+ * See the description of vma_start_pgoff() for a description of VMA page
+ * offsets.
+ *
+ * Returns: The last page offset of @vma.
+ */
static inline pgoff_t vma_last_pgoff(const struct vm_area_struct *vma)
{
return vma_start_pgoff(vma) + vma_pages(vma) - 1;
--
2.54.0
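The kdoc wording above can be read as the following userspace sketch, which models what the start page offset holds for file-backed and anonymous mappings. This is only an interpretation of the comment: the mock_ names, the is_anon and faulted flags and the 4 KiB page size are all assumptions for illustration, not kernel structures or behaviour guaranteed by this patch.

```c
#include <assert.h>
#include <stdbool.h>

#define MOCK_PAGE_SHIFT 12	/* assumes 4 KiB pages */

typedef unsigned long pgoff_t;

/* Hypothetical cut-down VMA; the two flags exist only for this sketch. */
struct mock_vma {
	unsigned long vm_start;
	pgoff_t vm_pgoff;
	bool is_anon;
	bool faulted;
};

/* File-backed: the page offset is the offset into the file. */
static struct mock_vma mock_map_file(unsigned long addr, pgoff_t file_pgoff)
{
	struct mock_vma vma = { addr, file_pgoff, false, false };

	return vma;
}

/* Anonymous, unfaulted: the page offset is vm_start >> PAGE_SHIFT. */
static struct mock_vma mock_map_anon(unsigned long addr)
{
	struct mock_vma vma = { addr, addr >> MOCK_PAGE_SHIFT, true, false };

	return vma;
}

/*
 * Moving the VMA: only an unfaulted anonymous VMA picks up a new virtual
 * page offset. Once faulted, it keeps the offset from first fault, and a
 * file-backed VMA always keeps its file offset.
 */
static void mock_move(struct mock_vma *vma, unsigned long new_addr)
{
	vma->vm_start = new_addr;
	if (vma->is_anon && !vma->faulted)
		vma->vm_pgoff = new_addr >> MOCK_PAGE_SHIFT;
}
```

The point of the model is that for a faulted anonymous VMA the start page offset need not equal vm_start >> PAGE_SHIFT, which is why the helper's documentation spells the cases out.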
* [PATCH 01/30] mm: move vma_start_pgoff() into mm.h and clean up
From: Lorenzo Stoakes @ 2026-06-29 12:23 UTC (permalink / raw)
To: Andrew Morton
In-Reply-To: <cover.1782735110.git.ljs@kernel.org>
vma_last_pgoff() already lives in mm.h, so it's a bit odd to keep
vma_start_pgoff() in mm/interval_tree.c. Move them together.
These each return unsigned long, which pgoff_t is typedef'd to. Make this
consistent and have these functions return pgoff_t instead.
Additionally, express vma_last_pgoff() in terms of vma_start_pgoff(): since
we wrap the vma->vm_pgoff access, we may as well use it here.
Also while we're here, const-ify the VMA and clean up a bit.
No functional change intended.
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
---
include/linux/mm.h | 9 +++++++--
mm/interval_tree.c | 5 -----
2 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 485df9c2dbdd..059144435729 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4278,9 +4278,14 @@ static inline unsigned long vma_pages(const struct vm_area_struct *vma)
return (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
}
-static inline unsigned long vma_last_pgoff(struct vm_area_struct *vma)
+static inline pgoff_t vma_start_pgoff(const struct vm_area_struct *vma)
{
- return vma->vm_pgoff + vma_pages(vma) - 1;
+ return vma->vm_pgoff;
+}
+
+static inline pgoff_t vma_last_pgoff(const struct vm_area_struct *vma)
+{
+ return vma_start_pgoff(vma) + vma_pages(vma) - 1;
}
static inline unsigned long vma_desc_size(const struct vm_area_desc *desc)
diff --git a/mm/interval_tree.c b/mm/interval_tree.c
index 32bcfbfcf15f..344d1f5946c7 100644
--- a/mm/interval_tree.c
+++ b/mm/interval_tree.c
@@ -10,11 +10,6 @@
#include <linux/rmap.h>
#include <linux/interval_tree_generic.h>
-static inline unsigned long vma_start_pgoff(struct vm_area_struct *v)
-{
- return v->vm_pgoff;
-}
-
INTERVAL_TREE_DEFINE(struct vm_area_struct, shared.rb,
unsigned long, shared.rb_subtree_last,
vma_start_pgoff, vma_last_pgoff, /* empty */, vma_interval_tree)
--
2.54.0
* [PATCH 00/30] mm: make VMA page offset handling more consistent
From: Lorenzo Stoakes @ 2026-06-29 12:23 UTC (permalink / raw)
To: Andrew Morton
This series performs a number of cleanups and improvements around how the
vma->vm_pgoff field is used.
Folios belonging to file-backed mappings are simply indexed by the page
offset within the file they map.
However, anonymous folios belonging to pure anonymous mappings are indexed
by their "virtual" page offset, which is equal to addr >> PAGE_SHIFT at the
time at which the VMA was first faulted in.
The page offset of a VMA is stored in vma->vm_pgoff and indicates the page
offset of the start of the VMA range, whether it be file-backed or
anonymous.
The work here both cleans up how we reference this field and lays the
foundations for a future series which addresses the inconsistency of
CoW'd folios in MAP_PRIVATE file-backed mappings, which are indexed as if
they were file-backed but behave as if they were anonymous.
This future series will make it such that all anonymous folios are indexed
by virtual page offset, whether they belong to VMAs which satisfy
vma_is_anonymous() or to MAP_PRIVATE file-backed mappings.
This series:
* Exposes vma_start_pgoff() and updates the kernel to use it consistently.
* Adds and uses the useful vma_end_pgoff() helper.
* Parameterises the file-backed mapping helpers vma_interval_tree_*()
by address_space rather than rb_root_cached.
* Renames the misleadingly-named vma_interval_tree_*() helpers to
mapping_interval_tree_*() to be consistent with
anon_vma_interval_tree_*().
* Parameterises anon_vma_interval_tree_*() by anon_vma.
* Moves mm/interval_tree.c to the rmap section.
* Adds vmg_*() helpers for page offset.
* Clarifies the confusing vmg_adjust_set_range() function.
* Introduces linear_page_delta() to provide relative pgoff within a VMA.
* Replaces open-coded versions of linear_page_delta() and
linear_page_index() with invocations of these functions.
* Introduces and uses vma_assert_can_modify() to account for whether a VMA
can be modified (detached or write locked).
* Adds and uses vma_[add,sub]_pgoff() to adjust VMA page offset.
* Moves __install_special_mapping() to vma.c.
* Makes vma_set_range() static and internal to vma.c.
* Introduces and makes use of vma_set_pgoff().
* Fixes incorrect vma.h header inclusion.
* Defaults VMA userland tests to 64-bit vma flags size.
* Updates VMA userland tests to give better output on failure.
* Various smaller cleanups.
Lorenzo Stoakes (30):
mm: move vma_start_pgoff() into mm.h and clean up
mm: add kdoc comments for vma_start/last_pgoff()
tools/testing/vma: use vma_start_pgoff() in merge tests
mm: introduce and use vma_end_pgoff()
mm/rmap: update mm/interval_tree.c comments
mm/rmap: parameterise vma_interval_tree_*() by address_space
mm/rmap: elide unnecessary static inline's in interval_tree.c
mm/rmap: rename vma_interval_tree_*() to mapping_interval_tree_*()
mm/rmap: parameterise anon_vma_interval_tree_*() by anon_vma
MAINTAINERS: Move mm/interval_tree.c to rmap section
mm/vma: introduce and use vmg_pages(), vmg_[start, end]_pgoff()
mm/vma: clean up anon_vma_compatible()
mm/vma: refactor vmg_adjust_set_range() for clarity
mm/vma: minor cleanup of expand_[upwards, downwards]()
mm: introduce and use linear_page_delta()
mm/vma: use vma_start_pgoff(), linear_page_index() in mm code
mm: prefer vma_[start,end]_pgoff() to vma->vm_pgoff in kernel/
mm/vma: remove duplicative vma_pgoff_offset() helper
mm: use linear_page_[index, delta]() consistently
mm/vma: introduce vma_assert_can_modify()
mm/vma: add and use vma_[add/sub]_pgoff()
mm/vma: move __install_special_mapping() to vma.c
mm/vma: make vma_set_range() static, drop insert_vm_struct() decl
mm/vma: update vma_shrink() to not pass unnecessary pgoff parameter
mm/vma: update vmg_adjust_set_range() to offset pgoff instead
mm/vma: introduce and use vma_set_pgoff()
mm/vma: correct incorrect vma.h inclusion
mm/vma: use guard clauses in can_vma_merge_[before, after]()
tools/testing/vma: default VMA flag bits to 64-bit
tools/testing/vma: output compared expression on ASSERT_[EQ, NE]()
MAINTAINERS | 2 +-
arch/arm/mm/fault-armv.c | 4 +-
arch/arm/mm/flush.c | 2 +-
arch/nios2/mm/cacheflush.c | 2 +-
arch/parisc/kernel/cache.c | 2 +-
arch/x86/kernel/cpu/sgx/virt.c | 3 +-
drivers/comedi/comedi_fops.c | 3 +-
drivers/gpu/drm/etnaviv/etnaviv_gem.c | 3 +-
drivers/gpu/drm/gma500/gem.c | 2 +-
drivers/gpu/drm/msm/msm_gem.c | 3 +-
drivers/gpu/drm/omapdrm/omap_gem.c | 5 +-
drivers/gpu/drm/tegra/gem.c | 3 +-
drivers/gpu/drm/ttm/ttm_bo_vm.c | 7 +-
drivers/vfio/pci/nvgrace-gpu/main.c | 3 +-
drivers/vfio/pci/vfio_pci_core.c | 3 +-
fs/dax.c | 2 +-
fs/hugetlbfs/inode.c | 15 +-
include/linux/huge_mm.h | 1 +
include/linux/hugetlb.h | 3 +-
include/linux/mm.h | 118 +++++++++----
include/linux/mmap_lock.h | 8 +
include/linux/pagemap.h | 39 ++++-
kernel/dma/coherent.c | 7 +-
kernel/dma/direct.c | 6 +-
kernel/dma/mapping.c | 8 +-
kernel/dma/ops_helpers.c | 4 +-
kernel/events/core.c | 20 ++-
kernel/events/uprobes.c | 13 +-
kernel/kcov.c | 2 +-
kernel/trace/ring_buffer.c | 3 +-
mm/damon/vaddr.c | 5 +-
mm/debug.c | 2 +-
mm/filemap.c | 7 +-
mm/huge_memory.c | 2 +-
mm/hugetlb.c | 15 +-
mm/internal.h | 33 ++--
mm/interval_tree.c | 113 +++++++-----
mm/khugepaged.c | 7 +-
mm/ksm.c | 7 +-
mm/madvise.c | 6 +-
mm/mapping_dirty_helpers.c | 2 +-
mm/memory-failure.c | 10 +-
mm/memory.c | 33 ++--
mm/mempolicy.c | 13 +-
mm/mmap.c | 41 +----
mm/mmu_notifier.c | 2 +-
mm/mremap.c | 12 +-
mm/msync.c | 4 +-
mm/nommu.c | 22 +--
mm/pagewalk.c | 6 +-
mm/rmap.c | 14 +-
mm/shmem.c | 9 +-
mm/userfaultfd.c | 4 +-
mm/util.c | 4 +-
mm/vma.c | 239 ++++++++++++++++++--------
mm/vma.h | 59 ++++++-
mm/vma_exec.c | 12 +-
mm/vma_init.c | 6 +-
mm/vma_internal.h | 4 +-
tools/testing/vma/Makefile | 2 +-
tools/testing/vma/include/dup.h | 41 ++++-
tools/testing/vma/include/stubs.h | 12 +-
tools/testing/vma/shared.c | 9 -
tools/testing/vma/shared.h | 36 ++--
tools/testing/vma/tests/merge.c | 40 ++---
virt/kvm/guest_memfd.c | 2 +-
66 files changed, 699 insertions(+), 432 deletions(-)
--
2.54.0
^ permalink raw reply
* Re: [PATCH v8 24/46] KVM: guest_memfd: Make in-place conversion the default
From: Yan Zhao @ 2026-06-29 11:39 UTC (permalink / raw)
To: Sean Christopherson
Cc: Ackerley Tng, aik, andrew.jones, binbin.wu, brauner, chao.p.peng,
david, jmattson, jthoughton, michael.roth, oupton, pankaj.gupta,
qperret, rick.p.edgecombe, rientjes, shivankg, steven.price,
tabba, willy, wyihan, forkloop, pratyush, suzuki.poulose,
aneesh.kumar, liam, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
Shuah Khan, Vishal Annapurve, Andrew Morton, Chris Li,
Kairui Song, Kemeng Shi, Nhat Pham, Barry Song, Axel Rasmussen,
Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt,
Kiryl Shutsemau, Baoquan He, Jason Gunthorpe, Vlastimil Babka,
kvm, linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
linux-mm, linux-coco
In-Reply-To: <aj7NwCRwWEfLK-gQ@google.com>
On Fri, Jun 26, 2026 at 12:06:40PM -0700, Sean Christopherson wrote:
> On Fri, Jun 26, 2026, Yan Zhao wrote:
> > On Thu, Jun 25, 2026 at 07:36:28AM -0700, Sean Christopherson wrote:
> > > On Thu, Jun 25, 2026, Yan Zhao wrote:
> > > And I'm not remotely convinced that prepending allow_ to the param will help
> > > end users diagnose "unexpected" memory consumption, in quotes because anyone that
> > > is deploying a stack that utilizes out-of-place conversion absolutely needs to
> > > understand and plan for the additional memory consumption. I.e. if the memory
> > > consumption is "unexpected" to the end user, they likely have far bigger problems.
> > My first impression of gmem_in_place_conversion=true was that it enforces gmem
> > in-place conversion. However, it actually only enforces per-gmem private/shared
> > attribute.
> > My worry was that people might think it's a kernel bug if userspace can still
> > have shared memory from other sources after they configured
> > gmem_in_place_conversion=true.
>
> Ah, I see where you're coming from. FWIW, truly enforcing in-place conversion
> is flat out impossible. E.g. userspace can simply replace the memslot, at which
> point the memory effectively reverts to shared.
>
> > However, I have no strong opinion if you think gmem_in_place_conversion is good,
> > and with the above documentation. :)
>
> Ya, I think this is largely a documentation problem. I agree that a param name
> like gmem_private_memory_attributes would be more precise, but I think it'd be
> far less informative for the vast majority of users that only care whether or
> not KVM can do in-place conversion, and don't care about how that is done.
Ok.
^ permalink raw reply
* Re: [RFC PATCH v2 1/4] rtla/osnoise: Add IPI tracking cmdline option
From: Tomas Glozar @ 2026-06-29 10:51 UTC (permalink / raw)
To: Valentin Schneider
Cc: linux-kernel, linux-trace-kernel, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, Costa Shulyupin,
Crystal Wood, John Kacur, Ivan Pravdin, Jonathan Corbet
In-Reply-To: <20260617131803.2988989-2-vschneid@redhat.com>
st 17. 6. 2026 v 15:18 odesílatel Valentin Schneider
<vschneid@redhat.com> napsal:
>
> Later commits will add IPI tracking to osnoise top. To avoid breaking
> existing scripts, this new feature will be gated behind a new -i option.
>
> Suggested-by: Tomas Glozar <tglozar@redhat.com>
Thanks. Implementing this as a separate option also means we don't
have to worry about the performance impact in the general use case, as
the feature is not enabled by default.
If we decide to enable IPI tracking by default in the future, we can
just change the option to "--no-ipi" without breaking anything, as
libsubcmd generates all options in a pair by default (i.e. it
automatically recognizes --no-ipi when you define --ipi and vice
versa, unless explicitly disabled).
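
The negation pairing described above can be sketched in plain C. This is
only an illustration of the idea, not libsubcmd's implementation:
struct bool_opt and parse_bool_opt() are hypothetical names, and only the
"--name" / "--no-name" direction is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a boolean long option; not libsubcmd's struct. */
struct bool_opt {
	const char *name;	/* long name without the leading "--" */
	bool *value;		/* where the parsed result is stored */
};

/*
 * Match one argument against a boolean option.
 * "--<name>" sets the value, "--no-<name>" clears it.
 * Returns true if the argument was consumed.
 */
static bool parse_bool_opt(const struct bool_opt *opt, const char *arg)
{
	assert(opt && opt->name && opt->value && arg);

	/* Only long options are handled in this sketch. */
	if (strncmp(arg, "--", 2) != 0)
		return false;
	arg += 2;

	if (strcmp(arg, opt->name) == 0) {
		*opt->value = true;
		return true;
	}
	if (strncmp(arg, "no-", 3) == 0 && strcmp(arg + 3, opt->name) == 0) {
		*opt->value = false;
		return true;
	}
	return false;
}
```

In this model, flipping the default amounts to initialising the flag to
true; callers that already pass --ipi keep working, and --no-ipi turns the
feature off.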
> Signed-off-by: Valentin Schneider <vschneid@redhat.com>
> ---
> Documentation/tools/rtla/rtla-osnoise-top.rst | 4 ++++
> tools/tracing/rtla/src/cli.c | 1 +
> tools/tracing/rtla/src/cli_p.h | 3 +++
> tools/tracing/rtla/src/common.h | 1 +
> 4 files changed, 9 insertions(+)
>
> [truncated]
>
> --- a/tools/tracing/rtla/src/cli_p.h
> +++ b/tools/tracing/rtla/src/cli_p.h
> @@ -305,6 +305,9 @@ static int opt_filter_cb(const struct option *opt, const char *arg, int unset)
> "the minimum delta to be considered a noise", \
> opt_llong_callback)
>
> +#define OSNOISE_OPT_IPI OPT_BOOLEAN('i', "ipi", &params->common.ipi, \
> + "track sources of IPIs")
> +
As IPI tracking is not a commonly used functionality, unlike e.g.
"-p/--period", and -i is already a different option for timerlat tools
(-i/--irq), I'd suggest keeping just the long option, --ipi, like I
did for --on-threshold/--on-end (on Arnaldo's suggestion based on his
experience from perf [1]). This will make it clear to the user that the
option means "IPI detection" and not something else beginning with the letter
"i". We can always add a short option later if its use becomes common.
[1] https://lore.kernel.org/linux-trace-kernel/aEmWyPqQw2Ly7Jlu@x1/
> [truncated]
Tomas
^ permalink raw reply
* Re: [PATCHv4 05/13] uprobes/x86: Move optimized uprobe from nop5 to nop10
From: Jiri Olsa @ 2026-06-29 10:48 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Peter Zijlstra, Ingo Molnar, Masami Hiramatsu, Andrii Nakryiko,
bpf, linux-trace-kernel
In-Reply-To: <aj5JuH6AtjDtWIVU@redhat.com>
On Fri, Jun 26, 2026 at 11:43:20AM +0200, Oleg Nesterov wrote:
> On 05/26, Jiri Olsa wrote:
> >
> > which means we need to allow 0x2e prefix which maps to INAT_PFX_CS
> > attribute in is_prefix_bad function.
>
> ...
>
> > --- a/arch/x86/kernel/uprobes.c
> > +++ b/arch/x86/kernel/uprobes.c
> > @@ -266,7 +266,6 @@ static bool is_prefix_bad(struct insn *insn)
> > attr = inat_get_opcode_attribute(p);
> > switch (attr) {
> > case INAT_MAKE_PREFIX(INAT_PFX_ES):
> > - case INAT_MAKE_PREFIX(INAT_PFX_CS):
>
> I know nothing about how x86 CPU works, so let me ask...
>
> What if insn->x86_64 is false? Is it safe to allow the CS prefix in
> this case?
>
> Oleg.
>
hum, right.. I think we could make it x86_64 specific
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 04cd2cdce8c8..de60ec1eeee7 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -265,6 +265,10 @@ static bool is_prefix_bad(struct insn *insn)
attr = inat_get_opcode_attribute(p);
switch (attr) {
+ case INAT_MAKE_PREFIX(INAT_PFX_CS):
+ if (insn->x86_64)
+ break;
+ fallthrough;
case INAT_MAKE_PREFIX(INAT_PFX_ES):
case INAT_MAKE_PREFIX(INAT_PFX_DS):
case INAT_MAKE_PREFIX(INAT_PFX_SS):
or we could just skip it for nop10.. maybe that's better
jirka
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 04cd2cdce8c8..21f26e6fd452 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -285,7 +285,7 @@ static int uprobe_init_insn(struct arch_uprobe *auprobe, struct insn *insn, bool
if (ret < 0)
return -ENOEXEC;
- if (is_prefix_bad(insn))
+ if (!is_optimizable_nop10(insn) && is_prefix_bad(insn))
return -ENOTSUPP;
/* We should not singlestep on the exception masking instructions */
^ permalink raw reply related
* Re: [PATCH] trace_branch: use per-cpu counters for correct/incorrect stats
From: Steven Rostedt @ 2026-06-29 10:47 UTC (permalink / raw)
To: Martin Weiss
Cc: linux-kernel, Martin Weiss, Masami Hiramatsu, Mathieu Desnoyers,
linux-trace-kernel
In-Reply-To: <20260629095838.601926-1-Martin.weiss2410@gmail.com>
On Mon, 29 Jun 2026 16:58:38 +0700
Martin Weiss <martin.githubacc@gmail.com> wrote:
> Replace per-task counters with per-cpu increments to avoid race
> conditions in the branch profiler fast path.
>
> Fixes FIXME about atomicity.
>
> Signed-off-by: Martin Weiss <Martin.weiss2410@gmail.com>
> ---
> kernel/trace/trace_branch.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/trace/trace_branch.c b/kernel/trace/trace_branch.c
> index d8e97ad798f0..960bcb3d7dbf 100644
> --- a/kernel/trace/trace_branch.c
> +++ b/kernel/trace/trace_branch.c
> @@ -213,11 +213,11 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
> */
> trace_likely_condition(f, val, expect);
>
> - /* FIXME: Make this atomic! */
> + /* use per-cpu counters to avoid contention */
> if (val == expect)
> - f->data.correct++;
> + this_cpu_inc(f->data.correct);
> else
> - f->data.incorrect++;
> + this_cpu_inc(f->data.incorrect);
I don't think this does what you think it does.
Did you even test this? I'm guessing it would blow up in some fantastic
ways.
NAK.
-- Steve
>
> user_access_restore(flags);
> }
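
A likely reading of the objection, stated here as an assumption since the
reply does not spell it out: this_cpu_inc() is meant for variables that live
in per-CPU storage, while the patch changes only the increment and leaves
f->data.correct and f->data.incorrect as ordinary struct members. A per-CPU
scheme needs one slot per CPU plus a summation step on the read side. A
minimal userspace sketch of that shape, with hypothetical names
(struct branch_stats, stats_update(), stats_sum(), NR_SLOTS) and the CPU
number passed in explicitly:

```c
#include <assert.h>
#include <stddef.h>

#define NR_SLOTS 4	/* hypothetical number of CPUs in this model */

/* One counter pair per CPU: each CPU only ever touches its own slot. */
struct branch_stats {
	unsigned long correct[NR_SLOTS];
	unsigned long incorrect[NR_SLOTS];
};

/* Writer side: bump the slot that belongs to the calling CPU. */
static void stats_update(struct branch_stats *s, unsigned int cpu,
			 int val, int expect)
{
	assert(cpu < NR_SLOTS);

	if (val == expect)
		s->correct[cpu]++;
	else
		s->incorrect[cpu]++;
}

/* Reader side: the reported value is the sum over all slots. */
static unsigned long stats_sum(const unsigned long *slots)
{
	unsigned long total = 0;

	for (size_t i = 0; i < NR_SLOTS; i++)
		total += slots[i];
	return total;
}
```

The point of the sketch is that the storage layout and the reader both have
to change, not just the increment.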
^ permalink raw reply
* Re: [RFC PATCH 00/40] mm: reliable 1GB page allocation
From: Vlastimil Babka (SUSE) @ 2026-06-29 10:03 UTC (permalink / raw)
To: Lorenzo Stoakes, Rik van Riel
Cc: linux-kernel, kernel-team, linux-mm, david, willy, surenb, hannes,
ziy, usama.arif, fvdl, Andrew Morton, Jonathan Corbet,
Chris Mason, David Sterba, Steven Rostedt, Masami Hiramatsu,
Rafael J. Wysocki, Oscar Salvador, Mike Rapoport, linux-doc,
linux-btrfs, linux-trace-kernel, linux-pm, linux-cxl,
Linus Torvalds
In-Reply-To: <akIjA_dqh4OHAYo4@lucifer>
On 6/29/26 11:29, Lorenzo Stoakes wrote:
> TL;DR - please don't send unfiltered LLM code to list _at all_. If you want
> to share it, link to a repo.
>
> On Sat, Jun 27, 2026 at 09:36:51AM -0400, Rik van Riel wrote:
>> That is the one reason I sent out RFC code before it
>> is ready. I am looking for feedback on the concepts
>> in this series.
> ...
>> Once I know what I need to do, coming up with a
>> cleaner implementation is very doable.
> ...
>> The mess in the RFC is the result of trying something
>> that seemed right, watching it fail in some subtle
>> way, and trying to fix it up.
> ...
>> > But the execution has to be _completely_ rethought.
>>
>> There's no argument there.
> ...
>> > Another issue here is maintainer time - even this _extremely_ light-
>> > touch
>> > review has taken me a few hours (of my weekend :). To review it in
>> > detail
>> > would take probably DAYS of dedicated work.
>>
>> I suspect there is a mismatch in expectations here.
>>
>> I already knew this code has to be totally redone.
>
> I'm glad we are in agreement on this :)
>
> But in general I feel you have sent this and at least one other series like this
> without being as clear as you should have been.
>
> I hate to belabour the point but just to be clear:
>
> * You label one patch [DO-NOT-MERGE], but none of the others (implying they
> are candidates for being merged) [0] and the cover letter has TODOs,
> including trivia like naming, but nothing about the code.
>
> * You sent a non-RFC series with identical code quality issues [1]
> recently.
>
> * Until I pointed it out, you were responding to other review here as if
> the series was genuinely intended for (eventual) merge:
>
> - "This is a userspace-visible removal. Writes to
> /proc/sys/vm/watermark_boost_factor will now return -ENOENT instead of
> being accepted, breaking userspace." [2]
>
> <-: "I'll just drop this patch for now." [3]
>
> - "I left a small code nit inline, but whether you take that suggestion
> or leave it, you can add Reviewed-by: ..." [4]
>
> <-: "I sent it with this series mostly because it's needed to make the
> series work, and to provide context on why it's needed. I'm happy to
> resend it with a GFP mask passed in by each caller. That would look
> better, indeed!" [5]
>
> So to be concrete, if you send really rough code, Use [pre-RFC] or [DO NOT
> MERGE] (on the series as a whole) to make that clear and say so in the
> cover letter VERY VERY clearly.
Yes please. [POC NOT-FOR-MERGE] perhaps?
> Or, you can put it in a repo somewhere and link it in an email discussing
> the concepts (like I did with scalable CoW for instance).
Indeed.
> As above, firstly make it clear that the code you are sending for review is
> not to be reviewed so people don't waste highly contended maintainer time
> on that! :)
>
> Also, you didn't respond to my point regarding cc'ing the right people -
> but that's clearly something you need to get right if you want this kind of
> feedback to start with.
>
> For instance, you didn't cc- the page allocator maintainer (Vlastimil) on a
> series that is fundamentally changing the page allocator. That's not going
> to help with feedback.
Right! Thanks a lot for adding me, Lorenzo.
> In general, this area of the page allocator and compaction isn't my
> specialism in the kernel so I can't give you the in-depth feedback you need
> on that.
>
> But I do have thoughts in general as to how to achieve what you want here:
>
> Firstly - you should try to summarise what you're doing here and what
> you're changing alongside the trade-offs as clearly as you can in the cover
> letter.
>
> Then highlight what it is you need feedback on, broken out into clear
> questions or points that make it easy for people to respond to.
Yep.
> And _you have already done this_ in your reply here:
>
> * "How do people feel about splitting up the free lists, so each gigabyte
> (well, PUD sized) chunk of memory has its own free lists?"
My immediate response is that now we'd need to search multiple sets of lists
instead of a single one? What about the overhead?
Having a POC (even vibe-coded) for measuring that overhead might be actually
useful to quickly figure out whether the idea is viable or not.
But then the code doesn't need to be sent as a huge series if it's not for
review. As Lorenzo said, git repo link is enough.
> * "How can we balance the desire for higher-order kernel allocations,
> against the desire to preserve gigabyte sized chunks of memory that can
> be used for user space?"
>
> * "How do we balance the desire to keep compaction overhead low with the
> desire to do higher order allocations almost everywhere?"
How can we have a cake and eat it too? :)
> I think a really good way of doing this would be to start out with
> something like:
>
> Right now compaction often fails to achieve what we need, with
> fragmentation occurring anyway and (for instance) THP stalling on
> the availability of higher order folios.
>
> etc. etc.
>
> Summarising _the problem_.
>
> Then a section about your proposed solution, e.g.:
>
> I propose a means by which we proactively achieve gigabyte-sized
> pageblocks with logic which maintains these as physically
> contiguous under both ordinary and contended workloads
>
> Then list out the "secret sauce" of your approach, e.g.:
>
> This works by arranging memory such that unmovable allocations are
> grouped at <blah blah blah> etc.
>
> Then raise your questions e.g.:
>
> I'd like to ask the community - how do people feel about splitting
> up the free lists, so each gigabyte (well, PUD sized) chunk of
> memory has its own free lists? <etc. etc.>
>
> Then make it clear whether this is an RFC that is ready for primetime or
> not:
>
> This series is simply intended as a proof-of-concept - PLEASE DO
> NOT REVIEW THE CODE per-se, but rather comment on the concepts!
>
> (And obviously as above, if that _is_ what you intend, underline it with
> [DO NOT MERGE] or [pre-RFC] or something like that).
Ack.
> I'd also very strongly suggest (as I did in my original reply) breaking out
> parts that can be broken out as prerequisite series.
>
> If you're doing something good or useful _anyway_ then just send that
> separately first, and have later work rely on the earlier work.
Ack.
> There's no rush, this is huge and will take time.
>
> A final KEY point:
>
> NEVER submit unfiltered code generated by an LLM to the list in _any_
> form. If you want people to access code like that to test or something,
> then put it in a remote repo and link to it.
>
> The code is SO overly complicated and SO messy that it's really difficult
> for people to understand what's actually going on.
>
> At the heart of what you need here is CLARITY.
>
> You need to CLEARLY communicate what it is you're doing so busy maintainers
> can examine it. That's the _only_ way you're going to get something like
> this merged.
>
> The LLM-generated code is so awful that ain't nobody got the time to try to
> understand what it's doing.
Indeed.
> The workload for this really has to be on submitters, not maintainers.
>
> And what you've done, even if not intended, is workslopping, and that's
> really not acceptable. Quoting the kernel process on tool-generated content
> [6]:
>
> "If tools permit you to generate a contribution automatically, expect
> additional scrutiny in proportion to how much of it was generated.
>
> As with the output of any tooling, the result may be incorrect or
> inappropriate. You are expected to understand and to be able to defend
> everything you submit. If you are unable to do so, then do not submit the
> resulting changes.
>
> If you do so anyway, maintainers are entitled to reject your series without
> detailed review."
>
> As per this and my previous reply, AI slop doesn't scale, even as an RFC -
> I won't have time to reply like this in future, and we will just have to
> reject your series out of hand, which helps nobody.
True. Thanks a lot for going out of your way on this!
>>
>>
>> --
>> All Rights Reversed.
>
> Thanks, Lorenzo
>
> [0]:https://lore.kernel.org/all/20260520150018.2491267-41-riel@surriel.com/
> [1]:https://lore.kernel.org/linux-mm/20260616190300.1509639-1-riel@surriel.com/
> [2]:https://lore.kernel.org/all/20260526140204.1390573-1-usama.arif@linux.dev/
> [3]:https://lore.kernel.org/all/2ecf71858845e7d14c718b1a6845389cb78b986e.camel@surriel.com/
> [4]:https://lore.kernel.org/all/20260520174749.GA1458531@zen.localdomain/
> [5]:https://lore.kernel.org/all/daa29c92f055d028a5b3ec0e42cfb1ee1496a593.camel@surriel.com/
> [6]:https://docs.kernel.org/process/generated-content.html
^ permalink raw reply
* [PATCH] trace_branch: use per-cpu counters for correct/incorrect stats
From: Martin Weiss @ 2026-06-29 9:58 UTC (permalink / raw)
To: linux-kernel
Cc: Martin Weiss, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
linux-trace-kernel
Replace per-task counters with per-cpu increments to avoid race
conditions in the branch profiler fast path.
Fixes FIXME about atomicity.
Signed-off-by: Martin Weiss <Martin.weiss2410@gmail.com>
---
kernel/trace/trace_branch.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/kernel/trace/trace_branch.c b/kernel/trace/trace_branch.c
index d8e97ad798f0..960bcb3d7dbf 100644
--- a/kernel/trace/trace_branch.c
+++ b/kernel/trace/trace_branch.c
@@ -213,11 +213,11 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
*/
trace_likely_condition(f, val, expect);
- /* FIXME: Make this atomic! */
+ /* use per-cpu counters to avoid contention */
if (val == expect)
- f->data.correct++;
+ this_cpu_inc(f->data.correct);
else
- f->data.incorrect++;
+ this_cpu_inc(f->data.incorrect);
user_access_restore(flags);
}
--
2.54.0
^ permalink raw reply related
* Re: [PATCH v8 23/46] KVM: TDX: Make source page optional for KVM_TDX_INIT_MEM_REGION
From: Yan Zhao @ 2026-06-29 9:40 UTC (permalink / raw)
To: Ackerley Tng
Cc: Sean Christopherson, aik, andrew.jones, binbin.wu, brauner,
chao.p.peng, david, jmattson, jthoughton, michael.roth, oupton,
pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
steven.price, tabba, willy, wyihan, forkloop, pratyush,
suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
Kemeng Shi, Nhat Pham, Barry Song, Axel Rasmussen, Yuanchu Xie,
Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt, Kiryl Shutsemau,
Baoquan He, Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
linux-coco
In-Reply-To: <CAEvNRgHb6WmOha6Pct_Tn8Ucuov95L=fj5=2R9gcHfx=b2V_+A@mail.gmail.com>
On Fri, Jun 26, 2026 at 08:28:32AM -0700, Ackerley Tng wrote:
> Yan Zhao <yan.y.zhao@intel.com> writes:
>
> > On Thu, Jun 25, 2026 at 05:07:23PM -0700, Ackerley Tng wrote:
> >> Yan Zhao <yan.y.zhao@intel.com> writes:
> >>
> >> > On Wed, Jun 24, 2026 at 04:00:32PM -0700, Ackerley Tng wrote:
> >> >> Sean Christopherson <seanjc@google.com> writes:
> >> >>
> >> >> > On Tue, Jun 23, 2026, Yan Zhao wrote:
> >> >> >> On Tue, Jun 23, 2026 at 01:16:14PM +0800, Yan Zhao wrote:
> >> >> >> > On Mon, Jun 22, 2026 at 06:22:45PM -0700, Sean Christopherson wrote:
> >> >> >> > > On Mon, Jun 22, 2026, Yan Zhao wrote:
> >> >> >> > > > On Thu, Jun 18, 2026 at 05:32:00PM -0700, Ackerley Tng via B4 Relay wrote:
> >> >> >> > > > > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> >> >> >> > > > > index ffe9d0db58c59..56d10333c61a7 100644
> >> >> >> > > > > --- a/arch/x86/kvm/vmx/tdx.c
> >> >> >> > > > > +++ b/arch/x86/kvm/vmx/tdx.c
> >> >> >> > > > > @@ -3198,8 +3198,12 @@ static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
> >> >> >> > > > > if (KVM_BUG_ON(kvm_tdx->page_add_src, kvm))
> >> >> >> > > > > return -EIO;
> >> >> >> > > > >
> >> >> >> > > > > - if (!src_page)
> >> >> >> > > > > - return -EOPNOTSUPP;
> >> >> >> > > > > + if (!src_page) {
> >> >> >> > > > > + if (!gmem_in_place_conversion)
> >> >> >> > > > When userspace turns on gmem_in_place_conversion while creating guest_memfd
> >> >> >> > > > without the MMAP flag, the absence of src_page should still be treated as an
> >> >> >> > > > error.
> >> >> >> > >
> >> >> >> > > Why MMAP?
> >> >> >> > Hmm, I was showing a scenario that in-place conversion couldn't occur.
> >> >> >> > I didn't mean that with the MMAP flag, mmap() and user write must occur.
> >> >> >> >
> >> >> >> > > Shouldn't this be a general "if (!src_page && !up-to-date)"? Just
> >> >> >> > > because userspace _can_ mmap() the memory doesn't mean userspace _has_ mmap()'d
> >> >> >> > > and written memory. And when write() lands, MMAP wouldn't be necessary to
> >> >> >> > > initialize the memory.
> >> >> >> > Do you mean using up-to-date flag as below?
> >> >> >
> >> >> > Yes? I didn't actually look at the implementation details.
> >> >> >
> >> >> >> > if (!src_page) {
> >> >> >> > src_page = pfn_to_page(pfn);
> >> >> >> > if (!folio_test_uptodate(page_folio(src_page)))
> >> >> >> > return -EOPNOTSUPP;
> >> >> >> > }
> >> >>
> >> >> Yan is right that with the earlier patch "Zero page while getting pfn",
> >> >> folio_test_uptodate() here will always return true.
> >> >>
> >> >> Actually, this is an alternative fix for the issue Sashiko pointed out
> >> >> on v7 where userspace can do a populate() (either TDX or SNP) without
> >> >> first allocating the page, with src_address == NULL, and leak
> >> >> uninitialized memory into the guest.
> >> >>
> >> >> Advantage of using the uptodate check in populate: if the host never
> >> >> allocates the page, populate doesn't incur zeroing before writing the
> >> >> page anyway in populate().
> >> >>
> >> >> Disadvantage: Both TDX and SNP will have to implement this uptodate
> >> >> check. guest_memfd can't check centrally because for SNP, for a
> >> >> PAGE_TYPE_ZERO, !src_page should be allowed with a !uptodate page since
> >> >> firmware will zero and there's no leakage of uninitialized host memory?
> >> > Another disadvantage: the uptodate flag is per-folio. What if the folio
> >> > is only partially initialized by the userspace especially after huge page is
> >> > supported?
> >> >
> >>
> >> Good point on huge pages!
> >>
> >> The uptodate flag on the folio in guest_memfd means "this folio has been
> >> written to". As of now (before patch at [1]), this happens when
> >>
> >> + folio is zeroed on first use by userspace
> >> + folio is zeroed on first use of the guest
> >> + folio is populated
> >>
> >> When huge pages are supported, the folio can't partially be initialized?
> >>
> >> On allocation, if any part is shared, we split the page. The parts are
> >> separate folios that have their own uptodate flags.
> >>
> >> On splitting, if the huge page is uptodate, the split pages will also be
> >> uptodate. If the huge page is not uptodate, the split pages won't be
> >> uptodate, but that's ok since they will be marked uptodate on first use.
> >>
> >> On merging, the non-uptodate parts have to be zeroed and then marked
> > If that's true, it would be good.
> >
> >> uptodate. Any parts that are in use would have been marked uptodate
> >> already, so there's no overwriting data that is in use. I'll need to
> >> think more about when it's safe to zero.
> >>
> >> I'm still on the fence between the two options
> >>
> >> 1. Using uptodate check in populate to reject src_pages that have never
> >> been written to or
> >> 2. Always zero before populate
> > 2 does not work?
> > The flow is
> > 1. mmap gmem_fd, make GFN shared, and write initial content.
> > 2. convert GFN to private
> > 3. invoke ioctl to trigger populate.
> >
>
> This flow is correct, is what users of in-place conversion should do.
>
> "Always" is the wrong word, I should have said "zero if not uptodate
> before populate", as in, with patch at [1].
>
> By doing the zeroing in __kvm_gmem_get_pfn instead, by the time populate
> gets the pfn, the page would be zeroed, either because userspace faulted
> it in, and the zeroing happened in kvm_gmem_fault_user_mapping(), or if
> userspace never faulted it in, the zeroing would happen because
> populate() allocated the page.
I see.
> >> but whether the uptodate flag is per-folio or not doesn't affect these
> >> two options in terms of fixing the leak of uninitialized host memory,
> >> right?
> > yes, provided "On merging, the non-uptodate parts have to be zeroed and then
> > marked uptodate".
> >
>
> Thank you so much for bringing this up, I hadn't considered this
> before. I'll do that when I get to guest_memfd hugepage restructuring.
>
> >> >
> >> >> >> Another concern with this fix is that:
> >> >> >> commit "KVM: guest_memfd: Zero page while getting pfn" [1] always marks the
> >> >> >> folio uptodate before reaching post_populate().
> >> >> >>
> >> >> >> [1] https://lore.kernel.org/all/20260618-gmem-inplace-conversion-v8-21-9d2959357853@google.com/
> >> >> >>
> >> >> >> > One concern is that TDX now does not much care about the up-to-date flag since
> >> >> >> > TDX doesn't rely on the flag to clear pages on conversions.
> >> >> >> > I'm not sure if the flag can be reliably checked in this case. e.g.,
> >> >> >> > now the whole folio is marked up-to-date even if only part of it is faulted by
> >> >> >> > user access.
> >> >> >> > Ensuring that the up-to-date flag works correctly with huge page support seems
> >> >> >> > to have more effort than introducing a dedicated flag for TDX.
> >> >> >> >
> >> >> >> > > > Additionally, to properly enable in-place copying for the TDX initial memory
> >> >> >> > > > region, userspace must not only specify source_addr to NULL, but also follow
> >> >> >> > > > a specific sequence (where steps 1/2/3/7 are required only for in-place copy):
> >> >> >> > > > 1. create guest_memfd with MMAP flag
> >> >> >> > > > 2. mmap the guest_memfd.
> >> >> >> > > > 3. convert the initial memory range to shared.
> >> >> >> > > > 4. copy initial content to the source page.
> >> >> >> > > > 5. convert the initial memory range to private
> >> >> >> > > > 6. invoke ioctl KVM_TDX_INIT_MEM_REGION.
> >> >> >> > > > 7. do not unmap the source backend.
> >> >> >> > > >
> >> >> >> > > > So, would it be reasonable to introduce a dedicated flag that allows userspace
> >> >> >> > > > to explicitly opt into the in-place copy functionality? e.g.,
> >> >> >> > >
> >> >> >> > > Why? It's userspace's responsibility to get the above right. If userspace fails
> >> >> >> > > to provide a src_page when it doesn't want in-place copy, that's a userspace bug.
> >> >>
> >> >> Yan, is your concern that userspace forgot to update the code and
> >> >> forgets to provide a src_page, and if we keep the "Zero page while
> >> > Yes. Previously, it would be rejected after GUP fails.
> >> >
> >>
> >> I see, didn't realize previously it would be rejected because GUP
> >> fails. GUP failed because it wasn't faulted into the host?
> > GUP fails if 0 is not a valid user address.
> > But GUP would not fail if 0 is a valid address. e.g., in below scenario:
> >
> > #include <sys/mman.h>
> > #include <stdio.h>
> > int main(void)
> > {
> > void *p=mmap((void*)0,4096,PROT_READ|PROT_WRITE, MAP_FIXED|MAP_PRIVATE|MAP_ANONYMOUS,-1,0);
> > if (p==MAP_FAILED) {
> > perror("mmap");
> > return 1;
> > }
> > *(char*)0='Y';
> > printf("addr0=%p val=%c\n",p,*(char*)0);
> > return 0;
> > }
> >
> >
> >> That's kind of orthogonal, I don't think GUP fail leading to rejecting
> >> populate was meant to help userspace catch these issues. GUP would also
> >> fail if the user did mmap(), write to it, unmap using
> >> madvise(MADV_DONTNEED), then forget and pass 0 as src_address.
> > The original uAPI did not explicitly define 0 as an invalid uaddr. Whether 0 was
> > rejected depended on whether the user mmap()'d address 0. If 0 was a valid
> > mapping, populate() could proceed.
> >
> > commit 2a62345b3052 ("KVM: guest_memfd: GUP source pages prior to populating
> > guest memory") changed the behavior though. It would return -EOPNOTSUPP for a 0
> > uaddr.
> >
>
> I see, I only looked at this after commit 2a62345b3052.
>
> > But if a user configures 0 uaddr as valid, writes to it, and then passes 0 as
> > source_addr(not from gmem), I'm not sure if it's good for the kernel to silently
> > treat 0 uaddr as an identifier for in-place copy from the private PFN in gmem.
> >
>
> I'd say the original uAPI perhaps just didn't document 0 as an
> unsupported uaddr. Given that commit 2a62345b3052 already merged, uAPI
> was perhaps accidentally changed and no customer complained, I think we
> can move forward with 0 as an invalid src_address? I wouldn't think
> anyone relies on 0 intentionally being a valid address.
>
> I could document that, if it helps?
What about just documenting that 0 is an unsupported uaddr which will be
re-purposed as an indicator to use the target pfn as the source, regardless of
whether gmem_in_place_conversion is true? i.e.,
if (!src_page)
src_page = pfn_to_page(pfn);
I don't get why the two scenarios should be treated differently:
1. gmem_in_place_conversion==true, shared memory is not from gmem
2. gmem_in_place_conversion==false, shared memory is not from gmem
In both cases, a 0 uaddr could be mapped to a valid page not from gmem.
So why not update the uAPI to handle both cases consistently? :)
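
The policies being compared can be modelled in a few lines of standalone C.
This is a sketch, not the KVM code: struct page is a stand-in,
pick_src_page() and its fallback_always switch are hypothetical, and the
-EOPNOTSUPP branch for the non-in-place case is an assumption based on the
(truncated) hunk quoted earlier in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct page { int id; };	/* stand-in for the kernel's struct page */

/*
 * Choose the page to copy from.
 *
 * src_page is NULL when userspace passed a 0 source address; target
 * stands in for pfn_to_page(pfn).
 *
 * fallback_always == true models the suggestion above: a NULL source
 * always means "use the target page". Otherwise the fallback is taken
 * only when in_place is set, and a NULL source is rejected if it is not.
 */
static int pick_src_page(struct page *src_page, struct page *target,
			 bool in_place, bool fallback_always,
			 struct page **out)
{
	assert(target && out);

	if (src_page) {
		*out = src_page;
		return 0;
	}
	if (!fallback_always && !in_place)
		return -EOPNOTSUPP;	/* assumed behaviour, see lead-in */

	*out = target;
	return 0;
}
```

With fallback_always set, the in_place argument no longer influences the
result, which is the consistency being asked for.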
> >> >> getting pfn" patch, ends up with the guest silently having a zero page?
> >> >> I think that would be found quite early in userspace VMM testing...
> >> > I actually encountered this during testing this patch.
> >> > I updated most code paths to follow this sequence. However, some corner ones
> >> > remain for TDVF HOB, which are less obvious and harder to update.
> >> > The TD just booted up and hung silently.
> >> >
> >>
> >> I think this is just the life of a close-to-hardware software engineer
> >> :P no errors, got stuck somewhere, root cause is some uninitialized
> >> thing.
> >>
> >> >> >> > I mean if userspace specifies a NULL source_addr by mistake, it's better for
> >> >> >> > kernel to detect this mistake, similar to how it validates whether source_addr
> >> >> >> > is PAGE_ALIGNED.
> >> >> >
> >> >> > The alignment case is different. If userspace provides an unaligned value, KVM
> >> >> > *can't* do what userspace is asking because hardware and thus KVM only supports
> >> >> > converting on page boundaries.
> >> >> >
> >> >> > For a NULL source, KVM can still do what userspace is asking. Rejecting userspace's
> >> >> > request would then be making assumptions about what userspace wants.
> >> >> >
> >> >>
> >> >> Also, +1 on this, what if userspace, knowing that pages are zeroed on
> >> >> allocation, actually wants to rely on that to get a zero page in the guest?
> >> > What if 0 uaddr is a valid address? :)
> >> >
> >> >> >> > Since userspace already needs to perform additional steps to enable in-place
> >> >> >> > copy, specifying a dedicated flag to indicate that the NULL source_addr is
> >> >> >> > intentional seems like a reasonable burden.
> >> >> >
> >> >> > I don't see how it adds any value. I wouldn't be at all surprised if most VMMs
> >> >> > just end up with code that does:
> >> >> >
> >> >> > if (in-place) {
> >> >> > src = NULL;
> >> >> > flags |= KVM_TDX_IN_PLACE_COPY_INITIAL_MEMORY_REGION;
> >> >> > }
> >> >>
* Re: [RFC PATCH 00/40] mm: reliable 1GB page allocation
From: Lorenzo Stoakes @ 2026-06-29 9:29 UTC (permalink / raw)
To: Rik van Riel
Cc: linux-kernel, kernel-team, linux-mm, david, willy, surenb, hannes,
ziy, usama.arif, fvdl, Andrew Morton, Jonathan Corbet,
Chris Mason, David Sterba, Vlastimil Babka, Steven Rostedt,
Masami Hiramatsu, Rafael J. Wysocki, Oscar Salvador,
Mike Rapoport, linux-doc, linux-btrfs, linux-trace-kernel,
linux-pm, linux-cxl, Linus Torvalds
In-Reply-To: <528e3a5fbc27c9dc7a098121c32b7679b4c9962a.camel@surriel.com>
TL;DR - please don't send unfiltered LLM code to list _at all_. If you want
to share it, link to a repo.
On Sat, Jun 27, 2026 at 09:36:51AM -0400, Rik van Riel wrote:
> That is the one reason I sent out RFC code before it
> is ready. I am looking for feedback on the concepts
> in this series.
...
> Once I know what I need to do, coming up with a
> cleaner implementation is very doable.
...
> The mess in the RFC is the result of trying something
> that seemed right, watching it fail in some subtle
> way, and trying to fix it up.
...
> > But the execution has to be _completely_ rethought.
>
> There's no argument there.
...
> > Another issue here is maintainer time - even this _extremely_
> > light-touch review has taken me a few hours (of my weekend :). To
> > review it in detail would take probably DAYS of dedicated work.
>
> I suspect there is a mismatch in expectations here.
>
> I already knew this code has to be totally redone.
I'm glad we are in agreement on this :)
But in general I feel you have sent this and at least one other series like this
without being as clear as you should have been.
I hate to belabour the point but just to be clear:
* You label one patch [DO-NOT-MERGE], but none of the others (implying they
are candidates for being merged) [0] and the cover letter has TODOs,
including trivia like naming, but nothing about the code.
* You sent a non-RFC series with identical code quality issues [1]
recently.
* Until I pointed it out, you were responding to other review here as if
the series was genuinely intended for (eventual) merge:
- "This is a userspace-visible removal. Writes to
/proc/sys/vm/watermark_boost_factor will now return -ENOENT instead of
being accepted, breaking userspace." [2]
<-: "I'll just drop this patch for now." [3]
- "I left a small code nit inline, but whether you take that suggestion
or leave it, you can add Reviewed-by: ..." [4]
<-: "I sent it with this series mostly because it's needed to make the
series work, and to provide context on why it's needed. I'm happy to
resend it with a GFP mask passed in by each caller. That would look
better, indeed!" [5]
So to be concrete, if you send really rough code, use [pre-RFC] or [DO NOT
MERGE] (on the series as a whole) to make that clear and say so in the
cover letter VERY VERY clearly.
Or, you can put it in a repo somewhere and link it in an email discussing
the concepts (like I did with scalable CoW for instance).
Also if people respond to the series as if it isn't pre-RFC, I'd suggest in
your replies saying something like 'I intend to completely rework all this
anyway' or something like that! :)
> How do people feel about splitting up the free lists,
> so each gigabyte (well, PUD sized) chunk of memory
> has its own free lists?
>
> How can we balance the desire for higher-order kernel
> allocations, against the desire to preserve gigabyte
> sized chunks of memory that can be used for user space?
...
> That's another big question. How do we balance the
> desire to keep compaction overhead low with the desire
> to do higher order allocations almost everywhere?
>
> >
...
>
> I am just hoping to figure out what I should be
> doing on a conceptual level, before figuring out
> how to do it cleanly.
>
...
>
> I was looking for feedback on the basic concepts
> and design in the patch series, but failed to
> clearly communicate that.
>
> You provided some detailed feedback on the code,
> but as of yet nobody has really provided any
> opinions on things like whether it is desirable
> at all to have the free lists per gigablock,
> or whether we need to come up with some totally
> different approach.
>
> How do we better communicate that kind of thing
> in the future?
>
> Is that something to spell out more clearly in
> the cover letter?
>
> Is that kind of feedback something developers
> could even reasonably ask for? (if not, how do
> we figure out what maintainers want?)
As above, firstly make it clear that the code you are sending for review is
not to be reviewed so people don't waste highly contended maintainer time
on that! :)
Also, you didn't respond to my point regarding cc'ing the right people -
but that's clearly something you need to get right if you want this kind of
feedback to start with.
For instance, you didn't cc the page allocator maintainer (Vlastimil) on a
series that is fundamentally changing the page allocator. That's not going
to help with feedback.
In general, this area of the page allocator and compaction isn't my
specialism in the kernel so I can't give you the in-depth feedback you need
on that.
But I do have thoughts in general as to how to achieve what you want here:
Firstly - you should try to summarise what you're doing here and what
you're changing alongside the trade-offs as clearly as you can in the cover
letter.
Then highlight what it is you need feedback on, broken out into clear
questions or points that make it easy for people to respond to.
And _you have already done this_ in your reply here:
* "How do people feel about splitting up the free lists, so each gigabyte
(well, PUD sized) chunk of memory has its own free lists?"
* "How can we balance the desire for higher-order kernel allocations,
against the desire to preserve gigabyte sized chunks of memory that can
be used for user space?"
* "How do we balance the desire to keep compaction overhead low with the
desire to do higher order allocations almost everywhere?"
I think a really good way of doing this would be to start out with
something like:

    Right now compaction often fails to achieve what we need, with
    fragmentation occurring anyway and (for instance) THP stalling on
    the availability of higher order folios.

    etc. etc.

Summarising _the problem_.

Then a section about your proposed solution, e.g.:

    I propose a means by which we proactively achieve gigabyte-sized
    pageblocks with logic which maintains these as physically
    contiguous under both ordinary and contended workloads

Then list out the "secret sauce" of your approach, e.g.:

    This works by arranging memory such that unmovable allocations are
    grouped at <blah blah blah> etc.

Then raise your questions e.g.:

    I'd like to ask the community - how do people feel about splitting
    up the free lists, so each gigabyte (well, PUD sized) chunk of
    memory has its own free lists? <etc. etc.>

Then make it clear whether this is an RFC that is ready for primetime or
not:

    This series is simply intended as a proof-of-concept - PLEASE DO
    NOT REVIEW THE CODE per-se, but rather comment on the concepts!

(And obviously as above, if that _is_ what you intend, underline it with
[DO NOT MERGE] or [pre-RFC] or something like that).
I'd also very strongly suggest (as I did in my original reply) breaking out
parts that can be broken out as prerequisite series.
If you're doing something good or useful _anyway_ then just send that
separately first, and have later work rely on the earlier work.
There's no rush, this is huge and will take time.
A final KEY point:
NEVER submit unfiltered code generated by an LLM to the list in _any_
form. If you want people to access code like that to test or something,
then put it in a remote repo and link to it.
The code is SO overly complicated and SO messy that it's really difficult
for people to understand what's actually going on.
At the heart of what you need here is CLARITY.
You need to CLEARLY communicate what it is you're doing so busy maintainers
can examine it. That's the _only_ way you're going to get something like
this merged.
The LLM-generated code is so awful that ain't nobody got the time to try to
understand what it's doing.
The workload for this really has to be on submitters, not maintainers.
And what you've done, even if not intended, is workslopping, and that's
really not acceptable. Quoting the kernel process on tool-generated content
[6]:
"If tools permit you to generate a contribution automatically, expect
additional scrutiny in proportion to how much of it was generated.
As with the output of any tooling, the result may be incorrect or
inappropriate. You are expected to understand and to be able to defend
everything you submit. If you are unable to do so, then do not submit the
resulting changes.
If you do so anyway, maintainers are entitled to reject your series without
detailed review."
As per this and my previous reply, AI slop doesn't scale, even as an RFC -
I won't have time to reply like this in future, and we will just have to
reject your series out of hand, which helps nobody.
>
>
> --
> All Rights Reversed.
Thanks, Lorenzo
[0]:https://lore.kernel.org/all/20260520150018.2491267-41-riel@surriel.com/
[1]:https://lore.kernel.org/linux-mm/20260616190300.1509639-1-riel@surriel.com/
[2]:https://lore.kernel.org/all/20260526140204.1390573-1-usama.arif@linux.dev/
[3]:https://lore.kernel.org/all/2ecf71858845e7d14c718b1a6845389cb78b986e.camel@surriel.com/
[4]:https://lore.kernel.org/all/20260520174749.GA1458531@zen.localdomain/
[5]:https://lore.kernel.org/all/daa29c92f055d028a5b3ec0e42cfb1ee1496a593.camel@surriel.com/
[6]:https://docs.kernel.org/process/generated-content.html