* [PATCH] samples/ftrace: Prevent division by zero when nr_function_calls is zero
From: Samuel Moelius @ 2026-06-29 15:26 UTC
To: Steven Rostedt
Cc: Samuel Moelius, Masami Hiramatsu, Mark Rutland,
open list:FUNCTION HOOKS (FTRACE),
open list:FUNCTION HOOKS (FTRACE)
The ftrace-ops sample exposes nr_function_calls as a module parameter
and uses it as the divisor when printing the measured time per call.
Loading the module with nr_function_calls=0 skips the benchmark loop and
then divides the elapsed time by zero, crashing the kernel during sample
module initialization.
Keep accepting the parameter value, but when the call count is zero,
report -1LL as the per-call duration instead of dividing by it. Because
the message uses %llu, this value prints as 18446744073709551615 (U64_MAX).
Assisted-by: Codex:gpt-5.5-cyber-preview
Signed-off-by: Samuel Moelius <sam.moelius@trailofbits.com>
---
samples/ftrace/ftrace-ops.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/samples/ftrace/ftrace-ops.c b/samples/ftrace/ftrace-ops.c
index 68d6685c80bd..e6c07da407cc 100644
--- a/samples/ftrace/ftrace-ops.c
+++ b/samples/ftrace/ftrace-ops.c
@@ -223,7 +223,7 @@ static int __init ftrace_ops_sample_init(void)
pr_info("Attempted %u calls to %ps in %lluns (%lluns / call)\n",
nr_function_calls, tracee_relevant,
- period, div_u64(period, nr_function_calls));
+ period, nr_function_calls ? div_u64(period, nr_function_calls) : -1LL);
if (persist)
return 0;
--
2.43.0
* Re: [PATCH] samples/ftrace: reject zero ftrace-ops call count
From: Samuel Moelius @ 2026-06-29 15:23 UTC
To: Steven Rostedt
Cc: Masami Hiramatsu, Mark Rutland, open list:FUNCTION HOOKS (FTRACE),
open list:FUNCTION HOOKS (FTRACE)
In-Reply-To: <20260610200336.3c4d32c3@robin>
On Wed, Jun 10, 2026 at 8:03 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Tue, 9 Jun 2026 07:26:27 -0400
> Samuel Moelius <sam.moelius@trailofbits.com> wrote:
>
> > Is it okay to keep the same subject line or should I change it?
>
> Yeah, and also note that the tracing subsystem uses capital letters:
>
> samples/ftrace: Reject zero ftrace-ops call count
>
> But you can change it to:
>
> samples/ftrace: Prevent division by zero when nr_function_calls is zero
I will submit a new patch with that subject line.
* Re: [PATCH 19/30] mm: use linear_page_[index, delta]() consistently
From: Lorenzo Stoakes @ 2026-06-29 14:56 UTC
To: Thomas Zimmermann
Cc: Andrew Morton, Russell King, Dinh Nguyen, Simon Schuster,
James E . J . Bottomley, Helge Deller, Jarkko Sakkinen,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
Ian Abbott, H Hartley Sweeten, Lucas Stach, David Airlie,
Simona Vetter, Patrik Jakobsson, Maarten Lankhorst, Maxime Ripard,
Rob Clark, Dmitry Baryshkov, Tomi Valkeinen, Thierry Reding,
Mikko Perttunen, Jonathan Hunter, Christian Koenig, Huang Rui,
Ankit Agrawal, Alex Williamson, Alexander Viro, Christian Brauner,
Dan Williams, Muchun Song, Oscar Salvador, David Hildenbrand,
Suren Baghdasaryan, Liam R . Howlett, Matthew Wilcox,
Marek Szyprowski, Peter Zijlstra, Arnaldo Carvalho de Melo,
Namhyung Kim, Masami Hiramatsu, Oleg Nesterov, Steven Rostedt,
SeongJae Park, Miaohe Lin, Hugh Dickins, Mike Rapoport, Kees Cook,
Paolo Bonzini, linux-kernel, linux-arm-kernel, linux-parisc,
linux-sgx, etnaviv, dri-devel, linux-arm-msm, freedreno,
linux-tegra, kvm, linux-fsdevel, nvdimm, linux-mm, iommu,
linux-perf-users, linux-trace-kernel, kasan-dev, damon,
Pedro Falcato, Rik van Riel, Harry Yoo, Jann Horn
In-Reply-To: <21c4d96a-cd1b-4c65-8a66-2223df3b6109@suse.de>
On Mon, Jun 29, 2026 at 03:56:33PM +0200, Thomas Zimmermann wrote:
> Hi
>
> On 29.06.26 at 14:23, Lorenzo Stoakes wrote:
> > There are a number of places where we open code what linear_page_index()
> > and linear_page_delta() calculate.
> >
> > Replace this code with the appropriate functions for consistency.
> >
> > No functional change intended.
> >
> > Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
>
> For the DRM changes:
>
> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Thanks!
>
> See below for two additional comments.
>
>
> > ---
> > arch/arm/mm/fault-armv.c | 2 +-
> > arch/x86/kernel/cpu/sgx/virt.c | 3 ++-
> > drivers/comedi/comedi_fops.c | 3 ++-
> > drivers/gpu/drm/etnaviv/etnaviv_gem.c | 3 ++-
> > drivers/gpu/drm/gma500/gem.c | 2 +-
> > drivers/gpu/drm/msm/msm_gem.c | 3 ++-
> > drivers/gpu/drm/omapdrm/omap_gem.c | 5 +++--
> > drivers/gpu/drm/tegra/gem.c | 3 ++-
> > drivers/gpu/drm/ttm/ttm_bo_vm.c | 7 ++++---
> > drivers/vfio/pci/nvgrace-gpu/main.c | 3 ++-
> > drivers/vfio/pci/vfio_pci_core.c | 3 ++-
> > mm/nommu.c | 2 +-
> > mm/vma.c | 2 +-
> > virt/kvm/guest_memfd.c | 2 +-
> > 14 files changed, 26 insertions(+), 17 deletions(-)
> >
>
> [...]
>
> > #include <linux/io.h>
> > #include <linux/uaccess.h>
> > @@ -2462,7 +2463,7 @@ static int comedi_vm_access(struct vm_area_struct *vma, unsigned long addr,
> > {
> > struct comedi_buf_map *bm = vma->vm_private_data;
> > unsigned long offset =
> > - addr - vma->vm_start + (vma->vm_pgoff << PAGE_SHIFT);
> > + addr - vma->vm_start + (vma_start_pgoff(vma) << PAGE_SHIFT);
>
> This doesn't seem to belong here.
Ah yeah, I'll move that on a respin, thanks!
>
> > if (len < 0)
> > return -EINVAL;
> > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> > index b0436a1e103f..2e4d6d117ee2 100644
> > --- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> > +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> > @@ -6,6 +6,7 @@
> > #include <drm/drm_prime.h>
> > #include <drm/drm_print.h>
> > #include <linux/dma-mapping.h>
> > +#include <linux/pagemap.h>
> > #include <linux/shmem_fs.h>
> > #include <linux/spinlock.h>
> > #include <linux/vmalloc.h>
> > @@ -188,7 +189,7 @@ static vm_fault_t etnaviv_gem_fault(struct vm_fault *vmf)
> > }
> > /* We don't use vmf->pgoff since that has the fake offset: */
> > - pgoff = (vmf->address - vma->vm_start) >> PAGE_SHIFT;
> > + pgoff = linear_page_delta(vma, vmf->address);
> > pfn = page_to_pfn(pages[pgoff]);
> > diff --git a/drivers/gpu/drm/gma500/gem.c b/drivers/gpu/drm/gma500/gem.c
> > index 88f1e86c8903..2708e8c68f4c 100644
> > --- a/drivers/gpu/drm/gma500/gem.c
> > +++ b/drivers/gpu/drm/gma500/gem.c
> > @@ -288,7 +288,7 @@ static vm_fault_t psb_gem_fault(struct vm_fault *vmf)
> > /* Page relative to the VMA start - we must calculate this ourselves
> > because vmf->pgoff is the fake GEM offset */
> > - page_offset = (vmf->address - vma->vm_start) >> PAGE_SHIFT;
> > + page_offset = linear_page_delta(vma, vmf->address);
> > /* CPU view of the page, don't go via the GART for CPU writes */
> > if (pobj->stolen)
> > diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
> > index efd3d3c9a449..cbf723a5d86f 100644
> > --- a/drivers/gpu/drm/msm/msm_gem.c
> > +++ b/drivers/gpu/drm/msm/msm_gem.c
> > @@ -9,6 +9,7 @@
> > #include <linux/spinlock.h>
> > #include <linux/shmem_fs.h>
> > #include <linux/dma-buf.h>
> > +#include <linux/pagemap.h>
> > #include <drm/drm_dumb_buffers.h>
> > #include <drm/drm_prime.h>
> > @@ -360,7 +361,7 @@ static vm_fault_t msm_gem_fault(struct vm_fault *vmf)
> > }
> > /* We don't use vmf->pgoff since that has the fake offset: */
> > - pgoff = (vmf->address - vma->vm_start) >> PAGE_SHIFT;
> > + pgoff = linear_page_delta(vma, vmf->address);
> > pfn = page_to_pfn(pages[pgoff]);
> > diff --git a/drivers/gpu/drm/omapdrm/omap_gem.c b/drivers/gpu/drm/omapdrm/omap_gem.c
> > index 8e013e4f2c6b..00404fb6c29a 100644
> > --- a/drivers/gpu/drm/omapdrm/omap_gem.c
> > +++ b/drivers/gpu/drm/omapdrm/omap_gem.c
> > @@ -5,6 +5,7 @@
> > */
> > #include <linux/dma-mapping.h>
> > +#include <linux/pagemap.h>
> > #include <linux/seq_file.h>
> > #include <linux/shmem_fs.h>
> > #include <linux/spinlock.h>
> > @@ -359,7 +360,7 @@ static vm_fault_t omap_gem_fault_1d(struct drm_gem_object *obj,
> > pgoff_t pgoff;
> > /* We don't use vmf->pgoff since that has the fake offset: */
> > - pgoff = (vmf->address - vma->vm_start) >> PAGE_SHIFT;
> > + pgoff = linear_page_delta(vma, vmf->address);
> > if (omap_obj->pages) {
> > omap_gem_cpu_sync_page(obj, pgoff);
> > @@ -407,7 +408,7 @@ static vm_fault_t omap_gem_fault_2d(struct drm_gem_object *obj,
> > const int m = DIV_ROUND_UP(omap_obj->width << fmt, PAGE_SIZE);
> > /* We don't use vmf->pgoff since that has the fake offset: */
> > - pgoff = (vmf->address - vma->vm_start) >> PAGE_SHIFT;
> > + pgoff = linear_page_delta(vma, vmf->address);
> > /*
> > * Actual address we start mapping at is rounded down to previous slot
> > diff --git a/drivers/gpu/drm/tegra/gem.c b/drivers/gpu/drm/tegra/gem.c
> > index 436394e04812..1d8d27a5ea89 100644
> > --- a/drivers/gpu/drm/tegra/gem.c
> > +++ b/drivers/gpu/drm/tegra/gem.c
> > @@ -13,6 +13,7 @@
> > #include <linux/dma-buf.h>
> > #include <linux/iommu.h>
> > #include <linux/module.h>
> > +#include <linux/pagemap.h>
> > #include <linux/vmalloc.h>
> > #include <drm/drm_drv.h>
> > @@ -564,7 +565,7 @@ static vm_fault_t tegra_bo_fault(struct vm_fault *vmf)
> > if (!bo->pages)
> > return VM_FAULT_SIGBUS;
> > - offset = (vmf->address - vma->vm_start) >> PAGE_SHIFT;
> > + offset = linear_page_delta(vma, vmf->address);
> > page = bo->pages[offset];
> > return vmf_insert_page(vma, vmf->address, page);
> > diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> > index a80510489c45..88babf435ac2 100644
> > --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
> > +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> > @@ -32,6 +32,7 @@
> > #define pr_fmt(fmt) "[TTM] " fmt
> > #include <linux/export.h>
> > +#include <linux/pagemap.h>
> > #include <drm/ttm/ttm_bo.h>
> > #include <drm/ttm/ttm_placement.h>
> > @@ -208,9 +209,9 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
> > if (unlikely(err != 0))
> > return VM_FAULT_SIGBUS;
> > - page_offset = ((address - vma->vm_start) >> PAGE_SHIFT) +
> > - vma->vm_pgoff - drm_vma_node_start(&bo->base.vma_node);
> > - page_last = vma_pages(vma) + vma->vm_pgoff -
> > + page_offset = linear_page_index(vma, address) -
> > + drm_vma_node_start(&bo->base.vma_node);
> > + page_last = vma_end_pgoff(vma) -
> > drm_vma_node_start(&bo->base.vma_node);
>
> Not your fault, but page_last seems misnamed here.
Yeah :)
>
> Best regards
> Thomas
>
> > if (unlikely(page_offset >= PFN_UP(bo->base.size)))
> > diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c
> > index d07dcacb76bd..963fd8ded20d 100644
> > --- a/drivers/vfio/pci/nvgrace-gpu/main.c
> > +++ b/drivers/vfio/pci/nvgrace-gpu/main.c
> > @@ -11,6 +11,7 @@
> > #include <linux/jiffies.h>
> > #include <linux/sched.h>
> > #include <linux/pci-p2pdma.h>
> > +#include <linux/pagemap.h>
> > #include <linux/pm_runtime.h>
> > #include <linux/memory-failure.h>
> > @@ -385,7 +386,7 @@ static unsigned long addr_to_pgoff(struct vm_area_struct *vma,
> > u64 pgoff = vma->vm_pgoff &
> > ((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
> > - return ((addr - vma->vm_start) >> PAGE_SHIFT) + pgoff;
> > + return linear_page_delta(vma, addr) + pgoff;
> > }
> > static vm_fault_t nvgrace_gpu_vfio_pci_huge_fault(struct vm_fault *vmf,
> > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> > index a28f1e99362c..55d4937d495a 100644
> > --- a/drivers/vfio/pci/vfio_pci_core.c
> > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > @@ -19,6 +19,7 @@
> > #include <linux/module.h>
> > #include <linux/mutex.h>
> > #include <linux/notifier.h>
> > +#include <linux/pagemap.h>
> > #include <linux/pci.h>
> > #include <linux/pm_runtime.h>
> > #include <linux/slab.h>
> > @@ -1727,7 +1728,7 @@ static vm_fault_t vfio_pci_mmap_huge_fault(struct vm_fault *vmf,
> > struct vm_area_struct *vma = vmf->vma;
> > struct vfio_pci_core_device *vdev = vma->vm_private_data;
> > unsigned long addr = vmf->address & ~((PAGE_SIZE << order) - 1);
> > - unsigned long pgoff = (addr - vma->vm_start) >> PAGE_SHIFT;
> > + unsigned long pgoff = linear_page_delta(vma, addr);
> > unsigned long pfn = vma_to_pfn(vma) + pgoff;
> > vm_fault_t ret = VM_FAULT_FALLBACK;
> > diff --git a/mm/nommu.c b/mm/nommu.c
> > index 60560b2c457e..7333d855e974 100644
> > --- a/mm/nommu.c
> > +++ b/mm/nommu.c
> > @@ -1332,7 +1332,7 @@ static int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > *region = *vma->vm_region;
> > new->vm_region = region;
> > - npages = (addr - vma->vm_start) >> PAGE_SHIFT;
> > + npages = linear_page_delta(vma, addr);
> > if (new_below) {
> > region->vm_top = region->vm_end = new->vm_end = addr;
> > diff --git a/mm/vma.c b/mm/vma.c
> > index ee3a8ca13d07..185d07397ca6 100644
> > --- a/mm/vma.c
> > +++ b/mm/vma.c
> > @@ -517,7 +517,7 @@ __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > new->vm_end = addr;
> > } else {
> > new->vm_start = addr;
> > - new->vm_pgoff += ((addr - vma->vm_start) >> PAGE_SHIFT);
> > + new->vm_pgoff += linear_page_delta(vma, addr);
> > }
> > err = -ENOMEM;
> > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > index db57c5766ab6..f0e5da490866 100644
> > --- a/virt/kvm/guest_memfd.c
> > +++ b/virt/kvm/guest_memfd.c
> > @@ -440,7 +440,7 @@ static int kvm_gmem_set_policy(struct vm_area_struct *vma, struct mempolicy *mpo
> > static struct mempolicy *kvm_gmem_get_policy(struct vm_area_struct *vma,
> > unsigned long addr, pgoff_t *ilx)
> > {
> > - pgoff_t pgoff = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
> > + pgoff_t pgoff = linear_page_index(vma, addr);
> > struct inode *inode = file_inode(vma->vm_file);
> > *ilx = inode->i_ino;
>
> --
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Frankenstr. 146, 90461 Nürnberg, Germany, www.suse.com
> GF: Jochen Jaser, Andrew McDonald, Werner Knoblich, (HRB 36809, AG Nürnberg)
>
>
Cheers, Lorenzo
* Re: [RFC PATCH 00/40] mm: reliable 1GB page allocation
From: Rik van Riel @ 2026-06-29 14:39 UTC
To: Vlastimil Babka (SUSE), Lorenzo Stoakes
Cc: linux-kernel, kernel-team, linux-mm, david, willy, surenb, hannes,
ziy, usama.arif, fvdl, Andrew Morton, Jonathan Corbet,
Chris Mason, David Sterba, Steven Rostedt, Masami Hiramatsu,
Rafael J. Wysocki, Oscar Salvador, Mike Rapoport, linux-doc,
linux-btrfs, linux-trace-kernel, linux-pm, linux-cxl,
Linus Torvalds
In-Reply-To: <361fd2e5-a5f9-42fe-90fc-bc0af109553e@kernel.org>
On Mon, 2026-06-29 at 12:03 +0200, Vlastimil Babka (SUSE) wrote:
> On 6/29/26 11:29, Lorenzo Stoakes wrote:
> >
> > So to be concrete, if you send really rough code, Use [pre-RFC] or
> > [DO NOT
> > MERGE] (on the series as a whole) to make that clear and say so in
> > the
> > cover letter VERY VERY clearly.
>
> Yes please. [POC NOT-FOR-MERGE] perhaps?
>
> > Or, you can put it in a repo somewhere and link it in an email
> > discussing
> > the concepts (like I did with scalable CoW for instance).
>
> Indeed.
I'll do that for the next version.
I suspect it will take a while to beat this thing
into shape.
>
> > And _you have already done this_ in your reply here:
> >
> > * "How do people feel about splitting up the free lists, so each
> > gigabyte
> > (well, PUD sized) chunk of memory has its own free lists?"
>
> My immediate response is that now we'd need to search multiple sets
> of lists
> instead of a single one? What about the overhead?
The current code is clearly not good enough. It
has to try several gigablocks almost blindly,
because there is no efficient way to find the
right gigablock.
I have an idea on how to fix that with bitmaps.
We could have one bitmap per order, indicating which
gigablocks have free order 0 pages, free order 1 pages,
etc. Then a second set of bitmaps would indicate which
gigablocks have unmovable / reclaimable pages.
At that point, finding a good gigablock to allocate
from can be done with a bitmap_and() and a search.
These bitmaps would only need to be changed when the
status of a gigablock changes, e.g. going from having
order 0 pages free to not having any order 0 pages
free.
Does that seem like a workable approach?
Once we can quickly pinpoint a gigablock for the
page allocator to grab pages from, we can also
split out the "pick a gigablock" code from the
"allocate a page" code.
>
> > * "How can we balance the desire for higher-order kernel
> > allocations,
> > against the desire to preserve gigabyte sized chunks of memory
> > that can
> > be used for user space?"
> >
> > * "How do we balance the desire to keep compaction overhead low
> > with the
> > desire to do higher order allocations almost everywhere?"
>
> How can we have a cake and eat it too? :)
Pretty much :/
I suspect it's going to require some fun interactions
between allocation, reclaim, and compaction.
However, with everybody from networking, to filesystems,
to anonymous memory wanting to use higher order allocations
of differing sizes, it seems like we're going to have to
tackle this somehow.
>
> > I'd also very strongly suggest (as I did in my original reply)
> > breaking out
> > parts that can be broken out as prerequisite series.
> >
> > If you're doing something good or useful _anyway_ then just send
> > that
> > separately first, and have later work rely on the earlier work.
>
That becomes cleaner with the "post a link to
a tree" thing, as well.
The pcpbuddy stuff is likely to go in separately.
Johannes is still working on that code.
The "make btrfs inode cache pages movable" thing
already went in.
I think I have a few more things in the tree that
can go in separately, but hopefully that will grow
as this code solidifies.
On the flip side, things like "making compaction
scale" may well end up depending on the gigablock
stuff, because lack of targeting data seems like a
likely cause for why compaction has to try so hard.
I'll make sure to go over every point raised by
you guys before writing the next version of the
code, and again before posting a link to the
tree.
--
All Rights Reversed.
* Re: [PATCH 19/30] mm: use linear_page_[index, delta]() consistently
From: Thomas Zimmermann @ 2026-06-29 13:56 UTC
To: Lorenzo Stoakes, Andrew Morton
Cc: Russell King, Dinh Nguyen, Simon Schuster,
James E . J . Bottomley, Helge Deller, Jarkko Sakkinen,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
Ian Abbott, H Hartley Sweeten, Lucas Stach, David Airlie,
Simona Vetter, Patrik Jakobsson, Maarten Lankhorst, Maxime Ripard,
Rob Clark, Dmitry Baryshkov, Tomi Valkeinen, Thierry Reding,
Mikko Perttunen, Jonathan Hunter, Christian Koenig, Huang Rui,
Ankit Agrawal, Alex Williamson, Alexander Viro, Christian Brauner,
Dan Williams, Muchun Song, Oscar Salvador, David Hildenbrand,
Suren Baghdasaryan, Liam R . Howlett, Matthew Wilcox,
Marek Szyprowski, Peter Zijlstra, Arnaldo Carvalho de Melo,
Namhyung Kim, Masami Hiramatsu, Oleg Nesterov, Steven Rostedt,
SeongJae Park, Miaohe Lin, Hugh Dickins, Mike Rapoport, Kees Cook,
Paolo Bonzini, linux-kernel, linux-arm-kernel, linux-parisc,
linux-sgx, etnaviv, dri-devel, linux-arm-msm, freedreno,
linux-tegra, kvm, linux-fsdevel, nvdimm, linux-mm, iommu,
linux-perf-users, linux-trace-kernel, kasan-dev, damon,
Pedro Falcato, Rik van Riel, Harry Yoo, Jann Horn
In-Reply-To: <bf56e2e98b512962a2fb88900d535a0e9e6769d8.1782735110.git.ljs@kernel.org>
Hi
On 29.06.26 at 14:23, Lorenzo Stoakes wrote:
> There are a number of places where we open code what linear_page_index()
> and linear_page_delta() calculate.
>
> Replace this code with the appropriate functions for consistency.
>
> No functional change intended.
>
> Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
For the DRM changes:
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
See below for two additional comments.
> ---
> arch/arm/mm/fault-armv.c | 2 +-
> arch/x86/kernel/cpu/sgx/virt.c | 3 ++-
> drivers/comedi/comedi_fops.c | 3 ++-
> drivers/gpu/drm/etnaviv/etnaviv_gem.c | 3 ++-
> drivers/gpu/drm/gma500/gem.c | 2 +-
> drivers/gpu/drm/msm/msm_gem.c | 3 ++-
> drivers/gpu/drm/omapdrm/omap_gem.c | 5 +++--
> drivers/gpu/drm/tegra/gem.c | 3 ++-
> drivers/gpu/drm/ttm/ttm_bo_vm.c | 7 ++++---
> drivers/vfio/pci/nvgrace-gpu/main.c | 3 ++-
> drivers/vfio/pci/vfio_pci_core.c | 3 ++-
> mm/nommu.c | 2 +-
> mm/vma.c | 2 +-
> virt/kvm/guest_memfd.c | 2 +-
> 14 files changed, 26 insertions(+), 17 deletions(-)
>
[...]
>
> #include <linux/io.h>
> #include <linux/uaccess.h>
> @@ -2462,7 +2463,7 @@ static int comedi_vm_access(struct vm_area_struct *vma, unsigned long addr,
> {
> struct comedi_buf_map *bm = vma->vm_private_data;
> unsigned long offset =
> - addr - vma->vm_start + (vma->vm_pgoff << PAGE_SHIFT);
> + addr - vma->vm_start + (vma_start_pgoff(vma) << PAGE_SHIFT);
This doesn't seem to belong here.
>
> if (len < 0)
> return -EINVAL;
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> index b0436a1e103f..2e4d6d117ee2 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> @@ -6,6 +6,7 @@
> #include <drm/drm_prime.h>
> #include <drm/drm_print.h>
> #include <linux/dma-mapping.h>
> +#include <linux/pagemap.h>
> #include <linux/shmem_fs.h>
> #include <linux/spinlock.h>
> #include <linux/vmalloc.h>
> @@ -188,7 +189,7 @@ static vm_fault_t etnaviv_gem_fault(struct vm_fault *vmf)
> }
>
> /* We don't use vmf->pgoff since that has the fake offset: */
> - pgoff = (vmf->address - vma->vm_start) >> PAGE_SHIFT;
> + pgoff = linear_page_delta(vma, vmf->address);
>
> pfn = page_to_pfn(pages[pgoff]);
>
> diff --git a/drivers/gpu/drm/gma500/gem.c b/drivers/gpu/drm/gma500/gem.c
> index 88f1e86c8903..2708e8c68f4c 100644
> --- a/drivers/gpu/drm/gma500/gem.c
> +++ b/drivers/gpu/drm/gma500/gem.c
> @@ -288,7 +288,7 @@ static vm_fault_t psb_gem_fault(struct vm_fault *vmf)
>
> /* Page relative to the VMA start - we must calculate this ourselves
> because vmf->pgoff is the fake GEM offset */
> - page_offset = (vmf->address - vma->vm_start) >> PAGE_SHIFT;
> + page_offset = linear_page_delta(vma, vmf->address);
>
> /* CPU view of the page, don't go via the GART for CPU writes */
> if (pobj->stolen)
> diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
> index efd3d3c9a449..cbf723a5d86f 100644
> --- a/drivers/gpu/drm/msm/msm_gem.c
> +++ b/drivers/gpu/drm/msm/msm_gem.c
> @@ -9,6 +9,7 @@
> #include <linux/spinlock.h>
> #include <linux/shmem_fs.h>
> #include <linux/dma-buf.h>
> +#include <linux/pagemap.h>
>
> #include <drm/drm_dumb_buffers.h>
> #include <drm/drm_prime.h>
> @@ -360,7 +361,7 @@ static vm_fault_t msm_gem_fault(struct vm_fault *vmf)
> }
>
> /* We don't use vmf->pgoff since that has the fake offset: */
> - pgoff = (vmf->address - vma->vm_start) >> PAGE_SHIFT;
> + pgoff = linear_page_delta(vma, vmf->address);
>
> pfn = page_to_pfn(pages[pgoff]);
>
> diff --git a/drivers/gpu/drm/omapdrm/omap_gem.c b/drivers/gpu/drm/omapdrm/omap_gem.c
> index 8e013e4f2c6b..00404fb6c29a 100644
> --- a/drivers/gpu/drm/omapdrm/omap_gem.c
> +++ b/drivers/gpu/drm/omapdrm/omap_gem.c
> @@ -5,6 +5,7 @@
> */
>
> #include <linux/dma-mapping.h>
> +#include <linux/pagemap.h>
> #include <linux/seq_file.h>
> #include <linux/shmem_fs.h>
> #include <linux/spinlock.h>
> @@ -359,7 +360,7 @@ static vm_fault_t omap_gem_fault_1d(struct drm_gem_object *obj,
> pgoff_t pgoff;
>
> /* We don't use vmf->pgoff since that has the fake offset: */
> - pgoff = (vmf->address - vma->vm_start) >> PAGE_SHIFT;
> + pgoff = linear_page_delta(vma, vmf->address);
>
> if (omap_obj->pages) {
> omap_gem_cpu_sync_page(obj, pgoff);
> @@ -407,7 +408,7 @@ static vm_fault_t omap_gem_fault_2d(struct drm_gem_object *obj,
> const int m = DIV_ROUND_UP(omap_obj->width << fmt, PAGE_SIZE);
>
> /* We don't use vmf->pgoff since that has the fake offset: */
> - pgoff = (vmf->address - vma->vm_start) >> PAGE_SHIFT;
> + pgoff = linear_page_delta(vma, vmf->address);
>
> /*
> * Actual address we start mapping at is rounded down to previous slot
> diff --git a/drivers/gpu/drm/tegra/gem.c b/drivers/gpu/drm/tegra/gem.c
> index 436394e04812..1d8d27a5ea89 100644
> --- a/drivers/gpu/drm/tegra/gem.c
> +++ b/drivers/gpu/drm/tegra/gem.c
> @@ -13,6 +13,7 @@
> #include <linux/dma-buf.h>
> #include <linux/iommu.h>
> #include <linux/module.h>
> +#include <linux/pagemap.h>
> #include <linux/vmalloc.h>
>
> #include <drm/drm_drv.h>
> @@ -564,7 +565,7 @@ static vm_fault_t tegra_bo_fault(struct vm_fault *vmf)
> if (!bo->pages)
> return VM_FAULT_SIGBUS;
>
> - offset = (vmf->address - vma->vm_start) >> PAGE_SHIFT;
> + offset = linear_page_delta(vma, vmf->address);
> page = bo->pages[offset];
>
> return vmf_insert_page(vma, vmf->address, page);
> diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> index a80510489c45..88babf435ac2 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> @@ -32,6 +32,7 @@
> #define pr_fmt(fmt) "[TTM] " fmt
>
> #include <linux/export.h>
> +#include <linux/pagemap.h>
>
> #include <drm/ttm/ttm_bo.h>
> #include <drm/ttm/ttm_placement.h>
> @@ -208,9 +209,9 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
> if (unlikely(err != 0))
> return VM_FAULT_SIGBUS;
>
> - page_offset = ((address - vma->vm_start) >> PAGE_SHIFT) +
> - vma->vm_pgoff - drm_vma_node_start(&bo->base.vma_node);
> - page_last = vma_pages(vma) + vma->vm_pgoff -
> + page_offset = linear_page_index(vma, address) -
> + drm_vma_node_start(&bo->base.vma_node);
> + page_last = vma_end_pgoff(vma) -
> drm_vma_node_start(&bo->base.vma_node);
Not your fault, but page_last seems misnamed here.
Best regards
Thomas
>
> if (unlikely(page_offset >= PFN_UP(bo->base.size)))
> diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c
> index d07dcacb76bd..963fd8ded20d 100644
> --- a/drivers/vfio/pci/nvgrace-gpu/main.c
> +++ b/drivers/vfio/pci/nvgrace-gpu/main.c
> @@ -11,6 +11,7 @@
> #include <linux/jiffies.h>
> #include <linux/sched.h>
> #include <linux/pci-p2pdma.h>
> +#include <linux/pagemap.h>
> #include <linux/pm_runtime.h>
> #include <linux/memory-failure.h>
>
> @@ -385,7 +386,7 @@ static unsigned long addr_to_pgoff(struct vm_area_struct *vma,
> u64 pgoff = vma->vm_pgoff &
> ((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
>
> - return ((addr - vma->vm_start) >> PAGE_SHIFT) + pgoff;
> + return linear_page_delta(vma, addr) + pgoff;
> }
>
> static vm_fault_t nvgrace_gpu_vfio_pci_huge_fault(struct vm_fault *vmf,
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index a28f1e99362c..55d4937d495a 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -19,6 +19,7 @@
> #include <linux/module.h>
> #include <linux/mutex.h>
> #include <linux/notifier.h>
> +#include <linux/pagemap.h>
> #include <linux/pci.h>
> #include <linux/pm_runtime.h>
> #include <linux/slab.h>
> @@ -1727,7 +1728,7 @@ static vm_fault_t vfio_pci_mmap_huge_fault(struct vm_fault *vmf,
> struct vm_area_struct *vma = vmf->vma;
> struct vfio_pci_core_device *vdev = vma->vm_private_data;
> unsigned long addr = vmf->address & ~((PAGE_SIZE << order) - 1);
> - unsigned long pgoff = (addr - vma->vm_start) >> PAGE_SHIFT;
> + unsigned long pgoff = linear_page_delta(vma, addr);
> unsigned long pfn = vma_to_pfn(vma) + pgoff;
> vm_fault_t ret = VM_FAULT_FALLBACK;
>
> diff --git a/mm/nommu.c b/mm/nommu.c
> index 60560b2c457e..7333d855e974 100644
> --- a/mm/nommu.c
> +++ b/mm/nommu.c
> @@ -1332,7 +1332,7 @@ static int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
> *region = *vma->vm_region;
> new->vm_region = region;
>
> - npages = (addr - vma->vm_start) >> PAGE_SHIFT;
> + npages = linear_page_delta(vma, addr);
>
> if (new_below) {
> region->vm_top = region->vm_end = new->vm_end = addr;
> diff --git a/mm/vma.c b/mm/vma.c
> index ee3a8ca13d07..185d07397ca6 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -517,7 +517,7 @@ __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
> new->vm_end = addr;
> } else {
> new->vm_start = addr;
> - new->vm_pgoff += ((addr - vma->vm_start) >> PAGE_SHIFT);
> + new->vm_pgoff += linear_page_delta(vma, addr);
> }
>
> err = -ENOMEM;
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index db57c5766ab6..f0e5da490866 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -440,7 +440,7 @@ static int kvm_gmem_set_policy(struct vm_area_struct *vma, struct mempolicy *mpo
> static struct mempolicy *kvm_gmem_get_policy(struct vm_area_struct *vma,
> unsigned long addr, pgoff_t *ilx)
> {
> - pgoff_t pgoff = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
> + pgoff_t pgoff = linear_page_index(vma, addr);
> struct inode *inode = file_inode(vma->vm_file);
>
> *ilx = inode->i_ino;
--
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstr. 146, 90461 Nürnberg, Germany, www.suse.com
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich, (HRB 36809, AG Nürnberg)
* Re: [PATCH] lib/bootconfig: fix undefined behavior involving NULL pointer arithmetic
From: Bradley Morgan @ 2026-06-29 13:53 UTC
To: Breno Leitao; +Cc: akpm, mhiramat, linux-kernel, linux-trace-kernel, stable
In-Reply-To: <akJ0f2gsiEt01spu@gmail.com>
On 29 June 2026 14:41:37 BST, Breno Leitao <leitao@debian.org> wrote:
>On Sun, Jun 28, 2026 at 11:56:16AM +0000, Bradley Morgan wrote:
>> When xbc_snprint_cmdline() is called during the size-probing phase
>> (with buf = NULL and size = 0), the function computes the end pointer
>> as 'buf + size' (NULL + 0) and repeatedly advances the pointer via
>> 'buf += ret'.
>>
>> Under the C standard, performing pointer arithmetic on a NULL pointer is
>> undefined behavior. While harmless inside the kernel, this code is also
>> compiled into the userspace host tool 'tools/bootconfig', where host
>> compilers with UBSan or FORTIFY_SOURCE enabled abort the build when they
>> detect NULL pointer arithmetic.
>>
>> Fix this by tracking the running written length as an integer offset
>> ('len') rather than advancing 'buf' directly. Only perform pointer
>> arithmetic if 'buf' is actually non-NULL.
>>
>> Fixes: 5a643e462323 ("bootconfig: move xbc_snprint_cmdline() to lib/bootconfig.c")
>
>Isn't commit 5a643e462323 ("bootconfig: move xbc_snprint_cmdline() to
>lib/bootconfig.c") just a code movement?
Ugh, Gemini's bullcrap; you are right. I should have just looked up
the Fixes: tag manually (as I always do).
>> xbc_node_for_each_key_value(root, knode, val) {
>> @@ -439,10 +437,12 @@ int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
>>
>> vnode = xbc_node_get_child(knode);
>> if (!vnode) {
>> - ret = snprintf(buf, rest(buf, end), "%s ", xbc_namebuf);
>> + ret = snprintf(buf ? buf + len : NULL,
>> + size > len ? size - len : 0,
>
>Why not keeping rest() and updating it, instead of open coding it?
>
>Thanks for the fix.
Sure, I'll do a v2. BTW, in case you didn't read it, Gemini found and
fixed this. As in fully. :)
>--breno
>
Thanks!
* Re: [PATCH] lib/bootconfig: fix undefined behavior involving NULL pointer arithmetic
From: Breno Leitao @ 2026-06-29 13:41 UTC
To: Bradley Morgan; +Cc: akpm, mhiramat, linux-kernel, linux-trace-kernel, stable
In-Reply-To: <20260628115617.3190-1-include@grrlz.net>
On Sun, Jun 28, 2026 at 11:56:16AM +0000, Bradley Morgan wrote:
> When xbc_snprint_cmdline() is called during the size-probing phase
> (with buf = NULL and size = 0), the function computes the end pointer
> as 'buf + size' (NULL + 0) and repeatedly advances the pointer via
> 'buf += ret'.
>
> Under the C standard, performing pointer arithmetic on a NULL pointer is
> undefined behavior. While harmless inside the kernel, this code is also
> compiled into the userspace host tool 'tools/bootconfig', where host
> compilers with UBSan or FORTIFY_SOURCE enabled abort the build when they
> detect NULL pointer arithmetic.
>
> Fix this by tracking the running written length as an integer offset
> ('len') rather than advancing 'buf' directly. Only perform pointer
> arithmetic if 'buf' is actually non-NULL.
>
> Fixes: 5a643e462323 ("bootconfig: move xbc_snprint_cmdline() to lib/bootconfig.c")
Isn't commit 5a643e462323 ("bootconfig: move xbc_snprint_cmdline() to
lib/bootconfig.c") just a code movement?
> xbc_node_for_each_key_value(root, knode, val) {
> @@ -439,10 +437,12 @@ int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
>
> vnode = xbc_node_get_child(knode);
> if (!vnode) {
> - ret = snprintf(buf, rest(buf, end), "%s ", xbc_namebuf);
> + ret = snprintf(buf ? buf + len : NULL,
> + size > len ? size - len : 0,
Why not keeping rest() and updating it, instead of open coding it?
Thanks for the fix.
--breno
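A minimal userspace sketch of the offset-tracking approach, keeping a rest()-style helper as suggested above instead of open coding the size computation. The signature rest(buf, size, len) and the function build_cmdline() are hypothetical; the real xbc_snprint_cmdline() walks xbc_node keys rather than a string array. The sketch also passes NULL (not buf + len) once no space remains, so no pointer past the end of the buffer is ever formed:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical rest() variant: bytes still available in buf after len
 * bytes have been produced, or 0 when size-probing (buf == NULL) or full.
 */
static size_t rest(const char *buf, size_t size, size_t len)
{
	if (!buf || len >= size)
		return 0;
	return size - len;
}

/*
 * Hypothetical stand-in for xbc_snprint_cmdline(): append each key plus a
 * trailing space. Like snprintf(), returns the full length that would have
 * been written, so it works for both the probing and the filling phase.
 */
static int build_cmdline(char *buf, size_t size,
			 const char *const *keys, size_t nkeys)
{
	size_t len = 0;

	for (size_t i = 0; i < nkeys; i++) {
		size_t avail = rest(buf, size, len);
		/* Only do pointer arithmetic on buf when there is room. */
		int ret = snprintf(avail ? buf + len : NULL, avail, "%s ", keys[i]);

		if (ret < 0)
			return ret;
		len += (size_t)ret;
	}
	return (int)len;
}
```

Calling build_cmdline(NULL, 0, ...) returns the required length without computing any pointer from NULL, mirroring the size-probing phase; a second call with a buffer of that length plus one fills it.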
^ permalink raw reply
* Re: [PATCH v4 1/3] ftrace: Build trace_btf.c when CONFIG_DEBUG_INFO_BTF is enabled
From: Donglin Peng @ 2026-06-29 13:19 UTC (permalink / raw)
To: Steven Rostedt
Cc: mhiramat, linux-trace-kernel, bpf, linux-kernel, pengdonglin,
Xiaoqin Zhang
In-Reply-To: <20260513104229.2f39eb03@gandalf.local.home>
On Wed, May 13, 2026 at 10:42 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
>
> Sorry for the late reply, I've been a bit busy on other things recently.
>
> On Mon, 15 Dec 2025 11:41:51 +0800
> Donglin Peng <dolinux.peng@gmail.com> wrote:
>
> > From: pengdonglin <pengdonglin@xiaomi.com>
> >
> > The trace_btf.c file provides BTF helper functions used by the ftrace
> > subsystem. This change makes its compilation solely dependent on
>
> Nit, change logs should never say "This change". Instead it should be
> worded as:
>
> "Make the compilation of trace_btf.c solely depend on..."
Thanks, I will fix it.
>
> -- Steve
>
> > CONFIG_DEBUG_INFO_BTF, allowing features like funcgraph-retval to also
> > utilize these helpers.
> >
> > Additionally, the redundant dependency on CONFIG_PROBE_EVENTS_BTF_ARGS
> > is removed, as CONFIG_DEBUG_INFO_BTF already depends on
> > CONFIG_BPF_SYSCALL.
> >
> > Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
> > Cc: Masami Hiramatsu <mhiramat@kernel.org>
> > Cc: Xiaoqin Zhang <zhangxiaoqin@xiaomi.com>
> > Signed-off-by: pengdonglin <pengdonglin@xiaomi.com>
> > ---
> > kernel/trace/Kconfig | 2 +-
> > kernel/trace/Makefile | 2 +-
> > 2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> > index e1214b9dc990..653c1fcefa4c 100644
> > --- a/kernel/trace/Kconfig
> > +++ b/kernel/trace/Kconfig
> > @@ -755,7 +755,7 @@ config FPROBE_EVENTS
> > config PROBE_EVENTS_BTF_ARGS
> > depends on HAVE_FUNCTION_ARG_ACCESS_API
> > depends on FPROBE_EVENTS || KPROBE_EVENTS
> > - depends on DEBUG_INFO_BTF && BPF_SYSCALL
> > + depends on DEBUG_INFO_BTF
> > bool "Support BTF function arguments for probe events"
> > default y
> > help
> > diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
> > index fc5dcc888e13..6c4bf5a6c4f3 100644
> > --- a/kernel/trace/Makefile
> > +++ b/kernel/trace/Makefile
> > @@ -116,7 +116,7 @@ obj-$(CONFIG_KGDB_KDB) += trace_kdb.o
> > endif
> > obj-$(CONFIG_DYNAMIC_EVENTS) += trace_dynevent.o
> > obj-$(CONFIG_PROBE_EVENTS) += trace_probe.o
> > -obj-$(CONFIG_PROBE_EVENTS_BTF_ARGS) += trace_btf.o
> > +obj-$(CONFIG_DEBUG_INFO_BTF) += trace_btf.o
> > obj-$(CONFIG_UPROBE_EVENTS) += trace_uprobe.o
> > obj-$(CONFIG_BOOTTIME_TRACING) += trace_boot.o
> > obj-$(CONFIG_FTRACE_RECORD_RECURSION) += trace_recursion_record.o
>
^ permalink raw reply
* Re: [PATCH v4 2/3] fgraph: Enhance funcgraph-retval with BTF-based type-aware output
From: Donglin Peng @ 2026-06-29 13:19 UTC (permalink / raw)
To: Steven Rostedt
Cc: mhiramat, linux-trace-kernel, bpf, linux-kernel, pengdonglin,
Xiaoqin Zhang
In-Reply-To: <20260513104758.6e64167e@gandalf.local.home>
On Wed, May 13, 2026 at 10:47 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Mon, 15 Dec 2025 11:41:52 +0800
> Donglin Peng <dolinux.peng@gmail.com> wrote:
>
> > From: pengdonglin <pengdonglin@xiaomi.com>
> >
> > The current funcgraph-retval implementation suffers from two accuracy
> > issues:
> >
> > 1. Void-returning functions still print a return value, creating
> > misleading noise in the trace output.
> >
> > 2. For functions returning narrower types (e.g., char, short), the
> > displayed value can be incorrect because high bits of the register
> > may contain undefined data.
> >
> > This patch addresses both problems by leveraging BTF to obtain the exact
> > return type of each traced kernel function. The key changes are:
> >
> > 1. Void function filtering: Functions with void return type no longer
> > display any return value in the trace output, eliminating unnecessary
> > clutter.
> >
> > 2. Type-aware value formatting: The return value is now properly truncated
> > to match the actual width of the return type before being displayed.
> > Additionally, the value is formatted according to its type for better
> > human readability.
> >
> > Here is an output comparison:
> >
> > Before:
> > # perf ftrace -G vfs_read --graph-opts retval
> > ...
> > 1) | touch_atime() {
> > 1) | atime_needs_update() {
> > 1) 0.069 us | make_vfsuid(); /* ret=0x0 */
> > 1) 0.067 us | make_vfsgid(); /* ret=0x0 */
> > 1) | current_time() {
> > 1) 0.197 us | ktime_get_coarse_real_ts64_mg(); /* ret=0x187f886aec3ed6f5 */
> > 1) 0.352 us | } /* current_time ret=0x69380753 */
> > 1) 0.792 us | } /* atime_needs_update ret=0x0 */
> > 1) 0.937 us | } /* touch_atime ret=0x0 */
> >
> > After:
> > # perf ftrace -G vfs_read --graph-opts retval
> > ...
> > 2) | touch_atime() {
> > 2) | atime_needs_update() {
> > 2) 0.070 us | make_vfsuid(); /* ret=0x0 */
> > 2) 0.070 us | make_vfsgid(); /* ret=0x0 */
> > 2) | current_time() {
> > 2) 0.162 us | ktime_get_coarse_real_ts64_mg();
> > 2) 0.312 us | } /* current_time ret=0x69380649(trunc) */
> > 2) 0.753 us | } /* atime_needs_update ret=false */
> > 2) 0.899 us | } /* touch_atime */
> >
> > Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
> > Cc: Masami Hiramatsu <mhiramat@kernel.org>
> > Cc: Xiaoqin Zhang <zhangxiaoqin@xiaomi.com>
> > Signed-off-by: pengdonglin <pengdonglin@xiaomi.com>
> > ---
> > kernel/trace/trace_functions_graph.c | 124 ++++++++++++++++++++++++---
> > 1 file changed, 111 insertions(+), 13 deletions(-)
> >
> > diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
> > index 17c75cf2348e..46b66b1cfc16 100644
> > --- a/kernel/trace/trace_functions_graph.c
> > +++ b/kernel/trace/trace_functions_graph.c
> > @@ -15,6 +15,7 @@
> >
> > #include "trace.h"
> > #include "trace_output.h"
> > +#include "trace_btf.h"
> >
> > /* When set, irq functions might be ignored */
> > static int ftrace_graph_skip_irqs;
> > @@ -120,6 +121,13 @@ enum {
> > FLAGS_FILL_END = 3 << TRACE_GRAPH_PRINT_FILL_SHIFT,
> > };
> >
> > +enum {
> > + RETVAL_FMT_HEX = BIT(0),
> > + RETVAL_FMT_DEC = BIT(1),
> > + RETVAL_FMT_BOOL = BIT(2),
> > + RETVAL_FMT_TRUNC = BIT(3),
> > +};
> > +
> > static void
> > print_graph_duration(struct trace_array *tr, unsigned long long duration,
> > struct trace_seq *s, u32 flags);
> > @@ -865,6 +873,73 @@ static void print_graph_retaddr(struct trace_seq *s, struct fgraph_retaddr_ent_e
> >
> > #if defined(CONFIG_FUNCTION_GRAPH_RETVAL) || defined(CONFIG_FUNCTION_GRAPH_RETADDR)
> >
> > +static void trim_retval(unsigned long func, unsigned long *retval, bool *print_retval,
> > + int *fmt)
>
> This function should really be in trace_btf.c and a stub when btf is not
> enabled.
Thanks, I will fix it.
>
> -- Steve
>
> > +{
> > + const struct btf_type *t;
> > + char name[KSYM_NAME_LEN];
> > + struct btf *btf;
> > + u32 v, msb;
> > + int kind;
> > +
> > + if (!IS_ENABLED(CONFIG_DEBUG_INFO_BTF))
> > + return;
> > +
> > + if (lookup_symbol_name(func, name))
> > + return;
> > +
> > + t = btf_find_func_proto(name, &btf);
> > + if (IS_ERR_OR_NULL(t))
> > + return;
> > +
> > + t = btf_type_skip_modifiers(btf, t->type, NULL);
> > + kind = t ? BTF_INFO_KIND(t->info) : BTF_KIND_UNKN;
> > + switch (kind) {
> > + case BTF_KIND_UNKN:
> > + *print_retval = false;
> > + break;
> > + case BTF_KIND_STRUCT:
> > + case BTF_KIND_UNION:
> > + case BTF_KIND_ENUM:
> > + case BTF_KIND_ENUM64:
> > + if (kind == BTF_KIND_STRUCT || kind == BTF_KIND_UNION)
> > + *fmt = RETVAL_FMT_HEX;
> > + else
> > + *fmt = RETVAL_FMT_DEC;
> > +
> > + if (t->size > sizeof(unsigned long)) {
> > + *fmt |= RETVAL_FMT_TRUNC;
> > + } else {
> > + msb = BITS_PER_BYTE * t->size - 1;
> > + *retval &= GENMASK(msb, 0);
> > + }
> > + break;
> > + case BTF_KIND_INT:
> > + v = *(u32 *)(t + 1);
> > + if (BTF_INT_ENCODING(v) == BTF_INT_BOOL) {
> > + *fmt = RETVAL_FMT_BOOL;
> > + msb = 0;
> > + } else {
> > + if (BTF_INT_ENCODING(v) == BTF_INT_SIGNED)
> > + *fmt = RETVAL_FMT_DEC;
> > + else
> > + *fmt = RETVAL_FMT_HEX;
> > +
> > + if (t->size > sizeof(unsigned long)) {
> > + *fmt |= RETVAL_FMT_TRUNC;
> > + msb = BITS_PER_LONG - 1;
> > + } else {
> > + msb = BTF_INT_BITS(v) - 1;
> > + }
> > + }
> > + *retval &= GENMASK(msb, 0);
> > + break;
> > + default:
> > + *fmt = RETVAL_FMT_HEX;
> > + break;
> > + }
> > +}
> > +
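The width-based masking that trim_retval() applies can be modelled outside the kernel. A minimal sketch, assuming a simplified, hypothetical struct retval_type in place of the real BTF lookup (btf_find_func_proto() and related helpers are kernel-only) and a local LOW_MASK() macro standing in for GENMASK(h, 0):

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for GENMASK(h, 0): bits 0..h set (h <= 63). */
#define LOW_MASK(h) (~0ULL >> (63 - (h)))

/* Hypothetical, simplified summary of a function's BTF return type. */
enum retval_kind { RV_VOID, RV_BOOL, RV_INT };

struct retval_type {
	enum retval_kind kind;
	unsigned int bits;	/* integer width in bits, for RV_INT */
};

/*
 * Trim a raw return register to the width of its declared type.
 * Returns false when nothing should be printed (void functions).
 */
static bool trim_retval_sketch(const struct retval_type *t, uint64_t *retval)
{
	switch (t->kind) {
	case RV_VOID:
		return false;
	case RV_BOOL:
		/* Like the patch, keep only bit 0 for bool. */
		*retval &= LOW_MASK(0);
		return true;
	case RV_INT:
		/* Wider-than-register types are left as-is (truncated). */
		if (t->bits && t->bits <= 64)
			*retval &= LOW_MASK(t->bits - 1);
		return true;
	}
	return true;
}
```

With a 32-bit return type, a raw register value of 0x187f886aec3ed6f5 is trimmed to 0xec3ed6f5, which is the kind of high-bit noise the changelog describes.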
^ permalink raw reply
* Re: [RFC PATCH v2 2/4] rtla/osnoise: Record IPI count in osnoise top
From: Tomas Glozar @ 2026-06-29 12:56 UTC (permalink / raw)
To: Valentin Schneider
Cc: linux-kernel, linux-trace-kernel, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, Costa Shulyupin,
Crystal Wood, John Kacur, Ivan Pravdin, Jonathan Corbet
In-Reply-To: <20260617131803.2988989-3-vschneid@redhat.com>
On Wed, 17 Jun 2026 at 15:18, Valentin Schneider
<vschneid@redhat.com> wrote:
>
> Leverage the ipi_send_cpu and ipi_send_cpumask trace events to record the
> count of IPIs sent to monitored CPUs. These interferences are already
> accounted by the IRQ count, but this split gives a better overall picture.
>
> This uses the newly added -i cmdline option.
>
> Signed-off-by: Valentin Schneider <vschneid@redhat.com>
> ---
> tools/tracing/rtla/src/osnoise_top.c | 124 ++++++++++++++++++++++++++-
> 1 file changed, 123 insertions(+), 1 deletion(-)
>
Overall, looks good to me (see small comments below). The reported
numbers make sense:
[tglozar@cs9 rtla]$ sudo ./rtla osnoise top -q -c 0,1 -d 5s --ipi
Operating System Noise
duration:   0 00:00:05 | time is in us
CPU Period       Runtime        Noise  % CPU Aval   Max Noise   Max Single          HW          NMI          IRQ      Softirq       Thread          IPI
  0 #4           4000000        28481    99.28797        8977          248        6756            0         4002           18            1           42
  1 #5           5000000        38025    99.23950        8120          185        8403            0         5260          153          141           49
(It looks good in the terminal, I'm sure Gmail will garble it...)
I'll compare with trace output on the next patch.
> diff --git a/tools/tracing/rtla/src/osnoise_top.c b/tools/tracing/rtla/src/osnoise_top.c
> index 512a6299cb018..5b462a3543b97 100644
> --- a/tools/tracing/rtla/src/osnoise_top.c
> +++ b/tools/tracing/rtla/src/osnoise_top.c
>
> [truncated]
>
> @@ -70,6 +72,91 @@ static struct osnoise_top_data *osnoise_alloc_top(void)
> return NULL;
> }
> +static void account_ipi(struct osnoise_tool *tool,
> + unsigned long long src_cpu, unsigned long long dst_cpu)
> +{
> + struct osnoise_top_cpu *cpu_data;
> + struct osnoise_top_data *data;
> + unsigned long long inc = 1;
> +
> + data = tool->data;
> + cpu_data = &data->cpu_data[dst_cpu];
> +
> + update_sum(&cpu_data->ipi_count, &inc);
> +}
> +
> +/*
> + * osnoise_ipi_cpu_handler - this is the handler for single CPU IPI events.
> + */
> +static int
> +osnoise_ipi_cpu_handler(struct trace_seq *s, struct tep_record *record,
> + struct tep_event *event, void *context)
> +{
> + struct osnoise_tool *tool;
> + struct osnoise_params *params;
> + unsigned long long src_cpu, dst_cpu;
> + struct trace_instance *trace = context;
> +
> + tool = container_of(trace, struct osnoise_tool, trace);
> + params = to_osnoise_params(tool->params);
> +
> + src_cpu = record->cpu;
> + tep_get_field_val(s, event, "cpu", record, &dst_cpu, 1);
> +
> + if (CPU_ISSET(dst_cpu, &params->common.monitored_cpus))
> + account_ipi(tool, src_cpu, dst_cpu);
Do we need to retrieve and pass the src_cpu here? I get it if you plan
on using it in the future, but as far as I understand, you are
specifically tracking the destination CPU, not the source CPU. Same
note applies to osnoise_ipi_cpumask_handler() below.
> +
> + return 0;
> +}
> +
> +static cpu_set_t cpumask_tmp_cpus;
> +
> +/*
> + * osnoise_ipi_cpumask_handler - this is the handler for broadcasted IPI events.
> + */
> +static int
> +osnoise_ipi_cpumask_handler(struct trace_seq *s, struct tep_record *record,
> + struct tep_event *event, void *context)
> +{
> + struct trace_instance *trace = context;
> + struct osnoise_tool *tool;
> + struct osnoise_params *params;
> + struct tep_format_field *field;
> + unsigned long long src_cpu;
> + cpu_set_t *event_cpus;
> + int len;
> +
> + tool = container_of(trace, struct osnoise_tool, trace);
> + params = to_osnoise_params(tool->params);
> +
> + src_cpu = record->cpu;
> +
> + field = tep_find_field(event, "cpumask");
> + if (!field)
> + return 0;
> +
> + event_cpus = tep_get_field_raw(s, event, "cpumask", record, &len, 1);
> + if (!event_cpus) {
> + err_msg("Failed to get cpumask field\n");
> + return 0;
> + }
> +
> + CPU_AND(&cpumask_tmp_cpus, event_cpus, &params->common.monitored_cpus);
> +
> + /*
> + * Computing the mask weight is overkill but there is no leaner option
> + * provided by glibc, e.g. cpumask_first() or somesuch.
> + */
> + if (CPU_COUNT(&cpumask_tmp_cpus)) {
> + for (int cpu = 0; cpu < nr_cpus; cpu++) {
> + if (CPU_ISSET(cpu, &cpumask_tmp_cpus))
> + account_ipi(tool, src_cpu, cpu);
> + }
> + }
Technically, the existing code already relies on the glibc cpumask
implementation (cpu_set_t) matching the kernel "cpumask_t" type, as
the "cpumask" field is the latter (per
/sys/kernel/tracing/events/ipi/ipi_send_cpumask/format), not the
former. So I wouldn't worry about the opaqueness of cpu_set_t much.
Not sure how this is handled in other tracing tools that need to use
cpumask, I'd have to look around a bit. It might even make sense to
have a "tools" version of the cpumask functions like cpumask_first(),
I guess, like we already do for e.g. lists and container_of.
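Along those lines, a tools-side iteration helper could look like the minimal sketch below. cpuset_first(), cpuset_next(), for_each_cpuset_cpu() and account_ipi_mask() are hypothetical names, not existing rtla or glibc API; only CPU_AND(), CPU_ISSET() and CPU_SETSIZE come from glibc's <sched.h>. It scans up to CPU_SETSIZE rather than rtla's nr_cpus to stay self-contained:

```c
#define _GNU_SOURCE
#include <sched.h>

/* Hypothetical helper: next set CPU after @prev, or -1 if there is none. */
static int cpuset_next(int prev, const cpu_set_t *set)
{
	for (int cpu = prev + 1; cpu < CPU_SETSIZE; cpu++) {
		if (CPU_ISSET(cpu, set))
			return cpu;
	}
	return -1;
}

/* Hypothetical userspace counterpart of the kernel's cpumask_first(). */
static int cpuset_first(const cpu_set_t *set)
{
	return cpuset_next(-1, set);
}

/* Hypothetical analogue of the kernel's for_each_cpu(). */
#define for_each_cpuset_cpu(cpu, set)					\
	for ((cpu) = cpuset_first(set); (cpu) >= 0;			\
	     (cpu) = cpuset_next((cpu), (set)))

/*
 * Count one IPI for every monitored CPU in the event's destination mask,
 * with no separate CPU_COUNT() pass. counts[] needs CPU_SETSIZE entries.
 */
static void account_ipi_mask(const cpu_set_t *event_cpus,
			     const cpu_set_t *monitored,
			     unsigned long long *counts)
{
	cpu_set_t hit;
	int cpu;

	CPU_AND(&hit, event_cpus, monitored);
	for_each_cpuset_cpu(cpu, &hit)
		counts[cpu]++;
}
```

The loop body reads like the kernel's cpumask iteration, and the empty-mask case needs no special handling because cpuset_first() simply returns -1.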
> +
> + return 0;
> +}
> +
> /*
> * osnoise_top_handler - this is the handler for osnoise tracer events
> */
Nit: As this is extra functionality, it'd be more readable to have the
IPI handling after the main top handler, so that someone not familiar
with the source code will see the core logic first. That would also
match IPI being displayed to the right of the other numbers in the top
output.
> @@ -164,6 +251,8 @@ static void osnoise_top_header(struct osnoise_tool *top)
> goto eol;
>
> trace_seq_printf(s, " IRQ Softirq Thread");
> + if (params->common.ipi)
> + trace_seq_printf(s, " IPI");
>
> eol:
> if (pretty)
> @@ -218,7 +307,13 @@ static void osnoise_top_print(struct osnoise_tool *tool, int cpu)
>
> trace_seq_printf(s, "%12llu ", cpu_data->irq_count);
> trace_seq_printf(s, "%12llu ", cpu_data->softirq_count);
> - trace_seq_printf(s, "%12llu\n", cpu_data->thread_count);
> + trace_seq_printf(s, "%12llu", cpu_data->thread_count);
> + if (!params->common.ipi) {
> + trace_seq_printf(s, "\n");
> + return;
> + }
> +
> + trace_seq_printf(s, " %12llu\n", cpu_data->ipi_count);
Maybe at this point it is worth it to print the "\n" in a separate
statement, readability-wise:
trace_seq_printf(s, "%12llu ", cpu_data->irq_count);
trace_seq_printf(s, "%12llu ", cpu_data->softirq_count);
trace_seq_printf(s, "%12llu", cpu_data->thread_count);
if (params->common.ipi)
trace_seq_printf(s, " %12llu", cpu_data->ipi_count);
trace_seq_printf(s, "\n");
It would also make diffs nicer when adding new options.
> [truncated]
Tomas
^ permalink raw reply
* [PATCH 30/30] tools/testing/vma: output compared expression on ASSERT_[EQ, NE]()
From: Lorenzo Stoakes @ 2026-06-29 12:23 UTC (permalink / raw)
To: Andrew Morton
Cc: Russell King, Dinh Nguyen, Simon Schuster,
James E . J . Bottomley, Helge Deller, Jarkko Sakkinen,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
Ian Abbott, H Hartley Sweeten, Lucas Stach, David Airlie,
Simona Vetter, Patrik Jakobsson, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Rob Clark, Dmitry Baryshkov, Tomi Valkeinen,
Thierry Reding, Mikko Perttunen, Jonathan Hunter,
Christian Koenig, Huang Rui, Ankit Agrawal, Alex Williamson,
Alexander Viro, Christian Brauner, Dan Williams, Muchun Song,
Oscar Salvador, David Hildenbrand, Suren Baghdasaryan,
Liam R . Howlett, Matthew Wilcox, Marek Szyprowski,
Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim,
Masami Hiramatsu, Oleg Nesterov, Steven Rostedt, SeongJae Park,
Miaohe Lin, Hugh Dickins, Mike Rapoport, Kees Cook, Paolo Bonzini,
linux-kernel, linux-arm-kernel, linux-parisc, linux-sgx, etnaviv,
dri-devel, linux-arm-msm, freedreno, linux-tegra, kvm,
linux-fsdevel, nvdimm, linux-mm, iommu, linux-perf-users,
linux-trace-kernel, kasan-dev, damon, Pedro Falcato, Rik van Riel,
Harry Yoo, Jann Horn
In-Reply-To: <cover.1782735110.git.ljs@kernel.org>
Update the macros to output the compared values in hex for easier debugging
when test asserts fail.
Also remove the unused IS_SET() macro.
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
---
tools/testing/vma/shared.h | 31 +++++++++++++++++++------------
1 file changed, 19 insertions(+), 12 deletions(-)
diff --git a/tools/testing/vma/shared.h b/tools/testing/vma/shared.h
index ca4f1238f1c7..216be4cda369 100644
--- a/tools/testing/vma/shared.h
+++ b/tools/testing/vma/shared.h
@@ -21,19 +21,28 @@
} \
} while (0)
-#define ASSERT_TRUE(_expr) \
- do { \
- if (!(_expr)) { \
- fprintf(stderr, \
- "Assert FAILED at %s:%d:%s(): %s is FALSE.\n", \
- __FILE__, __LINE__, __FUNCTION__, #_expr); \
- return false; \
- } \
+#define __ASSERT_TRUE(_expr, _fmt, ...) \
+ do { \
+ if (!(_expr)) { \
+ fprintf(stderr, \
+ "Assert FAILED at %s:%d:%s(): %s is FALSE" \
+ _fmt ".\n", \
+ __FILE__, __LINE__, __FUNCTION__, #_expr \
+ __VA_OPT__(,) __VA_ARGS__); \
+ return false; \
+ } \
} while (0)
+#define __TO_SCALAR(x) ((unsigned long long)(uintptr_t)(x))
+
+#define ASSERT_TRUE(_expr) __ASSERT_TRUE(_expr, "")
#define ASSERT_FALSE(_expr) ASSERT_TRUE(!(_expr))
-#define ASSERT_EQ(_val1, _val2) ASSERT_TRUE((_val1) == (_val2))
-#define ASSERT_NE(_val1, _val2) ASSERT_TRUE((_val1) != (_val2))
+#define ASSERT_EQ(_val1, _val2) \
+ __ASSERT_TRUE((_val1) == (_val2), " (0x%llx != 0x%llx)", \
+ __TO_SCALAR(_val1), __TO_SCALAR(_val2))
+#define ASSERT_NE(_val1, _val2) \
+ __ASSERT_TRUE((_val1) != (_val2), " (0x%llx == 0x%llx)", \
+ __TO_SCALAR(_val1), __TO_SCALAR(_val2))
#define ASSERT_FLAGS_SAME_MASK(_flags, _flags_other) \
ASSERT_TRUE(vma_flags_same_mask((_flags), (_flags_other)))
@@ -53,8 +62,6 @@
#define ASSERT_FLAGS_NONEMPTY(_flags) \
ASSERT_FALSE(vma_flags_empty(_flags))
-#define IS_SET(_val, _flags) ((_val & _flags) == _flags)
-
extern bool fail_prealloc;
/* Override vma_iter_prealloc() so we can choose to fail it. */
--
2.54.0
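To see the effect of the new format string, the patched macros can be exercised in a standalone program. A minimal sketch: the macro bodies are copied from the hunk above, check_pgoff() is a hypothetical test function, and the host compiler is assumed to accept __VA_OPT__ in C, as it already must for the VMA userland tests:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Macro bodies as in the patched tools/testing/vma/shared.h. */
#define __ASSERT_TRUE(_expr, _fmt, ...)					\
	do {								\
		if (!(_expr)) {						\
			fprintf(stderr,					\
				"Assert FAILED at %s:%d:%s(): %s is FALSE" \
				_fmt ".\n",				\
				__FILE__, __LINE__, __FUNCTION__, #_expr \
				__VA_OPT__(,) __VA_ARGS__);		\
			return false;					\
		}							\
	} while (0)

#define __TO_SCALAR(x) ((unsigned long long)(uintptr_t)(x))

#define ASSERT_EQ(_val1, _val2)						\
	__ASSERT_TRUE((_val1) == (_val2), " (0x%llx != 0x%llx)",	\
		      __TO_SCALAR(_val1), __TO_SCALAR(_val2))

/* Hypothetical test helper: passes only when both offsets match. */
static bool check_pgoff(unsigned long got, unsigned long want)
{
	ASSERT_EQ(got, want);
	return true;
}
```

On a mismatch, check_pgoff(1, 2) writes a line of the form `Assert FAILED at <file>:<line>:check_pgoff(): (got) == (want) is FALSE (0x1 != 0x2).` to stderr and returns false. Routing values through uintptr_t lets pointers be compared too, though on a 32-bit host it would truncate 64-bit integer operands.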
^ permalink raw reply related
* [PATCH 29/30] tools/testing/vma: default VMA flag bits to 64-bit
From: Lorenzo Stoakes @ 2026-06-29 12:23 UTC (permalink / raw)
To: Andrew Morton
Cc: Russell King, Dinh Nguyen, Simon Schuster,
James E . J . Bottomley, Helge Deller, Jarkko Sakkinen,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
Ian Abbott, H Hartley Sweeten, Lucas Stach, David Airlie,
Simona Vetter, Patrik Jakobsson, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Rob Clark, Dmitry Baryshkov, Tomi Valkeinen,
Thierry Reding, Mikko Perttunen, Jonathan Hunter,
Christian Koenig, Huang Rui, Ankit Agrawal, Alex Williamson,
Alexander Viro, Christian Brauner, Dan Williams, Muchun Song,
Oscar Salvador, David Hildenbrand, Suren Baghdasaryan,
Liam R . Howlett, Matthew Wilcox, Marek Szyprowski,
Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim,
Masami Hiramatsu, Oleg Nesterov, Steven Rostedt, SeongJae Park,
Miaohe Lin, Hugh Dickins, Mike Rapoport, Kees Cook, Paolo Bonzini,
linux-kernel, linux-arm-kernel, linux-parisc, linux-sgx, etnaviv,
dri-devel, linux-arm-msm, freedreno, linux-tegra, kvm,
linux-fsdevel, nvdimm, linux-mm, iommu, linux-perf-users,
linux-trace-kernel, kasan-dev, damon, Pedro Falcato, Rik van Riel,
Harry Yoo, Jann Horn
In-Reply-To: <cover.1782735110.git.ljs@kernel.org>
With all of the sanitisers turned on, defaulting the VMA and mm flag bitmaps
to 128 bits results in overly long build times.
Reduce these to 64 bits; we can always increase them later to test larger
bitmaps as needed.
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
---
tools/testing/vma/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/vma/Makefile b/tools/testing/vma/Makefile
index e72b45dedda5..ef6cc558afe1 100644
--- a/tools/testing/vma/Makefile
+++ b/tools/testing/vma/Makefile
@@ -10,7 +10,7 @@ OFILES = $(SHARED_OFILES) main.o shared.o maple-shim.o
TARGETS = vma
# These can be varied to test different sizes.
-CFLAGS += -DNUM_VMA_FLAG_BITS=128 -DNUM_MM_FLAG_BITS=128
+CFLAGS += -DNUM_VMA_FLAG_BITS=64 -DNUM_MM_FLAG_BITS=64
main.o: main.c shared.c shared.h vma_internal.h tests/merge.c tests/mmap.c tests/vma.c ../../../mm/vma.c ../../../mm/vma_init.c ../../../mm/vma_exec.c ../../../mm/vma.h include/custom.h include/dup.h include/stubs.h
--
2.54.0
^ permalink raw reply related
* [PATCH 28/30] mm/vma: use guard clauses in can_vma_merge_[before, after]()
From: Lorenzo Stoakes @ 2026-06-29 12:23 UTC (permalink / raw)
To: Andrew Morton
Cc: Russell King, Dinh Nguyen, Simon Schuster,
James E . J . Bottomley, Helge Deller, Jarkko Sakkinen,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
Ian Abbott, H Hartley Sweeten, Lucas Stach, David Airlie,
Simona Vetter, Patrik Jakobsson, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Rob Clark, Dmitry Baryshkov, Tomi Valkeinen,
Thierry Reding, Mikko Perttunen, Jonathan Hunter,
Christian Koenig, Huang Rui, Ankit Agrawal, Alex Williamson,
Alexander Viro, Christian Brauner, Dan Williams, Muchun Song,
Oscar Salvador, David Hildenbrand, Suren Baghdasaryan,
Liam R . Howlett, Matthew Wilcox, Marek Szyprowski,
Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim,
Masami Hiramatsu, Oleg Nesterov, Steven Rostedt, SeongJae Park,
Miaohe Lin, Hugh Dickins, Mike Rapoport, Kees Cook, Paolo Bonzini,
linux-kernel, linux-arm-kernel, linux-parisc, linux-sgx, etnaviv,
dri-devel, linux-arm-msm, freedreno, linux-tegra, kvm,
linux-fsdevel, nvdimm, linux-mm, iommu, linux-perf-users,
linux-trace-kernel, kasan-dev, damon, Pedro Falcato, Rik van Riel,
Harry Yoo, Jann Horn
In-Reply-To: <cover.1782735110.git.ljs@kernel.org>
Rather than combining a bunch of conditionals in a single expression,
simplify by inverting the mergeability requirements into guard clauses.
That is, rather than checking what must be true for a merge to be possible,
check the inverse of each requirement and return false if any of them holds,
returning true otherwise.
No functional change intended.
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
---
mm/vma.c | 27 ++++++++++++++-------------
1 file changed, 14 insertions(+), 13 deletions(-)
diff --git a/mm/vma.c b/mm/vma.c
index 5c3062e0e706..7201199fc668 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -215,13 +215,13 @@ static void init_multi_vma_prep(struct vma_prepare *vp,
*/
static bool can_vma_merge_before(struct vma_merge_struct *vmg)
{
- if (is_mergeable_vma(vmg, /* merge_next = */ true) &&
- is_mergeable_anon_vma(vmg, /* merge_next = */ true)) {
- if (vmg_end_pgoff(vmg) == vma_start_pgoff(vmg->next))
- return true;
- }
-
- return false;
+ if (!is_mergeable_vma(vmg, /* merge_next = */ true))
+ return false;
+ if (!is_mergeable_anon_vma(vmg, /* merge_next = */ true))
+ return false;
+ if (vmg_end_pgoff(vmg) != vma_start_pgoff(vmg->next))
+ return false;
+ return true;
}
/*
@@ -235,12 +235,13 @@ static bool can_vma_merge_before(struct vma_merge_struct *vmg)
*/
static bool can_vma_merge_after(struct vma_merge_struct *vmg)
{
- if (is_mergeable_vma(vmg, /* merge_next = */ false) &&
- is_mergeable_anon_vma(vmg, /* merge_next = */ false)) {
- if (vma_end_pgoff(vmg->prev) == vmg_start_pgoff(vmg))
- return true;
- }
- return false;
+ if (!is_mergeable_vma(vmg, /* merge_next = */ false))
+ return false;
+ if (!is_mergeable_anon_vma(vmg, /* merge_next = */ false))
+ return false;
+ if (vma_end_pgoff(vmg->prev) != vmg_start_pgoff(vmg))
+ return false;
+ return true;
}
static void __vma_link_file(struct vm_area_struct *vma,
--
2.54.0
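The "no functional change" claim can be checked mechanically for the boolean structure. A minimal sketch, with the three requirements (is_mergeable_vma(), is_mergeable_anon_vma() and the pgoff comparison) modelled as plain bool inputs; the function names are hypothetical:

```c
#include <stdbool.h>

/* Original shape: nested conjunction. */
static bool merge_nested(bool mergeable_vma, bool mergeable_anon,
			 bool pgoff_match)
{
	if (mergeable_vma && mergeable_anon) {
		if (pgoff_match)
			return true;
	}
	return false;
}

/* New shape: one guard clause per requirement, defaulting to true. */
static bool merge_guarded(bool mergeable_vma, bool mergeable_anon,
			  bool pgoff_match)
{
	if (!mergeable_vma)
		return false;
	if (!mergeable_anon)
		return false;
	if (!pgoff_match)
		return false;
	return true;
}

/* Compare both forms over all 8 input combinations. */
static bool forms_agree(void)
{
	for (int bits = 0; bits < 8; bits++) {
		bool a = bits & 1, b = bits & 2, c = bits & 4;

		if (merge_nested(a, b, c) != merge_guarded(a, b, c))
			return false;
	}
	return true;
}
```

Both forms also evaluate the requirements in the same order and stop at the first failing one, so the short-circuit behaviour of the real predicate calls is unchanged as well.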
^ permalink raw reply related
* [PATCH 27/30] mm/vma: correct incorrect vma.h inclusion
From: Lorenzo Stoakes @ 2026-06-29 12:23 UTC (permalink / raw)
To: Andrew Morton
Cc: Russell King, Dinh Nguyen, Simon Schuster,
James E . J . Bottomley, Helge Deller, Jarkko Sakkinen,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
Ian Abbott, H Hartley Sweeten, Lucas Stach, David Airlie,
Simona Vetter, Patrik Jakobsson, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Rob Clark, Dmitry Baryshkov, Tomi Valkeinen,
Thierry Reding, Mikko Perttunen, Jonathan Hunter,
Christian Koenig, Huang Rui, Ankit Agrawal, Alex Williamson,
Alexander Viro, Christian Brauner, Dan Williams, Muchun Song,
Oscar Salvador, David Hildenbrand, Suren Baghdasaryan,
Liam R . Howlett, Matthew Wilcox, Marek Szyprowski,
Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim,
Masami Hiramatsu, Oleg Nesterov, Steven Rostedt, SeongJae Park,
Miaohe Lin, Hugh Dickins, Mike Rapoport, Kees Cook, Paolo Bonzini,
linux-kernel, linux-arm-kernel, linux-parisc, linux-sgx, etnaviv,
dri-devel, linux-arm-msm, freedreno, linux-tegra, kvm,
linux-fsdevel, nvdimm, linux-mm, iommu, linux-perf-users,
linux-trace-kernel, kasan-dev, damon, Pedro Falcato, Rik van Riel,
Harry Yoo, Jann Horn
In-Reply-To: <cover.1782735110.git.ljs@kernel.org>
The only files which should be including vma.h are the implementation files
for the core VMA logic - vma.c, vma_init.c, and vma_exec.c.
This is in order to allow for userland testing of core VMA logic. In these
cases, vma_internal.h and vma.h are included, providing both the
dependencies that the core VMA logic requires and its declarations.
Userland testable VMA logic is achieved by having separate vma_internal.h
implementations for userland and kernel.
Callers other than the core VMA implementation should include internal.h
instead. This header does not need to include vma_internal.h as it only
contains the vma.h declarations, for which the includes already present
suffice.
Update code to reflect this, update comments to reflect the fact there are
3 VMA implementation files and document things more clearly.
While we're here, slightly improve the language of the comment describing
vma_exec.c.
No functional change intended.
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
---
mm/mmu_notifier.c | 2 +-
mm/nommu.c | 1 -
mm/vma.c | 4 ++++
mm/vma.h | 9 ++++++++-
mm/vma_exec.c | 8 ++++++--
mm/vma_init.c | 4 ++++
mm/vma_internal.h | 4 ++--
7 files changed, 25 insertions(+), 7 deletions(-)
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 245b74f39f91..df69ba6e797f 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -19,7 +19,7 @@
#include <linux/sched/mm.h>
#include <linux/slab.h>
-#include "vma.h"
+#include "internal.h"
/* global SRCU for all MMs */
DEFINE_STATIC_SRCU(srcu);
diff --git a/mm/nommu.c b/mm/nommu.c
index ba1c923c0942..4fef6fbbd6e9 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -41,7 +41,6 @@
#include <asm/tlbflush.h>
#include <asm/mmu_context.h>
#include "internal.h"
-#include "vma.h"
unsigned long highest_memmap_pfn;
int heap_stack_gap = 0;
diff --git a/mm/vma.c b/mm/vma.c
index d727150e377a..5c3062e0e706 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -4,6 +4,10 @@
* VMA-specific functions.
*/
+/*
+ * To allow for userland testing we place internal dependencies in
+ * vma_internal.h and external VMA API declarations in vma.h.
+ */
#include "vma_internal.h"
#include "vma.h"
diff --git a/mm/vma.h b/mm/vma.h
index 155eadda47aa..f4f885615a92 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -2,7 +2,14 @@
/*
* vma.h
*
- * Core VMA manipulation API implemented in vma.c.
+ * Core VMA manipulation API implemented in vma.c, vma_init.c and vma_exec.c.
+ *
+ * Note that, in order for VMA logic to be userland testable, this header
+ * intentionally includes no dependencies.
+ *
+ * This is specifically scoped to mm-only. Users of this functionality (other
+ * than the core VMA implementation itself) should not include this header
+ * directly, but rather include internal.h.
*/
#ifndef __MM_VMA_H
#define __MM_VMA_H
diff --git a/mm/vma_exec.c b/mm/vma_exec.c
index 0107a6e3918c..c0f7ba2cfb27 100644
--- a/mm/vma_exec.c
+++ b/mm/vma_exec.c
@@ -1,10 +1,14 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
- * Functions explicitly implemented for exec functionality which however are
- * explicitly VMA-only logic.
+ * Functions provided for exec functionality which however are
+ * specifically VMA-only logic.
*/
+/*
+ * To allow for userland testing we place internal dependencies in
+ * vma_internal.h and external VMA API declarations in vma.h.
+ */
#include "vma_internal.h"
#include "vma.h"
diff --git a/mm/vma_init.c b/mm/vma_init.c
index a459669a1654..715feee283f0 100644
--- a/mm/vma_init.c
+++ b/mm/vma_init.c
@@ -5,6 +5,10 @@
* between CONFIG_MMU and non-CONFIG_MMU kernel configurations.
*/
+/*
+ * To allow for userland testing we place internal dependencies in
+ * vma_internal.h and external VMA API declarations in vma.h.
+ */
#include "vma_internal.h"
#include "vma.h"
diff --git a/mm/vma_internal.h b/mm/vma_internal.h
index 2da6d224c1a8..4d300e7bbaf4 100644
--- a/mm/vma_internal.h
+++ b/mm/vma_internal.h
@@ -2,8 +2,8 @@
/*
* vma_internal.h
*
- * Headers required by vma.c, which can be substituted accordingly when testing
- * VMA functionality.
+ * Headers required by vma.c, vma_init.c and vma_exec.c, which can be
+ * substituted accordingly when testing VMA functionality.
*/
#ifndef __MM_VMA_INTERNAL_H
--
2.54.0
^ permalink raw reply related
* [PATCH 26/30] mm/vma: introduce and use vma_set_pgoff()
From: Lorenzo Stoakes @ 2026-06-29 12:23 UTC (permalink / raw)
To: Andrew Morton
Cc: Russell King, Dinh Nguyen, Simon Schuster,
James E . J . Bottomley, Helge Deller, Jarkko Sakkinen,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
Ian Abbott, H Hartley Sweeten, Lucas Stach, David Airlie,
Simona Vetter, Patrik Jakobsson, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Rob Clark, Dmitry Baryshkov, Tomi Valkeinen,
Thierry Reding, Mikko Perttunen, Jonathan Hunter,
Christian Koenig, Huang Rui, Ankit Agrawal, Alex Williamson,
Alexander Viro, Christian Brauner, Dan Williams, Muchun Song,
Oscar Salvador, David Hildenbrand, Suren Baghdasaryan,
Liam R . Howlett, Matthew Wilcox, Marek Szyprowski,
Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim,
Masami Hiramatsu, Oleg Nesterov, Steven Rostedt, SeongJae Park,
Miaohe Lin, Hugh Dickins, Mike Rapoport, Kees Cook, Paolo Bonzini,
linux-kernel, linux-arm-kernel, linux-parisc, linux-sgx, etnaviv,
dri-devel, linux-arm-msm, freedreno, linux-tegra, kvm,
linux-fsdevel, nvdimm, linux-mm, iommu, linux-perf-users,
linux-trace-kernel, kasan-dev, damon, Pedro Falcato, Rik van Riel,
Harry Yoo, Jann Horn
In-Reply-To: <cover.1782735110.git.ljs@kernel.org>
In order to lay the foundation for work that permits us to track the
virtual page offset of MAP_PRIVATE file-backed mappings, we abstract the
assignment of vma->vm_pgoff into vma_set_pgoff().
We additionally add a lock check here using the newly introduced
vma_assert_can_modify(). This asserts the VMA write lock if the VMA is
attached.
We also assert that, if this is an unfaulted anonymous VMA, its (virtual)
page offset is equal to the page offset of the VMA's start address.
In order to maintain correctness given this assert, we also update
__install_special_mapping() to invoke vma_set_range() after it has set
vma->vm_ops (which determines whether the VMA is anonymous or not).
We do not use vma_set_pgoff() in vm_area_init_from(), as at the point of
forking, we don't necessarily have correct locking state.
Updating vma_set_range() covers most cases, but in addition to this we also
update insert_vm_struct(), compat_set_vma_from_desc() and nommu callers.
We also update vma_add_pgoff() and vma_sub_pgoff() to use vma_set_pgoff().
While we're here, we replace a BUG_ON() in insert_vm_struct() with
WARN_ON_ONCE() and update its comment to reflect the fact that anonymous
mappings can be added here.
No functional change intended.
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
---
mm/nommu.c | 2 +-
mm/vma.c | 14 +++++++-------
mm/vma.h | 15 ++++++++++++---
tools/testing/vma/include/dup.h | 2 +-
4 files changed, 21 insertions(+), 12 deletions(-)
diff --git a/mm/nommu.c b/mm/nommu.c
index c7fafcd87c14..ba1c923c0942 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1059,7 +1059,7 @@ unsigned long do_mmap(struct file *file,
region->vm_pgoff = pgoff;
vm_flags_init(vma, vm_flags);
- vma->vm_pgoff = pgoff;
+ vma_set_pgoff(vma, pgoff);
if (file) {
region->vm_file = get_file(file);
diff --git a/mm/vma.c b/mm/vma.c
index 0579fc8c9bd5..d727150e377a 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -81,7 +81,7 @@ static void vma_set_range(struct vm_area_struct *vma, unsigned long start,
unsigned long end, pgoff_t pgoff)
{
__vma_set_range(vma, start, end);
- vma->vm_pgoff = pgoff;
+ vma_set_pgoff(vma, pgoff);
}
/* Was this VMA ever forked from a parent, i.e. maybe contains CoW mappings? */
@@ -3345,9 +3345,9 @@ int __vm_munmap(unsigned long start, size_t len, bool unlock)
return ret;
}
-/* Insert vm structure into process list sorted by address
- * and into the inode's i_mmap tree. If vm_file is non-NULL
- * then i_mmap_rwsem is taken here.
+/*
+ * Insert vm structure into process list sorted by address
+ * and into the inode's i_mmap tree if file-backed.
*/
int insert_vm_struct(struct mm_struct *mm, struct vm_area_struct *vma)
{
@@ -3373,8 +3373,8 @@ int insert_vm_struct(struct mm_struct *mm, struct vm_area_struct *vma)
* Similarly in do_mmap and in do_brk_flags.
*/
if (vma_is_anonymous(vma)) {
- BUG_ON(vma->anon_vma);
- vma->vm_pgoff = vma->vm_start >> PAGE_SHIFT;
+ WARN_ON_ONCE(vma->anon_vma);
+ vma_set_pgoff(vma, vma->vm_start >> PAGE_SHIFT);
}
if (vma_link(mm, vma)) {
@@ -3420,7 +3420,6 @@ struct vm_area_struct *__install_special_mapping(
if (unlikely(vma == NULL))
return ERR_PTR(-ENOMEM);
- vma_set_range(vma, addr, addr + len, 0);
vm_flags |= mm->def_flags | VM_DONTEXPAND;
if (pgtable_supports_soft_dirty())
vm_flags |= VM_SOFTDIRTY;
@@ -3429,6 +3428,7 @@ struct vm_area_struct *__install_special_mapping(
vma->vm_ops = ops;
vma->vm_private_data = priv;
+ vma_set_range(vma, addr, addr + len, 0);
ret = insert_vm_struct(mm, vma);
if (ret)
diff --git a/mm/vma.h b/mm/vma.h
index 9658e0c678ad..155eadda47aa 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -247,16 +247,25 @@ static inline pgoff_t vmg_end_pgoff(const struct vma_merge_struct *vmg)
return vmg_start_pgoff(vmg) + vmg_pages(vmg);
}
+static inline void vma_set_pgoff(struct vm_area_struct *vma, pgoff_t pgoff)
+{
+ vma_assert_can_modify(vma);
+
+ VM_WARN_ON_ONCE(vma_is_anonymous(vma) && !vma->anon_vma &&
+ pgoff != vma->vm_start >> PAGE_SHIFT);
+ vma->vm_pgoff = pgoff;
+}
+
static inline void vma_add_pgoff(struct vm_area_struct *vma, pgoff_t delta)
{
vma_assert_can_modify(vma);
- vma->vm_pgoff += delta;
+ vma_set_pgoff(vma, vma_start_pgoff(vma) + delta);
}
static inline void vma_sub_pgoff(struct vm_area_struct *vma, pgoff_t delta)
{
vma_assert_can_modify(vma);
- vma->vm_pgoff -= delta;
+ vma_set_pgoff(vma, vma_start_pgoff(vma) - delta);
}
#define VMG_STATE(name, mm_, vmi_, start_, end_, vma_flags_, pgoff_) \
@@ -332,7 +341,7 @@ static inline void compat_set_vma_from_desc(struct vm_area_struct *vma,
*/
/* Mutable fields. Populated with initial state. */
- vma->vm_pgoff = desc->pgoff;
+ vma_set_pgoff(vma, desc->pgoff);
if (desc->vm_file != vma->vm_file)
vma_set_file(vma, desc->vm_file);
vma->flags = desc->vma_flags;
diff --git a/tools/testing/vma/include/dup.h b/tools/testing/vma/include/dup.h
index 41fea90a344d..5d7d0afd7765 100644
--- a/tools/testing/vma/include/dup.h
+++ b/tools/testing/vma/include/dup.h
@@ -1186,7 +1186,7 @@ static inline void vma_assert_can_modify(struct vm_area_struct *vma)
static inline void vma_assert_detached(struct vm_area_struct *vma)
{
- WARN_ON_ONCE(refcount_read(&vma->vm_refcnt));
+ WARN_ON_ONCE(vma_is_attached(vma));
}
static inline void vma_assert_write_locked(struct vm_area_struct *);
--
2.54.0
* [PATCH 25/30] mm/vma: update vmg_adjust_set_range() to offset pgoff instead
From: Lorenzo Stoakes @ 2026-06-29 12:23 UTC (permalink / raw)
To: Andrew Morton
Cc: Russell King, Dinh Nguyen, Simon Schuster,
James E . J . Bottomley, Helge Deller, Jarkko Sakkinen,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
Ian Abbott, H Hartley Sweeten, Lucas Stach, David Airlie,
Simona Vetter, Patrik Jakobsson, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Rob Clark, Dmitry Baryshkov, Tomi Valkeinen,
Thierry Reding, Mikko Perttunen, Jonathan Hunter,
Christian Koenig, Huang Rui, Ankit Agrawal, Alex Williamson,
Alexander Viro, Christian Brauner, Dan Williams, Muchun Song,
Oscar Salvador, David Hildenbrand, Suren Baghdasaryan,
Liam R . Howlett, Matthew Wilcox, Marek Szyprowski,
Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim,
Masami Hiramatsu, Oleg Nesterov, Steven Rostedt, SeongJae Park,
Miaohe Lin, Hugh Dickins, Mike Rapoport, Kees Cook, Paolo Bonzini,
linux-kernel, linux-arm-kernel, linux-parisc, linux-sgx, etnaviv,
dri-devel, linux-arm-msm, freedreno, linux-tegra, kvm,
linux-fsdevel, nvdimm, linux-mm, iommu, linux-perf-users,
linux-trace-kernel, kasan-dev, damon, Pedro Falcato, Rik van Riel,
Harry Yoo, Jann Horn
In-Reply-To: <cover.1782735110.git.ljs@kernel.org>
We are already calculating the pgoff as an offset. Since vma_add_pgoff()
and vma_sub_pgoff() are available, just offset this value directly and use
__vma_set_range() to set the vma->vm_[start, end] values.
We take care to update the range before offsetting the page offset, so that
the adjusted VMA's vm_start and vm_pgoff are mutually consistent at the point
at which the page offset helpers operate. This matters once vma_set_pgoff()
asserts invariants which relate the two.
Doing so lays the foundation for future work which allows for the use of
virtual page offsets for MAP_PRIVATE file-backed mappings.
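A minimal userspace sketch of the two vmg_adjust_set_range() cases, assuming 4 KiB pages and a hypothetical `mock_vma` with only the fields involved. It contrasts the old approach (precompute pgoff, then set everything) with the new one (move vm_start first, then add or subtract the page delta), and lets one check that the page index mapped at a surviving address is unchanged. The `mock_*` and `adjust_*` names are illustrative, not kernel functions.

```c
#define MOCK_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

typedef unsigned long mock_pgoff_t;

/* Hypothetical stand-in for the fields of struct vm_area_struct used here. */
struct mock_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	mock_pgoff_t vm_pgoff;
};

/* Page index mapped at addr, in the style of linear_page_index(). */
static mock_pgoff_t mock_linear_index(const struct mock_vma *vma,
				      unsigned long addr)
{
	return vma->vm_pgoff + ((addr - vma->vm_start) >> MOCK_PAGE_SHIFT);
}

/* Old approach: compute pgoff from the unmodified VMA, then set both. */
static void adjust_middle_old(struct mock_vma *middle, unsigned long new_start)
{
	const unsigned long delta = new_start - middle->vm_start;
	mock_pgoff_t pgoff = middle->vm_pgoff + (delta >> MOCK_PAGE_SHIFT);

	middle->vm_start = new_start;
	middle->vm_pgoff = pgoff;
}

/* New approach, middle shrinks from below: range first, then add. */
static void adjust_middle_new(struct mock_vma *middle, unsigned long new_start)
{
	const unsigned long delta = new_start - middle->vm_start;

	middle->vm_start = new_start;                  /* __vma_set_range() */
	middle->vm_pgoff += delta >> MOCK_PAGE_SHIFT;  /* vma_add_pgoff() */
}

/* New approach, next grows downwards: range first, then subtract. */
static void adjust_next_new(struct mock_vma *next, unsigned long new_start)
{
	const unsigned long delta = next->vm_start - new_start;

	next->vm_start = new_start;                    /* __vma_set_range() */
	next->vm_pgoff -= delta >> MOCK_PAGE_SHIFT;    /* vma_sub_pgoff() */
}
```

In both cases vm_end is left alone, matching the old vma_set_range(adjust, vmg->end, adjust->vm_end, pgoff) call.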
No functional change intended.
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
---
mm/vma.c | 15 ++++-----------
1 file changed, 4 insertions(+), 11 deletions(-)
diff --git a/mm/vma.c b/mm/vma.c
index e3355eab11f2..0579fc8c9bd5 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -714,9 +714,6 @@ void validate_mm(struct mm_struct *mm)
*/
static void vmg_adjust_set_range(struct vma_merge_struct *vmg)
{
- struct vm_area_struct *adjust;
- pgoff_t pgoff;
-
if (vmg->__adjust_middle_start) {
/*
* vmg->start vmg->end
@@ -735,8 +732,8 @@ static void vmg_adjust_set_range(struct vma_merge_struct *vmg)
struct vm_area_struct *middle = vmg->middle;
const unsigned long delta = vmg->end - middle->vm_start;
- pgoff = vma_start_pgoff(middle) + (delta >> PAGE_SHIFT);
- adjust = middle;
+ __vma_set_range(middle, vmg->end, middle->vm_end);
+ vma_add_pgoff(middle, delta >> PAGE_SHIFT);
} else if (vmg->__adjust_next_start) {
/*
* Originally:
@@ -764,13 +761,9 @@ static void vmg_adjust_set_range(struct vma_merge_struct *vmg)
struct vm_area_struct *next = vmg->next;
const unsigned long delta = next->vm_start - vmg->end;
- pgoff = vma_start_pgoff(next) - (delta >> PAGE_SHIFT);
- adjust = next;
- } else {
- return;
+ __vma_set_range(next, vmg->end, next->vm_end);
+ vma_sub_pgoff(next, delta >> PAGE_SHIFT);
}
-
- vma_set_range(adjust, vmg->end, adjust->vm_end, pgoff);
}
/*
--
2.54.0
* [PATCH 24/30] mm/vma: update vma_shrink() to not pass unnecessary pgoff parameter
From: Lorenzo Stoakes @ 2026-06-29 12:23 UTC (permalink / raw)
To: Andrew Morton
Cc: Russell King, Dinh Nguyen, Simon Schuster,
James E . J . Bottomley, Helge Deller, Jarkko Sakkinen,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
Ian Abbott, H Hartley Sweeten, Lucas Stach, David Airlie,
Simona Vetter, Patrik Jakobsson, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Rob Clark, Dmitry Baryshkov, Tomi Valkeinen,
Thierry Reding, Mikko Perttunen, Jonathan Hunter,
Christian Koenig, Huang Rui, Ankit Agrawal, Alex Williamson,
Alexander Viro, Christian Brauner, Dan Williams, Muchun Song,
Oscar Salvador, David Hildenbrand, Suren Baghdasaryan,
Liam R . Howlett, Matthew Wilcox, Marek Szyprowski,
Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim,
Masami Hiramatsu, Oleg Nesterov, Steven Rostedt, SeongJae Park,
Miaohe Lin, Hugh Dickins, Mike Rapoport, Kees Cook, Paolo Bonzini,
linux-kernel, linux-arm-kernel, linux-parisc, linux-sgx, etnaviv,
dri-devel, linux-arm-msm, freedreno, linux-tegra, kvm,
linux-fsdevel, nvdimm, linux-mm, iommu, linux-perf-users,
linux-trace-kernel, kasan-dev, damon, Pedro Falcato, Rik van Riel,
Harry Yoo, Jann Horn
In-Reply-To: <cover.1782735110.git.ljs@kernel.org>
vma_shrink() does not need to adjust vma->vm_pgoff; we were passing this
parameter solely to satisfy vma_set_range()'s requirement that a pgoff be
specified.
Since vma_set_range() is now isolated to vma.c, we can simply introduce
__vma_set_range() which sets only vma->vm_[start, end], and invoke this
instead, removing pgoff from vma_shrink() altogether.
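A minimal userspace sketch of the split, using a hypothetical simplified `mock_vma` and illustrative `mock_*` helper names. Since the only non-test caller, relocate_vma_down(), passed vma_start_pgoff(vma), i.e. the VMA's current page offset, setting only the range leaves the VMA in the same state as before.

```c
typedef unsigned long mock_pgoff_t;

/* Hypothetical stand-in for the fields of struct vm_area_struct used here. */
struct mock_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	mock_pgoff_t vm_pgoff;
};

/* Sets only the range, like the new __vma_set_range(). */
static void mock_set_range_only(struct mock_vma *vma, unsigned long start,
				unsigned long end)
{
	vma->vm_start = start;
	vma->vm_end = end;
}

/* Range plus page offset, like vma_set_range() after this patch. */
static void mock_set_range_pgoff(struct mock_vma *vma, unsigned long start,
				 unsigned long end, mock_pgoff_t pgoff)
{
	mock_set_range_only(vma, start, end);
	vma->vm_pgoff = pgoff;
}

/* Old vma_shrink() tail: the caller had to supply a pgoff. */
static void mock_shrink_old(struct mock_vma *vma, unsigned long start,
			    unsigned long end, mock_pgoff_t pgoff)
{
	mock_set_range_pgoff(vma, start, end, pgoff);
}

/* New vma_shrink() tail: vm_pgoff is left untouched. */
static void mock_shrink_new(struct mock_vma *vma, unsigned long start,
			    unsigned long end)
{
	mock_set_range_only(vma, start, end);
}
```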
No functional change intended.
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
---
mm/vma.c | 14 ++++++++++----
mm/vma.h | 2 +-
mm/vma_exec.c | 2 +-
tools/testing/vma/tests/merge.c | 2 +-
4 files changed, 13 insertions(+), 7 deletions(-)
diff --git a/mm/vma.c b/mm/vma.c
index b16c5b20862f..e3355eab11f2 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -70,11 +70,17 @@ struct mmap_state {
.state = VMA_MERGE_START, \
}
-static void vma_set_range(struct vm_area_struct *vma, unsigned long start,
- unsigned long end, pgoff_t pgoff)
+static void __vma_set_range(struct vm_area_struct *vma, unsigned long start,
+ unsigned long end)
{
vma->vm_start = start;
vma->vm_end = end;
+}
+
+static void vma_set_range(struct vm_area_struct *vma, unsigned long start,
+ unsigned long end, pgoff_t pgoff)
+{
+ __vma_set_range(vma, start, end);
vma->vm_pgoff = pgoff;
}
@@ -1289,7 +1295,7 @@ int vma_expand(struct vma_merge_struct *vmg)
* Returns: 0 on success, -ENOMEM otherwise
*/
int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
- unsigned long start, unsigned long end, pgoff_t pgoff)
+ unsigned long start, unsigned long end)
{
struct vma_prepare vp;
@@ -1310,7 +1316,7 @@ int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
vma_adjust_trans_huge(vma, start, end, NULL);
vma_iter_clear(vmi);
- vma_set_range(vma, start, end, pgoff);
+ __vma_set_range(vma, start, end);
vma_complete(&vp, vmi, vma->vm_mm);
validate_mm(vma->vm_mm);
return 0;
diff --git a/mm/vma.h b/mm/vma.h
index 14f026bf3be4..9658e0c678ad 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -298,7 +298,7 @@ void validate_mm(struct mm_struct *mm);
__must_check int vma_expand(struct vma_merge_struct *vmg);
__must_check int vma_shrink(struct vma_iterator *vmi,
struct vm_area_struct *vma,
- unsigned long start, unsigned long end, pgoff_t pgoff);
+ unsigned long start, unsigned long end);
static inline int vma_iter_store_gfp(struct vma_iterator *vmi,
struct vm_area_struct *vma, gfp_t gfp)
diff --git a/mm/vma_exec.c b/mm/vma_exec.c
index e3644a3042e2..0107a6e3918c 100644
--- a/mm/vma_exec.c
+++ b/mm/vma_exec.c
@@ -89,7 +89,7 @@ int relocate_vma_down(struct vm_area_struct *vma, unsigned long shift)
vma_prev(&vmi);
/* Shrink the vma to just the new range */
- return vma_shrink(&vmi, vma, new_start, new_end, vma_start_pgoff(vma));
+ return vma_shrink(&vmi, vma, new_start, new_end);
}
/*
diff --git a/tools/testing/vma/tests/merge.c b/tools/testing/vma/tests/merge.c
index f8666a755749..04704d6eb426 100644
--- a/tools/testing/vma/tests/merge.c
+++ b/tools/testing/vma/tests/merge.c
@@ -227,7 +227,7 @@ static bool test_simple_shrink(void)
ASSERT_FALSE(attach_vma(&mm, vma));
- ASSERT_FALSE(vma_shrink(&vmi, vma, 0, 0x1000, 0));
+ ASSERT_FALSE(vma_shrink(&vmi, vma, 0, 0x1000));
ASSERT_EQ(vma->vm_start, 0);
ASSERT_EQ(vma->vm_end, 0x1000);
--
2.54.0
* [PATCH 23/30] mm/vma: make vma_set_range() static, drop insert_vm_struct() decl
From: Lorenzo Stoakes @ 2026-06-29 12:23 UTC (permalink / raw)
To: Andrew Morton
Cc: Russell King, Dinh Nguyen, Simon Schuster,
James E . J . Bottomley, Helge Deller, Jarkko Sakkinen,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
Ian Abbott, H Hartley Sweeten, Lucas Stach, David Airlie,
Simona Vetter, Patrik Jakobsson, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Rob Clark, Dmitry Baryshkov, Tomi Valkeinen,
Thierry Reding, Mikko Perttunen, Jonathan Hunter,
Christian Koenig, Huang Rui, Ankit Agrawal, Alex Williamson,
Alexander Viro, Christian Brauner, Dan Williams, Muchun Song,
Oscar Salvador, David Hildenbrand, Suren Baghdasaryan,
Liam R . Howlett, Matthew Wilcox, Marek Szyprowski,
Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim,
Masami Hiramatsu, Oleg Nesterov, Steven Rostedt, SeongJae Park,
Miaohe Lin, Hugh Dickins, Mike Rapoport, Kees Cook, Paolo Bonzini,
linux-kernel, linux-arm-kernel, linux-parisc, linux-sgx, etnaviv,
dri-devel, linux-arm-msm, freedreno, linux-tegra, kvm,
linux-fsdevel, nvdimm, linux-mm, iommu, linux-perf-users,
linux-trace-kernel, kasan-dev, damon, Pedro Falcato, Rik van Riel,
Harry Yoo, Jann Horn
In-Reply-To: <cover.1782735110.git.ljs@kernel.org>
With __install_special_mapping() moved to vma.c, vma_set_range() can be
made into a static function there and is now completely isolated from the
rest of mm.
While we're here, we can also remove the insert_vm_struct() declaration
from mm.h - the function is implemented in vma.c and already declared in
vma.h, and has no users outside of mm.
Also update the VMA userland tests to reflect this change.
No functional change intended.
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
---
include/linux/mm.h | 1 -
mm/internal.h | 9 ---------
mm/vma.c | 8 ++++++++
tools/testing/vma/shared.c | 9 ---------
tools/testing/vma/shared.h | 5 -----
5 files changed, 8 insertions(+), 24 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index cf2d42747064..868b2334bff3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4103,7 +4103,6 @@ void anon_vma_interval_tree_verify(struct anon_vma_chain *avc);
/* mmap.c */
extern int __vm_enough_memory(const struct mm_struct *mm, long pages, int cap_sys_admin);
-extern int insert_vm_struct(struct mm_struct *, struct vm_area_struct *);
extern void exit_mmap(struct mm_struct *);
bool mmap_read_lock_maybe_expand(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long addr, bool write);
diff --git a/mm/internal.h b/mm/internal.h
index 89e5b7efe256..e127dfea9c0f 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1720,15 +1720,6 @@ extern bool mirrored_kernelcore;
bool memblock_has_mirror(void);
void memblock_free_all(void);
-static __always_inline void vma_set_range(struct vm_area_struct *vma,
- unsigned long start, unsigned long end,
- pgoff_t pgoff)
-{
- vma->vm_start = start;
- vma->vm_end = end;
- vma->vm_pgoff = pgoff;
-}
-
static inline bool vma_soft_dirty_enabled(struct vm_area_struct *vma)
{
/*
diff --git a/mm/vma.c b/mm/vma.c
index f4de706a2728..b16c5b20862f 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -70,6 +70,14 @@ struct mmap_state {
.state = VMA_MERGE_START, \
}
+static void vma_set_range(struct vm_area_struct *vma, unsigned long start,
+ unsigned long end, pgoff_t pgoff)
+{
+ vma->vm_start = start;
+ vma->vm_end = end;
+ vma->vm_pgoff = pgoff;
+}
+
/* Was this VMA ever forked from a parent, i.e. maybe contains CoW mappings? */
static bool vma_is_fork_child(struct vm_area_struct *vma)
{
diff --git a/tools/testing/vma/shared.c b/tools/testing/vma/shared.c
index 2565a5aecb80..bea9ea6db02a 100644
--- a/tools/testing/vma/shared.c
+++ b/tools/testing/vma/shared.c
@@ -120,12 +120,3 @@ unsigned long rlimit(unsigned int limit)
{
return (unsigned long)-1;
}
-
-void vma_set_range(struct vm_area_struct *vma,
- unsigned long start, unsigned long end,
- pgoff_t pgoff)
-{
- vma->vm_start = start;
- vma->vm_end = end;
- vma->vm_pgoff = pgoff;
-}
diff --git a/tools/testing/vma/shared.h b/tools/testing/vma/shared.h
index 8b9e3b11c3cb..ca4f1238f1c7 100644
--- a/tools/testing/vma/shared.h
+++ b/tools/testing/vma/shared.h
@@ -125,8 +125,3 @@ void __vma_set_dummy_anon_vma(struct vm_area_struct *vma,
/* Provide a simple dummy VMA/anon_vma dummy setup for testing. */
void vma_set_dummy_anon_vma(struct vm_area_struct *vma,
struct anon_vma_chain *avc);
-
-/* Helper function to specify a VMA's range. */
-void vma_set_range(struct vm_area_struct *vma,
- unsigned long start, unsigned long end,
- pgoff_t pgoff);
--
2.54.0
* [PATCH 22/30] mm/vma: move __install_special_mapping() to vma.c
From: Lorenzo Stoakes @ 2026-06-29 12:23 UTC (permalink / raw)
To: Andrew Morton
Cc: Russell King, Dinh Nguyen, Simon Schuster,
James E . J . Bottomley, Helge Deller, Jarkko Sakkinen,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
Ian Abbott, H Hartley Sweeten, Lucas Stach, David Airlie,
Simona Vetter, Patrik Jakobsson, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Rob Clark, Dmitry Baryshkov, Tomi Valkeinen,
Thierry Reding, Mikko Perttunen, Jonathan Hunter,
Christian Koenig, Huang Rui, Ankit Agrawal, Alex Williamson,
Alexander Viro, Christian Brauner, Dan Williams, Muchun Song,
Oscar Salvador, David Hildenbrand, Suren Baghdasaryan,
Liam R . Howlett, Matthew Wilcox, Marek Szyprowski,
Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim,
Masami Hiramatsu, Oleg Nesterov, Steven Rostedt, SeongJae Park,
Miaohe Lin, Hugh Dickins, Mike Rapoport, Kees Cook, Paolo Bonzini,
linux-kernel, linux-arm-kernel, linux-parisc, linux-sgx, etnaviv,
dri-devel, linux-arm-msm, freedreno, linux-tegra, kvm,
linux-fsdevel, nvdimm, linux-mm, iommu, linux-perf-users,
linux-trace-kernel, kasan-dev, damon, Pedro Falcato, Rik van Riel,
Harry Yoo, Jann Horn
In-Reply-To: <cover.1782735110.git.ljs@kernel.org>
__install_special_mapping() operates on VMAs and rightly belongs in vma.c,
where it can be covered by the VMA userland tests and isolated from the rest
of mm.
The _install_special_mapping() function will remain in mmap.c as a wrapper,
since it is used by architecture-specific code.
Doing so allows us to isolate more functions in vma.c for the same reasons.
This forms part of work to allow MAP_PRIVATE file-backed mappings to be
tracked by their anonymous virtual page offset, as it allows us to keep the
code that interacts with this offset isolated and together.
No functional change intended.
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
---
mm/mmap.c | 38 --------------------------------------
mm/vma.c | 38 ++++++++++++++++++++++++++++++++++++++
mm/vma.h | 5 +++++
3 files changed, 43 insertions(+), 38 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index 2d09a57e3620..46174e706bbe 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1447,44 +1447,6 @@ static vm_fault_t special_mapping_fault(struct vm_fault *vmf)
return VM_FAULT_SIGBUS;
}
-static struct vm_area_struct *__install_special_mapping(
- struct mm_struct *mm,
- unsigned long addr, unsigned long len,
- vm_flags_t vm_flags, void *priv,
- const struct vm_operations_struct *ops)
-{
- int ret;
- struct vm_area_struct *vma;
-
- vma = vm_area_alloc(mm);
- if (unlikely(vma == NULL))
- return ERR_PTR(-ENOMEM);
-
- vma_set_range(vma, addr, addr + len, 0);
- vm_flags |= mm->def_flags | VM_DONTEXPAND;
- if (pgtable_supports_soft_dirty())
- vm_flags |= VM_SOFTDIRTY;
- vm_flags_init(vma, vm_flags & ~VM_LOCKED_MASK);
- vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
-
- vma->vm_ops = ops;
- vma->vm_private_data = priv;
-
- ret = insert_vm_struct(mm, vma);
- if (ret)
- goto out;
-
- vm_stat_account(mm, vma->vm_flags, len >> PAGE_SHIFT);
-
- perf_event_mmap(vma);
-
- return vma;
-
-out:
- vm_area_free(vma);
- return ERR_PTR(ret);
-}
-
bool vma_is_special_mapping(const struct vm_area_struct *vma,
const struct vm_special_mapping *sm)
{
diff --git a/mm/vma.c b/mm/vma.c
index cb7222e20c93..f4de706a2728 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -3399,3 +3399,41 @@ __weak unsigned long vma_mmu_pagesize(struct vm_area_struct *vma)
{
return vma_kernel_pagesize(vma);
}
+
+struct vm_area_struct *__install_special_mapping(
+ struct mm_struct *mm,
+ unsigned long addr, unsigned long len,
+ vm_flags_t vm_flags, void *priv,
+ const struct vm_operations_struct *ops)
+{
+ int ret;
+ struct vm_area_struct *vma;
+
+ vma = vm_area_alloc(mm);
+ if (unlikely(vma == NULL))
+ return ERR_PTR(-ENOMEM);
+
+ vma_set_range(vma, addr, addr + len, 0);
+ vm_flags |= mm->def_flags | VM_DONTEXPAND;
+ if (pgtable_supports_soft_dirty())
+ vm_flags |= VM_SOFTDIRTY;
+ vm_flags_init(vma, vm_flags & ~VM_LOCKED_MASK);
+ vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
+
+ vma->vm_ops = ops;
+ vma->vm_private_data = priv;
+
+ ret = insert_vm_struct(mm, vma);
+ if (ret)
+ goto out;
+
+ vm_stat_account(mm, vma->vm_flags, len >> PAGE_SHIFT);
+
+ perf_event_mmap(vma);
+
+ return vma;
+
+out:
+ vm_area_free(vma);
+ return ERR_PTR(ret);
+}
diff --git a/mm/vma.h b/mm/vma.h
index 47fe35e5307e..14f026bf3be4 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -775,4 +775,9 @@ static inline bool map_deny_write_exec(const vma_flags_t *old,
}
#endif
+struct vm_area_struct *__install_special_mapping(struct mm_struct *mm,
+ unsigned long addr, unsigned long len,
+ vm_flags_t vm_flags, void *priv,
+ const struct vm_operations_struct *ops);
+
#endif /* __MM_VMA_H */
--
2.54.0
* [PATCH 21/30] mm/vma: add and use vma_[add/sub]_pgoff()
From: Lorenzo Stoakes @ 2026-06-29 12:23 UTC (permalink / raw)
To: Andrew Morton
Cc: Russell King, Dinh Nguyen, Simon Schuster,
James E . J . Bottomley, Helge Deller, Jarkko Sakkinen,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
Ian Abbott, H Hartley Sweeten, Lucas Stach, David Airlie,
Simona Vetter, Patrik Jakobsson, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Rob Clark, Dmitry Baryshkov, Tomi Valkeinen,
Thierry Reding, Mikko Perttunen, Jonathan Hunter,
Christian Koenig, Huang Rui, Ankit Agrawal, Alex Williamson,
Alexander Viro, Christian Brauner, Dan Williams, Muchun Song,
Oscar Salvador, David Hildenbrand, Suren Baghdasaryan,
Liam R . Howlett, Matthew Wilcox, Marek Szyprowski,
Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim,
Masami Hiramatsu, Oleg Nesterov, Steven Rostedt, SeongJae Park,
Miaohe Lin, Hugh Dickins, Mike Rapoport, Kees Cook, Paolo Bonzini,
linux-kernel, linux-arm-kernel, linux-parisc, linux-sgx, etnaviv,
dri-devel, linux-arm-msm, freedreno, linux-tegra, kvm,
linux-fsdevel, nvdimm, linux-mm, iommu, linux-perf-users,
linux-trace-kernel, kasan-dev, damon, Pedro Falcato, Rik van Riel,
Harry Yoo, Jann Horn
In-Reply-To: <cover.1782735110.git.ljs@kernel.org>
Add helpers for adding to or subtracting from a VMA's page offset, exposed
in mm/vma.h for internal VMA users within mm.
This lays the foundations for tracking the anonymous page offset of
MAP_PRIVATE file-backed mappings, where adding to and subtracting from this
value must be reflected in both the file and anonymous offsets.
These are used on VMA split and downward stack expansion.
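A minimal userspace sketch of the two call sites, assuming 4 KiB pages and a hypothetical `mock_vma`. `mock_linear_delta()` assumes linear_page_delta(vma, addr) is the page count between vm_start and addr; that definition is an assumption, not taken from this series. The checks below confirm that splitting off the upper part and growing a stack downwards both preserve the page index mapped at each surviving address.

```c
#define MOCK_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

typedef unsigned long mock_pgoff_t;

/* Hypothetical stand-in for the fields of struct vm_area_struct used here. */
struct mock_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	mock_pgoff_t vm_pgoff;
};

static void mock_add_pgoff(struct mock_vma *vma, mock_pgoff_t delta)
{
	vma->vm_pgoff += delta; /* vma_add_pgoff(), minus the lock assert */
}

static void mock_sub_pgoff(struct mock_vma *vma, mock_pgoff_t delta)
{
	vma->vm_pgoff -= delta; /* vma_sub_pgoff(), minus the lock assert */
}

/* Assumed definition: number of pages between vm_start and addr. */
static mock_pgoff_t mock_linear_delta(const struct mock_vma *vma,
				      unsigned long addr)
{
	return (addr - vma->vm_start) >> MOCK_PAGE_SHIFT;
}

static mock_pgoff_t mock_linear_index(const struct mock_vma *vma,
				      unsigned long addr)
{
	return vma->vm_pgoff + mock_linear_delta(vma, addr);
}

/* Split: *upper covers [addr, vm_end), *vma keeps [vm_start, addr). */
static void mock_split_upper(struct mock_vma *vma, struct mock_vma *upper,
			     unsigned long addr)
{
	*upper = *vma;
	upper->vm_start = addr;
	/* Delta is computed against the original, still-unchanged VMA. */
	mock_add_pgoff(upper, mock_linear_delta(vma, addr));
	vma->vm_end = addr;
}

/* Grow a stack VMA down to address, as expand_downwards() does. */
static void mock_expand_down(struct mock_vma *vma, unsigned long address)
{
	mock_pgoff_t grow = (vma->vm_start - address) >> MOCK_PAGE_SHIFT;

	vma->vm_start = address;
	mock_sub_pgoff(vma, grow);
}
```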
No functional change intended.
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
---
mm/nommu.c | 6 ++++--
mm/vma.c | 6 +++---
mm/vma.h | 12 ++++++++++++
tools/testing/vma/include/dup.h | 13 ++++++++++++-
4 files changed, 31 insertions(+), 6 deletions(-)
diff --git a/mm/nommu.c b/mm/nommu.c
index 7333d855e974..c7fafcd87c14 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -41,6 +41,7 @@
#include <asm/tlbflush.h>
#include <asm/mmu_context.h>
#include "internal.h"
+#include "vma.h"
unsigned long highest_memmap_pfn;
int heap_stack_gap = 0;
@@ -1338,7 +1339,8 @@ static int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
region->vm_top = region->vm_end = new->vm_end = addr;
} else {
region->vm_start = new->vm_start = addr;
- region->vm_pgoff = new->vm_pgoff += npages;
+ vma_add_pgoff(new, npages);
+ region->vm_pgoff = vma_start_pgoff(new);
}
vma_iter_config(vmi, new->vm_start, new->vm_end);
@@ -1355,7 +1357,7 @@ static int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
delete_nommu_region(vma->vm_region);
if (new_below) {
vma->vm_region->vm_start = vma->vm_start = addr;
- vma->vm_pgoff += npages;
+ vma_add_pgoff(vma, npages);
vma->vm_region->vm_pgoff = vma_start_pgoff(vma);
} else {
vma->vm_region->vm_end = vma->vm_end = addr;
diff --git a/mm/vma.c b/mm/vma.c
index 185d07397ca6..cb7222e20c93 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -517,7 +517,7 @@ __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
new->vm_end = addr;
} else {
new->vm_start = addr;
- new->vm_pgoff += linear_page_delta(vma, addr);
+ vma_add_pgoff(new, linear_page_delta(vma, addr));
}
err = -ENOMEM;
@@ -556,7 +556,7 @@ __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
if (new_below) {
vma->vm_start = addr;
- vma->vm_pgoff += (addr - new->vm_start) >> PAGE_SHIFT;
+ vma_add_pgoff(vma, (addr - new->vm_start) >> PAGE_SHIFT);
} else {
vma->vm_end = addr;
}
@@ -3305,7 +3305,7 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
vm_stat_account(mm, vma->vm_flags, grow);
anon_vma_interval_tree_pre_update_vma(vma);
vma->vm_start = address;
- vma->vm_pgoff -= grow;
+ vma_sub_pgoff(vma, grow);
/* Overwrite old entry in mtree. */
vma_iter_store_overwrite(&vmi, vma);
anon_vma_interval_tree_post_update_vma(vma);
diff --git a/mm/vma.h b/mm/vma.h
index 2342516ce00e..47fe35e5307e 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -247,6 +247,18 @@ static inline pgoff_t vmg_end_pgoff(const struct vma_merge_struct *vmg)
return vmg_start_pgoff(vmg) + vmg_pages(vmg);
}
+static inline void vma_add_pgoff(struct vm_area_struct *vma, pgoff_t delta)
+{
+ vma_assert_can_modify(vma);
+ vma->vm_pgoff += delta;
+}
+
+static inline void vma_sub_pgoff(struct vm_area_struct *vma, pgoff_t delta)
+{
+ vma_assert_can_modify(vma);
+ vma->vm_pgoff -= delta;
+}
+
#define VMG_STATE(name, mm_, vmi_, start_, end_, vma_flags_, pgoff_) \
struct vma_merge_struct name = { \
.mm = mm_, \
diff --git a/tools/testing/vma/include/dup.h b/tools/testing/vma/include/dup.h
index 7ed165c8d9bc..41fea90a344d 100644
--- a/tools/testing/vma/include/dup.h
+++ b/tools/testing/vma/include/dup.h
@@ -1163,6 +1163,11 @@ static inline struct vm_area_struct *vma_next(struct vma_iterator *vmi)
return mas_find(&vmi->mas, ULONG_MAX);
}
+static inline bool vma_is_attached(struct vm_area_struct *vma)
+{
+ return refcount_read(&vma->vm_refcnt);
+}
+
/*
* WARNING: to avoid racing with vma_mark_attached()/vma_mark_detached(), these
* assertions should be made either under mmap_write_lock or when the object
@@ -1170,7 +1175,13 @@ static inline struct vm_area_struct *vma_next(struct vma_iterator *vmi)
*/
static inline void vma_assert_attached(struct vm_area_struct *vma)
{
- WARN_ON_ONCE(!refcount_read(&vma->vm_refcnt));
+ WARN_ON_ONCE(!vma_is_attached(vma));
+}
+
+static inline void vma_assert_can_modify(struct vm_area_struct *vma)
+{
+ if (vma_is_attached(vma))
+ vma_assert_write_locked(vma);
}
static inline void vma_assert_detached(struct vm_area_struct *vma)
--
2.54.0
* [PATCH 20/30] mm/vma: introduce vma_assert_can_modify()
From: Lorenzo Stoakes @ 2026-06-29 12:23 UTC (permalink / raw)
To: Andrew Morton
Cc: Russell King, Dinh Nguyen, Simon Schuster,
James E . J . Bottomley, Helge Deller, Jarkko Sakkinen,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
Ian Abbott, H Hartley Sweeten, Lucas Stach, David Airlie,
Simona Vetter, Patrik Jakobsson, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Rob Clark, Dmitry Baryshkov, Tomi Valkeinen,
Thierry Reding, Mikko Perttunen, Jonathan Hunter,
Christian Koenig, Huang Rui, Ankit Agrawal, Alex Williamson,
Alexander Viro, Christian Brauner, Dan Williams, Muchun Song,
Oscar Salvador, David Hildenbrand, Suren Baghdasaryan,
Liam R . Howlett, Matthew Wilcox, Marek Szyprowski,
Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim,
Masami Hiramatsu, Oleg Nesterov, Steven Rostedt, SeongJae Park,
Miaohe Lin, Hugh Dickins, Mike Rapoport, Kees Cook, Paolo Bonzini,
linux-kernel, linux-arm-kernel, linux-parisc, linux-sgx, etnaviv,
dri-devel, linux-arm-msm, freedreno, linux-tegra, kvm,
linux-fsdevel, nvdimm, linux-mm, iommu, linux-perf-users,
linux-trace-kernel, kasan-dev, damon, Pedro Falcato, Rik van Riel,
Harry Yoo, Jann Horn
In-Reply-To: <cover.1782735110.git.ljs@kernel.org>
vma_assert_write_locked() and vma_assert_attached() are useful for their
own purposes; however, VMA code absolutely does allow the modification of
VMAs which are not write-locked if they are detached at that point (i.e.
unreachable from anywhere).
It's therefore useful to be able to assert that a VMA is either detached
(modification is safe) or write-locked (you're explicitly locked for
modification).
Therefore introduce vma_assert_can_modify() for this purpose.
While we're here, make vma_is_attached() available generally. If
!CONFIG_PER_VMA_LOCK, then there's no sense in which a VMA is detached
(vma_mark_detached() is a no-op), so have it default to true in this case.
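A minimal userspace sketch of the decision vma_assert_can_modify() makes, using a hypothetical `mock_vma` that tracks only attachment and lock state, and a `per_vma_lock` flag standing in for CONFIG_PER_VMA_LOCK. Without per-VMA locks, vma_is_attached() always returns true and vma_assert_write_locked() falls back to checking the mmap write lock, which `write_locked` models here.

```c
#include <stdbool.h>

/* Hypothetical stand-in tracking only the state the assertion looks at. */
struct mock_vma {
	bool attached;     /* reachable via the VMA tree */
	bool write_locked; /* VMA write lock held (mmap write lock if !per_vma_lock) */
};

/* With CONFIG_PER_VMA_LOCK disabled, every VMA is treated as attached. */
static bool mock_vma_is_attached(const struct mock_vma *vma, bool per_vma_lock)
{
	return per_vma_lock ? vma->attached : true;
}

/* Returns true if vma_assert_can_modify() would be satisfied. */
static bool mock_can_modify(const struct mock_vma *vma, bool per_vma_lock)
{
	if (mock_vma_is_attached(vma, per_vma_lock))
		return vma->write_locked;
	return true; /* detached: nothing else can reach it */
}
```

Note that in the !CONFIG_PER_VMA_LOCK case a VMA the caller considers "detached" still requires the lock, since there is no notion of detachment to exempt it.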
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
---
include/linux/mmap_lock.h | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index 04b8f61ece5d..d513286d8160 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -506,6 +506,8 @@ static inline __must_check
int vma_start_write_killable(struct vm_area_struct *vma) { return 0; }
static inline void vma_assert_write_locked(struct vm_area_struct *vma)
{ mmap_assert_write_locked(vma->vm_mm); }
+static inline bool vma_is_attached(struct vm_area_struct *vma)
+ { return true; }
static inline void vma_assert_attached(struct vm_area_struct *vma) {}
static inline void vma_assert_detached(struct vm_area_struct *vma) {}
static inline void vma_mark_attached(struct vm_area_struct *vma) {}
@@ -530,6 +532,12 @@ static inline void vma_assert_stabilised(struct vm_area_struct *vma)
#endif /* CONFIG_PER_VMA_LOCK */
+static inline void vma_assert_can_modify(struct vm_area_struct *vma)
+{
+ if (vma_is_attached(vma))
+ vma_assert_write_locked(vma);
+}
+
static inline void mmap_write_lock(struct mm_struct *mm)
{
__mmap_lock_trace_start_locking(mm, true);
--
2.54.0
* [PATCH 19/30] mm: use linear_page_[index, delta]() consistently
From: Lorenzo Stoakes @ 2026-06-29 12:23 UTC (permalink / raw)
To: Andrew Morton
Cc: Russell King, Dinh Nguyen, Simon Schuster,
James E . J . Bottomley, Helge Deller, Jarkko Sakkinen,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
Ian Abbott, H Hartley Sweeten, Lucas Stach, David Airlie,
Simona Vetter, Patrik Jakobsson, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Rob Clark, Dmitry Baryshkov, Tomi Valkeinen,
Thierry Reding, Mikko Perttunen, Jonathan Hunter,
Christian Koenig, Huang Rui, Ankit Agrawal, Alex Williamson,
Alexander Viro, Christian Brauner, Dan Williams, Muchun Song,
Oscar Salvador, David Hildenbrand, Suren Baghdasaryan,
Liam R . Howlett, Matthew Wilcox, Marek Szyprowski,
Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim,
Masami Hiramatsu, Oleg Nesterov, Steven Rostedt, SeongJae Park,
Miaohe Lin, Hugh Dickins, Mike Rapoport, Kees Cook, Paolo Bonzini,
linux-kernel, linux-arm-kernel, linux-parisc, linux-sgx, etnaviv,
dri-devel, linux-arm-msm, freedreno, linux-tegra, kvm,
linux-fsdevel, nvdimm, linux-mm, iommu, linux-perf-users,
linux-trace-kernel, kasan-dev, damon, Pedro Falcato, Rik van Riel,
Harry Yoo, Jann Horn
In-Reply-To: <cover.1782735110.git.ljs@kernel.org>
There are a number of places where we open-code what linear_page_index()
and linear_page_delta() calculate.
Replace these with calls to the appropriate helpers for consistency.
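A minimal userspace sketch showing that the open-coded forms removed from arch/arm/mm/fault-armv.c and arch/x86/kernel/cpu/sgx/virt.c compute the same value as a simplified model of linear_page_index(). It assumes 4 KiB pages; `mock_vma`, `MOCK_PFN_DOWN` and the `open_coded_*` helpers are hypothetical names used only for illustration.

```c
#define MOCK_PAGE_SHIFT 12 /* assumption: 4 KiB pages */
#define MOCK_PFN_DOWN(x) ((x) >> MOCK_PAGE_SHIFT)

typedef unsigned long mock_pgoff_t;

/* Hypothetical stand-in for the fields of struct vm_area_struct used here. */
struct mock_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	mock_pgoff_t vm_pgoff;
};

/* Simplified model of linear_page_index(). */
static mock_pgoff_t mock_linear_page_index(const struct mock_vma *vma,
					   unsigned long addr)
{
	mock_pgoff_t pgoff = (addr - vma->vm_start) >> MOCK_PAGE_SHIFT;

	return pgoff + vma->vm_pgoff;
}

/* Form previously open-coded in make_coherent() (ARM). */
static mock_pgoff_t open_coded_arm(const struct mock_vma *vma,
				   unsigned long addr)
{
	return vma->vm_pgoff + ((addr - vma->vm_start) >> MOCK_PAGE_SHIFT);
}

/* Form previously open-coded in __sgx_vepc_fault() (x86 SGX). */
static mock_pgoff_t open_coded_sgx(const struct mock_vma *vma,
				   unsigned long addr)
{
	return vma->vm_pgoff + MOCK_PFN_DOWN(addr - vma->vm_start);
}
```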
No functional change intended.
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
---
arch/arm/mm/fault-armv.c | 2 +-
arch/x86/kernel/cpu/sgx/virt.c | 3 ++-
drivers/comedi/comedi_fops.c | 3 ++-
drivers/gpu/drm/etnaviv/etnaviv_gem.c | 3 ++-
drivers/gpu/drm/gma500/gem.c | 2 +-
drivers/gpu/drm/msm/msm_gem.c | 3 ++-
drivers/gpu/drm/omapdrm/omap_gem.c | 5 +++--
drivers/gpu/drm/tegra/gem.c | 3 ++-
drivers/gpu/drm/ttm/ttm_bo_vm.c | 7 ++++---
drivers/vfio/pci/nvgrace-gpu/main.c | 3 ++-
drivers/vfio/pci/vfio_pci_core.c | 3 ++-
mm/nommu.c | 2 +-
mm/vma.c | 2 +-
virt/kvm/guest_memfd.c | 2 +-
14 files changed, 26 insertions(+), 17 deletions(-)
diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
index bd1ad4181a53..306cfd7b0765 100644
--- a/arch/arm/mm/fault-armv.c
+++ b/arch/arm/mm/fault-armv.c
@@ -132,7 +132,7 @@ make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
pgoff_t pgoff;
int aliases = 0;
- pgoff = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
+ pgoff = linear_page_index(vma, addr);
/*
* If we have any shared mappings that are in the same mm
diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
index db6806c40483..6a1933ddc6fc 100644
--- a/arch/x86/kernel/cpu/sgx/virt.c
+++ b/arch/x86/kernel/cpu/sgx/virt.c
@@ -9,6 +9,7 @@
#include <linux/miscdevice.h>
#include <linux/mm.h>
#include <linux/mman.h>
+#include <linux/pagemap.h>
#include <linux/sched/mm.h>
#include <linux/sched/signal.h>
#include <linux/slab.h>
@@ -41,7 +42,7 @@ static int __sgx_vepc_fault(struct sgx_vepc *vepc,
WARN_ON(!mutex_is_locked(&vepc->lock));
/* Calculate index of EPC page in virtual EPC's page_array */
- index = vma->vm_pgoff + PFN_DOWN(addr - vma->vm_start);
+ index = linear_page_index(vma, addr);
epc_page = xa_load(&vepc->page_array, index);
if (epc_page)
diff --git a/drivers/comedi/comedi_fops.c b/drivers/comedi/comedi_fops.c
index c09bbe04be6c..536c25d8dcee 100644
--- a/drivers/comedi/comedi_fops.c
+++ b/drivers/comedi/comedi_fops.c
@@ -25,6 +25,7 @@
#include <linux/fs.h>
#include <linux/comedi/comedidev.h>
#include <linux/cdev.h>
+#include <linux/pagemap.h>
#include <linux/io.h>
#include <linux/uaccess.h>
@@ -2462,7 +2463,7 @@ static int comedi_vm_access(struct vm_area_struct *vma, unsigned long addr,
{
struct comedi_buf_map *bm = vma->vm_private_data;
unsigned long offset =
- addr - vma->vm_start + (vma->vm_pgoff << PAGE_SHIFT);
+ addr - vma->vm_start + (vma_start_pgoff(vma) << PAGE_SHIFT);
if (len < 0)
return -EINVAL;
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
index b0436a1e103f..2e4d6d117ee2 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
@@ -6,6 +6,7 @@
#include <drm/drm_prime.h>
#include <drm/drm_print.h>
#include <linux/dma-mapping.h>
+#include <linux/pagemap.h>
#include <linux/shmem_fs.h>
#include <linux/spinlock.h>
#include <linux/vmalloc.h>
@@ -188,7 +189,7 @@ static vm_fault_t etnaviv_gem_fault(struct vm_fault *vmf)
}
/* We don't use vmf->pgoff since that has the fake offset: */
- pgoff = (vmf->address - vma->vm_start) >> PAGE_SHIFT;
+ pgoff = linear_page_delta(vma, vmf->address);
pfn = page_to_pfn(pages[pgoff]);
diff --git a/drivers/gpu/drm/gma500/gem.c b/drivers/gpu/drm/gma500/gem.c
index 88f1e86c8903..2708e8c68f4c 100644
--- a/drivers/gpu/drm/gma500/gem.c
+++ b/drivers/gpu/drm/gma500/gem.c
@@ -288,7 +288,7 @@ static vm_fault_t psb_gem_fault(struct vm_fault *vmf)
/* Page relative to the VMA start - we must calculate this ourselves
because vmf->pgoff is the fake GEM offset */
- page_offset = (vmf->address - vma->vm_start) >> PAGE_SHIFT;
+ page_offset = linear_page_delta(vma, vmf->address);
/* CPU view of the page, don't go via the GART for CPU writes */
if (pobj->stolen)
diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index efd3d3c9a449..cbf723a5d86f 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -9,6 +9,7 @@
#include <linux/spinlock.h>
#include <linux/shmem_fs.h>
#include <linux/dma-buf.h>
+#include <linux/pagemap.h>
#include <drm/drm_dumb_buffers.h>
#include <drm/drm_prime.h>
@@ -360,7 +361,7 @@ static vm_fault_t msm_gem_fault(struct vm_fault *vmf)
}
/* We don't use vmf->pgoff since that has the fake offset: */
- pgoff = (vmf->address - vma->vm_start) >> PAGE_SHIFT;
+ pgoff = linear_page_delta(vma, vmf->address);
pfn = page_to_pfn(pages[pgoff]);
diff --git a/drivers/gpu/drm/omapdrm/omap_gem.c b/drivers/gpu/drm/omapdrm/omap_gem.c
index 8e013e4f2c6b..00404fb6c29a 100644
--- a/drivers/gpu/drm/omapdrm/omap_gem.c
+++ b/drivers/gpu/drm/omapdrm/omap_gem.c
@@ -5,6 +5,7 @@
*/
#include <linux/dma-mapping.h>
+#include <linux/pagemap.h>
#include <linux/seq_file.h>
#include <linux/shmem_fs.h>
#include <linux/spinlock.h>
@@ -359,7 +360,7 @@ static vm_fault_t omap_gem_fault_1d(struct drm_gem_object *obj,
pgoff_t pgoff;
/* We don't use vmf->pgoff since that has the fake offset: */
- pgoff = (vmf->address - vma->vm_start) >> PAGE_SHIFT;
+ pgoff = linear_page_delta(vma, vmf->address);
if (omap_obj->pages) {
omap_gem_cpu_sync_page(obj, pgoff);
@@ -407,7 +408,7 @@ static vm_fault_t omap_gem_fault_2d(struct drm_gem_object *obj,
const int m = DIV_ROUND_UP(omap_obj->width << fmt, PAGE_SIZE);
/* We don't use vmf->pgoff since that has the fake offset: */
- pgoff = (vmf->address - vma->vm_start) >> PAGE_SHIFT;
+ pgoff = linear_page_delta(vma, vmf->address);
/*
* Actual address we start mapping at is rounded down to previous slot
diff --git a/drivers/gpu/drm/tegra/gem.c b/drivers/gpu/drm/tegra/gem.c
index 436394e04812..1d8d27a5ea89 100644
--- a/drivers/gpu/drm/tegra/gem.c
+++ b/drivers/gpu/drm/tegra/gem.c
@@ -13,6 +13,7 @@
#include <linux/dma-buf.h>
#include <linux/iommu.h>
#include <linux/module.h>
+#include <linux/pagemap.h>
#include <linux/vmalloc.h>
#include <drm/drm_drv.h>
@@ -564,7 +565,7 @@ static vm_fault_t tegra_bo_fault(struct vm_fault *vmf)
if (!bo->pages)
return VM_FAULT_SIGBUS;
- offset = (vmf->address - vma->vm_start) >> PAGE_SHIFT;
+ offset = linear_page_delta(vma, vmf->address);
page = bo->pages[offset];
return vmf_insert_page(vma, vmf->address, page);
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index a80510489c45..88babf435ac2 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -32,6 +32,7 @@
#define pr_fmt(fmt) "[TTM] " fmt
#include <linux/export.h>
+#include <linux/pagemap.h>
#include <drm/ttm/ttm_bo.h>
#include <drm/ttm/ttm_placement.h>
@@ -208,9 +209,9 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
if (unlikely(err != 0))
return VM_FAULT_SIGBUS;
- page_offset = ((address - vma->vm_start) >> PAGE_SHIFT) +
- vma->vm_pgoff - drm_vma_node_start(&bo->base.vma_node);
- page_last = vma_pages(vma) + vma->vm_pgoff -
+ page_offset = linear_page_index(vma, address) -
+ drm_vma_node_start(&bo->base.vma_node);
+ page_last = vma_end_pgoff(vma) -
drm_vma_node_start(&bo->base.vma_node);
if (unlikely(page_offset >= PFN_UP(bo->base.size)))
diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c
index d07dcacb76bd..963fd8ded20d 100644
--- a/drivers/vfio/pci/nvgrace-gpu/main.c
+++ b/drivers/vfio/pci/nvgrace-gpu/main.c
@@ -11,6 +11,7 @@
#include <linux/jiffies.h>
#include <linux/sched.h>
#include <linux/pci-p2pdma.h>
+#include <linux/pagemap.h>
#include <linux/pm_runtime.h>
#include <linux/memory-failure.h>
@@ -385,7 +386,7 @@ static unsigned long addr_to_pgoff(struct vm_area_struct *vma,
u64 pgoff = vma->vm_pgoff &
((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
- return ((addr - vma->vm_start) >> PAGE_SHIFT) + pgoff;
+ return linear_page_delta(vma, addr) + pgoff;
}
static vm_fault_t nvgrace_gpu_vfio_pci_huge_fault(struct vm_fault *vmf,
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index a28f1e99362c..55d4937d495a 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -19,6 +19,7 @@
#include <linux/module.h>
#include <linux/mutex.h>
#include <linux/notifier.h>
+#include <linux/pagemap.h>
#include <linux/pci.h>
#include <linux/pm_runtime.h>
#include <linux/slab.h>
@@ -1727,7 +1728,7 @@ static vm_fault_t vfio_pci_mmap_huge_fault(struct vm_fault *vmf,
struct vm_area_struct *vma = vmf->vma;
struct vfio_pci_core_device *vdev = vma->vm_private_data;
unsigned long addr = vmf->address & ~((PAGE_SIZE << order) - 1);
- unsigned long pgoff = (addr - vma->vm_start) >> PAGE_SHIFT;
+ unsigned long pgoff = linear_page_delta(vma, addr);
unsigned long pfn = vma_to_pfn(vma) + pgoff;
vm_fault_t ret = VM_FAULT_FALLBACK;
diff --git a/mm/nommu.c b/mm/nommu.c
index 60560b2c457e..7333d855e974 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1332,7 +1332,7 @@ static int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
*region = *vma->vm_region;
new->vm_region = region;
- npages = (addr - vma->vm_start) >> PAGE_SHIFT;
+ npages = linear_page_delta(vma, addr);
if (new_below) {
region->vm_top = region->vm_end = new->vm_end = addr;
diff --git a/mm/vma.c b/mm/vma.c
index ee3a8ca13d07..185d07397ca6 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -517,7 +517,7 @@ __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
new->vm_end = addr;
} else {
new->vm_start = addr;
- new->vm_pgoff += ((addr - vma->vm_start) >> PAGE_SHIFT);
+ new->vm_pgoff += linear_page_delta(vma, addr);
}
err = -ENOMEM;
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index db57c5766ab6..f0e5da490866 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -440,7 +440,7 @@ static int kvm_gmem_set_policy(struct vm_area_struct *vma, struct mempolicy *mpo
static struct mempolicy *kvm_gmem_get_policy(struct vm_area_struct *vma,
unsigned long addr, pgoff_t *ilx)
{
- pgoff_t pgoff = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
+ pgoff_t pgoff = linear_page_index(vma, addr);
struct inode *inode = file_inode(vma->vm_file);
*ilx = inode->i_ino;
--
2.54.0
^ permalink raw reply related
* [PATCH 18/30] mm/vma: remove duplicative vma_pgoff_offset() helper
From: Lorenzo Stoakes @ 2026-06-29 12:23 UTC (permalink / raw)
To: Andrew Morton
Cc: Russell King, Dinh Nguyen, Simon Schuster,
James E . J . Bottomley, Helge Deller, Jarkko Sakkinen,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
Ian Abbott, H Hartley Sweeten, Lucas Stach, David Airlie,
Simona Vetter, Patrik Jakobsson, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Rob Clark, Dmitry Baryshkov, Tomi Valkeinen,
Thierry Reding, Mikko Perttunen, Jonathan Hunter,
Christian Koenig, Huang Rui, Ankit Agrawal, Alex Williamson,
Alexander Viro, Christian Brauner, Dan Williams, Muchun Song,
Oscar Salvador, David Hildenbrand, Suren Baghdasaryan,
Liam R . Howlett, Matthew Wilcox, Marek Szyprowski,
Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim,
Masami Hiramatsu, Oleg Nesterov, Steven Rostedt, SeongJae Park,
Miaohe Lin, Hugh Dickins, Mike Rapoport, Kees Cook, Paolo Bonzini,
linux-kernel, linux-arm-kernel, linux-parisc, linux-sgx, etnaviv,
dri-devel, linux-arm-msm, freedreno, linux-tegra, kvm,
linux-fsdevel, nvdimm, linux-mm, iommu, linux-perf-users,
linux-trace-kernel, kasan-dev, damon, Pedro Falcato, Rik van Riel,
Harry Yoo, Jann Horn
In-Reply-To: <cover.1782735110.git.ljs@kernel.org>
vma_pgoff_offset() computes exactly what linear_page_index() does, so remove
it and use linear_page_index() instead.
Add linear_page_delta() and linear_page_index() to the VMA userland test
headers to reflect this change.
No functional change intended.
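Since PHYS_PFN(x) is just x >> PAGE_SHIFT, the removed helper and linear_page_index() compute the same value. Below is a minimal userspace sketch that assumes 4 KiB pages and a cut-down struct vm_area_struct. PHYS_PFN() is written as in include/linux/pfn.h, and check_same() is a hypothetical name. Like vma_pgoff_offset(), the sketch assumes addr >= vma->vm_start.

```c
#include <assert.h>

/* Assumption: 4 KiB base pages. */
#define PAGE_SHIFT 12
/* As defined in include/linux/pfn.h. */
#define PHYS_PFN(x) ((unsigned long)((x) >> PAGE_SHIFT))

typedef unsigned long pgoff_t;

/* Hypothetical cut-down VMA holding only the fields used here. */
struct vm_area_struct {
	unsigned long vm_start;
	unsigned long vm_end;
	pgoff_t vm_pgoff;
};

/* The helper removed from mm/vma.h. Assumes addr >= vma->vm_start. */
static inline pgoff_t vma_pgoff_offset(struct vm_area_struct *vma,
				       unsigned long addr)
{
	return vma->vm_pgoff + PHYS_PFN(addr - vma->vm_start);
}

/* linear_page_index() written out: VMA-relative delta plus vm_pgoff. */
static inline pgoff_t linear_page_index(const struct vm_area_struct *vma,
					const unsigned long address)
{
	return ((address - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
}

/* Hypothetical: assert both helpers agree for every base page in the VMA. */
static void check_same(struct vm_area_struct *vma)
{
	for (unsigned long a = vma->vm_start; a < vma->vm_end;
	     a += 1UL << PAGE_SHIFT)
		assert(vma_pgoff_offset(vma, a) == linear_page_index(vma, a));
}
```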
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
---
mm/vma.h | 9 +--------
tools/testing/vma/include/dup.h | 16 ++++++++++++++++
2 files changed, 17 insertions(+), 8 deletions(-)
diff --git a/mm/vma.h b/mm/vma.h
index 527716c8739d..2342516ce00e 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -247,13 +247,6 @@ static inline pgoff_t vmg_end_pgoff(const struct vma_merge_struct *vmg)
return vmg_start_pgoff(vmg) + vmg_pages(vmg);
}
-/* Assumes addr >= vma->vm_start. */
-static inline pgoff_t vma_pgoff_offset(struct vm_area_struct *vma,
- unsigned long addr)
-{
- return vma->vm_pgoff + PHYS_PFN(addr - vma->vm_start);
-}
-
#define VMG_STATE(name, mm_, vmi_, start_, end_, vma_flags_, pgoff_) \
struct vma_merge_struct name = { \
.mm = mm_, \
@@ -275,7 +268,7 @@ static inline pgoff_t vma_pgoff_offset(struct vm_area_struct *vma,
.start = start_, \
.end = end_, \
.vm_flags = vma_->vm_flags, \
- .pgoff = vma_pgoff_offset(vma_, start_), \
+ .pgoff = linear_page_index(vma_, start_), \
.file = vma_->vm_file, \
.anon_vma = vma_->anon_vma, \
.policy = vma_policy(vma_), \
diff --git a/tools/testing/vma/include/dup.h b/tools/testing/vma/include/dup.h
index 535747d7fee4..7ed165c8d9bc 100644
--- a/tools/testing/vma/include/dup.h
+++ b/tools/testing/vma/include/dup.h
@@ -1548,3 +1548,19 @@ static inline pgprot_t vma_get_page_prot(vma_flags_t vma_flags)
return vm_get_page_prot(vm_flags);
}
+
+static inline pgoff_t linear_page_delta(const struct vm_area_struct *vma,
+ const unsigned long address)
+{
+ return (address - vma->vm_start) >> PAGE_SHIFT;
+}
+
+static inline pgoff_t linear_page_index(const struct vm_area_struct *vma,
+ const unsigned long address)
+{
+ pgoff_t pgoff;
+
+ pgoff = linear_page_delta(vma, address);
+ pgoff += vma_start_pgoff(vma);
+ return pgoff;
+}
--
2.54.0
^ permalink raw reply related
* [PATCH 17/30] mm: prefer vma_[start,end]_pgoff() to vma->vm_pgoff in kernel/
From: Lorenzo Stoakes @ 2026-06-29 12:23 UTC (permalink / raw)
To: Andrew Morton
Cc: Russell King, Dinh Nguyen, Simon Schuster,
James E . J . Bottomley, Helge Deller, Jarkko Sakkinen,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
Ian Abbott, H Hartley Sweeten, Lucas Stach, David Airlie,
Simona Vetter, Patrik Jakobsson, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Rob Clark, Dmitry Baryshkov, Tomi Valkeinen,
Thierry Reding, Mikko Perttunen, Jonathan Hunter,
Christian Koenig, Huang Rui, Ankit Agrawal, Alex Williamson,
Alexander Viro, Christian Brauner, Dan Williams, Muchun Song,
Oscar Salvador, David Hildenbrand, Suren Baghdasaryan,
Liam R . Howlett, Matthew Wilcox, Marek Szyprowski,
Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim,
Masami Hiramatsu, Oleg Nesterov, Steven Rostedt, SeongJae Park,
Miaohe Lin, Hugh Dickins, Mike Rapoport, Kees Cook, Paolo Bonzini,
linux-kernel, linux-arm-kernel, linux-parisc, linux-sgx, etnaviv,
dri-devel, linux-arm-msm, freedreno, linux-tegra, kvm,
linux-fsdevel, nvdimm, linux-mm, iommu, linux-perf-users,
linux-trace-kernel, kasan-dev, damon, Pedro Falcato, Rik van Riel,
Harry Yoo, Jann Horn
In-Reply-To: <cover.1782735110.git.ljs@kernel.org>
Consistently use vma_start_pgoff() and vma_end_pgoff(). These clearly
indicate which part of the VMA the page offset refers to, and aid
greppability.
This is part of a broader series laying the groundwork for providing a
virtual page offset for anonymous folios in MAP_PRIVATE file-backed
mappings.
No functional change intended.
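Several of the DMA mmap paths below change their bounds check from "off < count && user_count <= count - off" to "pgoff_start < count && pgoff_end <= count". The following is a minimal sketch that brute-forces agreement between the two for small values. It assumes that vma_end_pgoff() returns vma_start_pgoff() + vma_pages() (an exclusive end) and that this sum does not wrap. old_fits(), new_fits() and check_all() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long pgoff_t;

/* Old check: off is vma->vm_pgoff, user_count is vma_pages(vma). */
static bool old_fits(pgoff_t off, unsigned long user_count,
		     unsigned long count)
{
	return off < count && user_count <= count - off;
}

/*
 * New check, assuming vma_end_pgoff() == vma_start_pgoff() + vma_pages()
 * and that the sum does not wrap.
 */
static bool new_fits(pgoff_t pgoff_start, unsigned long user_count,
		     unsigned long count)
{
	const pgoff_t pgoff_end = pgoff_start + user_count;

	return pgoff_start < count && pgoff_end <= count;
}

/*
 * Hypothetical: off and count range over [0, max]; pages ranges over
 * [1, max] since a VMA always spans at least one page.
 */
static void check_all(unsigned long max)
{
	for (pgoff_t off = 0; off <= max; off++)
		for (unsigned long pages = 1; pages <= max; pages++)
			for (unsigned long count = 0; count <= max; count++)
				assert(old_fits(off, pages, count) ==
				       new_fits(off, pages, count));
}
```

Once pgoff_start < count holds, count - off cannot underflow, so the subtraction form and the addition form express the same comparison.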
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
---
kernel/dma/coherent.c | 7 ++++---
kernel/dma/direct.c | 6 ++++--
kernel/dma/mapping.c | 8 +++++---
kernel/dma/ops_helpers.c | 4 ++--
kernel/events/core.c | 20 +++++++++++---------
kernel/events/uprobes.c | 11 +++++++----
kernel/kcov.c | 2 +-
kernel/trace/ring_buffer.c | 3 ++-
8 files changed, 36 insertions(+), 25 deletions(-)
diff --git a/kernel/dma/coherent.c b/kernel/dma/coherent.c
index bcdc0f76d2e8..2d3195eb7e83 100644
--- a/kernel/dma/coherent.c
+++ b/kernel/dma/coherent.c
@@ -236,14 +236,15 @@ static int __dma_mmap_from_coherent(struct dma_coherent_mem *mem,
{
if (mem && vaddr >= mem->virt_base && vaddr + size <=
(mem->virt_base + ((dma_addr_t)mem->size << PAGE_SHIFT))) {
- unsigned long off = vma->vm_pgoff;
+ const pgoff_t pgoff_start = vma_start_pgoff(vma);
+ const pgoff_t pgoff_end = vma_end_pgoff(vma);
int start = (vaddr - mem->virt_base) >> PAGE_SHIFT;
unsigned long user_count = vma_pages(vma);
int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
*ret = -ENXIO;
- if (off < count && user_count <= count - off) {
- unsigned long pfn = mem->pfn_base + start + off;
+ if (pgoff_start < count && pgoff_end <= count) {
+ unsigned long pfn = mem->pfn_base + start + pgoff_start;
*ret = remap_pfn_range(vma, vma->vm_start, pfn,
user_count << PAGE_SHIFT,
vma->vm_page_prot);
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 4391b797d4db..436310d6e4a2 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -534,6 +534,8 @@ int dma_direct_mmap(struct device *dev, struct vm_area_struct *vma,
unsigned long user_count = vma_pages(vma);
unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
unsigned long pfn = PHYS_PFN(dma_to_phys(dev, dma_addr));
+ const pgoff_t pgoff_start = vma_start_pgoff(vma);
+ const pgoff_t pgoff_end = vma_end_pgoff(vma);
int ret = -ENXIO;
vma->vm_page_prot = dma_pgprot(dev, vma->vm_page_prot, attrs);
@@ -545,9 +547,9 @@ int dma_direct_mmap(struct device *dev, struct vm_area_struct *vma,
if (dma_mmap_from_global_coherent(vma, cpu_addr, size, &ret))
return ret;
- if (vma->vm_pgoff >= count || user_count > count - vma->vm_pgoff)
+ if (pgoff_start >= count || pgoff_end > count)
return -ENXIO;
- return remap_pfn_range(vma, vma->vm_start, pfn + vma->vm_pgoff,
+ return remap_pfn_range(vma, vma->vm_start, pfn + pgoff_start,
user_count << PAGE_SHIFT, vma->vm_page_prot);
}
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 4fe04669e5e6..c986639044e9 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -761,12 +761,14 @@ EXPORT_SYMBOL_GPL(dma_free_pages);
int dma_mmap_pages(struct device *dev, struct vm_area_struct *vma,
size_t size, struct page *page)
{
- unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+ const pgoff_t pgoff_start = vma_start_pgoff(vma);
+ const pgoff_t pgoff_end = vma_end_pgoff(vma);
+ const unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
- if (vma->vm_pgoff >= count || vma_pages(vma) > count - vma->vm_pgoff)
+ if (pgoff_start >= count || pgoff_end > count)
return -ENXIO;
return remap_pfn_range(vma, vma->vm_start,
- page_to_pfn(page) + vma->vm_pgoff,
+ page_to_pfn(page) + pgoff_start,
vma_pages(vma) << PAGE_SHIFT, vma->vm_page_prot);
}
EXPORT_SYMBOL_GPL(dma_mmap_pages);
diff --git a/kernel/dma/ops_helpers.c b/kernel/dma/ops_helpers.c
index 20caf9cabf69..6b5f9208d31c 100644
--- a/kernel/dma/ops_helpers.c
+++ b/kernel/dma/ops_helpers.c
@@ -39,7 +39,7 @@ int dma_common_mmap(struct device *dev, struct vm_area_struct *vma,
#ifdef CONFIG_MMU
unsigned long user_count = vma_pages(vma);
unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
- unsigned long off = vma->vm_pgoff;
+ unsigned long off = vma_start_pgoff(vma);
struct page *page = dma_common_vaddr_to_page(cpu_addr);
int ret = -ENXIO;
@@ -52,7 +52,7 @@ int dma_common_mmap(struct device *dev, struct vm_area_struct *vma,
return -ENXIO;
return remap_pfn_range(vma, vma->vm_start,
- page_to_pfn(page) + vma->vm_pgoff,
+ page_to_pfn(page) + vma_start_pgoff(vma),
user_count << PAGE_SHIFT, vma->vm_page_prot);
#else
return -ENXIO;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 954c36e28101..d6d2d557ccb8 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6998,7 +6998,7 @@ static void perf_mmap_open(struct vm_area_struct *vma)
refcount_inc(&event->mmap_count);
refcount_inc(&event->rb->mmap_count);
- if (vma->vm_pgoff)
+ if (vma_start_pgoff(vma))
refcount_inc(&event->rb->aux_mmap_count);
if (mapped)
@@ -7032,7 +7032,7 @@ static void perf_mmap_close(struct vm_area_struct *vma)
* The AUX buffer is strictly a sub-buffer, serialize using aux_mutex
* to avoid complications.
*/
- if (rb_has_aux(rb) && vma->vm_pgoff == rb->aux_pgoff &&
+ if (rb_has_aux(rb) && vma_start_pgoff(vma) == rb->aux_pgoff &&
refcount_dec_and_mutex_lock(&rb->aux_mmap_count, &rb->aux_mutex)) {
/*
* Stop all AUX events that are writing to this buffer,
@@ -7190,7 +7190,8 @@ static int map_range(struct perf_buffer *rb, struct vm_area_struct *vma)
*/
for (pagenum = 0; pagenum < nr_pages; pagenum++) {
unsigned long va = vma->vm_start + PAGE_SIZE * pagenum;
- struct page *page = perf_mmap_to_page(rb, vma->vm_pgoff + pagenum);
+ struct page *page = perf_mmap_to_page(rb,
+ vma_start_pgoff(vma) + pagenum);
if (page == NULL) {
err = -EINVAL;
@@ -7348,6 +7349,7 @@ static int perf_mmap_aux(struct vm_area_struct *vma, struct perf_event *event,
u64 aux_offset, aux_size;
struct perf_buffer *rb;
int ret, rb_flags = 0;
+ const pgoff_t pgoff_start = vma_start_pgoff(vma);
rb = event->rb;
if (!rb)
@@ -7366,11 +7368,11 @@ static int perf_mmap_aux(struct vm_area_struct *vma, struct perf_event *event,
if (aux_offset < perf_data_size(rb) + PAGE_SIZE)
return -EINVAL;
- if (aux_offset != vma->vm_pgoff << PAGE_SHIFT)
+ if (aux_offset != pgoff_start << PAGE_SHIFT)
return -EINVAL;
/* already mapped with a different offset */
- if (rb_has_aux(rb) && rb->aux_pgoff != vma->vm_pgoff)
+ if (rb_has_aux(rb) && rb->aux_pgoff != pgoff_start)
return -EINVAL;
if (aux_size != nr_pages * PAGE_SIZE)
@@ -7400,7 +7402,7 @@ static int perf_mmap_aux(struct vm_area_struct *vma, struct perf_event *event,
if (vma->vm_flags & VM_WRITE)
rb_flags |= RING_BUFFER_WRITABLE;
- ret = rb_alloc_aux(rb, event, vma->vm_pgoff, nr_pages,
+ ret = rb_alloc_aux(rb, event, pgoff_start, nr_pages,
event->attr.aux_watermark, rb_flags);
if (ret) {
refcount_dec(&rb->mmap_count);
@@ -7457,7 +7459,7 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
if (event->state <= PERF_EVENT_STATE_REVOKED)
return -ENODEV;
- if (vma->vm_pgoff == 0)
+ if (!vma_start_pgoff(vma))
ret = perf_mmap_rb(vma, event, nr_pages);
else
ret = perf_mmap_aux(vma, event, nr_pages);
@@ -9884,7 +9886,7 @@ static bool perf_addr_filter_vma_adjust(struct perf_addr_filter *filter,
struct perf_addr_filter_range *fr)
{
unsigned long vma_size = vma->vm_end - vma->vm_start;
- unsigned long off = vma->vm_pgoff << PAGE_SHIFT;
+ unsigned long off = vma_start_pgoff(vma) << PAGE_SHIFT;
struct file *file = vma->vm_file;
if (!perf_addr_filter_match(filter, file, off, vma_size))
@@ -9974,7 +9976,7 @@ void perf_event_mmap(struct vm_area_struct *vma)
/* .tid */
.start = vma->vm_start,
.len = vma->vm_end - vma->vm_start,
- .pgoff = (u64)vma->vm_pgoff << PAGE_SHIFT,
+ .pgoff = (u64)vma_start_pgoff(vma) << PAGE_SHIFT,
},
/* .maj (attr_mmap2 only) */
/* .min (attr_mmap2 only) */
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index f23cebacbc6d..244651380ca1 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -144,12 +144,14 @@ static bool valid_vma(struct vm_area_struct *vma, bool is_register)
static unsigned long offset_to_vaddr(struct vm_area_struct *vma, loff_t offset)
{
- return vma->vm_start + offset - ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
+ return vma->vm_start + offset -
+ ((loff_t)vma_start_pgoff(vma) << PAGE_SHIFT);
}
static loff_t vaddr_to_offset(struct vm_area_struct *vma, unsigned long vaddr)
{
- return ((loff_t)vma->vm_pgoff << PAGE_SHIFT) + (vaddr - vma->vm_start);
+ return ((loff_t)vma_start_pgoff(vma) << PAGE_SHIFT) +
+ (vaddr - vma->vm_start);
}
/**
@@ -1482,7 +1484,7 @@ static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm)
file_inode(vma->vm_file) != uprobe->inode)
continue;
- offset = (loff_t)vma->vm_pgoff << PAGE_SHIFT;
+ offset = (loff_t)vma_start_pgoff(vma) << PAGE_SHIFT;
if (uprobe->offset < offset ||
uprobe->offset >= offset + vma->vm_end - vma->vm_start)
continue;
@@ -2453,7 +2455,8 @@ static struct uprobe *find_active_uprobe_speculative(unsigned long bp_vaddr)
if (!vm_file)
return NULL;
- offset = (loff_t)(vma->vm_pgoff << PAGE_SHIFT) + (bp_vaddr - vma->vm_start);
+ offset = (loff_t)(vma_start_pgoff(vma) << PAGE_SHIFT) +
+ (bp_vaddr - vma->vm_start);
uprobe = find_uprobe_rcu(vm_file->f_inode, offset);
if (!uprobe)
return NULL;
diff --git a/kernel/kcov.c b/kernel/kcov.c
index 1df373fb562b..b19b473c366a 100644
--- a/kernel/kcov.c
+++ b/kernel/kcov.c
@@ -512,7 +512,7 @@ static int kcov_mmap(struct file *filep, struct vm_area_struct *vma)
spin_lock_irqsave(&kcov->lock, flags);
size = kcov->size * sizeof(unsigned long);
- if (kcov->area == NULL || vma->vm_pgoff != 0 ||
+ if (kcov->area == NULL || vma_start_pgoff(vma) ||
vma->vm_end - vma->vm_start != size) {
res = -EINVAL;
goto exit;
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 56a328e94395..dfa493d54ef9 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -7613,7 +7613,8 @@ static int __rb_inc_dec_mapped(struct ring_buffer_per_cpu *cpu_buffer,
static int __rb_map_vma(struct ring_buffer_per_cpu *cpu_buffer,
struct vm_area_struct *vma)
{
- unsigned long nr_subbufs, nr_pages, nr_vma_pages, pgoff = vma->vm_pgoff;
+ unsigned long nr_subbufs, nr_pages, nr_vma_pages;
+ pgoff_t pgoff = vma_start_pgoff(vma);
unsigned int subbuf_pages, subbuf_order;
struct page **pages __free(kfree) = NULL;
int p = 0, s = 0;
--
2.54.0
^ permalink raw reply related
* [PATCH 16/30] mm/vma: use vma_start_pgoff(), linear_page_index() in mm code
From: Lorenzo Stoakes @ 2026-06-29 12:23 UTC (permalink / raw)
To: Andrew Morton
Cc: Russell King, Dinh Nguyen, Simon Schuster,
James E . J . Bottomley, Helge Deller, Jarkko Sakkinen,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
Ian Abbott, H Hartley Sweeten, Lucas Stach, David Airlie,
Simona Vetter, Patrik Jakobsson, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Rob Clark, Dmitry Baryshkov, Tomi Valkeinen,
Thierry Reding, Mikko Perttunen, Jonathan Hunter,
Christian Koenig, Huang Rui, Ankit Agrawal, Alex Williamson,
Alexander Viro, Christian Brauner, Dan Williams, Muchun Song,
Oscar Salvador, David Hildenbrand, Suren Baghdasaryan,
Liam R . Howlett, Matthew Wilcox, Marek Szyprowski,
Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim,
Masami Hiramatsu, Oleg Nesterov, Steven Rostedt, SeongJae Park,
Miaohe Lin, Hugh Dickins, Mike Rapoport, Kees Cook, Paolo Bonzini,
linux-kernel, linux-arm-kernel, linux-parisc, linux-sgx, etnaviv,
dri-devel, linux-arm-msm, freedreno, linux-tegra, kvm,
linux-fsdevel, nvdimm, linux-mm, iommu, linux-perf-users,
linux-trace-kernel, kasan-dev, damon, Pedro Falcato, Rik van Riel,
Harry Yoo, Jann Horn
In-Reply-To: <cover.1782735110.git.ljs@kernel.org>
There are many instances in which linear_page_index() (as well as
linear_page_delta()) is open-coded, which is confusing and inconsistent.
Additionally, vma->vm_pgoff does not by itself make it clear that the value
is the page offset of the start of the VMA range.
So use vma_start_pgoff() in favour of directly accessing vma->vm_pgoff, and
linear_page_index() where we can. Doing so also aids greppability.
Finally, this lays the groundwork for future changes which will add an
anonymous page offset, so that anonymous folios in MAP_PRIVATE file-backed
mappings can be indexed in terms of their virtual page offset.
No functional change intended.
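The hugetlb conversions rely on a shift identity. The value ((address - vm_start) >> huge_page_shift(h)) + (vm_pgoff >> huge_page_order(h)) equals linear_page_index(vma, address) >> huge_page_order(h) whenever vm_pgoff is a multiple of 1 << huge_page_order(h). The sketch below makes several assumptions: 4 KiB base pages, a cut-down struct vm_area_struct, and huge_page_shift(h) == PAGE_SHIFT + huge_page_order(h). The alignment of vm_pgoff is assumed to hold for hugetlb mappings, since hugetlbfs is expected to reject misaligned mmap offsets. Without that alignment the two forms can differ. old_huge_index(), new_huge_index() and check_huge_index() are hypothetical names.

```c
#include <assert.h>

/* Assumption: 4 KiB base pages. */
#define PAGE_SHIFT 12

typedef unsigned long pgoff_t;

/* Hypothetical cut-down VMA holding only the fields used here. */
struct vm_area_struct {
	unsigned long vm_start;
	unsigned long vm_end;
	pgoff_t vm_pgoff;
};

static inline pgoff_t linear_page_index(const struct vm_area_struct *vma,
					const unsigned long address)
{
	return ((address - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
}

/* Old formula, with huge_page_shift(h) taken as PAGE_SHIFT + order. */
static pgoff_t old_huge_index(const struct vm_area_struct *vma,
			      unsigned long address, unsigned int order)
{
	return ((address - vma->vm_start) >> (PAGE_SHIFT + order)) +
	       (vma->vm_pgoff >> order);
}

/* New formula: index in base pages, then scale down to huge pages. */
static pgoff_t new_huge_index(const struct vm_area_struct *vma,
			      unsigned long address, unsigned int order)
{
	return linear_page_index(vma, address) >> order;
}

/* Hypothetical: assert both formulas agree for every base page. */
static void check_huge_index(const struct vm_area_struct *vma,
			     unsigned int order)
{
	for (unsigned long a = vma->vm_start; a < vma->vm_end;
	     a += 1UL << PAGE_SHIFT)
		assert(old_huge_index(vma, a, order) ==
		       new_huge_index(vma, a, order));
}
```

The DAMON hunk needs no alignment assumption. For any unsigned x, x >> (PAGE_SHIFT + order) and (x >> PAGE_SHIFT) >> order are equal, so that rewrite is exact.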
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
---
include/linux/huge_mm.h | 1 +
include/linux/hugetlb.h | 3 +--
include/linux/pagemap.h | 2 +-
mm/damon/vaddr.c | 5 +++--
mm/debug.c | 2 +-
mm/filemap.c | 7 ++++---
mm/huge_memory.c | 2 +-
mm/hugetlb.c | 11 ++++-------
mm/internal.h | 24 ++++++++++++++----------
mm/khugepaged.c | 3 ++-
mm/madvise.c | 6 +++---
mm/mapping_dirty_helpers.c | 2 +-
mm/memory.c | 25 +++++++++++++------------
mm/mempolicy.c | 13 +++++++------
mm/mremap.c | 12 ++++--------
mm/msync.c | 4 ++--
mm/nommu.c | 7 ++++---
mm/pagewalk.c | 2 +-
mm/shmem.c | 9 +++++----
mm/userfaultfd.c | 4 ++--
mm/util.c | 4 ++--
mm/vma.c | 15 +++++++--------
mm/vma_exec.c | 4 ++--
mm/vma_init.c | 2 +-
24 files changed, 86 insertions(+), 83 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index ad20f7f8c179..653b81d08fe7 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -230,6 +230,7 @@ static inline bool thp_vma_suitable_order(struct vm_area_struct *vma,
/* Don't have to check pgoff for anonymous vma */
if (!vma_is_anonymous(vma)) {
+ /* vma_start_pgoff() in mm.h so not available. */
if (!IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
hpage_size >> PAGE_SHIFT))
return false;
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 2abaf99321e9..8390f50604d6 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -792,8 +792,7 @@ static inline pgoff_t hugetlb_linear_page_index(struct vm_area_struct *vma,
{
struct hstate *h = hstate_vma(vma);
- return ((address - vma->vm_start) >> huge_page_shift(h)) +
- (vma->vm_pgoff >> huge_page_order(h));
+ return linear_page_index(vma, address) >> huge_page_order(h);
}
static inline bool order_is_gigantic(unsigned int order)
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 644c0f25ae73..68a88d34a468 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -1101,7 +1101,7 @@ static inline pgoff_t linear_page_index(const struct vm_area_struct *vma,
pgoff_t pgoff;
pgoff = linear_page_delta(vma, address);
- pgoff += vma->vm_pgoff;
+ pgoff += vma_start_pgoff(vma);
return pgoff;
}
diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c
index d27147603564..faa44aa3219b 100644
--- a/mm/damon/vaddr.c
+++ b/mm/damon/vaddr.c
@@ -12,6 +12,7 @@
#include <linux/mman.h>
#include <linux/mmu_notifier.h>
#include <linux/page_idle.h>
+#include <linux/pagemap.h>
#include <linux/pagewalk.h>
#include <linux/sched/mm.h>
@@ -627,8 +628,8 @@ static void damos_va_migrate_dests_add(struct folio *folio,
}
order = folio_order(folio);
- ilx = vma->vm_pgoff >> order;
- ilx += (addr - vma->vm_start) >> (PAGE_SHIFT + order);
+ ilx = vma_start_pgoff(vma) >> order;
+ ilx += linear_page_delta(vma, addr) >> order;
for (i = 0; i < dests->nr_dests; i++)
weight_total += dests->weight_arr[i];
diff --git a/mm/debug.c b/mm/debug.c
index 77fa8fe1d641..497654b36f1a 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -163,7 +163,7 @@ void dump_vma(const struct vm_area_struct *vma)
"flags: %#lx(%pGv)\n",
vma, (void *)vma->vm_start, (void *)vma->vm_end, vma->vm_mm,
(unsigned long)pgprot_val(vma->vm_page_prot),
- vma->anon_vma, vma->vm_ops, vma->vm_pgoff,
+ vma->anon_vma, vma->vm_ops, vma_start_pgoff(vma),
vma->vm_file, vma->vm_private_data,
#ifdef CONFIG_PER_VMA_LOCK
refcount_read(&vma->vm_refcnt),
diff --git a/mm/filemap.c b/mm/filemap.c
index 5af62e6abca5..bcb07b21a685 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3402,8 +3402,8 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
* of memory.
*/
struct vm_area_struct *vma = vmf->vma;
- unsigned long start = vma->vm_pgoff;
- unsigned long end = start + vma_pages(vma);
+ const unsigned long start = vma_start_pgoff(vma);
+ const unsigned long end = vma_end_pgoff(vma);
unsigned long ra_end;
ra->order = exec_folio_order();
@@ -3921,7 +3921,8 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
goto out;
}
- addr = vma->vm_start + ((start_pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+ addr = vma->vm_start +
+ ((start_pgoff - vma_start_pgoff(vma)) << PAGE_SHIFT);
vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
if (!vmf->pte) {
folio_unlock(folio);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2bccb0a53a0a..e94f56487225 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -180,7 +180,7 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
*/
if (!in_pf && shmem_file(vma->vm_file))
return orders & shmem_allowable_huge_orders(file_inode(vma->vm_file),
- vma, vma->vm_pgoff, 0,
+ vma, vma_start_pgoff(vma), 0,
forced_collapse);
if (!vma_is_anonymous(vma)) {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f45000149a78..d44a3ac5ee0a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1011,8 +1011,7 @@ static long region_count(struct resv_map *resv, long f, long t)
static pgoff_t vma_hugecache_offset(struct hstate *h,
struct vm_area_struct *vma, unsigned long address)
{
- return ((address - vma->vm_start) >> huge_page_shift(h)) +
- (vma->vm_pgoff >> huge_page_order(h));
+ return linear_page_index(vma, address) >> huge_page_order(h);
}
/*
@@ -5372,8 +5371,7 @@ static void unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
* from page cache lookup which is in HPAGE_SIZE units.
*/
address = address & huge_page_mask(h);
- pgoff = ((address - vma->vm_start) >> PAGE_SHIFT) +
- vma->vm_pgoff;
+ pgoff = linear_page_index(vma, address);
mapping = vma->vm_file->f_mapping;
/*
@@ -6771,7 +6769,7 @@ static unsigned long page_table_shareable(struct vm_area_struct *svma,
struct vm_area_struct *vma,
unsigned long addr, pgoff_t idx)
{
- unsigned long saddr = ((idx - svma->vm_pgoff) << PAGE_SHIFT) +
+ unsigned long saddr = ((idx - vma_start_pgoff(svma)) << PAGE_SHIFT) +
svma->vm_start;
unsigned long sbase = saddr & PUD_MASK;
unsigned long s_end = sbase + PUD_SIZE;
@@ -6856,8 +6854,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long addr, pud_t *pud)
{
struct address_space *mapping = vma->vm_file->f_mapping;
- pgoff_t idx = ((addr - vma->vm_start) >> PAGE_SHIFT) +
- vma->vm_pgoff;
+ const pgoff_t idx = linear_page_index(vma, addr);
struct vm_area_struct *svma;
unsigned long saddr;
pte_t *spte = NULL;
diff --git a/mm/internal.h b/mm/internal.h
index 181e79f1d6a2..89e5b7efe256 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1143,26 +1143,28 @@ static inline bool
folio_within_range(struct folio *folio, struct vm_area_struct *vma,
unsigned long start, unsigned long end)
{
- pgoff_t pgoff, addr;
- unsigned long vma_pglen = vma_pages(vma);
+ const unsigned long vma_pglen = vma_pages(vma);
+ pgoff_t pgoff_folio, pgoff_vma_start;
+ unsigned long addr;
VM_WARN_ON_FOLIO(folio_test_ksm(folio), folio);
if (start > end)
return false;
+ pgoff_folio = folio_pgoff(folio);
+ pgoff_vma_start = vma_start_pgoff(vma);
+
if (start < vma->vm_start)
start = vma->vm_start;
if (end > vma->vm_end)
end = vma->vm_end;
- pgoff = folio_pgoff(folio);
-
/* if folio start address is not in vma range */
- if (!in_range(pgoff, vma->vm_pgoff, vma_pglen))
+ if (!in_range(pgoff_folio, pgoff_vma_start, vma_pglen))
return false;
- addr = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+ addr = vma->vm_start + ((pgoff_folio - pgoff_vma_start) << PAGE_SHIFT);
return !(addr < start || end - addr < folio_size(folio));
}
@@ -1234,15 +1236,16 @@ extern pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma);
static inline unsigned long vma_address(const struct vm_area_struct *vma,
pgoff_t pgoff, unsigned long nr_pages)
{
+ const pgoff_t pgoff_start = vma_start_pgoff(vma);
unsigned long address;
- if (pgoff >= vma->vm_pgoff) {
+ if (pgoff >= pgoff_start) {
address = vma->vm_start +
- ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+ ((pgoff - pgoff_start) << PAGE_SHIFT);
/* Check for address beyond vma (or wrapped through 0?) */
if (address < vma->vm_start || address >= vma->vm_end)
address = -EFAULT;
- } else if (pgoff + nr_pages - 1 >= vma->vm_pgoff) {
+ } else if (pgoff + nr_pages - 1 >= pgoff_start) {
/* Test above avoids possibility of wrap to 0 on 32-bit */
address = vma->vm_start;
} else {
@@ -1266,7 +1269,8 @@ static inline unsigned long vma_address_end(struct page_vma_mapped_walk *pvmw)
return pvmw->address + PAGE_SIZE;
pgoff = pvmw->pgoff + pvmw->nr_pages;
- address = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+ address = vma->vm_start +
+ ((pgoff - vma_start_pgoff(vma)) << PAGE_SHIFT);
/* Check for address beyond vma (or wrapped through 0?) */
if (address < vma->vm_start || address > vma->vm_end)
address = vma->vm_end;
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index bd5f86cf4bd8..ffef738d826c 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2145,7 +2145,8 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
spinlock_t *ptl;
bool success = false;
- addr = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+ addr = vma->vm_start +
+ ((pgoff - vma_start_pgoff(vma)) << PAGE_SHIFT);
if (addr & ~HPAGE_PMD_MASK ||
vma->vm_end < addr + HPAGE_PMD_SIZE)
continue;
diff --git a/mm/madvise.c b/mm/madvise.c
index cd9bb077072c..6730c4200a93 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -253,7 +253,7 @@ static void shmem_swapin_range(struct vm_area_struct *vma,
continue;
addr = vma->vm_start +
- ((xas.xa_index - vma->vm_pgoff) << PAGE_SHIFT);
+ ((xas.xa_index - vma_start_pgoff(vma)) << PAGE_SHIFT);
xas_pause(&xas);
rcu_read_unlock();
@@ -318,7 +318,7 @@ static long madvise_willneed(struct madvise_behavior *madv_behavior)
mark_mmap_lock_dropped(madv_behavior);
get_file(file);
offset = (loff_t)(start - vma->vm_start)
- + ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
+ + ((loff_t)vma_start_pgoff(vma) << PAGE_SHIFT);
mmap_read_unlock(mm);
vfs_fadvise(file, offset, end - start, POSIX_FADV_WILLNEED);
fput(file);
@@ -1023,7 +1023,7 @@ static long madvise_remove(struct madvise_behavior *madv_behavior)
return -EACCES;
offset = (loff_t)(start - vma->vm_start)
- + ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
+ + ((loff_t)vma_start_pgoff(vma) << PAGE_SHIFT);
/*
* Filesystem's fallocate may need to take i_rwsem. We need to
diff --git a/mm/mapping_dirty_helpers.c b/mm/mapping_dirty_helpers.c
index 737c407f4081..e0efa36e0a07 100644
--- a/mm/mapping_dirty_helpers.c
+++ b/mm/mapping_dirty_helpers.c
@@ -95,7 +95,7 @@ static int clean_record_pte(pte_t *pte, unsigned long addr,
if (pte_dirty(ptent)) {
pgoff_t pgoff = ((addr - walk->vma->vm_start) >> PAGE_SHIFT) +
- walk->vma->vm_pgoff - cwalk->bitmap_pgoff;
+ vma_start_pgoff(walk->vma) - cwalk->bitmap_pgoff;
pte_t old_pte = ptep_modify_prot_start(walk->vma, addr, pte);
ptent = pte_mkclean(old_pte);
diff --git a/mm/memory.c b/mm/memory.c
index 98c1a245f45a..f5eb06544ba4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -725,10 +725,10 @@ static inline struct page *__vm_normal_page(struct vm_area_struct *vma,
if (!pfn_valid(pfn))
return NULL;
} else {
- unsigned long off = (addr - vma->vm_start) >> PAGE_SHIFT;
+ const pgoff_t index = linear_page_index(vma, addr);
/* Only CoW'ed anon folios are "normal". */
- if (pfn == vma->vm_pgoff + off)
+ if (pfn == index)
return NULL;
if (!is_cow_mapping(vma->vm_flags))
return NULL;
@@ -2643,7 +2643,7 @@ static int __vm_map_pages(struct vm_area_struct *vma, struct page **pages,
int vm_map_pages(struct vm_area_struct *vma, struct page **pages,
unsigned long num)
{
- return __vm_map_pages(vma, pages, num, vma->vm_pgoff);
+ return __vm_map_pages(vma, pages, num, vma_start_pgoff(vma));
}
EXPORT_SYMBOL(vm_map_pages);
@@ -3298,7 +3298,8 @@ int vm_iomap_memory(struct vm_area_struct *vma, phys_addr_t start, unsigned long
unsigned long pfn;
int err;
- err = __simple_ioremap_prep(vm_len, vma->vm_pgoff, start, len, &pfn);
+ err = __simple_ioremap_prep(vm_len, vma_start_pgoff(vma), start, len,
+ &pfn);
if (err)
return err;
@@ -4342,15 +4343,15 @@ static inline void unmap_mapping_range_tree(struct address_space *mapping,
struct zap_details *details)
{
struct vm_area_struct *vma;
- unsigned long start, size;
struct mmu_gather tlb;
mapping_interval_tree_foreach(vma, mapping, first_index, last_index) {
- const pgoff_t start_idx = max(first_index, vma->vm_pgoff);
+ const pgoff_t start_idx = max(first_index, vma_start_pgoff(vma));
const pgoff_t end_idx = min(last_index, vma_last_pgoff(vma)) + 1;
-
- start = vma->vm_start + ((start_idx - vma->vm_pgoff) << PAGE_SHIFT);
- size = (end_idx - start_idx) << PAGE_SHIFT;
+ const pgoff_t offset = start_idx - vma_start_pgoff(vma);
+ const unsigned long offset_bytes = offset << PAGE_SHIFT;
+ const unsigned long start = vma->vm_start + offset_bytes;
+ const unsigned long size = (end_idx - start_idx) << PAGE_SHIFT;
tlb_gather_mmu(&tlb, vma->vm_mm);
zap_vma_range_batched(&tlb, vma, start, size, details);
@@ -5684,7 +5685,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
} else if (nr_pages > 1) {
pgoff_t idx = folio_page_idx(folio, page);
/* The page offset of vmf->address within the VMA. */
- pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
+ pgoff_t vma_off = vmf->pgoff - vma_start_pgoff(vmf->vma);
/* The index of the entry in the pagetable for fault page. */
pgoff_t pte_off = pte_index(vmf->address);
@@ -5796,7 +5797,7 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
pgoff_t nr_pages = READ_ONCE(fault_around_pages);
pgoff_t pte_off = pte_index(vmf->address);
/* The page offset of vmf->address within the VMA. */
- pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
+ pgoff_t vma_off = vmf->pgoff - vma_start_pgoff(vmf->vma);
pgoff_t from_pte, to_pte;
vm_fault_t ret;
@@ -7274,7 +7275,7 @@ void print_vma_addr(char *prefix, unsigned long ip)
if (vma && vma->vm_file) {
struct file *f = vma->vm_file;
ip -= vma->vm_start;
- ip += vma->vm_pgoff << PAGE_SHIFT;
+ ip += vma_start_pgoff(vma) << PAGE_SHIFT;
printk("%s%pD[%lx,%lx+%lx]", prefix, f, ip,
vma->vm_start,
vma->vm_end - vma->vm_start);
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 36699fabd3c2..650cdb23354a 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2048,8 +2048,8 @@ struct mempolicy *get_vma_policy(struct vm_area_struct *vma,
pol = get_task_policy(current);
if (pol->mode == MPOL_INTERLEAVE ||
pol->mode == MPOL_WEIGHTED_INTERLEAVE) {
- *ilx += vma->vm_pgoff >> order;
- *ilx += (addr - vma->vm_start) >> (PAGE_SHIFT + order);
+ *ilx += vma_start_pgoff(vma) >> order;
+ *ilx += linear_page_delta(vma, addr) >> order;
}
return pol;
}
@@ -3250,16 +3250,17 @@ EXPORT_SYMBOL_FOR_MODULES(mpol_shared_policy_init, "kvm");
int mpol_set_shared_policy(struct shared_policy *sp,
struct vm_area_struct *vma, struct mempolicy *pol)
{
- int err;
+ const pgoff_t pgoff = vma_start_pgoff(vma);
+ const pgoff_t pgoff_end = vma_end_pgoff(vma);
struct sp_node *new = NULL;
- unsigned long sz = vma_pages(vma);
+ int err;
if (pol) {
- new = sp_alloc(vma->vm_pgoff, vma->vm_pgoff + sz, pol);
+ new = sp_alloc(pgoff, pgoff_end, pol);
if (!new)
return -ENOMEM;
}
- err = shared_policy_replace(sp, vma->vm_pgoff, vma->vm_pgoff + sz, new);
+ err = shared_policy_replace(sp, pgoff, pgoff_end, new);
if (err && new)
sp_free(new);
return err;
diff --git a/mm/mremap.c b/mm/mremap.c
index e9c8b1d05832..079a0ba0c4a7 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -948,8 +948,7 @@ static unsigned long vrm_set_new_addr(struct vma_remap_struct *vrm)
struct vm_area_struct *vma = vrm->vma;
unsigned long map_flags = 0;
/* Page Offset _into_ the VMA. */
- pgoff_t internal_pgoff = (vrm->addr - vma->vm_start) >> PAGE_SHIFT;
- pgoff_t pgoff = vma->vm_pgoff + internal_pgoff;
+ const pgoff_t pgoff = linear_page_index(vma, vrm->addr);
unsigned long new_addr = vrm_implies_new_addr(vrm) ? vrm->new_addr : 0;
unsigned long res;
@@ -1255,12 +1254,10 @@ static void unmap_source_vma(struct vma_remap_struct *vrm)
static int copy_vma_and_data(struct vma_remap_struct *vrm,
struct vm_area_struct **new_vma_ptr)
{
- unsigned long internal_offset = vrm->addr - vrm->vma->vm_start;
- unsigned long internal_pgoff = internal_offset >> PAGE_SHIFT;
- unsigned long new_pgoff = vrm->vma->vm_pgoff + internal_pgoff;
- unsigned long moved_len;
+ const unsigned long new_pgoff = linear_page_index(vrm->vma, vrm->addr);
struct vm_area_struct *vma = vrm->vma;
struct vm_area_struct *new_vma;
+ unsigned long moved_len;
int err = 0;
PAGETABLE_MOVE(pmc, NULL, NULL, vrm->addr, vrm->new_addr, vrm->old_len);
@@ -1802,8 +1799,7 @@ static int check_prep_vma(struct vma_remap_struct *vrm)
vrm->populate_expand = true;
/* Need to be careful about a growing mapping */
- pgoff = (addr - vma->vm_start) >> PAGE_SHIFT;
- pgoff += vma->vm_pgoff;
+ pgoff = linear_page_index(vma, addr);
if (pgoff + (new_len >> PAGE_SHIFT) < pgoff)
return -EINVAL;
diff --git a/mm/msync.c b/mm/msync.c
index ac4c9bfea2e7..90b491a27a14 100644
--- a/mm/msync.c
+++ b/mm/msync.c
@@ -12,6 +12,7 @@
#include <linux/mm.h>
#include <linux/mman.h>
#include <linux/file.h>
+#include <linux/pagemap.h>
#include <linux/syscalls.h>
#include <linux/sched.h>
@@ -85,8 +86,7 @@ SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags)
goto out_unlock;
}
file = vma->vm_file;
- fstart = (start - vma->vm_start) +
- ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
+ fstart = (loff_t)linear_page_index(vma, start) << PAGE_SHIFT;
fend = fstart + (min(end, vma->vm_end) - start) - 1;
start = vma->vm_end;
if ((flags & MS_SYNC) && file &&
diff --git a/mm/nommu.c b/mm/nommu.c
index 6d168f69763f..60560b2c457e 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -975,7 +975,7 @@ static int do_mmap_private(struct vm_area_struct *vma,
/* read the contents of a file into the copy */
loff_t fpos;
- fpos = vma->vm_pgoff;
+ fpos = vma_start_pgoff(vma);
fpos <<= PAGE_SHIFT;
ret = kernel_read(vma->vm_file, base, len, &fpos);
@@ -1355,7 +1355,8 @@ static int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
delete_nommu_region(vma->vm_region);
if (new_below) {
vma->vm_region->vm_start = vma->vm_start = addr;
- vma->vm_region->vm_pgoff = vma->vm_pgoff += npages;
+ vma->vm_pgoff += npages;
+ vma->vm_region->vm_pgoff = vma_start_pgoff(vma);
} else {
vma->vm_region->vm_end = vma->vm_end = addr;
vma->vm_region->vm_top = addr;
@@ -1603,7 +1604,7 @@ int vm_iomap_memory(struct vm_area_struct *vma, phys_addr_t start, unsigned long
unsigned long pfn = start >> PAGE_SHIFT;
unsigned long vm_len = vma->vm_end - vma->vm_start;
- pfn += vma->vm_pgoff;
+ pfn += vma_start_pgoff(vma);
return io_remap_pfn_range(vma, vma->vm_start, pfn, vm_len, vma->vm_page_prot);
}
EXPORT_SYMBOL(vm_iomap_memory);
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 98d090ede077..0a3bbff57d46 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -813,7 +813,7 @@ int walk_page_mapping(struct address_space *mapping, pgoff_t first_index,
mapping_interval_tree_foreach(vma, mapping, first_index,
first_index + nr - 1) {
/* Clip to the vma */
- vba = vma->vm_pgoff;
+ vba = vma_start_pgoff(vma);
vea = vba + vma_pages(vma);
cba = first_index;
cba = max(cba, vba);
diff --git a/mm/shmem.c b/mm/shmem.c
index b51f83c970bb..4e7f6bc7a389 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1032,6 +1032,8 @@ unsigned long shmem_swap_usage(struct vm_area_struct *vma)
struct inode *inode = file_inode(vma->vm_file);
struct shmem_inode_info *info = SHMEM_I(inode);
struct address_space *mapping = inode->i_mapping;
+ const pgoff_t pgoff = vma_start_pgoff(vma);
+ const pgoff_t pgoff_end = vma_end_pgoff(vma);
unsigned long swapped;
/* Be careful as we don't hold info->lock */
@@ -1045,12 +1047,11 @@ unsigned long shmem_swap_usage(struct vm_area_struct *vma)
if (!swapped)
return 0;
- if (!vma->vm_pgoff && vma->vm_end - vma->vm_start >= inode->i_size)
+ if (!pgoff && vma->vm_end - vma->vm_start >= inode->i_size)
return swapped << PAGE_SHIFT;
/* Here comes the more involved part */
- return shmem_partial_swap_usage(mapping, vma->vm_pgoff,
- vma->vm_pgoff + vma_pages(vma));
+ return shmem_partial_swap_usage(mapping, pgoff, pgoff_end);
}
/*
@@ -2839,7 +2840,7 @@ static struct mempolicy *shmem_get_policy(struct vm_area_struct *vma,
* by page order, as in shmem_get_pgoff_policy() and get_vma_policy()).
*/
*ilx = inode->i_ino;
- index = ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
+ index = linear_page_index(vma, addr);
return mpol_shared_policy_lookup(&SHMEM_I(inode)->policy, index);
}
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 246af12bf801..bf4518f4449d 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -481,7 +481,7 @@ static void mfill_retry_state_save(struct mfill_retry_state *s,
{
s->flags = vma_flags_and_mask(&vma->flags, MFILL_RETRY_STATE_VMA_FLAGS);
s->ops = vma_uffd_ops(vma);
- s->pgoff = vma->vm_pgoff;
+ s->pgoff = vma_start_pgoff(vma);
if (vma->vm_file)
s->file = get_file(vma->vm_file);
@@ -507,7 +507,7 @@ static bool mfill_retry_state_changed(struct mfill_retry_state *state,
/* VMA was file backed, but file, inode or offset has changed */
if (!vma->vm_file || vma->vm_file->f_inode != state->file->f_inode ||
- state->file != vma->vm_file || vma->vm_pgoff != state->pgoff)
+ state->file != vma->vm_file || vma_start_pgoff(vma) != state->pgoff)
return true;
return false;
diff --git a/mm/util.c b/mm/util.c
index af2c2103f0d9..61e6d32b2c16 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1188,7 +1188,7 @@ void compat_set_desc_from_vma(struct vm_area_desc *desc,
desc->start = vma->vm_start;
desc->end = vma->vm_end;
- desc->pgoff = vma->vm_pgoff;
+ desc->pgoff = vma_start_pgoff(vma);
desc->vm_file = vma->vm_file;
desc->vma_flags = vma->flags;
desc->page_prot = vma->vm_page_prot;
@@ -1379,7 +1379,7 @@ static int call_vma_mapped(struct vm_area_struct *vma)
if (!vm_ops || !vm_ops->mapped)
return 0;
- err = vm_ops->mapped(vma->vm_start, vma->vm_end, vma->vm_pgoff,
+ err = vm_ops->mapped(vma->vm_start, vma->vm_end, vma_start_pgoff(vma),
vma->vm_file, &vm_private_data);
if (err)
return err;
diff --git a/mm/vma.c b/mm/vma.c
index dc4c2c1077f4..ee3a8ca13d07 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -967,10 +967,9 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
* prev middle next
* extend delete delete
*/
-
vmg->start = prev->vm_start;
vmg->end = next->vm_end;
- vmg->pgoff = prev->vm_pgoff;
+ vmg->pgoff = vma_start_pgoff(prev);
/*
* We already ensured anon_vma compatibility above, so now it's
@@ -987,9 +986,8 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
* prev middle
* extend shrink/delete
*/
-
vmg->start = prev->vm_start;
- vmg->pgoff = prev->vm_pgoff;
+ vmg->pgoff = vma_start_pgoff(prev);
if (!vmg->__remove_middle)
vmg->__adjust_middle_start = true;
@@ -1011,13 +1009,13 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
if (vmg->__remove_middle) {
vmg->end = next->vm_end;
- vmg->pgoff = next->vm_pgoff - pglen;
+ vmg->pgoff = vma_start_pgoff(next) - pglen;
} else {
/* We shrink middle and expand next. */
vmg->__adjust_next_start = true;
vmg->start = middle->vm_start;
vmg->end = start;
- vmg->pgoff = middle->vm_pgoff;
+ vmg->pgoff = vma_start_pgoff(middle);
}
err = dup_anon_vma(next, middle, &anon_dup);
@@ -1126,7 +1124,7 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
if (can_merge_left) {
vmg->start = prev->vm_start;
vmg->target = prev;
- vmg->pgoff = prev->vm_pgoff;
+ vmg->pgoff = vma_start_pgoff(prev);
/*
* If this merge would result in removal of the next VMA but we
@@ -1957,7 +1955,8 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
VM_BUG_ON_VMA(faulted_in_anon_vma, new_vma);
*vmap = vma = new_vma;
}
- *need_rmap_locks = (new_vma->vm_pgoff <= vma->vm_pgoff);
+ *need_rmap_locks =
+ (vma_start_pgoff(new_vma) <= vma_start_pgoff(vma));
} else {
new_vma = vm_area_dup(vma);
if (!new_vma)
diff --git a/mm/vma_exec.c b/mm/vma_exec.c
index 5cee8b7efa0f..e3644a3042e2 100644
--- a/mm/vma_exec.c
+++ b/mm/vma_exec.c
@@ -37,7 +37,7 @@ int relocate_vma_down(struct vm_area_struct *vma, unsigned long shift)
unsigned long new_end = old_end - shift;
VMA_ITERATOR(vmi, mm, new_start);
VMG_STATE(vmg, mm, &vmi, new_start, old_end, EMPTY_VMA_FLAGS,
- vma->vm_pgoff);
+ vma_start_pgoff(vma));
struct vm_area_struct *next;
struct mmu_gather tlb;
PAGETABLE_MOVE(pmc, vma, vma, old_start, new_start, length);
@@ -89,7 +89,7 @@ int relocate_vma_down(struct vm_area_struct *vma, unsigned long shift)
vma_prev(&vmi);
/* Shrink the vma to just the new range */
- return vma_shrink(&vmi, vma, new_start, new_end, vma->vm_pgoff);
+ return vma_shrink(&vmi, vma, new_start, new_end, vma_start_pgoff(vma));
}
/*
diff --git a/mm/vma_init.c b/mm/vma_init.c
index 3c0b65950510..a459669a1654 100644
--- a/mm/vma_init.c
+++ b/mm/vma_init.c
@@ -46,7 +46,7 @@ static void vm_area_init_from(const struct vm_area_struct *src,
dest->vm_start = src->vm_start;
dest->vm_end = src->vm_end;
dest->anon_vma = src->anon_vma;
- dest->vm_pgoff = src->vm_pgoff;
+ dest->vm_pgoff = vma_start_pgoff(src);
dest->vm_file = src->vm_file;
dest->vm_private_data = src->vm_private_data;
vm_flags_init(dest, src->vm_flags);
--
2.54.0
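Most hunks above swap `vma->vm_pgoff` for `vma_start_pgoff(vma)`. A few others fold open-coded arithmetic into `linear_page_index()`. Two of those, in mm/hugetlb.c and mm/msync.c, are only equivalent under an alignment precondition. The following is a minimal user-space sketch, not kernel code, and it assumes 4 KiB pages. `struct model_vma`, `MODEL_PAGE_SHIFT`, `model_linear_page_index()` and the `old_*`/`new_*` helpers are hypothetical stand-ins that reproduce only the before/after expressions from the diff. `model_linear_page_index()` computes the open-coded form the patch replaces, `((addr - vm_start) >> PAGE_SHIFT) + vm_pgoff`.

Two properties follow from that arithmetic:

- Shifting the summed page index once (new hugetlb form) matches shifting each term separately (old form) only when `vm_pgoff` is a multiple of `1 << order`.
- Converting the page index back to bytes (new msync form) drops any sub-page remainder, so it matches the old byte arithmetic only for a page-aligned `start`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages, i.e. PAGE_SHIFT == 12. */
#define MODEL_PAGE_SHIFT 12U

/* Hypothetical stand-in: only the fields the arithmetic below reads. */
struct model_vma {
	unsigned long vm_start; /* first address covered by the VMA */
	unsigned long vm_pgoff; /* file offset of vm_start, in base pages */
};

/* The open-coded expression the patch replaces with linear_page_index(). */
static unsigned long model_linear_page_index(const struct model_vma *vma,
					     unsigned long addr)
{
	return ((addr - vma->vm_start) >> MODEL_PAGE_SHIFT) + vma->vm_pgoff;
}

/* hugetlb.c before the patch: each term converted to huge-page units. */
static unsigned long old_hugecache_offset(const struct model_vma *vma,
					  unsigned long addr, unsigned int order)
{
	return ((addr - vma->vm_start) >> (MODEL_PAGE_SHIFT + order)) +
	       (vma->vm_pgoff >> order);
}

/* hugetlb.c after the patch: shift the combined base-page index once. */
static unsigned long new_hugecache_offset(const struct model_vma *vma,
					  unsigned long addr, unsigned int order)
{
	return model_linear_page_index(vma, addr) >> order;
}

/* msync.c before the patch: byte delta plus start offset in bytes. */
static int64_t old_msync_fstart(const struct model_vma *vma,
				unsigned long start)
{
	return (int64_t)(start - vma->vm_start) +
	       ((int64_t)vma->vm_pgoff << MODEL_PAGE_SHIFT);
}

/* msync.c after the patch: page index converted back to bytes. */
static int64_t new_msync_fstart(const struct model_vma *vma,
				unsigned long start)
{
	return (int64_t)model_linear_page_index(vma, start) << MODEL_PAGE_SHIFT;
}

/* Usage: both forms agree when the alignment preconditions hold. */
static void model_self_check(void)
{
	/* order 9 == 2 MiB huge pages; vm_pgoff 1024 is a multiple of 512 */
	struct model_vma huge = { .vm_start = 0x40000000UL, .vm_pgoff = 1024 };
	/* 3 huge pages plus 5 base pages into the VMA */
	unsigned long haddr = 0x40605000UL;

	assert(old_hugecache_offset(&huge, haddr, 9) ==
	       new_hugecache_offset(&huge, haddr, 9));

	/* page-aligned msync start */
	struct model_vma file = { .vm_start = 0x10000000UL, .vm_pgoff = 3 };
	assert(old_msync_fstart(&file, 0x10002000UL) ==
	       new_msync_fstart(&file, 0x10002000UL));
}
```

With `order` 9 and `vm_pgoff` 1024, both hugetlb helpers return 5 for the address above. With a misaligned `vm_pgoff` of 1534, the new form returns 6 and the old form returns 5. The sketch assumes the kernel supplies these preconditions: msync(2) fails with `EINVAL` when `addr` is not page aligned, and hugetlb mappings are expected to have a huge-page-aligned file offset. It checks only the arithmetic.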