* [syzbot ci] Re: Implement a new generic pagewalk API
2026-04-12 17:42 [RFC PATCH 0/7] " Oscar Salvador
@ 2026-04-13 7:38 ` syzbot ci
0 siblings, 0 replies; 11+ messages in thread
From: syzbot ci @ 2026-04-13 7:38 UTC (permalink / raw)
To: akpm, david, david, linux-kernel, linux-mm, lorenzo.stoakes,
mhocko, muchun.song, osalvador, vbabka
Cc: syzbot, syzkaller-bugs
syzbot ci has tested the following series
[v1] Implement a new generic pagewalk API
https://lore.kernel.org/all/20260412174244.133715-1-osalvador@suse.de
* [RFC PATCH 1/7] mm: Add softleaf_from_pud
* [RFC PATCH 2/7] mm: Add {pmd,pud}_huge_lock helper
* [RFC PATCH 3/7] mm: Implement folio_pmd_batch
* [RFC PATCH 4/7] mm: Implement pt_range_walk
* [RFC PATCH 5/7] mm: Make /proc/pid/smaps use the new generic pagewalk API
* [RFC PATCH 6/7] mm: Make /proc/pid/numa_maps use the new generic pagewalk API
* [RFC PATCH 7/7] mm: Make /proc/pid/pagemap use the new generic pagewalk API
and found the following issues:
* KASAN: slab-out-of-bounds Write in pagemap_read
* WARNING in pt_range_walk
Full report is available here:
https://ci.syzbot.org/series/1f85248a-1ac0-48e8-8ce3-edb89a6b9ee5
***
KASAN: slab-out-of-bounds Write in pagemap_read
tree: torvalds
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base: 857fa8f2a5b184c206c703a3d9ce05cea683cfed
arch: amd64
compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config: https://ci.syzbot.org/builds/932ed80d-9fb1-4c99-8096-4b7a9324bb7c/config
syz repro: https://ci.syzbot.org/findings/1083a63d-0470-4ce7-8943-0a60046b9269/syz_repro
==================================================================
BUG: KASAN: slab-out-of-bounds in add_to_pagemap fs/proc/task_mmu.c:1740 [inline]
BUG: KASAN: slab-out-of-bounds in pagemap_read_walk_range fs/proc/task_mmu.c:2736 [inline]
BUG: KASAN: slab-out-of-bounds in pagemap_read+0x19bc/0x21a0 fs/proc/task_mmu.c:2829
Write of size 8 at addr ffff88816d32b000 by task syz.0.17/5958
CPU: 0 UID: 0 PID: 5958 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0xba/0x230 mm/kasan/report.c:482
kasan_report+0x117/0x150 mm/kasan/report.c:595
add_to_pagemap fs/proc/task_mmu.c:1740 [inline]
pagemap_read_walk_range fs/proc/task_mmu.c:2736 [inline]
pagemap_read+0x19bc/0x21a0 fs/proc/task_mmu.c:2829
vfs_read+0x20c/0xa70 fs/read_write.c:572
ksys_pread64 fs/read_write.c:765 [inline]
__do_sys_pread64 fs/read_write.c:773 [inline]
__se_sys_pread64 fs/read_write.c:770 [inline]
__x64_sys_pread64+0x199/0x230 fs/read_write.c:770
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fd47239c819
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fd473284028 EFLAGS: 00000246 ORIG_RAX: 0000000000000011
RAX: ffffffffffffffda RBX: 00007fd472615fa0 RCX: 00007fd47239c819
RDX: 0000000000019000 RSI: 0000200000000200 RDI: 0000000000000003
RBP: 00007fd472432c91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000001000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fd472616038 R14: 00007fd472615fa0 R15: 00007ffe3c81aa88
</TASK>
Allocated by task 5958:
kasan_save_stack mm/kasan/common.c:57 [inline]
kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
poison_kmalloc_redzone mm/kasan/common.c:398 [inline]
__kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:415
kasan_kmalloc include/linux/kasan.h:263 [inline]
__kmalloc_cache_noprof+0x31c/0x660 mm/slub.c:5339
kmalloc_noprof include/linux/slab.h:962 [inline]
kmalloc_array_noprof include/linux/slab.h:1109 [inline]
pagemap_read+0x287/0x21a0 fs/proc/task_mmu.c:2781
vfs_read+0x20c/0xa70 fs/read_write.c:572
ksys_pread64 fs/read_write.c:765 [inline]
__do_sys_pread64 fs/read_write.c:773 [inline]
__se_sys_pread64 fs/read_write.c:770 [inline]
__x64_sys_pread64+0x199/0x230 fs/read_write.c:770
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
The buggy address belongs to the object at ffff88816d32a000
which belongs to the cache kmalloc-4k of size 4096
The buggy address is located 0 bytes to the right of
allocated 4096-byte region [ffff88816d32a000, ffff88816d32b000)
The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x16d328
head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x57ff00000000040(head|node=1|zone=2|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 057ff00000000040 ffff888100042140 dead000000000100 dead000000000122
raw: 0000000000000000 0000000000040004 00000000f5000000 0000000000000000
head: 057ff00000000040 ffff888100042140 dead000000000100 dead000000000122
head: 0000000000000000 0000000000040004 00000000f5000000 0000000000000000
head: 057ff00000000003 ffffea0005b4ca01 00000000ffffffff 00000000ffffffff
head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 1, tgid 1 (swapper/0), ts 20801987900, free_ts 13278585415
set_page_owner include/linux/page_owner.h:32 [inline]
post_alloc_hook+0x231/0x280 mm/page_alloc.c:1889
prep_new_page mm/page_alloc.c:1897 [inline]
get_page_from_freelist+0x24dc/0x2580 mm/page_alloc.c:3962
__alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5250
alloc_slab_page mm/slub.c:3255 [inline]
allocate_slab+0x77/0x660 mm/slub.c:3444
new_slab mm/slub.c:3502 [inline]
refill_objects+0x331/0x3c0 mm/slub.c:7134
refill_sheaf mm/slub.c:2804 [inline]
__pcs_replace_empty_main+0x2b9/0x620 mm/slub.c:4578
alloc_from_pcs mm/slub.c:4681 [inline]
slab_alloc_node mm/slub.c:4815 [inline]
__kmalloc_cache_noprof+0x392/0x660 mm/slub.c:5334
kmalloc_noprof include/linux/slab.h:962 [inline]
kzalloc_noprof include/linux/slab.h:1200 [inline]
kobject_uevent_env+0x28c/0x9e0 lib/kobject_uevent.c:540
driver_register+0x2d4/0x320 drivers/base/driver.c:257
usb_register_driver+0x1e4/0x390 drivers/usb/core/driver.c:1078
hid_init+0x39/0x70 drivers/hid/usbhid/hid-core.c:1710
do_one_initcall+0x250/0x8d0 init/main.c:1382
do_initcall_level+0x104/0x190 init/main.c:1444
do_initcalls+0x59/0xa0 init/main.c:1460
kernel_init_freeable+0x2a6/0x3e0 init/main.c:1692
kernel_init+0x1d/0x1d0 init/main.c:1582
page last free pid 10 tgid 10 stack trace:
reset_page_owner include/linux/page_owner.h:25 [inline]
__free_pages_prepare mm/page_alloc.c:1433 [inline]
__free_frozen_pages+0xc2b/0xdb0 mm/page_alloc.c:2978
vfree+0x25a/0x400 mm/vmalloc.c:3479
delayed_vfree_work+0x55/0x80 mm/vmalloc.c:3398
process_one_work kernel/workqueue.c:3275 [inline]
process_scheduled_works+0xb02/0x1830 kernel/workqueue.c:3358
worker_thread+0xa50/0xfc0 kernel/workqueue.c:3439
kthread+0x388/0x470 kernel/kthread.c:467
ret_from_fork+0x51e/0xb90 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
Memory state around the buggy address:
ffff88816d32af00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff88816d32af80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff88816d32b000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff88816d32b080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88816d32b100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
***
WARNING in pt_range_walk
tree: torvalds
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base: 857fa8f2a5b184c206c703a3d9ce05cea683cfed
arch: amd64
compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config: https://ci.syzbot.org/builds/932ed80d-9fb1-4c99-8096-4b7a9324bb7c/config
syz repro: https://ci.syzbot.org/findings/e7c203a3-133f-4435-b9ed-ee292b6685fe/syz_repro
------------[ cut here ]------------
next_addr < vma->vm_start || next_addr >= vma->vm_end
WARNING: mm/pagewalk.c:1052 at pt_range_walk+0x145/0x35f0 mm/pagewalk.c:1052, CPU#1: syz.1.18/6005
Modules linked in:
CPU: 1 UID: 0 PID: 6005 Comm: syz.1.18 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:pt_range_walk+0x145/0x35f0 mm/pagewalk.c:1052
Code: df e8 9f 1a 15 00 49 89 dc 48 8b 1b 4c 89 ff 48 89 de e8 7e a5 aa ff 49 39 df 4c 89 b4 24 38 01 00 00 73 14 e8 0c a3 aa ff 90 <0f> 0b 90 41 be 01 00 00 00 e9 e5 21 00 00 49 8d 5c 24 08 48 89 d8
RSP: 0018:ffffc90003a279a0 EFLAGS: 00010293
RAX: ffffffff821b140c RBX: 0000200001000000 RCX: ffff8881027a5700
RDX: 0000000000000000 RSI: 0000200001000000 RDI: 0000200001000000
RBP: ffffc90003a27bb0 R08: 00000000000000ff R09: 0000000000000003
R10: 0000000000000002 R11: 0000000000000000 R12: ffff888105317380
R13: dffffc0000000000 R14: 1ffff92000744f60 R15: 0000200001000000
FS: 00007f0ccbd736c0(0000) GS:ffff8882a9467000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f0ccb04edd5 CR3: 0000000115dba000 CR4: 00000000000006f0
Call Trace:
<TASK>
pagemap_scan_walk fs/proc/task_mmu.c:2479 [inline]
do_pagemap_scan fs/proc/task_mmu.c:2573 [inline]
do_pagemap_cmd+0xfd5/0x2600 fs/proc/task_mmu.c:2869
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:597 [inline]
__se_sys_ioctl+0xfc/0x170 fs/ioctl.c:583
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f0ccaf9c819
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f0ccbd73028 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f0ccb215fa0 RCX: 00007f0ccaf9c819
RDX: 0000200000000100 RSI: 00000000c0606610 RDI: 0000000000000003
RBP: 00007f0ccb032c91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f0ccb216038 R14: 00007f0ccb215fa0 R15: 00007ffd9c7ebb48
</TASK>
***
If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
Tested-by: syzbot@syzkaller.appspotmail.com
---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.
To test a patch for this bug, please reply with `#syz test`
(should be on a separate line).
The patch should be attached to the email.
Note: arguments like custom git repos and branches are not supported.
* [RFC PATCH v2 0/7] Implement a new generic pagewalk API
@ 2026-04-26 12:57 Oscar Salvador
2026-04-26 12:57 ` [RFC PATCH v2 1/7] mm: Add softleaf_from_pud Oscar Salvador
` (8 more replies)
0 siblings, 9 replies; 11+ messages in thread
From: Oscar Salvador @ 2026-04-26 12:57 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Michal Hocko, Muchun Song, Vlastimil Babka,
Lorenzo Stoakes, linux-kernel, linux-mm, Oscar Salvador
Changelog:
rfc -> rfcv2:
- Add pte_hole functionality
- Fix pagemap issues
- Fix shmem handling in smaps
- Testing with pagemap "testsuite"
[WARNING]
This is not yet fully complete, but before investing more time into it I would like
to know whether 1) this is heading in the right direction and 2) this is something
we are still interested in.
E.g: one of the things that still needs work is making the new API able to
take other locks, like i_mmap, since that one is needed for hugetlb to protect
WP vs pmd-sharing in pagemap_scan.
That is already a WIP, but I still need to make some small adjustments.
Another thing is to convert "make_uffd_wp_huge_pte" into normal, non-hugetlb-specific
code, and that too is still a WIP.
Kudos go to David, who suggested the interface and gave me some ideas on
where to begin, besides providing feedback at early stages (in case there
is something stupid, don't blame him, blame me).
Also, I would like to thank Vlastimil, who helped me by running this
patchset through Claude quite a few times to catch some issues.
[/WARNING]
[TESTING]
So far, tools/mm/page-types.c reports the right outcome (compared to the old API),
and tools/testing/selftests/mm/pagemap_ioctl.c only reports 4 failing tests.
Although, to be honest, I do not know how much I should trust that one, because
if I add a few delays in the userspace code, some tests that were failing before
no longer fail.
localhost:~/workspace # ./page-types -p 1168
flags page-count MB symbolic-flags long-symbolic-flags
0x0000000000000800 1 0 ___________M_______________________________ mmap
0x0000000000000828 2 0 ___U_l_____M_______________________________ uptodate,lru,mmap
0x000000000000082c 1 0 __RU_l_____M_______________________________ referenced,uptodate,lru,mmap
0x0000000000004838 1 0 ___UDl_____M__b____________________________ uptodate,dirty,lru,mmap,swapbacked
0x000000000000086c 423 1 __RU_lA____M_______________________________ referenced,uptodate,lru,active,mmap
0x0000000000205828 29 0 ___U_l_____Ma_b______x_____________________ uptodate,lru,mmap,anonymous,swapbacked,ksm
0x000000000020586c 1 0 __RU_lA____Ma_b______x_____________________ referenced,uptodate,lru,active,mmap,anonymous,swapbacked,ksm
total 458 1
localhost:~/workspace # ./page-types_lab -p 1168
flags page-count MB symbolic-flags long-symbolic-flags
0x0000000000000804 1 0 __R________M_______________________________ referenced,mmap
0x0000000000000828 2 0 ___U_l_____M_______________________________ uptodate,lru,mmap
0x000000000000082c 1 0 __RU_l_____M_______________________________ referenced,uptodate,lru,mmap
0x0000000000004838 1 0 ___UDl_____M__b____________________________ uptodate,dirty,lru,mmap,swapbacked
0x000000000000086c 423 1 __RU_lA____M_______________________________ referenced,uptodate,lru,active,mmap
0x0000000000205828 29 0 ___U_l_____Ma_b______x_____________________ uptodate,lru,mmap,anonymous,swapbacked,ksm
0x000000000020586c 1 0 __RU_lA____Ma_b______x_____________________ referenced,uptodate,lru,active,mmap,anonymous,swapbacked,ksm
total 458 1
page-types is the binary using the new API and page-types_lab the old one.
# ./pagemap_ioctl
TAP version 13
1..117
ok 1 sanity_tests_sd Zero range size is valid
ok 2 sanity_tests_sd output buffer must be specified with size
ok 3 sanity_tests_sd output buffer can be 0
ok 4 sanity_tests_sd output buffer can be 0
ok 5 sanity_tests_sd wrong flag specified
ok 6 sanity_tests_sd flag has extra bits specified
ok 7 sanity_tests_sd no selection mask is specified
ok 8 sanity_tests_sd no return mask is specified
ok 9 sanity_tests_sd wrong return mask specified
ok 10 sanity_tests_sd mixture of correct and wrong flag
ok 11 sanity_tests_sd PAGEMAP_BITS_ALL can be specified with PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC
ok 12 sanity_tests_sd Clear area with larger vec size
ok 13 sanity_tests_sd Repeated pattern of written and non-written pages
ok 14 sanity_tests_sd Repeated pattern of written and non-written pages in parts 498 2 2
ok 15 sanity_tests_sd Repeated pattern of written and non-written pages max_pages
ok 16 sanity_tests_sd only get 2 written pages and clear them as well
ok 17 sanity_tests_sd Two regions
ok 18 sanity_tests_sd Smaller max_pages
ok 19 Smaller vec
ok 20 Walk_end: Same start and end address
ok 21 Walk_end: Same start and end with WP
ok 22 Walk_end: Same start and end with 0 output buffer
ok 23 Walk_end: Big vec
ok 24 Walk_end: vec of minimum length
ok 25 Walk_end: Max pages specified
ok 26 Walk_end: Half max pages
ok 27 Walk_end: 1 max page
ok 28 Walk_end: max pages
ok 29 Walk_end sparse: Big vec
ok 30 Walk_end sparse: vec of minimum length
ok 31 Walk_end sparse: Max pages specified
ok 32 Walk_end sparse: Max pages specified
ok 33 Walk_end sparse: Max pages specified
ok 34 Walk_endsparse : Half max pages
ok 35 Walk_end: 1 max page
ok 36 Page testing: all new pages must not be written (dirty)
ok 37 Page testing: all pages must be written (dirty)
ok 38 Page testing: all pages dirty other than first and the last one
ok 39 Page testing: PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC
ok 40 Page testing: only middle page dirty
ok 41 Page testing: only two middle pages dirty
ok 42 Large Page testing: all new pages must not be written (dirty)
ok 43 Large Page testing: all pages must be written (dirty)
ok 44 Large Page testing: all pages dirty other than first and the last one
ok 45 Large Page testing: PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC
ok 46 Large Page testing: only middle page dirty
ok 47 Large Page testing: only two middle pages dirty
ok 48 Huge page testing: all new pages must not be written (dirty)
ok 49 Huge page testing: all pages must be written (dirty)
ok 50 Huge page testing: all pages dirty other than first and the last one
ok 51 Huge page testing: PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC
ok 52 Huge page testing: only middle page dirty
ok 53 Huge page testing: only two middle pages dirty
ok 54 Hugetlb shmem testing: all new pages must not be written (dirty)
ok 55 Hugetlb shmem testing: all pages must be written (dirty)
ok 56 Hugetlb shmem testing: all pages dirty other than first and the last one
ok 57 Hugetlb shmem testing: PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC
ok 58 Hugetlb shmem testing: only middle page dirty
not ok 59 Hugetlb shmem testing: only two middle pages dirty
ok 60 Hugetlb mem testing: all new pages must not be written (dirty)
ok 61 Hugetlb mem testing: all pages must be written (dirty)
ok 62 Hugetlb mem testing: all pages dirty other than first and the last one
ok 63 Hugetlb mem testing: PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC
ok 64 Hugetlb mem testing: only middle page dirty
not ok 65 Hugetlb mem testing: only two middle pages dirty
ok 66 Hugetlb shmem testing: all new pages must not be written (dirty)
ok 67 Hugetlb shmem testing: all pages must be written (dirty)
ok 68 Hugetlb shmem testing: all pages dirty other than first and the last one
ok 69 Hugetlb shmem testing: PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC
ok 70 Hugetlb shmem testing: only middle page dirty
not ok 71 Hugetlb shmem testing: only two middle pages dirty
ok 72 File memory testing: all new pages must not be written (dirty)
ok 73 File memory testing: all pages must be written (dirty)
ok 74 File memory testing: all pages dirty other than first and the last one
ok 75 File memory testing: PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC
ok 76 File memory testing: only middle page dirty
ok 77 File memory testing: only two middle pages dirty
ok 78 File anonymous memory testing: all new pages must not be written (dirty)
ok 79 File anonymous memory testing: all pages must be written (dirty)
ok 80 File anonymous memory testing: all pages dirty other than first and the last one
ok 81 File anonymous memory testing: PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC
ok 82 File anonymous memory testing: only middle page dirty
ok 83 File anonymous memory testing: only two middle pages dirty
ok 84 hpage_unit_tests all new huge page must not be written (dirty)
ok 85 hpage_unit_tests all the huge page must not be written
ok 86 hpage_unit_tests all the huge page must be written and clear
ok 87 hpage_unit_tests only middle page written
not ok 88 hpage_unit_tests clear first half of huge page
ok 89 hpage_unit_tests clear first half of huge page with limited buffer
ok 90 hpage_unit_tests clear second half huge page
ok 91 hpage_unit_tests get half huge page
ok 92 hpage_unit_tests get half huge page
ok 93 Test test_simple
ok 94 mprotect_tests Both pages written
ok 95 mprotect_tests Both pages are not written (dirty)
ok 96 mprotect_tests Both pages written after remap and mprotect
ok 97 mprotect_tests Clear and make the pages written
ok 98 transact_test count 192
ok 99 transact_test count 0
ok 100 transact_test Extra pages 143 (0.3%), extra thread faults 143.
ok 101 sanity_tests WP op can be specified with !PAGE_IS_WRITTEN
ok 102 sanity_tests required_mask specified
ok 103 sanity_tests anyof_mask specified
ok 104 sanity_tests excluded_mask specified
ok 105 sanity_tests required_mask and anyof_mask specified
ok 106 sanity_tests Get sd and present pages with anyof_mask
ok 107 sanity_tests Get all the pages with required_mask
ok 108 sanity_tests Get sd and present pages with required_mask and anyof_mask
ok 109 sanity_tests Don't get sd pages
ok 110 sanity_tests Don't get present pages
ok 111 sanity_tests Find written present pages with return mask
ok 112 sanity_tests Memory mapped file
ok 113 sanity_tests Read/write to memory
ok 114 unmapped_region_tests Get status of pages
ok 115 userfaultfd_tests all new pages must not be written (dirty)
ok 116 zeropfn_tests all pages must have PFNZERO set
ok 117 zeropfn_tests all huge pages must have PFNZERO set
# Totals: pass:113 fail:4 xfail:0 xpass:0 skip:0 error:0
/proc/$$/numa_maps and /proc/$$/smaps have been tested too, comparing
the outcome with the old API.
[/TESTING]
At LSFMM/BPF 2025 there was general agreement that we 1) would like to have
a generic pagewalk API, 2) that it should replace the existing callback-based
one if possible, and 3) that HugeTLB should be able to use it without being
special-cased (e.g: not having to depend on .hugetlb_entry callbacks), which
currently means a lot of duplicated code and a lot of special casing just
because of hugetlb lore.
The pt_range_walk API tries to do that: it replaces the old behaviour of "in
HugeTLB world everything reads as a PTE" and starts reading HugeTLB entries
the way they really are, that is, interpreting them as PMD/PUD entries and
contiguous-PMD/PTE entries.
In order to achieve that, we need some infrastructure we did not really need
until now, so that HugeTLB pages can be read as PUD/PMD entries.
E.g: softleaf_from_pud() and some other pud_* functions had to be added.
In a few words, this API goes through an address range and returns
whatever is in there (swap/hwpoison/migration/marker entries, folios,
pfn and device entries, or nothing).
These are the internal return types the API uses:
PT_TYPE_NONE
PT_TYPE_FOLIO
PT_TYPE_MARKER
PT_TYPE_PFN
PT_TYPE_SWAP
PT_TYPE_MIGRATION
PT_TYPE_DEVICE
PT_TYPE_HWPOISON
The API also handles locking and batching itself, so the caller
does not really need to bother with that.
In order to handle contiguous-PMD mapped hugetlb pages, folio_pmd_batch,
which is analogous to folio_pte_batch, has been implemented.
More information about the API can be found in patch #4.
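To give a feel for the intended usage, here is a minimal sketch of a walker
that only cares about mapped folios, based on the interface from patch #4 and
assuming the mmap lock is already held; account_folio() is a made-up consumer:

	struct pt_range_walk ptw = {};
	enum pt_range_walk_type type;

	type = pt_range_walk_start(&ptw, vma, vma->vm_start, vma->vm_end,
				   PT_TYPE_FOLIO);
	while (type != PTW_DONE) {
		/* type == PTW_FOLIO: ptw.folio spans ptw.nr_entries entries */
		account_folio(ptw.folio, ptw.nr_entries, ptw.dirty);
		type = pt_range_walk_next(&ptw, vma, vma->vm_start,
					  vma->vm_end, PT_TYPE_FOLIO);
	}
	pt_range_walk_done(&ptw);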
This was tested on x86_64 and arm64, but as I said, it is still
incomplete, hence the RFC: the goal is to gather some initial feedback
before investing more time into this.
For now, only /proc/pid/{smaps,numa_maps,pagemap} have been converted
to use this new API.
Thanks in advance
Oscar Salvador (7):
mm: Add softleaf_from_pud
mm: Add {pmd,pud}_huge_lock helper
mm: Implement folio_pmd_batch
mm: Implement pt_range_walk
mm: Make /proc/pid/smaps use the new generic pagewalk API
mm: Make /proc/pid/numa_maps use the new generic pagewalk API
mm: Make /proc/pid/pagemap use the new generic pagewalk API
arch/arm64/include/asm/pgtable.h | 32 +
arch/loongarch/include/asm/pgtable.h | 1 +
arch/powerpc/include/asm/book3s/64/pgtable.h | 7 +
arch/s390/include/asm/pgtable.h | 38 +
arch/x86/include/asm/pgtable.h | 52 +
arch/x86/include/asm/pgtable_64.h | 2 +
arch/x86/mm/pgtable.c | 18 +-
fs/proc/task_mmu.c | 2212 +++++++++---------
include/asm-generic/pgtable_uffd.h | 15 +
include/linux/leafops.h | 46 +
include/linux/mm.h | 2 +
include/linux/mm_inline.h | 32 +
include/linux/pagewalk.h | 106 +
include/linux/pgtable.h | 95 +
mm/internal.h | 75 +-
mm/memory.c | 22 +
mm/pagewalk.c | 400 ++++
mm/pgtable-generic.c | 10 +
18 files changed, 2024 insertions(+), 1141 deletions(-)
--
2.35.3
* [RFC PATCH v2 1/7] mm: Add softleaf_from_pud
2026-04-26 12:57 [RFC PATCH v2 0/7] Implement a new generic pagewalk API Oscar Salvador
@ 2026-04-26 12:57 ` Oscar Salvador
2026-04-26 12:57 ` [RFC PATCH v2 2/7] mm: Add {pmd,pud}_huge_lock helper Oscar Salvador
` (7 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Oscar Salvador @ 2026-04-26 12:57 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Michal Hocko, Muchun Song, Vlastimil Babka,
Lorenzo Stoakes, linux-kernel, linux-mm, Oscar Salvador
We want to be able to operate on HugeTLB pages as we do with normal
pages, which means we stop pretending everything is a pte in HugeTLB world
and operate at the right entry level.
Since HugeTLB pages can be mapped as PUD entries, we need the infrastructure
that allows us to operate on them, so add softleaf_from_pud() and the
helpers that come with it.
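As a purely illustrative sketch (the real consumer is the pt_range_walk()
walker added in patch #4), a walker that finds a non-present, non-none PUD
can now decode it the same way it already decodes PMDs and PTEs; the
handle_*() helpers below are made up:

	pud_t pud = pudp_get(pudp);

	if (!pud_present(pud) && !pud_none(pud)) {
		const softleaf_t entry = softleaf_from_pud(pud);

		if (softleaf_is_marker(entry))
			handle_marker(entry);		/* e.g. a UFFD-WP marker */
		else if (softleaf_is_migration(entry))
			handle_migration(entry);
	}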
Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
arch/arm64/include/asm/pgtable.h | 12 +++++
arch/loongarch/include/asm/pgtable.h | 1 +
arch/powerpc/include/asm/book3s/64/pgtable.h | 7 +++
arch/s390/include/asm/pgtable.h | 38 ++++++++++++++++
arch/x86/include/asm/pgtable.h | 48 ++++++++++++++++++++
arch/x86/include/asm/pgtable_64.h | 2 +
include/asm-generic/pgtable_uffd.h | 15 ++++++
include/linux/leafops.h | 33 ++++++++++++++
include/linux/pgtable.h | 37 +++++++++++++++
9 files changed, 193 insertions(+)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index b3e58735c49b..e42ad56a86d4 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -596,6 +596,13 @@ static inline int pmd_protnone(pmd_t pmd)
#define pmd_mkyoung(pmd) pte_pmd(pte_mkyoung(pmd_pte(pmd)))
#define pmd_mkinvalid(pmd) pte_pmd(pte_mkinvalid(pmd_pte(pmd)))
#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
+#define pud_uffd_wp(pud) pte_uffd_wp(pud_pte(pud))
+#define pud_mkuffd_wp(pud) pte_pud(pte_mkuffd_wp(pud_pte(pud)))
+#define pud_clear_uffd_wp(pud) pte_pud(pte_clear_uffd_wp(pud_pte(pud)))
+#define pud_swp_uffd_wp(pud) pte_swp_uffd_wp(pud_pte(pud))
+#define pud_swp_mkuffd_wp(pud) pte_pud(pte_swp_mkuffd_wp(pud_pte(pud)))
+#define pud_swp_clear_uffd_wp(pud) \
+ pte_pud(pte_swp_clear_uffd_wp(pud_pte(pud)))
#define pmd_uffd_wp(pmd) pte_uffd_wp(pmd_pte(pmd))
#define pmd_mkuffd_wp(pmd) pte_pmd(pte_mkuffd_wp(pmd_pte(pmd)))
#define pmd_clear_uffd_wp(pmd) pte_pmd(pte_clear_uffd_wp(pmd_pte(pmd)))
@@ -1528,6 +1535,11 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
#define __swp_entry_to_pmd(swp) __pmd((swp).val)
#endif /* CONFIG_ARCH_ENABLE_THP_MIGRATION */
+#ifdef CONFIG_HUGETLB_PAGE
+#define __pud_to_swp_entry(pud) ((swp_entry_t) { pud_val(pud) })
+#define __swp_entry_to_pud(swp) __pud((swp).val)
+#endif
+
/*
* Ensure that there are not more swap files than can be encoded in the kernel
* PTEs.
diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h
index c33b3bcb733e..eba6d20f007f 100644
--- a/arch/loongarch/include/asm/pgtable.h
+++ b/arch/loongarch/include/asm/pgtable.h
@@ -335,6 +335,7 @@ static inline pte_t mk_swap_pte(unsigned long type, unsigned long offset)
#define __swp_entry_to_pmd(x) __pmd((x).val | _PAGE_HUGE)
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
#define __pmd_to_swp_entry(pmd) ((swp_entry_t) { pmd_val(pmd) })
+#define __pud_to_swp_entry(pud) ((swp_entry_t) { pud_val(pud) })
static inline bool pte_swp_exclusive(pte_t pte)
{
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 1a91762b455d..476781c59d5f 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1065,6 +1065,13 @@ static inline pte_t *pmdp_ptep(pmd_t *pmd)
#define pmd_swp_soft_dirty(pmd) pte_swp_soft_dirty(pmd_pte(pmd))
#define pmd_swp_clear_soft_dirty(pmd) pte_pmd(pte_swp_clear_soft_dirty(pmd_pte(pmd)))
#endif
+
+#ifdef CONFIG_HUGETLB_PAGE
+#define pud_swp_mksoft_dirty(pud) pte_pud(pte_swp_mksoft_dirty(pud_pte(pud)))
+#define pud_swp_soft_dirty(pud) pte_swp_soft_dirty(pud_pte(pud))
+#define pud_swp_clear_soft_dirty(pud) pte_pud(pte_swp_clear_soft_dirty(pud_pte(pud)))
+#endif
+
#endif /* CONFIG_HAVE_ARCH_SOFT_DIRTY */
#ifdef CONFIG_NUMA_BALANCING
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 1c3c3be93be9..0d1d571215c4 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -901,11 +901,31 @@ static inline pmd_t pmd_clear_soft_dirty(pmd_t pmd)
return clear_pmd_bit(pmd, __pgprot(_SEGMENT_ENTRY_SOFT_DIRTY));
}
+static inline int pud_soft_dirty(pud_t pud)
+{
+ return pud_val(pud) & _REGION3_ENTRY_SOFT_DIRTY;
+}
+
+static inline pud_t pud_mksoft_dirty(pud_t pud)
+{
+ return set_pud_bit(pud, __pgprot(_REGION3_ENTRY_SOFT_DIRTY));
+}
+
+static inline pud_t pud_clear_soft_dirty(pud_t pud)
+{
+ return clear_pud_bit(pud, __pgprot(_REGION3_ENTRY_SOFT_DIRTY));
+}
+
#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
#define pmd_swp_soft_dirty(pmd) pmd_soft_dirty(pmd)
#define pmd_swp_mksoft_dirty(pmd) pmd_mksoft_dirty(pmd)
#define pmd_swp_clear_soft_dirty(pmd) pmd_clear_soft_dirty(pmd)
#endif
+#ifdef CONFIG_HUGETLB_PAGE
+#define pud_swp_soft_dirty(pud) pud_soft_dirty(pud)
+#define pud_swp_mksoft_dirty(pud) pud_mksoft_dirty(pud)
+#define pud_swp_clear_soft_dirty(pud) pud_clear_soft_dirty(pud)
+#endif
/*
* query functions pte_write/pte_dirty/pte_young only work if
@@ -1901,6 +1921,24 @@ static inline unsigned long __swp_offset_rste(swp_entry_t entry)
* requires conversion of the swap type and offset, and not all the possible
* PTE bits.
*/
+static inline swp_entry_t __pud_to_swp_entry(pud_t pud)
+{
+ swp_entry_t arch_entry;
+ pte_t pte;
+
+ arch_entry = __rste_to_swp_entry(pud_val(pud));
+ pte = mk_swap_pte(__swp_type_rste(arch_entry), __swp_offset_rste(arch_entry));
+ return __pte_to_swp_entry(pte);
+}
+
+static inline pud_t __swp_entry_to_pud(swp_entry_t arch_entry)
+{
+ pud_t pud;
+
+ pud = __pud(mk_swap_rste(__swp_type(arch_entry), __swp_offset(arch_entry)));
+ return pud;
+}
+
static inline swp_entry_t __pmd_to_swp_entry(pmd_t pmd)
{
swp_entry_t arch_entry;
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 1662c5a8f445..a68ff339cd56 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -656,6 +656,23 @@ static inline pud_t pud_mkwrite(pud_t pud)
return pud_clear_saveddirty(pud);
}
+#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
+static inline int pud_uffd_wp(pud_t pud)
+{
+ return pud_flags(pud) & _PAGE_UFFD_WP;
+}
+
+static inline pud_t pud_mkuffd_wp(pud_t pud)
+{
+ return pud_wrprotect(pud_set_flags(pud, _PAGE_UFFD_WP));
+}
+
+static inline pud_t pud_clear_uffd_wp(pud_t pud)
+{
+ return pud_clear_flags(pud, _PAGE_UFFD_WP);
+}
+#endif
+
#ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
static inline int pte_soft_dirty(pte_t pte)
{
@@ -1557,6 +1574,22 @@ static inline pmd_t pmd_swp_clear_soft_dirty(pmd_t pmd)
return pmd_clear_flags(pmd, _PAGE_SWP_SOFT_DIRTY);
}
#endif
+#ifdef CONFIG_HUGETLB_PAGE
+static inline pud_t pud_swp_mksoft_dirty(pud_t pud)
+{
+ return pud_set_flags(pud, _PAGE_SWP_SOFT_DIRTY);
+}
+
+static inline int pud_swp_soft_dirty(pud_t pud)
+{
+ return pud_flags(pud) & _PAGE_SWP_SOFT_DIRTY;
+}
+
+static inline pud_t pud_swp_clear_soft_dirty(pud_t pud)
+{
+ return pud_clear_flags(pud, _PAGE_SWP_SOFT_DIRTY);
+}
+#endif
#endif
#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
@@ -1589,6 +1622,21 @@ static inline pmd_t pmd_swp_clear_uffd_wp(pmd_t pmd)
{
return pmd_clear_flags(pmd, _PAGE_SWP_UFFD_WP);
}
+
+static inline pud_t pud_swp_mkuffd_wp(pud_t pud)
+{
+ return pud_set_flags(pud, _PAGE_SWP_UFFD_WP);
+}
+
+static inline int pud_swp_uffd_wp(pud_t pud)
+{
+ return pud_flags(pud) & _PAGE_SWP_UFFD_WP;
+}
+
+static inline pud_t pud_swp_clear_uffd_wp(pud_t pud)
+{
+ return pud_clear_flags(pud, _PAGE_SWP_UFFD_WP);
+}
#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */
static inline u16 pte_flags_pkey(unsigned long pte_flags)
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index f06e5d6a2747..0cf02ddd3d4b 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -236,8 +236,10 @@ static inline void native_pgd_clear(pgd_t *pgd)
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val((pte)) })
#define __pmd_to_swp_entry(pmd) ((swp_entry_t) { pmd_val((pmd)) })
+#define __pud_to_swp_entry(pud) ((swp_entry_t) { pud_val((pud)) })
#define __swp_entry_to_pte(x) (__pte((x).val))
#define __swp_entry_to_pmd(x) (__pmd((x).val))
+#define __swp_entry_to_pud(x) (__pud((x).val))
extern void cleanup_highmap(void);
diff --git a/include/asm-generic/pgtable_uffd.h b/include/asm-generic/pgtable_uffd.h
index 0d85791efdf7..59c9d6762ec8 100644
--- a/include/asm-generic/pgtable_uffd.h
+++ b/include/asm-generic/pgtable_uffd.h
@@ -78,6 +78,21 @@ static inline pmd_t pmd_swp_clear_uffd_wp(pmd_t pmd)
{
return pmd;
}
+
+static inline pud_t pud_swp_mkuffd_wp(pud_t pud)
+{
+ return pud;
+}
+
+static inline int pud_swp_uffd_wp(pud_t pud)
+{
+ return 0;
+}
+
+static inline pud_t pud_swp_clear_uffd_wp(pud_t pud)
+{
+ return pud;
+}
#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */
#endif /* _ASM_GENERIC_PGTABLE_UFFD_H */
diff --git a/include/linux/leafops.h b/include/linux/leafops.h
index a9ff94b744f2..122ac50aeb09 100644
--- a/include/linux/leafops.h
+++ b/include/linux/leafops.h
@@ -117,6 +117,39 @@ static inline softleaf_t softleaf_from_pmd(pmd_t pmd)
#endif
+#ifdef CONFIG_HUGETLB_PAGE
+/**
+ * softleaf_from_pud() - Obtain a leaf entry from a PUD entry.
+ * @pud: PUD entry.
+ *
+ * If @pud is present or none (therefore not a software leaf entry), the
+ * function returns an empty leaf entry. Otherwise, it returns the leaf entry.
+ *
+ * Returns: Leaf entry.
+ */
+static inline softleaf_t softleaf_from_pud(pud_t pud)
+{
+ softleaf_t arch_entry;
+
+ if (pud_present(pud) || pud_none(pud))
+ return softleaf_mk_none();
+
+ if (pud_swp_soft_dirty(pud))
+ pud = pud_swp_clear_soft_dirty(pud);
+ if (pud_swp_uffd_wp(pud))
+ pud = pud_swp_clear_uffd_wp(pud);
+ arch_entry = __pud_to_swp_entry(pud);
+
+ /* Temporary until swp_entry_t eliminated. */
+ return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry));
+}
+#else
+static inline softleaf_t softleaf_from_pud(pud_t pud)
+{
+ return softleaf_mk_none();
+}
+#endif
+
/**
* softleaf_is_none() - Is the leaf entry empty?
* @entry: Leaf entry.
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index a50df42a893f..1abd9c52a4f2 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1761,6 +1761,22 @@ static inline pmd_t pmd_swp_clear_soft_dirty(pmd_t pmd)
return pmd;
}
#endif
+#ifndef CONFIG_HUGETLB_PAGE
+static inline pud_t pud_swp_mksoft_dirty(pud_t pud)
+{
+ return pud;
+}
+
+static inline int pud_swp_soft_dirty(pud_t pud)
+{
+ return 0;
+}
+
+static inline pud_t pud_swp_clear_soft_dirty(pud_t pud)
+{
+ return pud;
+}
+#endif
#else /* !CONFIG_HAVE_ARCH_SOFT_DIRTY */
static inline int pte_soft_dirty(pte_t pte)
{
@@ -1821,6 +1837,21 @@ static inline pmd_t pmd_swp_clear_soft_dirty(pmd_t pmd)
{
return pmd;
}
+
+static inline pud_t pud_swp_mksoft_dirty(pud_t pud)
+{
+ return pud;
+}
+
+static inline int pud_swp_soft_dirty(pud_t pud)
+{
+ return 0;
+}
+
+static inline pud_t pud_swp_clear_soft_dirty(pud_t pud)
+{
+ return pud;
+}
#endif
#ifndef __HAVE_PFNMAP_TRACKING
@@ -2369,4 +2400,10 @@ pgprot_t vm_get_page_prot(vm_flags_t vm_flags) \
} \
EXPORT_SYMBOL(vm_get_page_prot);
+#ifdef CONFIG_HUGETLB_PAGE
+#ifndef __pud_to_swp_entry
+#define __pud_to_swp_entry(pud) ((swp_entry_t) { pud_val(pud) })
+#endif
+#endif
+
#endif /* _LINUX_PGTABLE_H */
--
2.35.3
* [RFC PATCH v2 2/7] mm: Add {pmd,pud}_huge_lock helper
2026-04-26 12:57 [RFC PATCH v2 0/7] Implement a new generic pagewalk API Oscar Salvador
2026-04-26 12:57 ` [RFC PATCH v2 1/7] mm: Add softleaf_from_pud Oscar Salvador
@ 2026-04-26 12:57 ` Oscar Salvador
2026-04-26 12:57 ` [RFC PATCH v2 3/7] mm: Implement folio_pmd_batch Oscar Salvador
` (6 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Oscar Salvador @ 2026-04-26 12:57 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Michal Hocko, Muchun Song, Vlastimil Babka,
Lorenzo Stoakes, linux-kernel, linux-mm, Oscar Salvador
HugeTLB and THP use the same lock for pud and pmd entries,
so create two helpers that can be used directly by both of them,
as they will be needed by the generic pagewalkers.
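A rough sketch of the calling pattern (the real user is pt_range_walk() in
patch #4): the helper only returns a held ptl when the entry is a huge leaf
or a non-present (softleaf) entry, so the caller can descend to the next
level otherwise; inspect_huge_entry() is a made-up consumer:

	spinlock_t *ptl = pmd_huge_lock(pmdp, vma);

	if (ptl) {
		/* *pmdp is a huge leaf or a non-present entry, stable under ptl */
		pmd_t pmd = pmdp_get(pmdp);

		inspect_huge_entry(pmd);
		spin_unlock(ptl);
	} else {
		/* Not a huge entry: fall through to the PTE table. */
	}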
Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
include/linux/mm_inline.h | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)
diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index fa2d6ba811b5..3ac77b50e91f 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -655,4 +655,36 @@ static inline size_t num_pages_contiguous(struct page **pages, size_t nr_pages)
return i;
}
+static inline spinlock_t *pmd_huge_lock(pmd_t *pmd, struct vm_area_struct *vma)
+{
+ spinlock_t *ptl;
+
+ if (pmd_present(*pmd) || !pmd_none(*pmd)) {
+ ptl = pmd_lock(vma->vm_mm, pmd);
+ if (pmd_present(*pmd) && pmd_leaf(*pmd))
+ return ptl;
+ else if (!pmd_present(*pmd) && !pmd_none(*pmd))
+ return ptl;
+ spin_unlock(ptl);
+ }
+
+ return NULL;
+}
+
+static inline spinlock_t *pud_huge_lock(pud_t *pud, struct vm_area_struct *vma)
+{
+ spinlock_t *ptl;
+
+ if (pud_present(*pud) || !pud_none(*pud)) {
+ ptl = pud_lock(vma->vm_mm, pud);
+ if (pud_present(*pud) && pud_leaf(*pud))
+ return ptl;
+ else if (!pud_present(*pud) && !pud_none(*pud))
+ return ptl;
+ spin_unlock(ptl);
+ }
+
+ return NULL;
+}
+
#endif
--
2.35.3
* [RFC PATCH v2 3/7] mm: Implement folio_pmd_batch
2026-04-26 12:57 [RFC PATCH v2 0/7] Implement a new generic pagewalk API Oscar Salvador
2026-04-26 12:57 ` [RFC PATCH v2 1/7] mm: Add softleaf_from_pud Oscar Salvador
2026-04-26 12:57 ` [RFC PATCH v2 2/7] mm: Add {pmd,pud}_huge_lock helper Oscar Salvador
@ 2026-04-26 12:57 ` Oscar Salvador
2026-04-26 12:57 ` [RFC PATCH v2 4/7] mm: Implement pt_range_walk Oscar Salvador
` (5 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Oscar Salvador @ 2026-04-26 12:57 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Michal Hocko, Muchun Song, Vlastimil Babka,
Lorenzo Stoakes, linux-kernel, linux-mm, Oscar Salvador
HugeTLB pages can be mapped as contiguous PMDs, so we need a way to
batch them as we do for contiguous PTEs.
Implement folio_pmd_batch in order to do that.
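For illustration, this is roughly how the walker in patch #4 ends up calling
it when a PMD-mapped folio spans more than one PMD entry (trimmed from
pt_range_walk(); see that patch for the real call site):

	if (folio_size(folio) > PMD_SIZE) {
		/* Batch all contiguous PMDs that map this folio. */
		unsigned int max_nr = folio_size(folio) / PMD_SIZE;
		bool writable, young, dirty;
		unsigned int nr;

		nr = folio_pmd_batch(folio, pmdp, &pmd, max_nr, 0,
				     &writable, &young, &dirty);
	}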
Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
arch/arm64/include/asm/pgtable.h | 19 ++++++++
include/linux/pgtable.h | 28 ++++++++++++
mm/internal.h | 75 +++++++++++++++++++++++++++++++-
3 files changed, 121 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index e42ad56a86d4..5b5490505b94 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -170,6 +170,8 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
(__boundary - 1 < (end) - 1) ? __boundary : (end); \
})
+#define pmd_valid_cont(pmd) (pmd_valid(pmd) && pmd_cont(pmd))
+
#define pte_hw_dirty(pte) (pte_write(pte) && !pte_rdonly(pte))
#define pte_sw_dirty(pte) (!!(pte_val(pte) & PTE_DIRTY))
#define pte_dirty(pte) (pte_sw_dirty(pte) || pte_hw_dirty(pte))
@@ -670,6 +672,12 @@ static inline pgprot_t pmd_pgprot(pmd_t pmd)
return __pgprot(pmd_val(pfn_pmd(pfn, __pgprot(0))) ^ pmd_val(pmd));
}
+#define pmd_advance_pfn pmd_advance_pfn
+static inline pmd_t pmd_advance_pfn(pmd_t pmd, unsigned long nr)
+{
+ return pfn_pmd(pmd_pfn(pmd) + nr, pmd_pgprot(pmd));
+}
+
#define pud_pgprot pud_pgprot
static inline pgprot_t pud_pgprot(pud_t pud)
{
@@ -1645,6 +1653,17 @@ extern void modify_prot_commit_ptes(struct vm_area_struct *vma, unsigned long ad
pte_t *ptep, pte_t old_pte, pte_t pte,
unsigned int nr);
+#ifdef CONFIG_HUGETLB_PAGE
+#define pmd_batch_hint pmd_batch_hint
+static inline unsigned int pmd_batch_hint(pmd_t *pmdp, pmd_t pmd)
+{
+ if (!pmd_valid_cont(pmd))
+ return 1;
+
+ return CONT_PMDS - (((unsigned long)pmdp >> 3) & (CONT_PMDS - 1));
+}
+#endif
+
#ifdef CONFIG_ARM64_CONTPTE
/*
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 1abd9c52a4f2..ab43d0922ec1 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -358,6 +358,34 @@ static inline void lazy_mmu_mode_pause(void) {}
static inline void lazy_mmu_mode_resume(void) {}
#endif
+#ifndef pmd_batch_hint
+/**
+ * pmd_batch_hint - Number of PMD entries that can be added to batch without scanning.
+ * @pmdp: Page table pointer for the entry.
+ * @pmd: Page table entry.
+ *
+ * Some architectures know that a set of contiguous pmds all map the same
+ * contiguous memory with the same permissions. In this case, it can provide a
+ * hint to aid pmd batching without the core code needing to scan every pmd.
+ *
+ * An architecture implementation may ignore the PMD accessed state. Further,
+ * the dirty state must apply atomically to all the PMDs described by the hint.
+ *
+ * May be overridden by the architecture, else pmd_batch_hint is always 1.
+ */
+static inline unsigned int pmd_batch_hint(pmd_t *pmdp, pmd_t pmd)
+{
+ return 1;
+}
+#endif
+
+#ifndef pmd_advance_pfn
+static inline pmd_t pmd_advance_pfn(pmd_t pmd, unsigned long nr)
+{
+ return __pmd(pmd_val(pmd) + (nr << PFN_PTE_SHIFT));
+}
+#endif
+
#ifndef pte_batch_hint
/**
* pte_batch_hint - Number of pages that can be added to batch without scanning.
diff --git a/mm/internal.h b/mm/internal.h
index cb0af847d7d9..488cb5c1e340 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -269,7 +269,7 @@ static inline int anon_vma_prepare(struct vm_area_struct *vma)
return __anon_vma_prepare(vma);
}
-/* Flags for folio_pte_batch(). */
+/* Flags for folio_{pmd,pte}_batch(). */
typedef int __bitwise fpb_t;
/* Compare PTEs respecting the dirty bit. */
@@ -293,6 +293,79 @@ typedef int __bitwise fpb_t;
*/
#define FPB_MERGE_YOUNG_DIRTY ((__force fpb_t)BIT(4))
+static inline pmd_t __pmd_batch_clear_ignored(pmd_t pmd, fpb_t flags)
+{
+ if (!(flags & FPB_RESPECT_DIRTY))
+ pmd = pmd_mkclean(pmd);
+ if (likely(!(flags & FPB_RESPECT_SOFT_DIRTY)))
+ pmd = pmd_clear_soft_dirty(pmd);
+ if (likely(!(flags & FPB_RESPECT_WRITE)))
+ pmd = pmd_wrprotect(pmd);
+ return pmd_mkold(pmd);
+}
+
+/**
+ * folio_pmd_batch - detect a PMD batch for a large folio.
+ *
+ * The only user of this is hugetlb, for contiguous PMDs.
+ */
+static inline unsigned int folio_pmd_batch(struct folio *folio, pmd_t *pmdp, pmd_t *pmdentp,
+ unsigned int max_nr, fpb_t flags, bool *any_writable,
+ bool *any_young, bool *any_dirty)
+{
+ pmd_t expected_pmd, pmd = *pmdentp;
+ bool writable, young, dirty;
+ unsigned int nr, cur_nr;
+
+ if (any_writable)
+ *any_writable = !!pmd_write(*pmdentp);
+ if (any_young)
+ *any_young = !!pmd_young(*pmdentp);
+ if (any_dirty)
+ *any_dirty = !!pmd_dirty(*pmdentp);
+
+ VM_WARN_ON_FOLIO(!pmd_present(pmd), folio);
+ VM_WARN_ON_FOLIO(!folio_test_large(folio) || max_nr < 1, folio);
+ VM_WARN_ON_FOLIO(page_folio(pfn_to_page(pmd_pfn(pmd))) != folio, folio);
+
+ /* Limit max_nr to the actual remaining PFNs in the folio we could batch. */
+ max_nr = min_t(unsigned long, max_nr,
+ (folio_pfn(folio) + folio_nr_pages(folio) -
+ pmd_pfn(pmd)) >> (PMD_SHIFT - PAGE_SHIFT));
+
+ nr = pmd_batch_hint(pmdp, pmd);
+ expected_pmd = __pmd_batch_clear_ignored(pmd_advance_pfn(pmd, nr << (PMD_SHIFT - PAGE_SHIFT)), flags);
+ pmdp = pmdp + nr;
+
+ while (nr < max_nr) {
+ pmd = pmdp_get(pmdp);
+ if (any_writable)
+ writable = !!pmd_write(pmd);
+ if (any_young)
+ young = !!pmd_young(pmd);
+ if (any_dirty)
+ dirty = !!pmd_dirty(pmd);
+ pmd = __pmd_batch_clear_ignored(pmd, flags);
+
+ if (!pmd_same(pmd, expected_pmd))
+ break;
+
+ if (any_writable)
+ *any_writable |= writable;
+ if (any_young)
+ *any_young |= young;
+ if (any_dirty)
+ *any_dirty |= dirty;
+
+ cur_nr = pmd_batch_hint(pmdp, pmd);
+ expected_pmd = pmd_advance_pfn(expected_pmd, cur_nr << (PMD_SHIFT - PAGE_SHIFT));
+ pmdp += cur_nr;
+ nr += cur_nr;
+ }
+
+ return min(nr, max_nr);
+}
+
static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags)
{
if (!(flags & FPB_RESPECT_DIRTY))
--
2.35.3
* [RFC PATCH v2 4/7] mm: Implement pt_range_walk
2026-04-26 12:57 [RFC PATCH v2 0/7] Implement a new generic pagewalk API Oscar Salvador
` (2 preceding siblings ...)
2026-04-26 12:57 ` [RFC PATCH v2 3/7] mm: Implement folio_pmd_batch Oscar Salvador
@ 2026-04-26 12:57 ` Oscar Salvador
2026-04-26 12:57 ` [RFC PATCH v2 5/7] mm: Make /proc/pid/smaps use the new generic pagewalk API Oscar Salvador
` (4 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Oscar Salvador @ 2026-04-26 12:57 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Michal Hocko, Muchun Song, Vlastimil Babka,
Lorenzo Stoakes, linux-kernel, linux-mm, Oscar Salvador,
David Hildenbrand
Implement pt_range_walk, which is a pagewalk API that handles locking
and batching itself, and returns a struct containing information
about the address space backed by the vma.
It goes through the address range provided and returns whatever it
finds there (softleaf entries, folios, etc.), along with information about
the entry itself, like whether it is dirty, shared or present, the size of
the entry, its pagetable level, the number of batched entries, etc.
It defines the following types:
#define PT_TYPE_NONE
#define PT_TYPE_FOLIO
#define PT_TYPE_MARKER
#define PT_TYPE_PFN
#define PT_TYPE_SWAP
#define PT_TYPE_MIGRATION
#define PT_TYPE_DEVICE
#define PT_TYPE_HWPOISON
#define PT_TYPE_ALL
and it lets the caller be explicit about which types it is interested in.
If it finds a type the caller stated is of no importance, it keeps
scanning the address range until the next type of interest is found, or
until we exhaust the range.
We have three functions:
.pt_range_walk_start()
.pt_range_walk_next()
.pt_range_walk_done()
pt_range_walk_start() starts scanning the range and returns the
first type it finds. Then we keep calling pt_range_walk_next() until
we get PTW_DONE, which means we have exhausted the range. Once that
happens, we have to call pt_range_walk_done() in order to clean up
pt_range_walk's internal state, like locking.
An example below:

	pt_type_flags_t flags = PT_TYPE_ALL;

	type = pt_range_walk_start(&ptw, vma, start, vma->vm_end, flags);
	while (type != PTW_DONE) {
		do_something();
		type = pt_range_walk_next(&ptw, vma, start, vma->vm_end, flags);
	}
	pt_range_walk_done(&ptw);
The API manages locking within the interface, and also batching, which means
that it can handle contiguous ptes (or pmds in the case of hugetlb)
itself.
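Since the caller declares up front which types it cares about, everything
else is skipped transparently. E.g. a (hedged) sketch of a walker that only
wants markers and migration entries; the handle_*() helpers are made up:

	pt_type_flags_t flags = PT_TYPE_MARKER | PT_TYPE_MIGRATION;

	type = pt_range_walk_start(&ptw, vma, start, end, flags);
	while (type != PTW_DONE) {
		if (type == PTW_MARKER)
			handle_marker(ptw.softleaf_entry);
		else	/* PTW_MIGRATION */
			handle_migration(ptw.softleaf_entry);
		type = pt_range_walk_next(&ptw, vma, start, end, flags);
	}
	pt_range_walk_done(&ptw);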
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
arch/arm64/include/asm/pgtable.h | 1 +
include/linux/mm.h | 2 +
include/linux/pagewalk.h | 106 ++++++++
mm/memory.c | 22 ++
mm/pagewalk.c | 400 +++++++++++++++++++++++++++++++
5 files changed, 531 insertions(+)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 5b5490505b94..9f8cca8880e0 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -642,6 +642,7 @@ static inline pmd_t pmd_mkspecial(pmd_t pmd)
#define pmd_pfn(pmd) ((__pmd_to_phys(pmd) & PMD_MASK) >> PAGE_SHIFT)
#define pfn_pmd(pfn,prot) __pmd(__phys_to_pmd_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot))
+#define pud_dirty(pud) pte_dirty(pud_pte(pud))
#define pud_young(pud) pte_young(pud_pte(pud))
#define pud_mkyoung(pud) pte_pud(pte_mkyoung(pud_pte(pud)))
#define pud_write(pud) pte_write(pud_pte(pud))
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5be3d8a8f806..c4e7fc558476 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2829,6 +2829,8 @@ struct folio *vm_normal_folio_pmd(struct vm_area_struct *vma,
unsigned long addr, pmd_t pmd);
struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
pmd_t pmd);
+struct folio *vm_normal_folio_pud(struct vm_area_struct *vma,
+ unsigned long addr, pud_t pud);
struct page *vm_normal_page_pud(struct vm_area_struct *vma, unsigned long addr,
pud_t pud);
diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
index 88e18615dd72..f46780c0310f 100644
--- a/include/linux/pagewalk.h
+++ b/include/linux/pagewalk.h
@@ -204,4 +204,110 @@ struct folio *folio_walk_start(struct folio_walk *fw,
vma_pgtable_walk_end(__vma); \
} while (0)
+typedef int __bitwise pt_type_flags_t;
+
+/*
+ * Types we are interested in returning. Those which are not explicitly set
+ * will be silently skipped by continuing to walk the page tables.
+ */
+#define PT_TYPE_NONE ((__force pt_type_flags_t)BIT(0))
+#define PT_TYPE_FOLIO ((__force pt_type_flags_t)BIT(1))
+#define PT_TYPE_MARKER ((__force pt_type_flags_t)BIT(2))
+#define PT_TYPE_PFN ((__force pt_type_flags_t)BIT(3))
+#define PT_TYPE_SWAP ((__force pt_type_flags_t)BIT(4))
+#define PT_TYPE_MIGRATION ((__force pt_type_flags_t)BIT(5))
+#define PT_TYPE_DEVICE ((__force pt_type_flags_t)BIT(6))
+#define PT_TYPE_HWPOISON ((__force pt_type_flags_t)BIT(7))
+#define PT_TYPE_ALL (PT_TYPE_NONE | PT_TYPE_FOLIO | PT_TYPE_MARKER | \
+ PT_TYPE_PFN | PT_TYPE_SWAP | PT_TYPE_MIGRATION | \
+ PT_TYPE_DEVICE | PT_TYPE_HWPOISON)
+
+enum pt_range_walk_level {
+ PTW_PUD_LEVEL,
+ PTW_PMD_LEVEL,
+ PTW_PTE_LEVEL,
+};
+
+enum pt_range_walk_type {
+ PTW_ABORT,
+ PTW_DONE,
+ PTW_NONE,
+ PTW_FOLIO,
+ PTW_MARKER,
+ PTW_PFN,
+ PTW_SWAP,
+ PTW_MIGRATION,
+ PTW_DEVICE,
+ PTW_HWPOISON,
+};
+
+/**
+ * struct pt_range_walk - state of an ongoing pt_range_walk()
+ * @page: exact folio page referenced (if applicable)
+ * @folio: folio mapped (if any)
+ * @nr_entries: number of contiguous entries of the same type
+ * @size: stores nr_entries * entry_size
+ * @softleaf_entry: softleaf entry (if any)
+ * @writable: whether it is writable
+ * @young: whether it is young
+ * @dirty: whether it is dirty
+ * @present: whether it is present in the page tables
+ * @vma_locked: whether we are holding the vma lock
+ * @pmd_shared: only used for hugetlb
+ * @curr_addr: current addr we are operating on
+ * @next_addr: next addr to be used to walk the page tables
+ * @level: page table level
+ * @pte: copy of the entry value (PTW_PTE_LEVEL).
+ * @pmd: copy of the entry value (PTW_PMD_LEVEL).
+ * @pud: copy of the entry value (PTW_PUD_LEVEL).
+ * @mm: the mm_struct we are walking
+ * @vma: the vma we are walking
+ * @ptl: pointer to the page table lock.
+ */
+
+struct pt_range_walk {
+ struct page *page;
+ struct folio *folio;
+ int nr_entries;
+ unsigned long size;
+ softleaf_t softleaf_entry;
+ bool writable;
+ bool young;
+ bool dirty;
+ bool present;
+ bool vma_locked;
+ bool pmd_shared;
+ bool lock_i_mmap;
+ bool i_mmap_locked;
+ unsigned long curr_addr;
+ unsigned long next_addr;
+ enum pt_range_walk_level level;
+ union {
+ pte_t *ptep;
+ pud_t *pudp;
+ pmd_t *pmdp;
+ };
+ union {
+ pte_t pte;
+ pud_t pud;
+ pmd_t pmd;
+ };
+ struct mm_struct *mm;
+ struct vm_area_struct *vma;
+ spinlock_t *ptl;
+};
+
+enum pt_range_walk_type pt_range_walk(struct pt_range_walk *ptw,
+ struct vm_area_struct *vma,
+ unsigned long addr, unsigned long end,
+ pt_type_flags_t flags);
+enum pt_range_walk_type pt_range_walk_start(struct pt_range_walk *ptw,
+ struct vm_area_struct *vma,
+ unsigned long addr, unsigned long end,
+ pt_type_flags_t flags);
+enum pt_range_walk_type pt_range_walk_next(struct pt_range_walk *ptw,
+ struct vm_area_struct *vma,
+ unsigned long addr, unsigned long end,
+ pt_type_flags_t flags);
+void pt_range_walk_done(struct pt_range_walk *ptw);
#endif /* _LINUX_PAGEWALK_H */
diff --git a/mm/memory.c b/mm/memory.c
index 07778814b4a8..e016bc7a49d9 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -850,6 +850,28 @@ struct page *vm_normal_page_pud(struct vm_area_struct *vma,
return __vm_normal_page(vma, addr, pud_pfn(pud), pud_special(pud),
pud_val(pud), PGTABLE_LEVEL_PUD);
}
+
+/**
+ * vm_normal_folio_pud() - Get the "struct folio" associated with a PUD
+ * @vma: The VMA mapping the @pud.
+ * @addr: The address where the @pud is mapped.
+ * @pud: The PUD.
+ *
+ * Get the "struct folio" associated with a PUD. See __vm_normal_page()
+ * for details on "normal" and "special" mappings.
+ *
+ * Return: Returns the "struct folio" if this is a "normal" mapping. Returns
+ * NULL if this is a "special" mapping.
+ */
+struct folio *vm_normal_folio_pud(struct vm_area_struct *vma,
+ unsigned long addr, pud_t pud)
+{
+ struct page *page = vm_normal_page_pud(vma, addr, pud);
+
+ if (page)
+ return page_folio(page);
+ return NULL;
+}
#endif
/**
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index a94c401ab2cf..b71c2d48acd9 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -1029,3 +1029,403 @@ struct folio *folio_walk_start(struct folio_walk *fw,
fw->ptl = ptl;
return page_folio(page);
}
+
+enum pt_range_walk_type pt_range_walk(struct pt_range_walk *ptw,
+ struct vm_area_struct *vma,
+ unsigned long addr, unsigned long end,
+ pt_type_flags_t flags)
+{
+ pgd_t *pgdp;
+ p4d_t *p4dp;
+ pud_t *pudp, pud;
+ pmd_t *pmdp, pmd;
+ pte_t *ptep, pte;
+ int nr_batched = 1;
+ spinlock_t *ptl = NULL;
+ unsigned long entry_size;
+ struct page *page;
+ struct folio *folio;
+ enum pt_range_walk_type ret_type = PTW_DONE;
+ bool writable, young, dirty;
+ unsigned long curr_addr, next_addr = ptw->next_addr ? ptw->next_addr : addr;
+
+ if (WARN_ON_ONCE(next_addr < vma->vm_start || next_addr >= vma->vm_end))
+ return ret_type;
+
+ mmap_assert_locked(ptw->mm);
+
+ if (ptw->ptl) {
+ spin_unlock(ptw->ptl);
+ ptw->ptl = NULL;
+ }
+
+ if (ptw->level == PTW_PTE_LEVEL && ptw->ptep) {
+ pte_unmap(ptw->ptep);
+ ptw->ptep = NULL;
+ }
+
+ if (!ptw->vma_locked) {
+ vma_pgtable_walk_begin(vma);
+ ptw->vma_locked = true;
+ ptw->vma = vma;
+ }
+
+keep_walking:
+ ret_type = PTW_DONE;
+ folio = NULL;
+ page = NULL;
+ writable = young = dirty = false;
+ ptw->present = false;
+ ptw->pmd_shared = false;
+ ptw->folio = NULL;
+ ptw->page = NULL;
+
+ curr_addr = next_addr;
+ if (ptl) {
+ spin_unlock(ptl);
+ ptl = NULL;
+ }
+ /*
+ * If we keep walking the page tables because we are not interested
+ * in the type we found, make sure to check whether we reached the end.
+ */
+ if (curr_addr >= end) {
+ ptw->next_addr = next_addr;
+ return ret_type;
+ }
+again:
+ pgdp = pgd_offset(ptw->mm, curr_addr);
+ next_addr = pgd_addr_end(curr_addr, end);
+
+ if (pgd_none_or_clear_bad(pgdp))
+ /* PTW_ABORT? */
+ goto keep_walking;
+
+ next_addr = p4d_addr_end(curr_addr, end);
+ p4dp = p4d_offset(pgdp, curr_addr);
+ if (p4d_none_or_clear_bad(p4dp))
+ /* PTW_ABORT? */
+ goto keep_walking;
+
+ entry_size = PUD_SIZE;
+ ptw->level = PTW_PUD_LEVEL;
+ next_addr = pud_addr_end(curr_addr, end);
+ pudp = pud_offset(p4dp, curr_addr);
+ pud = pudp_get(pudp);
+ if (pud_none(pud)) {
+ if (!(flags & PT_TYPE_NONE))
+ goto keep_walking;
+ ret_type = PTW_NONE;
+ goto found;
+ }
+ /*
+ * For now, there are no architectures which support pgd or p4d
+ * leaves; pud is the first level that can be a leaf.
+ */
+ if (IS_ENABLED(CONFIG_PGTABLE_HAS_HUGE_LEAVES) &&
+ (!pud_present(pud) || pud_leaf(pud))) {
+ ptl = pud_huge_lock(pudp, vma);
+ if (!ptl)
+ goto again;
+
+ pud = pudp_get(pudp);
+ ptw->pudp = pudp;
+ ptw->pud = pud;
+ if (pud_none(pud)) {
+ if (!(flags & PT_TYPE_NONE))
+ goto keep_walking;
+ ret_type = PTW_NONE;
+ } else if (pud_present(pud) && !pud_leaf(pud)) {
+ spin_unlock(ptl);
+ ptl = NULL;
+ goto pmd_table;
+ } else if (pud_present(pud)) {
+ /*
+ * We do not support PUD-device or pud-PFNMAP, so
+ * if it is present, we must have a folio (Tm).
+ */
+ page = vm_normal_page_pud(vma, curr_addr, pud);
+ if (!page || !(flags & PT_TYPE_FOLIO))
+ goto keep_walking;
+
+ ret_type = PTW_FOLIO;
+ folio = page_folio(page);
+ ptw->present = true;
+ dirty = !!pud_dirty(pud);
+ young = !!pud_young(pud);
+ writable = !!pud_write(pud);
+ } else if (!pud_none(pud)) {
+ /* PUD-hugetlbs can have special swap entries */
+ const softleaf_t entry = softleaf_from_pud(pud);
+
+ ptw->softleaf_entry = entry;
+
+ if (softleaf_is_marker(entry)) {
+ if (!(flags & PT_TYPE_MARKER))
+ goto keep_walking;
+ ret_type = PTW_MARKER;
+ } else if (softleaf_has_pfn(entry)) {
+ if (softleaf_is_migration(entry)) {
+ if (!(flags & PT_TYPE_MIGRATION))
+ goto keep_walking;
+ ret_type = PTW_MIGRATION;
+ } else if (softleaf_is_hwpoison(entry)) {
+ if (!(flags & PT_TYPE_HWPOISON))
+ goto keep_walking;
+ ret_type = PTW_HWPOISON;
+ }
+
+ page = softleaf_to_page(entry);
+ if (page)
+ folio = page_folio(page);
+ }
+ } else {
+ /* We found nothing, keep going */
+ goto keep_walking;
+ }
+
+ /* We found a type */
+ goto found;
+ }
+pmd_table:
+ entry_size = PMD_SIZE;
+ ptw->level = PTW_PMD_LEVEL;
+ next_addr = pmd_addr_end(curr_addr, end);
+ pmdp = pmd_offset(pudp, curr_addr);
+ pmd = pmdp_get_lockless(pmdp);
+ if (pmd_none(pmd)) {
+ if (!(flags & PT_TYPE_NONE))
+ goto keep_walking;
+ ret_type = PTW_NONE;
+ goto found;
+ }
+
+ if (IS_ENABLED(CONFIG_PGTABLE_HAS_HUGE_LEAVES) &&
+ (!pmd_present(pmd) || pmd_leaf(pmd))) {
+ ptl = pmd_huge_lock(pmdp, vma);
+ if (!ptl)
+ goto again;
+
+ pmd = pmdp_get(pmdp);
+ ptw->pmdp = pmdp;
+ ptw->pmd = pmd;
+ if (pmd_none(pmd)) {
+ if (!(flags & PT_TYPE_NONE))
+ goto keep_walking;
+ ret_type = PTW_NONE;
+ } else if (pmd_present(pmd) && !pmd_leaf(pmd)) {
+ spin_unlock(ptl);
+ ptl = NULL;
+ goto pte_table;
+ } else if (pmd_present(pmd)) {
+ page = vm_normal_page_pmd(vma, curr_addr, pmd);
+ if (page) {
+ if (!(flags & PT_TYPE_FOLIO))
+ goto keep_walking;
+ ret_type = PTW_FOLIO;
+ folio = page_folio(page);
+ if (folio_size(folio) > entry_size) {
+ /* We can batch */
+ int max_nr = folio_size(folio) / entry_size;
+
+ nr_batched = folio_pmd_batch(folio, pmdp, &pmd,
+ max_nr, 0,
+ &writable,
+ &young,
+ &dirty);
+ } else {
+ dirty = !!pmd_dirty(pmd);
+ young = !!pmd_young(pmd);
+ writable = !!pmd_write(pmd);
+ }
+ } else if (!page && (is_huge_zero_pmd(pmd) ||
+ vma->vm_flags & VM_PFNMAP)) {
+ if (!(flags & PT_TYPE_PFN))
+ goto keep_walking;
+ /* Create a subtype to differentiate them? */
+ ret_type = PTW_PFN;
+ } else if (!page) {
+ goto keep_walking;
+ }
+ ptw->present = true;
+ next_addr += (nr_batched * entry_size) - entry_size;
+ } else if (!pmd_none(pmd)) {
+ const softleaf_t entry = softleaf_from_pmd(pmd);
+
+ ptw->softleaf_entry = entry;
+
+ if (softleaf_is_marker(entry)) {
+ if (!(flags & PT_TYPE_MARKER))
+ goto keep_walking;
+ ret_type = PTW_MARKER;
+ } else if (softleaf_has_pfn(entry)) {
+ if (softleaf_is_migration(entry)) {
+ if (!(flags & PT_TYPE_MIGRATION))
+ goto keep_walking;
+ ret_type = PTW_MIGRATION;
+ } else if (softleaf_is_hwpoison(entry)) {
+ if (!(flags & PT_TYPE_HWPOISON))
+ goto keep_walking;
+ ret_type = PTW_HWPOISON;
+ } else if (softleaf_is_device_private(entry) ||
+ softleaf_is_device_exclusive(entry)) {
+ if (!(flags & PT_TYPE_DEVICE))
+ goto keep_walking;
+ ptw->present = true;
+ ret_type = PTW_DEVICE;
+ }
+ page = softleaf_to_page(entry);
+ if (page)
+ folio = page_folio(page);
+ }
+ } else {
+ /* We found nothing, keep going */
+ goto keep_walking;
+ }
+
+ if (ret_type != PTW_NONE && is_vm_hugetlb_page(vma) &&
+ hugetlb_pmd_shared((pte_t *)pmdp))
+ ptw->pmd_shared = true;
+
+ goto found;
+ }
+pte_table:
+ entry_size = PAGE_SIZE;
+ ptw->level = PTW_PTE_LEVEL;
+ next_addr = curr_addr + PAGE_SIZE;
+ ptep = pte_offset_map_lock(vma->vm_mm, pmdp, curr_addr, &ptl);
+ if (!ptep)
+ goto again;
+
+ pte = ptep_get(ptep);
+ ptw->ptep = ptep;
+ ptw->pte = pte;
+ if (pte_none(pte)) {
+ if (!(flags & PT_TYPE_NONE))
+ goto not_found;
+ ret_type = PTW_NONE;
+ } else if (pte_present(pte)) {
+ page = vm_normal_page(vma, curr_addr, pte);
+ if (page) {
+ if (!(flags & PT_TYPE_FOLIO))
+ goto not_found;
+ ret_type = PTW_FOLIO;
+ folio = page_folio(page);
+ if (folio_test_large(folio)) {
+ /* We can batch */
+ unsigned long end_addr = pmd_addr_end(curr_addr, end);
+ int max_nr = (end_addr - curr_addr) >> PAGE_SHIFT;
+
+ nr_batched = folio_pte_batch_flags(folio, vma, ptep, &pte, max_nr,
+ FPB_MERGE_WRITE | FPB_MERGE_YOUNG_DIRTY);
+ }
+ } else if (!page && (is_zero_pfn(pte_pfn(pte)) ||
+ vma->vm_flags & VM_PFNMAP)) {
+ if (!(flags & PT_TYPE_PFN))
+ goto not_found;
+ ret_type = PTW_PFN;
+ }
+
+ dirty = !!pte_dirty(pte);
+ young = !!pte_young(pte);
+ writable = !!pte_write(pte);
+ ptw->present = true;
+ next_addr += (nr_batched * entry_size) - entry_size;
+ } else if (!pte_none(pte)) {
+ const softleaf_t entry = softleaf_from_pte(pte);
+
+ ptw->softleaf_entry = entry;
+
+ if (softleaf_is_marker(entry)) {
+ if (!(flags & PT_TYPE_MARKER))
+ goto not_found;
+ ret_type = PTW_MARKER;
+ } else if (softleaf_is_swap(entry)) {
+ unsigned long end_addr = pmd_addr_end(curr_addr, end);
+ int max_nr = (end_addr - curr_addr) >> PAGE_SHIFT;
+
+ if (!(flags & PT_TYPE_SWAP))
+ goto not_found;
+
+ nr_batched = swap_pte_batch(ptep, max_nr, pte);
+ next_addr += (nr_batched * entry_size) - entry_size;
+ ret_type = PTW_SWAP;
+ } else if (softleaf_has_pfn(entry)) {
+ if (softleaf_is_migration(entry)) {
+ if (!(flags & PT_TYPE_MIGRATION))
+ goto not_found;
+ ret_type = PTW_MIGRATION;
+ } else if (softleaf_is_hwpoison(entry)) {
+ if (!(flags & PT_TYPE_HWPOISON))
+ goto not_found;
+ ret_type = PTW_HWPOISON;
+ } else if (softleaf_is_device_private(entry) ||
+ softleaf_is_device_exclusive(entry)) {
+ if (!(flags & PT_TYPE_DEVICE))
+ goto not_found;
+ ptw->present = true;
+ ret_type = PTW_DEVICE;
+ }
+ page = softleaf_to_page(entry);
+ if (page)
+ folio = page_folio(page);
+ }
+ } else {
+not_found:
+ /* We found nothing, keep going */
+ pte_unmap_unlock(ptep, ptl);
+ ptw->ptep = NULL;
+ ptl = NULL;
+ goto keep_walking;
+ }
+
+found:
+ /* Fill in remaining ptw struct before returning */
+ ptw->ptl = ptl;
+ ptw->curr_addr = curr_addr;
+ ptw->next_addr = next_addr;
+ ptw->writable = writable;
+ ptw->young = young;
+ ptw->dirty = dirty;
+ ptw->nr_entries = nr_batched;
+ ptw->size = nr_batched * entry_size;
+ if (folio) {
+ ptw->folio = folio;
+ ptw->page = page + ((curr_addr & (entry_size - 1)) >> PAGE_SHIFT);
+ }
+ return ret_type;
+}
+
+enum pt_range_walk_type pt_range_walk_start(struct pt_range_walk *ptw,
+ struct vm_area_struct *vma,
+ unsigned long addr, unsigned long end,
+ pt_type_flags_t flags)
+{
+ if (!ptw->mm)
+ return PTW_DONE;
+ if (addr >= end)
+ return PTW_DONE;
+ return pt_range_walk(ptw, vma, addr, end, flags);
+}
+
+enum pt_range_walk_type pt_range_walk_next(struct pt_range_walk *ptw,
+ struct vm_area_struct *vma,
+ unsigned long addr, unsigned long end,
+ pt_type_flags_t flags)
+{
+ /* We went through the complete range */
+ if (ptw->next_addr >= end)
+ return PTW_DONE;
+ return pt_range_walk(ptw, vma, addr, end, flags);
+}
+
+void pt_range_walk_done(struct pt_range_walk *ptw)
+{
+ if (ptw->ptl)
+ spin_unlock(ptw->ptl);
+ if (ptw->level == PTW_PTE_LEVEL && ptw->ptep)
+ pte_unmap(ptw->ptep);
+ if (ptw->vma_locked)
+ vma_pgtable_walk_end(ptw->vma);
+ cond_resched();
+}
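+
+/*
+ * Example usage (an illustrative sketch, not part of this patch;
+ * handle_folio() is a hypothetical consumer): callers drive the walker
+ * with the start/next/done triplet and switch on the returned type:
+ *
+ *	struct pt_range_walk ptw = { .mm = vma->vm_mm };
+ *	enum pt_range_walk_type type;
+ *
+ *	type = pt_range_walk_start(&ptw, vma, vma->vm_start,
+ *				   vma->vm_end, PT_TYPE_ALL);
+ *	while (type != PTW_DONE) {
+ *		if (type == PTW_FOLIO)
+ *			handle_folio(ptw.folio, ptw.curr_addr, ptw.size);
+ *		type = pt_range_walk_next(&ptw, vma, vma->vm_start,
+ *					  vma->vm_end, PT_TYPE_ALL);
+ *	}
+ *	pt_range_walk_done(&ptw);
+ */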
--
2.35.3
* [RFC PATCH v2 5/7] mm: Make /proc/pid/smaps use the new generic pagewalk API
2026-04-26 12:57 [RFC PATCH v2 0/7] Implement a new generic pagewalk API Oscar Salvador
` (3 preceding siblings ...)
2026-04-26 12:57 ` [RFC PATCH v2 4/7] mm: Implement pt_range_walk Oscar Salvador
@ 2026-04-26 12:57 ` Oscar Salvador
2026-04-26 12:57 ` [RFC PATCH v2 6/7] mm: Make /proc/pid/numa_maps " Oscar Salvador
` (3 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Oscar Salvador @ 2026-04-26 12:57 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Michal Hocko, Muchun Song, Vlastimil Babka,
Lorenzo Stoakes, linux-kernel, linux-mm, Oscar Salvador
Have /proc/pid/smaps make use of the new generic API, and remove
the code which was using the old one.
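The old pmd_entry/pte/hugetlb callbacks collapse into a single loop over
the walker, requesting PT_TYPE_ALL minus the types smaps does not care
about. The shape of the conversion, roughly (a sketch of the loop
implemented below):

    struct pt_range_walk ptw = { .mm = vma->vm_mm };
    enum pt_range_walk_type type;

    type = pt_range_walk_start(&ptw, vma, start, vma->vm_end, flags);
    while (type != PTW_DONE) {
        /* accumulate stats into mss based on type */
        type = pt_range_walk_next(&ptw, vma, start, vma->vm_end, flags);
    }
    pt_range_walk_done(&ptw);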
Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
fs/proc/task_mmu.c | 307 +++++++++++++++------------------------------
1 file changed, 101 insertions(+), 206 deletions(-)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index e091931d7ca1..382c6b02d0e1 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -915,7 +915,7 @@ static void smaps_page_accumulate(struct mem_size_stats *mss,
static void smaps_account(struct mem_size_stats *mss, struct page *page,
bool compound, bool young, bool dirty, bool locked,
- bool present)
+ bool present, int ssize)
{
struct folio *folio = page_folio(page);
int i, nr = compound ? compound_nr(page) : 1;
@@ -923,6 +923,11 @@ static void smaps_account(struct mem_size_stats *mss, struct page *page,
bool exclusive;
int mapcount;
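+
+ /*
+ * A non-zero ssize overrides the size derived from the page: callers
+ * pass the number of bytes covered by the (possibly batched) entries.
+ */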
+ if (ssize) {
+ nr = ssize / PAGE_SIZE;
+ size = ssize;
+ }
+
/*
* First accumulate quantities that depend only on |size| and the type
* of the compound page.
@@ -988,150 +993,6 @@ static void smaps_account(struct mem_size_stats *mss, struct page *page,
}
}
-#ifdef CONFIG_SHMEM
-static int smaps_pte_hole(unsigned long addr, unsigned long end,
- __always_unused int depth, struct mm_walk *walk)
-{
- struct mem_size_stats *mss = walk->private;
- struct vm_area_struct *vma = walk->vma;
-
- mss->swap += shmem_partial_swap_usage(walk->vma->vm_file->f_mapping,
- linear_page_index(vma, addr),
- linear_page_index(vma, end));
-
- return 0;
-}
-#else
-#define smaps_pte_hole NULL
-#endif /* CONFIG_SHMEM */
-
-static void smaps_pte_hole_lookup(unsigned long addr, struct mm_walk *walk)
-{
-#ifdef CONFIG_SHMEM
- if (walk->ops->pte_hole) {
- /* depth is not used */
- smaps_pte_hole(addr, addr + PAGE_SIZE, 0, walk);
- }
-#endif
-}
-
-static void smaps_pte_entry(pte_t *pte, unsigned long addr,
- struct mm_walk *walk)
-{
- struct mem_size_stats *mss = walk->private;
- struct vm_area_struct *vma = walk->vma;
- bool locked = !!(vma->vm_flags & VM_LOCKED);
- struct page *page = NULL;
- bool present = false, young = false, dirty = false;
- pte_t ptent = ptep_get(pte);
-
- if (pte_present(ptent)) {
- page = vm_normal_page(vma, addr, ptent);
- young = pte_young(ptent);
- dirty = pte_dirty(ptent);
- present = true;
- } else if (pte_none(ptent)) {
- smaps_pte_hole_lookup(addr, walk);
- } else {
- const softleaf_t entry = softleaf_from_pte(ptent);
-
- if (softleaf_is_swap(entry)) {
- int mapcount;
-
- mss->swap += PAGE_SIZE;
- mapcount = swp_swapcount(entry);
- if (mapcount >= 2) {
- u64 pss_delta = (u64)PAGE_SIZE << PSS_SHIFT;
-
- do_div(pss_delta, mapcount);
- mss->swap_pss += pss_delta;
- } else {
- mss->swap_pss += (u64)PAGE_SIZE << PSS_SHIFT;
- }
- } else if (softleaf_has_pfn(entry)) {
- if (softleaf_is_device_private(entry))
- present = true;
- page = softleaf_to_page(entry);
- }
- }
-
- if (!page)
- return;
-
- smaps_account(mss, page, false, young, dirty, locked, present);
-}
-
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
- struct mm_walk *walk)
-{
- struct mem_size_stats *mss = walk->private;
- struct vm_area_struct *vma = walk->vma;
- bool locked = !!(vma->vm_flags & VM_LOCKED);
- struct page *page = NULL;
- bool present = false;
- struct folio *folio;
-
- if (pmd_none(*pmd))
- return;
- if (pmd_present(*pmd)) {
- page = vm_normal_page_pmd(vma, addr, *pmd);
- present = true;
- } else if (unlikely(thp_migration_supported())) {
- const softleaf_t entry = softleaf_from_pmd(*pmd);
-
- if (softleaf_has_pfn(entry))
- page = softleaf_to_page(entry);
- }
- if (IS_ERR_OR_NULL(page))
- return;
- folio = page_folio(page);
- if (folio_test_anon(folio))
- mss->anonymous_thp += HPAGE_PMD_SIZE;
- else if (folio_test_swapbacked(folio))
- mss->shmem_thp += HPAGE_PMD_SIZE;
- else if (folio_is_zone_device(folio))
- /* pass */;
- else
- mss->file_thp += HPAGE_PMD_SIZE;
-
- smaps_account(mss, page, true, pmd_young(*pmd), pmd_dirty(*pmd),
- locked, present);
-}
-#else
-static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
- struct mm_walk *walk)
-{
-}
-#endif
-
-static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
- struct mm_walk *walk)
-{
- struct vm_area_struct *vma = walk->vma;
- pte_t *pte;
- spinlock_t *ptl;
-
- ptl = pmd_trans_huge_lock(pmd, vma);
- if (ptl) {
- smaps_pmd_entry(pmd, addr, walk);
- spin_unlock(ptl);
- goto out;
- }
-
- pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
- if (!pte) {
- walk->action = ACTION_AGAIN;
- return 0;
- }
- for (; addr != end; pte++, addr += PAGE_SIZE)
- smaps_pte_entry(pte, addr, walk);
- pte_unmap_unlock(pte - 1, ptl);
-out:
- cond_resched();
- return 0;
-}
-
static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
{
/*
@@ -1228,58 +1089,6 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
seq_putc(m, '\n');
}
-#ifdef CONFIG_HUGETLB_PAGE
-static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask,
- unsigned long addr, unsigned long end,
- struct mm_walk *walk)
-{
- struct mem_size_stats *mss = walk->private;
- struct vm_area_struct *vma = walk->vma;
- struct folio *folio = NULL;
- bool present = false;
- spinlock_t *ptl;
- pte_t ptent;
-
- ptl = huge_pte_lock(hstate_vma(vma), walk->mm, pte);
- ptent = huge_ptep_get(walk->mm, addr, pte);
- if (pte_present(ptent)) {
- folio = page_folio(pte_page(ptent));
- present = true;
- } else {
- const softleaf_t entry = softleaf_from_pte(ptent);
-
- if (softleaf_has_pfn(entry))
- folio = softleaf_to_folio(entry);
- }
-
- if (folio) {
- /* We treat non-present entries as "maybe shared". */
- if (!present || folio_maybe_mapped_shared(folio) ||
- hugetlb_pmd_shared(pte))
- mss->shared_hugetlb += huge_page_size(hstate_vma(vma));
- else
- mss->private_hugetlb += huge_page_size(hstate_vma(vma));
- }
- spin_unlock(ptl);
- return 0;
-}
-#else
-#define smaps_hugetlb_range NULL
-#endif /* HUGETLB_PAGE */
-
-static const struct mm_walk_ops smaps_walk_ops = {
- .pmd_entry = smaps_pte_range,
- .hugetlb_entry = smaps_hugetlb_range,
- .walk_lock = PGWALK_RDLOCK,
-};
-
-static const struct mm_walk_ops smaps_shmem_walk_ops = {
- .pmd_entry = smaps_pte_range,
- .hugetlb_entry = smaps_hugetlb_range,
- .pte_hole = smaps_pte_hole,
- .walk_lock = PGWALK_RDLOCK,
-};
-
/*
* Gather mem stats from @vma with the indicated beginning
* address @start, and keep them in @mss.
@@ -1287,14 +1096,20 @@ static const struct mm_walk_ops smaps_shmem_walk_ops = {
* Use vm_start of @vma as the beginning address if @start is 0.
*/
static void smap_gather_stats(struct vm_area_struct *vma,
- struct mem_size_stats *mss, unsigned long start)
+ struct mem_size_stats *mss,
+ unsigned long start)
{
- const struct mm_walk_ops *ops = &smaps_walk_ops;
+ struct pt_range_walk ptw = {
+ .mm = vma->vm_mm
+ };
+ enum pt_range_walk_type type;
+ pt_type_flags_t flags = PT_TYPE_ALL;
- /* Invalid start */
if (start >= vma->vm_end)
return;
+ flags &= ~(PT_TYPE_NONE|PT_TYPE_PFN);
+
if (vma->vm_file && shmem_mapping(vma->vm_file->f_mapping)) {
/*
* For shared or readonly shmem mappings we know that all
@@ -1309,18 +1124,98 @@ static void smap_gather_stats(struct vm_area_struct *vma,
unsigned long shmem_swapped = shmem_swap_usage(vma);
if (!start && (!shmem_swapped || (vma->vm_flags & VM_SHARED) ||
- !(vma->vm_flags & VM_WRITE))) {
+ !(vma->vm_flags & VM_WRITE))) {
mss->swap += shmem_swapped;
} else {
- ops = &smaps_shmem_walk_ops;
+ flags |= PT_TYPE_NONE;
}
}
- /* mmap_lock is held in m_start */
if (!start)
- walk_page_vma(vma, ops, mss);
- else
- walk_page_range(vma->vm_mm, start, vma->vm_end, ops, mss);
+ start = vma->vm_start;
+
+ type = pt_range_walk_start(&ptw, vma, start, vma->vm_end, flags);
+ while (type != PTW_DONE) {
+ unsigned long curr_addr = ptw.curr_addr;
+ bool locked = !!(vma->vm_flags & VM_LOCKED);
+ bool compound = false, account = false;
+ unsigned long swap_size;
+ int mapcount;
+
+ switch (type) {
+ case PTW_FOLIO:
+ case PTW_MIGRATION:
+ case PTW_HWPOISON:
+ case PTW_DEVICE:
+ /*
+ * We either have a folio because vm_normal_page() was
+ * successful, or because we had a special swap entry
+ * whose page we retrieved with softleaf_to_page().
+ */
+ if (is_vm_hugetlb_page(vma)) {
+ /* HugeTLB */
+ unsigned long size = huge_page_size(hstate_vma(ptw.vma));
+
+ if (!ptw.present || folio_maybe_mapped_shared(ptw.folio) ||
+ ptw.pmd_shared)
+ mss->shared_hugetlb += size;
+ else
+ mss->private_hugetlb += size;
+ } else {
+ account = true;
+ if (ptw.level == PTW_PMD_LEVEL) {
+ /* THP */
+ compound = true;
+ if (folio_test_anon(ptw.folio))
+ mss->anonymous_thp += ptw.size;
+ else if (folio_test_swapbacked(ptw.folio))
+ mss->shmem_thp += ptw.size;
+ else if (folio_is_zone_device(ptw.folio))
+ /* pass */;
+ else
+ mss->file_thp += ptw.size;
+ } else if (ptw.level == PTW_PTE_LEVEL && ptw.nr_entries > 1) {
+ compound = true;
+ }
+ }
+ break;
+ case PTW_SWAP:
+ account = true;
+ swap_size = PAGE_SIZE * ptw.nr_entries;
+ mss->swap += swap_size;
+ mapcount = swp_swapcount(ptw.softleaf_entry);
+ if (mapcount >= 2) {
+ u64 pss_delta = (u64)swap_size << PSS_SHIFT;
+
+ do_div(pss_delta, mapcount);
+ mss->swap_pss += pss_delta;
+ } else {
+ mss->swap_pss += (u64)swap_size << PSS_SHIFT;
+ }
+ break;
+ case PTW_NONE: {
+#ifdef CONFIG_SHMEM
+ unsigned long addr = ptw.curr_addr;
+ unsigned long end = ptw.next_addr;
+
+ if (ptw.level == PTW_PMD_LEVEL || ptw.level == PTW_PTE_LEVEL)
+ mss->swap += shmem_partial_swap_usage(vma->vm_file->f_mapping,
+ linear_page_index(vma, addr),
+ linear_page_index(vma, end));
+#endif
+ break;
+ }
+ default:
+ /* Nothing to account for this type */
+ break;
+ }
+
+ if (account && ptw.folio)
+ smaps_account(mss, ptw.page, compound, ptw.young,
+ ptw.dirty, locked, ptw.present, ptw.size);
+ type = pt_range_walk_next(&ptw, vma, start, vma->vm_end, flags);
+ }
+
+ pt_range_walk_done(&ptw);
}
#define SEQ_PUT_DEC(str, val) \
--
2.35.3
* [RFC PATCH v2 6/7] mm: Make /proc/pid/numa_maps use the new generic pagewalk API
2026-04-26 12:57 [RFC PATCH v2 0/7] Implement a new generic pagewalk API Oscar Salvador
` (4 preceding siblings ...)
2026-04-26 12:57 ` [RFC PATCH v2 5/7] mm: Make /proc/pid/smaps use the new generic pagewalk API Oscar Salvador
@ 2026-04-26 12:57 ` Oscar Salvador
2026-04-26 12:57 ` [RFC PATCH v2 7/7] mm: Make /proc/pid/pagemap " Oscar Salvador
` (2 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Oscar Salvador @ 2026-04-26 12:57 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Michal Hocko, Muchun Song, Vlastimil Babka,
Lorenzo Stoakes, linux-kernel, linux-mm, Oscar Salvador
Have /proc/pid/numa_maps make use of the new generic API, and remove
the code which was using the old one.
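Only PT_TYPE_FOLIO is requested, so the walker skips every entry that is
not backed by a folio and the loop body reduces to the stats gathering,
roughly (a sketch of the loop implemented below):

    type = pt_range_walk_start(&ptw, vma, vma->vm_start, vma->vm_end,
                               PT_TYPE_FOLIO);
    while (type != PTW_DONE) {
        /* nr_pages derived from ptw.size or ptw.nr_entries */
        gather_stats(ptw.page, md, ptw.dirty, nr_pages);
        type = pt_range_walk_next(&ptw, vma, vma->vm_start,
                                  vma->vm_end, PT_TYPE_FOLIO);
    }
    pt_range_walk_done(&ptw);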
Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
fs/proc/task_mmu.c | 159 +++++++++------------------------------------
1 file changed, 32 insertions(+), 127 deletions(-)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 382c6b02d0e1..5c8a4b5250a1 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -3061,131 +3061,6 @@ static void gather_stats(struct page *page, struct numa_maps *md, int pte_dirty,
md->node[folio_nid(folio)] += nr_pages;
}
-static struct page *can_gather_numa_stats(pte_t pte, struct vm_area_struct *vma,
- unsigned long addr)
-{
- struct page *page;
- int nid;
-
- if (!pte_present(pte))
- return NULL;
-
- page = vm_normal_page(vma, addr, pte);
- if (!page || is_zone_device_page(page))
- return NULL;
-
- if (PageReserved(page))
- return NULL;
-
- nid = page_to_nid(page);
- if (!node_isset(nid, node_states[N_MEMORY]))
- return NULL;
-
- return page;
-}
-
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-static struct page *can_gather_numa_stats_pmd(pmd_t pmd,
- struct vm_area_struct *vma,
- unsigned long addr)
-{
- struct page *page;
- int nid;
-
- if (!pmd_present(pmd))
- return NULL;
-
- page = vm_normal_page_pmd(vma, addr, pmd);
- if (!page)
- return NULL;
-
- if (PageReserved(page))
- return NULL;
-
- nid = page_to_nid(page);
- if (!node_isset(nid, node_states[N_MEMORY]))
- return NULL;
-
- return page;
-}
-#endif
-
-static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
- unsigned long end, struct mm_walk *walk)
-{
- struct numa_maps *md = walk->private;
- struct vm_area_struct *vma = walk->vma;
- spinlock_t *ptl;
- pte_t *orig_pte;
- pte_t *pte;
-
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- ptl = pmd_trans_huge_lock(pmd, vma);
- if (ptl) {
- struct page *page;
-
- page = can_gather_numa_stats_pmd(*pmd, vma, addr);
- if (page)
- gather_stats(page, md, pmd_dirty(*pmd),
- HPAGE_PMD_SIZE/PAGE_SIZE);
- spin_unlock(ptl);
- return 0;
- }
-#endif
- orig_pte = pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
- if (!pte) {
- walk->action = ACTION_AGAIN;
- return 0;
- }
- do {
- pte_t ptent = ptep_get(pte);
- struct page *page = can_gather_numa_stats(ptent, vma, addr);
- if (!page)
- continue;
- gather_stats(page, md, pte_dirty(ptent), 1);
-
- } while (pte++, addr += PAGE_SIZE, addr != end);
- pte_unmap_unlock(orig_pte, ptl);
- cond_resched();
- return 0;
-}
-#ifdef CONFIG_HUGETLB_PAGE
-static int gather_hugetlb_stats(pte_t *pte, unsigned long hmask,
- unsigned long addr, unsigned long end, struct mm_walk *walk)
-{
- pte_t huge_pte;
- struct numa_maps *md;
- struct page *page;
- spinlock_t *ptl;
-
- ptl = huge_pte_lock(hstate_vma(walk->vma), walk->mm, pte);
- huge_pte = huge_ptep_get(walk->mm, addr, pte);
- if (!pte_present(huge_pte))
- goto out;
-
- page = pte_page(huge_pte);
-
- md = walk->private;
- gather_stats(page, md, pte_dirty(huge_pte), 1);
-out:
- spin_unlock(ptl);
- return 0;
-}
-
-#else
-static int gather_hugetlb_stats(pte_t *pte, unsigned long hmask,
- unsigned long addr, unsigned long end, struct mm_walk *walk)
-{
- return 0;
-}
-#endif
-
-static const struct mm_walk_ops show_numa_ops = {
- .hugetlb_entry = gather_hugetlb_stats,
- .pmd_entry = gather_pte_stats,
- .walk_lock = PGWALK_RDLOCK,
-};
-
/*
* Display pages allocated per node and memory policy via /proc.
*/
@@ -3197,9 +3072,15 @@ static int show_numa_map(struct seq_file *m, void *v)
struct numa_maps *md = &numa_priv->md;
struct file *file = vma->vm_file;
struct mm_struct *mm = vma->vm_mm;
+ struct pt_range_walk ptw = {
+ .mm = mm
+ };
+ enum pt_range_walk_type type;
+ pt_type_flags_t flags;
char buffer[64];
struct mempolicy *pol;
pgoff_t ilx;
+ int nr_pages;
int nid;
if (!mm)
@@ -3230,8 +3111,32 @@ static int show_numa_map(struct seq_file *m, void *v)
if (is_vm_hugetlb_page(vma))
seq_puts(m, " huge");
- /* mmap_lock is held by m_start */
- walk_page_vma(vma, &show_numa_ops, md);
+ flags = PT_TYPE_FOLIO;
+ type = pt_range_walk_start(&ptw, vma, vma->vm_start, vma->vm_end, flags);
+ while (type != PTW_DONE) {
+ if (!ptw.folio || !ptw.page || PageReserved(ptw.page))
+ goto not_found;
+
+ nid = page_to_nid(ptw.page);
+ if (!node_isset(nid, node_states[N_MEMORY]))
+ goto not_found;
+
+ if (is_vm_hugetlb_page(vma)) {
+ /*
+ * Unlike THP, HugeTLB accounts the entire huge
+ * page as a single unit.
+ */
+ nr_pages = ptw.nr_entries;
+ } else {
+ nr_pages = ptw.size / PAGE_SIZE;
+ }
+
+ gather_stats(ptw.page, md, ptw.dirty, nr_pages);
+not_found:
+ type = pt_range_walk_next(&ptw, vma, vma->vm_start, vma->vm_end, flags);
+ }
+ pt_range_walk_done(&ptw);
if (!md->pages)
goto out;
--
2.35.3
* [RFC PATCH v2 7/7] mm: Make /proc/pid/pagemap use the new generic pagewalk API
2026-04-26 12:57 [RFC PATCH v2 0/7] Implement a new generic pagewalk API Oscar Salvador
` (5 preceding siblings ...)
2026-04-26 12:57 ` [RFC PATCH v2 6/7] mm: Make /proc/pid/numa_maps " Oscar Salvador
@ 2026-04-26 12:57 ` Oscar Salvador
2026-04-26 13:11 ` [RFC PATCH v2 0/7] Implement a " Andrew Morton
2026-04-26 19:01 ` [syzbot ci] " syzbot ci
8 siblings, 0 replies; 11+ messages in thread
From: Oscar Salvador @ 2026-04-26 12:57 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Michal Hocko, Muchun Song, Vlastimil Babka,
Lorenzo Stoakes, linux-kernel, linux-mm, Oscar Salvador
Have /proc/pid/pagemap make use of the new generic API, and remove
the code which was using the old one.
Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
arch/x86/include/asm/pgtable.h | 4 +
arch/x86/mm/pgtable.c | 18 +-
fs/proc/task_mmu.c | 1826 +++++++++++++++++---------------
include/linux/leafops.h | 13 +
include/linux/pgtable.h | 30 +
mm/pgtable-generic.c | 10 +
6 files changed, 1054 insertions(+), 847 deletions(-)
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index a68ff339cd56..1d18f6177784 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1400,6 +1400,10 @@ static inline pud_t pudp_establish(struct vm_area_struct *vma,
}
#endif
+#define __HAVE_ARCH_PUDP_INVALIDATE_AD
+extern pud_t pudp_invalidate_ad(struct vm_area_struct *vma,
+ unsigned long address, pud_t *pudp);
+
#define __HAVE_ARCH_PMDP_INVALIDATE_AD
extern pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma,
unsigned long address, pmd_t *pmdp);
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 2e5ecfdce73c..828f5ca9195e 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -530,8 +530,22 @@ pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
}
#endif
-#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && \
- defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD)
+#if (defined(CONFIG_TRANSPARENT_HUGEPAGE) && \
+ defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD)) || \
+ defined CONFIG_HUGETLB_PAGE
+
+pud_t pudp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
+ pud_t *pudp)
+{
+ VM_WARN_ON_ONCE(!pud_present(*pudp));
+
+ /*
+ * No flush is necessary. Once an invalid PUD is established, the PUD's
+ * access and dirty bits cannot be updated.
+ */
+ return pudp_establish(vma, address, pudp, pud_mkinvalid(*pudp));
+}
+
pud_t pudp_invalidate(struct vm_area_struct *vma, unsigned long address,
pud_t *pudp)
{
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 5c8a4b5250a1..2ba7a5f8c5c6 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1786,46 +1786,6 @@ static bool __folio_page_mapped_exclusively(struct folio *folio, struct page *pa
return !folio_maybe_mapped_shared(folio);
}
-static int pagemap_pte_hole(unsigned long start, unsigned long end,
- __always_unused int depth, struct mm_walk *walk)
-{
- struct pagemapread *pm = walk->private;
- unsigned long addr = start;
- int err = 0;
-
- while (addr < end) {
- struct vm_area_struct *vma = find_vma(walk->mm, addr);
- pagemap_entry_t pme = make_pme(0, 0);
- /* End of address space hole, which we mark as non-present. */
- unsigned long hole_end;
-
- if (vma)
- hole_end = min(end, vma->vm_start);
- else
- hole_end = end;
-
- for (; addr < hole_end; addr += PAGE_SIZE) {
- err = add_to_pagemap(&pme, pm);
- if (err)
- goto out;
- }
-
- if (!vma)
- break;
-
- /* Addresses in the VMA. */
- if (vma->vm_flags & VM_SOFTDIRTY)
- pme = make_pme(0, PM_SOFT_DIRTY);
- for (; addr < min(end, vma->vm_end); addr += PAGE_SIZE) {
- err = add_to_pagemap(&pme, pm);
- if (err)
- goto out;
- }
- }
-out:
- return err;
-}
-
static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
struct vm_area_struct *vma, unsigned long addr, pte_t pte)
{
@@ -1892,357 +1852,167 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
return make_pme(frame, flags);
}
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-static int pagemap_pmd_range_thp(pmd_t *pmdp, unsigned long addr,
- unsigned long end, struct vm_area_struct *vma,
- struct pagemapread *pm)
-{
- unsigned int idx = (addr & ~PMD_MASK) >> PAGE_SHIFT;
- u64 flags = 0, frame = 0;
- pmd_t pmd = *pmdp;
- struct page *page = NULL;
- struct folio *folio = NULL;
- int err = 0;
-
- if (vma->vm_flags & VM_SOFTDIRTY)
- flags |= PM_SOFT_DIRTY;
+struct pagemap_scan_private {
+ struct pm_scan_arg arg;
+ unsigned long masks_of_interest, cur_vma_category;
+ struct page_region *vec_buf;
+ unsigned long vec_buf_len, vec_buf_index, found_pages;
+ struct page_region __user *vec_out;
+};
- if (pmd_none(pmd))
- goto populate_pagemap;
+static bool pagemap_scan_is_interesting_page(unsigned long categories,
+ const struct pagemap_scan_private *p)
+{
+ categories ^= p->arg.category_inverted;
+ if ((categories & p->arg.category_mask) != p->arg.category_mask)
+ return false;
+ if (p->arg.category_anyof_mask && !(categories & p->arg.category_anyof_mask))
+ return false;
- if (pmd_present(pmd)) {
- page = pmd_page(pmd);
+ return true;
+}
- flags |= PM_PRESENT;
- if (pmd_soft_dirty(pmd))
- flags |= PM_SOFT_DIRTY;
- if (pmd_uffd_wp(pmd))
- flags |= PM_UFFD_WP;
- if (pm->show_pfn)
- frame = pmd_pfn(pmd) + idx;
- } else if (thp_migration_supported()) {
- const softleaf_t entry = softleaf_from_pmd(pmd);
- unsigned long offset;
+#ifdef CONFIG_HUGETLB_PAGE
+static void make_uffd_wp_pud(struct vm_area_struct *vma,
+ unsigned long addr, pud_t *pudp)
+{
+ pud_t old, pud = *pudp;
- if (pm->show_pfn) {
- if (softleaf_has_pfn(entry))
- offset = softleaf_to_pfn(entry) + idx;
- else
- offset = swp_offset(entry) + idx;
- frame = swp_type(entry) |
- (offset << MAX_SWAPFILES_SHIFT);
- }
- flags |= PM_SWAP;
- if (pmd_swp_soft_dirty(pmd))
- flags |= PM_SOFT_DIRTY;
- if (pmd_swp_uffd_wp(pmd))
- flags |= PM_UFFD_WP;
- VM_WARN_ON_ONCE(!pmd_is_migration_entry(pmd));
- page = softleaf_to_page(entry);
+ if (pud_present(pud)) {
+ old = pudp_invalidate_ad(vma, addr, pudp);
+ pud = pud_mkuffd_wp(old);
+ set_pud_at(vma->vm_mm, addr, pudp, pud);
+ } else if (pud_is_migration_entry(pud)) {
+ pud = pud_swp_mkuffd_wp(pud);
+ set_pud_at(vma->vm_mm, addr, pudp, pud);
}
+}
+#endif
- if (page) {
- folio = page_folio(page);
- if (!folio_test_anon(folio))
- flags |= PM_FILE;
- }
+static void make_uffd_wp_pmd(struct vm_area_struct *vma,
+ unsigned long addr, pmd_t *pmdp)
+{
+ pmd_t old, pmd = *pmdp;
-populate_pagemap:
- for (; addr != end; addr += PAGE_SIZE, idx++) {
- u64 cur_flags = flags;
- pagemap_entry_t pme;
+ if (pmd_present(pmd)) {
+ old = pmdp_invalidate_ad(vma, addr, pmdp);
+ pmd = pmd_mkuffd_wp(old);
+ set_pmd_at(vma->vm_mm, addr, pmdp, pmd);
+ } else if (pmd_is_migration_entry(pmd)) {
+ pmd = pmd_swp_mkuffd_wp(pmd);
+ set_pmd_at(vma->vm_mm, addr, pmdp, pmd);
+ }
+}
- if (folio && (flags & PM_PRESENT) &&
- __folio_page_mapped_exclusively(folio, page))
- cur_flags |= PM_MMAP_EXCLUSIVE;
+static void make_uffd_wp_pte(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *pte, pte_t ptent)
+{
+ if (pte_present(ptent)) {
+ pte_t old_pte;
- pme = make_pme(frame, cur_flags);
- err = add_to_pagemap(&pme, pm);
- if (err)
- break;
- if (pm->show_pfn) {
- if (flags & PM_PRESENT)
- frame++;
- else if (flags & PM_SWAP)
- frame += (1 << MAX_SWAPFILES_SHIFT);
- }
+ old_pte = ptep_modify_prot_start(vma, addr, pte);
+ ptent = pte_mkuffd_wp(old_pte);
+ ptep_modify_prot_commit(vma, addr, pte, old_pte, ptent);
+ } else if (pte_none(ptent)) {
+ set_pte_at(vma->vm_mm, addr, pte,
+ make_pte_marker(PTE_MARKER_UFFD_WP));
+ } else {
+ ptent = pte_swp_mkuffd_wp(ptent);
+ set_pte_at(vma->vm_mm, addr, pte, ptent);
}
- return err;
}
-#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
-static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
- struct mm_walk *walk)
+#if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLB_PAGE)
+static void pagemap_scan_backout_range(struct pagemap_scan_private *p,
+ unsigned long addr, unsigned long end)
{
- struct vm_area_struct *vma = walk->vma;
- struct pagemapread *pm = walk->private;
- spinlock_t *ptl;
- pte_t *pte, *orig_pte;
- int err = 0;
+ struct page_region *cur_buf = &p->vec_buf[p->vec_buf_index];
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- ptl = pmd_trans_huge_lock(pmdp, vma);
- if (ptl) {
- err = pagemap_pmd_range_thp(pmdp, addr, end, vma, pm);
- spin_unlock(ptl);
- return err;
- }
+ if (!p->vec_buf)
+ return;
+
+ if (cur_buf->start != addr)
+ cur_buf->end = addr;
+ else
+ cur_buf->start = cur_buf->end = 0;
+
+ p->found_pages -= (end - addr) / PAGE_SIZE;
+}
#endif
+static bool pagemap_scan_push_range(unsigned long categories,
+ struct pagemap_scan_private *p,
+ unsigned long addr, unsigned long end)
+{
+ struct page_region *cur_buf = &p->vec_buf[p->vec_buf_index];
+
/*
- * We can assume that @vma always points to a valid one and @end never
- * goes beyond vma->vm_end.
+ * When there is no output buffer provided at all, the sentinel values
+ * won't match here: cur_buf->end can only be non-zero when the
+ * current buffer entry is non-empty.
*/
- orig_pte = pte = pte_offset_map_lock(walk->mm, pmdp, addr, &ptl);
- if (!pte) {
- walk->action = ACTION_AGAIN;
- return err;
+ if (addr == cur_buf->end && categories == cur_buf->categories) {
+ cur_buf->end = end;
+ return true;
}
- for (; addr < end; pte++, addr += PAGE_SIZE) {
- pagemap_entry_t pme;
- pme = pte_to_pagemap_entry(pm, vma, addr, ptep_get(pte));
- err = add_to_pagemap(&pme, pm);
- if (err)
- break;
+ if (cur_buf->end) {
+ if (p->vec_buf_index >= p->vec_buf_len - 1)
+ return false;
+
+ cur_buf = &p->vec_buf[++p->vec_buf_index];
}
- pte_unmap_unlock(orig_pte, ptl);
- cond_resched();
+ cur_buf->start = addr;
+ cur_buf->end = end;
+ cur_buf->categories = categories;
- return err;
+ return true;
}
-#ifdef CONFIG_HUGETLB_PAGE
-/* This function walks within one hugetlb entry in the single call */
-static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask,
- unsigned long addr, unsigned long end,
- struct mm_walk *walk)
+static int pagemap_scan_output(unsigned long categories,
+ struct pagemap_scan_private *p,
+ unsigned long addr, unsigned long *end)
{
- struct pagemapread *pm = walk->private;
- struct vm_area_struct *vma = walk->vma;
- u64 flags = 0, frame = 0;
- spinlock_t *ptl;
- int err = 0;
- pte_t pte;
-
- if (vma->vm_flags & VM_SOFTDIRTY)
- flags |= PM_SOFT_DIRTY;
-
- ptl = huge_pte_lock(hstate_vma(vma), walk->mm, ptep);
- pte = huge_ptep_get(walk->mm, addr, ptep);
- if (pte_present(pte)) {
- struct folio *folio = page_folio(pte_page(pte));
+ unsigned long n_pages, total_pages;
+ int ret = 0;
- if (!folio_test_anon(folio))
- flags |= PM_FILE;
+ if (!p->vec_buf)
+ return 0;
- if (!folio_maybe_mapped_shared(folio) &&
- !hugetlb_pmd_shared(ptep))
- flags |= PM_MMAP_EXCLUSIVE;
+ categories &= p->arg.return_mask;
- if (huge_pte_uffd_wp(pte))
- flags |= PM_UFFD_WP;
+ n_pages = (*end - addr) / PAGE_SIZE;
+ if (check_add_overflow(p->found_pages, n_pages, &total_pages) ||
+ total_pages > p->arg.max_pages) {
+ size_t n_too_much = total_pages - p->arg.max_pages;
- flags |= PM_PRESENT;
- if (pm->show_pfn)
- frame = pte_pfn(pte) +
- ((addr & ~hmask) >> PAGE_SHIFT);
- } else if (pte_swp_uffd_wp_any(pte)) {
- flags |= PM_UFFD_WP;
+ *end -= n_too_much * PAGE_SIZE;
+ n_pages -= n_too_much;
+ ret = -ENOSPC;
}
- for (; addr != end; addr += PAGE_SIZE) {
- pagemap_entry_t pme = make_pme(frame, flags);
-
- err = add_to_pagemap(&pme, pm);
- if (err)
- break;
- if (pm->show_pfn && (flags & PM_PRESENT))
- frame++;
+ if (!pagemap_scan_push_range(categories, p, addr, *end)) {
+ *end = addr;
+ n_pages = 0;
+ ret = -ENOSPC;
}
- spin_unlock(ptl);
- cond_resched();
+ p->found_pages += n_pages;
+ if (ret)
+ p->arg.walk_end = *end;
- return err;
+ return ret;
}
-#else
-#define pagemap_hugetlb_range NULL
-#endif /* HUGETLB_PAGE */
-
-static const struct mm_walk_ops pagemap_ops = {
- .pmd_entry = pagemap_pmd_range,
- .pte_hole = pagemap_pte_hole,
- .hugetlb_entry = pagemap_hugetlb_range,
- .walk_lock = PGWALK_RDLOCK,
-};
-/*
- * /proc/pid/pagemap - an array mapping virtual pages to pfns
- *
- * For each page in the address space, this file contains one 64-bit entry
- * consisting of the following:
- *
- * Bits 0-54 page frame number (PFN) if present
- * Bits 0-4 swap type if swapped
- * Bits 5-54 swap offset if swapped
- * Bit 55 pte is soft-dirty (see Documentation/admin-guide/mm/soft-dirty.rst)
- * Bit 56 page exclusively mapped
- * Bit 57 pte is uffd-wp write-protected
- * Bit 58 pte is a guard region
- * Bits 59-60 zero
- * Bit 61 page is file-page or shared-anon
- * Bit 62 page swapped
- * Bit 63 page present
- *
- * If the page is not present but in swap, then the PFN contains an
- * encoding of the swap file number and the page's offset into the
- * swap. Unmapped pages return a null PFN. This allows determining
- * precisely which pages are mapped (or in swap) and comparing mapped
- * pages between processes.
- *
- * Efficient users of this interface will use /proc/pid/maps to
- * determine which areas of memory are actually mapped and llseek to
- * skip over unmapped regions.
- */
-static ssize_t pagemap_read(struct file *file, char __user *buf,
- size_t count, loff_t *ppos)
+static unsigned long pagemap_page_category(struct pagemap_scan_private *p,
+ struct vm_area_struct *vma,
+ unsigned long addr, pte_t pte)
{
- struct mm_struct *mm = file->private_data;
- struct pagemapread pm;
- unsigned long src;
- unsigned long svpfn;
- unsigned long start_vaddr;
- unsigned long end_vaddr;
- int ret = 0, copied = 0;
-
- if (!mm || !mmget_not_zero(mm))
- goto out;
+ unsigned long categories;
- ret = -EINVAL;
- /* file position must be aligned */
- if ((*ppos % PM_ENTRY_BYTES) || (count % PM_ENTRY_BYTES))
- goto out_mm;
-
- ret = 0;
- if (!count)
- goto out_mm;
-
- /* do not disclose physical addresses: attack vector */
- pm.show_pfn = file_ns_capable(file, &init_user_ns, CAP_SYS_ADMIN);
-
- pm.len = (PAGEMAP_WALK_SIZE >> PAGE_SHIFT);
- pm.buffer = kmalloc_array(pm.len, PM_ENTRY_BYTES, GFP_KERNEL);
- ret = -ENOMEM;
- if (!pm.buffer)
- goto out_mm;
-
- src = *ppos;
- svpfn = src / PM_ENTRY_BYTES;
- end_vaddr = mm->task_size;
-
- /* watch out for wraparound */
- start_vaddr = end_vaddr;
- if (svpfn <= (ULONG_MAX >> PAGE_SHIFT)) {
- unsigned long end;
-
- ret = mmap_read_lock_killable(mm);
- if (ret)
- goto out_free;
- start_vaddr = untagged_addr_remote(mm, svpfn << PAGE_SHIFT);
- mmap_read_unlock(mm);
-
- end = start_vaddr + ((count / PM_ENTRY_BYTES) << PAGE_SHIFT);
- if (end >= start_vaddr && end < mm->task_size)
- end_vaddr = end;
- }
-
- /* Ensure the address is inside the task */
- if (start_vaddr > mm->task_size)
- start_vaddr = end_vaddr;
-
- ret = 0;
- while (count && (start_vaddr < end_vaddr)) {
- int len;
- unsigned long end;
-
- pm.pos = 0;
- end = (start_vaddr + PAGEMAP_WALK_SIZE) & PAGEMAP_WALK_MASK;
- /* overflow ? */
- if (end < start_vaddr || end > end_vaddr)
- end = end_vaddr;
- ret = mmap_read_lock_killable(mm);
- if (ret)
- goto out_free;
- ret = walk_page_range(mm, start_vaddr, end, &pagemap_ops, &pm);
- mmap_read_unlock(mm);
- start_vaddr = end;
-
- len = min(count, PM_ENTRY_BYTES * pm.pos);
- if (copy_to_user(buf, pm.buffer, len)) {
- ret = -EFAULT;
- goto out_free;
- }
- copied += len;
- buf += len;
- count -= len;
- }
- *ppos += copied;
- if (!ret || ret == PM_END_OF_BUFFER)
- ret = copied;
-
-out_free:
- kfree(pm.buffer);
-out_mm:
- mmput(mm);
-out:
- return ret;
-}
-
-static int pagemap_open(struct inode *inode, struct file *file)
-{
- struct mm_struct *mm;
-
- mm = proc_mem_open(inode, PTRACE_MODE_READ);
- if (IS_ERR_OR_NULL(mm))
- return mm ? PTR_ERR(mm) : -ESRCH;
- file->private_data = mm;
- return 0;
-}
-
-static int pagemap_release(struct inode *inode, struct file *file)
-{
- struct mm_struct *mm = file->private_data;
-
- if (mm)
- mmdrop(mm);
- return 0;
-}
-
-#define PM_SCAN_CATEGORIES (PAGE_IS_WPALLOWED | PAGE_IS_WRITTEN | \
- PAGE_IS_FILE | PAGE_IS_PRESENT | \
- PAGE_IS_SWAPPED | PAGE_IS_PFNZERO | \
- PAGE_IS_HUGE | PAGE_IS_SOFT_DIRTY | \
- PAGE_IS_GUARD)
-#define PM_SCAN_FLAGS (PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC)
-
-struct pagemap_scan_private {
- struct pm_scan_arg arg;
- unsigned long masks_of_interest, cur_vma_category;
- struct page_region *vec_buf;
- unsigned long vec_buf_len, vec_buf_index, found_pages;
- struct page_region __user *vec_out;
-};
-
-static unsigned long pagemap_page_category(struct pagemap_scan_private *p,
- struct vm_area_struct *vma,
- unsigned long addr, pte_t pte)
-{
- unsigned long categories;
-
- if (pte_none(pte))
- return 0;
+ if (pte_none(pte))
+ return 0;
if (pte_present(pte)) {
struct page *page;
@@ -2285,122 +2055,6 @@ static unsigned long pagemap_page_category(struct pagemap_scan_private *p,
return categories;
}
-static void make_uffd_wp_pte(struct vm_area_struct *vma,
- unsigned long addr, pte_t *pte, pte_t ptent)
-{
- if (pte_present(ptent)) {
- pte_t old_pte;
-
- old_pte = ptep_modify_prot_start(vma, addr, pte);
- ptent = pte_mkuffd_wp(old_pte);
- ptep_modify_prot_commit(vma, addr, pte, old_pte, ptent);
- } else if (pte_none(ptent)) {
- set_pte_at(vma->vm_mm, addr, pte,
- make_pte_marker(PTE_MARKER_UFFD_WP));
- } else {
- ptent = pte_swp_mkuffd_wp(ptent);
- set_pte_at(vma->vm_mm, addr, pte, ptent);
- }
-}
-
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-static unsigned long pagemap_thp_category(struct pagemap_scan_private *p,
- struct vm_area_struct *vma,
- unsigned long addr, pmd_t pmd)
-{
- unsigned long categories = PAGE_IS_HUGE;
-
- if (pmd_none(pmd))
- return categories;
-
- if (pmd_present(pmd)) {
- struct page *page;
-
- categories |= PAGE_IS_PRESENT;
- if (!pmd_uffd_wp(pmd))
- categories |= PAGE_IS_WRITTEN;
-
- if (p->masks_of_interest & PAGE_IS_FILE) {
- page = vm_normal_page_pmd(vma, addr, pmd);
- if (page && !PageAnon(page))
- categories |= PAGE_IS_FILE;
- }
-
- if (is_huge_zero_pmd(pmd))
- categories |= PAGE_IS_PFNZERO;
- if (pmd_soft_dirty(pmd))
- categories |= PAGE_IS_SOFT_DIRTY;
- } else {
- categories |= PAGE_IS_SWAPPED;
- if (!pmd_swp_uffd_wp(pmd))
- categories |= PAGE_IS_WRITTEN;
- if (pmd_swp_soft_dirty(pmd))
- categories |= PAGE_IS_SOFT_DIRTY;
-
- if (p->masks_of_interest & PAGE_IS_FILE) {
- const softleaf_t entry = softleaf_from_pmd(pmd);
-
- if (softleaf_has_pfn(entry) &&
- !folio_test_anon(softleaf_to_folio(entry)))
- categories |= PAGE_IS_FILE;
- }
- }
-
- return categories;
-}
-
-static void make_uffd_wp_pmd(struct vm_area_struct *vma,
- unsigned long addr, pmd_t *pmdp)
-{
- pmd_t old, pmd = *pmdp;
-
- if (pmd_present(pmd)) {
- old = pmdp_invalidate_ad(vma, addr, pmdp);
- pmd = pmd_mkuffd_wp(old);
- set_pmd_at(vma->vm_mm, addr, pmdp, pmd);
- } else if (pmd_is_migration_entry(pmd)) {
- pmd = pmd_swp_mkuffd_wp(pmd);
- set_pmd_at(vma->vm_mm, addr, pmdp, pmd);
- }
-}
-#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
-
-#ifdef CONFIG_HUGETLB_PAGE
-static unsigned long pagemap_hugetlb_category(pte_t pte)
-{
- unsigned long categories = PAGE_IS_HUGE;
-
- if (pte_none(pte))
- return categories;
-
- /*
- * According to pagemap_hugetlb_range(), file-backed HugeTLB
- * page cannot be swapped. So PAGE_IS_FILE is not checked for
- * swapped pages.
- */
- if (pte_present(pte)) {
- categories |= PAGE_IS_PRESENT;
-
- if (!huge_pte_uffd_wp(pte))
- categories |= PAGE_IS_WRITTEN;
- if (!PageAnon(pte_page(pte)))
- categories |= PAGE_IS_FILE;
- if (is_zero_pfn(pte_pfn(pte)))
- categories |= PAGE_IS_PFNZERO;
- if (pte_soft_dirty(pte))
- categories |= PAGE_IS_SOFT_DIRTY;
- } else {
- categories |= PAGE_IS_SWAPPED;
-
- if (!pte_swp_uffd_wp_any(pte))
- categories |= PAGE_IS_WRITTEN;
- if (pte_swp_soft_dirty(pte))
- categories |= PAGE_IS_SOFT_DIRTY;
- }
-
- return categories;
-}
-
static void make_uffd_wp_huge_pte(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep,
pte_t ptent)
@@ -2425,365 +2079,145 @@ static void make_uffd_wp_huge_pte(struct vm_area_struct *vma,
huge_ptep_modify_prot_commit(vma, addr, ptep, ptent,
huge_pte_mkuffd_wp(ptent));
}
-#endif /* CONFIG_HUGETLB_PAGE */
-
-#if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLB_PAGE)
-static void pagemap_scan_backout_range(struct pagemap_scan_private *p,
- unsigned long addr, unsigned long end)
-{
- struct page_region *cur_buf = &p->vec_buf[p->vec_buf_index];
-
- if (!p->vec_buf)
- return;
-
- if (cur_buf->start != addr)
- cur_buf->end = addr;
- else
- cur_buf->start = cur_buf->end = 0;
-
- p->found_pages -= (end - addr) / PAGE_SIZE;
-}
-#endif
-
-static bool pagemap_scan_is_interesting_page(unsigned long categories,
- const struct pagemap_scan_private *p)
-{
- categories ^= p->arg.category_inverted;
- if ((categories & p->arg.category_mask) != p->arg.category_mask)
- return false;
- if (p->arg.category_anyof_mask && !(categories & p->arg.category_anyof_mask))
- return false;
-
- return true;
-}
-static bool pagemap_scan_is_interesting_vma(unsigned long categories,
- const struct pagemap_scan_private *p)
-{
- unsigned long required = p->arg.category_mask & PAGE_IS_WPALLOWED;
-
- categories ^= p->arg.category_inverted;
- if ((categories & required) != required)
- return false;
+/*
+ * /proc/pid/pagemap - an array mapping virtual pages to pfns
+ *
+ * For each page in the address space, this file contains one 64-bit entry
+ * consisting of the following:
+ *
+ * Bits 0-54 page frame number (PFN) if present
+ * Bits 0-4 swap type if swapped
+ * Bits 5-54 swap offset if swapped
+ * Bit 55 pte is soft-dirty (see Documentation/admin-guide/mm/soft-dirty.rst)
+ * Bit 56 page exclusively mapped
+ * Bit 57 pte is uffd-wp write-protected
+ * Bit 58 pte is a guard region
+ * Bits 59-60 zero
+ * Bit 61 page is file-page or shared-anon
+ * Bit 62 page swapped
+ * Bit 63 page present
+ *
+ * If the page is not present but in swap, then the PFN contains an
+ * encoding of the swap file number and the page's offset into the
+ * swap. Unmapped pages return a null PFN. This allows determining
+ * precisely which pages are mapped (or in swap) and comparing mapped
+ * pages between processes.
+ *
+ * Efficient users of this interface will use /proc/pid/maps to
+ * determine which areas of memory are actually mapped and llseek to
+ * skip over unmapped regions.
+ */
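+
+/*
+ * Example (userspace, illustrative only): decoding a single entry read
+ * from this file for virtual address vaddr, with PAGE_SIZE obtained
+ * via sysconf(_SC_PAGESIZE):
+ *
+ *	uint64_t ent;
+ *
+ *	pread(fd, &ent, sizeof(ent), (vaddr / PAGE_SIZE) * sizeof(ent));
+ *	if (ent & (1ULL << 63))			(bit 63: present)
+ *		pfn = ent & ((1ULL << 55) - 1);	(bits 0-54)
+ *	else if (ent & (1ULL << 62))		(bit 62: swapped)
+ *		swap_type = ent & 0x1f;		(bits 0-4)
+ */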
- return true;
-}
-static int pagemap_scan_test_walk(unsigned long start, unsigned long end,
- struct mm_walk *walk)
+static int pagemap_open(struct inode *inode, struct file *file)
{
- struct pagemap_scan_private *p = walk->private;
- struct vm_area_struct *vma = walk->vma;
- unsigned long vma_category = 0;
- bool wp_allowed = userfaultfd_wp_async(vma) &&
- userfaultfd_wp_use_markers(vma);
-
- if (!wp_allowed) {
- /* User requested explicit failure over wp-async capability */
- if (p->arg.flags & PM_SCAN_CHECK_WPASYNC)
- return -EPERM;
- /*
- * User requires wr-protect, and allows silently skipping
- * unsupported vmas.
- */
- if (p->arg.flags & PM_SCAN_WP_MATCHING)
- return 1;
- /*
- * Then the request doesn't involve wr-protects at all,
- * fall through to the rest checks, and allow vma walk.
- */
- }
-
- if (vma->vm_flags & VM_PFNMAP)
- return 1;
-
- if (wp_allowed)
- vma_category |= PAGE_IS_WPALLOWED;
-
- if (vma->vm_flags & VM_SOFTDIRTY)
- vma_category |= PAGE_IS_SOFT_DIRTY;
-
- if (!pagemap_scan_is_interesting_vma(vma_category, p))
- return 1;
-
- p->cur_vma_category = vma_category;
+ struct mm_struct *mm;
+
+ mm = proc_mem_open(inode, PTRACE_MODE_READ);
+ if (IS_ERR_OR_NULL(mm))
+ return mm ? PTR_ERR(mm) : -ESRCH;
+ file->private_data = mm;
return 0;
}
-static bool pagemap_scan_push_range(unsigned long categories,
- struct pagemap_scan_private *p,
- unsigned long addr, unsigned long end)
-{
- struct page_region *cur_buf = &p->vec_buf[p->vec_buf_index];
-
- /*
- * When there is no output buffer provided at all, the sentinel values
- * won't match here. There is no other way for `cur_buf->end` to be
- * non-zero other than it being non-empty.
- */
- if (addr == cur_buf->end && categories == cur_buf->categories) {
- cur_buf->end = end;
- return true;
- }
-
- if (cur_buf->end) {
- if (p->vec_buf_index >= p->vec_buf_len - 1)
- return false;
-
- cur_buf = &p->vec_buf[++p->vec_buf_index];
- }
-
- cur_buf->start = addr;
- cur_buf->end = end;
- cur_buf->categories = categories;
-
- return true;
-}
-
-static int pagemap_scan_output(unsigned long categories,
- struct pagemap_scan_private *p,
- unsigned long addr, unsigned long *end)
-{
- unsigned long n_pages, total_pages;
- int ret = 0;
-
- if (!p->vec_buf)
- return 0;
-
- categories &= p->arg.return_mask;
-
- n_pages = (*end - addr) / PAGE_SIZE;
- if (check_add_overflow(p->found_pages, n_pages, &total_pages) ||
- total_pages > p->arg.max_pages) {
- size_t n_too_much = total_pages - p->arg.max_pages;
- *end -= n_too_much * PAGE_SIZE;
- n_pages -= n_too_much;
- ret = -ENOSPC;
- }
-
- if (!pagemap_scan_push_range(categories, p, addr, *end)) {
- *end = addr;
- n_pages = 0;
- ret = -ENOSPC;
- }
-
- p->found_pages += n_pages;
- if (ret)
- p->arg.walk_end = *end;
-
- return ret;
-}
-
-static int pagemap_scan_thp_entry(pmd_t *pmd, unsigned long start,
- unsigned long end, struct mm_walk *walk)
-{
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- struct pagemap_scan_private *p = walk->private;
- struct vm_area_struct *vma = walk->vma;
- unsigned long categories;
- spinlock_t *ptl;
- int ret = 0;
-
- ptl = pmd_trans_huge_lock(pmd, vma);
- if (!ptl)
- return -ENOENT;
-
- categories = p->cur_vma_category |
- pagemap_thp_category(p, vma, start, *pmd);
-
- if (!pagemap_scan_is_interesting_page(categories, p))
- goto out_unlock;
-
- ret = pagemap_scan_output(categories, p, start, &end);
- if (start == end)
- goto out_unlock;
-
- if (~p->arg.flags & PM_SCAN_WP_MATCHING)
- goto out_unlock;
- if (~categories & PAGE_IS_WRITTEN)
- goto out_unlock;
-
- /*
- * Break huge page into small pages if the WP operation
- * needs to be performed on a portion of the huge page.
- */
- if (end != start + HPAGE_SIZE) {
- spin_unlock(ptl);
- split_huge_pmd(vma, pmd, start);
- pagemap_scan_backout_range(p, start, end);
- /* Report as if there was no THP */
- return -ENOENT;
- }
-
- make_uffd_wp_pmd(vma, start, pmd);
- flush_tlb_range(vma, start, end);
-out_unlock:
- spin_unlock(ptl);
- return ret;
-#else /* !CONFIG_TRANSPARENT_HUGEPAGE */
- return -ENOENT;
-#endif
-}
-
-static int pagemap_scan_pmd_entry(pmd_t *pmd, unsigned long start,
- unsigned long end, struct mm_walk *walk)
-{
- struct pagemap_scan_private *p = walk->private;
- struct vm_area_struct *vma = walk->vma;
- unsigned long addr, flush_end = 0;
- pte_t *pte, *start_pte;
- spinlock_t *ptl;
- int ret;
-
- ret = pagemap_scan_thp_entry(pmd, start, end, walk);
- if (ret != -ENOENT)
- return ret;
-
- ret = 0;
- start_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, start, &ptl);
- if (!pte) {
- walk->action = ACTION_AGAIN;
- return 0;
- }
-
- lazy_mmu_mode_enable();
-
- if ((p->arg.flags & PM_SCAN_WP_MATCHING) && !p->vec_out) {
- /* Fast path for performing exclusive WP */
- for (addr = start; addr != end; pte++, addr += PAGE_SIZE) {
- pte_t ptent = ptep_get(pte);
-
- if ((pte_present(ptent) && pte_uffd_wp(ptent)) ||
- pte_swp_uffd_wp_any(ptent))
- continue;
- make_uffd_wp_pte(vma, addr, pte, ptent);
- if (!flush_end)
- start = addr;
- flush_end = addr + PAGE_SIZE;
- }
- goto flush_and_return;
- }
-
- if (!p->arg.category_anyof_mask && !p->arg.category_inverted &&
- p->arg.category_mask == PAGE_IS_WRITTEN &&
- p->arg.return_mask == PAGE_IS_WRITTEN) {
- for (addr = start; addr < end; pte++, addr += PAGE_SIZE) {
- unsigned long next = addr + PAGE_SIZE;
- pte_t ptent = ptep_get(pte);
-
- if ((pte_present(ptent) && pte_uffd_wp(ptent)) ||
- pte_swp_uffd_wp_any(ptent))
- continue;
- ret = pagemap_scan_output(p->cur_vma_category | PAGE_IS_WRITTEN,
- p, addr, &next);
- if (next == addr)
- break;
- if (~p->arg.flags & PM_SCAN_WP_MATCHING)
- continue;
- make_uffd_wp_pte(vma, addr, pte, ptent);
- if (!flush_end)
- start = addr;
- flush_end = next;
- }
- goto flush_and_return;
- }
-
- for (addr = start; addr != end; pte++, addr += PAGE_SIZE) {
- pte_t ptent = ptep_get(pte);
- unsigned long categories = p->cur_vma_category |
- pagemap_page_category(p, vma, addr, ptent);
- unsigned long next = addr + PAGE_SIZE;
-
- if (!pagemap_scan_is_interesting_page(categories, p))
- continue;
-
- ret = pagemap_scan_output(categories, p, addr, &next);
- if (next == addr)
- break;
+static int pagemap_release(struct inode *inode, struct file *file)
+{
+ struct mm_struct *mm = file->private_data;
- if (~p->arg.flags & PM_SCAN_WP_MATCHING)
- continue;
- if (~categories & PAGE_IS_WRITTEN)
- continue;
+ if (mm)
+ mmdrop(mm);
+ return 0;
+}
- make_uffd_wp_pte(vma, addr, pte, ptent);
- if (!flush_end)
- start = addr;
- flush_end = next;
- }
+#define PM_SCAN_CATEGORIES (PAGE_IS_WPALLOWED | PAGE_IS_WRITTEN | \
+ PAGE_IS_FILE | PAGE_IS_PRESENT | \
+ PAGE_IS_SWAPPED | PAGE_IS_PFNZERO | \
+ PAGE_IS_HUGE | PAGE_IS_SOFT_DIRTY | \
+ PAGE_IS_GUARD)
+#define PM_SCAN_FLAGS (PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC)
-flush_and_return:
- if (flush_end)
- flush_tlb_range(vma, start, addr);
+static bool pagemap_scan_is_interesting_vma(unsigned long categories,
+ const struct pagemap_scan_private *p)
+{
+ unsigned long required = p->arg.category_mask & PAGE_IS_WPALLOWED;
- lazy_mmu_mode_disable();
- pte_unmap_unlock(start_pte, ptl);
+ categories ^= p->arg.category_inverted;
+ if ((categories & required) != required)
+ return false;
- cond_resched();
- return ret;
+ return true;
}
-#ifdef CONFIG_HUGETLB_PAGE
-static int pagemap_scan_hugetlb_entry(pte_t *ptep, unsigned long hmask,
- unsigned long start, unsigned long end,
- struct mm_walk *walk)
+static int pagemap_scan_test_walk(unsigned long start, unsigned long end,
+ struct mm_walk *walk)
{
struct pagemap_scan_private *p = walk->private;
struct vm_area_struct *vma = walk->vma;
- unsigned long categories;
- spinlock_t *ptl;
- int ret = 0;
- pte_t pte;
-
- if (~p->arg.flags & PM_SCAN_WP_MATCHING) {
- /* Go the short route when not write-protecting pages. */
-
- pte = huge_ptep_get(walk->mm, start, ptep);
- categories = p->cur_vma_category | pagemap_hugetlb_category(pte);
-
- if (!pagemap_scan_is_interesting_page(categories, p))
- return 0;
+ unsigned long vma_category = 0;
+ bool wp_allowed = userfaultfd_wp_async(vma) &&
+ userfaultfd_wp_use_markers(vma);
- return pagemap_scan_output(categories, p, start, &end);
+ if (!wp_allowed) {
+ /* User requested explicit failure over wp-async capability */
+ if (p->arg.flags & PM_SCAN_CHECK_WPASYNC)
+ return -EPERM;
+ /*
+ * User requires wr-protect, and allows silently skipping
+ * unsupported vmas.
+ */
+ if (p->arg.flags & PM_SCAN_WP_MATCHING)
+ return 1;
+ /*
+ * Then the request doesn't involve wr-protects at all,
+ * fall through to the rest checks, and allow vma walk.
+ */
}
- i_mmap_lock_write(vma->vm_file->f_mapping);
- ptl = huge_pte_lock(hstate_vma(vma), vma->vm_mm, ptep);
-
- pte = huge_ptep_get(walk->mm, start, ptep);
- categories = p->cur_vma_category | pagemap_hugetlb_category(pte);
-
- if (!pagemap_scan_is_interesting_page(categories, p))
- goto out_unlock;
-
- ret = pagemap_scan_output(categories, p, start, &end);
- if (start == end)
- goto out_unlock;
+ if (vma->vm_flags & VM_PFNMAP)
+ return 1;
- if (~categories & PAGE_IS_WRITTEN)
- goto out_unlock;
+ if (wp_allowed)
+ vma_category |= PAGE_IS_WPALLOWED;
- if (end != start + HPAGE_SIZE) {
- /* Partial HugeTLB page WP isn't possible. */
- pagemap_scan_backout_range(p, start, end);
- p->arg.walk_end = start;
- ret = 0;
- goto out_unlock;
- }
+ if (vma->vm_flags & VM_SOFTDIRTY)
+ vma_category |= PAGE_IS_SOFT_DIRTY;
- make_uffd_wp_huge_pte(vma, start, ptep, pte);
- flush_hugetlb_tlb_range(vma, start, end);
+ if (!pagemap_scan_is_interesting_vma(vma_category, p))
+ return 1;
-out_unlock:
- spin_unlock(ptl);
- i_mmap_unlock_write(vma->vm_file->f_mapping);
+ p->cur_vma_category = vma_category;
- return ret;
+ return 0;
}
-#else
-#define pagemap_scan_hugetlb_entry NULL
-#endif
static int pagemap_scan_pte_hole(unsigned long addr, unsigned long end,
int depth, struct mm_walk *walk)
@@ -2809,13 +2243,6 @@ static int pagemap_scan_pte_hole(unsigned long addr, unsigned long end,
return ret;
}
-static const struct mm_walk_ops pagemap_scan_ops = {
- .test_walk = pagemap_scan_test_walk,
- .pmd_entry = pagemap_scan_pmd_entry,
- .pte_hole = pagemap_scan_pte_hole,
- .hugetlb_entry = pagemap_scan_hugetlb_entry,
-};
-
static int pagemap_scan_get_args(struct pm_scan_arg *arg,
unsigned long uarg)
{
@@ -2858,64 +2285,439 @@ static int pagemap_scan_get_args(struct pm_scan_arg *arg,
return 0;
}
-static int pagemap_scan_writeback_args(struct pm_scan_arg *arg,
- unsigned long uargl)
+static int pagemap_scan_writeback_args(struct pm_scan_arg *arg,
+ unsigned long uargl)
+{
+ struct pm_scan_arg __user *uarg = (void __user *)uargl;
+
+ if (copy_to_user(&uarg->walk_end, &arg->walk_end, sizeof(arg->walk_end)))
+ return -EFAULT;
+
+ return 0;
+}
+
+static int pagemap_scan_init_bounce_buffer(struct pagemap_scan_private *p)
+{
+ if (!p->arg.vec_len)
+ return 0;
+
+ p->vec_buf_len = min_t(size_t, PAGEMAP_WALK_SIZE >> PAGE_SHIFT,
+ p->arg.vec_len);
+ p->vec_buf = kmalloc_objs(*p->vec_buf, p->vec_buf_len);
+ if (!p->vec_buf)
+ return -ENOMEM;
+
+ p->vec_buf->start = p->vec_buf->end = 0;
+ p->vec_out = (struct page_region __user *)(long)p->arg.vec;
+
+ return 0;
+}
+
+static long pagemap_scan_flush_buffer(struct pagemap_scan_private *p)
+{
+ const struct page_region *buf = p->vec_buf;
+ long n = p->vec_buf_index;
+
+ if (!p->vec_buf)
+ return 0;
+
+ if (buf[n].end != buf[n].start)
+ n++;
+
+ if (!n)
+ return 0;
+
+ if (copy_to_user(p->vec_out, buf, n * sizeof(*buf)))
+ return -EFAULT;
+
+ p->arg.vec_len -= n;
+ p->vec_out += n;
+
+ p->vec_buf_index = 0;
+ p->vec_buf_len = min_t(size_t, p->vec_buf_len, p->arg.vec_len);
+ p->vec_buf->start = p->vec_buf->end = 0;
+
+ return n;
+}
+
+static unsigned long pagemap_set_category(struct pagemap_scan_private *p,
+ struct pt_range_walk *ptw,
+ enum pt_range_walk_type type)
+{
+ unsigned long categories = 0;
+
+ if (ptw->level != PTW_PTE_LEVEL)
+ categories |= PAGE_IS_HUGE;
+
+ if (ptw->present) {
+ categories |= PAGE_IS_PRESENT;
+
+ if (type == PTW_FOLIO && !PageAnon(ptw->page))
+ categories |= PAGE_IS_FILE;
+ if (type == PTW_PFN)
+ categories |= PAGE_IS_PFNZERO;
+ } else {
+ categories |= PAGE_IS_SWAPPED;
+ }
+
+ switch (ptw->level) {
+ case PTW_PUD_LEVEL:
+ if (ptw->present) {
+ if (!pud_uffd_wp(ptw->pud))
+ categories |= PAGE_IS_WRITTEN;
+ if (pud_soft_dirty(ptw->pud))
+ categories |= PAGE_IS_SOFT_DIRTY;
+ } else {
+ if (!pud_swp_uffd_wp(ptw->pud))
+ categories |= PAGE_IS_WRITTEN;
+ if (pud_swp_soft_dirty(ptw->pud))
+ categories |= PAGE_IS_SOFT_DIRTY;
+ }
+ break;
+ case PTW_PMD_LEVEL:
+ if (ptw->present) {
+ if (!pmd_uffd_wp(ptw->pmd))
+ categories |= PAGE_IS_WRITTEN;
+ if (pmd_soft_dirty(ptw->pmd))
+ categories |= PAGE_IS_SOFT_DIRTY;
+ } else {
+ if (p->masks_of_interest & PAGE_IS_FILE) {
+ const softleaf_t entry = softleaf_from_pmd(ptw->pmd);
+
+ if (softleaf_has_pfn(entry) &&
+ !folio_test_anon(softleaf_to_folio(entry)))
+ categories |= PAGE_IS_FILE;
+ }
+
+ if (!pmd_swp_uffd_wp(ptw->pmd))
+ categories |= PAGE_IS_WRITTEN;
+
+ if (pmd_swp_soft_dirty(ptw->pmd))
+ categories |= PAGE_IS_SOFT_DIRTY;
+ }
+ break;
+ case PTW_PTE_LEVEL:
+ if (ptw->present) {
+ if (!pte_uffd_wp(ptw->pte))
+ categories |= PAGE_IS_WRITTEN;
+ if (pte_soft_dirty(ptw->pte))
+ categories |= PAGE_IS_SOFT_DIRTY;
+ } else {
+ if (!pte_swp_uffd_wp_any(ptw->pte))
+ categories |= PAGE_IS_WRITTEN;
+ if (pte_swp_soft_dirty(ptw->pte))
+ categories |= PAGE_IS_SOFT_DIRTY;
+ }
+ break;
+ }
+
+ return categories;
+}
+
+static unsigned long pagemap_hugetlb_category(pte_t pte)
+{
+ unsigned long categories = PAGE_IS_HUGE;
+
+ if (pte_none(pte))
+ return categories;
+
+ /*
+ * According to pagemap_hugetlb_range(), file-backed HugeTLB
+ * page cannot be swapped. So PAGE_IS_FILE is not checked for
+ * swapped pages.
+ */
+ if (pte_present(pte)) {
+ categories |= PAGE_IS_PRESENT;
+
+ if (!huge_pte_uffd_wp(pte))
+ categories |= PAGE_IS_WRITTEN;
+ if (!PageAnon(pte_page(pte)))
+ categories |= PAGE_IS_FILE;
+ if (is_zero_pfn(pte_pfn(pte)))
+ categories |= PAGE_IS_PFNZERO;
+ if (pte_soft_dirty(pte))
+ categories |= PAGE_IS_SOFT_DIRTY;
+ } else {
+ categories |= PAGE_IS_SWAPPED;
+
+ if (!pte_swp_uffd_wp_any(pte))
+ categories |= PAGE_IS_WRITTEN;
+ if (pte_swp_soft_dirty(pte))
+ categories |= PAGE_IS_SOFT_DIRTY;
+ }
+
+ return categories;
+}
+
+static int pagemap_scan_walk(struct vm_area_struct *vma, struct pagemap_scan_private *p,
+ unsigned long addr)
+{
+ int ret = 0;
+ struct pt_range_walk ptw = {
+ .mm = vma->vm_mm
+ };
+ enum pt_range_walk_type type;
+ pt_type_flags_t flags = PT_TYPE_ALL;
+
+start_again:
+ type = pt_range_walk_start(&ptw, vma, addr, vma->vm_end, flags);
+ while (type != PTW_DONE) {
+ bool must_return = false;
+ unsigned long categories = p->cur_vma_category |
+ pagemap_set_category(p, &ptw, type);
+ unsigned long addr;
+ unsigned long flush_end = 0;
+ unsigned long end = ptw.next_addr;
+ unsigned long curr_addr = ptw.curr_addr;
+ pte_t *ptep;
+
+ addr = curr_addr;
+
+ if (type == PTW_NONE) {
+ int err;
+
+ if (!vma || !pagemap_scan_is_interesting_page(p->cur_vma_category, p))
+ goto keep_walking;
+
+ ret = pagemap_scan_output(p->cur_vma_category, p, addr, &end);
+ if (curr_addr == end)
+ goto out;
+ if (~p->arg.flags & PM_SCAN_WP_MATCHING)
+ goto keep_walking;
+
+ err = uffd_wp_range(vma, curr_addr, end - curr_addr, true);
+ if (err < 0) {
+ ret = err;
+ goto out;
+ }
+ goto keep_walking;
+ }
+
+ if (ptw.level != PTW_PTE_LEVEL) {
+ if (is_vm_hugetlb_page(ptw.vma)) {
+ if (~p->arg.flags & PM_SCAN_WP_MATCHING) {
+				categories = p->cur_vma_category |
+ pagemap_hugetlb_category(ptw.pte);
+ if (!pagemap_scan_is_interesting_page(categories, p))
+ goto keep_walking;
+
+ ret = pagemap_scan_output(categories, p, curr_addr, &end);
+ if (ret)
+ goto out;
+ else
+ goto keep_walking;
+ }
+ }
+
+ if (is_vm_hugetlb_page(ptw.vma)) {
+			categories = p->cur_vma_category |
+ pagemap_hugetlb_category(ptw.pte);
+ }
+
+ if (!pagemap_scan_is_interesting_page(categories, p))
+ goto keep_walking;
+
+ ret = pagemap_scan_output(categories, p, curr_addr, &end);
+ if (curr_addr == end)
+ goto out;
+
+ if (~categories & PAGE_IS_WRITTEN)
+ goto keep_walking;
+
+ if (end != curr_addr + HPAGE_SIZE) {
+ if (is_vm_hugetlb_page(ptw.vma)) {
+ /* Partial HugeTLB page WP isn't possible. */
+ pagemap_scan_backout_range(p, curr_addr, end);
+ p->arg.walk_end = curr_addr;
+ ret = 0;
+ goto pmd_split;
+ }
+ if (ptw.level == PTW_PMD_LEVEL) {
+ pt_range_walk_done(&ptw);
+ split_huge_pmd(ptw.vma, ptw.pmdp, curr_addr);
+ pagemap_scan_backout_range(p, curr_addr, end);
+ /* Relaunch now that we split the pmd */
+ goto start_again;
+ }
+ }
+ } else {
+pmd_split:
+ lazy_mmu_mode_enable();
+ ptep = ptw.ptep;
+ if ((p->arg.flags & PM_SCAN_WP_MATCHING) && !p->vec_out) {
+ for (addr = curr_addr; addr != end; ptep++, addr += PAGE_SIZE) {
+ pte_t ptent = ptep_get(ptep);
+
+ ptw.next_addr = addr + PAGE_SIZE;
+ if ((pte_present(ptent) && pte_uffd_wp(ptent)) ||
+ pte_swp_uffd_wp_any(ptent))
+ continue;
+ make_uffd_wp_pte(vma, addr, ptep, ptent);
+ if (!flush_end)
+ curr_addr = addr;
+ flush_end = addr + PAGE_SIZE;
+ }
+ goto flush_and_return;
+ }
+
+ if (!p->arg.category_anyof_mask && !p->arg.category_inverted &&
+ p->arg.category_mask == PAGE_IS_WRITTEN &&
+ p->arg.return_mask == PAGE_IS_WRITTEN) {
+ for (addr = curr_addr; addr < end; ptep++, addr += PAGE_SIZE) {
+ unsigned long next = addr + PAGE_SIZE;
+ pte_t ptent = ptep_get(ptep);
+
+ ptw.next_addr = addr + PAGE_SIZE;
+ if ((pte_present(ptent) && pte_uffd_wp(ptent)) ||
+ pte_swp_uffd_wp_any(ptent))
+ continue;
+ ret = pagemap_scan_output(p->cur_vma_category | PAGE_IS_WRITTEN,
+ p, addr, &next);
+ if (next == addr) {
+ must_return = true;
+ break;
+ }
+ if (~p->arg.flags & PM_SCAN_WP_MATCHING)
+ continue;
+ make_uffd_wp_pte(vma, addr, ptep, ptent);
+ if (!flush_end)
+ curr_addr = addr;
+ flush_end = next;
+ }
+ goto flush_and_return;
+ }
+
+ for (addr = curr_addr; addr != end; ptep++, addr += PAGE_SIZE) {
+ pte_t ptent = ptep_get(ptep);
+ unsigned long categories = p->cur_vma_category |
+ pagemap_page_category(p, vma, addr, ptent);
+ unsigned long next = addr + PAGE_SIZE;
+
+ ptw.next_addr = addr + PAGE_SIZE;
+ if (!pagemap_scan_is_interesting_page(categories, p))
+ continue;
+
+ ret = pagemap_scan_output(categories, p, addr, &next);
+ if (next == addr) {
+ must_return = true;
+ break;
+ }
+
+ if (~p->arg.flags & PM_SCAN_WP_MATCHING)
+ continue;
+ if (~categories & PAGE_IS_WRITTEN)
+ continue;
+
+ make_uffd_wp_pte(vma, addr, ptep, ptent);
+ if (!flush_end)
+ curr_addr = addr;
+ flush_end = next;
+ }
+ }
+
+ if (ptw.level == PTW_PUD_LEVEL) {
+ if (is_vm_hugetlb_page(ptw.vma))
+ make_uffd_wp_huge_pte(vma, curr_addr, ptw.ptep, ptw.pte);
+ else
+ make_uffd_wp_pud(ptw.vma, curr_addr, ptw.pudp);
+ }
+
+ if (ptw.level == PTW_PMD_LEVEL) {
+ if (is_vm_hugetlb_page(ptw.vma))
+ make_uffd_wp_huge_pte(vma, curr_addr, ptw.ptep, ptw.pte);
+ else
+ make_uffd_wp_pmd(ptw.vma, curr_addr, ptw.pmdp);
+ }
+
+ if (is_vm_hugetlb_page(ptw.vma)) {
+ flush_hugetlb_tlb_range(vma, curr_addr, end);
+ } else {
+flush_and_return:
+ if (flush_end || ptw.level != PTW_PTE_LEVEL)
+ flush_tlb_range(vma, curr_addr, end);
+ if (ptw.level == PTW_PTE_LEVEL)
+ lazy_mmu_mode_disable();
+ }
+ if (must_return)
+ goto out;
+keep_walking:
+ type = pt_range_walk_next(&ptw, vma, vma->vm_start, vma->vm_end, flags);
+ }
+out:
+ pt_range_walk_done(&ptw);
+ return ret;
+}
+
+static int pagemap_scan_test_lab(unsigned long start, unsigned long end,
+ struct pagemap_scan_private *p,
+ struct vm_area_struct *vma)
{
- struct pm_scan_arg __user *uarg = (void __user *)uargl;
+ unsigned long vma_category = 0;
+ bool wp_allowed = userfaultfd_wp_async(vma) &&
+ userfaultfd_wp_use_markers(vma);
- if (copy_to_user(&uarg->walk_end, &arg->walk_end, sizeof(arg->walk_end)))
- return -EFAULT;
+ if (!wp_allowed) {
+ /* User requested explicit failure over wp-async capability */
+ if (p->arg.flags & PM_SCAN_CHECK_WPASYNC)
+ return -EPERM;
+ /*
+	 * The user requires wr-protect and allows unsupported vmas
+	 * to be silently skipped.
+ */
+ if (p->arg.flags & PM_SCAN_WP_MATCHING)
+ return 1;
+ /*
+	 * Otherwise the request doesn't involve wr-protects at all;
+	 * fall through to the remaining checks and allow the vma walk.
+ */
+ }
- return 0;
-}
+ if (vma->vm_flags & VM_PFNMAP)
+ return 1;
-static int pagemap_scan_init_bounce_buffer(struct pagemap_scan_private *p)
-{
- if (!p->arg.vec_len)
- return 0;
+ if (wp_allowed)
+ vma_category |= PAGE_IS_WPALLOWED;
- p->vec_buf_len = min_t(size_t, PAGEMAP_WALK_SIZE >> PAGE_SHIFT,
- p->arg.vec_len);
- p->vec_buf = kmalloc_objs(*p->vec_buf, p->vec_buf_len);
- if (!p->vec_buf)
- return -ENOMEM;
+ if (vma->vm_flags & VM_SOFTDIRTY)
+ vma_category |= PAGE_IS_SOFT_DIRTY;
- p->vec_buf->start = p->vec_buf->end = 0;
- p->vec_out = (struct page_region __user *)(long)p->arg.vec;
+ if (!pagemap_scan_is_interesting_vma(vma_category, p))
+ return 1;
+
+ p->cur_vma_category = vma_category;
return 0;
}
-static long pagemap_scan_flush_buffer(struct pagemap_scan_private *p)
+static int pagemap_scan_pte_hole_lab(unsigned long addr, unsigned long end,
+ struct pagemap_scan_private *p,
+ struct vm_area_struct *vma)
{
- const struct page_region *buf = p->vec_buf;
- long n = p->vec_buf_index;
-
- if (!p->vec_buf)
- return 0;
-
- if (buf[n].end != buf[n].start)
- n++;
+ int ret, err;
- if (!n)
+ if (!vma || !pagemap_scan_is_interesting_page(p->cur_vma_category, p))
return 0;
- if (copy_to_user(p->vec_out, buf, n * sizeof(*buf)))
- return -EFAULT;
+ ret = pagemap_scan_output(p->cur_vma_category, p, addr, &end);
+ if (addr == end)
+ return ret;
- p->arg.vec_len -= n;
- p->vec_out += n;
+ if (~p->arg.flags & PM_SCAN_WP_MATCHING)
+ return ret;
- p->vec_buf_index = 0;
- p->vec_buf_len = min_t(size_t, p->vec_buf_len, p->arg.vec_len);
- p->vec_buf->start = p->vec_buf->end = 0;
+ err = uffd_wp_range(vma, addr, end - addr, true);
+ if (err < 0)
+ ret = err;
- return n;
+ return ret;
}
static long do_pagemap_scan(struct mm_struct *mm, unsigned long uarg)
{
struct pagemap_scan_private p = {0};
+ struct vm_area_struct *vma;
unsigned long walk_start;
size_t n_ranges_out = 0;
int ret;
@@ -2933,6 +2735,7 @@ static long do_pagemap_scan(struct mm_struct *mm, unsigned long uarg)
for (walk_start = p.arg.start; walk_start < p.arg.end;
walk_start = p.arg.walk_end) {
struct mmu_notifier_range range;
+ unsigned long next;
long n_out;
if (fatal_signal_pending(current)) {
@@ -2951,8 +2754,42 @@ static long do_pagemap_scan(struct mm_struct *mm, unsigned long uarg)
mmu_notifier_invalidate_range_start(&range);
}
- ret = walk_page_range(mm, walk_start, p.arg.end,
- &pagemap_scan_ops, &p);
+ vma = find_vma(mm, walk_start);
+ do {
+ if (!vma) {
+ walk_start = p.arg.end;
+ next = p.arg.end;
+ ret = pagemap_scan_pte_hole_lab(walk_start, next, &p, NULL);
+ if (ret)
+ break;
+ } else if (walk_start < vma->vm_start) {
+ next = min(p.arg.end, vma->vm_start);
+ ret = pagemap_scan_pte_hole_lab(walk_start, next, &p, NULL);
+ if (ret)
+ break;
+ walk_start = next;
+ } else {
+ next = min(p.arg.end, vma->vm_end);
+
+ ret = pagemap_scan_test_lab(walk_start, min(p.arg.end, vma->vm_end),
+ &p, vma);
+
+ if (ret > 0) {
+ ret = 0;
+ walk_start = min(p.arg.end, vma->vm_end);
+ next = walk_start;
+ vma = find_vma(mm, walk_start);
+ continue;
+ }
+
+ ret = pagemap_scan_walk(vma, &p, walk_start);
+ if (ret)
+ break;
+ walk_start = min(p.arg.end, vma->vm_end);
+ vma = find_vma(mm, walk_start);
+ next = walk_start;
+ }
+ } while (next < p.arg.end);
if (p.arg.flags & PM_SCAN_WP_MATCHING)
mmu_notifier_invalidate_range_end(&range);
@@ -2986,6 +2823,304 @@ static long do_pagemap_scan(struct mm_struct *mm, unsigned long uarg)
return ret;
}
+static int pagemap_read_walk_range(struct vm_area_struct *vma, unsigned long start,
+ struct pagemapread *pm)
+{
+ int err = 0;
+ struct pt_range_walk ptw = {
+ .mm = vma->vm_mm
+ };
+ enum pt_range_walk_type type;
+ pt_type_flags_t wflags = PT_TYPE_ALL;
+ pte_t *ptep;
+
+ wflags &= ~(PT_TYPE_PFN);
+
+ type = pt_range_walk_start(&ptw, vma, start, vma->vm_end, wflags);
+ while (type != PTW_DONE) {
+ unsigned long end;
+ u64 frame = 0, flags = 0;
+ struct page *page = NULL;
+ struct folio *folio = NULL;
+
+ end = 0;
+ switch (ptw.level) {
+ case PTW_PUD_LEVEL:
+ end = pud_addr_end(start, vma->vm_end);
+ if (vma->vm_flags & VM_SOFTDIRTY)
+ flags |= PM_SOFT_DIRTY;
+
+ if (pud_present(ptw.pud)) {
+ page = pud_page(ptw.pud);
+ folio = page_folio(page);
+ flags |= PM_PRESENT;
+
+ if (!folio_test_anon(folio))
+ flags |= PM_FILE;
+
+ if (pm->show_pfn) {
+ unsigned long hmask = huge_page_mask(hstate_vma(vma));
+
+ frame = pud_pfn(ptw.pud) +
+ ((start & ~hmask) >> PAGE_SHIFT);
+ }
+ } else if (pud_swp_uffd_wp(ptw.pud)) {
+ flags |= PM_UFFD_WP;
+ }
+ break;
+ case PTW_PMD_LEVEL:
+ unsigned int idx = (start & ~PMD_MASK) >> PAGE_SHIFT;
+
+ end = pmd_addr_end(start, vma->vm_end);
+ if (vma->vm_flags & VM_SOFTDIRTY)
+ flags |= PM_SOFT_DIRTY;
+
+ if (pmd_none(ptw.pmd))
+ goto populate_pagemap;
+
+ if (pmd_present(ptw.pmd)) {
+ page = pmd_page(ptw.pmd);
+ flags |= PM_PRESENT;
+
+ if (pmd_soft_dirty(ptw.pmd))
+ flags |= PM_SOFT_DIRTY;
+ if (pmd_uffd_wp(ptw.pmd))
+ flags |= PM_UFFD_WP;
+ if (pm->show_pfn)
+ frame = pmd_pfn(ptw.pmd) + idx;
+ } else if (thp_migration_supported() || IS_ENABLED(CONFIG_HUGETLB_PAGE)) {
+ const softleaf_t entry = softleaf_from_pmd(ptw.pmd);
+ unsigned long offset;
+
+ if (pm->show_pfn) {
+ if (softleaf_has_pfn(entry))
+ offset = softleaf_to_pfn(entry) + idx;
+ else
+ offset = swp_offset(entry) + idx;
+ frame = swp_type(entry) |
+ (offset << MAX_SWAPFILES_SHIFT);
+ }
+
+ if (!is_vm_hugetlb_page(vma))
+ flags |= PM_SWAP;
+ if (pmd_swp_soft_dirty(ptw.pmd))
+ flags |= PM_SOFT_DIRTY;
+ if (pmd_swp_uffd_wp(ptw.pmd))
+ flags |= PM_UFFD_WP;
+
+ VM_WARN_ON_ONCE(!pmd_is_migration_entry(ptw.pmd));
+ page = softleaf_to_page(entry);
+ }
+
+ if (page) {
+ folio = page_folio(page);
+ if (!folio_test_anon(folio))
+ flags |= PM_FILE;
+ }
+
+ break;
+ case PTW_PTE_LEVEL:
+ end = pmd_addr_end(start, vma->vm_end);
+ break;
+ }
+
+ if (ptw.level == PTW_PTE_LEVEL) {
+ ptep = ptw.ptep;
+ for (; start < end; ptep++, start += PAGE_SIZE) {
+ pagemap_entry_t pme;
+
+ pme = pte_to_pagemap_entry(pm, vma, start, ptep_get(ptep));
+ err = add_to_pagemap(&pme, pm);
+ ptw.next_addr = start + PAGE_SIZE;
+ if (err)
+ break;
+ }
+ } else if (ptw.level == PTW_PMD_LEVEL) {
+populate_pagemap:
+ for (; start != end; start += PAGE_SIZE) {
+ u64 cur_flags = flags;
+ pagemap_entry_t pme;
+
+ if (folio && (flags & PM_PRESENT) &&
+ __folio_page_mapped_exclusively(folio, page))
+ cur_flags |= PM_MMAP_EXCLUSIVE;
+
+ pme = make_pme(frame, cur_flags);
+ err = add_to_pagemap(&pme, pm);
+ if (err)
+ break;
+ if (pm->show_pfn) {
+ if (flags & PM_PRESENT)
+ frame++;
+ else if (flags & PM_SWAP)
+ frame += (1 << MAX_SWAPFILES_SHIFT);
+ }
+ }
+ }
+ type = pt_range_walk_next(&ptw, vma, vma->vm_start, vma->vm_end, wflags);
+ }
+ pt_range_walk_done(&ptw);
+
+ return err;
+}
+
+static int pagemap_pte_hole(struct mm_struct *mm, unsigned long start, unsigned long end,
+ struct pagemapread *pm)
+{
+ unsigned long addr = start;
+ int err = 0;
+
+ while (addr < end) {
+ struct vm_area_struct *vma = find_vma(mm, addr);
+ pagemap_entry_t pme = make_pme(0, 0);
+ /* End of address space hole, which we mark as non-present. */
+ unsigned long hole_end;
+
+ if (vma)
+ hole_end = min(end, vma->vm_start);
+ else
+ hole_end = end;
+
+ for (; addr < hole_end; addr += PAGE_SIZE) {
+ err = add_to_pagemap(&pme, pm);
+ if (err)
+ goto out;
+ }
+
+ if (!vma)
+ break;
+
+ /* Addresses in the VMA. */
+ if (vma->vm_flags & VM_SOFTDIRTY)
+ pme = make_pme(0, PM_SOFT_DIRTY);
+ for (; addr < min(end, vma->vm_end); addr += PAGE_SIZE) {
+ err = add_to_pagemap(&pme, pm);
+ if (err)
+ goto out;
+ }
+ }
+out:
+ return err;
+}
+
+static ssize_t pagemap_read(struct file *file, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct mm_struct *mm = file->private_data;
+ struct pagemapread pm;
+ unsigned long src;
+ unsigned long svpfn;
+ unsigned long start_vaddr;
+ unsigned long end_vaddr;
+ int ret = 0, copied = 0;
+
+ if (!mm || !mmget_not_zero(mm))
+ goto out;
+
+ ret = -EINVAL;
+ /* file position must be aligned */
+ if ((*ppos % PM_ENTRY_BYTES) || (count % PM_ENTRY_BYTES))
+ goto out_mm;
+
+ ret = 0;
+ if (!count)
+ goto out_mm;
+
+ /* do not disclose physical addresses: attack vector */
+ pm.show_pfn = file_ns_capable(file, &init_user_ns, CAP_SYS_ADMIN);
+
+ pm.len = (PAGEMAP_WALK_SIZE >> PAGE_SHIFT);
+ pm.buffer = kmalloc_array(pm.len, PM_ENTRY_BYTES, GFP_KERNEL);
+ ret = -ENOMEM;
+ if (!pm.buffer)
+ goto out_mm;
+
+ src = *ppos;
+ svpfn = src / PM_ENTRY_BYTES;
+ end_vaddr = mm->task_size;
+
+ /* watch out for wraparound */
+ start_vaddr = end_vaddr;
+ if (svpfn <= (ULONG_MAX >> PAGE_SHIFT)) {
+ unsigned long end;
+
+ ret = mmap_read_lock_killable(mm);
+ if (ret)
+ goto out_free;
+ start_vaddr = untagged_addr_remote(mm, svpfn << PAGE_SHIFT);
+ mmap_read_unlock(mm);
+
+ end = start_vaddr + ((count / PM_ENTRY_BYTES) << PAGE_SHIFT);
+ if (end >= start_vaddr && end < mm->task_size)
+ end_vaddr = end;
+ }
+
+ /* Ensure the address is inside the task */
+ if (start_vaddr > mm->task_size)
+ start_vaddr = end_vaddr;
+
+ ret = 0;
+
+ while (count && (start_vaddr < end_vaddr)) {
+ int len;
+ unsigned long end;
+ unsigned long next;
+
+ pm.pos = 0;
+ end = (start_vaddr + PAGEMAP_WALK_SIZE) & PAGEMAP_WALK_MASK;
+ if (end < start_vaddr || end > end_vaddr)
+ end = end_vaddr;
+ ret = mmap_read_lock_killable(mm);
+ if (ret)
+ goto out_free;
+
+ struct vm_area_struct *vma = find_vma(mm, start_vaddr);
+
+ do {
+ if (!vma) {
+ next = end;
+ ret = pagemap_pte_hole(mm, start_vaddr, next, &pm);
+ if (ret)
+ goto out_err;
+ } else if (start_vaddr < vma->vm_start) {
+ next = min(end, vma->vm_start);
+ ret = pagemap_pte_hole(mm, start_vaddr, next, &pm);
+ if (ret)
+ goto out_err;
+ start_vaddr = next;
+ } else {
+ ret = pagemap_read_walk_range(vma, start_vaddr, &pm);
+ if (ret)
+ goto out_err;
+ start_vaddr = min(end, vma->vm_end);
+ next = start_vaddr;
+ vma = find_vma(mm, start_vaddr);
+ }
+ } while (next < end);
+out_err:
+ mmap_read_unlock(mm);
+
+ len = min(count, PM_ENTRY_BYTES * pm.pos);
+ if (copy_to_user(buf, pm.buffer, len)) {
+ ret = -EFAULT;
+ goto out_free;
+ }
+ copied += len;
+ buf += len;
+ count -= len;
+ }
+ *ppos += copied;
+ if (!ret || ret == PM_END_OF_BUFFER)
+ ret = copied;
+
+out_free:
+ kfree(pm.buffer);
+out_mm:
+ mmput(mm);
+out:
+ return ret;
+}
+
static long do_pagemap_cmd(struct file *file, unsigned int cmd,
unsigned long arg)
{
@@ -3008,6 +3143,7 @@ const struct file_operations proc_pagemap_operations = {
.unlocked_ioctl = do_pagemap_cmd,
.compat_ioctl = do_pagemap_cmd,
};
+
#endif /* CONFIG_PROC_PAGE_MONITOR */
#ifdef CONFIG_NUMA
diff --git a/include/linux/leafops.h b/include/linux/leafops.h
index 122ac50aeb09..6444625c6fbb 100644
--- a/include/linux/leafops.h
+++ b/include/linux/leafops.h
@@ -618,6 +618,19 @@ static inline bool pmd_is_device_private_entry(pmd_t pmd)
#endif /* CONFIG_ZONE_DEVICE && CONFIG_ARCH_ENABLE_THP_MIGRATION */
+#ifdef CONFIG_HUGETLB_PAGE
+/**
+ * pud_is_migration_entry() - Does this PUD entry encode a migration entry?
+ * @pud: PUD entry.
+ *
+ * Returns: true if the PUD encodes a migration entry, otherwise false.
+ */
+static inline bool pud_is_migration_entry(pud_t pud)
+{
+ return softleaf_is_migration(softleaf_from_pud(pud));
+}
+#endif
+
/**
* pmd_is_migration_entry() - Does this PMD entry encode a migration entry?
* @pmd: PMD entry.
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index ab43d0922ec1..edb95313a6cf 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1227,11 +1227,21 @@ static inline pmd_t generic_pmdp_establish(struct vm_area_struct *vma,
}
#endif
+#ifndef __HAVE_ARCH_PUDP_INVALIDATE
+extern pud_t pudp_invalidate(struct vm_area_struct *vma, unsigned long address,
+ pud_t *pudp);
+#endif
+
#ifndef __HAVE_ARCH_PMDP_INVALIDATE
extern pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmdp);
#endif
+#ifndef __HAVE_ARCH_PUDP_INVALIDATE_AD
+extern pud_t pudp_invalidate_ad(struct vm_area_struct *vma,
+ unsigned long address, pud_t *pudp);
+#endif
+
#ifndef __HAVE_ARCH_PMDP_INVALIDATE_AD
/*
@@ -1774,6 +1784,21 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)
#ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
#ifndef CONFIG_ARCH_ENABLE_THP_MIGRATION
+static inline pud_t pud_swp_mksoft_dirty(pud_t pud)
+{
+ return pud;
+}
+
+static inline int pud_swp_soft_dirty(pud_t pud)
+{
+ return 0;
+}
+
+static inline pud_t pud_swp_clear_soft_dirty(pud_t pud)
+{
+ return pud;
+}
+
static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd)
{
return pmd;
@@ -1816,6 +1841,11 @@ static inline int pmd_soft_dirty(pmd_t pmd)
return 0;
}
+static inline int pud_soft_dirty(pud_t pud)
+{
+ return 0;
+}
+
static inline pte_t pte_mksoft_dirty(pte_t pte)
{
return pte;
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index af7966169d69..f390c93b98b2 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -206,6 +206,16 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
}
#endif
+#ifndef __HAVE_ARCH_PUDP_INVALIDATE_AD
+pud_t pudp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
+ pud_t *pudp)
+{
+ VM_WARN_ON_ONCE(!pud_present(*pudp));
+ return pudp_invalidate(vma, address, pudp);
+}
+#endif
+
#ifndef __HAVE_ARCH_PMDP_INVALIDATE_AD
pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmdp)
--
2.35.3
* Re: [RFC PATCH v2 0/7] Implement a new generic pagewalk API
2026-04-26 12:57 [RFC PATCH v2 0/7] Implement a new generic pagewalk API Oscar Salvador
` (6 preceding siblings ...)
2026-04-26 12:57 ` [RFC PATCH v2 7/7] mm: Make /proc/pid/pagemap " Oscar Salvador
@ 2026-04-26 13:11 ` Andrew Morton
2026-04-26 19:01 ` [syzbot ci] " syzbot ci
8 siblings, 0 replies; 11+ messages in thread
From: Andrew Morton @ 2026-04-26 13:11 UTC (permalink / raw)
To: Oscar Salvador
Cc: David Hildenbrand, Michal Hocko, Muchun Song, Vlastimil Babka,
Lorenzo Stoakes, linux-kernel, linux-mm, Roman Gushchin
On Sun, 26 Apr 2026 14:57:12 +0200 Oscar Salvador <osalvador@suse.de> wrote:
> Also, I would like to thank Vlastimil, who helped me by running this
> patchset through Claude quite a few times to catch some issues.
Well dang, I wanted to do a Claude-vs-Sashiko cage match, but "Failed
to apply". Again :(
(https://sashiko.dev/#/patchset/20260426125719.24698-1-osalvador@suse.de).
And indeed, there are a couple of little rejects against current -linus.
Giving patch(1) the "-l" flag fixes those up, no probs. Roman, perhaps
teach Sashiko to try that as a fallback?
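For reference, a minimal sketch of that fallback (the series file name
below is hypothetical; -l is patch(1)'s --ignore-whitespace mode, which
matches context lines loosely and so absorbs whitespace-only rejects):

  $ patch -p1 -l < v2-pagewalk-series.patch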
* [syzbot ci] Re: Implement a new generic pagewalk API
2026-04-26 12:57 [RFC PATCH v2 0/7] Implement a new generic pagewalk API Oscar Salvador
` (7 preceding siblings ...)
2026-04-26 13:11 ` [RFC PATCH v2 0/7] Implement a " Andrew Morton
@ 2026-04-26 19:01 ` syzbot ci
8 siblings, 0 replies; 11+ messages in thread
From: syzbot ci @ 2026-04-26 19:01 UTC (permalink / raw)
To: akpm, david, david, linux-kernel, linux-mm, lorenzo.stoakes,
mhocko, muchun.song, osalvador, vbabka
Cc: syzbot, syzkaller-bugs
syzbot ci has tested the following series
[v2] Implement a new generic pagewalk API
https://lore.kernel.org/all/20260426125719.24698-1-osalvador@suse.de
* [RFC PATCH v2 1/7] mm: Add softleaf_from_pud
* [RFC PATCH v2 2/7] mm: Add {pmd,pud}_huge_lock helper
* [RFC PATCH v2 3/7] mm: Implement folio_pmd_batch
* [RFC PATCH v2 4/7] mm: Implement pt_range_walk
* [RFC PATCH v2 5/7] mm: Make /proc/pid/smaps use the new generic pagewalk API
* [RFC PATCH v2 6/7] mm: Make /proc/pid/numa_maps use the new generic pagewalk API
* [RFC PATCH v2 7/7] mm: Make /proc/pid/pagemap use the new generic pagewalk API
and found the following issues:
* KASAN: slab-out-of-bounds Write in pagemap_read
* WARNING in __page_table_check_pmds_set
* WARNING in pt_range_walk
* WARNING: bad unlock balance in pt_range_walk
Full report is available here:
https://ci.syzbot.org/series/409219de-ca42-45a5-9204-0b315095160c
***
KASAN: slab-out-of-bounds Write in pagemap_read
tree: mm-new
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/akpm/mm.git
base: 819bd270abf9de3b7f306e233054b85a07c47820
arch: amd64
compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config: https://ci.syzbot.org/builds/0361816f-57b2-43e4-aef2-186d8d35fdc9/config
syz repro: https://ci.syzbot.org/findings/7f740037-7398-46f2-aa01-a57caa186d06/syz_repro
==================================================================
BUG: KASAN: slab-out-of-bounds in add_to_pagemap fs/proc/task_mmu.c:1776 [inline]
BUG: KASAN: slab-out-of-bounds in pagemap_read_walk_range fs/proc/task_mmu.c:2949 [inline]
BUG: KASAN: slab-out-of-bounds in pagemap_read+0x1d60/0x2810 fs/proc/task_mmu.c:3092
Write of size 8 at addr ffff88816f1bd000 by task syz.1.18/5957
CPU: 1 UID: 0 PID: 5957 Comm: syz.1.18 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0xba/0x230 mm/kasan/report.c:482
kasan_report+0x117/0x150 mm/kasan/report.c:595
add_to_pagemap fs/proc/task_mmu.c:1776 [inline]
pagemap_read_walk_range fs/proc/task_mmu.c:2949 [inline]
pagemap_read+0x1d60/0x2810 fs/proc/task_mmu.c:3092
vfs_read+0x20c/0xa70 fs/read_write.c:572
ksys_pread64 fs/read_write.c:765 [inline]
__do_sys_pread64 fs/read_write.c:773 [inline]
__se_sys_pread64 fs/read_write.c:770 [inline]
__x64_sys_pread64+0x199/0x230 fs/read_write.c:770
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f340679cdd9
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f34075d2028 EFLAGS: 00000246 ORIG_RAX: 0000000000000011
RAX: ffffffffffffffda RBX: 00007f3406a15fa0 RCX: 00007f340679cdd9
RDX: 0000000000019020 RSI: 0000200000000200 RDI: 0000000000000003
RBP: 00007f3406832d69 R08: 0000000000000000 R09: 0000000000000000
R10: 0000001000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f3406a16038 R14: 00007f3406a15fa0 R15: 00007ffdb721f7e8
</TASK>
Allocated by task 5957:
kasan_save_stack mm/kasan/common.c:57 [inline]
kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
poison_kmalloc_redzone mm/kasan/common.c:398 [inline]
__kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:415
kasan_kmalloc include/linux/kasan.h:263 [inline]
__kmalloc_cache_noprof+0x31c/0x660 mm/slub.c:5339
kmalloc_noprof include/linux/slab.h:962 [inline]
kmalloc_array_noprof include/linux/slab.h:1109 [inline]
pagemap_read+0x27d/0x2810 fs/proc/task_mmu.c:3033
vfs_read+0x20c/0xa70 fs/read_write.c:572
ksys_pread64 fs/read_write.c:765 [inline]
__do_sys_pread64 fs/read_write.c:773 [inline]
__se_sys_pread64 fs/read_write.c:770 [inline]
__x64_sys_pread64+0x199/0x230 fs/read_write.c:770
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
The buggy address belongs to the object at ffff88816f1bc000
which belongs to the cache kmalloc-4k of size 4096
The buggy address is located 0 bytes to the right of
allocated 4096-byte region [ffff88816f1bc000, ffff88816f1bd000)
The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff88816f1b8000 pfn:0x16f1b8
head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x57ff00000000240(workingset|head|node=1|zone=2|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 057ff00000000240 ffff888100042140 ffff888160400a08 ffffea0005c6fe10
raw: ffff88816f1b8000 0000000000040003 00000000f5000000 0000000000000000
head: 057ff00000000240 ffff888100042140 ffff888160400a08 ffffea0005c6fe10
head: ffff88816f1b8000 0000000000040003 00000000f5000000 0000000000000000
head: 057ff00000000003 ffffea0005bc6e01 00000000ffffffff 00000000ffffffff
head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd2040(__GFP_IO|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 5715, tgid 5715 (rm), ts 50636400155, free_ts 46909302793
set_page_owner include/linux/page_owner.h:32 [inline]
post_alloc_hook+0x231/0x280 mm/page_alloc.c:1889
prep_new_page mm/page_alloc.c:1897 [inline]
get_page_from_freelist+0x24dc/0x2580 mm/page_alloc.c:3962
__alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5250
alloc_slab_page mm/slub.c:3255 [inline]
allocate_slab+0x77/0x660 mm/slub.c:3444
new_slab mm/slub.c:3502 [inline]
refill_objects+0x331/0x3c0 mm/slub.c:7134
refill_sheaf mm/slub.c:2804 [inline]
__pcs_replace_empty_main+0x2b9/0x620 mm/slub.c:4578
alloc_from_pcs mm/slub.c:4681 [inline]
slab_alloc_node mm/slub.c:4815 [inline]
__do_kmalloc_node mm/slub.c:5218 [inline]
__kmalloc_noprof+0x474/0x760 mm/slub.c:5231
kmalloc_noprof include/linux/slab.h:966 [inline]
tomoyo_realpath_from_path+0xe3/0x5d0 security/tomoyo/realpath.c:251
tomoyo_get_realpath security/tomoyo/file.c:151 [inline]
tomoyo_path_perm+0x283/0x560 security/tomoyo/file.c:827
tomoyo_path_unlink+0xaa/0xf0 security/tomoyo/tomoyo.c:162
security_path_unlink+0x15f/0x330 security/security.c:1457
filename_unlinkat+0x349/0x610 fs/namei.c:5537
__do_sys_unlink fs/namei.c:5575 [inline]
__se_sys_unlink+0x2e/0x140 fs/namei.c:5572
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
page last free pid 5257 tgid 5257 stack trace:
reset_page_owner include/linux/page_owner.h:25 [inline]
__free_pages_prepare mm/page_alloc.c:1433 [inline]
__free_frozen_pages+0xc2b/0xdb0 mm/page_alloc.c:2978
__slab_free+0x263/0x2b0 mm/slub.c:5532
qlink_free mm/kasan/quarantine.c:163 [inline]
qlist_free_all+0x97/0x100 mm/kasan/quarantine.c:179
kasan_quarantine_reduce+0x148/0x160 mm/kasan/quarantine.c:286
__kasan_slab_alloc+0x22/0x80 mm/kasan/common.c:350
kasan_slab_alloc include/linux/kasan.h:253 [inline]
slab_post_alloc_hook mm/slub.c:4501 [inline]
slab_alloc_node mm/slub.c:4830 [inline]
kmem_cache_alloc_noprof+0x2bc/0x650 mm/slub.c:4837
alloc_filename fs/namei.c:142 [inline]
do_getname+0x2e/0x250 fs/namei.c:182
getname_flags fs/namei.c:225 [inline]
getname include/linux/fs.h:2512 [inline]
class_filename_constructor include/linux/fs.h:2539 [inline]
__do_sys_unlink fs/namei.c:5574 [inline]
__se_sys_unlink+0x1e/0x140 fs/namei.c:5572
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Memory state around the buggy address:
ffff88816f1bcf00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff88816f1bcf80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff88816f1bd000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff88816f1bd080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88816f1bd100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
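The faulting write lands at ffff88816f1bd000, exactly one 8-byte entry
past the 4096-byte pm.buffer region noted above. That is the classic
signature of an append helper that stores before checking capacity,
combined with some caller loop that keeps iterating after the
buffer-full return value. A minimal user-space analogue of this failure
class (purely illustrative, hypothetical names, not the kernel code):

  #include <stdio.h>

  #define BUF_LEN  512   /* 512 entries of 8 bytes each: 4096 bytes */
  #define BUF_FULL 1

  struct sink {
      unsigned long buf[BUF_LEN];
      size_t pos;
  };

  /* Store first, then report "full": safe only if every caller loop
   * actually stops on the BUF_FULL return value. */
  static int append(struct sink *s, unsigned long v)
  {
      s->buf[s->pos++] = v;
      return s->pos >= BUF_LEN ? BUF_FULL : 0;
  }

  int main(void)
  {
      struct sink s = { .pos = 0 };

      for (unsigned long i = 0; i < 1024; i++) {
          /* Dropping this check at just one loop level lets the
           * next append() write one slot past buf[], the same
           * out-of-bounds shape KASAN flags. */
          if (append(&s, i))
              break;
      }
      printf("stopped at pos=%zu\n", s.pos);
      return 0;
  }

Note that the guard lives in the caller: an outer walk loop that
resumes iteration without rechecking the inner loop's error code is
enough to overflow.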
***
WARNING in __page_table_check_pmds_set
tree: mm-new
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/akpm/mm.git
base: 819bd270abf9de3b7f306e233054b85a07c47820
arch: amd64
compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config: https://ci.syzbot.org/builds/0361816f-57b2-43e4-aef2-186d8d35fdc9/config
syz repro: https://ci.syzbot.org/findings/f8fe52b4-cef7-4d25-8e60-cbbb1a4f4df5/syz_repro
------------[ cut here ]------------
softleaf_cached_writable(entry)
WARNING: mm/page_table_check.c:227 at page_table_check_pmd_flags mm/page_table_check.c:227 [inline], CPU#1: syz.0.17/6022
WARNING: mm/page_table_check.c:227 at __page_table_check_pmds_set+0x18e/0x340 mm/page_table_check.c:240, CPU#1: syz.0.17/6022
Modules linked in:
CPU: 1 UID: 0 PID: 6022 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:page_table_check_pmd_flags mm/page_table_check.c:227 [inline]
RIP: 0010:__page_table_check_pmds_set+0x18e/0x340 mm/page_table_check.c:240
Code: 89 fe e8 85 44 89 ff 48 b8 00 00 00 00 00 00 00 94 4c 01 f8 48 c1 e8 3b 0f 94 c0 4d 39 e5 76 0f 84 c0 74 0b e8 83 3f 89 ff 90 <0f> 0b 90 eb 05 e8 78 3f 89 ff 31 ff 89 ee e8 af 43 89 ff 41 89 ef
RSP: 0018:ffffc90003ce7af8 EFLAGS: 00010293
RAX: ffffffff823c4d7d RBX: ffff888111416001 RCX: ffff88816cbd1d00
RDX: 0000000000000000 RSI: 6c00000000000000 RDI: 6c00000000000000
RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000003
R10: 0000000000000002 R11: 0000000000000000 R12: c7fffffffffffffe
R13: dfffffffdc9bfe06 R14: ffff888111416008 R15: 6c00000000000000
FS: 00007ff0386af6c0(0000) GS:ffff8882a9466000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000555574a1a9e8 CR3: 00000001063c6000 CR4: 00000000000006f0
Call Trace:
<TASK>
page_table_check_pmds_set include/linux/page_table_check.h:92 [inline]
set_pmd_at arch/x86/include/asm/pgtable.h:1234 [inline]
make_uffd_wp_pmd fs/proc/task_mmu.c:1903 [inline]
pagemap_scan_walk fs/proc/task_mmu.c:2631 [inline]
do_pagemap_scan fs/proc/task_mmu.c:2785 [inline]
do_pagemap_cmd+0x3bd8/0x4310 fs/proc/task_mmu.c:3131
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:597 [inline]
__se_sys_ioctl+0xfc/0x170 fs/ioctl.c:583
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ff03779cdd9
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ff0386af028 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ff037a16090 RCX: 00007ff03779cdd9
RDX: 00002000000001c0 RSI: 00000000c0606610 RDI: 0000000000000003
RBP: 00007ff037832d69 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ff037a16128 R14: 00007ff037a16090 R15: 00007ffd47ec5488
</TASK>
***
WARNING in pt_range_walk
tree: mm-new
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/akpm/mm.git
base: 819bd270abf9de3b7f306e233054b85a07c47820
arch: amd64
compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config: https://ci.syzbot.org/builds/0361816f-57b2-43e4-aef2-186d8d35fdc9/config
syz repro: https://ci.syzbot.org/findings/02bfe437-a81f-4510-9fff-4b769eb9a0a4/syz_repro
------------[ cut here ]------------
next_addr < vma->vm_start || next_addr >= vma->vm_end
WARNING: mm/pagewalk.c:1052 at pt_range_walk+0x14e/0x35a0 mm/pagewalk.c:1052, CPU#1: syz.2.19/6013
Modules linked in:
CPU: 1 UID: 0 PID: 6013 Comm: syz.2.19 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:pt_range_walk+0x14e/0x35a0 mm/pagewalk.c:1052
Code: 74 08 48 89 df e8 12 1a 15 00 49 89 de 48 8b 1b 48 8b 7c 24 10 48 89 de e8 6f a5 aa ff 48 39 5c 24 10 73 13 e8 03 a3 aa ff 90 <0f> 0b 90 b8 01 00 00 00 e9 88 2b 00 00 49 8d 5e 08 48 89 d8 48 c1
RSP: 0018:ffffc90003d37920 EFLAGS: 00010293
RAX: ffffffff821b11f0 RBX: 0000200000800000 RCX: ffff8881134f5700
RDX: 0000000000000000 RSI: 0000200000800000 RDI: 0000200000800000
RBP: ffffc90003d37b30 R08: 00000000000000ff R09: 1ffff1102dc8b830
R10: dffffc0000000000 R11: ffffed102dc8b831 R12: ffff8881709b1800
R13: dffffc0000000000 R14: ffff8881709b1800 R15: 0000200000800000
FS: 00007f33141ff6c0(0000) GS:ffff8882a9466000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f331344f0d1 CR3: 000000017150e000 CR4: 00000000000006f0
Call Trace:
<TASK>
pagemap_scan_walk fs/proc/task_mmu.c:2463 [inline]
do_pagemap_scan fs/proc/task_mmu.c:2785 [inline]
do_pagemap_cmd+0x3924/0x4310 fs/proc/task_mmu.c:3131
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:597 [inline]
__se_sys_ioctl+0xfc/0x170 fs/ioctl.c:583
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f331339cdd9
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f33141ff028 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f3313615fa0 RCX: 00007f331339cdd9
RDX: 0000200000000140 RSI: 00000000c0606610 RDI: 0000000000000003
RBP: 00007f3313432d69 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f3313616038 R14: 00007f3313615fa0 R15: 00007fff980e14d8
</TASK>
***
WARNING: bad unlock balance in pt_range_walk
tree: mm-new
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/akpm/mm.git
base: 819bd270abf9de3b7f306e233054b85a07c47820
arch: amd64
compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config: https://ci.syzbot.org/builds/0361816f-57b2-43e4-aef2-186d8d35fdc9/config
syz repro: https://ci.syzbot.org/findings/1e53935d-6673-468b-905d-88da8f1e39f3/syz_repro
loop1: detected capacity change from 0 to 4096
=====================================
WARNING: bad unlock balance detected!
syzkaller #0 Not tainted
-------------------------------------
syz.1.18/5968 is trying to release lock (ptlock_ptr(ptdesc)) at:
[<ffffffff821aeb0a>] spin_unlock include/linux/spinlock.h:389 [inline]
[<ffffffff821aeb0a>] pt_range_walk+0x25a/0x35a0 mm/pagewalk.c:1058
but there are no more locks to release!
other info that might help us debug this:
1 lock held by syz.1.18/5968:
#0: ffff8881626b0340 (&mm->mmap_lock){++++}-{4:4}, at: mmap_read_lock_killable include/linux/mmap_lock.h:601 [inline]
#0: ffff8881626b0340 (&mm->mmap_lock){++++}-{4:4}, at: do_pagemap_scan fs/proc/task_mmu.c:2746 [inline]
#0: ffff8881626b0340 (&mm->mmap_lock){++++}-{4:4}, at: do_pagemap_cmd+0x618/0x4310 fs/proc/task_mmu.c:3131
stack backtrace:
CPU: 1 UID: 0 PID: 5968 Comm: syz.1.18 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
print_unlock_imbalance_bug+0xdc/0xf0 kernel/locking/lockdep.c:5298
__lock_release kernel/locking/lockdep.c:5537 [inline]
lock_release+0x248/0x3d0 kernel/locking/lockdep.c:5889
__raw_spin_unlock include/linux/spinlock_api_smp.h:167 [inline]
_raw_spin_unlock+0x16/0x50 kernel/locking/spinlock.c:186
spin_unlock include/linux/spinlock.h:389 [inline]
pt_range_walk+0x25a/0x35a0 mm/pagewalk.c:1058
pagemap_scan_walk fs/proc/task_mmu.c:2463 [inline]
do_pagemap_scan fs/proc/task_mmu.c:2785 [inline]
do_pagemap_cmd+0x3924/0x4310 fs/proc/task_mmu.c:3131
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:597 [inline]
__se_sys_ioctl+0xfc/0x170 fs/ioctl.c:583
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f0ce439cdd9
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f0ce5214028 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f0ce4615fa0 RCX: 00007f0ce439cdd9
RDX: 0000200000000140 RSI: 00000000c0606610 RDI: 0000000000000004
RBP: 00007f0ce4432d69 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f0ce4616038 R14: 00007f0ce4615fa0 R15: 00007fffbeb73768
</TASK>
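For readers less used to lockdep output: an "unlock balance" splat
means a context released a lock it never acquired, or released it one
time too many. A tiny user-space analogue built on an error-checking
mutex (purely illustrative; nothing below is kernel code):

  #include <errno.h>
  #include <pthread.h>
  #include <stdio.h>

  int main(void)
  {
      pthread_mutex_t m;
      pthread_mutexattr_t attr;

      pthread_mutexattr_init(&attr);
      pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
      pthread_mutex_init(&m, &attr);

      /* Release without a matching acquire: the same class of bug
       * the splat above reports for ptlock_ptr(ptdesc). */
      if (pthread_mutex_unlock(&m) == EPERM)
          fprintf(stderr, "unlock without matching lock\n");

      pthread_mutex_destroy(&m);
      pthread_mutexattr_destroy(&attr);
      return 0;
  }

(Build with -pthread.)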
***
If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
Tested-by: syzbot@syzkaller.appspotmail.com
---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.
To test a patch for this bug, please reply with `#syz test`
(should be on a separate line).
The patch should be attached to the email.
Note: arguments like custom git repos and branches are not supported.
end of thread
Thread overview: 11+ messages
2026-04-26 12:57 [RFC PATCH v2 0/7] Implement a new generic pagewalk API Oscar Salvador
2026-04-26 12:57 ` [RFC PATCH v2 1/7] mm: Add softleaf_from_pud Oscar Salvador
2026-04-26 12:57 ` [RFC PATCH v2 2/7] mm: Add {pmd,pud}_huge_lock helper Oscar Salvador
2026-04-26 12:57 ` [RFC PATCH v2 3/7] mm: Implement folio_pmd_batch Oscar Salvador
2026-04-26 12:57 ` [RFC PATCH v2 4/7] mm: Implement pt_range_walk Oscar Salvador
2026-04-26 12:57 ` [RFC PATCH v2 5/7] mm: Make /proc/pid/smaps use the new generic pagewalk API Oscar Salvador
2026-04-26 12:57 ` [RFC PATCH v2 6/7] mm: Make /proc/pid/numa_maps " Oscar Salvador
2026-04-26 12:57 ` [RFC PATCH v2 7/7] mm: Make /proc/pid/pagemap " Oscar Salvador
2026-04-26 13:11 ` [RFC PATCH v2 0/7] Implement a " Andrew Morton
2026-04-26 19:01 ` [syzbot ci] " syzbot ci
-- strict thread matches above, loose matches on Subject: below --
2026-04-12 17:42 [RFC PATCH 0/7] " Oscar Salvador
2026-04-13 7:38 ` [syzbot ci] " syzbot ci