From: Lorenzo Stoakes <ljs@kernel.org>
To: Oscar Salvador <osalvador@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Michal Hocko <mhocko@suse.com>,
Muchun Song <muchun.song@linux.dev>,
Vlastimil Babka <vbabka@kernel.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [RFC PATCH v3 0/8] Implement a new generic pagewalk API
Date: Thu, 11 Jun 2026 13:07:21 +0100
Message-ID: <aiqkyOCy2rf99wxg@lucifer>
In-Reply-To: <20260525165528.184397-1-osalvador@suse.de>
-cc wrong email
Sorry to be a pain but I only just noticed this because it's going to the
wrong email address :)
Could you make sure to send future revisions to ljs@kernel.org?
Thanks, Lorenzo
On Mon, May 25, 2026 at 06:55:20PM +0200, Oscar Salvador wrote:
> Changelog:
> rfcv2 -> rfcv3:
> - Fix an out-of-bounds write
> - Convert clear_refs to the new API
> - Fix issue when reading cont-PMDs
> rfc -> rfcv2:
> - Add pte_hole functionality
> - Fix pagemap issues
> - Fix shmem in smap
> - Testing with pagemap "testsuite"
>
> [WARNING]
>
> This is not yet fully complete, but before investing more time into it I would like
> to know whether 1) this is heading in the right direction and 2) this is something
> we are still interested in.
> There are still things that need work:
>
> - convert make_uffd_wp_huge_pte: Since hugetlb is treated like a
> PTE, it inherited PTE_MARKERs when those came into play, and
> AFAIK, those are mostly used for UFFD.
> From here on we have two options: 1) find another way to deal with
> UFFD without markers or 2) introduce markers for PMD and PUD level.
> I am leaning towards option 1), because 2) seems a bit unfair.
> I still need to put some thought into it and see how we can achieve
> that.
>
> - Teach the new API how to use other kinds of locks. E.g. pagemap scan
> needs to take i_mmap_lock during the scanning, so we need to be able to
> take that lock. I have some ideas on how to do that, but that is something
> for the next version.
>
> - Find corner-cases and fix them.
>
>
> Kudos go to David, who suggested the interface, gave me some
> ideas on where to begin, and provided feedback in the early stages
> (in case there is something stupid, don't blame him, blame me).
>
> Also, I would like to thank Vlastimil, who helped me run this
> patchset through Claude quite a few times to find things that needed fixing.
>
> [/WARNING]
>
> [TESTING]
> Part of the testing has been to duplicate
> /proc/$$/(pagemap,smaps,numa_maps,clear_refs) as files with a
> _lab extension that are linked to the old API.
> That way I could check whether the output of e.g. /proc/$$/smaps
> and /proc/$$/smaps_lab was the same for any given program.
> I did the same for pagemap and numa_maps.
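The smaps/smaps_lab comparison described above can be automated. Below is a minimal sketch in C. It assumes the test-only *_lab files from this series exist on the running kernel. files_identical() is a hypothetical helper, not part of the patches. It is only suitable for the text files (smaps, numa_maps). pagemap is binary and indexed by virtual page, so it would need seek-based reads instead. The two files are also read at different times, so the target process should be idle or the outputs may legitimately differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: return true if the two files have byte-identical
 * contents, false if they differ or either one cannot be opened.
 * Intended for text files such as /proc/<pid>/smaps vs smaps_lab.
 */
static bool files_identical(const char *path_a, const char *path_b)
{
	FILE *a, *b;
	bool same;
	int ca, cb;

	assert(path_a != NULL && path_b != NULL);

	a = fopen(path_a, "r");
	b = fopen(path_b, "r");
	same = (a != NULL && b != NULL);

	while (same) {
		ca = fgetc(a);
		cb = fgetc(b);
		if (ca != cb)
			same = false;	/* differing byte, or one file is shorter */
		else if (ca == EOF)
			break;		/* both files ended at the same point */
	}

	if (a)
		fclose(a);
	if (b)
		fclose(b);
	return same;
}
```

For example, files_identical("/proc/1168/smaps", "/proc/1168/smaps_lab") returns true only if both APIs produced exactly the same report.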
>
> Also, regarding pagemap:
> So far, tools/mm/page-types.c reports the right outcome (compared to the old API),
> and tools/testing/selftests/mm/pagemap_ioctl.c only reports 4 failing tests.
> Although to be honest, I do not know how much I should trust that one, because if I
> add a few delays in the userspace code, some tests that were failing before no longer
> fail, so yeah.
>
> localhost:~/workspace # ./page-types -p 1168
>              flags  page-count   MB  symbolic-flags                               long-symbolic-flags
> 0x0000000000000800           1    0  ___________M_______________________________  mmap
> 0x0000000000000828           2    0  ___U_l_____M_______________________________  uptodate,lru,mmap
> 0x000000000000082c           1    0  __RU_l_____M_______________________________  referenced,uptodate,lru,mmap
> 0x0000000000004838           1    0  ___UDl_____M__b____________________________  uptodate,dirty,lru,mmap,swapbacked
> 0x000000000000086c         423    1  __RU_lA____M_______________________________  referenced,uptodate,lru,active,mmap
> 0x0000000000205828          29    0  ___U_l_____Ma_b______x_____________________  uptodate,lru,mmap,anonymous,swapbacked,ksm
> 0x000000000020586c           1    0  __RU_lA____Ma_b______x_____________________  referenced,uptodate,lru,active,mmap,anonymous,swapbacked,ksm
>              total         458    1
>
> localhost:~/workspace # ./page-types_lab -p 1168
>              flags  page-count   MB  symbolic-flags                               long-symbolic-flags
> 0x0000000000000804           1    0  __R________M_______________________________  referenced,mmap
> 0x0000000000000828           2    0  ___U_l_____M_______________________________  uptodate,lru,mmap
> 0x000000000000082c           1    0  __RU_l_____M_______________________________  referenced,uptodate,lru,mmap
> 0x0000000000004838           1    0  ___UDl_____M__b____________________________  uptodate,dirty,lru,mmap,swapbacked
> 0x000000000000086c         423    1  __RU_lA____M_______________________________  referenced,uptodate,lru,active,mmap
> 0x0000000000205828          29    0  ___U_l_____Ma_b______x_____________________  uptodate,lru,mmap,anonymous,swapbacked,ksm
> 0x000000000020586c           1    0  __RU_lA____Ma_b______x_____________________  referenced,uptodate,lru,active,mmap,anonymous,swapbacked,ksm
>              total         458    1
>
> page-types uses the new API and page-types_lab the old one.
>
> # ./pagemap_ioctl
> TAP version 13
> 1..117
> ok 1 sanity_tests_sd Zero range size is valid
> ok 2 sanity_tests_sd output bu
> ok 35 Walk_end: 1 max page
> ok 36 Page testing: all new pages must not be written (dirty)
> ok 37 Page testing: all pages must be written (dirty)
> ok 38 Page testing: all pages dirty other than first and the last one
> ok 39 Page testing: PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC
> ok 40 Page testing: only middle page dirty
> ok 41 Page testing: only two middle pages dirty
> ok 42 Large Page testing: all new pages must not be written (dirty)
> ok 43 Large Page testing: all pages must be written (dirty)
> ok 44 Large Page testing: all pages dirty other than first and the last one
> ok 45 Large Page testing: PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC
> ok 46 Large Page testing: only middle page dirty
> ok 47 Large Page testing: only two middle pages dirty
> ok 48 Huge page testing: all new pages must not be written (dirty)
> ok 49 Huge page testing: all pages must be written (dirty)
> ok 50 Huge page testing: all pages dirty other than first and the last one
> ok 51 Huge page testing: PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC
> ok 52 Huge page testing: only middle page dirty
> ok 53 Huge page testing: only two middle pages dirty
> ok 54 Hugetlb shmem testing: all new pages must not be written (dirty)
> ok 55 Hugetlb shmem testing: all pages must be written (dirty)
> ok 56 Hugetlb shmem testing: all pages dirty other than first and the last one
> ok 57 Hugetlb shmem testing: PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC
> ok 58 Hugetlb shmem testing: only middle page dirty
> not ok 59 Hugetlb shmem testing: only two middle pages dirty
> ok 60 Hugetlb mem testing: all new pages must not be written (dirty)
> ok 61 Hugetlb mem testing: all pages must be written (dirty)
> ok 62 Hugetlb mem testing: all pages dirty other than first and the last one
> ok 63 Hugetlb mem testing: PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC
> ok 64 Hugetlb mem testing: only middle page dirty
> not ok 65 Hugetlb mem testing: only two middle pages dirty
> ok 66 Hugetlb shmem testing: all new pages must not be written (dirty)
> ok 67 Hugetlb shmem testing: all pages must be written (dirty)
> ok 68 Hugetlb shmem testing: all pages dirty other than first and the last one
> ok 69 Hugetlb shmem testing: PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC
> ok 70 Hugetlb shmem testing: only middle page dirty
> not ok 71 Hugetlb shmem testing: only two middle pages dirty
> ok 72 File memory testing: all new pages must not be written (dirty)
> ok 73 File memory testing: all p
> # Totals: pass:113 fail:4 xfail:0 xpass:0 skip:0 error:0
>
> [/TESTING]
>
> At LSF/MM/BPF 2025, there was general agreement that we would like to have 1)
> a generic pagewalk API 2) that replaces the existing callback-based one if possible
> and 3) that HugeTLB can use without the need to special-case it (e.g. not having to
> depend on .hugetlb_entry callbacks). The current special-casing means a lot of
> duplicated code, just because of HugeTLB lore.
>
> The pt_range_walk API tries to do that. It replaces the old behaviour of "in
> HugeTLB world everything reads as a PTE" and instead reads HugeTLB entries
> the way they really are, interpreting them as PMD/PUD entries and
> contiguous-PMD/PTE entries.
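To make "the way they really are" concrete, consider arm64 with a 4K granule. There a contiguous mapping is 16 adjacent entries, so each HugeTLB size corresponds to a different page-table shape. The C sketch below illustrates that mapping. The enum, the function names and the local SZ_* stand-ins (the kernel's own live in <linux/sizes.h>) are hypothetical and not taken from the series. Other architectures and granules differ.

```c
#include <assert.h>

/* Hypothetical classification of how a HugeTLB page is mapped. */
enum hugetlb_level {
	HUGETLB_LEVEL_UNKNOWN,
	HUGETLB_LEVEL_CONT_PTE,	/* several contiguous PTEs */
	HUGETLB_LEVEL_PMD,	/* a single PMD entry */
	HUGETLB_LEVEL_CONT_PMD,	/* several contiguous PMDs */
	HUGETLB_LEVEL_PUD,	/* a single PUD entry */
};

/* Local stand-ins for the kernel's SZ_* constants. */
#define SZ_4K	(4UL << 10)
#define SZ_64K	(64UL << 10)
#define SZ_2M	(2UL << 20)
#define SZ_32M	(32UL << 20)
#define SZ_1G	(1UL << 30)

/*
 * Assumes arm64 with a 4K granule: 16 x 4K = 64K (cont-PTE),
 * 16 x 2M = 32M (cont-PMD).
 */
static enum hugetlb_level hugetlb_level_for_size(unsigned long size)
{
	switch (size) {
	case SZ_64K:	return HUGETLB_LEVEL_CONT_PTE;
	case SZ_2M:	return HUGETLB_LEVEL_PMD;
	case SZ_32M:	return HUGETLB_LEVEL_CONT_PMD;
	case SZ_1G:	return HUGETLB_LEVEL_PUD;
	default:	return HUGETLB_LEVEL_UNKNOWN;
	}
}

/* How many page-table entries one HugeTLB page of this size occupies. */
static unsigned int hugetlb_nr_entries(unsigned long size)
{
	switch (hugetlb_level_for_size(size)) {
	case HUGETLB_LEVEL_CONT_PTE:	return SZ_64K / SZ_4K;	/* 16 PTEs */
	case HUGETLB_LEVEL_CONT_PMD:	return SZ_32M / SZ_2M;	/* 16 PMDs */
	case HUGETLB_LEVEL_PMD:
	case HUGETLB_LEVEL_PUD:		return 1;
	default:			return 0;
	}
}
```

Under the old model, all of these would be read as a single "PTE". A level-aware walker instead has to know which level and how many entries it is looking at.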
>
> To achieve that, we need some infrastructure we did not really need until
> now, in order to be able to read HugeTLB pages as PUD/PMD entries.
> E.g. softleaf_from_pud had to be added, along with some other pud_* functions.
>
> In short, this API walks an address range and returns
> whatever is in there (swap/hwpoison/migration/marker entries, folios,
> PFN and device entries, or nothing).
>
> These are the internal return types the API uses:
>
> PT_TYPE_NONE
> PT_TYPE_FOLIO
> PT_TYPE_MARKER
> PT_TYPE_PFN
> PT_TYPE_SWAP
> PT_TYPE_MIGRATION
> PT_TYPE_DEVICE
> PT_TYPE_HWPOISON
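A caller would typically switch on the returned type. Only the PT_TYPE_* names in the sketch below come from the list above. struct pt_result, struct mem_stats and account_result() are hypothetical, made up to illustrate smaps-style accounting. The sketch counts only folios as resident and swap entries as swap, whereas the real smaps code handles more cases.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4K base pages */

/* Return types as listed in the cover letter. */
enum pt_type {
	PT_TYPE_NONE,
	PT_TYPE_FOLIO,
	PT_TYPE_MARKER,
	PT_TYPE_PFN,
	PT_TYPE_SWAP,
	PT_TYPE_MIGRATION,
	PT_TYPE_DEVICE,
	PT_TYPE_HWPOISON,
};

/* Hypothetical per-step result: what was found and how many pages it spans. */
struct pt_result {
	enum pt_type type;
	unsigned long nr_pages;
};

/* Hypothetical smaps-like counters, in bytes. */
struct mem_stats {
	unsigned long resident;
	unsigned long swap;
};

static void account_result(struct mem_stats *st, const struct pt_result *r)
{
	unsigned long bytes = r->nr_pages * SKETCH_PAGE_SIZE;

	switch (r->type) {
	case PT_TYPE_FOLIO:
		st->resident += bytes;
		break;
	case PT_TYPE_SWAP:
		st->swap += bytes;
		break;
	case PT_TYPE_NONE:
	case PT_TYPE_MARKER:
	case PT_TYPE_PFN:
	case PT_TYPE_MIGRATION:
	case PT_TYPE_DEVICE:
	case PT_TYPE_HWPOISON:
		/* Not counted in this simplified sketch. */
		break;
	}
}
```

Because a result can cover a whole batch (e.g. a 2M folio reported as one entry), the accounting multiplies by nr_pages instead of assuming one page per step.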
>
> The API also handles locking and batching itself, so the caller
> does not really need to bother with that.
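From the caller's side, "the API handles locking and batching" means the loop body only ever sees results. The self-contained simulation below assumes a fake page table of per-page kinds and a simulated lock counter. walk_state, walk_next() and the lock helpers are hypothetical. The cover letter does not say whether the real pt_range_walk drops the lock between steps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entry kinds; a stand-in for the PT_TYPE_* values. */
enum kind { KIND_NONE, KIND_PRESENT };

/* Hypothetical walker state over a fake "page table" of per-page kinds. */
struct walk_state {
	const enum kind *table;
	size_t nr;		/* number of entries in table */
	size_t pos;		/* next entry to examine */
	int lock_depth;		/* simulated page-table lock */
};

static void ptl_lock(struct walk_state *w)
{
	w->lock_depth++;
}

static void ptl_unlock(struct walk_state *w)
{
	w->lock_depth--;
}

/*
 * Report the next run of same-kind entries. The lock is taken and released
 * internally, so the caller never handles it, and consecutive entries of
 * the same kind are batched into a single result.
 */
static bool walk_next(struct walk_state *w, enum kind *kind, size_t *nr)
{
	size_t start = w->pos;

	if (w->pos >= w->nr)
		return false;

	ptl_lock(w);
	*kind = w->table[start];
	while (w->pos < w->nr && w->table[w->pos] == *kind)
		w->pos++;
	*nr = w->pos - start;
	ptl_unlock(w);
	return true;
}
```

A caller then reduces to a plain loop such as while (walk_next(&w, &k, &n)) { ... }, with no lock or batching logic of its own.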
>
> To handle contiguous-PMD mapped hugetlb pages, folio_pmd_batch,
> an analogue of folio_pte_batch, has been implemented.
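By analogy with folio_pte_batch, a PMD batch counts how many consecutive entries map physically consecutive chunks with matching attributes. That lets a cont-PMD hugetlb page (e.g. 16 x 2M on arm64 with 4K pages) be handled as one unit. Below is a simplified sketch under those assumptions. struct fake_pmd and fake_pmd_batch() are hypothetical and do not reflect folio_pmd_batch's real signature or the full set of bits it compares. The caller is assumed to clamp max_nr to what remains of the folio.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of one PMD entry. */
struct fake_pmd {
	unsigned long pfn;	/* first PFN mapped by this entry */
	bool present;
	bool writable;
};

#define PFNS_PER_PMD 512UL	/* assumption: 2M PMD with 4K pages */

/*
 * Count how many entries, starting at pmds[0], map physically consecutive
 * 2M chunks with the same attributes as pmds[0], up to max_nr entries.
 */
static unsigned int fake_pmd_batch(const struct fake_pmd *pmds,
				   unsigned int max_nr)
{
	unsigned int nr = 1;

	assert(max_nr >= 1 && pmds[0].present);

	while (nr < max_nr) {
		const struct fake_pmd *p = &pmds[nr];

		if (!p->present || p->writable != pmds[0].writable ||
		    p->pfn != pmds[0].pfn + nr * PFNS_PER_PMD)
			break;
		nr++;
	}
	return nr;
}
```

Stopping at the first mismatch keeps the batch conservative: anything that differs is reported as a separate result.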
>
> More information about the API can be found in patch #4.
>
> This was tested on x86_64 and arm64, but as I said, it is still
> incomplete, hence the RFC: to gather some initial feedback before
> investing more time into this.
>
> For now, all users of the old API from fs/proc/task_mmu.c have been
> converted: /proc/pid/(smaps|numa_maps|pagemap|clear_refs).
>
> Thanks in advance
>
> Oscar Salvador (8):
> mm: Add softleaf_from_pud
> mm: Add {pmd,pud}_huge_lock helper
> mm: Implement folio_pmd_batch
> mm: Implement pt_range_walk
> mm: Make /proc/pid/smaps use the new generic pagewalk API
> mm: Make /proc/pid/numa_maps use the new generic pagewalk API
> mm: Make /proc/pid/pagemap use the new generic pagewalk API
> mm: Make /proc/pid/clear_refs use the new generic pagewalk API
>
> arch/arm64/include/asm/pgtable.h | 41 +
> arch/loongarch/include/asm/pgtable.h | 1 +
> arch/powerpc/include/asm/book3s/64/pgtable.h | 7 +
> arch/s390/include/asm/pgtable.h | 38 +
> arch/x86/include/asm/pgtable.h | 53 +
> arch/x86/include/asm/pgtable_64.h | 2 +
> arch/x86/mm/pgtable.c | 18 +-
> fs/proc/task_mmu.c | 2295 ++++++++----------
> include/asm-generic/pgtable_uffd.h | 15 +
> include/linux/leafops.h | 46 +
> include/linux/mm.h | 2 +
> include/linux/mm_inline.h | 32 +
> include/linux/pagewalk.h | 106 +
> include/linux/pgtable.h | 95 +
> mm/internal.h | 75 +-
> mm/memory.c | 22 +
> mm/pagewalk.c | 400 +++
> mm/pgtable-generic.c | 21 +
> 18 files changed, 2039 insertions(+), 1230 deletions(-)
>
> --
> 2.53.0
>
Thread overview: 16+ messages
2026-05-25 16:55 [RFC PATCH v3 0/8] Implement a new generic pagewalk API Oscar Salvador
2026-05-25 16:55 ` [RFC PATCH v3 1/8] mm: Add softleaf_from_pud Oscar Salvador
2026-05-25 16:55 ` [RFC PATCH v3 2/8] mm: Add {pmd,pud}_huge_lock helper Oscar Salvador
2026-05-25 16:55 ` [RFC PATCH v3 3/8] mm: Implement folio_pmd_batch Oscar Salvador
2026-05-25 16:55 ` [RFC PATCH v3 4/8] mm: Implement pt_range_walk Oscar Salvador
2026-05-25 16:55 ` [RFC PATCH v3 5/8] mm: Make /proc/pid/smaps use the new generic pagewalk API Oscar Salvador
2026-05-25 16:55 ` [RFC PATCH v3 6/8] mm: Make /proc/pid/numa_maps " Oscar Salvador
2026-05-25 16:55 ` [RFC PATCH v3 7/8] mm: Make /proc/pid/pagemap " Oscar Salvador
2026-05-25 16:55 ` [RFC PATCH v3 8/8] mm: Make /proc/pid/clear_refs " Oscar Salvador
2026-05-26 5:49 ` [syzbot ci] Re: Implement a " syzbot ci
2026-06-11 12:01 ` Oscar Salvador (SUSE)
2026-06-11 13:22 ` syzbot ci
2026-06-11 13:52 ` Oscar Salvador (SUSE)
2026-06-11 15:08 ` syzbot ci
2026-06-11 12:07 ` Lorenzo Stoakes [this message]
2026-06-11 12:29 ` [RFC PATCH v3 0/8] " Oscar Salvador (SUSE)