From: Lorenzo Stoakes <ljs@kernel.org>
To: Oscar Salvador <osalvador@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	 David Hildenbrand <david@kernel.org>,
	Michal Hocko <mhocko@suse.com>,
	 Muchun Song <muchun.song@linux.dev>,
	Vlastimil Babka <vbabka@kernel.org>,
	 linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [RFC PATCH v3 0/8] Implement a new generic pagewalk API
Date: Thu, 11 Jun 2026 13:07:21 +0100
Message-ID: <aiqkyOCy2rf99wxg@lucifer>
In-Reply-To: <20260525165528.184397-1-osalvador@suse.de>

-cc wrong email

Sorry to be a pain but I only just noticed this because it's going to the
wrong email address :)

Could you make sure to send future revisions to ljs@kernel.org?

Thanks, Lorenzo

On Mon, May 25, 2026 at 06:55:20PM +0200, Oscar Salvador wrote:
> Changelog:
>  rfcv2 -> rfcv3:
>  - Fix an out-of-bounds write
>  - Convert clear_refs to the new API
>  - Fix issue when reading cont-PMDs
>  rfc -> rfcv2:
>  - Add pte_hole functionality
>  - Fix pagemap issues
>  - Fix shmem in smap
>  - Testing with pagemap "testsuite"
>
> [WARNING]
>
> This is not yet fully complete, but before investing more time into it I would like
> to know whether 1) this is heading in the right direction and 2) this is something
> we are still interested in.
> There are still things that need work:
>
> - convert make_uffd_wp_huge_pte: Since hugetlb is being dealt with like a
>   PTE, we inherited PTE_MARKERs for it when those came into play, and
>   AFAIK, those are being used mostly for UFFD.
>   From here on we have two options: 1) find another way to deal with
>   UFFD without markers or 2) introduce markers for PMD and PUD level.
>   I am leaning towards option 1), because 2) seems a bit unfair.
>   I still need to put some thought into it and see how we can achieve
>   that.
>
> - Teach the new API how to use other kinds of locks. E.g: pagemap scan
>   needs to take i_mmap_lock during the scanning, so we need to be able to
>   take that lock. I have some ideas on how to do that, but that is something
>   for the next version.
>
> - Find corner-cases and fix them.
>
>
> Kudos go to David, who suggested the interface, gave me some ideas on
> where to begin, and provided feedback in the early stages (in case there
> is something stupid don't blame him, blame me).
>
> Also, I would like to thank Vlastimil, who helped me run this
> patchset through Claude quite a few times, to catch some bugs.
>
> [/WARNING]
>
> [TESTING]
> Part of the testing has been to duplicate
> /proc/$$/(pagemap,smaps,numa_maps,clear_refs) and have copies with a
> _lab extension linked to the old API.
> That way I could check whether the output from e.g: /proc/$$/smaps
> and /proc/$$/smaps_lab was the same for any given program.
> I did the same for pagemap and numa_maps.
>
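
The comparison described above can be sketched in userspace C. This is only
an illustration, not the tooling used for the patchset: first_diff_line() is
a hypothetical helper, and the _lab files exist only on a kernel carrying the
test duplication described in the quoted text.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: compare two text files line by line.
 * Returns 0 if they are identical, the 1-based number of the first
 * differing line otherwise, or -1 if either file cannot be opened.
 * Line numbers assume lines shorter than the 4096-byte buffer.
 */
static long first_diff_line(const char *path_a, const char *path_b)
{
	char la[4096], lb[4096];
	FILE *a = fopen(path_a, "r");
	FILE *b = fopen(path_b, "r");
	long line = 0, ret = 0;

	if (!a || !b) {
		ret = -1;
		goto out;
	}

	for (;;) {
		char *ra = fgets(la, sizeof(la), a);
		char *rb = fgets(lb, sizeof(lb), b);

		line++;
		if (!ra && !rb)
			break;		/* both files ended: identical */
		if (!ra || !rb || strcmp(la, lb) != 0) {
			ret = line;	/* length or content mismatch */
			break;
		}
	}
out:
	if (a)
		fclose(a);
	if (b)
		fclose(b);
	return ret;
}
```

A caller would pass e.g. "/proc/1168/smaps" and "/proc/1168/smaps_lab". Two
reads of a live process are not atomic with respect to each other, so a
reported difference may need to be confirmed on a quiescent process.
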
> Also, regarding pagemap:
> So far, tools/mm/page-types.c reports the right outcome (compared to the old API),
> and tools/testing/selftests/mm/pagemap_ioctl.c only reports 4 failing tests.
> Although to be honest, I do not know how much I should trust that one, because if I
> add a few delays in the userspace code, some tests that were failing before no longer
> fail, so yeah.
>
>  localhost:~/workspace # ./page-types -p 1168
>               flags     page-count       MB  symbolic-flags                     long-symbolic-flags
>  0x0000000000000800              1        0  ___________M_______________________________        mmap
>  0x0000000000000828              2        0  ___U_l_____M_______________________________        uptodate,lru,mmap
>  0x000000000000082c              1        0  __RU_l_____M_______________________________        referenced,uptodate,lru,mmap
>  0x0000000000004838              1        0  ___UDl_____M__b____________________________        uptodate,dirty,lru,mmap,swapbacked
>  0x000000000000086c            423        1  __RU_lA____M_______________________________        referenced,uptodate,lru,active,mmap
>  0x0000000000205828             29        0  ___U_l_____Ma_b______x_____________________        uptodate,lru,mmap,anonymous,swapbacked,ksm
>  0x000000000020586c              1        0  __RU_lA____Ma_b______x_____________________        referenced,uptodate,lru,active,mmap,anonymous,swapbacked,ksm
>               total            458        1
>
>  localhost:~/workspace # ./page-types_lab -p 1168
>               flags     page-count       MB  symbolic-flags                     long-symbolic-flags
>  0x0000000000000804              1        0  __R________M_______________________________        referenced,mmap
>  0x0000000000000828              2        0  ___U_l_____M_______________________________        uptodate,lru,mmap
>  0x000000000000082c              1        0  __RU_l_____M_______________________________        referenced,uptodate,lru,mmap
>  0x0000000000004838              1        0  ___UDl_____M__b____________________________        uptodate,dirty,lru,mmap,swapbacked
>  0x000000000000086c            423        1  __RU_lA____M_______________________________        referenced,uptodate,lru,active,mmap
>  0x0000000000205828             29        0  ___U_l_____Ma_b______x_____________________        uptodate,lru,mmap,anonymous,swapbacked,ksm
>  0x000000000020586c              1        0  __RU_lA____Ma_b______x_____________________        referenced,uptodate,lru,active,mmap,anonymous,swapbacked,ksm
>               total            458        1
>
> page-types uses the new API and page-types_lab the old one.
>
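
For reading the flags column in the two listings above, here is a minimal
sketch that decodes a value into the long symbolic names page-types prints.
The bit positions are the KPF_* values from
include/uapi/linux/kernel-page-flags.h, limited to the bits that appear in
this output; kpf_decode() and the table are hypothetical helpers, not code
from tools/mm/page-types.c.

```c
#include <stdint.h>
#include <stdio.h>

/* Bit position and the name page-types prints for it (subset only). */
struct kpf_name {
	unsigned int bit;
	const char *name;
};

static const struct kpf_name kpf_names[] = {
	{  2, "referenced" },	/* KPF_REFERENCED */
	{  3, "uptodate"   },	/* KPF_UPTODATE */
	{  4, "dirty"      },	/* KPF_DIRTY */
	{  5, "lru"        },	/* KPF_LRU */
	{  6, "active"     },	/* KPF_ACTIVE */
	{ 11, "mmap"       },	/* KPF_MMAP */
	{ 12, "anonymous"  },	/* KPF_ANON */
	{ 14, "swapbacked" },	/* KPF_SWAPBACKED */
	{ 21, "ksm"        },	/* KPF_KSM */
};

/*
 * Hypothetical helper: write the comma-separated names of the set bits
 * into buf (at most len bytes, always NUL-terminated) and return buf.
 * Bits outside the table above are silently ignored.
 */
static char *kpf_decode(uint64_t flags, char *buf, size_t len)
{
	size_t i, used = 0;

	if (len == 0)
		return buf;
	buf[0] = '\0';

	for (i = 0; i < sizeof(kpf_names) / sizeof(kpf_names[0]); i++) {
		int n;

		if (!(flags & (UINT64_C(1) << kpf_names[i].bit)))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "," : "", kpf_names[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* output truncated, stop here */
		used += (size_t)n;
	}
	return buf;
}
```

Decoding the first row of each listing, 0x800 gives "mmap" and 0x804 gives
"referenced,mmap": the two runs differ only in bit 2 (referenced) for that
one page, while every other row is identical.
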
>  # ./pagemap_ioctl
>  TAP version 13
>  1..117
>  ok 1 sanity_tests_sd Zero range size is valid
>  ok 2 sanity_tests_sd output bu
>  ok 35 Walk_end: 1 max page
>  ok 36 Page testing: all new pages must not be written (dirty)
>  ok 37 Page testing: all pages must be written (dirty)
>  ok 38 Page testing: all pages dirty other than first and the last one
>  ok 39 Page testing: PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC
>  ok 40 Page testing: only middle page dirty
>  ok 41 Page testing: only two middle pages dirty
>  ok 42 Large Page testing: all new pages must not be written (dirty)
>  ok 43 Large Page testing: all pages must be written (dirty)
>  ok 44 Large Page testing: all pages dirty other than first and the last one
>  ok 45 Large Page testing: PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC
>  ok 46 Large Page testing: only middle page dirty
>  ok 47 Large Page testing: only two middle pages dirty
>  ok 48 Huge page testing: all new pages must not be written (dirty)
>  ok 49 Huge page testing: all pages must be written (dirty)
>  ok 50 Huge page testing: all pages dirty other than first and the last one
>  ok 51 Huge page testing: PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC
>  ok 52 Huge page testing: only middle page dirty
>  ok 53 Huge page testing: only two middle pages dirty
>  ok 54 Hugetlb shmem testing: all new pages must not be written (dirty)
>  ok 55 Hugetlb shmem testing: all pages must be written (dirty)
>  ok 56 Hugetlb shmem testing: all pages dirty other than first and the last one
>  ok 57 Hugetlb shmem testing: PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC
>  ok 58 Hugetlb shmem testing: only middle page dirty
>  not ok 59 Hugetlb shmem testing: only two middle pages dirty
>  ok 60 Hugetlb mem testing: all new pages must not be written (dirty)
>  ok 61 Hugetlb mem testing: all pages must be written (dirty)
>  ok 62 Hugetlb mem testing: all pages dirty other than first and the last one
>  ok 63 Hugetlb mem testing: PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC
>  ok 64 Hugetlb mem testing: only middle page dirty
>  not ok 65 Hugetlb mem testing: only two middle pages dirty
>  ok 66 Hugetlb shmem testing: all new pages must not be written (dirty)
>  ok 67 Hugetlb shmem testing: all pages must be written (dirty)
>  ok 68 Hugetlb shmem testing: all pages dirty other than first and the last one
>  ok 69 Hugetlb shmem testing: PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC
>  ok 70 Hugetlb shmem testing: only middle page dirty
>  not ok 71 Hugetlb shmem testing: only two middle pages dirty
>  ok 72 File memory testing: all new pages must not be written (dirty)
>  ok 73 File memory testing: all p
>  # Totals: pass:113 fail:4 xfail:0 xpass:0 skip:0 error:0
>
> [/TESTING]
>
> At LSF/MM/BPF 2025, there was general agreement that we 1) would like to have
> a generic pagewalk API 2) that replaces the existing callback-based one if possible
> and 3) that HugeTLB can use without the need to special case it (e.g: not having to
> depend on .hugetlb_entry callbacks), which today means having a lot of duplicated
> code and also having a lot of special casing just because of hugetlb lore.
>
> pt_range_walk API tries to do that and replaces the old behaviour of "in
> HugeTLB world everything reads as a PTE" and starts reading HugeTLB entries
> the way they really are, that means interpreting them as PMD/PUD entries and
> contiguous-PMD/PTE entries.
>
> In order to achieve that, we need some infrastructure we did not really need until
> now, in order to be able to read HugeTLB pages as PUD/PMD entries.
> E.g: softleaf_from_pud had to be added and some other pud_* functions.
>
> In a few words, this API goes through an address range and returns
> whatever is in there (swap/hwpoison/migration/marker entries, folios,
> pfn and device entries, or nothing).
>
> These are the internal return types the API uses:
>
>  PT_TYPE_NONE
>  PT_TYPE_FOLIO
>  PT_TYPE_MARKER
>  PT_TYPE_PFN
>  PT_TYPE_SWAP
>  PT_TYPE_MIGRATION
>  PT_TYPE_DEVICE
>  PT_TYPE_HWPOISON
>
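
To make the caller's side of this concrete, here is a toy userspace model of
consuming those return types. Only the PT_TYPE_* names come from the list
above. The enum layout, the PT_TYPE_NR sentinel and both helpers are
assumptions made for illustration; the real definitions are in patch #4 and
may look quite different.

```c
#include <stddef.h>

/* Names from the cover letter; ordering and values are assumed. */
enum pt_type {
	PT_TYPE_NONE,
	PT_TYPE_FOLIO,
	PT_TYPE_MARKER,
	PT_TYPE_PFN,
	PT_TYPE_SWAP,
	PT_TYPE_MIGRATION,
	PT_TYPE_DEVICE,
	PT_TYPE_HWPOISON,
	PT_TYPE_NR		/* hypothetical sentinel: number of types */
};

/* Hypothetical helper: human-readable description of a return type. */
static const char *pt_type_desc(enum pt_type type)
{
	switch (type) {
	case PT_TYPE_NONE:	return "nothing";
	case PT_TYPE_FOLIO:	return "folio";
	case PT_TYPE_MARKER:	return "marker entry";
	case PT_TYPE_PFN:	return "pfn";
	case PT_TYPE_SWAP:	return "swap entry";
	case PT_TYPE_MIGRATION:	return "migration entry";
	case PT_TYPE_DEVICE:	return "device entry";
	case PT_TYPE_HWPOISON:	return "hwpoison entry";
	default:		return "unknown";
	}
}

/*
 * Hypothetical consumer: tally how often each type was returned for a
 * walked range, the way a statistics-style user might account for it.
 * counts must have room for PT_TYPE_NR elements.
 */
static void pt_count_types(const enum pt_type *types, size_t n,
			   size_t counts[PT_TYPE_NR])
{
	size_t i;

	for (i = 0; i < PT_TYPE_NR; i++)
		counts[i] = 0;
	for (i = 0; i < n; i++) {
		if ((unsigned int)types[i] < PT_TYPE_NR)
			counts[types[i]]++;
	}
}
```

The point of the model is that one switch over a single type replaces the
per-level callbacks: the consumer never needs to know whether an entry was
found at PTE, PMD or PUD level.
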
> The API also handles locking and batching itself, so the caller
> does not really need to bother with that.
>
> In order to handle contiguous-PMD mapped hugetlb pages, folio_pmd_batch,
> which is analogous to folio_pte_batch, has been implemented.
>
> More information about the API can be found in patch #4.
>
> This was tested on x86_64 and arm64, but as I said, it is still
> incomplete, therefore the RFC, to gather some initial feedback before
> investing more time into this.
>
> For now, all users of the old API from fs/proc/task_mmu.c have been
> converted: /proc/pid/(smaps|numa_maps|pagemap|clear_refs).
>
> Thanks in advance
>
> Oscar Salvador (8):
>   mm: Add softleaf_from_pud
>   mm: Add {pmd,pud}_huge_lock helper
>   mm: Implement folio_pmd_batch
>   mm: Implement pt_range_walk
>   mm: Make /proc/pid/smaps use the new generic pagewalk API
>   mm: Make /proc/pid/numa_maps use the new generic pagewalk API
>   mm: Make /proc/pid/pagemap use the new generic pagewalk API
>   mm: Make /proc/pid/clear_refs use the new generic pagewalk API
>
>  arch/arm64/include/asm/pgtable.h             |   41 +
>  arch/loongarch/include/asm/pgtable.h         |    1 +
>  arch/powerpc/include/asm/book3s/64/pgtable.h |    7 +
>  arch/s390/include/asm/pgtable.h              |   38 +
>  arch/x86/include/asm/pgtable.h               |   53 +
>  arch/x86/include/asm/pgtable_64.h            |    2 +
>  arch/x86/mm/pgtable.c                        |   18 +-
>  fs/proc/task_mmu.c                           | 2295 ++++++++----------
>  include/asm-generic/pgtable_uffd.h           |   15 +
>  include/linux/leafops.h                      |   46 +
>  include/linux/mm.h                           |    2 +
>  include/linux/mm_inline.h                    |   32 +
>  include/linux/pagewalk.h                     |  106 +
>  include/linux/pgtable.h                      |   95 +
>  mm/internal.h                                |   75 +-
>  mm/memory.c                                  |   22 +
>  mm/pagewalk.c                                |  400 +++
>  mm/pgtable-generic.c                         |   21 +
>  18 files changed, 2039 insertions(+), 1230 deletions(-)
>
> --
> 2.53.0
>

Thread overview: 16+ messages
2026-05-25 16:55 [RFC PATCH v3 0/8] Implement a new generic pagewalk API Oscar Salvador
2026-05-25 16:55 ` [RFC PATCH v3 1/8] mm: Add softleaf_from_pud Oscar Salvador
2026-05-25 16:55 ` [RFC PATCH v3 2/8] mm: Add {pmd,pud}_huge_lock helper Oscar Salvador
2026-05-25 16:55 ` [RFC PATCH v3 3/8] mm: Implement folio_pmd_batch Oscar Salvador
2026-05-25 16:55 ` [RFC PATCH v3 4/8] mm: Implement pt_range_walk Oscar Salvador
2026-05-25 16:55 ` [RFC PATCH v3 5/8] mm: Make /proc/pid/smaps use the new generic pagewalk API Oscar Salvador
2026-05-25 16:55 ` [RFC PATCH v3 6/8] mm: Make /proc/pid/numa_maps " Oscar Salvador
2026-05-25 16:55 ` [RFC PATCH v3 7/8] mm: Make /proc/pid/pagemap " Oscar Salvador
2026-05-25 16:55 ` [RFC PATCH v3 8/8] mm: Make /proc/pid/clear_refs " Oscar Salvador
2026-05-26  5:49 ` [syzbot ci] Re: Implement a " syzbot ci
2026-06-11 12:01   ` Oscar Salvador (SUSE)
2026-06-11 13:22     ` syzbot ci
2026-06-11 13:52   ` Oscar Salvador (SUSE)
2026-06-11 15:08     ` syzbot ci
2026-06-11 12:07 ` Lorenzo Stoakes [this message]
2026-06-11 12:29   ` [RFC PATCH v3 0/8] " Oscar Salvador (SUSE)