linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>,
	Janosch Frank <frankja@linux.ibm.com>,
	Claudio Imbrenda <imbrenda@linux.ibm.com>,
	David Hildenbrand <david@redhat.com>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Sven Schnelle <svens@linux.ibm.com>, Peter Xu <peterx@redhat.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	Arnd Bergmann <arnd@arndb.de>, Zi Yan <ziy@nvidia.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Nico Pache <npache@redhat.com>,
	Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
	Barry Song <baohua@kernel.org>, Lance Yang <lance.yang@linux.dev>,
	Muchun Song <muchun.song@linux.dev>,
	Oscar Salvador <osalvador@suse.de>,
	Vlastimil Babka <vbabka@suse.cz>, Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>,
	Matthew Brost <matthew.brost@intel.com>,
	Joshua Hahn <joshua.hahnjy@gmail.com>,
	Rakie Kim <rakie.kim@sk.com>, Byungchul Park <byungchul@sk.com>,
	Gregory Price <gourry@gourry.net>,
	Ying Huang <ying.huang@linux.alibaba.com>,
	Alistair Popple <apopple@nvidia.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
	Kemeng Shi <shikemeng@huaweicloud.com>,
	Kairui Song <kasong@tencent.com>, Nhat Pham <nphamcs@gmail.com>,
	Baoquan He <bhe@redhat.com>, Chris Li <chrisl@kernel.org>,
	SeongJae Park <sj@kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	Jason Gunthorpe <jgg@ziepe.ca>, Leon Romanovsky <leon@kernel.org>,
	Xu Xin <xu.xin16@zte.com.cn>,
	Chengming Zhou <chengming.zhou@linux.dev>,
	Jann Horn <jannh@google.com>, Miaohe Lin <linmiaohe@huawei.com>,
	Naoya Horiguchi <nao.horiguchi@gmail.com>,
	Pedro Falcato <pfalcato@suse.de>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	Rik van Riel <riel@surriel.com>, Harry Yoo <harry.yoo@oracle.com>,
	Hugh Dickins <hughd@google.com>,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	linux-s390@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-arch@vger.kernel.org,
	damon@lists.linux.dev
Subject: [PATCH v2 00/16] mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries
Date: Mon, 10 Nov 2025 22:21:18 +0000	[thread overview]
Message-ID: <cover.1762812360.git.lorenzo.stoakes@oracle.com> (raw)

There's an established convention in the kernel that we treat leaf page
tables (so far at the PTE, PMD level) as containing 'swap entries' should
they be neither empty (i.e. p**_none() evaluating true) nor present
(i.e. p**_present() evaluating true).

However, at the same time we also have helper predicates - is_swap_pte(),
is_swap_pmd() - which are inconsistently used.

This is problematic, as it is logical to assume that should somebody wish
to operate upon a page table swap entry they should first check to see if
it is in fact one.

It also implies that perhaps, in future, we might introduce a non-present,
none page table entry that is not a swap entry.

This series resolves this issue by systematically eliminating all use of
the is_swap_pte() and is swap_pmd() predicates so we retain only the
convention that should a leaf page table entry be neither none nor present
it is a swap entry.

We also have the further issue that 'swap entry' is unfortunately a really
rather overloaded term and in fact refers to both entries for swap and for
other information such as migration entries, page table markers, and device
private entries.

We therefore have the rather 'unique' concept of a 'non-swap' swap entry.

This series therefore introduces the concept of 'software leaf entries', of
type softleaf_t, to eliminate this confusion.

A software leaf entry in this sense is any page table entry which is
non-present, and represented by the softleaf_t type. That is - page table
leaf entries which are software-controlled by the kernel.

This includes 'none' or empty entries, which are simply represented by an
zero leaf entry value.

In order to maintain compatibility as we transition the kernel to this new
type, we simply typedef swp_entry_t to softleaf_t.

We introduce a number of predicates and helpers to interact with software
leaf entries in include/linux/leafops.h which, as it imports swapops.h, can
be treated as a drop-in replacement for swapops.h wherever leaf entry
helpers are used.

Since softleaf_from_[pte, pmd]() treats present entries as they were
empty/none leaf entries, this allows for a great deal of simplification of
code throughout the code base, which this series utilises a great deal.

We additionally change from swap entry to software leaf entry handling
where it makes sense to and eliminate functions from swapops.h where
software leaf entries obviate the need for the functions.

v3:
* Propagated tag (thanks SJ! :)
* Fixed up comments as per Mike.
* Fixed is_marker issue as per Lance.
* Fixed issue with softleaf_from_pte() as per Kairiu.
* Fixed comments as per Lance.
* Fixed comments as per Kairiu.
* Fixed missing softleaf_is_device_exclusive() kdoc in patch 2.
* Updated softleaf_from_pmd() to correct the none case like the PTE case.
* Fixed the rather unusual generic_max_swapfile_size() function which, at
  least on x86-64, generates an entirely invalid PTE entry (an empty one)
  then treats it as if it were a swap entry. We resolve this by generating
  this value manually.

v2:
* Folded all fixpatches into patches they fix.
* Added Vlasta's tag to patch 1 (thanks!)
* Renamed leaf_entry_t to softleaf_t and leafent_xxx() to softleaf_xxx() as
  a result of discussion between Matthew, Jason, David, Gregory & myself to
  make clearer that we abstract the concept of a software page table leaf
  entry.
* Updated all commit messages to reference softleaves.
* Updated the kdoc comment describing softleaf_t to provide more detail.
* Added a description of softleaves to the top of leafops.h.
https://lore.kernel.org/all/cover.1762621567.git.lorenzo.stoakes@oracle.com/

non-RFC v1:
* As part of efforts to eliminate swp_entry_t usage, remove
  pte_none_mostly() and correct UFFD PTE marker handling.
* Introduce leaf_entry_t - credit to Gregory for naming, and to Jason for
  the concept of simply using a leafent_*() set of functions to interact
  with these entities.
* Replace pte_to_swp_entry_or_zero() with leafent_from_pte() and simply
  categorise pte_none() cases as an empty leaf entry, as per Jason.
* Eliminate get_pte_swap_entry() - as we can simply do this with
  leafent_from_pte() also, as discussed with Jason.
* Put pmd_trans_huge_lock() acquisition/release in pagemap_pmd_range()
  rather than pmd_trans_huge_lock_thp() as per Gregory.
* Eliminate pmd_to_swp_entry() and related and introduce leafent_from_pmd()
  to replace it and further propagate leaf entry usage.
* Remove the confusing and unnecessary is_hugetlb_entry_[migration,
  hwpoison]() functions.
* Replace is_pfn_swap_entry(), pfn_swap_entry_to_page(),
  is_writable_device_private_entry(), is_device_exclusive_entry(),
  is_migration_entry(), is_writable_migration_entry(),
  is_readable_migration_entry(), is_readable_exclusive_migration_entry()
  and pfn_swap_entry_folio() with leafent equivalents.
* Wrapped up the 'safe' behaviour discussed with Jason in
  leafent_from_[pte, pmd]() so these can be used unconditionally which
  simplifies things a lot.
* Further changes that are a consequence of the introduction of leaf
  entries.
https://lore.kernel.org/all/cover.1762171281.git.lorenzo.stoakes@oracle.com/

RFC:
https://lore.kernel.org/all/cover.1761288179.git.lorenzo.stoakes@oracle.com/

Lorenzo Stoakes (16):
  mm: correctly handle UFFD PTE markers
  mm: introduce leaf entry type and use to simplify leaf entry logic
  mm: avoid unnecessary uses of is_swap_pte()
  mm: eliminate is_swap_pte() when softleaf_from_pte() suffices
  mm: use leaf entries in debug pgtable + remove is_swap_pte()
  fs/proc/task_mmu: refactor pagemap_pmd_range()
  mm: avoid unnecessary use of is_swap_pmd()
  mm/huge_memory: refactor copy_huge_pmd() non-present logic
  mm/huge_memory: refactor change_huge_pmd() non-present logic
  mm: replace pmd_to_swp_entry() with softleaf_from_pmd()
  mm: introduce pmd_is_huge() and use where appropriate
  mm: remove remaining is_swap_pmd() users and is_swap_pmd()
  mm: remove non_swap_entry() and use softleaf helpers instead
  mm: remove is_hugetlb_entry_[migration, hwpoisoned]()
  mm: eliminate further swapops predicates
  mm: replace remaining pte_to_swp_entry() with softleaf_from_pte()

 MAINTAINERS                   |   1 +
 arch/s390/mm/gmap_helpers.c   |  20 +-
 arch/s390/mm/pgtable.c        |  12 +-
 fs/proc/task_mmu.c            | 294 +++++++++-------
 fs/userfaultfd.c              |  95 +++---
 include/asm-generic/hugetlb.h |   8 -
 include/linux/huge_mm.h       |  48 ++-
 include/linux/hugetlb.h       |   2 -
 include/linux/leafops.h       | 619 ++++++++++++++++++++++++++++++++++
 include/linux/migrate.h       |   2 +-
 include/linux/mm_inline.h     |   6 +-
 include/linux/mm_types.h      |  25 ++
 include/linux/swapops.h       | 273 +--------------
 include/linux/userfaultfd_k.h |  33 +-
 mm/damon/ops-common.c         |   6 +-
 mm/debug_vm_pgtable.c         |  86 +++--
 mm/filemap.c                  |   8 +-
 mm/hmm.c                      |  41 ++-
 mm/huge_memory.c              | 263 ++++++++-------
 mm/hugetlb.c                  | 165 ++++-----
 mm/internal.h                 |  20 +-
 mm/khugepaged.c               |  33 +-
 mm/ksm.c                      |   6 +-
 mm/madvise.c                  |  28 +-
 mm/memory-failure.c           |   8 +-
 mm/memory.c                   | 150 ++++----
 mm/mempolicy.c                |  25 +-
 mm/migrate.c                  |  45 +--
 mm/migrate_device.c           |  24 +-
 mm/mincore.c                  |  25 +-
 mm/mprotect.c                 |  59 ++--
 mm/mremap.c                   |  13 +-
 mm/page_table_check.c         |  33 +-
 mm/page_vma_mapped.c          |  65 ++--
 mm/pagewalk.c                 |  15 +-
 mm/rmap.c                     |  17 +-
 mm/shmem.c                    |   7 +-
 mm/swap_state.c               |  12 +-
 mm/swapfile.c                 |  22 +-
 mm/userfaultfd.c              |  53 +--
 40 files changed, 1582 insertions(+), 1085 deletions(-)
 create mode 100644 include/linux/leafops.h

--
2.51.0

             reply	other threads:[~2025-11-10 22:22 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-10 22:21 Lorenzo Stoakes [this message]
2025-11-10 22:21 ` [PATCH v3 01/16] mm: correctly handle UFFD PTE markers Lorenzo Stoakes
2025-11-11  9:39   ` Mike Rapoport
2025-11-11  9:48     ` Lorenzo Stoakes
2025-11-10 22:21 ` [PATCH v3 02/16] mm: introduce leaf entry type and use to simplify leaf entry logic Lorenzo Stoakes
2025-11-11  3:25   ` Zi Yan
2025-11-11  7:16     ` Lorenzo Stoakes
2025-11-11 16:20       ` Zi Yan
2025-11-11 13:06     ` David Hildenbrand (Red Hat)
2025-11-11 16:26       ` Zi Yan
2025-11-11  3:56   ` Zi Yan
2025-11-11  7:31     ` Lorenzo Stoakes
2025-11-11 16:40       ` Zi Yan
2025-11-10 22:21 ` [PATCH v3 03/16] mm: avoid unnecessary uses of is_swap_pte() Lorenzo Stoakes
2025-11-12  2:58   ` Zi Yan
2025-11-10 22:21 ` [PATCH v3 04/16] mm: eliminate is_swap_pte() when softleaf_from_pte() suffices Lorenzo Stoakes
2025-11-10 22:21 ` [PATCH v3 05/16] mm: use leaf entries in debug pgtable + remove is_swap_pte() Lorenzo Stoakes
2025-11-10 22:21 ` [PATCH v3 06/16] fs/proc/task_mmu: refactor pagemap_pmd_range() Lorenzo Stoakes
2025-11-10 22:21 ` [PATCH v3 07/16] mm: avoid unnecessary use of is_swap_pmd() Lorenzo Stoakes
2025-11-10 22:21 ` [PATCH v3 08/16] mm/huge_memory: refactor copy_huge_pmd() non-present logic Lorenzo Stoakes
2025-11-10 22:21 ` [PATCH v3 09/16] mm/huge_memory: refactor change_huge_pmd() " Lorenzo Stoakes
2025-11-10 22:21 ` [PATCH v3 10/16] mm: replace pmd_to_swp_entry() with softleaf_from_pmd() Lorenzo Stoakes
2025-11-10 22:21 ` [PATCH v3 11/16] mm: introduce pmd_is_huge() and use where appropriate Lorenzo Stoakes
2025-11-10 22:21 ` [PATCH v3 12/16] mm: remove remaining is_swap_pmd() users and is_swap_pmd() Lorenzo Stoakes
2025-11-10 22:21 ` [PATCH v3 13/16] mm: remove non_swap_entry() and use softleaf helpers instead Lorenzo Stoakes
2025-11-10 22:21 ` [PATCH v3 14/16] mm: remove is_hugetlb_entry_[migration, hwpoisoned]() Lorenzo Stoakes
2025-11-10 22:21 ` [PATCH v3 15/16] mm: eliminate further swapops predicates Lorenzo Stoakes
2025-11-10 22:21 ` [PATCH v3 16/16] mm: replace remaining pte_to_swp_entry() with softleaf_from_pte() Lorenzo Stoakes
2025-11-10 22:24 ` [PATCH v2 00/16] mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries Lorenzo Stoakes
2025-11-11  0:17 ` Andrew Morton
  -- strict thread matches above, loose matches on Subject: below --
2025-11-08 17:08 Lorenzo Stoakes
2025-11-08 18:01 ` Andrew Morton
2025-11-10  7:32 ` Chris Li
2025-11-10 10:18   ` Lorenzo Stoakes
2025-11-10 11:04     ` Chris Li
2025-11-10 11:27       ` Lorenzo Stoakes
2025-11-10 23:38         ` Hugh Dickins
2025-11-11  0:23           ` Andrew Morton
2025-11-11  4:07             ` Hugh Dickins
2025-11-11  6:51               ` Lorenzo Stoakes
2025-11-11  4:16           ` Kairui Song
2025-11-11  6:55             ` Lorenzo Stoakes
2025-11-11  9:19         ` Chris Li
2025-11-11 10:03           ` Lorenzo Stoakes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover.1762812360.git.lorenzo.stoakes@oracle.com \
    --to=lorenzo.stoakes@oracle.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=agordeev@linux.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=apopple@nvidia.com \
    --cc=arnd@arndb.de \
    --cc=axelrasmussen@google.com \
    --cc=baohua@kernel.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=bhe@redhat.com \
    --cc=borntraeger@linux.ibm.com \
    --cc=brauner@kernel.org \
    --cc=byungchul@sk.com \
    --cc=chengming.zhou@linux.dev \
    --cc=chrisl@kernel.org \
    --cc=damon@lists.linux.dev \
    --cc=david@redhat.com \
    --cc=dev.jain@arm.com \
    --cc=frankja@linux.ibm.com \
    --cc=gerald.schaefer@linux.ibm.com \
    --cc=gor@linux.ibm.com \
    --cc=gourry@gourry.net \
    --cc=harry.yoo@oracle.com \
    --cc=hca@linux.ibm.com \
    --cc=hughd@google.com \
    --cc=imbrenda@linux.ibm.com \
    --cc=jack@suse.cz \
    --cc=jannh@google.com \
    --cc=jgg@ziepe.ca \
    --cc=joshua.hahnjy@gmail.com \
    --cc=kasong@tencent.com \
    --cc=kvm@vger.kernel.org \
    --cc=lance.yang@linux.dev \
    --cc=leon@kernel.org \
    --cc=linmiaohe@huawei.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=matthew.brost@intel.com \
    --cc=mhocko@suse.com \
    --cc=muchun.song@linux.dev \
    --cc=nao.horiguchi@gmail.com \
    --cc=npache@redhat.com \
    --cc=nphamcs@gmail.com \
    --cc=osalvador@suse.de \
    --cc=pasha.tatashin@soleen.com \
    --cc=peterx@redhat.com \
    --cc=pfalcato@suse.de \
    --cc=rakie.kim@sk.com \
    --cc=riel@surriel.com \
    --cc=rppt@kernel.org \
    --cc=ryan.roberts@arm.com \
    --cc=shikemeng@huaweicloud.com \
    --cc=sj@kernel.org \
    --cc=surenb@google.com \
    --cc=svens@linux.ibm.com \
    --cc=vbabka@suse.cz \
    --cc=viro@zeniv.linux.org.uk \
    --cc=weixugc@google.com \
    --cc=willy@infradead.org \
    --cc=xu.xin16@zte.com.cn \
    --cc=ying.huang@linux.alibaba.com \
    --cc=yuanchu@google.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).