Generic Linux architectural discussions
From: Zi Yan <ziy@nvidia.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Janosch Frank <frankja@linux.ibm.com>,
	Claudio Imbrenda <imbrenda@linux.ibm.com>,
	David Hildenbrand <david@redhat.com>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Sven Schnelle <svens@linux.ibm.com>, Peter Xu <peterx@redhat.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	Arnd Bergmann <arnd@arndb.de>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Nico Pache <npache@redhat.com>,
	Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
	Barry Song <baohua@kernel.org>, Lance Yang <lance.yang@linux.dev>,
	Muchun Song <muchun.song@linux.dev>,
	Oscar Salvador <osalvador@suse.de>,
	Vlastimil Babka <vbabka@suse.cz>, Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>,
	Matthew Brost <matthew.brost@intel.com>,
	Joshua Hahn <joshua.hahnjy@gmail.com>,
	Rakie Kim <rakie.kim@sk.com>, Byungchul Park <byungchul@sk.com>,
	Gregory Price <gourry@gourry.net>,
	Ying Huang <ying.huang@linux.alibaba.com>,
	Alistair Popple <apopple@nvidia.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
	Kemeng Shi <shikemeng@huaweicloud.com>,
	Kairui Song <kasong@tencent.com>, Nhat Pham <nphamcs@gmail.com>,
	Baoquan He <bhe@redhat.com>, Chris Li <chrisl@kernel.org>,
	SeongJae Park <sj@kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	Jason Gunthorpe <jgg@ziepe.ca>, Leon Romanovsky <leon@kernel.org>,
	Xu Xin <xu.xin16@zte.com.cn>,
	Chengming Zhou <chengming.zhou@linux.dev>,
	Jann Horn <jannh@google.com>, Miaohe Lin <linmiaohe@huawei.com>,
	Naoya Horiguchi <nao.horiguchi@gmail.com>,
	Pedro Falcato <pfalcato@suse.de>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	Rik van Riel <riel@surriel.com>, Harry Yoo <harry.yoo@oracle.com>,
	Hugh Dickins <hughd@google.com>,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	linux-s390@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-arch@vger.kernel.org,
	damon@lists.linux.dev
Subject: Re: [PATCH v3 02/16] mm: introduce leaf entry type and use to simplify leaf entry logic
Date: Mon, 10 Nov 2025 22:25:40 -0500
Message-ID: <CBBF1711-5881-4B5A-ADE6-1D86C0E94296@nvidia.com>
In-Reply-To: <c879383aac77d96a03e4d38f7daba893cd35fc76.1762812360.git.lorenzo.stoakes@oracle.com>

On 10 Nov 2025, at 17:21, Lorenzo Stoakes wrote:

> The kernel maintains leaf page table entries which contain either:
>
> - Nothing ('none' entries)
> - Present entries (that is stuff the hardware can navigate without fault)

This does not hold in two cases:

1. pXX_protnone(): the _PAGE_PROTNONE flag also makes pXX_present() return
true, yet the hardware would still trigger a fault on access.
2. pmd_present(): _PAGE_PSE also means a present PMD (see the comment
in pmd_present()).

The commit log needs to be updated.
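
To make the distinction concrete, here is a minimal user-space sketch. It
assumes an x86-64-like bit layout; the sketch_* names and bit positions are
hypothetical stand-ins, not the kernel's definitions. It only illustrates
that the software "present" predicate can be true while the hardware present
bit is clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, loosely modelled on x86-64 (an assumption). */
#define SKETCH_PAGE_PRESENT	(UINT64_C(1) << 0)	/* hardware can walk the entry */
#define SKETCH_PAGE_PSE		(UINT64_C(1) << 7)	/* huge-page bit */
#define SKETCH_PAGE_PROTNONE	(UINT64_C(1) << 8)	/* software-only PROT_NONE marker */

typedef struct { uint64_t val; } sketch_pte_t;
typedef struct { uint64_t val; } sketch_pmd_t;

/* What the MMU looks at: only the hardware present bit. */
static inline bool sketch_hw_present(uint64_t val)
{
	return (val & SKETCH_PAGE_PRESENT) != 0;
}

/* Software view of a PTE: protnone entries also count as present. */
static inline bool sketch_pte_present(sketch_pte_t pte)
{
	return (pte.val & (SKETCH_PAGE_PRESENT | SKETCH_PAGE_PROTNONE)) != 0;
}

/* Software view of a PMD: a huge PMD with only PSE set also counts. */
static inline bool sketch_pmd_present(sketch_pmd_t pmd)
{
	return (pmd.val & (SKETCH_PAGE_PRESENT | SKETCH_PAGE_PROTNONE |
			   SKETCH_PAGE_PSE)) != 0;
}

/* Both problem cases: present to software, faulting in hardware. */
static inline void sketch_selftest(void)
{
	sketch_pte_t protnone = { SKETCH_PAGE_PROTNONE };
	sketch_pmd_t splitting = { SKETCH_PAGE_PSE };

	assert(sketch_pte_present(protnone) && !sketch_hw_present(protnone.val));
	assert(sketch_pmd_present(splitting) && !sketch_hw_present(splitting.val));
}
```

In this model, both entries pass the software predicate but would fault if
the hardware tried to walk them, which is why "present" and "navigable by
hardware without fault" are not interchangeable.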

> - Everything else that will cause a fault which the kernel handles

This is not accurate either, for the reasons above.

How should we categorize entries that are non-present to HW but present to
SW, such as protnone entries and PMDs undergoing a split? Strictly speaking,
they are softleaf entries, but treating them that way would require more
changes to the kernel code and would require pXX_present() to mean HW
present.

To avoid making this series more complicated, I think updating the commit
log and comments to say pXX_present() instead of HW present might be
the easiest way out. We can revisit pXX_present() vs HW present later.

OK, I will focus on code review now.

>
> In the 'everything else' group we include swap entries, but we also include
> a number of other things such as migration entries, device private entries
> and marker entries.
>
> Unfortunately this 'everything else' group expresses everything through
> a swp_entry_t type, and these entries are referred to as swap entries even
> though they may well not contain a... swap entry.
>
> This is compounded by the rather mind-boggling concept of a non-swap swap
> entry (checked via non_swap_entry()) and the means by which we twist and
> turn to satisfy this.
>
> This patch lays the foundation for reducing this confusion.
>
> We refer to 'everything else' as a 'software-defined leaf entry' or
> 'softleaf' for short. And in fact we scoop up the 'none' entries into this
> concept also so we are left with:
>
> - Present entries.
> - Softleaf entries (which may be empty).
>
> This allows for radical simplification across the board - one can simply
> convert any leaf page table entry to a leaf entry via softleaf_from_pte().
>
> If the entry is present, we return an empty leaf entry, so it is assumed
> the caller is aware that they must differentiate between the two categories
> of page table entries, checking for the former via pte_present().
>
> As a result, we can eliminate a number of places where we would otherwise
> need to use predicates to see if we can proceed with leaf page table entry
> conversion and instead just go ahead and do it unconditionally.
>
> We do so where we can, adjusting surrounding logic as necessary to
> integrate the new softleaf_t logic as far as seems reasonable at this
> stage.
>
> We typedef swp_entry_t to softleaf_t for the time being until the
> conversion can be complete, meaning everything remains compatible
> regardless of which type is used. We will eventually remove swp_entry_t
> when the conversion is complete.
>
> We introduce a new header file to keep things clear - leafops.h - this
> imports swapops.h so can directly replace swapops imports without issue, and
> we do so in all the files that require it.
>
> Additionally, add new leafops.h file to core mm maintainers entry.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
>  MAINTAINERS                   |   1 +
>  fs/proc/task_mmu.c            |  26 +--
>  fs/userfaultfd.c              |   6 +-
>  include/linux/leafops.h       | 387 ++++++++++++++++++++++++++++++++++
>  include/linux/mm_inline.h     |   6 +-
>  include/linux/mm_types.h      |  25 +++
>  include/linux/swapops.h       |  28 ---
>  include/linux/userfaultfd_k.h |  51 +----
>  mm/hmm.c                      |   2 +-
>  mm/hugetlb.c                  |  37 ++--
>  mm/madvise.c                  |  16 +-
>  mm/memory.c                   |  41 ++--
>  mm/mincore.c                  |   6 +-
>  mm/mprotect.c                 |   6 +-
>  mm/mremap.c                   |   4 +-
>  mm/page_vma_mapped.c          |  11 +-
>  mm/shmem.c                    |   7 +-
>  mm/userfaultfd.c              |   6 +-
>  18 files changed, 502 insertions(+), 164 deletions(-)
>  create mode 100644 include/linux/leafops.h
>
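
The conversion rule described in the commit message can be sketched in
user-space C as follows. This is only an illustration under stated
assumptions: softleaf_from_pte() and pte_present() are the names used in the
commit message, but every sketch_* name, the entry encoding and the type
enumeration below are hypothetical and do not reflect the contents of
leafops.h. For simplicity the sketch treats "present" as a single bit and
ignores the protnone/PSE cases raised above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: bit 0 = present, bits 1-3 = type, bits 4+ = data. */
#define SKETCH_PRESENT		(UINT64_C(1) << 0)
#define SKETCH_TYPE_SHIFT	1
#define SKETCH_TYPE_MASK	UINT64_C(0x7)
#define SKETCH_DATA_SHIFT	4

enum sketch_softleaf_type {
	SKETCH_SOFTLEAF_NONE = 0,	/* empty entry */
	SKETCH_SOFTLEAF_SWAP,
	SKETCH_SOFTLEAF_MIGRATION,
	SKETCH_SOFTLEAF_MARKER,
};

typedef struct { uint64_t val; } sketch_pte_t;
typedef struct { uint64_t val; } sketch_softleaf_t;

static inline bool sketch_pte_present(sketch_pte_t pte)
{
	return (pte.val & SKETCH_PRESENT) != 0;
}

/* Build a non-present PTE carrying a typed software-defined entry. */
static inline sketch_pte_t sketch_mk_softleaf_pte(enum sketch_softleaf_type type,
						  uint64_t data)
{
	sketch_pte_t pte = { (data << SKETCH_DATA_SHIFT) |
			     ((uint64_t)type << SKETCH_TYPE_SHIFT) };
	return pte;
}

/*
 * Unconditional conversion: present PTEs and none PTEs both yield an
 * empty entry, so callers need no predicate before converting.
 */
static inline sketch_softleaf_t sketch_softleaf_from_pte(sketch_pte_t pte)
{
	sketch_softleaf_t entry = { 0 };

	if (!sketch_pte_present(pte))
		entry.val = pte.val;
	return entry;
}

static inline bool sketch_softleaf_is_none(sketch_softleaf_t entry)
{
	return entry.val == 0;
}

static inline enum sketch_softleaf_type sketch_softleaf_to_type(sketch_softleaf_t entry)
{
	return (enum sketch_softleaf_type)((entry.val >> SKETCH_TYPE_SHIFT) &
					   SKETCH_TYPE_MASK);
}

/* Example: a migration entry survives conversion, a present PTE does not. */
static inline void sketch_selftest(void)
{
	sketch_pte_t mig = sketch_mk_softleaf_pte(SKETCH_SOFTLEAF_MIGRATION, 42);
	sketch_pte_t present = { SKETCH_PRESENT | UINT64_C(0x1000) };

	assert(sketch_softleaf_to_type(sketch_softleaf_from_pte(mig)) ==
	       SKETCH_SOFTLEAF_MIGRATION);
	assert(sketch_softleaf_is_none(sketch_softleaf_from_pte(present)));
}
```

Because a present PTE converts to an empty entry, a caller that cares about
the difference between "present" and "none" still has to check
pte_present() itself, as the commit message notes.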


Best Regards,
Yan, Zi

Thread overview: 66+ messages
2025-11-10 22:21 [PATCH v2 00/16] mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries Lorenzo Stoakes
2025-11-10 22:21 ` [PATCH v3 01/16] mm: correctly handle UFFD PTE markers Lorenzo Stoakes
2025-11-11  9:39   ` Mike Rapoport
2025-11-11  9:48     ` Lorenzo Stoakes
2025-11-10 22:21 ` [PATCH v3 02/16] mm: introduce leaf entry type and use to simplify leaf entry logic Lorenzo Stoakes
2025-11-11  3:25   ` Zi Yan [this message]
2025-11-11  7:16     ` Lorenzo Stoakes
2025-11-11 16:20       ` Zi Yan
2025-11-11 13:06     ` David Hildenbrand (Red Hat)
2025-11-11 16:26       ` Zi Yan
2025-11-12 15:36         ` Lorenzo Stoakes
2025-11-11  3:56   ` Zi Yan
2025-11-11  7:31     ` Lorenzo Stoakes
2025-11-11 16:40       ` Zi Yan
2025-11-12 14:06         ` Lorenzo Stoakes
2025-11-12 15:32   ` Lorenzo Stoakes
2025-11-12 15:36   ` Vlastimil Babka
2025-11-13 14:56   ` Lorenzo Stoakes
2025-11-13 15:32     ` Lorenzo Stoakes
2025-11-10 22:21 ` [PATCH v3 03/16] mm: avoid unnecessary uses of is_swap_pte() Lorenzo Stoakes
2025-11-12  2:58   ` Zi Yan
2025-11-12 15:59     ` Lorenzo Stoakes
2025-11-12 16:03       ` Zi Yan
2025-11-12 16:11     ` Zi Yan
2025-11-12 18:48   ` Vlastimil Babka
2025-11-10 22:21 ` [PATCH v3 04/16] mm: eliminate is_swap_pte() when softleaf_from_pte() suffices Lorenzo Stoakes
2025-11-21 16:46   ` Vlastimil Babka
2025-11-10 22:21 ` [PATCH v3 05/16] mm: use leaf entries in debug pgtable + remove is_swap_pte() Lorenzo Stoakes
2025-11-21 17:10   ` Vlastimil Babka
2025-11-10 22:21 ` [PATCH v3 06/16] fs/proc/task_mmu: refactor pagemap_pmd_range() Lorenzo Stoakes
2025-11-21 17:17   ` Vlastimil Babka
2025-11-10 22:21 ` [PATCH v3 07/16] mm: avoid unnecessary use of is_swap_pmd() Lorenzo Stoakes
2025-11-21 17:42   ` Vlastimil Babka
2025-11-21 19:25     ` Lorenzo Stoakes
2025-11-21 19:55       ` Andrew Morton
2025-11-24 12:27         ` Lorenzo Stoakes
2025-11-10 22:21 ` [PATCH v3 08/16] mm/huge_memory: refactor copy_huge_pmd() non-present logic Lorenzo Stoakes
2025-11-21 17:56   ` Vlastimil Babka
2025-11-21 19:23     ` Lorenzo Stoakes
2025-11-10 22:21 ` [PATCH v3 09/16] mm/huge_memory: refactor change_huge_pmd() " Lorenzo Stoakes
2025-11-21 17:58   ` Vlastimil Babka
2025-11-10 22:21 ` [PATCH v3 10/16] mm: replace pmd_to_swp_entry() with softleaf_from_pmd() Lorenzo Stoakes
2025-11-21 18:42   ` Vlastimil Babka
2025-11-21 19:22     ` Lorenzo Stoakes
2025-11-21 19:23   ` Lorenzo Stoakes
2025-11-10 22:21 ` [PATCH v3 11/16] mm: introduce pmd_is_huge() and use where appropriate Lorenzo Stoakes
2025-11-27 17:00   ` Vlastimil Babka
2025-11-10 22:21 ` [PATCH v3 12/16] mm: remove remaining is_swap_pmd() users and is_swap_pmd() Lorenzo Stoakes
2025-11-27 17:03   ` Vlastimil Babka
2025-11-10 22:21 ` [PATCH v3 13/16] mm: remove non_swap_entry() and use softleaf helpers instead Lorenzo Stoakes
2025-11-27 17:12   ` Vlastimil Babka
2025-11-10 22:21 ` [PATCH v3 14/16] mm: remove is_hugetlb_entry_[migration, hwpoisoned]() Lorenzo Stoakes
2025-11-27 17:29   ` Vlastimil Babka
2025-11-27 17:41     ` Lorenzo Stoakes
2025-11-27 17:45   ` Lorenzo Stoakes
2025-11-27 19:33     ` Andrew Morton
2025-11-10 22:21 ` [PATCH v3 15/16] mm: eliminate further swapops predicates Lorenzo Stoakes
2025-11-27 17:42   ` Vlastimil Babka
2025-11-10 22:21 ` [PATCH v3 16/16] mm: replace remaining pte_to_swp_entry() with softleaf_from_pte() Lorenzo Stoakes
2025-11-27 17:53   ` Vlastimil Babka
2025-11-27 18:02     ` Vlastimil Babka
2025-11-27 18:03     ` Lorenzo Stoakes
2025-11-10 22:24 ` [PATCH v2 00/16] mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries Lorenzo Stoakes
2025-11-11  0:17 ` Andrew Morton
2025-11-21 23:44 ` Jason Gunthorpe
2025-11-24 10:06   ` Lorenzo Stoakes
