linux-mm.kvack.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	nvdimm@lists.linux.dev, David Hildenbrand <david@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Juergen Gross <jgross@suse.com>,
	Stefano Stabellini <sstabellini@kernel.org>,
	Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Alistair Popple <apopple@nvidia.com>,
	Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>, Zi Yan <ziy@nvidia.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Nico Pache <npache@redhat.com>,
	Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
	Barry Song <baohua@kernel.org>, Vlastimil Babka <vbabka@suse.cz>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>, Jann Horn <jannh@google.com>,
	Pedro Falcato <pfalcato@suse.de>
Subject: [PATCH RFC 08/14] mm/huge_memory: mark PMD mappings of the huge zero folio special
Date: Tue, 17 Jun 2025 17:43:39 +0200
Message-ID: <20250617154345.2494405-9-david@redhat.com>
In-Reply-To: <20250617154345.2494405-1-david@redhat.com>

The huge zero folio is refcounted (+mapcounted -- is that a word?)
differently than "normal" folios, similar (but not identical) to the
ordinary shared zeropage.

For this reason, we special-case these pages in
vm_normal_page*/vm_normal_folio*, and only allow selected callers to
still use them (e.g., GUP can still take a reference on them).

vm_normal_page_pmd() already filters out the huge zero folio. However,
so far we are not marking it as special like we do with the ordinary
shared zeropage. Let's mark it as special, so we can further refactor
vm_normal_page_pmd() and vm_normal_page().

While at it, update the doc regarding the shared zero folios.
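
To illustrate the intended behaviour, here is a minimal userspace sketch,
not kernel code. Every toy_* name, the bit layout and the pfn values are
made up for illustration; only the idea comes from the patch: the insert
path marks the huge zero folio mapping special, a generic
vm_normal_page_pmd()-style lookup then returns NULL for it, and a GUP-like
walker can still recognize the mapping and use the underlying page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: bit 0 is the "special" bit, the pfn starts at bit 12. */
#define TOY_PMD_SPECIAL   (1ull << 0)
#define TOY_PFN_SHIFT     12
#define TOY_NR_PFNS       16
#define TOY_HUGE_ZERO_PFN 8ull	/* made-up pfn of the huge zero folio */

struct toy_page { int unused; };
typedef struct { uint64_t val; } toy_pmd_t;

/* Stand-in for the memmap: one toy_page per pfn. */
static struct toy_page toy_memmap[TOY_NR_PFNS];

static toy_pmd_t toy_mk_pmd(uint64_t pfn)
{
	assert(pfn < TOY_NR_PFNS);
	return (toy_pmd_t){ .val = pfn << TOY_PFN_SHIFT };
}

static toy_pmd_t toy_pmd_mkspecial(toy_pmd_t pmd)
{
	pmd.val |= TOY_PMD_SPECIAL;
	return pmd;
}

static bool toy_pmd_special(toy_pmd_t pmd)
{
	return (pmd.val & TOY_PMD_SPECIAL) != 0;
}

static uint64_t toy_pmd_pfn(toy_pmd_t pmd)
{
	return pmd.val >> TOY_PFN_SHIFT;
}

/* Models the insert path: only the huge zero folio mapping is marked special. */
static toy_pmd_t toy_insert_pmd(uint64_t pfn)
{
	toy_pmd_t entry = toy_mk_pmd(pfn);

	if (pfn == TOY_HUGE_ZERO_PFN)
		entry = toy_pmd_mkspecial(entry);
	return entry;
}

/* Generic lookup: special mappings have no "normal" page. */
static struct toy_page *toy_vm_normal_page_pmd(toy_pmd_t pmd)
{
	if (toy_pmd_special(pmd))
		return NULL;
	return &toy_memmap[toy_pmd_pfn(pmd)];
}

/* GUP-like walker: tolerates the special bit only for the huge zero folio. */
static struct toy_page *toy_gup_page_pmd(toy_pmd_t pmd)
{
	uint64_t pfn = toy_pmd_pfn(pmd);

	if (toy_pmd_special(pmd) && pfn != TOY_HUGE_ZERO_PFN)
		return NULL;
	return &toy_memmap[pfn];
}
```

In this model, the generic lookup needs no dedicated huge-zero-folio check
once the entry carries the special bit, which is the kind of simplification
the later refactoring is aiming for.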

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/huge_memory.c |  5 ++++-
 mm/memory.c      | 13 +++++++++----
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 92400f3baa9ff..8f03cd4e40397 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1309,6 +1309,7 @@ static void set_huge_zero_folio(pgtable_t pgtable, struct mm_struct *mm,
 {
 	pmd_t entry;
 	entry = folio_mk_pmd(zero_folio, vma->vm_page_prot);
+	entry = pmd_mkspecial(entry);
 	pgtable_trans_huge_deposit(mm, pmd, pgtable);
 	set_pmd_at(mm, haddr, pmd, entry);
 	mm_inc_nr_ptes(mm);
@@ -1418,7 +1419,9 @@ static vm_fault_t insert_pmd(struct vm_area_struct *vma, unsigned long addr,
 	if (fop.is_folio) {
 		entry = folio_mk_pmd(fop.folio, vma->vm_page_prot);
 
-		if (!is_huge_zero_folio(fop.folio)) {
+		if (is_huge_zero_folio(fop.folio)) {
+			entry = pmd_mkspecial(entry);
+		} else {
 			folio_get(fop.folio);
 			folio_add_file_rmap_pmd(fop.folio, &fop.folio->page, vma);
 			add_mm_counter(mm, mm_counter_file(fop.folio), HPAGE_PMD_NR);
diff --git a/mm/memory.c b/mm/memory.c
index 9a1acd057ce59..ef277dab69e33 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -541,7 +541,13 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
  *
  * "Special" mappings do not wish to be associated with a "struct page" (either
  * it doesn't exist, or it exists but they don't want to touch it). In this
- * case, NULL is returned here. "Normal" mappings do have a struct page.
+ * case, NULL is returned here. "Normal" mappings do have a struct page and
+ * are ordinarily refcounted.
+ *
+ * Page mappings of the shared zero folios are always considered "special", as
+ * they are not ordinarily refcounted. However, selected page table walkers
+ * (such as GUP) can still identify these mappings and work with the
+ * underlying "struct page".
  *
  * There are 2 broad cases. Firstly, an architecture may define a pte_special()
  * pte bit, in which case this function is trivial. Secondly, an architecture
@@ -571,9 +577,8 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
  *
  * VM_MIXEDMAP mappings can likewise contain memory with or without "struct
  * page" backing, however the difference is that _all_ pages with a struct
- * page (that is, those where pfn_valid is true) are refcounted and considered
- * normal pages by the VM. The only exception are zeropages, which are
- * *never* refcounted.
+ * page (that is, those where pfn_valid is true, except the shared zero
+ * folios) are refcounted and considered normal pages by the VM.
  *
  * The disadvantage is that pages are refcounted (which can be slower and
  * simply not an option for some PFNMAP users). The advantage is that we
-- 
2.49.0


