* [PATCH 0/7] Add folio_mk_pte() and simplify mk_pte()
@ 2025-02-17 19:08 Matthew Wilcox (Oracle)
  2025-02-17 19:08 ` [PATCH 1/7] mm: Set the pte dirty if the folio is already dirty Matthew Wilcox (Oracle)
                   ` (7 more replies)
  0 siblings, 8 replies; 21+ messages in thread
From: Matthew Wilcox (Oracle) @ 2025-02-17 19:08 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), linux-arch, x86, linux-s390, sparclinux,
	linux-um

The intent is to add folio_mk_pte() to remove the conversion from folio
to page necessary to call mk_pte().  Eventually we might end up removing
mk_pte(), but that's not what's being proposed today.
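
Concretely, callers change like this (this is patch 7's conversion of
do_anonymous_page() in mm/memory.c):

	-	entry = mk_pte(&folio->page, vma->vm_page_prot);
	+	entry = folio_mk_pte(folio, vma->vm_page_prot);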

I didn't want to add folio_mk_pte() to each architecture, and I didn't
want to lose any optimisations that architectures have from their own
implementation of mk_pte().  Fortunately, most architectures have by
now turned their mk_pte() into a fairly bland variant of pfn_pte(),
but s390 is different.

So patch 1 hoists the optimisation of calling pte_mkdirty() from s390
to generic code.  I'd appreciate some eyes on this from mm people who
understand this better than I do.  I originally had

-	if (write)
+	if (write || folio_test_dirty(folio))
		entry = maybe_mkwrite(pte_mkdirty(entry), vma);

and I think that broke COW under some circumstances that 01.org could
reproduce and I couldn't.
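
The version in patch 1 avoids that by never upgrading write permissions;
it only dirties ptes which are already writable:

	if (write)
		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
	else if (pte_write(entry) && folio_test_dirty(folio))
		entry = pte_mkdirty(entry);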

The various architecture maintainers might care to make sure that what
I've done is an equivalent transformation.  x86 was particularly tricky.
The build bots say it works ... at least now I've dealt with the pesky
!MMU problem.

The last patch to actually use folio_mk_pte() ought to be the least likely
to have a problem, since it's equivalent to calling mk_pte(&folio->page).

Matthew Wilcox (Oracle) (7):
  mm: Set the pte dirty if the folio is already dirty
  mm: Introduce a common definition of mk_pte()
  sparc32: Remove custom definition of mk_pte()
  x86: Remove custom definition of mk_pte()
  um: Remove custom definition of mk_pte()
  mm: Make mk_pte() definition unconditional
  mm: Add folio_mk_pte()

 arch/alpha/include/asm/pgtable.h         |  7 -------
 arch/arc/include/asm/pgtable-levels.h    |  1 -
 arch/arm/include/asm/pgtable.h           |  1 -
 arch/arm64/include/asm/pgtable.h         |  6 ------
 arch/csky/include/asm/pgtable.h          |  5 -----
 arch/hexagon/include/asm/pgtable.h       |  3 ---
 arch/loongarch/include/asm/pgtable.h     |  6 ------
 arch/m68k/include/asm/mcf_pgtable.h      |  6 ------
 arch/m68k/include/asm/motorola_pgtable.h |  6 ------
 arch/m68k/include/asm/sun3_pgtable.h     |  6 ------
 arch/microblaze/include/asm/pgtable.h    |  8 --------
 arch/mips/include/asm/pgtable.h          |  6 ------
 arch/nios2/include/asm/pgtable.h         |  6 ------
 arch/openrisc/include/asm/pgtable.h      |  2 --
 arch/parisc/include/asm/pgtable.h        |  6 ------
 arch/powerpc/include/asm/pgtable.h       |  3 +--
 arch/riscv/include/asm/pgtable.h         |  2 --
 arch/s390/include/asm/pgtable.h          | 10 ----------
 arch/sh/include/asm/pgtable_32.h         |  8 --------
 arch/sparc/include/asm/pgtable_32.h      | 15 +++++----------
 arch/sparc/include/asm/pgtable_64.h      |  1 -
 arch/um/include/asm/pgtable-2level.h     |  1 -
 arch/um/include/asm/pgtable-4level.h     |  9 ---------
 arch/um/include/asm/pgtable.h            | 18 ++++++++----------
 arch/x86/include/asm/pgtable.h           | 19 +++----------------
 arch/xtensa/include/asm/pgtable.h        |  6 ------
 include/linux/mm.h                       | 22 ++++++++++++++++++++++
 mm/memory.c                              |  8 +++++---
 mm/userfaultfd.c                         |  2 +-
 29 files changed, 45 insertions(+), 154 deletions(-)

-- 
2.47.2



* [PATCH 1/7] mm: Set the pte dirty if the folio is already dirty
  2025-02-17 19:08 [PATCH 0/7] Add folio_mk_pte() and simplify mk_pte() Matthew Wilcox (Oracle)
@ 2025-02-17 19:08 ` Matthew Wilcox (Oracle)
  2025-02-18 10:31   ` David Hildenbrand
                     ` (2 more replies)
  2025-02-17 19:08 ` [PATCH 2/7] mm: Introduce a common definition of mk_pte() Matthew Wilcox (Oracle)
                   ` (6 subsequent siblings)
  7 siblings, 3 replies; 21+ messages in thread
From: Matthew Wilcox (Oracle) @ 2025-02-17 19:08 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), linux-arch, x86, linux-s390, sparclinux,
	linux-um

If the first access to a folio is a read that is then followed by a
write, we can save a page fault.  s390 implemented this in their
mk_pte() in commit abf09bed3cce ("s390/mm: implement software dirty
bits"), but other architectures can also benefit from this.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 arch/s390/include/asm/pgtable.h | 7 +------
 mm/memory.c                     | 2 ++
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 3ca5af4cfe43..3ee495b5171e 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1451,12 +1451,7 @@ static inline pte_t mk_pte_phys(unsigned long physpage, pgprot_t pgprot)
 
 static inline pte_t mk_pte(struct page *page, pgprot_t pgprot)
 {
-	unsigned long physpage = page_to_phys(page);
-	pte_t __pte = mk_pte_phys(physpage, pgprot);
-
-	if (pte_write(__pte) && PageDirty(page))
-		__pte = pte_mkdirty(__pte);
-	return __pte;
+	return mk_pte_phys(page_to_phys(page), pgprot);
 }
 
 #define pgd_index(address) (((address) >> PGDIR_SHIFT) & (PTRS_PER_PGD-1))
diff --git a/mm/memory.c b/mm/memory.c
index 539c0f7c6d54..4330560eee55 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5124,6 +5124,8 @@ void set_pte_range(struct vm_fault *vmf, struct folio *folio,
 
 	if (write)
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+	else if (pte_write(entry) && folio_test_dirty(folio))
+		entry = pte_mkdirty(entry);
 	if (unlikely(vmf_orig_pte_uffd_wp(vmf)))
 		entry = pte_mkuffd_wp(entry);
 	/* copy-on-write page */
-- 
2.47.2



* [PATCH 2/7] mm: Introduce a common definition of mk_pte()
  2025-02-17 19:08 [PATCH 0/7] Add folio_mk_pte() and simplify mk_pte() Matthew Wilcox (Oracle)
  2025-02-17 19:08 ` [PATCH 1/7] mm: Set the pte dirty if the folio is already dirty Matthew Wilcox (Oracle)
@ 2025-02-17 19:08 ` Matthew Wilcox (Oracle)
  2025-02-18  8:15   ` Geert Uytterhoeven
                     ` (2 more replies)
  2025-02-17 19:08 ` [PATCH 3/7] sparc32: Remove custom " Matthew Wilcox (Oracle)
                   ` (5 subsequent siblings)
  7 siblings, 3 replies; 21+ messages in thread
From: Matthew Wilcox (Oracle) @ 2025-02-17 19:08 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), linux-arch, x86, linux-s390, sparclinux,
	linux-um

Most architectures simply call pfn_pte().  Centralise that as the normal
definition and remove the definition of mk_pte() from the architectures
which have either that exact definition or something similar.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 arch/alpha/include/asm/pgtable.h         | 7 -------
 arch/arc/include/asm/pgtable-levels.h    | 1 -
 arch/arm/include/asm/pgtable.h           | 1 -
 arch/arm64/include/asm/pgtable.h         | 6 ------
 arch/csky/include/asm/pgtable.h          | 5 -----
 arch/hexagon/include/asm/pgtable.h       | 3 ---
 arch/loongarch/include/asm/pgtable.h     | 6 ------
 arch/m68k/include/asm/mcf_pgtable.h      | 6 ------
 arch/m68k/include/asm/motorola_pgtable.h | 6 ------
 arch/m68k/include/asm/sun3_pgtable.h     | 6 ------
 arch/microblaze/include/asm/pgtable.h    | 8 --------
 arch/mips/include/asm/pgtable.h          | 6 ------
 arch/nios2/include/asm/pgtable.h         | 6 ------
 arch/openrisc/include/asm/pgtable.h      | 2 --
 arch/parisc/include/asm/pgtable.h        | 6 ------
 arch/powerpc/include/asm/pgtable.h       | 3 +--
 arch/riscv/include/asm/pgtable.h         | 2 --
 arch/s390/include/asm/pgtable.h          | 5 -----
 arch/sh/include/asm/pgtable_32.h         | 8 --------
 arch/sparc/include/asm/pgtable_64.h      | 1 -
 arch/xtensa/include/asm/pgtable.h        | 6 ------
 include/linux/mm.h                       | 9 +++++++++
 22 files changed, 10 insertions(+), 99 deletions(-)

diff --git a/arch/alpha/include/asm/pgtable.h b/arch/alpha/include/asm/pgtable.h
index 02e8817a8921..2676017f42f1 100644
--- a/arch/alpha/include/asm/pgtable.h
+++ b/arch/alpha/include/asm/pgtable.h
@@ -192,13 +192,6 @@ extern unsigned long __zero_page(void);
 #define pte_pfn(pte)		(pte_val(pte) >> PFN_PTE_SHIFT)
 
 #define pte_page(pte)	pfn_to_page(pte_pfn(pte))
-#define mk_pte(page, pgprot)						\
-({									\
-	pte_t pte;							\
-									\
-	pte_val(pte) = (page_to_pfn(page) << 32) | pgprot_val(pgprot);	\
-	pte;								\
-})
 
 extern inline pte_t pfn_pte(unsigned long physpfn, pgprot_t pgprot)
 { pte_t pte; pte_val(pte) = (PHYS_TWIDDLE(physpfn) << 32) | pgprot_val(pgprot); return pte; }
diff --git a/arch/arc/include/asm/pgtable-levels.h b/arch/arc/include/asm/pgtable-levels.h
index 86e148226463..55dbd2719e35 100644
--- a/arch/arc/include/asm/pgtable-levels.h
+++ b/arch/arc/include/asm/pgtable-levels.h
@@ -177,7 +177,6 @@
 #define set_pte(ptep, pte)	((*(ptep)) = (pte))
 #define pte_pfn(pte)		(pte_val(pte) >> PAGE_SHIFT)
 #define pfn_pte(pfn, prot)	__pte(__pfn_to_phys(pfn) | pgprot_val(prot))
-#define mk_pte(page, prot)	pfn_pte(page_to_pfn(page), prot)
 
 #ifdef CONFIG_ISA_ARCV2
 #define pmd_leaf(x)		(pmd_val(x) & _PAGE_HW_SZ)
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index be91e376df79..74c3b5a6eab3 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -169,7 +169,6 @@ static inline pte_t *pmd_page_vaddr(pmd_t pmd)
 #define pfn_pte(pfn,prot)	__pte(__pfn_to_phys(pfn) | pgprot_val(prot))
 
 #define pte_page(pte)		pfn_to_page(pte_pfn(pte))
-#define mk_pte(page,prot)	pfn_pte(page_to_pfn(page), prot)
 
 #define pte_clear(mm,addr,ptep)	set_pte_ext(ptep, __pte(0), 0)
 
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 0b2a2ad1b9e8..64b469d799c6 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -793,12 +793,6 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
 /* use ONLY for statically allocated translation tables */
 #define pte_offset_kimg(dir,addr)	((pte_t *)__phys_to_kimg(pte_offset_phys((dir), (addr))))
 
-/*
- * Conversion functions: convert a page and protection to a page entry,
- * and a page entry and page directory to the page they refer to.
- */
-#define mk_pte(page,prot)	pfn_pte(page_to_pfn(page),prot)
-
 #if CONFIG_PGTABLE_LEVELS > 2
 
 #define pmd_ERROR(e)	\
diff --git a/arch/csky/include/asm/pgtable.h b/arch/csky/include/asm/pgtable.h
index a397e1718ab6..b8378431aeff 100644
--- a/arch/csky/include/asm/pgtable.h
+++ b/arch/csky/include/asm/pgtable.h
@@ -249,11 +249,6 @@ static inline pgprot_t pgprot_writecombine(pgprot_t _prot)
 	return __pgprot(prot);
 }
 
-/*
- * Conversion functions: convert a page and protection to a page entry,
- * and a page entry and page directory to the page they refer to.
- */
-#define mk_pte(page, pgprot)    pfn_pte(page_to_pfn(page), (pgprot))
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
 	return __pte((pte_val(pte) & _PAGE_CHG_MASK) |
diff --git a/arch/hexagon/include/asm/pgtable.h b/arch/hexagon/include/asm/pgtable.h
index 8c5b7a1c3d90..9fbdfdbc539f 100644
--- a/arch/hexagon/include/asm/pgtable.h
+++ b/arch/hexagon/include/asm/pgtable.h
@@ -238,9 +238,6 @@ static inline int pte_present(pte_t pte)
 	return pte_val(pte) & _PAGE_PRESENT;
 }
 
-/* mk_pte - make a PTE out of a page pointer and protection bits */
-#define mk_pte(page, pgprot) pfn_pte(page_to_pfn(page), (pgprot))
-
 /* pte_page - returns a page (frame pointer/descriptor?) based on a PTE */
 #define pte_page(x) pfn_to_page(pte_pfn(x))
 
diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h
index da346733a1da..9ba3a4ebcd98 100644
--- a/arch/loongarch/include/asm/pgtable.h
+++ b/arch/loongarch/include/asm/pgtable.h
@@ -426,12 +426,6 @@ static inline unsigned long pte_accessible(struct mm_struct *mm, pte_t a)
 	return false;
 }
 
-/*
- * Conversion functions: convert a page and protection to a page entry,
- * and a page entry and page directory to the page they refer to.
- */
-#define mk_pte(page, pgprot)	pfn_pte(page_to_pfn(page), (pgprot))
-
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
 	return __pte((pte_val(pte) & _PAGE_CHG_MASK) |
diff --git a/arch/m68k/include/asm/mcf_pgtable.h b/arch/m68k/include/asm/mcf_pgtable.h
index 48f87a8a8832..f5c596b211d4 100644
--- a/arch/m68k/include/asm/mcf_pgtable.h
+++ b/arch/m68k/include/asm/mcf_pgtable.h
@@ -96,12 +96,6 @@
 
 #define pmd_pgtable(pmd) pfn_to_virt(pmd_val(pmd) >> PAGE_SHIFT)
 
-/*
- * Conversion functions: convert a page and protection to a page entry,
- * and a page entry and page directory to the page they refer to.
- */
-#define mk_pte(page, pgprot) pfn_pte(page_to_pfn(page), (pgprot))
-
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
 	pte_val(pte) = (pte_val(pte) & CF_PAGE_CHG_MASK) | pgprot_val(newprot);
diff --git a/arch/m68k/include/asm/motorola_pgtable.h b/arch/m68k/include/asm/motorola_pgtable.h
index 9866c7acdabe..040ac3bad713 100644
--- a/arch/m68k/include/asm/motorola_pgtable.h
+++ b/arch/m68k/include/asm/motorola_pgtable.h
@@ -81,12 +81,6 @@ extern unsigned long mm_cachebits;
 
 #define pmd_pgtable(pmd) ((pgtable_t)pmd_page_vaddr(pmd))
 
-/*
- * Conversion functions: convert a page and protection to a page entry,
- * and a page entry and page directory to the page they refer to.
- */
-#define mk_pte(page, pgprot) pfn_pte(page_to_pfn(page), (pgprot))
-
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
 	pte_val(pte) = (pte_val(pte) & _PAGE_CHG_MASK) | pgprot_val(newprot);
diff --git a/arch/m68k/include/asm/sun3_pgtable.h b/arch/m68k/include/asm/sun3_pgtable.h
index 30081aee8164..73745dc0ec0e 100644
--- a/arch/m68k/include/asm/sun3_pgtable.h
+++ b/arch/m68k/include/asm/sun3_pgtable.h
@@ -76,12 +76,6 @@
 
 #ifndef __ASSEMBLY__
 
-/*
- * Conversion functions: convert a page and protection to a page entry,
- * and a page entry and page directory to the page they refer to.
- */
-#define mk_pte(page, pgprot) pfn_pte(page_to_pfn(page), (pgprot))
-
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
 	pte_val(pte) = (pte_val(pte) & SUN3_PAGE_CHG_MASK) | pgprot_val(newprot);
diff --git a/arch/microblaze/include/asm/pgtable.h b/arch/microblaze/include/asm/pgtable.h
index e4ea2ec3642f..b1bb2c65dd04 100644
--- a/arch/microblaze/include/asm/pgtable.h
+++ b/arch/microblaze/include/asm/pgtable.h
@@ -285,14 +285,6 @@ static inline pte_t mk_pte_phys(phys_addr_t physpage, pgprot_t pgprot)
 	return pte;
 }
 
-#define mk_pte(page, pgprot) \
-({									   \
-	pte_t pte;							   \
-	pte_val(pte) = (((page - mem_map) << PAGE_SHIFT) + memory_start) |  \
-			pgprot_val(pgprot);				   \
-	pte;								   \
-})
-
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
 	pte_val(pte) = (pte_val(pte) & _PAGE_CHG_MASK) | pgprot_val(newprot);
diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
index c29a551eb0ca..d69cfa5a8ac6 100644
--- a/arch/mips/include/asm/pgtable.h
+++ b/arch/mips/include/asm/pgtable.h
@@ -504,12 +504,6 @@ static inline int ptep_set_access_flags(struct vm_area_struct *vma,
 	return true;
 }
 
-/*
- * Conversion functions: convert a page and protection to a page entry,
- * and a page entry and page directory to the page they refer to.
- */
-#define mk_pte(page, pgprot)	pfn_pte(page_to_pfn(page), (pgprot))
-
 #if defined(CONFIG_XPA)
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
diff --git a/arch/nios2/include/asm/pgtable.h b/arch/nios2/include/asm/pgtable.h
index eab87c6beacb..f490f2fa0dca 100644
--- a/arch/nios2/include/asm/pgtable.h
+++ b/arch/nios2/include/asm/pgtable.h
@@ -217,12 +217,6 @@ static inline void pte_clear(struct mm_struct *mm,
 	set_pte(ptep, null);
 }
 
-/*
- * Conversion functions: convert a page and protection to a page entry,
- * and a page entry and page directory to the page they refer to.
- */
-#define mk_pte(page, prot)	(pfn_pte(page_to_pfn(page), prot))
-
 /*
  * Conversion functions: convert a page and protection to a page entry,
  * and a page entry and page directory to the page they refer to.
diff --git a/arch/openrisc/include/asm/pgtable.h b/arch/openrisc/include/asm/pgtable.h
index 60c6ce7ff2dc..71bfb8c8c482 100644
--- a/arch/openrisc/include/asm/pgtable.h
+++ b/arch/openrisc/include/asm/pgtable.h
@@ -299,8 +299,6 @@ static inline pte_t __mk_pte(void *page, pgprot_t pgprot)
 	return pte;
 }
 
-#define mk_pte(page, pgprot) __mk_pte(page_address(page), (pgprot))
-
 #define mk_pte_phys(physpage, pgprot) \
 ({                                                                      \
 	pte_t __pte;                                                    \
diff --git a/arch/parisc/include/asm/pgtable.h b/arch/parisc/include/asm/pgtable.h
index babf65751e81..e85963fefa63 100644
--- a/arch/parisc/include/asm/pgtable.h
+++ b/arch/parisc/include/asm/pgtable.h
@@ -338,10 +338,6 @@ static inline pte_t pte_mkspecial(pte_t pte)	{ pte_val(pte) |= _PAGE_SPECIAL; re
 #endif
 
 
-/*
- * Conversion functions: convert a page and protection to a page entry,
- * and a page entry and page directory to the page they refer to.
- */
 #define __mk_pte(addr,pgprot) \
 ({									\
 	pte_t __pte;							\
@@ -351,8 +347,6 @@ static inline pte_t pte_mkspecial(pte_t pte)	{ pte_val(pte) |= _PAGE_SPECIAL; re
 	__pte;								\
 })
 
-#define mk_pte(page, pgprot)	pfn_pte(page_to_pfn(page), (pgprot))
-
 static inline pte_t pfn_pte(unsigned long pfn, pgprot_t pgprot)
 {
 	pte_t pte;
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 2f72ad885332..93d77ad5a92f 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -53,9 +53,8 @@ void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
 #define MAX_PTRS_PER_PGD PTRS_PER_PGD
 #endif
 
-/* Keep these as a macros to avoid include dependency mess */
+/* Keep this as a macro to avoid include dependency mess */
 #define pte_page(x)		pfn_to_page(pte_pfn(x))
-#define mk_pte(page, pgprot)	pfn_pte(page_to_pfn(page), (pgprot))
 
 static inline unsigned long pte_pfn(pte_t pte)
 {
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 050fdc49b5ad..293a7776fe07 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -339,8 +339,6 @@ static inline pte_t pfn_pte(unsigned long pfn, pgprot_t prot)
 	return __pte((pfn << _PAGE_PFN_SHIFT) | prot_val);
 }
 
-#define mk_pte(page, prot)       pfn_pte(page_to_pfn(page), prot)
-
 static inline int pte_present(pte_t pte)
 {
 	return (pte_val(pte) & (_PAGE_PRESENT | _PAGE_PROT_NONE));
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 3ee495b5171e..db932beabc87 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1449,11 +1449,6 @@ static inline pte_t mk_pte_phys(unsigned long physpage, pgprot_t pgprot)
 	return pte_mkyoung(__pte);
 }
 
-static inline pte_t mk_pte(struct page *page, pgprot_t pgprot)
-{
-	return mk_pte_phys(page_to_phys(page), pgprot);
-}
-
 #define pgd_index(address) (((address) >> PGDIR_SHIFT) & (PTRS_PER_PGD-1))
 #define p4d_index(address) (((address) >> P4D_SHIFT) & (PTRS_PER_P4D-1))
 #define pud_index(address) (((address) >> PUD_SHIFT) & (PTRS_PER_PUD-1))
diff --git a/arch/sh/include/asm/pgtable_32.h b/arch/sh/include/asm/pgtable_32.h
index f939f1215232..db2e48366e0d 100644
--- a/arch/sh/include/asm/pgtable_32.h
+++ b/arch/sh/include/asm/pgtable_32.h
@@ -380,14 +380,6 @@ PTE_BIT_FUNC(low, mkspecial, |= _PAGE_SPECIAL);
 
 #define pgprot_noncached	 pgprot_writecombine
 
-/*
- * Conversion functions: convert a page and protection to a page entry,
- * and a page entry and page directory to the page they refer to.
- *
- * extern pte_t mk_pte(struct page *page, pgprot_t pgprot)
- */
-#define mk_pte(page, pgprot)	pfn_pte(page_to_pfn(page), (pgprot))
-
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
 	pte.pte_low &= _PAGE_CHG_MASK;
diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 2b7f358762c1..ac115adb488f 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -225,7 +225,6 @@ static inline pte_t pfn_pte(unsigned long pfn, pgprot_t prot)
 	BUILD_BUG_ON(_PAGE_SZBITS_4U != 0UL || _PAGE_SZBITS_4V != 0UL);
 	return __pte(paddr | pgprot_val(prot));
 }
-#define mk_pte(page, pgprot)	pfn_pte(page_to_pfn(page), (pgprot))
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 static inline pmd_t pfn_pmd(unsigned long page_nr, pgprot_t pgprot)
diff --git a/arch/xtensa/include/asm/pgtable.h b/arch/xtensa/include/asm/pgtable.h
index 1647a7cc3fbf..cb1725c40e36 100644
--- a/arch/xtensa/include/asm/pgtable.h
+++ b/arch/xtensa/include/asm/pgtable.h
@@ -269,17 +269,11 @@ static inline pte_t pte_mkwrite_novma(pte_t pte)
 		((__pgprot((pgprot_val(prot) & ~_PAGE_CA_MASK) | \
 			   _PAGE_CA_BYPASS)))
 
-/*
- * Conversion functions: convert a page and protection to a page entry,
- * and a page entry and page directory to the page they refer to.
- */
-
 #define PFN_PTE_SHIFT		PAGE_SHIFT
 #define pte_pfn(pte)		(pte_val(pte) >> PAGE_SHIFT)
 #define pte_same(a,b)		(pte_val(a) == pte_val(b))
 #define pte_page(x)		pfn_to_page(pte_pfn(x))
 #define pfn_pte(pfn, prot)	__pte(((pfn) << PAGE_SHIFT) | pgprot_val(prot))
-#define mk_pte(page, prot)	pfn_pte(page_to_pfn(page), prot)
 
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7b1068ddcbb7..3ef11ff3922f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1916,6 +1916,15 @@ static inline struct folio *pfn_folio(unsigned long pfn)
 	return page_folio(pfn_to_page(pfn));
 }
 
+#ifndef mk_pte
+#ifdef CONFIG_MMU
+static inline pte_t mk_pte(struct page *page, pgprot_t pgprot)
+{
+	return pfn_pte(page_to_pfn(page), pgprot);
+}
+#endif
+#endif
+
 /**
  * folio_maybe_dma_pinned - Report if a folio may be pinned for DMA.
  * @folio: The folio.
-- 
2.47.2



* [PATCH 3/7] sparc32: Remove custom definition of mk_pte()
  2025-02-17 19:08 [PATCH 0/7] Add folio_mk_pte() and simplify mk_pte() Matthew Wilcox (Oracle)
  2025-02-17 19:08 ` [PATCH 1/7] mm: Set the pte dirty if the folio is already dirty Matthew Wilcox (Oracle)
  2025-02-17 19:08 ` [PATCH 2/7] mm: Introduce a common definition of mk_pte() Matthew Wilcox (Oracle)
@ 2025-02-17 19:08 ` Matthew Wilcox (Oracle)
  2025-02-17 19:08 ` [PATCH 4/7] x86: " Matthew Wilcox (Oracle)
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Matthew Wilcox (Oracle) @ 2025-02-17 19:08 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), linux-arch, x86, linux-s390, sparclinux,
	linux-um

Instead of defining pfn_pte() in terms of mk_pte(), make pfn_pte() the
base implementation.  That lets us use the generic definition of mk_pte().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 arch/sparc/include/asm/pgtable_32.h | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/sparc/include/asm/pgtable_32.h b/arch/sparc/include/asm/pgtable_32.h
index 62bcafe38b1f..1454ebe91539 100644
--- a/arch/sparc/include/asm/pgtable_32.h
+++ b/arch/sparc/include/asm/pgtable_32.h
@@ -255,7 +255,11 @@ static inline pte_t pte_mkyoung(pte_t pte)
 }
 
 #define PFN_PTE_SHIFT			(PAGE_SHIFT - 4)
-#define pfn_pte(pfn, prot)		mk_pte(pfn_to_page(pfn), prot)
+
+static inline pte_t pfn_pte(unsigned long pfn, pgprot_t pgprot)
+{
+	return __pte((pfn << PFN_PTE_SHIFT) | pgprot_val(pgprot));
+}
 
 static inline unsigned long pte_pfn(pte_t pte)
 {
@@ -272,15 +276,6 @@ static inline unsigned long pte_pfn(pte_t pte)
 
 #define pte_page(pte)	pfn_to_page(pte_pfn(pte))
 
-/*
- * Conversion functions: convert a page and protection to a page entry,
- * and a page entry and page directory to the page they refer to.
- */
-static inline pte_t mk_pte(struct page *page, pgprot_t pgprot)
-{
-	return __pte((page_to_pfn(page) << (PAGE_SHIFT-4)) | pgprot_val(pgprot));
-}
-
 static inline pte_t mk_pte_phys(unsigned long page, pgprot_t pgprot)
 {
 	return __pte(((page) >> 4) | pgprot_val(pgprot));
-- 
2.47.2



* [PATCH 4/7] x86: Remove custom definition of mk_pte()
  2025-02-17 19:08 [PATCH 0/7] Add folio_mk_pte() and simplify mk_pte() Matthew Wilcox (Oracle)
                   ` (2 preceding siblings ...)
  2025-02-17 19:08 ` [PATCH 3/7] sparc32: Remove custom " Matthew Wilcox (Oracle)
@ 2025-02-17 19:08 ` Matthew Wilcox (Oracle)
  2025-02-19 20:53   ` Dave Hansen
  2025-02-17 19:08 ` [PATCH 5/7] um: " Matthew Wilcox (Oracle)
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Matthew Wilcox (Oracle) @ 2025-02-17 19:08 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), linux-arch, x86, linux-s390, sparclinux,
	linux-um

Move the shadow stack check to pfn_pte() which lets us use the common
definition of mk_pte().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 arch/x86/include/asm/pgtable.h | 19 +++----------------
 1 file changed, 3 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 593f10aabd45..9f480bdafd20 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -784,6 +784,9 @@ static inline pgprotval_t check_pgprot(pgprot_t pgprot)
 static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot)
 {
 	phys_addr_t pfn = (phys_addr_t)page_nr << PAGE_SHIFT;
+	/* This bit combination is used to mark shadow stacks */
+	WARN_ON_ONCE((pgprot_val(pgprot) & (_PAGE_DIRTY | _PAGE_RW)) ==
+			_PAGE_DIRTY);
 	pfn ^= protnone_mask(pgprot_val(pgprot));
 	pfn &= PTE_PFN_MASK;
 	return __pte(pfn | check_pgprot(pgprot));
@@ -1080,22 +1083,6 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
  */
 #define pmd_page(pmd)	pfn_to_page(pmd_pfn(pmd))
 
-/*
- * Conversion functions: convert a page and protection to a page entry,
- * and a page entry and page directory to the page they refer to.
- *
- * (Currently stuck as a macro because of indirect forward reference
- * to linux/mm.h:page_to_nid())
- */
-#define mk_pte(page, pgprot)						  \
-({									  \
-	pgprot_t __pgprot = pgprot;					  \
-									  \
-	WARN_ON_ONCE((pgprot_val(__pgprot) & (_PAGE_DIRTY | _PAGE_RW)) == \
-		    _PAGE_DIRTY);					  \
-	pfn_pte(page_to_pfn(page), __pgprot);				  \
-})
-
 static inline int pmd_bad(pmd_t pmd)
 {
 	return (pmd_flags(pmd) & ~(_PAGE_USER | _PAGE_ACCESSED)) !=
-- 
2.47.2



* [PATCH 5/7] um: Remove custom definition of mk_pte()
  2025-02-17 19:08 [PATCH 0/7] Add folio_mk_pte() and simplify mk_pte() Matthew Wilcox (Oracle)
                   ` (3 preceding siblings ...)
  2025-02-17 19:08 ` [PATCH 4/7] x86: " Matthew Wilcox (Oracle)
@ 2025-02-17 19:08 ` Matthew Wilcox (Oracle)
  2025-02-17 19:08 ` [PATCH 6/7] mm: Make mk_pte() definition unconditional Matthew Wilcox (Oracle)
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Matthew Wilcox (Oracle) @ 2025-02-17 19:08 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), linux-arch, x86, linux-s390, sparclinux,
	linux-um

Move the pfn_pte() definitions from the 2level and 4level files to the
generic pgtable.h and delete the custom definition of mk_pte() so that
we use the central definition.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 arch/um/include/asm/pgtable-2level.h |  1 -
 arch/um/include/asm/pgtable-4level.h |  9 ---------
 arch/um/include/asm/pgtable.h        | 18 ++++++++----------
 3 files changed, 8 insertions(+), 20 deletions(-)

diff --git a/arch/um/include/asm/pgtable-2level.h b/arch/um/include/asm/pgtable-2level.h
index ab0c8dd86564..14ec16f92ce4 100644
--- a/arch/um/include/asm/pgtable-2level.h
+++ b/arch/um/include/asm/pgtable-2level.h
@@ -37,7 +37,6 @@ static inline void pgd_mkuptodate(pgd_t pgd)	{ }
 #define set_pmd(pmdptr, pmdval) (*(pmdptr) = (pmdval))
 
 #define pte_pfn(x) phys_to_pfn(pte_val(x))
-#define pfn_pte(pfn, prot) __pte(pfn_to_phys(pfn) | pgprot_val(prot))
 #define pfn_pmd(pfn, prot) __pmd(pfn_to_phys(pfn) | pgprot_val(prot))
 
 #endif
diff --git a/arch/um/include/asm/pgtable-4level.h b/arch/um/include/asm/pgtable-4level.h
index 0d279caee93c..7a271b7b83d2 100644
--- a/arch/um/include/asm/pgtable-4level.h
+++ b/arch/um/include/asm/pgtable-4level.h
@@ -102,15 +102,6 @@ static inline unsigned long pte_pfn(pte_t pte)
 	return phys_to_pfn(pte_val(pte));
 }
 
-static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot)
-{
-	pte_t pte;
-	phys_t phys = pfn_to_phys(page_nr);
-
-	pte_set_val(pte, phys, pgprot);
-	return pte;
-}
-
 static inline pmd_t pfn_pmd(unsigned long page_nr, pgprot_t pgprot)
 {
 	return __pmd((page_nr << PAGE_SHIFT) | pgprot_val(pgprot));
diff --git a/arch/um/include/asm/pgtable.h b/arch/um/include/asm/pgtable.h
index 5601ca98e8a6..ca2a519d53ab 100644
--- a/arch/um/include/asm/pgtable.h
+++ b/arch/um/include/asm/pgtable.h
@@ -260,19 +260,17 @@ static inline int pte_same(pte_t pte_a, pte_t pte_b)
 	return !((pte_val(pte_a) ^ pte_val(pte_b)) & ~_PAGE_NEEDSYNC);
 }
 
-/*
- * Conversion functions: convert a page and protection to a page entry,
- * and a page entry and page directory to the page they refer to.
- */
-
 #define __virt_to_page(virt) phys_to_page(__pa(virt))
 #define virt_to_page(addr) __virt_to_page((const unsigned long) addr)
 
-#define mk_pte(page, pgprot) \
-	({ pte_t pte;					\
-							\
-	pte_set_val(pte, page_to_phys(page), (pgprot));	\
-	pte;})
+static inline pte_t pfn_pte(unsigned long pfn, pgprot_t pgprot)
+{
+	pte_t pte;
+
+	pte_set_val(pte, pfn_to_phys(pfn), pgprot);
+
+	return pte;
+}
 
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
-- 
2.47.2



* [PATCH 6/7] mm: Make mk_pte() definition unconditional
  2025-02-17 19:08 [PATCH 0/7] Add folio_mk_pte() and simplify mk_pte() Matthew Wilcox (Oracle)
                   ` (4 preceding siblings ...)
  2025-02-17 19:08 ` [PATCH 5/7] um: " Matthew Wilcox (Oracle)
@ 2025-02-17 19:08 ` Matthew Wilcox (Oracle)
  2025-02-18 10:32   ` David Hildenbrand
  2025-02-17 19:08 ` [PATCH 7/7] mm: Add folio_mk_pte() Matthew Wilcox (Oracle)
  2025-02-18 10:29 ` [PATCH 0/7] Add folio_mk_pte() and simplify mk_pte() David Hildenbrand
  7 siblings, 1 reply; 21+ messages in thread
From: Matthew Wilcox (Oracle) @ 2025-02-17 19:08 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), linux-arch, x86, linux-s390, sparclinux,
	linux-um

All architectures now use the common mk_pte() definition, so we
can remove the condition.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/mm.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3ef11ff3922f..62dccde9c561 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1916,14 +1916,12 @@ static inline struct folio *pfn_folio(unsigned long pfn)
 	return page_folio(pfn_to_page(pfn));
 }
 
-#ifndef mk_pte
 #ifdef CONFIG_MMU
 static inline pte_t mk_pte(struct page *page, pgprot_t pgprot)
 {
 	return pfn_pte(page_to_pfn(page), pgprot);
 }
 #endif
-#endif
 
 /**
  * folio_maybe_dma_pinned - Report if a folio may be pinned for DMA.
-- 
2.47.2



* [PATCH 7/7] mm: Add folio_mk_pte()
  2025-02-17 19:08 [PATCH 0/7] Add folio_mk_pte() and simplify mk_pte() Matthew Wilcox (Oracle)
                   ` (5 preceding siblings ...)
  2025-02-17 19:08 ` [PATCH 6/7] mm: Make mk_pte() definition unconditional Matthew Wilcox (Oracle)
@ 2025-02-17 19:08 ` Matthew Wilcox (Oracle)
  2025-02-18 10:33   ` David Hildenbrand
  2025-02-18 10:29 ` [PATCH 0/7] Add folio_mk_pte() and simplify mk_pte() David Hildenbrand
  7 siblings, 1 reply; 21+ messages in thread
From: Matthew Wilcox (Oracle) @ 2025-02-17 19:08 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), linux-arch, x86, linux-s390, sparclinux,
	linux-um

Removes a cast from folio to page in four callers of mk_pte().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/mm.h | 15 +++++++++++++++
 mm/memory.c        |  6 +++---
 mm/userfaultfd.c   |  2 +-
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 62dccde9c561..b1e311bae6b7 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1921,6 +1921,21 @@ static inline pte_t mk_pte(struct page *page, pgprot_t pgprot)
 {
 	return pfn_pte(page_to_pfn(page), pgprot);
 }
+
+/**
+ * folio_mk_pte - Create a PTE for this folio
+ * @folio: The folio to create a PTE for
+ * @pgprot: The page protection bits to use
+ *
+ * Create a page table entry for the first page of this folio.
+ * This is suitable for passing to set_ptes().
+ *
+ * Return: A page table entry suitable for mapping this folio.
+ */
+static inline pte_t folio_mk_pte(struct folio *folio, pgprot_t pgprot)
+{
+	return pfn_pte(folio_pfn(folio), pgprot);
+}
 #endif
 
 /**
diff --git a/mm/memory.c b/mm/memory.c
index 4330560eee55..72411a6d696f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -936,7 +936,7 @@ copy_present_page(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma
 	rss[MM_ANONPAGES]++;
 
 	/* All done, just insert the new page copy in the child */
-	pte = mk_pte(&new_folio->page, dst_vma->vm_page_prot);
+	pte = folio_mk_pte(new_folio, dst_vma->vm_page_prot);
 	pte = maybe_mkwrite(pte_mkdirty(pte), dst_vma);
 	if (userfaultfd_pte_wp(dst_vma, ptep_get(src_pte)))
 		/* Uffd-wp needs to be delivered to dest pte as well */
@@ -3480,7 +3480,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 			inc_mm_counter(mm, MM_ANONPAGES);
 		}
 		flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte));
-		entry = mk_pte(&new_folio->page, vma->vm_page_prot);
+		entry = folio_mk_pte(new_folio, vma->vm_page_prot);
 		entry = pte_sw_mkyoung(entry);
 		if (unlikely(unshare)) {
 			if (pte_soft_dirty(vmf->orig_pte))
@@ -4892,7 +4892,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	 */
 	__folio_mark_uptodate(folio);
 
-	entry = mk_pte(&folio->page, vma->vm_page_prot);
+	entry = folio_mk_pte(folio, vma->vm_page_prot);
 	entry = pte_sw_mkyoung(entry);
 	if (vma->vm_flags & VM_WRITE)
 		entry = pte_mkwrite(pte_mkdirty(entry), vma);
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index af3dfc3633db..507a9e3caec7 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1066,7 +1066,7 @@ static int move_present_pte(struct mm_struct *mm,
 	folio_move_anon_rmap(src_folio, dst_vma);
 	src_folio->index = linear_page_index(dst_vma, dst_addr);
 
-	orig_dst_pte = mk_pte(&src_folio->page, dst_vma->vm_page_prot);
+	orig_dst_pte = folio_mk_pte(src_folio, dst_vma->vm_page_prot);
 	/* Follow mremap() behavior and treat the entry dirty after the move */
 	orig_dst_pte = pte_mkwrite(pte_mkdirty(orig_dst_pte), dst_vma);
 
-- 
2.47.2



* Re: [PATCH 2/7] mm: Introduce a common definition of mk_pte()
  2025-02-17 19:08 ` [PATCH 2/7] mm: Introduce a common definition of mk_pte() Matthew Wilcox (Oracle)
@ 2025-02-18  8:15   ` Geert Uytterhoeven
  2025-02-18 10:32   ` David Hildenbrand
  2025-02-19  9:07   ` Alexander Gordeev
  2 siblings, 0 replies; 21+ messages in thread
From: Geert Uytterhoeven @ 2025-02-18  8:15 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-arch, x86, linux-s390, sparclinux, linux-um

On Mon, 17 Feb 2025 at 20:09, Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
> Most architectures simply call pfn_pte().  Centralise that as the normal
> definition and remove the definition of mk_pte() from the architectures
> which have either that exact definition or something similar.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

>  arch/m68k/include/asm/mcf_pgtable.h      | 6 ------
>  arch/m68k/include/asm/motorola_pgtable.h | 6 ------
>  arch/m68k/include/asm/sun3_pgtable.h     | 6 ------

Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* Re: [PATCH 0/7] Add folio_mk_pte() and simplify mk_pte()
  2025-02-17 19:08 [PATCH 0/7] Add folio_mk_pte() and simplify mk_pte() Matthew Wilcox (Oracle)
                   ` (6 preceding siblings ...)
  2025-02-17 19:08 ` [PATCH 7/7] mm: Add folio_mk_pte() Matthew Wilcox (Oracle)
@ 2025-02-18 10:29 ` David Hildenbrand
  7 siblings, 0 replies; 21+ messages in thread
From: David Hildenbrand @ 2025-02-18 10:29 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: linux-arch, x86, linux-s390, sparclinux, linux-um

On 17.02.25 20:08, Matthew Wilcox (Oracle) wrote:
> The intent is to add folio_mk_pte() to remove the conversion from folio
> to page necessary to call mk_pte().  Eventually we might end up removing
> mk_pte(), but that's not what's being proposed today.
> 
> I didn't want to add folio_mk_pte() to each architecture, and I didn't
> want to lose any optimisations that architectures have from their own
> implementation of mk_pte().  Fortunately, most architectures have by
> now turned their mk_pte() into a fairly bland variant of pfn_pte(),
> but s390 is different.
> 
> So patch 1 hoists the optimisation of calling pte_mkdirty() from s390
> to generic code.  I'd appreciate some eyes on this from mm people who
> understand this better than I do.  I originally had
> 
> -	if (write)
> +	if (write || folio_test_dirty(folio))
> 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> 
> and I think that broke COW under some circumstances that 01.org could
> reproduce and I couldn't.

If it's an anon folio, that logic would be broken, yes (anon CoW). We do
have can_change_pte_writable(), which tells you when it is safe to upgrade
write permissions for a PTE.

Looking at can_change_pte_writable(), I don't know if filesystems with 
writenotify might have a problem when setting the PTE dirty and allowing 
for write access, just because the folio is dirty.

So I assume that it would break fs-level CoW indeed.
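
For anyone who wants the details, the logic is roughly this (my
simplified paraphrase of can_change_pte_writable() from mm/mprotect.c,
not the actual kernel code):

	if (pte_protnone(pte))
		return false;	/* not even readable */
	if (userfaultfd_pte_wp(vma, pte))
		return false;	/* uffd-wp wants to see the write fault */
	if (vma_soft_dirty_enabled(vma) && !pte_soft_dirty(pte))
		return false;	/* so does soft-dirty tracking */
	if (!(vma->vm_flags & VM_SHARED)) {
		/* MAP_PRIVATE: only exclusive anon pages are safe */
		struct page *page = vm_normal_page(vma, addr, pte);

		return page && PageAnon(page) && PageAnonExclusive(page);
	}
	/* MAP_SHARED: a clean pte may still owe the FS a writenotify */
	return pte_dirty(pte);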

-- 
Cheers,

David / dhildenb



* Re: [PATCH 1/7] mm: Set the pte dirty if the folio is already dirty
  2025-02-17 19:08 ` [PATCH 1/7] mm: Set the pte dirty if the folio is already dirty Matthew Wilcox (Oracle)
@ 2025-02-18 10:31   ` David Hildenbrand
  2025-02-18 16:20   ` Alexander Gordeev
  2025-02-19  7:33   ` Alexander Gordeev
  2 siblings, 0 replies; 21+ messages in thread
From: David Hildenbrand @ 2025-02-18 10:31 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: linux-arch, x86, linux-s390, sparclinux, linux-um

On 17.02.25 20:08, Matthew Wilcox (Oracle) wrote:
> If the first access to a folio is a read that is then followed by a
> write, we can save a page fault.  s390 implemented this in their
> mk_pte() in commit abf09bed3cce ("s390/mm: implement software dirty
> bits"), but other architectures can also benefit from this.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   arch/s390/include/asm/pgtable.h | 7 +------
>   mm/memory.c                     | 2 ++
>   2 files changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
> index 3ca5af4cfe43..3ee495b5171e 100644
> --- a/arch/s390/include/asm/pgtable.h
> +++ b/arch/s390/include/asm/pgtable.h
> @@ -1451,12 +1451,7 @@ static inline pte_t mk_pte_phys(unsigned long physpage, pgprot_t pgprot)
>   
>   static inline pte_t mk_pte(struct page *page, pgprot_t pgprot)
>   {
> -	unsigned long physpage = page_to_phys(page);
> -	pte_t __pte = mk_pte_phys(physpage, pgprot);
> -
> -	if (pte_write(__pte) && PageDirty(page))
> -		__pte = pte_mkdirty(__pte);
> -	return __pte;
> +	return mk_pte_phys(page_to_phys(page), pgprot);
>   }
>   
>   #define pgd_index(address) (((address) >> PGDIR_SHIFT) & (PTRS_PER_PGD-1))
> diff --git a/mm/memory.c b/mm/memory.c
> index 539c0f7c6d54..4330560eee55 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5124,6 +5124,8 @@ void set_pte_range(struct vm_fault *vmf, struct folio *folio,
>   
>   	if (write)
>   		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> +	else if (pte_write(entry) && folio_test_dirty(folio))
> +		entry = pte_mkdirty(entry);
>   	if (unlikely(vmf_orig_pte_uffd_wp(vmf)))
>   		entry = pte_mkuffd_wp(entry);
>   	/* copy-on-write page */

Yes, that looks sane

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



* Re: [PATCH 2/7] mm: Introduce a common definition of mk_pte()
  2025-02-17 19:08 ` [PATCH 2/7] mm: Introduce a common definition of mk_pte() Matthew Wilcox (Oracle)
  2025-02-18  8:15   ` Geert Uytterhoeven
@ 2025-02-18 10:32   ` David Hildenbrand
  2025-02-19  9:07   ` Alexander Gordeev
  2 siblings, 0 replies; 21+ messages in thread
From: David Hildenbrand @ 2025-02-18 10:32 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: linux-arch, x86, linux-s390, sparclinux, linux-um

On 17.02.25 20:08, Matthew Wilcox (Oracle) wrote:
> Most architectures simply call pfn_pte().  Centralise that as the normal
> definition and remove the definition of mk_pte() from the architectures
> which have either that exact definition or something similar.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



* Re: [PATCH 6/7] mm: Make mk_pte() definition unconditional
  2025-02-17 19:08 ` [PATCH 6/7] mm: Make mk_pte() definition unconditional Matthew Wilcox (Oracle)
@ 2025-02-18 10:32   ` David Hildenbrand
  0 siblings, 0 replies; 21+ messages in thread
From: David Hildenbrand @ 2025-02-18 10:32 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: linux-arch, x86, linux-s390, sparclinux, linux-um

On 17.02.25 20:08, Matthew Wilcox (Oracle) wrote:
> All architectures now use the common mk_pte() definition, so we
> can remove the condition.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   include/linux/mm.h | 2 --
>   1 file changed, 2 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 3ef11ff3922f..62dccde9c561 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1916,14 +1916,12 @@ static inline struct folio *pfn_folio(unsigned long pfn)
>   	return page_folio(pfn_to_page(pfn));
>   }
>   
> -#ifndef mk_pte
>   #ifdef CONFIG_MMU
>   static inline pte_t mk_pte(struct page *page, pgprot_t pgprot)
>   {
>   	return pfn_pte(page_to_pfn(page), pgprot);
>   }
>   #endif
> -#endif
>   
>   /**
>    * folio_maybe_dma_pinned - Report if a folio may be pinned for DMA.

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



* Re: [PATCH 7/7] mm: Add folio_mk_pte()
  2025-02-17 19:08 ` [PATCH 7/7] mm: Add folio_mk_pte() Matthew Wilcox (Oracle)
@ 2025-02-18 10:33   ` David Hildenbrand
  0 siblings, 0 replies; 21+ messages in thread
From: David Hildenbrand @ 2025-02-18 10:33 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: linux-arch, x86, linux-s390, sparclinux, linux-um

On 17.02.25 20:08, Matthew Wilcox (Oracle) wrote:
> Removes a cast from folio to page in four callers of mk_pte().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---

Yes, that looks good

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



* Re: [PATCH 1/7] mm: Set the pte dirty if the folio is already dirty
  2025-02-17 19:08 ` [PATCH 1/7] mm: Set the pte dirty if the folio is already dirty Matthew Wilcox (Oracle)
  2025-02-18 10:31   ` David Hildenbrand
@ 2025-02-18 16:20   ` Alexander Gordeev
  2025-02-18 17:06     ` Matthew Wilcox
  2025-02-19  7:33   ` Alexander Gordeev
  2 siblings, 1 reply; 21+ messages in thread
From: Alexander Gordeev @ 2025-02-18 16:20 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-arch, x86, linux-s390, sparclinux, linux-um

On Mon, Feb 17, 2025 at 07:08:28PM +0000, Matthew Wilcox (Oracle) wrote:

Hi Matthew,

> If the first access to a folio is a read that is then followed by a
> write, we can save a page fault.  s390 implemented this in their
> mk_pte() in commit abf09bed3cce ("s390/mm: implement software dirty
> bits"), but other architectures can also benefit from this.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  arch/s390/include/asm/pgtable.h | 7 +------
>  mm/memory.c                     | 2 ++
>  2 files changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
> index 3ca5af4cfe43..3ee495b5171e 100644
> --- a/arch/s390/include/asm/pgtable.h
> +++ b/arch/s390/include/asm/pgtable.h
> @@ -1451,12 +1451,7 @@ static inline pte_t mk_pte_phys(unsigned long physpage, pgprot_t pgprot)
>  
>  static inline pte_t mk_pte(struct page *page, pgprot_t pgprot)
>  {
> -	unsigned long physpage = page_to_phys(page);
> -	pte_t __pte = mk_pte_phys(physpage, pgprot);
> -
> -	if (pte_write(__pte) && PageDirty(page))
> -		__pte = pte_mkdirty(__pte);
> -	return __pte;
> +	return mk_pte_phys(page_to_phys(page), pgprot);
>  }

With the above, the implicit dirtying of hugetlb PTEs (as a result of
mk_huge_pte() -> mk_pte()) in make_huge_pte() is removed:

static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
		bool try_mkwrite)
{
	...
	if (try_mkwrite && (vma->vm_flags & VM_WRITE)) {
		entry = huge_pte_mkwrite(huge_pte_mkdirty(mk_huge_pte(page,
					 vma->vm_page_prot)));
	} else {
		entry = huge_pte_wrprotect(mk_huge_pte(page,
					   vma->vm_page_prot));
	}
	...
}

What is your take on this?

>  #define pgd_index(address) (((address) >> PGDIR_SHIFT) & (PTRS_PER_PGD-1))
> diff --git a/mm/memory.c b/mm/memory.c
> index 539c0f7c6d54..4330560eee55 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5124,6 +5124,8 @@ void set_pte_range(struct vm_fault *vmf, struct folio *folio,
>  
>  	if (write)
>  		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> +	else if (pte_write(entry) && folio_test_dirty(folio))
> +		entry = pte_mkdirty(entry);
>  	if (unlikely(vmf_orig_pte_uffd_wp(vmf)))
>  		entry = pte_mkuffd_wp(entry);
>  	/* copy-on-write page */

Thanks!


* Re: [PATCH 1/7] mm: Set the pte dirty if the folio is already dirty
  2025-02-18 16:20   ` Alexander Gordeev
@ 2025-02-18 17:06     ` Matthew Wilcox
  2025-02-19  7:27       ` Alexander Gordeev
  0 siblings, 1 reply; 21+ messages in thread
From: Matthew Wilcox @ 2025-02-18 17:06 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: linux-mm, linux-arch, x86, linux-s390, sparclinux, linux-um

On Tue, Feb 18, 2025 at 05:20:28PM +0100, Alexander Gordeev wrote:
> > +++ b/arch/s390/include/asm/pgtable.h
> > @@ -1451,12 +1451,7 @@ static inline pte_t mk_pte_phys(unsigned long physpage, pgprot_t pgprot)
> >  
> >  static inline pte_t mk_pte(struct page *page, pgprot_t pgprot)
> >  {
> > -	unsigned long physpage = page_to_phys(page);
> > -	pte_t __pte = mk_pte_phys(physpage, pgprot);
> > -
> > -	if (pte_write(__pte) && PageDirty(page))
> > -		__pte = pte_mkdirty(__pte);
> > -	return __pte;
> > +	return mk_pte_phys(page_to_phys(page), pgprot);
> >  }
> 
> With the above, the implicit dirtying of hugetlb PTEs (as a result of
> mk_huge_pte() -> mk_pte()) in make_huge_pte() is removed:
> 
> static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
> 		bool try_mkwrite)
> {
> 	...
> 	if (try_mkwrite && (vma->vm_flags & VM_WRITE)) {
> 		entry = huge_pte_mkwrite(huge_pte_mkdirty(mk_huge_pte(page,
> 					 vma->vm_page_prot)));
> 	} else {
> 		entry = huge_pte_wrprotect(mk_huge_pte(page,
> 					   vma->vm_page_prot));
> 	}

Took me a moment to spot how this was getting invoked; for anyone else
playing along, it's mk_huge_pte() which calls mk_pte().

But I'm not sure how you lose out on the PTE being marked dirty.  In
the first arm that you've quoted, the pte is made dirty anyway.  In the
second arm, it's being writeprotected, so marking it dirty isn't a
helpful thing to do because writing to it will cause a fault anyway?

I know s390 is a little different, so there's probably something I'm
missing.


* Re: [PATCH 1/7] mm: Set the pte dirty if the folio is already dirty
  2025-02-18 17:06     ` Matthew Wilcox
@ 2025-02-19  7:27       ` Alexander Gordeev
  0 siblings, 0 replies; 21+ messages in thread
From: Alexander Gordeev @ 2025-02-19  7:27 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-mm, linux-arch, x86, linux-s390, sparclinux, linux-um

On Tue, Feb 18, 2025 at 05:06:38PM +0000, Matthew Wilcox wrote:
...
> > With the above the implicit dirtifying of hugetlb PTEs (as result of
> > mk_huge_pte() -> mk_pte()) in make_huge_pte() is removed:
> > 
> > static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
> > 		bool try_mkwrite)
> > {
> > 	...
> > 	if (try_mkwrite && (vma->vm_flags & VM_WRITE)) {
> > 		entry = huge_pte_mkwrite(huge_pte_mkdirty(mk_huge_pte(page,
> > 					 vma->vm_page_prot)));
> > 	} else {
> > 		entry = huge_pte_wrprotect(mk_huge_pte(page,
> > 					   vma->vm_page_prot));
> > 	}
> 
> Took me a moment to spot how this was getting invoked; for anyone else
> playing along, it's mk_huge_pte() which calls mk_pte().
> 
> But I'm not sure how you lose out on the PTE being marked dirty.  In
> the first arm that you've quoted, the pte is made dirty anyway.  In the
> second arm, it's being writeprotected, so marking it dirty isn't a
> helpful thing to do because writing to it will cause a fault anyway?
> 
> I know s390 is a little different, so there's probably something I'm
> missing.

No, it is just me missing the obvious.


* Re: [PATCH 1/7] mm: Set the pte dirty if the folio is already dirty
  2025-02-17 19:08 ` [PATCH 1/7] mm: Set the pte dirty if the folio is already dirty Matthew Wilcox (Oracle)
  2025-02-18 10:31   ` David Hildenbrand
  2025-02-18 16:20   ` Alexander Gordeev
@ 2025-02-19  7:33   ` Alexander Gordeev
  2025-02-19  8:46     ` Alexander Gordeev
  2 siblings, 1 reply; 21+ messages in thread
From: Alexander Gordeev @ 2025-02-19  7:33 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-arch, x86, linux-s390, sparclinux, linux-um

On Mon, Feb 17, 2025 at 07:08:28PM +0000, Matthew Wilcox (Oracle) wrote:
> If the first access to a folio is a read that is then followed by a
> write, we can save a page fault.  s390 implemented this in their
> mk_pte() in commit abf09bed3cce ("s390/mm: implement software dirty
> bits"), but other architectures can also benefit from this.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  arch/s390/include/asm/pgtable.h | 7 +------
>  mm/memory.c                     | 2 ++
>  2 files changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
> index 3ca5af4cfe43..3ee495b5171e 100644
> --- a/arch/s390/include/asm/pgtable.h
> +++ b/arch/s390/include/asm/pgtable.h
> @@ -1451,12 +1451,7 @@ static inline pte_t mk_pte_phys(unsigned long physpage, pgprot_t pgprot)
>  
>  static inline pte_t mk_pte(struct page *page, pgprot_t pgprot)
>  {
> -	unsigned long physpage = page_to_phys(page);
> -	pte_t __pte = mk_pte_phys(physpage, pgprot);
> -
> -	if (pte_write(__pte) && PageDirty(page))
> -		__pte = pte_mkdirty(__pte);
> -	return __pte;
> +	return mk_pte_phys(page_to_phys(page), pgprot);
>  }
>  
>  #define pgd_index(address) (((address) >> PGDIR_SHIFT) & (PTRS_PER_PGD-1))
> diff --git a/mm/memory.c b/mm/memory.c
> index 539c0f7c6d54..4330560eee55 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5124,6 +5124,8 @@ void set_pte_range(struct vm_fault *vmf, struct folio *folio,
>  
>  	if (write)
>  		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> +	else if (pte_write(entry) && folio_test_dirty(folio))
> +		entry = pte_mkdirty(entry);
>  	if (unlikely(vmf_orig_pte_uffd_wp(vmf)))
>  		entry = pte_mkuffd_wp(entry);
>  	/* copy-on-write page */

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>

Thanks!


* Re: [PATCH 1/7] mm: Set the pte dirty if the folio is already dirty
  2025-02-19  7:33   ` Alexander Gordeev
@ 2025-02-19  8:46     ` Alexander Gordeev
  0 siblings, 0 replies; 21+ messages in thread
From: Alexander Gordeev @ 2025-02-19  8:46 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-arch, x86, linux-s390, sparclinux, linux-um

> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>

Sorry, I meant for s390.
I can't judge the impact on the other archs.


* Re: [PATCH 2/7] mm: Introduce a common definition of mk_pte()
  2025-02-17 19:08 ` [PATCH 2/7] mm: Introduce a common definition of mk_pte() Matthew Wilcox (Oracle)
  2025-02-18  8:15   ` Geert Uytterhoeven
  2025-02-18 10:32   ` David Hildenbrand
@ 2025-02-19  9:07   ` Alexander Gordeev
  2 siblings, 0 replies; 21+ messages in thread
From: Alexander Gordeev @ 2025-02-19  9:07 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-arch, x86, linux-s390, sparclinux, linux-um

On Mon, Feb 17, 2025 at 07:08:29PM +0000, Matthew Wilcox (Oracle) wrote:
> Most architectures simply call pfn_pte().  Centralise that as the normal
> definition and remove the definition of mk_pte() from the architectures
> which have either that exact definition or something similar.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
...
>  arch/s390/include/asm/pgtable.h          | 5 -----
...
>  include/linux/mm.h                       | 9 +++++++++
>  22 files changed, 10 insertions(+), 99 deletions(-)
> 
...
> diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
> index 3ee495b5171e..db932beabc87 100644
> --- a/arch/s390/include/asm/pgtable.h
> +++ b/arch/s390/include/asm/pgtable.h
> @@ -1449,11 +1449,6 @@ static inline pte_t mk_pte_phys(unsigned long physpage, pgprot_t pgprot)
>  	return pte_mkyoung(__pte);
>  }
>  
> -static inline pte_t mk_pte(struct page *page, pgprot_t pgprot)
> -{
> -	return mk_pte_phys(page_to_phys(page), pgprot);
> -}
> -
>  #define pgd_index(address) (((address) >> PGDIR_SHIFT) & (PTRS_PER_PGD-1))
>  #define p4d_index(address) (((address) >> P4D_SHIFT) & (PTRS_PER_P4D-1))
>  #define pud_index(address) (((address) >> PUD_SHIFT) & (PTRS_PER_PUD-1))
...
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 7b1068ddcbb7..3ef11ff3922f 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1916,6 +1916,15 @@ static inline struct folio *pfn_folio(unsigned long pfn)
>  	return page_folio(pfn_to_page(pfn));
>  }
>  
> +#ifndef mk_pte
> +#ifdef CONFIG_MMU
> +static inline pte_t mk_pte(struct page *page, pgprot_t pgprot)
> +{
> +	return pfn_pte(page_to_pfn(page), pgprot);
> +}
> +#endif
> +#endif
> +
>  /**
>   * folio_maybe_dma_pinned - Report if a folio may be pinned for DMA.
>   * @folio: The folio.

For s390:

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>

Thanks!


* Re: [PATCH 4/7] x86: Remove custom definition of mk_pte()
  2025-02-17 19:08 ` [PATCH 4/7] x86: " Matthew Wilcox (Oracle)
@ 2025-02-19 20:53   ` Dave Hansen
  0 siblings, 0 replies; 21+ messages in thread
From: Dave Hansen @ 2025-02-19 20:53 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: linux-arch, x86, linux-s390, sparclinux, linux-um

On 2/17/25 11:08, Matthew Wilcox (Oracle) wrote:
> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index 593f10aabd45..9f480bdafd20 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -784,6 +784,9 @@ static inline pgprotval_t check_pgprot(pgprot_t pgprot)
>  static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot)
>  {
>  	phys_addr_t pfn = (phys_addr_t)page_nr << PAGE_SHIFT;
> +	/* This bit combination is used to mark shadow stacks */
> +	WARN_ON_ONCE((pgprot_val(pgprot) & (_PAGE_DIRTY | _PAGE_RW)) ==
> +			_PAGE_DIRTY);

Looks sane to me. Good riddance to unnecessary arch-specific code.

Acked-by: Dave Hansen <dave.hansen@linux.intel.com>

Just one note (in case anyone ever trips over that WARN_ON_ONCE()): This
is a problem with the existing code and with your patch, but the
'pgprot' going to mk_pte() or pfn_pte() probably can't come from a
hardware PTE that has the dirty bit clear.

Old, pre-shadow-stack hardware could race between:

	1. Software transitioning _PAGE_RW 1=>0
	2. The CPU page walker trying to set
	   _PAGE_DIRTY in response to a write

and end up with a Write=0,Dirty=1 PTE.

That doesn't happen for kernel memory because most or all of the PTEs
that the kernel establishes have _PAGE_DIRTY=1 so the page walker never
tries to set _PAGE_DIRTY. It's also generally some kind of a bug if one
CPU is in the kernel trying to write to a page while another is making
the page read-only.

Anyway, I can see cases where this warning might trip, but very likely
only on older hardware, when the kernel is doing something else silly.
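
For reference, the Write/Dirty encodings at stake here (my summary, not
part of the patch):

	Write=1, Dirty=0/1	ordinary writable PTE; hardware sets
				Dirty on the first write
	Write=0, Dirty=0	ordinary read-only PTE
	Write=0, Dirty=1	shadow stack PTE on shadow-stack hardware,
				hence the WARN_ON_ONCE() in pfn_pte()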

