linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] arm64/mm: Drop PXD_TABLE_BIT
@ 2024-10-05 12:38 Anshuman Khandual
  2024-10-05 12:38 ` [PATCH 1/5] arm64/mm: Drop pte_mkhuge() Anshuman Khandual
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Anshuman Khandual @ 2024-10-05 12:38 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Anshuman Khandual, Marc Zyngier, Oliver Upton, James Morse,
	Catalin Marinas, Will Deacon, Ard Biesheuvel, Ryan Roberts,
	Mark Rutland, kvmarm, linux-kernel

Clearing PXD_TABLE_BIT i.e bit[1] on a page table entry always operates on
the assumption that subsequent PXD_VALID i.e bit[0] is set. That's because
bits[1:0]="01" makes a block mapping. So it is prudent to treat bits[1:0]
as a single register field, which should be updated as block or table etc.
Although mk_[pmd|pud]_sect_prot() helpers go to some extent in using these
PXD_TYPE_SECT macros, their usage is not really consistent else where.

This series removes these table bit clearing for block mapping creation and
eventually completely drops off those table macros.

This series applies on v6.12-rc1.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: kvmarm@lists.linux.dev
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org

Anshuman Khandual (5):
  arm64/mm: Drop pte_mkhuge()
  arm64/mm: Replace PXD_TABLE_BIT with PXD_TYPE_[MASK|SECT]
  arm64/ptdump: Test PMD_TYPE_MASK for block mapping
  KVM: arm64: ptdump: Test PMD_TYPE_MASK for block mapping
  arm64/mm: Drop PXD_TABLE_BIT

 arch/arm64/include/asm/pgtable-hwdef.h |  6 +-----
 arch/arm64/include/asm/pgtable.h       | 20 ++++++++------------
 arch/arm64/kvm/ptdump.c                |  4 ++--
 arch/arm64/mm/hugetlbpage.c            |  2 +-
 arch/arm64/mm/ptdump.c                 |  8 ++++----
 5 files changed, 16 insertions(+), 24 deletions(-)

-- 
2.25.1



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/5] arm64/mm: Drop pte_mkhuge()
  2024-10-05 12:38 [PATCH 0/5] arm64/mm: Drop PXD_TABLE_BIT Anshuman Khandual
@ 2024-10-05 12:38 ` Anshuman Khandual
  2024-10-09 13:20   ` Ryan Roberts
  2024-10-05 12:38 ` [PATCH 2/5] arm64/mm: Replace PXD_TABLE_BIT with PXD_TYPE_[MASK|SECT] Anshuman Khandual
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Anshuman Khandual @ 2024-10-05 12:38 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Anshuman Khandual, Marc Zyngier, Oliver Upton, James Morse,
	Catalin Marinas, Will Deacon, Ard Biesheuvel, Ryan Roberts,
	Mark Rutland, kvmarm, linux-kernel

Core HugeTLB defines arch_make_huge_pte() fallback definition, which calls
platform provided pte_mkhuge(). But if any platform already provides custom
arch_make_huge_pte(), then it does not need to provide pte_mkhuge(). arm64
defines arch_make_huge_pte(), but then also calls pte_mkhuge() internally.
This creates confusion as if both of these callbacks are being used in core
HugeTLB and required to be defined in the platform.

This changes arch_make_huge_pte() to create block mapping directly and also
drops off now redundant helper pte_mkhuge(), making things clear. Also this
changes HugeTLB page creation from just clearing the PTE_TABLE_BIT (bit[1])
to actually setting bits[1:0] via PTE_TYPE_[MASK|SECT] instead.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 arch/arm64/include/asm/pgtable-hwdef.h | 1 +
 arch/arm64/include/asm/pgtable.h       | 5 -----
 arch/arm64/mm/hugetlbpage.c            | 2 +-
 3 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index fd330c1db289..956a702cb532 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -158,6 +158,7 @@
 #define PTE_VALID		(_AT(pteval_t, 1) << 0)
 #define PTE_TYPE_MASK		(_AT(pteval_t, 3) << 0)
 #define PTE_TYPE_PAGE		(_AT(pteval_t, 3) << 0)
+#define PTE_TYPE_SECT		(_AT(pteval_t, 1) << 0)
 #define PTE_TABLE_BIT		(_AT(pteval_t, 1) << 1)
 #define PTE_USER		(_AT(pteval_t, 1) << 6)		/* AP[1] */
 #define PTE_RDONLY		(_AT(pteval_t, 1) << 7)		/* AP[2] */
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index c329ea061dc9..fa4c32a9f572 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -438,11 +438,6 @@ static inline void __set_ptes(struct mm_struct *mm,
 	}
 }
 
-/*
- * Huge pte definitions.
- */
-#define pte_mkhuge(pte)		(__pte(pte_val(pte) & ~PTE_TABLE_BIT))
-
 /*
  * Hugetlb definitions.
  */
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 5f1e2103888b..5922c95630ad 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -361,7 +361,7 @@ pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags)
 {
 	size_t pagesize = 1UL << shift;
 
-	entry = pte_mkhuge(entry);
+	entry = __pte((pte_val(entry) & ~PTE_TYPE_MASK) | PTE_TYPE_SECT);
 	if (pagesize == CONT_PTE_SIZE) {
 		entry = pte_mkcont(entry);
 	} else if (pagesize == CONT_PMD_SIZE) {
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/5] arm64/mm: Replace PXD_TABLE_BIT with PXD_TYPE_[MASK|SECT]
  2024-10-05 12:38 [PATCH 0/5] arm64/mm: Drop PXD_TABLE_BIT Anshuman Khandual
  2024-10-05 12:38 ` [PATCH 1/5] arm64/mm: Drop pte_mkhuge() Anshuman Khandual
@ 2024-10-05 12:38 ` Anshuman Khandual
  2024-10-09 13:28   ` Ryan Roberts
  2024-10-05 12:38 ` [PATCH 3/5] arm64/ptdump: Test PMD_TYPE_MASK for block mapping Anshuman Khandual
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Anshuman Khandual @ 2024-10-05 12:38 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Anshuman Khandual, Marc Zyngier, Oliver Upton, James Morse,
	Catalin Marinas, Will Deacon, Ard Biesheuvel, Ryan Roberts,
	Mark Rutland, kvmarm, linux-kernel

This modifies existing block mapping related helpers e.g [pmd|pud]_mkhuge()
, mk_[pmd|pud]_sect_prot() and pmd_trans_huge() to use PXD_TYPE_[MASK|SECT]
instead of corresponding PXD_TABLE_BIT. This also moves pmd_sect() earlier
for the symbol's availability preventing a build warning.

While here this also drops pmd_val() check from pmd_trans_huge() helper, as
pmd_present() returning true already ensures that pmd_val() cannot be false

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 arch/arm64/include/asm/pgtable.h | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index fa4c32a9f572..45c49c5ace80 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -484,12 +484,12 @@ static inline pmd_t pte_pmd(pte_t pte)
 
 static inline pgprot_t mk_pud_sect_prot(pgprot_t prot)
 {
-	return __pgprot((pgprot_val(prot) & ~PUD_TABLE_BIT) | PUD_TYPE_SECT);
+	return __pgprot((pgprot_val(prot) & ~PUD_TYPE_MASK) | PUD_TYPE_SECT);
 }
 
 static inline pgprot_t mk_pmd_sect_prot(pgprot_t prot)
 {
-	return __pgprot((pgprot_val(prot) & ~PMD_TABLE_BIT) | PMD_TYPE_SECT);
+	return __pgprot((pgprot_val(prot) & ~PMD_TYPE_MASK) | PMD_TYPE_SECT);
 }
 
 static inline pte_t pte_swp_mkexclusive(pte_t pte)
@@ -554,10 +554,13 @@ static inline int pmd_protnone(pmd_t pmd)
  * THP definitions.
  */
 
+#define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
+				 PMD_TYPE_SECT)
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 static inline int pmd_trans_huge(pmd_t pmd)
 {
-	return pmd_val(pmd) && pmd_present(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT);
+	return pmd_present(pmd) && pmd_sect(pmd);
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
@@ -586,7 +589,7 @@ static inline int pmd_trans_huge(pmd_t pmd)
 
 #define pmd_write(pmd)		pte_write(pmd_pte(pmd))
 
-#define pmd_mkhuge(pmd)		(__pmd(pmd_val(pmd) & ~PMD_TABLE_BIT))
+#define pmd_mkhuge(pmd)		(__pmd((pmd_val(pmd) & ~PMD_TYPE_MASK) | PMD_TYPE_SECT))
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 #define pmd_devmap(pmd)		pte_devmap(pmd_pte(pmd))
@@ -614,7 +617,7 @@ static inline pmd_t pmd_mkspecial(pmd_t pmd)
 #define pud_mkyoung(pud)	pte_pud(pte_mkyoung(pud_pte(pud)))
 #define pud_write(pud)		pte_write(pud_pte(pud))
 
-#define pud_mkhuge(pud)		(__pud(pud_val(pud) & ~PUD_TABLE_BIT))
+#define pud_mkhuge(pud)		(__pud((pud_val(pud) & ~PUD_TYPE_MASK) | PUD_TYPE_SECT))
 
 #define __pud_to_phys(pud)	__pte_to_phys(pud_pte(pud))
 #define __phys_to_pud_val(phys)	__phys_to_pte_val(phys)
@@ -712,8 +715,6 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 
 #define pmd_table(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
 				 PMD_TYPE_TABLE)
-#define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
-				 PMD_TYPE_SECT)
 #define pmd_leaf(pmd)		(pmd_present(pmd) && !pmd_table(pmd))
 #define pmd_bad(pmd)		(!pmd_table(pmd))
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/5] arm64/ptdump: Test PMD_TYPE_MASK for block mapping
  2024-10-05 12:38 [PATCH 0/5] arm64/mm: Drop PXD_TABLE_BIT Anshuman Khandual
  2024-10-05 12:38 ` [PATCH 1/5] arm64/mm: Drop pte_mkhuge() Anshuman Khandual
  2024-10-05 12:38 ` [PATCH 2/5] arm64/mm: Replace PXD_TABLE_BIT with PXD_TYPE_[MASK|SECT] Anshuman Khandual
@ 2024-10-05 12:38 ` Anshuman Khandual
  2024-10-09 13:29   ` Ryan Roberts
  2024-10-05 12:38 ` [PATCH 4/5] KVM: arm64: ptdump: " Anshuman Khandual
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Anshuman Khandual @ 2024-10-05 12:38 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Anshuman Khandual, Marc Zyngier, Oliver Upton, James Morse,
	Catalin Marinas, Will Deacon, Ard Biesheuvel, Ryan Roberts,
	Mark Rutland, kvmarm, linux-kernel

arm64 block mapping requires given page table entry's bits[1:0] to be "01".
But now only bit[1] is checked to be clear, while also implicitly assuming
bit[0] to be set. This modifies ptdump to check both the relevant bits via
the mask PMD_TYPE_MASK and check the resultant value against PMD_TYPE_MASK.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 arch/arm64/mm/ptdump.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/mm/ptdump.c b/arch/arm64/mm/ptdump.c
index 264c5f9b97d8..8cec0da4cff2 100644
--- a/arch/arm64/mm/ptdump.c
+++ b/arch/arm64/mm/ptdump.c
@@ -80,10 +80,10 @@ static const struct ptdump_prot_bits pte_bits[] = {
 		.set	= "CON",
 		.clear	= "   ",
 	}, {
-		.mask	= PTE_TABLE_BIT,
-		.val	= PTE_TABLE_BIT,
-		.set	= "   ",
-		.clear	= "BLK",
+		.mask	= PMD_TYPE_MASK,
+		.val	= PMD_TYPE_SECT,
+		.set	= "BLK",
+		.clear	= "   ",
 	}, {
 		.mask	= PTE_UXN,
 		.val	= PTE_UXN,
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/5] KVM: arm64: ptdump: Test PMD_TYPE_MASK for block mapping
  2024-10-05 12:38 [PATCH 0/5] arm64/mm: Drop PXD_TABLE_BIT Anshuman Khandual
                   ` (2 preceding siblings ...)
  2024-10-05 12:38 ` [PATCH 3/5] arm64/ptdump: Test PMD_TYPE_MASK for block mapping Anshuman Khandual
@ 2024-10-05 12:38 ` Anshuman Khandual
  2024-10-09 13:30   ` Ryan Roberts
  2024-10-05 12:38 ` [PATCH 5/5] arm64/mm: Drop PXD_TABLE_BIT Anshuman Khandual
  2024-10-09 13:32 ` [PATCH 0/5] " Ryan Roberts
  5 siblings, 1 reply; 18+ messages in thread
From: Anshuman Khandual @ 2024-10-05 12:38 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Anshuman Khandual, Marc Zyngier, Oliver Upton, James Morse,
	Catalin Marinas, Will Deacon, Ard Biesheuvel, Ryan Roberts,
	Mark Rutland, kvmarm, linux-kernel

This changes stage-2 ptdump making it test given page table entries against
PMD_TYPE_SECT on PMD_TYPE_MASK bits for a block mapping, as is the case for
stage-1 ptdump.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: kvmarm@lists.linux.dev
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 arch/arm64/kvm/ptdump.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/ptdump.c b/arch/arm64/kvm/ptdump.c
index e4a342e903e2..098416d7e5c2 100644
--- a/arch/arm64/kvm/ptdump.c
+++ b/arch/arm64/kvm/ptdump.c
@@ -52,8 +52,8 @@ static const struct ptdump_prot_bits stage2_pte_bits[] = {
 		.set	= "AF",
 		.clear	= "  ",
 	}, {
-		.mask	= PTE_TABLE_BIT | PTE_VALID,
-		.val	= PTE_VALID,
+		.mask	= PMD_TYPE_MASK,
+		.val	= PMD_TYPE_SECT,
 		.set	= "BLK",
 		.clear	= "   ",
 	},
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 5/5] arm64/mm: Drop PXD_TABLE_BIT
  2024-10-05 12:38 [PATCH 0/5] arm64/mm: Drop PXD_TABLE_BIT Anshuman Khandual
                   ` (3 preceding siblings ...)
  2024-10-05 12:38 ` [PATCH 4/5] KVM: arm64: ptdump: " Anshuman Khandual
@ 2024-10-05 12:38 ` Anshuman Khandual
  2024-10-09 13:32 ` [PATCH 0/5] " Ryan Roberts
  5 siblings, 0 replies; 18+ messages in thread
From: Anshuman Khandual @ 2024-10-05 12:38 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Anshuman Khandual, Marc Zyngier, Oliver Upton, James Morse,
	Catalin Marinas, Will Deacon, Ard Biesheuvel, Ryan Roberts,
	Mark Rutland, kvmarm, linux-kernel

This just drops table bit position markers for all page table levels, which
are not used any more.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 arch/arm64/include/asm/pgtable-hwdef.h | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index 956a702cb532..a524a8bb570f 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -97,7 +97,6 @@
  * Level -1 descriptor (PGD).
  */
 #define PGD_TYPE_TABLE		(_AT(pgdval_t, 3) << 0)
-#define PGD_TABLE_BIT		(_AT(pgdval_t, 1) << 1)
 #define PGD_TYPE_MASK		(_AT(pgdval_t, 3) << 0)
 #define PGD_TABLE_PXN		(_AT(pgdval_t, 1) << 59)
 #define PGD_TABLE_UXN		(_AT(pgdval_t, 1) << 60)
@@ -106,7 +105,6 @@
  * Level 0 descriptor (P4D).
  */
 #define P4D_TYPE_TABLE		(_AT(p4dval_t, 3) << 0)
-#define P4D_TABLE_BIT		(_AT(p4dval_t, 1) << 1)
 #define P4D_TYPE_MASK		(_AT(p4dval_t, 3) << 0)
 #define P4D_TYPE_SECT		(_AT(p4dval_t, 1) << 0)
 #define P4D_SECT_RDONLY		(_AT(p4dval_t, 1) << 7)		/* AP[2] */
@@ -117,7 +115,6 @@
  * Level 1 descriptor (PUD).
  */
 #define PUD_TYPE_TABLE		(_AT(pudval_t, 3) << 0)
-#define PUD_TABLE_BIT		(_AT(pudval_t, 1) << 1)
 #define PUD_TYPE_MASK		(_AT(pudval_t, 3) << 0)
 #define PUD_TYPE_SECT		(_AT(pudval_t, 1) << 0)
 #define PUD_SECT_RDONLY		(_AT(pudval_t, 1) << 7)		/* AP[2] */
@@ -130,7 +127,6 @@
 #define PMD_TYPE_MASK		(_AT(pmdval_t, 3) << 0)
 #define PMD_TYPE_TABLE		(_AT(pmdval_t, 3) << 0)
 #define PMD_TYPE_SECT		(_AT(pmdval_t, 1) << 0)
-#define PMD_TABLE_BIT		(_AT(pmdval_t, 1) << 1)
 
 /*
  * Section
@@ -159,7 +155,6 @@
 #define PTE_TYPE_MASK		(_AT(pteval_t, 3) << 0)
 #define PTE_TYPE_PAGE		(_AT(pteval_t, 3) << 0)
 #define PTE_TYPE_SECT		(_AT(pteval_t, 1) << 0)
-#define PTE_TABLE_BIT		(_AT(pteval_t, 1) << 1)
 #define PTE_USER		(_AT(pteval_t, 1) << 6)		/* AP[1] */
 #define PTE_RDONLY		(_AT(pteval_t, 1) << 7)		/* AP[2] */
 #define PTE_SHARED		(_AT(pteval_t, 3) << 8)		/* SH[1:0], inner shareable */
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] arm64/mm: Drop pte_mkhuge()
  2024-10-05 12:38 ` [PATCH 1/5] arm64/mm: Drop pte_mkhuge() Anshuman Khandual
@ 2024-10-09 13:20   ` Ryan Roberts
  2024-10-14  8:59     ` Anshuman Khandual
  0 siblings, 1 reply; 18+ messages in thread
From: Ryan Roberts @ 2024-10-09 13:20 UTC (permalink / raw)
  To: Anshuman Khandual, linux-arm-kernel
  Cc: Marc Zyngier, Oliver Upton, James Morse, Catalin Marinas,
	Will Deacon, Ard Biesheuvel, Mark Rutland, kvmarm, linux-kernel

On 05/10/2024 13:38, Anshuman Khandual wrote:
> Core HugeTLB defines arch_make_huge_pte() fallback definition, which calls
> platform provided pte_mkhuge(). But if any platform already provides custom
> arch_make_huge_pte(), then it does not need to provide pte_mkhuge(). arm64
> defines arch_make_huge_pte(), but then also calls pte_mkhuge() internally.
> This creates confusion as if both of these callbacks are being used in core
> HugeTLB and required to be defined in the platform.
> 
> This changes arch_make_huge_pte() to create block mapping directly and also
> drops off now redundant helper pte_mkhuge(), making things clear. Also this
> changes HugeTLB page creation from just clearing the PTE_TABLE_BIT (bit[1])
> to actually setting bits[1:0] via PTE_TYPE_[MASK|SECT] instead.
> 
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Ard Biesheuvel <ardb@kernel.org>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
>  arch/arm64/include/asm/pgtable-hwdef.h | 1 +
>  arch/arm64/include/asm/pgtable.h       | 5 -----
>  arch/arm64/mm/hugetlbpage.c            | 2 +-
>  3 files changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
> index fd330c1db289..956a702cb532 100644
> --- a/arch/arm64/include/asm/pgtable-hwdef.h
> +++ b/arch/arm64/include/asm/pgtable-hwdef.h
> @@ -158,6 +158,7 @@
>  #define PTE_VALID		(_AT(pteval_t, 1) << 0)
>  #define PTE_TYPE_MASK		(_AT(pteval_t, 3) << 0)
>  #define PTE_TYPE_PAGE		(_AT(pteval_t, 3) << 0)
> +#define PTE_TYPE_SECT		(_AT(pteval_t, 1) << 0)
>  #define PTE_TABLE_BIT		(_AT(pteval_t, 1) << 1)
>  #define PTE_USER		(_AT(pteval_t, 1) << 6)		/* AP[1] */
>  #define PTE_RDONLY		(_AT(pteval_t, 1) << 7)		/* AP[2] */
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index c329ea061dc9..fa4c32a9f572 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -438,11 +438,6 @@ static inline void __set_ptes(struct mm_struct *mm,
>  	}
>  }
>  
> -/*
> - * Huge pte definitions.
> - */
> -#define pte_mkhuge(pte)		(__pte(pte_val(pte) & ~PTE_TABLE_BIT))
> -
>  /*
>   * Hugetlb definitions.
>   */
> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> index 5f1e2103888b..5922c95630ad 100644
> --- a/arch/arm64/mm/hugetlbpage.c
> +++ b/arch/arm64/mm/hugetlbpage.c
> @@ -361,7 +361,7 @@ pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags)
>  {
>  	size_t pagesize = 1UL << shift;
>  
> -	entry = pte_mkhuge(entry);
> +	entry = __pte((pte_val(entry) & ~PTE_TYPE_MASK) | PTE_TYPE_SECT);

I think there may be an existing bug here; if pagesize == CONT_PTE_SIZE, then
entry will be placed in the level 3 table. In this case, shouldn't bit 1 remain
set, because at level 3, a page mapping is denoted by bits[1:0] = 3 ? Currently
its being unconditionally cleared.

>  	if (pagesize == CONT_PTE_SIZE) {
>  		entry = pte_mkcont(entry);
>  	} else if (pagesize == CONT_PMD_SIZE) {



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/5] arm64/mm: Replace PXD_TABLE_BIT with PXD_TYPE_[MASK|SECT]
  2024-10-05 12:38 ` [PATCH 2/5] arm64/mm: Replace PXD_TABLE_BIT with PXD_TYPE_[MASK|SECT] Anshuman Khandual
@ 2024-10-09 13:28   ` Ryan Roberts
  2024-10-14 10:48     ` Anshuman Khandual
  0 siblings, 1 reply; 18+ messages in thread
From: Ryan Roberts @ 2024-10-09 13:28 UTC (permalink / raw)
  To: Anshuman Khandual, linux-arm-kernel
  Cc: Marc Zyngier, Oliver Upton, James Morse, Catalin Marinas,
	Will Deacon, Ard Biesheuvel, Mark Rutland, kvmarm, linux-kernel

On 05/10/2024 13:38, Anshuman Khandual wrote:
> This modifies existing block mapping related helpers e.g [pmd|pud]_mkhuge()
> , mk_[pmd|pud]_sect_prot() and pmd_trans_huge() to use PXD_TYPE_[MASK|SECT]
> instead of corresponding PXD_TABLE_BIT. This also moves pmd_sect() earlier
> for the symbol's availability preventing a build warning.
> 
> While here this also drops pmd_val() check from pmd_trans_huge() helper, as
> pmd_present() returning true already ensures that pmd_val() cannot be false
> 
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Ard Biesheuvel <ardb@kernel.org>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
>  arch/arm64/include/asm/pgtable.h | 15 ++++++++-------
>  1 file changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index fa4c32a9f572..45c49c5ace80 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -484,12 +484,12 @@ static inline pmd_t pte_pmd(pte_t pte)
>  
>  static inline pgprot_t mk_pud_sect_prot(pgprot_t prot)
>  {
> -	return __pgprot((pgprot_val(prot) & ~PUD_TABLE_BIT) | PUD_TYPE_SECT);
> +	return __pgprot((pgprot_val(prot) & ~PUD_TYPE_MASK) | PUD_TYPE_SECT);
>  }
>  
>  static inline pgprot_t mk_pmd_sect_prot(pgprot_t prot)
>  {
> -	return __pgprot((pgprot_val(prot) & ~PMD_TABLE_BIT) | PMD_TYPE_SECT);
> +	return __pgprot((pgprot_val(prot) & ~PMD_TYPE_MASK) | PMD_TYPE_SECT);
>  }
>  
>  static inline pte_t pte_swp_mkexclusive(pte_t pte)
> @@ -554,10 +554,13 @@ static inline int pmd_protnone(pmd_t pmd)
>   * THP definitions.
>   */
>  
> +#define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
> +				 PMD_TYPE_SECT)
> +
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  static inline int pmd_trans_huge(pmd_t pmd)
>  {
> -	return pmd_val(pmd) && pmd_present(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT);
> +	return pmd_present(pmd) && pmd_sect(pmd);

Bug? Prevously we would have returned true for a "present-invalid" PMD block
mapping - that's one which is formatted as a PMD block mapping except the
PTE_VALID bit is clear and PTE_PRESENT_INVALID is set. But now, due to
pmd_sect() testing VALID is set (via PMD_TYPE_SECT), we no longer return true in
this case.

>  }
>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>  
> @@ -586,7 +589,7 @@ static inline int pmd_trans_huge(pmd_t pmd)
>  
>  #define pmd_write(pmd)		pte_write(pmd_pte(pmd))
>  
> -#define pmd_mkhuge(pmd)		(__pmd(pmd_val(pmd) & ~PMD_TABLE_BIT))
> +#define pmd_mkhuge(pmd)		(__pmd((pmd_val(pmd) & ~PMD_TYPE_MASK) | PMD_TYPE_SECT))

I'm not sure if this also suffers from a similar problem? Is it possible that a
present-invalid pmd would be passed to pmd_mkhuge()? If so, then we are now
incorrectly setting the PTE_VALID bit.

>  
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  #define pmd_devmap(pmd)		pte_devmap(pmd_pte(pmd))
> @@ -614,7 +617,7 @@ static inline pmd_t pmd_mkspecial(pmd_t pmd)
>  #define pud_mkyoung(pud)	pte_pud(pte_mkyoung(pud_pte(pud)))
>  #define pud_write(pud)		pte_write(pud_pte(pud))
>  
> -#define pud_mkhuge(pud)		(__pud(pud_val(pud) & ~PUD_TABLE_BIT))
> +#define pud_mkhuge(pud)		(__pud((pud_val(pud) & ~PUD_TYPE_MASK) | PUD_TYPE_SECT))
>  
>  #define __pud_to_phys(pud)	__pte_to_phys(pud_pte(pud))
>  #define __phys_to_pud_val(phys)	__phys_to_pte_val(phys)
> @@ -712,8 +715,6 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>  
>  #define pmd_table(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
>  				 PMD_TYPE_TABLE)
> -#define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
> -				 PMD_TYPE_SECT)
>  #define pmd_leaf(pmd)		(pmd_present(pmd) && !pmd_table(pmd))
>  #define pmd_bad(pmd)		(!pmd_table(pmd))
>  



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/5] arm64/ptdump: Test PMD_TYPE_MASK for block mapping
  2024-10-05 12:38 ` [PATCH 3/5] arm64/ptdump: Test PMD_TYPE_MASK for block mapping Anshuman Khandual
@ 2024-10-09 13:29   ` Ryan Roberts
  0 siblings, 0 replies; 18+ messages in thread
From: Ryan Roberts @ 2024-10-09 13:29 UTC (permalink / raw)
  To: Anshuman Khandual, linux-arm-kernel
  Cc: Marc Zyngier, Oliver Upton, James Morse, Catalin Marinas,
	Will Deacon, Ard Biesheuvel, Mark Rutland, kvmarm, linux-kernel

On 05/10/2024 13:38, Anshuman Khandual wrote:
> arm64 block mapping requires given page table entry's bits[1:0] to be "01".
> But now only bit[1] is checked to be clear, while also implicitly assuming
> bit[0] to be set. This modifies ptdump to check both the relevant bits via
> the mask PMD_TYPE_MASK and check the resultant value against PMD_TYPE_MASK.
> 
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Ard Biesheuvel <ardb@kernel.org>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>

Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>

> ---
>  arch/arm64/mm/ptdump.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/mm/ptdump.c b/arch/arm64/mm/ptdump.c
> index 264c5f9b97d8..8cec0da4cff2 100644
> --- a/arch/arm64/mm/ptdump.c
> +++ b/arch/arm64/mm/ptdump.c
> @@ -80,10 +80,10 @@ static const struct ptdump_prot_bits pte_bits[] = {
>  		.set	= "CON",
>  		.clear	= "   ",
>  	}, {
> -		.mask	= PTE_TABLE_BIT,
> -		.val	= PTE_TABLE_BIT,
> -		.set	= "   ",
> -		.clear	= "BLK",
> +		.mask	= PMD_TYPE_MASK,
> +		.val	= PMD_TYPE_SECT,
> +		.set	= "BLK",
> +		.clear	= "   ",
>  	}, {
>  		.mask	= PTE_UXN,
>  		.val	= PTE_UXN,



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/5] KVM: arm64: ptdump: Test PMD_TYPE_MASK for block mapping
  2024-10-05 12:38 ` [PATCH 4/5] KVM: arm64: ptdump: " Anshuman Khandual
@ 2024-10-09 13:30   ` Ryan Roberts
  0 siblings, 0 replies; 18+ messages in thread
From: Ryan Roberts @ 2024-10-09 13:30 UTC (permalink / raw)
  To: Anshuman Khandual, linux-arm-kernel
  Cc: Marc Zyngier, Oliver Upton, James Morse, Catalin Marinas,
	Will Deacon, Ard Biesheuvel, Mark Rutland, kvmarm, linux-kernel

On 05/10/2024 13:38, Anshuman Khandual wrote:
> This changes stage-2 ptdump making it test given page table entries against
> PMD_TYPE_SECT on PMD_TYPE_MASK bits for a block mapping, as is the case for
> stage-1 ptdump.
> 
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Oliver Upton <oliver.upton@linux.dev>
> Cc: James Morse <james.morse@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: kvmarm@lists.linux.dev
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>

Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>

> ---
>  arch/arm64/kvm/ptdump.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kvm/ptdump.c b/arch/arm64/kvm/ptdump.c
> index e4a342e903e2..098416d7e5c2 100644
> --- a/arch/arm64/kvm/ptdump.c
> +++ b/arch/arm64/kvm/ptdump.c
> @@ -52,8 +52,8 @@ static const struct ptdump_prot_bits stage2_pte_bits[] = {
>  		.set	= "AF",
>  		.clear	= "  ",
>  	}, {
> -		.mask	= PTE_TABLE_BIT | PTE_VALID,
> -		.val	= PTE_VALID,
> +		.mask	= PMD_TYPE_MASK,
> +		.val	= PMD_TYPE_SECT,
>  		.set	= "BLK",
>  		.clear	= "   ",
>  	},



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/5] arm64/mm: Drop PXD_TABLE_BIT
  2024-10-05 12:38 [PATCH 0/5] arm64/mm: Drop PXD_TABLE_BIT Anshuman Khandual
                   ` (4 preceding siblings ...)
  2024-10-05 12:38 ` [PATCH 5/5] arm64/mm: Drop PXD_TABLE_BIT Anshuman Khandual
@ 2024-10-09 13:32 ` Ryan Roberts
  2024-10-15  6:47   ` Anshuman Khandual
  5 siblings, 1 reply; 18+ messages in thread
From: Ryan Roberts @ 2024-10-09 13:32 UTC (permalink / raw)
  To: Anshuman Khandual, linux-arm-kernel
  Cc: Marc Zyngier, Oliver Upton, James Morse, Catalin Marinas,
	Will Deacon, Ard Biesheuvel, Mark Rutland, kvmarm, linux-kernel

On 05/10/2024 13:38, Anshuman Khandual wrote:
> Clearing PXD_TABLE_BIT i.e bit[1] on a page table entry always operates on
> the assumption that subsequent PXD_VALID i.e bit[0] is set. That's because
> bits[1:0]="01" makes a block mapping. So it is prudent to treat bits[1:0]
> as a single register field, which should be updated as block or table etc.
> Although mk_[pmd|pud]_sect_prot() helpers go to some extent in using these
> PXD_TYPE_SECT macros, their usage is not really consistent else where.
> 
> This series removes these table bit clearing for block mapping creation and
> eventually completely drops off those table macros.

Given the issue I just noticed in patch 2, I'm not sure if it's going to be
practical to remove the table bit after all? Sorry I didn't spot this before.

> 
> This series applies on v6.12-rc1.
> 
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Oliver Upton <oliver.upton@linux.dev>
> Cc: James Morse <james.morse@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Ard Biesheuvel <ardb@kernel.org>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: kvmarm@lists.linux.dev
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> 
> Anshuman Khandual (5):
>   arm64/mm: Drop pte_mkhuge()
>   arm64/mm: Replace PXD_TABLE_BIT with PXD_TYPE_[MASK|SECT]
>   arm64/ptdump: Test PMD_TYPE_MASK for block mapping
>   KVM: arm64: ptdump: Test PMD_TYPE_MASK for block mapping
>   arm64/mm: Drop PXD_TABLE_BIT
> 
>  arch/arm64/include/asm/pgtable-hwdef.h |  6 +-----
>  arch/arm64/include/asm/pgtable.h       | 20 ++++++++------------
>  arch/arm64/kvm/ptdump.c                |  4 ++--
>  arch/arm64/mm/hugetlbpage.c            |  2 +-
>  arch/arm64/mm/ptdump.c                 |  8 ++++----
>  5 files changed, 16 insertions(+), 24 deletions(-)
> 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] arm64/mm: Drop pte_mkhuge()
  2024-10-09 13:20   ` Ryan Roberts
@ 2024-10-14  8:59     ` Anshuman Khandual
  2024-10-14 11:08       ` Ryan Roberts
  0 siblings, 1 reply; 18+ messages in thread
From: Anshuman Khandual @ 2024-10-14  8:59 UTC (permalink / raw)
  To: Ryan Roberts, linux-arm-kernel
  Cc: Marc Zyngier, Oliver Upton, James Morse, Catalin Marinas,
	Will Deacon, Ard Biesheuvel, Mark Rutland, kvmarm, linux-kernel



On 10/9/24 18:50, Ryan Roberts wrote:
> On 05/10/2024 13:38, Anshuman Khandual wrote:
>> Core HugeTLB defines arch_make_huge_pte() fallback definition, which calls
>> platform provided pte_mkhuge(). But if any platform already provides custom
>> arch_make_huge_pte(), then it does not need to provide pte_mkhuge(). arm64
>> defines arch_make_huge_pte(), but then also calls pte_mkhuge() internally.
>> This creates confusion as if both of these callbacks are being used in core
>> HugeTLB and required to be defined in the platform.
>>
>> This changes arch_make_huge_pte() to create block mapping directly and also
>> drops off now redundant helper pte_mkhuge(), making things clear. Also this
>> changes HugeTLB page creation from just clearing the PTE_TABLE_BIT (bit[1])
>> to actually setting bits[1:0] via PTE_TYPE_[MASK|SECT] instead.
>>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Ard Biesheuvel <ardb@kernel.org>
>> Cc: Ryan Roberts <ryan.roberts@arm.com>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: linux-arm-kernel@lists.infradead.org
>> Cc: linux-kernel@vger.kernel.org
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>> ---
>>  arch/arm64/include/asm/pgtable-hwdef.h | 1 +
>>  arch/arm64/include/asm/pgtable.h       | 5 -----
>>  arch/arm64/mm/hugetlbpage.c            | 2 +-
>>  3 files changed, 2 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
>> index fd330c1db289..956a702cb532 100644
>> --- a/arch/arm64/include/asm/pgtable-hwdef.h
>> +++ b/arch/arm64/include/asm/pgtable-hwdef.h
>> @@ -158,6 +158,7 @@
>>  #define PTE_VALID		(_AT(pteval_t, 1) << 0)
>>  #define PTE_TYPE_MASK		(_AT(pteval_t, 3) << 0)
>>  #define PTE_TYPE_PAGE		(_AT(pteval_t, 3) << 0)
>> +#define PTE_TYPE_SECT		(_AT(pteval_t, 1) << 0)
>>  #define PTE_TABLE_BIT		(_AT(pteval_t, 1) << 1)
>>  #define PTE_USER		(_AT(pteval_t, 1) << 6)		/* AP[1] */
>>  #define PTE_RDONLY		(_AT(pteval_t, 1) << 7)		/* AP[2] */
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index c329ea061dc9..fa4c32a9f572 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -438,11 +438,6 @@ static inline void __set_ptes(struct mm_struct *mm,
>>  	}
>>  }
>>  
>> -/*
>> - * Huge pte definitions.
>> - */
>> -#define pte_mkhuge(pte)		(__pte(pte_val(pte) & ~PTE_TABLE_BIT))
>> -
>>  /*
>>   * Hugetlb definitions.
>>   */
>> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
>> index 5f1e2103888b..5922c95630ad 100644
>> --- a/arch/arm64/mm/hugetlbpage.c
>> +++ b/arch/arm64/mm/hugetlbpage.c
>> @@ -361,7 +361,7 @@ pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags)
>>  {
>>  	size_t pagesize = 1UL << shift;
>>  
>> -	entry = pte_mkhuge(entry);
>> +	entry = __pte((pte_val(entry) & ~PTE_TYPE_MASK) | PTE_TYPE_SECT);
> 
> I think there may be an existing bug here; if pagesize == CONT_PTE_SIZE, then
> entry will be placed in the level 3 table. In this case, shouldn't bit 1 remain
> set, because at level 3, a page mapping is denoted by bits[1:0] = 3 ? Currently
> its being unconditionally cleared.

That's not a problem, pte_mkcont() brings back both the bits
via PTE_TYPE_PAGE along with PTE_CONT.

        if (pagesize == CONT_PTE_SIZE) {
                entry = pte_mkcont(entry);
        } else if (pagesize == CONT_PMD_SIZE) {
                entry = pmd_pte(pmd_mkcont(pte_pmd(entry)));
        } else if (pagesize != PUD_SIZE && pagesize != PMD_SIZE) {
                pr_warn("%s: unrecognized huge page size 0x%lx\n",
                        __func__, pagesize);
        }

static inline pte_t pte_mkcont(pte_t pte)
{
	pte = set_pte_bit(pte, __pgprot(PTE_CONT));
      	return set_pte_bit(pte, __pgprot(PTE_TYPE_PAGE));
}

Although the same is not required for CONT_PMD_SIZE size huge
pages where only PTE_CONT is enough.

static inline pmd_t pmd_mkcont(pmd_t pmd)
{
        return __pmd(pmd_val(pmd) | PMD_SECT_CONT);
}
 
>>  	if (pagesize == CONT_PTE_SIZE) {
>>  		entry = pte_mkcont(entry);
>>  	} else if (pagesize == CONT_PMD_SIZE) {
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/5] arm64/mm: Replace PXD_TABLE_BIT with PXD_TYPE_[MASK|SECT]
  2024-10-09 13:28   ` Ryan Roberts
@ 2024-10-14 10:48     ` Anshuman Khandual
  2024-10-14 11:33       ` Ryan Roberts
  0 siblings, 1 reply; 18+ messages in thread
From: Anshuman Khandual @ 2024-10-14 10:48 UTC (permalink / raw)
  To: Ryan Roberts, linux-arm-kernel
  Cc: Marc Zyngier, Oliver Upton, James Morse, Catalin Marinas,
	Will Deacon, Ard Biesheuvel, Mark Rutland, kvmarm, linux-kernel



On 10/9/24 18:58, Ryan Roberts wrote:
> On 05/10/2024 13:38, Anshuman Khandual wrote:
>> This modifies existing block mapping related helpers e.g [pmd|pud]_mkhuge()
>> , mk_[pmd|pud]_sect_prot() and pmd_trans_huge() to use PXD_TYPE_[MASK|SECT]
>> instead of corresponding PXD_TABLE_BIT. This also moves pmd_sect() earlier
>> for the symbol's availability preventing a build warning.
>>
>> While here this also drops pmd_val() check from pmd_trans_huge() helper, as
>> pmd_present() returning true already ensures that pmd_val() cannot be false
>>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Ard Biesheuvel <ardb@kernel.org>
>> Cc: Ryan Roberts <ryan.roberts@arm.com>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: linux-arm-kernel@lists.infradead.org
>> Cc: linux-kernel@vger.kernel.org
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>> ---
>>  arch/arm64/include/asm/pgtable.h | 15 ++++++++-------
>>  1 file changed, 8 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index fa4c32a9f572..45c49c5ace80 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -484,12 +484,12 @@ static inline pmd_t pte_pmd(pte_t pte)
>>  
>>  static inline pgprot_t mk_pud_sect_prot(pgprot_t prot)
>>  {
>> -	return __pgprot((pgprot_val(prot) & ~PUD_TABLE_BIT) | PUD_TYPE_SECT);
>> +	return __pgprot((pgprot_val(prot) & ~PUD_TYPE_MASK) | PUD_TYPE_SECT);
>>  }
>>  
>>  static inline pgprot_t mk_pmd_sect_prot(pgprot_t prot)
>>  {
>> -	return __pgprot((pgprot_val(prot) & ~PMD_TABLE_BIT) | PMD_TYPE_SECT);
>> +	return __pgprot((pgprot_val(prot) & ~PMD_TYPE_MASK) | PMD_TYPE_SECT);
>>  }
>>  
>>  static inline pte_t pte_swp_mkexclusive(pte_t pte)
>> @@ -554,10 +554,13 @@ static inline int pmd_protnone(pmd_t pmd)
>>   * THP definitions.
>>   */
>>  
>> +#define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
>> +				 PMD_TYPE_SECT)
>> +
>>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>  static inline int pmd_trans_huge(pmd_t pmd)
>>  {
>> -	return pmd_val(pmd) && pmd_present(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT);
>> +	return pmd_present(pmd) && pmd_sect(pmd);
> 
> Bug? Prevously we would have returned true for a "present-invalid" PMD block
> mapping - that's one which is formatted as a PMD block mapping except the
> PTE_VALID bit is clear and PTE_PRESENT_INVALID is set. But now, due to
> pmd_sect() testing VALID is set (via PMD_TYPE_SECT), we no longer return true in
> this case.

Agreed, that will be problematic but the situation can be rectified by decoupling
pmd_present_invalid() from pte_present_invalid() by checking for both last bits
instead of just the valid bit against PTE_PRESENT_INVALID.

#define pmd_sect(pmd)          ((pmd_val(pmd) & PMD_TYPE_MASK) == \
                                PMD_TYPE_SECT)

#define pmd_present_invalid(pmd) \
       ((pmd_val(pmd) & (PMD_TYPE_MASK | PTE_PRESENT_INVALID)) == PTE_PRESENT_INVALID)

 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 static inline int pmd_trans_huge(pmd_t pmd)
 {
	return pmd_sect(pmd) || pmd_present_invalid(pmd);
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */

> 
>>  }
>>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>>  
>> @@ -586,7 +589,7 @@ static inline int pmd_trans_huge(pmd_t pmd)
>>  
>>  #define pmd_write(pmd)		pte_write(pmd_pte(pmd))
>>  
>> -#define pmd_mkhuge(pmd)		(__pmd(pmd_val(pmd) & ~PMD_TABLE_BIT))
>> +#define pmd_mkhuge(pmd)		(__pmd((pmd_val(pmd) & ~PMD_TYPE_MASK) | PMD_TYPE_SECT))
> 
> I'm not sure if this also suffers from a similar problem? Is it possible that a
> present-invalid pmd would be passed to pmd_mkhuge()? If so, then we are now
> incorrectly setting the PTE_VALID bit.
pmd_mkhuge() converts a regular pmd into a huge page and on arm64
creating a huge page also involves setting PTE_VALID. Why would a
present-invalid pmd is passed into pmd_mkhuge() without intending
to make a huge entry ?

There just two generic use cases for pmd_mkhuge().

insert_pfn_pmd
	   entry = pmd_mkhuge(pfn_t_pmd(pfn, prot));

set_huge_zero_folio
        entry = mk_pmd(&zero_folio->page, vma->vm_page_prot);
        entry = pmd_mkhuge(entry);

As instances in mm/debug_vm_pgtable.c, pmd_mkinvalid() should be
called on a PMD entry after pmd_mkhuge() not the other way around.

> 
>>  
>>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>  #define pmd_devmap(pmd)		pte_devmap(pmd_pte(pmd))
>> @@ -614,7 +617,7 @@ static inline pmd_t pmd_mkspecial(pmd_t pmd)
>>  #define pud_mkyoung(pud)	pte_pud(pte_mkyoung(pud_pte(pud)))
>>  #define pud_write(pud)		pte_write(pud_pte(pud))
>>  
>> -#define pud_mkhuge(pud)		(__pud(pud_val(pud) & ~PUD_TABLE_BIT))
>> +#define pud_mkhuge(pud)		(__pud((pud_val(pud) & ~PUD_TYPE_MASK) | PUD_TYPE_SECT))
>>  
>>  #define __pud_to_phys(pud)	__pte_to_phys(pud_pte(pud))
>>  #define __phys_to_pud_val(phys)	__phys_to_pte_val(phys)
>> @@ -712,8 +715,6 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>>  
>>  #define pmd_table(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
>>  				 PMD_TYPE_TABLE)
>> -#define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
>> -				 PMD_TYPE_SECT)
>>  #define pmd_leaf(pmd)		(pmd_present(pmd) && !pmd_table(pmd))
>>  #define pmd_bad(pmd)		(!pmd_table(pmd))
>>  
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] arm64/mm: Drop pte_mkhuge()
  2024-10-14  8:59     ` Anshuman Khandual
@ 2024-10-14 11:08       ` Ryan Roberts
  2024-10-15  4:23         ` Anshuman Khandual
  0 siblings, 1 reply; 18+ messages in thread
From: Ryan Roberts @ 2024-10-14 11:08 UTC (permalink / raw)
  To: Anshuman Khandual, linux-arm-kernel
  Cc: Marc Zyngier, Oliver Upton, James Morse, Catalin Marinas,
	Will Deacon, Ard Biesheuvel, Mark Rutland, kvmarm, linux-kernel

On 14/10/2024 09:59, Anshuman Khandual wrote:
> 
> 
> On 10/9/24 18:50, Ryan Roberts wrote:
>> On 05/10/2024 13:38, Anshuman Khandual wrote:
>>> Core HugeTLB defines arch_make_huge_pte() fallback definition, which calls
>>> platform provided pte_mkhuge(). But if any platform already provides custom
>>> arch_make_huge_pte(), then it does not need to provide pte_mkhuge(). arm64
>>> defines arch_make_huge_pte(), but then also calls pte_mkhuge() internally.
>>> This creates confusion as if both of these callbacks are being used in core
>>> HugeTLB and required to be defined in the platform.
>>>
>>> This changes arch_make_huge_pte() to create block mapping directly and also
>>> drops off now redundant helper pte_mkhuge(), making things clear. Also this
>>> changes HugeTLB page creation from just clearing the PTE_TABLE_BIT (bit[1])
>>> to actually setting bits[1:0] via PTE_TYPE_[MASK|SECT] instead.
>>>
>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>> Cc: Will Deacon <will@kernel.org>
>>> Cc: Ard Biesheuvel <ardb@kernel.org>
>>> Cc: Ryan Roberts <ryan.roberts@arm.com>
>>> Cc: Mark Rutland <mark.rutland@arm.com>
>>> Cc: linux-arm-kernel@lists.infradead.org
>>> Cc: linux-kernel@vger.kernel.org
>>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>>> ---
>>>  arch/arm64/include/asm/pgtable-hwdef.h | 1 +
>>>  arch/arm64/include/asm/pgtable.h       | 5 -----
>>>  arch/arm64/mm/hugetlbpage.c            | 2 +-
>>>  3 files changed, 2 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
>>> index fd330c1db289..956a702cb532 100644
>>> --- a/arch/arm64/include/asm/pgtable-hwdef.h
>>> +++ b/arch/arm64/include/asm/pgtable-hwdef.h
>>> @@ -158,6 +158,7 @@
>>>  #define PTE_VALID		(_AT(pteval_t, 1) << 0)
>>>  #define PTE_TYPE_MASK		(_AT(pteval_t, 3) << 0)
>>>  #define PTE_TYPE_PAGE		(_AT(pteval_t, 3) << 0)
>>> +#define PTE_TYPE_SECT		(_AT(pteval_t, 1) << 0)
>>>  #define PTE_TABLE_BIT		(_AT(pteval_t, 1) << 1)
>>>  #define PTE_USER		(_AT(pteval_t, 1) << 6)		/* AP[1] */
>>>  #define PTE_RDONLY		(_AT(pteval_t, 1) << 7)		/* AP[2] */
>>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>>> index c329ea061dc9..fa4c32a9f572 100644
>>> --- a/arch/arm64/include/asm/pgtable.h
>>> +++ b/arch/arm64/include/asm/pgtable.h
>>> @@ -438,11 +438,6 @@ static inline void __set_ptes(struct mm_struct *mm,
>>>  	}
>>>  }
>>>  
>>> -/*
>>> - * Huge pte definitions.
>>> - */
>>> -#define pte_mkhuge(pte)		(__pte(pte_val(pte) & ~PTE_TABLE_BIT))
>>> -
>>>  /*
>>>   * Hugetlb definitions.
>>>   */
>>> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
>>> index 5f1e2103888b..5922c95630ad 100644
>>> --- a/arch/arm64/mm/hugetlbpage.c
>>> +++ b/arch/arm64/mm/hugetlbpage.c
>>> @@ -361,7 +361,7 @@ pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags)
>>>  {
>>>  	size_t pagesize = 1UL << shift;
>>>  
>>> -	entry = pte_mkhuge(entry);
>>> +	entry = __pte((pte_val(entry) & ~PTE_TYPE_MASK) | PTE_TYPE_SECT);
>>
>> I think there may be an existing bug here; if pagesize == CONT_PTE_SIZE, then
>> entry will be placed in the level 3 table. In this case, shouldn't bit 1 remain
>> set, because at level 3, a page mapping is denoted by bits[1:0] = 3 ? Currently
>> its being unconditionally cleared.
> 
> That's not a problem, pte_mkcont() brings back both the bits
> via PTE_TYPE_PAGE along with PTE_CONT.
> 
>         if (pagesize == CONT_PTE_SIZE) {
>                 entry = pte_mkcont(entry);
>         } else if (pagesize == CONT_PMD_SIZE) {
>                 entry = pmd_pte(pmd_mkcont(pte_pmd(entry)));
>         } else if (pagesize != PUD_SIZE && pagesize != PMD_SIZE) {
>                 pr_warn("%s: unrecognized huge page size 0x%lx\n",
>                         __func__, pagesize);
>         }
> 
> static inline pte_t pte_mkcont(pte_t pte)
> {
> 	pte = set_pte_bit(pte, __pgprot(PTE_CONT));
>       	return set_pte_bit(pte, __pgprot(PTE_TYPE_PAGE));

Oh wow, that's pretty hacky. Good job we never call pte_mkcont() on a
PTE_PRESENT_INVALID PTE. This would turn it valid again.

> }
> 
> Although the same is not required for CONT_PMD_SIZE size huge
> pages where only PTE_CONT is enough.
> 
> static inline pmd_t pmd_mkcont(pmd_t pmd)
> {
>         return __pmd(pmd_val(pmd) | PMD_SECT_CONT);
> }
>  
>>>  	if (pagesize == CONT_PTE_SIZE) {
>>>  		entry = pte_mkcont(entry);
>>>  	} else if (pagesize == CONT_PMD_SIZE) {
>>



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/5] arm64/mm: Replace PXD_TABLE_BIT with PXD_TYPE_[MASK|SECT]
  2024-10-14 10:48     ` Anshuman Khandual
@ 2024-10-14 11:33       ` Ryan Roberts
  2024-10-15  3:50         ` Anshuman Khandual
  0 siblings, 1 reply; 18+ messages in thread
From: Ryan Roberts @ 2024-10-14 11:33 UTC (permalink / raw)
  To: Anshuman Khandual, linux-arm-kernel
  Cc: Marc Zyngier, Oliver Upton, James Morse, Catalin Marinas,
	Will Deacon, Ard Biesheuvel, Mark Rutland, kvmarm, linux-kernel

On 14/10/2024 11:48, Anshuman Khandual wrote:
> 
> 
> On 10/9/24 18:58, Ryan Roberts wrote:
>> On 05/10/2024 13:38, Anshuman Khandual wrote:
>>> This modifies existing block mapping related helpers e.g [pmd|pud]_mkhuge()
>>> , mk_[pmd|pud]_sect_prot() and pmd_trans_huge() to use PXD_TYPE_[MASK|SECT]
>>> instead of corresponding PXD_TABLE_BIT. This also moves pmd_sect() earlier
>>> for the symbol's availability preventing a build warning.
>>>
>>> While here this also drops pmd_val() check from pmd_trans_huge() helper, as
>>> pmd_present() returning true already ensures that pmd_val() cannot be false
>>>
>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>> Cc: Will Deacon <will@kernel.org>
>>> Cc: Ard Biesheuvel <ardb@kernel.org>
>>> Cc: Ryan Roberts <ryan.roberts@arm.com>
>>> Cc: Mark Rutland <mark.rutland@arm.com>
>>> Cc: linux-arm-kernel@lists.infradead.org
>>> Cc: linux-kernel@vger.kernel.org
>>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>>> ---
>>>  arch/arm64/include/asm/pgtable.h | 15 ++++++++-------
>>>  1 file changed, 8 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>>> index fa4c32a9f572..45c49c5ace80 100644
>>> --- a/arch/arm64/include/asm/pgtable.h
>>> +++ b/arch/arm64/include/asm/pgtable.h
>>> @@ -484,12 +484,12 @@ static inline pmd_t pte_pmd(pte_t pte)
>>>  
>>>  static inline pgprot_t mk_pud_sect_prot(pgprot_t prot)
>>>  {
>>> -	return __pgprot((pgprot_val(prot) & ~PUD_TABLE_BIT) | PUD_TYPE_SECT);
>>> +	return __pgprot((pgprot_val(prot) & ~PUD_TYPE_MASK) | PUD_TYPE_SECT);
>>>  }
>>>  
>>>  static inline pgprot_t mk_pmd_sect_prot(pgprot_t prot)
>>>  {
>>> -	return __pgprot((pgprot_val(prot) & ~PMD_TABLE_BIT) | PMD_TYPE_SECT);
>>> +	return __pgprot((pgprot_val(prot) & ~PMD_TYPE_MASK) | PMD_TYPE_SECT);
>>>  }
>>>  
>>>  static inline pte_t pte_swp_mkexclusive(pte_t pte)
>>> @@ -554,10 +554,13 @@ static inline int pmd_protnone(pmd_t pmd)
>>>   * THP definitions.
>>>   */
>>>  
>>> +#define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
>>> +				 PMD_TYPE_SECT)
>>> +
>>>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>>  static inline int pmd_trans_huge(pmd_t pmd)
>>>  {
>>> -	return pmd_val(pmd) && pmd_present(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT);
>>> +	return pmd_present(pmd) && pmd_sect(pmd);
>>
>> Bug? Prevously we would have returned true for a "present-invalid" PMD block
>> mapping - that's one which is formatted as a PMD block mapping except the
>> PTE_VALID bit is clear and PTE_PRESENT_INVALID is set. But now, due to
>> pmd_sect() testing VALID is set (via PMD_TYPE_SECT), we no longer return true in
>> this case.
> 
> Agreed, that will be problematic but the situation can be rectified by decoupling
> pmd_present_invalid() from pte_present_invalid() by checking for both last bits
> instead of just the valid bit against PTE_PRESENT_INVALID.
> 
> #define pmd_sect(pmd)          ((pmd_val(pmd) & PMD_TYPE_MASK) == \
>                                 PMD_TYPE_SECT)

I know this is pre-existing, but the fact that this depends on PMD_VALID being
set feels like something waiting to bite us. From the SW's PoV, we should get
the same answer regardless of whether PMD_VALID xor PTE_PRESENT_INVALID is set.
I know there is nobody depending on that right now, but it feels like a bug
waiting to happen. I'm not sure how you would fix that without having the SW
explcitly know about the table bit's existance though.

> 
> #define pmd_present_invalid(pmd) \
>        ((pmd_val(pmd) & (PMD_TYPE_MASK | PTE_PRESENT_INVALID)) == PTE_PRESENT_INVALID)

I read this as "if the type field is 0 and PTE_PRESENT_INVALID is 1 then it's
present-invalid". That doesn't really feel any better to me than the code
knowing there is a table bit. What's the benefit of doing this vs what the code
already does? It all feels quite hacky to me.

> 
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  static inline int pmd_trans_huge(pmd_t pmd)
>  {
> 	return pmd_sect(pmd) || pmd_present_invalid(pmd);
>  }
>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> >>
>>>  }
>>>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>>>  
>>> @@ -586,7 +589,7 @@ static inline int pmd_trans_huge(pmd_t pmd)
>>>  
>>>  #define pmd_write(pmd)		pte_write(pmd_pte(pmd))
>>>  
>>> -#define pmd_mkhuge(pmd)		(__pmd(pmd_val(pmd) & ~PMD_TABLE_BIT))
>>> +#define pmd_mkhuge(pmd)		(__pmd((pmd_val(pmd) & ~PMD_TYPE_MASK) | PMD_TYPE_SECT))
>>
>> I'm not sure if this also suffers from a similar problem? Is it possible that a
>> present-invalid pmd would be passed to pmd_mkhuge()? If so, then we are now
>> incorrectly setting the PTE_VALID bit.
> pmd_mkhuge() converts a regular pmd into a huge page and on arm64
> creating a huge page also involves setting PTE_VALID. Why would a
> present-invalid pmd is passed into pmd_mkhuge() without intending
> to make a huge entry ?
> 
> There just two generic use cases for pmd_mkhuge().
> 
> insert_pfn_pmd
> 	   entry = pmd_mkhuge(pfn_t_pmd(pfn, prot));
> 
> set_huge_zero_folio
>         entry = mk_pmd(&zero_folio->page, vma->vm_page_prot);
>         entry = pmd_mkhuge(entry);
> 
> As instances in mm/debug_vm_pgtable.c, pmd_mkinvalid() should be
> called on a PMD entry after pmd_mkhuge() not the other way around.

I guess it depends on your perspective. I agree there is no issue today. But
from the core-mm's PoV, a present-invalid PMD should be indistinguishable from a
present (-valid) one.


> 
>>
>>>  
>>>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>>  #define pmd_devmap(pmd)		pte_devmap(pmd_pte(pmd))
>>> @@ -614,7 +617,7 @@ static inline pmd_t pmd_mkspecial(pmd_t pmd)
>>>  #define pud_mkyoung(pud)	pte_pud(pte_mkyoung(pud_pte(pud)))
>>>  #define pud_write(pud)		pte_write(pud_pte(pud))
>>>  
>>> -#define pud_mkhuge(pud)		(__pud(pud_val(pud) & ~PUD_TABLE_BIT))
>>> +#define pud_mkhuge(pud)		(__pud((pud_val(pud) & ~PUD_TYPE_MASK) | PUD_TYPE_SECT))
>>>  
>>>  #define __pud_to_phys(pud)	__pte_to_phys(pud_pte(pud))
>>>  #define __phys_to_pud_val(phys)	__phys_to_pte_val(phys)
>>> @@ -712,8 +715,6 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>>>  
>>>  #define pmd_table(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
>>>  				 PMD_TYPE_TABLE)
>>> -#define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
>>> -				 PMD_TYPE_SECT)
>>>  #define pmd_leaf(pmd)		(pmd_present(pmd) && !pmd_table(pmd))
>>>  #define pmd_bad(pmd)		(!pmd_table(pmd))
>>>  
>>



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/5] arm64/mm: Replace PXD_TABLE_BIT with PXD_TYPE_[MASK|SECT]
  2024-10-14 11:33       ` Ryan Roberts
@ 2024-10-15  3:50         ` Anshuman Khandual
  0 siblings, 0 replies; 18+ messages in thread
From: Anshuman Khandual @ 2024-10-15  3:50 UTC (permalink / raw)
  To: Ryan Roberts, linux-arm-kernel
  Cc: Marc Zyngier, Oliver Upton, James Morse, Catalin Marinas,
	Will Deacon, Ard Biesheuvel, Mark Rutland, kvmarm, linux-kernel



On 10/14/24 17:03, Ryan Roberts wrote:
> On 14/10/2024 11:48, Anshuman Khandual wrote:
>>
>>
>> On 10/9/24 18:58, Ryan Roberts wrote:
>>> On 05/10/2024 13:38, Anshuman Khandual wrote:
>>>> This modifies existing block mapping related helpers e.g [pmd|pud]_mkhuge()
>>>> , mk_[pmd|pud]_sect_prot() and pmd_trans_huge() to use PXD_TYPE_[MASK|SECT]
>>>> instead of corresponding PXD_TABLE_BIT. This also moves pmd_sect() earlier
>>>> for the symbol's availability preventing a build warning.
>>>>
>>>> While here this also drops pmd_val() check from pmd_trans_huge() helper, as
>>>> pmd_present() returning true already ensures that pmd_val() cannot be false
>>>>
>>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>>> Cc: Will Deacon <will@kernel.org>
>>>> Cc: Ard Biesheuvel <ardb@kernel.org>
>>>> Cc: Ryan Roberts <ryan.roberts@arm.com>
>>>> Cc: Mark Rutland <mark.rutland@arm.com>
>>>> Cc: linux-arm-kernel@lists.infradead.org
>>>> Cc: linux-kernel@vger.kernel.org
>>>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>>>> ---
>>>>  arch/arm64/include/asm/pgtable.h | 15 ++++++++-------
>>>>  1 file changed, 8 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>>>> index fa4c32a9f572..45c49c5ace80 100644
>>>> --- a/arch/arm64/include/asm/pgtable.h
>>>> +++ b/arch/arm64/include/asm/pgtable.h
>>>> @@ -484,12 +484,12 @@ static inline pmd_t pte_pmd(pte_t pte)
>>>>  
>>>>  static inline pgprot_t mk_pud_sect_prot(pgprot_t prot)
>>>>  {
>>>> -	return __pgprot((pgprot_val(prot) & ~PUD_TABLE_BIT) | PUD_TYPE_SECT);
>>>> +	return __pgprot((pgprot_val(prot) & ~PUD_TYPE_MASK) | PUD_TYPE_SECT);
>>>>  }
>>>>  
>>>>  static inline pgprot_t mk_pmd_sect_prot(pgprot_t prot)
>>>>  {
>>>> -	return __pgprot((pgprot_val(prot) & ~PMD_TABLE_BIT) | PMD_TYPE_SECT);
>>>> +	return __pgprot((pgprot_val(prot) & ~PMD_TYPE_MASK) | PMD_TYPE_SECT);
>>>>  }
>>>>  
>>>>  static inline pte_t pte_swp_mkexclusive(pte_t pte)
>>>> @@ -554,10 +554,13 @@ static inline int pmd_protnone(pmd_t pmd)
>>>>   * THP definitions.
>>>>   */
>>>>  
>>>> +#define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
>>>> +				 PMD_TYPE_SECT)
>>>> +
>>>>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>>>  static inline int pmd_trans_huge(pmd_t pmd)
>>>>  {
>>>> -	return pmd_val(pmd) && pmd_present(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT);
>>>> +	return pmd_present(pmd) && pmd_sect(pmd);
>>>
>>> Bug? Prevously we would have returned true for a "present-invalid" PMD block
>>> mapping - that's one which is formatted as a PMD block mapping except the
>>> PTE_VALID bit is clear and PTE_PRESENT_INVALID is set. But now, due to
>>> pmd_sect() testing VALID is set (via PMD_TYPE_SECT), we no longer return true in
>>> this case.
>>
>> Agreed, that will be problematic but the situation can be rectified by decoupling
>> pmd_present_invalid() from pte_present_invalid() by checking for both last bits
>> instead of just the valid bit against PTE_PRESENT_INVALID.
>>
>> #define pmd_sect(pmd)          ((pmd_val(pmd) & PMD_TYPE_MASK) == \
>>                                 PMD_TYPE_SECT)
> 
> I know this is pre-existing, but the fact that this depends on PMD_VALID being
> set feels like something waiting to bite us. From the SW's PoV, we should get
> the same answer regardless of whether PMD_VALID xor PTE_PRESENT_INVALID is set.

I guess you are talking about pmd_sect() above, right ? pmd_sect() is just a HW
level check about the entry being a block mapping, which is not aware about the
PTE_PRESENT_INVALID based mechanism used for pmd_present().

> I know there is nobody depending on that right now, but it feels like a bug
> waiting to happen. I'm not sure how you would fix that without having the SW
> explcitly know about the table bit's existance though.
> 
>>
>> #define pmd_present_invalid(pmd) \
>>        ((pmd_val(pmd) & (PMD_TYPE_MASK | PTE_PRESENT_INVALID)) == PTE_PRESENT_INVALID)
> 
> I read this as "if the type field is 0 and PTE_PRESENT_INVALID is 1 then it's
> present-invalid". That doesn't really feel any better to me than the code
> knowing there is a table bit. What's the benefit of doing this vs what the code
> already does? It all feels quite hacky to me.

Table bit does not need to be known explicitly, which is the point of dropping
it off completely. 

#define pmd_mkinvalid()	pte_pmd(pte_mkinvalid(pmd_pte(pmd)))

static inline pte_t pte_mkinvalid(pte_t pte)
{
        pte = set_pte_bit(pte, __pgprot(PTE_PRESENT_INVALID));
        pte = clear_pte_bit(pte, __pgprot(PTE_VALID));
        return pte;
}

Instead pmd_mkinvalid() should be redefined as the following, asserting the fact
that PMD_TYPE_MASK bits as a block field which is being cleared and replaced with
PTE_PRESENT_INVALID, not only the table bit.

static inline pmd_t pmd_mkinvalid(pmd_t pmd)
{
        pmd = set_pmd_bit(pmd, __pgprot(PTE_PRESENT_INVALID));
        pmd = clear_pte_bit(pte, __pgprot(PMD_TYPE_MASK));
        return pte;
}

In fact currently PMD_TABLE_BIT check is required here only because pmd_present()
does not check against pmd_sect().

i.e pmd_present(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT); 

pmd_present() should return true when

- pmd_sect() returns true
- pmd_present_invalid() returns true because PMD_TYPE_MASK bits field has been
  cleared and instead PTE_PRESENT_INVALID has been set.

In both the cases PMD_TABLE_BIT is not required.

> 
>>
>>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>  static inline int pmd_trans_huge(pmd_t pmd)
>>  {
>> 	return pmd_sect(pmd) || pmd_present_invalid(pmd);
>>  }
>>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>>>>
>>>>  }
>>>>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>>>>  
>>>> @@ -586,7 +589,7 @@ static inline int pmd_trans_huge(pmd_t pmd)
>>>>  
>>>>  #define pmd_write(pmd)		pte_write(pmd_pte(pmd))
>>>>  
>>>> -#define pmd_mkhuge(pmd)		(__pmd(pmd_val(pmd) & ~PMD_TABLE_BIT))
>>>> +#define pmd_mkhuge(pmd)		(__pmd((pmd_val(pmd) & ~PMD_TYPE_MASK) | PMD_TYPE_SECT))
>>>
>>> I'm not sure if this also suffers from a similar problem? Is it possible that a
>>> present-invalid pmd would be passed to pmd_mkhuge()? If so, then we are now
>>> incorrectly setting the PTE_VALID bit.
>> pmd_mkhuge() converts a regular pmd into a huge page and on arm64
>> creating a huge page also involves setting PTE_VALID. Why would a
>> present-invalid pmd is passed into pmd_mkhuge() without intending
>> to make a huge entry ?
>>
>> There just two generic use cases for pmd_mkhuge().
>>
>> insert_pfn_pmd
>> 	   entry = pmd_mkhuge(pfn_t_pmd(pfn, prot));
>>
>> set_huge_zero_folio
>>         entry = mk_pmd(&zero_folio->page, vma->vm_page_prot);
>>         entry = pmd_mkhuge(entry);
>>
>> As instances in mm/debug_vm_pgtable.c, pmd_mkinvalid() should be
>> called on a PMD entry after pmd_mkhuge() not the other way around.
> 
> I guess it depends on your perspective. I agree there is no issue today. But
> from the core-mm's PoV, a present-invalid PMD should be indistinguishable from a
> present (-valid) one.

That's ideally is the case when huge page is achieved with a dedicated single HW
PTE bit apart from the valid HW PTE bit (which is not the case on arm64). So when
valid bit is cleared it still has the huge bit to test against for being "present"
with all other information on the entry, just that it's not mapped.

That's the reason PTE_PRESENT_INVALID is there to hold the fort while clearing out
PMD_VALID bit. But it actually rather clears entire PMD_TYPE_MASK completely and
depending on just PMD_TABLE_BIT being clear is not really prudent.

> 
> 
>>
>>>
>>>>  
>>>>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>>>  #define pmd_devmap(pmd)		pte_devmap(pmd_pte(pmd))
>>>> @@ -614,7 +617,7 @@ static inline pmd_t pmd_mkspecial(pmd_t pmd)
>>>>  #define pud_mkyoung(pud)	pte_pud(pte_mkyoung(pud_pte(pud)))
>>>>  #define pud_write(pud)		pte_write(pud_pte(pud))
>>>>  
>>>> -#define pud_mkhuge(pud)		(__pud(pud_val(pud) & ~PUD_TABLE_BIT))
>>>> +#define pud_mkhuge(pud)		(__pud((pud_val(pud) & ~PUD_TYPE_MASK) | PUD_TYPE_SECT))
>>>>  
>>>>  #define __pud_to_phys(pud)	__pte_to_phys(pud_pte(pud))
>>>>  #define __phys_to_pud_val(phys)	__phys_to_pte_val(phys)
>>>> @@ -712,8 +715,6 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>>>>  
>>>>  #define pmd_table(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
>>>>  				 PMD_TYPE_TABLE)
>>>> -#define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
>>>> -				 PMD_TYPE_SECT)
>>>>  #define pmd_leaf(pmd)		(pmd_present(pmd) && !pmd_table(pmd))
>>>>  #define pmd_bad(pmd)		(!pmd_table(pmd))
>>>>  
>>>
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] arm64/mm: Drop pte_mkhuge()
  2024-10-14 11:08       ` Ryan Roberts
@ 2024-10-15  4:23         ` Anshuman Khandual
  0 siblings, 0 replies; 18+ messages in thread
From: Anshuman Khandual @ 2024-10-15  4:23 UTC (permalink / raw)
  To: Ryan Roberts, linux-arm-kernel
  Cc: Marc Zyngier, Oliver Upton, James Morse, Catalin Marinas,
	Will Deacon, Ard Biesheuvel, Mark Rutland, kvmarm, linux-kernel



On 10/14/24 16:38, Ryan Roberts wrote:
> On 14/10/2024 09:59, Anshuman Khandual wrote:
>>
>>
>> On 10/9/24 18:50, Ryan Roberts wrote:
>>> On 05/10/2024 13:38, Anshuman Khandual wrote:
>>>> Core HugeTLB defines arch_make_huge_pte() fallback definition, which calls
>>>> platform provided pte_mkhuge(). But if any platform already provides custom
>>>> arch_make_huge_pte(), then it does not need to provide pte_mkhuge(). arm64
>>>> defines arch_make_huge_pte(), but then also calls pte_mkhuge() internally.
>>>> This creates confusion as if both of these callbacks are being used in core
>>>> HugeTLB and required to be defined in the platform.
>>>>
>>>> This changes arch_make_huge_pte() to create block mapping directly and also
>>>> drops off now redundant helper pte_mkhuge(), making things clear. Also this
>>>> changes HugeTLB page creation from just clearing the PTE_TABLE_BIT (bit[1])
>>>> to actually setting bits[1:0] via PTE_TYPE_[MASK|SECT] instead.
>>>>
>>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>>> Cc: Will Deacon <will@kernel.org>
>>>> Cc: Ard Biesheuvel <ardb@kernel.org>
>>>> Cc: Ryan Roberts <ryan.roberts@arm.com>
>>>> Cc: Mark Rutland <mark.rutland@arm.com>
>>>> Cc: linux-arm-kernel@lists.infradead.org
>>>> Cc: linux-kernel@vger.kernel.org
>>>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>>>> ---
>>>>  arch/arm64/include/asm/pgtable-hwdef.h | 1 +
>>>>  arch/arm64/include/asm/pgtable.h       | 5 -----
>>>>  arch/arm64/mm/hugetlbpage.c            | 2 +-
>>>>  3 files changed, 2 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
>>>> index fd330c1db289..956a702cb532 100644
>>>> --- a/arch/arm64/include/asm/pgtable-hwdef.h
>>>> +++ b/arch/arm64/include/asm/pgtable-hwdef.h
>>>> @@ -158,6 +158,7 @@
>>>>  #define PTE_VALID		(_AT(pteval_t, 1) << 0)
>>>>  #define PTE_TYPE_MASK		(_AT(pteval_t, 3) << 0)
>>>>  #define PTE_TYPE_PAGE		(_AT(pteval_t, 3) << 0)
>>>> +#define PTE_TYPE_SECT		(_AT(pteval_t, 1) << 0)
>>>>  #define PTE_TABLE_BIT		(_AT(pteval_t, 1) << 1)
>>>>  #define PTE_USER		(_AT(pteval_t, 1) << 6)		/* AP[1] */
>>>>  #define PTE_RDONLY		(_AT(pteval_t, 1) << 7)		/* AP[2] */
>>>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>>>> index c329ea061dc9..fa4c32a9f572 100644
>>>> --- a/arch/arm64/include/asm/pgtable.h
>>>> +++ b/arch/arm64/include/asm/pgtable.h
>>>> @@ -438,11 +438,6 @@ static inline void __set_ptes(struct mm_struct *mm,
>>>>  	}
>>>>  }
>>>>  
>>>> -/*
>>>> - * Huge pte definitions.
>>>> - */
>>>> -#define pte_mkhuge(pte)		(__pte(pte_val(pte) & ~PTE_TABLE_BIT))
>>>> -
>>>>  /*
>>>>   * Hugetlb definitions.
>>>>   */
>>>> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
>>>> index 5f1e2103888b..5922c95630ad 100644
>>>> --- a/arch/arm64/mm/hugetlbpage.c
>>>> +++ b/arch/arm64/mm/hugetlbpage.c
>>>> @@ -361,7 +361,7 @@ pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags)
>>>>  {
>>>>  	size_t pagesize = 1UL << shift;
>>>>  
>>>> -	entry = pte_mkhuge(entry);
>>>> +	entry = __pte((pte_val(entry) & ~PTE_TYPE_MASK) | PTE_TYPE_SECT);
>>>
>>> I think there may be an existing bug here; if pagesize == CONT_PTE_SIZE, then
>>> entry will be placed in the level 3 table. In this case, shouldn't bit 1 remain
>>> set, because at level 3, a page mapping is denoted by bits[1:0] = 3 ? Currently
>>> its being unconditionally cleared.
>>
>> That's not a problem, pte_mkcont() brings back both the bits
>> via PTE_TYPE_PAGE along with PTE_CONT.
>>
>>         if (pagesize == CONT_PTE_SIZE) {
>>                 entry = pte_mkcont(entry);
>>         } else if (pagesize == CONT_PMD_SIZE) {
>>                 entry = pmd_pte(pmd_mkcont(pte_pmd(entry)));
>>         } else if (pagesize != PUD_SIZE && pagesize != PMD_SIZE) {
>>                 pr_warn("%s: unrecognized huge page size 0x%lx\n",
>>                         __func__, pagesize);
>>         }
>>
>> static inline pte_t pte_mkcont(pte_t pte)
>> {
>> 	pte = set_pte_bit(pte, __pgprot(PTE_CONT));
>>       	return set_pte_bit(pte, __pgprot(PTE_TYPE_PAGE));
> 
> Oh wow, that's pretty hacky. Good job we never call pte_mkcont() on a
> PTE_PRESENT_INVALID PTE. This would turn it valid again.

Ideally each individual HW page table helper should not do more than one thing
at a time. Here pte_mkcont() should just take a pte which is already a valid
one with PTE_TYPE_PAGE and just set PTE_CONT.

> 
>> }
>>
>> Although the same is not required for CONT_PMD_SIZE size huge
>> pages where only PTE_CONT is enough.
>>
>> static inline pmd_t pmd_mkcont(pmd_t pmd)
>> {
>>         return __pmd(pmd_val(pmd) | PMD_SECT_CONT);
>> }
>>  
>>>>  	if (pagesize == CONT_PTE_SIZE) {
>>>>  		entry = pte_mkcont(entry);
>>>>  	} else if (pagesize == CONT_PMD_SIZE) {
>>>
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/5] arm64/mm: Drop PXD_TABLE_BIT
  2024-10-09 13:32 ` [PATCH 0/5] " Ryan Roberts
@ 2024-10-15  6:47   ` Anshuman Khandual
  0 siblings, 0 replies; 18+ messages in thread
From: Anshuman Khandual @ 2024-10-15  6:47 UTC (permalink / raw)
  To: Ryan Roberts, linux-arm-kernel
  Cc: Marc Zyngier, Oliver Upton, James Morse, Catalin Marinas,
	Will Deacon, Ard Biesheuvel, Mark Rutland, kvmarm, linux-kernel



On 10/9/24 19:02, Ryan Roberts wrote:
> On 05/10/2024 13:38, Anshuman Khandual wrote:
>> Clearing PXD_TABLE_BIT i.e bit[1] on a page table entry always operates on
>> the assumption that subsequent PXD_VALID i.e bit[0] is set. That's because
>> bits[1:0]="01" makes a block mapping. So it is prudent to treat bits[1:0]
>> as a single register field, which should be updated as block or table etc.
>> Although mk_[pmd|pud]_sect_prot() helpers go to some extent in using these
>> PXD_TYPE_SECT macros, their usage is not really consistent else where.
>>
>> This series removes these table bit clearing for block mapping creation and
>> eventually completely drops off those table macros.
> Given the issue I just noticed in patch 2, I'm not sure if it's going to be
> practical to remove the table bit after all? Sorry I didn't spot this before.

Now that arch_make_huge_pte() seems to be all good in patch 1, and there might
be a solution for pmd_present() via de-coupled pmd_mkinvalid(), would it still
not possible to drop PXD_TABLE_BIT ?


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2024-10-15  6:55 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-05 12:38 [PATCH 0/5] arm64/mm: Drop PXD_TABLE_BIT Anshuman Khandual
2024-10-05 12:38 ` [PATCH 1/5] arm64/mm: Drop pte_mkhuge() Anshuman Khandual
2024-10-09 13:20   ` Ryan Roberts
2024-10-14  8:59     ` Anshuman Khandual
2024-10-14 11:08       ` Ryan Roberts
2024-10-15  4:23         ` Anshuman Khandual
2024-10-05 12:38 ` [PATCH 2/5] arm64/mm: Replace PXD_TABLE_BIT with PXD_TYPE_[MASK|SECT] Anshuman Khandual
2024-10-09 13:28   ` Ryan Roberts
2024-10-14 10:48     ` Anshuman Khandual
2024-10-14 11:33       ` Ryan Roberts
2024-10-15  3:50         ` Anshuman Khandual
2024-10-05 12:38 ` [PATCH 3/5] arm64/ptdump: Test PMD_TYPE_MASK for block mapping Anshuman Khandual
2024-10-09 13:29   ` Ryan Roberts
2024-10-05 12:38 ` [PATCH 4/5] KVM: arm64: ptdump: " Anshuman Khandual
2024-10-09 13:30   ` Ryan Roberts
2024-10-05 12:38 ` [PATCH 5/5] arm64/mm: Drop PXD_TABLE_BIT Anshuman Khandual
2024-10-09 13:32 ` [PATCH 0/5] " Ryan Roberts
2024-10-15  6:47   ` Anshuman Khandual

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).