linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH v3 0/5] arm64/mm: use the contiguous attribute for kernel mappings
@ 2016-10-12 11:23 Ard Biesheuvel
  2016-10-12 11:23 ` [PATCH v3 1/5] arm64: mm: BUG on unsupported manipulations of live " Ard Biesheuvel
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Ard Biesheuvel @ 2016-10-12 11:23 UTC (permalink / raw)
  To: linux-arm-kernel

This 5-piece series is a follow-up to the single patch 'arm64: mmu: set the
contiguous bit for kernel mappings when appropriate' sent out on the 10th [0].

Changes in v3:
- add support for contiguous PMDs for all granule sizes (not just 16k)
- add a separate patch to deal with contiguous PUDs (4k granule only), and
  contiguous PMDs for 2 levels of translation (which requires special handling)
- avoid pmd_none/pud_none in the BUG() statements in patch #1, since they
  may resolve in unexpected ways with folded PMDs/PUDs

Version v2 [1] addressed the following issues:
- the contiguous attribute is also useful for contiguous PMD mappings on 16k
  granule kernels (i.e., 1 GB blocks)
- the function parameter 'block_mappings_allowed' does not clearly convey
  whether contiguous page mappings should be used, so it is renamed to
  'page_mappings_only', and its meaning inverted
- instead of BUGging on changes in the PTE_CONT attribute in PMD or PTE entries
  that have been populated already, BUG on any modification except for
  permission attributes, which don't require break-before-make when changed.

[0] http://marc.info/?l=linux-arm-kernel&m=147612332130714
[1] http://marc.info/?l=linux-arm-kernel&m=147618975314593

An example memory map from a Seattle system with 64 GB of RAM, running a 4k
granule/3 levels of translation kernel with KASLR enabled, is included below;
'CON' marks entries that have the contiguous attribute set, and 'BLK' marks
block (section) mappings.

Ard Biesheuvel (5):
  arm64: mm: BUG on unsupported manipulations of live kernel mappings
  arm64: mm: replace 'block_mappings_allowed' with 'page_mappings_only'
  arm64: mm: set the contiguous bit for kernel mappings where
    appropriate
  arm64: mm: support additional contiguous kernel mapping region sizes
  arm64: mm: round memstart_addr to contiguous PUD/PMD size

 arch/arm64/include/asm/kernel-pgtable.h |  11 +-
 arch/arm64/include/asm/mmu.h            |   2 +-
 arch/arm64/include/asm/pgtable-hwdef.h  |   6 +
 arch/arm64/kernel/efi.c                 |   8 +-
 arch/arm64/mm/mmu.c                     | 173 ++++++++++++++------
 5 files changed, 140 insertions(+), 60 deletions(-)

-- 
2.7.4

// Kernel mapping

0xffffff8a70100000-0xffffff8a70200000      1M PTE ro x  SHD AF    CON    
0xffffff8a70200000-0xffffff8a70600000      4M PMD ro x  SHD AF        BLK
0xffffff8a70600000-0xffffff8a706d0000    832K PTE ro x  SHD AF    CON    
0xffffff8a706d0000-0xffffff8a70920000   2368K PTE ro NX SHD AF    CON    
0xffffff8a70a90000-0xffffff8a70b40000    704K PTE RW NX SHD AF    CON    
0xffffff8a70b40000-0xffffff8a70b44000     16K PTE RW NX SHD AF           
...

---[ Linear Mapping ]---
0xffffffcc00e80000-0xffffffcc01000000   1536K PTE RW NX SHD AF    CON    
0xffffffcc01000000-0xffffffcc02000000     16M PMD RW NX SHD AF        BLK
0xffffffcc02000000-0xffffffcc40000000    992M PMD RW NX SHD AF    CON BLK
0xffffffcc40000000-0xffffffd180000000     21G PGD RW NX SHD AF        BLK
0xffffffd180000000-0xffffffd198000000    384M PMD RW NX SHD AF    CON BLK
0xffffffd198000000-0xffffffd199200000     18M PMD RW NX SHD AF        BLK
0xffffffd199200000-0xffffffd199300000      1M PTE RW NX SHD AF    CON    
0xffffffd199300000-0xffffffd199400000      1M PTE ro NX SHD AF    CON    
0xffffffd199400000-0xffffffd199a00000      6M PMD ro NX SHD AF        BLK
0xffffffd199a00000-0xffffffd199b20000   1152K PTE ro NX SHD AF    CON    
0xffffffd199b20000-0xffffffd199c00000    896K PTE RW NX SHD AF    CON    
0xffffffd199c00000-0xffffffd19a000000      4M PMD RW NX SHD AF        BLK
0xffffffd19a000000-0xffffffd1c0000000    608M PMD RW NX SHD AF    CON BLK
0xffffffd1c0000000-0xffffffd400000000      9G PGD RW NX SHD AF        BLK
0xffffffd400000000-0xffffffd800000000     16G PGD RW NX SHD AF    CON BLK
0xffffffd800000000-0xffffffdbc0000000     15G PGD RW NX SHD AF        BLK
0xffffffdbc0000000-0xffffffdbf8000000    896M PMD RW NX SHD AF    CON BLK
0xffffffdbf8000000-0xffffffdbf8600000      6M PMD RW NX SHD AF        BLK
0xffffffdbf8600000-0xffffffdbf8700000      1M PTE RW NX SHD AF    CON    
0xffffffdbf87a0000-0xffffffdbf8800000    384K PTE RW NX SHD AF    CON    
0xffffffdbf8800000-0xffffffdbfb800000     48M PMD RW NX SHD AF        BLK
0xffffffdbfb800000-0xffffffdbfb830000    192K PTE RW NX SHD AF    CON    
0xffffffdbfbc10000-0xffffffdbfbcd0000    768K PTE RW NX SHD AF    CON    
0xffffffdbfc000000-0xffffffdbfe000000     32M PMD RW NX SHD AF    CON BLK
0xffffffdbfe000000-0xffffffdbffe00000     30M PMD RW NX SHD AF        BLK
0xffffffdbffe00000-0xffffffdbfffd0000   1856K PTE RW NX SHD AF    CON    
0xffffffdbffff0000-0xffffffdc00000000     64K PTE RW NX SHD AF    CON    


* [PATCH v3 1/5] arm64: mm: BUG on unsupported manipulations of live kernel mappings
  2016-10-12 11:23 [PATCH v3 0/5] arm64/mm: use the contiguous attribute for kernel mappings Ard Biesheuvel
@ 2016-10-12 11:23 ` Ard Biesheuvel
  2016-10-12 15:04   ` Catalin Marinas
  2016-10-12 11:23 ` [PATCH v3 2/5] arm64: mm: replace 'block_mappings_allowed' with 'page_mappings_only' Ard Biesheuvel
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Ard Biesheuvel @ 2016-10-12 11:23 UTC (permalink / raw)
  To: linux-arm-kernel

Now that we take care not to manipulate the live kernel page tables in a
way that may lead to TLB conflicts, the case where a table mapping is
replaced by a block mapping can no longer occur. So remove the handling
of this at the PUD and PMD levels, and instead, BUG() on any occurrence
of live kernel page table manipulations that modify anything other than
the permission bits.

Since mark_rodata_ro() is the only caller where the kernel mappings that
are being manipulated are actually live, drop the various conditional
flush_tlb_all() invocations, and add a single flush_tlb_all() call to
mark_rodata_ro() instead.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/mm/mmu.c | 68 ++++++++++++--------
 1 file changed, 41 insertions(+), 27 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 05615a3fdc6f..e1c34e5a1d7d 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -28,8 +28,6 @@
 #include <linux/memblock.h>
 #include <linux/fs.h>
 #include <linux/io.h>
-#include <linux/slab.h>
-#include <linux/stop_machine.h>
 
 #include <asm/barrier.h>
 #include <asm/cputype.h>
@@ -95,6 +93,12 @@ static phys_addr_t __init early_pgtable_alloc(void)
 	return phys;
 }
 
+/*
+ * The following mapping attributes may be updated in live
+ * kernel mappings without the need for break-before-make.
+ */
+static const pteval_t modifiable_attr_mask = PTE_PXN | PTE_RDONLY | PTE_WRITE;
+
 static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
 				  unsigned long end, unsigned long pfn,
 				  pgprot_t prot,
@@ -115,8 +119,18 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
 
 	pte = pte_set_fixmap_offset(pmd, addr);
 	do {
+		pte_t old_pte = *pte;
+
 		set_pte(pte, pfn_pte(pfn, prot));
 		pfn++;
+
+		/*
+		 * After the PTE entry has been populated once, we
+		 * only allow updates to the permission attributes.
+		 */
+		BUG_ON(pte_val(old_pte) != 0 &&
+		       ((pte_val(old_pte) ^ pte_val(*pte)) &
+			~modifiable_attr_mask) != 0);
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 
 	pte_clear_fixmap();
@@ -146,27 +160,28 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
 
 	pmd = pmd_set_fixmap_offset(pud, addr);
 	do {
+		pmd_t old_pmd = *pmd;
+
 		next = pmd_addr_end(addr, end);
+
 		/* try section mapping first */
 		if (((addr | next | phys) & ~SECTION_MASK) == 0 &&
 		      allow_block_mappings) {
-			pmd_t old_pmd =*pmd;
 			pmd_set_huge(pmd, phys, prot);
+
 			/*
-			 * Check for previous table entries created during
-			 * boot (__create_page_tables) and flush them.
+			 * After the PMD entry has been populated once, we
+			 * only allow updates to the permission attributes.
 			 */
-			if (!pmd_none(old_pmd)) {
-				flush_tlb_all();
-				if (pmd_table(old_pmd)) {
-					phys_addr_t table = pmd_page_paddr(old_pmd);
-					if (!WARN_ON_ONCE(slab_is_available()))
-						memblock_free(table, PAGE_SIZE);
-				}
-			}
+			BUG_ON(pmd_val(old_pmd) != 0 &&
+			       ((pmd_val(old_pmd) ^ pmd_val(*pmd)) &
+				~modifiable_attr_mask) != 0);
 		} else {
 			alloc_init_pte(pmd, addr, next, __phys_to_pfn(phys),
 				       prot, pgtable_alloc);
+
+			BUG_ON(pmd_val(old_pmd) != 0 &&
+			       pmd_val(old_pmd) != pmd_val(*pmd));
 		}
 		phys += next - addr;
 	} while (pmd++, addr = next, addr != end);
@@ -204,33 +219,29 @@ static void alloc_init_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
 
 	pud = pud_set_fixmap_offset(pgd, addr);
 	do {
+		pud_t old_pud = *pud;
+
 		next = pud_addr_end(addr, end);
 
 		/*
 		 * For 4K granule only, attempt to put down a 1GB block
 		 */
 		if (use_1G_block(addr, next, phys) && allow_block_mappings) {
-			pud_t old_pud = *pud;
 			pud_set_huge(pud, phys, prot);
 
 			/*
-			 * If we have an old value for a pud, it will
-			 * be pointing to a pmd table that we no longer
-			 * need (from swapper_pg_dir).
-			 *
-			 * Look up the old pmd table and free it.
+			 * After the PUD entry has been populated once, we
+			 * only allow updates to the permission attributes.
 			 */
-			if (!pud_none(old_pud)) {
-				flush_tlb_all();
-				if (pud_table(old_pud)) {
-					phys_addr_t table = pud_page_paddr(old_pud);
-					if (!WARN_ON_ONCE(slab_is_available()))
-						memblock_free(table, PAGE_SIZE);
-				}
-			}
+			BUG_ON(pud_val(old_pud) != 0 &&
+			       ((pud_val(old_pud) ^ pud_val(*pud)) &
+				~modifiable_attr_mask) != 0);
 		} else {
 			alloc_init_pmd(pud, addr, next, phys, prot,
 				       pgtable_alloc, allow_block_mappings);
+
+			BUG_ON(pud_val(old_pud) != 0 &&
+			       pud_val(old_pud) != pud_val(*pud));
 		}
 		phys += next - addr;
 	} while (pud++, addr = next, addr != end);
@@ -396,6 +407,9 @@ void mark_rodata_ro(void)
 	section_size = (unsigned long)__init_begin - (unsigned long)__start_rodata;
 	create_mapping_late(__pa(__start_rodata), (unsigned long)__start_rodata,
 			    section_size, PAGE_KERNEL_RO);
+
+	/* flush the TLBs after updating live kernel mappings */
+	flush_tlb_all();
 }
 
 static void __init map_kernel_segment(pgd_t *pgd, void *va_start, void *va_end,
-- 
2.7.4


* [PATCH v3 2/5] arm64: mm: replace 'block_mappings_allowed' with 'page_mappings_only'
  2016-10-12 11:23 [PATCH v3 0/5] arm64/mm: use the contiguous attribute for kernel mappings Ard Biesheuvel
  2016-10-12 11:23 ` [PATCH v3 1/5] arm64: mm: BUG on unsupported manipulations of live " Ard Biesheuvel
@ 2016-10-12 11:23 ` Ard Biesheuvel
  2016-10-12 15:07   ` Mark Rutland
  2016-10-12 11:23 ` [PATCH v3 3/5] arm64: mm: set the contiguous bit for kernel mappings where appropriate Ard Biesheuvel
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Ard Biesheuvel @ 2016-10-12 11:23 UTC (permalink / raw)
  To: linux-arm-kernel

In preparation for adding support for contiguous PTE and PMD mappings,
let's replace 'block_mappings_allowed' with 'page_mappings_only', which
will be a more accurate description of the nature of the setting once we
add such contiguous mappings into the mix.
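
For instance, the debug_pagealloc case in the diff below simply inverts the
value that is passed down (a sketch of the two argument expressions, not
literal code from the diff):

        /* before */  allow_block_mappings = !debug_pagealloc_enabled()
        /* after  */  page_mappings_only   =  debug_pagealloc_enabled()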

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/mmu.h |  2 +-
 arch/arm64/kernel/efi.c      |  8 ++---
 arch/arm64/mm/mmu.c          | 32 ++++++++++----------
 3 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 8d9fce037b2f..a81454ad5455 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -34,7 +34,7 @@ extern void __iomem *early_io_map(phys_addr_t phys, unsigned long virt);
 extern void init_mem_pgprot(void);
 extern void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
 			       unsigned long virt, phys_addr_t size,
-			       pgprot_t prot, bool allow_block_mappings);
+			       pgprot_t prot, bool page_mappings_only);
 extern void *fixmap_remap_fdt(phys_addr_t dt_phys);
 
 #endif
diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
index ba9bee389fd5..5d17f377d905 100644
--- a/arch/arm64/kernel/efi.c
+++ b/arch/arm64/kernel/efi.c
@@ -62,8 +62,8 @@ struct screen_info screen_info __section(.data);
 int __init efi_create_mapping(struct mm_struct *mm, efi_memory_desc_t *md)
 {
 	pteval_t prot_val = create_mapping_protection(md);
-	bool allow_block_mappings = (md->type != EFI_RUNTIME_SERVICES_CODE &&
-				     md->type != EFI_RUNTIME_SERVICES_DATA);
+	bool page_mappings_only = (md->type == EFI_RUNTIME_SERVICES_CODE ||
+				   md->type == EFI_RUNTIME_SERVICES_DATA);
 
 	if (!PAGE_ALIGNED(md->phys_addr) ||
 	    !PAGE_ALIGNED(md->num_pages << EFI_PAGE_SHIFT)) {
@@ -76,12 +76,12 @@ int __init efi_create_mapping(struct mm_struct *mm, efi_memory_desc_t *md)
 		 * from the MMU routines. So avoid block mappings altogether in
 		 * that case.
 		 */
-		allow_block_mappings = false;
+		page_mappings_only = true;
 	}
 
 	create_pgd_mapping(mm, md->phys_addr, md->virt_addr,
 			   md->num_pages << EFI_PAGE_SHIFT,
-			   __pgprot(prot_val | PTE_NG), allow_block_mappings);
+			   __pgprot(prot_val | PTE_NG), page_mappings_only);
 	return 0;
 }
 
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index e1c34e5a1d7d..bf1d71b62c4f 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -139,7 +139,7 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
 static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
 				  phys_addr_t phys, pgprot_t prot,
 				  phys_addr_t (*pgtable_alloc)(void),
-				  bool allow_block_mappings)
+				  bool page_mappings_only)
 {
 	pmd_t *pmd;
 	unsigned long next;
@@ -166,7 +166,7 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
 
 		/* try section mapping first */
 		if (((addr | next | phys) & ~SECTION_MASK) == 0 &&
-		      allow_block_mappings) {
+		      !page_mappings_only) {
 			pmd_set_huge(pmd, phys, prot);
 
 			/*
@@ -204,7 +204,7 @@ static inline bool use_1G_block(unsigned long addr, unsigned long next,
 static void alloc_init_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
 				  phys_addr_t phys, pgprot_t prot,
 				  phys_addr_t (*pgtable_alloc)(void),
-				  bool allow_block_mappings)
+				  bool page_mappings_only)
 {
 	pud_t *pud;
 	unsigned long next;
@@ -226,7 +226,7 @@ static void alloc_init_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
 		/*
 		 * For 4K granule only, attempt to put down a 1GB block
 		 */
-		if (use_1G_block(addr, next, phys) && allow_block_mappings) {
+		if (use_1G_block(addr, next, phys) && !page_mappings_only) {
 			pud_set_huge(pud, phys, prot);
 
 			/*
@@ -238,7 +238,7 @@ static void alloc_init_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
 				~modifiable_attr_mask) != 0);
 		} else {
 			alloc_init_pmd(pud, addr, next, phys, prot,
-				       pgtable_alloc, allow_block_mappings);
+				       pgtable_alloc, page_mappings_only);
 
 			BUG_ON(pud_val(old_pud) != 0 &&
 			       pud_val(old_pud) != pud_val(*pud));
@@ -253,7 +253,7 @@ static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
 				 unsigned long virt, phys_addr_t size,
 				 pgprot_t prot,
 				 phys_addr_t (*pgtable_alloc)(void),
-				 bool allow_block_mappings)
+				 bool page_mappings_only)
 {
 	unsigned long addr, length, end, next;
 	pgd_t *pgd = pgd_offset_raw(pgdir, virt);
@@ -273,7 +273,7 @@ static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
 	do {
 		next = pgd_addr_end(addr, end);
 		alloc_init_pud(pgd, addr, next, phys, prot, pgtable_alloc,
-			       allow_block_mappings);
+			       page_mappings_only);
 		phys += next - addr;
 	} while (pgd++, addr = next, addr != end);
 }
@@ -302,17 +302,17 @@ static void __init create_mapping_noalloc(phys_addr_t phys, unsigned long virt,
 			&phys, virt);
 		return;
 	}
-	__create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL, true);
+	__create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL, false);
 }
 
 void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
 			       unsigned long virt, phys_addr_t size,
-			       pgprot_t prot, bool allow_block_mappings)
+			       pgprot_t prot, bool page_mappings_only)
 {
 	BUG_ON(mm == &init_mm);
 
 	__create_pgd_mapping(mm->pgd, phys, virt, size, prot,
-			     pgd_pgtable_alloc, allow_block_mappings);
+			     pgd_pgtable_alloc, page_mappings_only);
 }
 
 static void create_mapping_late(phys_addr_t phys, unsigned long virt,
@@ -325,7 +325,7 @@ static void create_mapping_late(phys_addr_t phys, unsigned long virt,
 	}
 
 	__create_pgd_mapping(init_mm.pgd, phys, virt, size, prot,
-			     NULL, !debug_pagealloc_enabled());
+			     NULL, debug_pagealloc_enabled());
 }
 
 static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end)
@@ -343,7 +343,7 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
 		__create_pgd_mapping(pgd, start, __phys_to_virt(start),
 				     end - start, PAGE_KERNEL,
 				     early_pgtable_alloc,
-				     !debug_pagealloc_enabled());
+				     debug_pagealloc_enabled());
 		return;
 	}
 
@@ -356,13 +356,13 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
 				     __phys_to_virt(start),
 				     kernel_start - start, PAGE_KERNEL,
 				     early_pgtable_alloc,
-				     !debug_pagealloc_enabled());
+				     debug_pagealloc_enabled());
 	if (kernel_end < end)
 		__create_pgd_mapping(pgd, kernel_end,
 				     __phys_to_virt(kernel_end),
 				     end - kernel_end, PAGE_KERNEL,
 				     early_pgtable_alloc,
-				     !debug_pagealloc_enabled());
+				     debug_pagealloc_enabled());
 
 	/*
 	 * Map the linear alias of the [_text, __init_begin) interval as
@@ -372,7 +372,7 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
 	 */
 	__create_pgd_mapping(pgd, kernel_start, __phys_to_virt(kernel_start),
 			     kernel_end - kernel_start, PAGE_KERNEL_RO,
-			     early_pgtable_alloc, !debug_pagealloc_enabled());
+			     early_pgtable_alloc, debug_pagealloc_enabled());
 }
 
 static void __init map_mem(pgd_t *pgd)
@@ -422,7 +422,7 @@ static void __init map_kernel_segment(pgd_t *pgd, void *va_start, void *va_end,
 	BUG_ON(!PAGE_ALIGNED(size));
 
 	__create_pgd_mapping(pgd, pa_start, (unsigned long)va_start, size, prot,
-			     early_pgtable_alloc, !debug_pagealloc_enabled());
+			     early_pgtable_alloc, debug_pagealloc_enabled());
 
 	vma->addr	= va_start;
 	vma->phys_addr	= pa_start;
-- 
2.7.4


* [PATCH v3 3/5] arm64: mm: set the contiguous bit for kernel mappings where appropriate
  2016-10-12 11:23 [PATCH v3 0/5] arm64/mm: use the contiguous attribute for kernel mappings Ard Biesheuvel
  2016-10-12 11:23 ` [PATCH v3 1/5] arm64: mm: BUG on unsupported manipulations of live " Ard Biesheuvel
  2016-10-12 11:23 ` [PATCH v3 2/5] arm64: mm: replace 'block_mappings_allowed' with 'page_mappings_only' Ard Biesheuvel
@ 2016-10-12 11:23 ` Ard Biesheuvel
  2016-10-13 16:28   ` Catalin Marinas
  2016-10-12 11:23 ` [PATCH v3 4/5] arm64: mm: support additional contiguous kernel mapping region sizes Ard Biesheuvel
  2016-10-12 11:23 ` [PATCH v3 5/5] arm64: mm: round memstart_addr to contiguous PUD/PMD size Ard Biesheuvel
  4 siblings, 1 reply; 18+ messages in thread
From: Ard Biesheuvel @ 2016-10-12 11:23 UTC (permalink / raw)
  To: linux-arm-kernel

Now that we no longer allow live kernel PMDs to be split, it is safe to
start using the contiguous bit for kernel mappings. So set the contiguous
bit in the kernel page mappings for regions whose size and alignment are
suitable for this.

This enables the following contiguous range sizes for the virtual mapping
of the kernel image, and for the linear mapping:

          granule size |  cont PTE  |  cont PMD  |
          -------------+------------+------------+
               4 KB    |    64 KB   |   32 MB    |
              16 KB    |     2 MB   |    1 GB*   |
              64 KB    |     2 MB   |   16 GB*   |

* only when built for 3 or more levels of translation
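
These sizes follow directly from the CONT_PTE_SHIFT/CONT_PMD_SHIFT
definitions in pgtable-hwdef.h; e.g., for the 4 KB granule:

        CONT_PTE_SIZE = (1 << CONT_PTE_SHIFT) * PAGE_SIZE = 16 * 4 KB = 64 KB
        CONT_PMD_SIZE = (1 << CONT_PMD_SHIFT) * PMD_SIZE  = 16 * 2 MB = 32 MB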

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/mm/mmu.c | 35 +++++++++++++++++---
 1 file changed, 31 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index bf1d71b62c4f..40be4979102d 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -102,8 +102,10 @@ static const pteval_t modifiable_attr_mask = PTE_PXN | PTE_RDONLY | PTE_WRITE;
 static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
 				  unsigned long end, unsigned long pfn,
 				  pgprot_t prot,
-				  phys_addr_t (*pgtable_alloc)(void))
+				  phys_addr_t (*pgtable_alloc)(void),
+				  bool page_mappings_only)
 {
+	pgprot_t __prot = prot;
 	pte_t *pte;
 
 	BUG_ON(pmd_sect(*pmd));
@@ -121,7 +123,18 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
 	do {
 		pte_t old_pte = *pte;
 
-		set_pte(pte, pfn_pte(pfn, prot));
+		/*
+		 * Set the contiguous bit for the subsequent group of PTEs if
+		 * its size and alignment are appropriate.
+		 */
+		if (((addr | PFN_PHYS(pfn)) & ~CONT_PTE_MASK) == 0) {
+			if (end - addr >= CONT_PTE_SIZE && !page_mappings_only)
+				__prot = __pgprot(pgprot_val(prot) | PTE_CONT);
+			else
+				__prot = prot;
+		}
+
+		set_pte(pte, pfn_pte(pfn, __prot));
 		pfn++;
 
 		/*
@@ -141,6 +154,7 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
 				  phys_addr_t (*pgtable_alloc)(void),
 				  bool page_mappings_only)
 {
+	pgprot_t __prot = prot;
 	pmd_t *pmd;
 	unsigned long next;
 
@@ -167,7 +181,19 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
 		/* try section mapping first */
 		if (((addr | next | phys) & ~SECTION_MASK) == 0 &&
 		      !page_mappings_only) {
-			pmd_set_huge(pmd, phys, prot);
+			/*
+			 * Set the contiguous bit for the subsequent group of
+			 * PMDs if its size and alignment are appropriate.
+			 */
+			if (((addr | phys) & ~CONT_PMD_MASK) == 0) {
+				if (end - addr >= CONT_PMD_SIZE)
+					__prot = __pgprot(pgprot_val(prot) |
+							  PTE_CONT);
+				else
+					__prot = prot;
+			}
+
+			pmd_set_huge(pmd, phys, __prot);
 
 			/*
 			 * After the PMD entry has been populated once, we
@@ -178,7 +204,8 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
 				~modifiable_attr_mask) != 0);
 		} else {
 			alloc_init_pte(pmd, addr, next, __phys_to_pfn(phys),
-				       prot, pgtable_alloc);
+				       prot, pgtable_alloc,
+				       page_mappings_only);
 
 			BUG_ON(pmd_val(old_pmd) != 0 &&
 			       pmd_val(old_pmd) != pmd_val(*pmd));
-- 
2.7.4


* [PATCH v3 4/5] arm64: mm: support additional contiguous kernel mapping region sizes
  2016-10-12 11:23 [PATCH v3 0/5] arm64/mm: use the contiguous attribute for kernel mappings Ard Biesheuvel
                   ` (2 preceding siblings ...)
  2016-10-12 11:23 ` [PATCH v3 3/5] arm64: mm: set the contiguous bit for kernel mappings where appropriate Ard Biesheuvel
@ 2016-10-12 11:23 ` Ard Biesheuvel
  2016-10-14 10:28   ` Catalin Marinas
  2016-10-12 11:23 ` [PATCH v3 5/5] arm64: mm: round memstart_addr to contiguous PUD/PMD size Ard Biesheuvel
  4 siblings, 1 reply; 18+ messages in thread
From: Ard Biesheuvel @ 2016-10-12 11:23 UTC (permalink / raw)
  To: linux-arm-kernel

Extend the basic support for kernel mappings using contiguous regions
by adding support for contiguous PUDs (4k granule only), either as a
discrete level or folded into the PGDs. In the same way, handle folded
PMDs so that contiguous PMDs (for 16k and 64k granule kernels) will
work as expected for 2 levels of translation as well.
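
For example, on a 64k granule kernel with 2 levels of translation (42-bit
VA), the numbers work out as follows (a sketch):

        PMD_SHIFT     = 29              /* PMD_SIZE == 512 MB              */
        PGDIR_SHIFT   = 29              /* == PMD_SHIFT, i.e., folded PMDs */
        CONT_PMD_SIZE = 32 * 512 MB     /* 16 GB, only observable at the   */
                                        /* __create_pgd_mapping() level    */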

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/pgtable-hwdef.h |  6 +++
 arch/arm64/mm/mmu.c                    | 40 +++++++++++++++++++-
 2 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index eb0c2bd90de9..4192072af932 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -93,12 +93,15 @@
 #ifdef CONFIG_ARM64_64K_PAGES
 #define CONT_PTE_SHIFT		5
 #define CONT_PMD_SHIFT		5
+#define CONT_PUD_SHIFT		0
 #elif defined(CONFIG_ARM64_16K_PAGES)
 #define CONT_PTE_SHIFT		7
 #define CONT_PMD_SHIFT		5
+#define CONT_PUD_SHIFT		0
 #else
 #define CONT_PTE_SHIFT		4
 #define CONT_PMD_SHIFT		4
+#define CONT_PUD_SHIFT		4
 #endif
 
 #define CONT_PTES		(1 << CONT_PTE_SHIFT)
@@ -107,6 +110,9 @@
 #define CONT_PMDS		(1 << CONT_PMD_SHIFT)
 #define CONT_PMD_SIZE		(CONT_PMDS * PMD_SIZE)
 #define CONT_PMD_MASK		(~(CONT_PMD_SIZE - 1))
+#define CONT_PUDS		(1 << CONT_PUD_SHIFT)
+#define CONT_PUD_SIZE		(CONT_PUDS * PUD_SIZE)
+#define CONT_PUD_MASK		(~(CONT_PUD_SIZE - 1))
 /* the numerical offset of the PTE within a range of CONT_PTES */
 #define CONT_RANGE_OFFSET(addr) (((addr)>>PAGE_SHIFT)&(CONT_PTES-1))
 
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 40be4979102d..0e0eca45b54a 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -233,6 +233,7 @@ static void alloc_init_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
 				  phys_addr_t (*pgtable_alloc)(void),
 				  bool page_mappings_only)
 {
+	pgprot_t __prot = prot;
 	pud_t *pud;
 	unsigned long next;
 
@@ -254,7 +255,19 @@ static void alloc_init_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
 		 * For 4K granule only, attempt to put down a 1GB block
 		 */
 		if (use_1G_block(addr, next, phys) && !page_mappings_only) {
-			pud_set_huge(pud, phys, prot);
+			/*
+			 * Set the contiguous bit for the subsequent group of
+			 * PUDs if its size and alignment are appropriate.
+			 */
+			if (((addr | phys) & ~CONT_PUD_MASK) == 0) {
+				if (end - addr >= CONT_PUD_SIZE)
+					__prot = __pgprot(pgprot_val(prot) |
+							  PTE_CONT);
+				else
+					__prot = prot;
+			}
+
+			pud_set_huge(pud, phys, __prot);
 
 			/*
 			 * After the PUD entry has been populated once, we
@@ -284,6 +297,7 @@ static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
 {
 	unsigned long addr, length, end, next;
 	pgd_t *pgd = pgd_offset_raw(pgdir, virt);
+	pgprot_t __prot = prot;
 
 	/*
 	 * If the virtual and physical address don't have the same offset
@@ -299,7 +313,29 @@ static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
 	end = addr + length;
 	do {
 		next = pgd_addr_end(addr, end);
-		alloc_init_pud(pgd, addr, next, phys, prot, pgtable_alloc,
+
+		/*
+		 * If any intermediate levels are folded into the PGDs, we
+		 * need to deal with the contiguous attributes here, since
+		 * the contiguity can only be observed at this level.
+		 */
+		if (PGDIR_SHIFT == PMD_SHIFT && !page_mappings_only &&
+		    ((addr | phys) & ~CONT_PMD_MASK) == 0) {
+			if (end - addr >= CONT_PMD_SIZE)
+				__prot = __pgprot(pgprot_val(prot) |
+						  PTE_CONT);
+			else
+				__prot = prot;
+		} else if (PGDIR_SHIFT == PUD_SHIFT && CONT_PUD_SHIFT > 0 &&
+			   !page_mappings_only &&
+			   ((addr | phys) & ~CONT_PUD_MASK) == 0) {
+			if (end - addr >= CONT_PUD_SIZE)
+				__prot = __pgprot(pgprot_val(prot) |
+						  PTE_CONT);
+			else
+				__prot = prot;
+		}
+		alloc_init_pud(pgd, addr, next, phys, __prot, pgtable_alloc,
 			       page_mappings_only);
 		phys += next - addr;
 	} while (pgd++, addr = next, addr != end);
-- 
2.7.4


* [PATCH v3 5/5] arm64: mm: round memstart_addr to contiguous PUD/PMD size
  2016-10-12 11:23 [PATCH v3 0/5] arm64/mm: use the contiguous attribute for kernel mappings Ard Biesheuvel
                   ` (3 preceding siblings ...)
  2016-10-12 11:23 ` [PATCH v3 4/5] arm64: mm: support additional contiguous kernel mapping region sizes Ard Biesheuvel
@ 2016-10-12 11:23 ` Ard Biesheuvel
  4 siblings, 0 replies; 18+ messages in thread
From: Ard Biesheuvel @ 2016-10-12 11:23 UTC (permalink / raw)
  To: linux-arm-kernel

Given that contiguous PUDs/PMDs allow the linear region to be mapped more
efficiently, ensure that the linear virtual and physical addresses of
system RAM are equal modulo the contiguous PUD size (or the contiguous PMD
size, for 16k/64k granules).

Note that this reduces the KASLR randomization of the linear range by
CONT_PUD_SHIFT (or CONT_PMD_SHIFT) bits.
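
For reference, the shift is consumed as an alignment when memstart_addr is
assigned (a sketch based on the existing code in arm64_memblock_init(),
shown here for context only):

        #define ARM64_MEMSTART_ALIGN    (1UL << ARM64_MEMSTART_SHIFT)

        memstart_addr = round_down(memblock_start_of_DRAM(),
                                   ARM64_MEMSTART_ALIGN);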

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/kernel-pgtable.h | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
index 7e51d1b57c0c..71dbd9e1e809 100644
--- a/arch/arm64/include/asm/kernel-pgtable.h
+++ b/arch/arm64/include/asm/kernel-pgtable.h
@@ -83,16 +83,13 @@
 /*
  * To make optimal use of block mappings when laying out the linear
  * mapping, round down the base of physical memory to a size that can
- * be mapped efficiently, i.e., either PUD_SIZE (4k granule) or PMD_SIZE
- * (64k granule), or a multiple that can be mapped using contiguous bits
- * in the page tables: 32 * PMD_SIZE (16k granule)
+ * be mapped efficiently, i.e., either CONT_PUD_SIZE (4k granule) or
+ * CONT_PMD_SIZE (16k/64k granules)
  */
 #if defined(CONFIG_ARM64_4K_PAGES)
-#define ARM64_MEMSTART_SHIFT		PUD_SHIFT
-#elif defined(CONFIG_ARM64_16K_PAGES)
-#define ARM64_MEMSTART_SHIFT		(PMD_SHIFT + 5)
+#define ARM64_MEMSTART_SHIFT		(PUD_SHIFT + CONT_PUD_SHIFT)
 #else
-#define ARM64_MEMSTART_SHIFT		PMD_SHIFT
+#define ARM64_MEMSTART_SHIFT		(PMD_SHIFT + CONT_PMD_SHIFT)
 #endif
 
 /*
-- 
2.7.4


* [PATCH v3 1/5] arm64: mm: BUG on unsupported manipulations of live kernel mappings
  2016-10-12 11:23 ` [PATCH v3 1/5] arm64: mm: BUG on unsupported manipulations of live " Ard Biesheuvel
@ 2016-10-12 15:04   ` Catalin Marinas
  2016-10-13 12:25     ` Ard Biesheuvel
  0 siblings, 1 reply; 18+ messages in thread
From: Catalin Marinas @ 2016-10-12 15:04 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Oct 12, 2016 at 12:23:41PM +0100, Ard Biesheuvel wrote:
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -28,8 +28,6 @@
>  #include <linux/memblock.h>
>  #include <linux/fs.h>
>  #include <linux/io.h>
> -#include <linux/slab.h>
> -#include <linux/stop_machine.h>
>  
>  #include <asm/barrier.h>
>  #include <asm/cputype.h>
> @@ -95,6 +93,12 @@ static phys_addr_t __init early_pgtable_alloc(void)
>  	return phys;
>  }
>  
> +/*
> + * The following mapping attributes may be updated in live
> + * kernel mappings without the need for break-before-make.
> + */
> +static const pteval_t modifiable_attr_mask = PTE_PXN | PTE_RDONLY | PTE_WRITE;
> +
>  static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
>  				  unsigned long end, unsigned long pfn,
>  				  pgprot_t prot,
> @@ -115,8 +119,18 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
>  
>  	pte = pte_set_fixmap_offset(pmd, addr);
>  	do {
> +		pte_t old_pte = *pte;
> +
>  		set_pte(pte, pfn_pte(pfn, prot));
>  		pfn++;
> +
> +		/*
> +		 * After the PTE entry has been populated once, we
> +		 * only allow updates to the permission attributes.
> +		 */
> +		BUG_ON(pte_val(old_pte) != 0 &&
> +		       ((pte_val(old_pte) ^ pte_val(*pte)) &
> +			~modifiable_attr_mask) != 0);

Please turn this check into a single macro. You have it in three places
already (though with different types, a macro would still do). Something
like below (feel free to come up with a better macro name):

		BUG_ON(!safe_pgattr_change(old_pte, *pte));

-- 
Catalin


* [PATCH v3 2/5] arm64: mm: replace 'block_mappings_allowed' with 'page_mappings_only'
  2016-10-12 11:23 ` [PATCH v3 2/5] arm64: mm: replace 'block_mappings_allowed' with 'page_mappings_only' Ard Biesheuvel
@ 2016-10-12 15:07   ` Mark Rutland
  0 siblings, 0 replies; 18+ messages in thread
From: Mark Rutland @ 2016-10-12 15:07 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Oct 12, 2016 at 12:23:42PM +0100, Ard Biesheuvel wrote:
> In preparation for adding support for contiguous PTE and PMD mappings,
> let's replace 'block_mappings_allowed' with 'page_mappings_only', which
> will be a more accurate description of the nature of the setting once we
> add such contiguous mappings into the mix.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

Regardless of the contiguous bit stuff, I think this makes the code 
clearer. As far as I can tell, this is correct. So FWIW:

Reviewed-by: Mark Rutland <mark.rutland@arm.com>

Thanks,
Mark.

> ---
>  arch/arm64/include/asm/mmu.h |  2 +-
>  arch/arm64/kernel/efi.c      |  8 ++---
>  arch/arm64/mm/mmu.c          | 32 ++++++++++----------
>  3 files changed, 21 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
> index 8d9fce037b2f..a81454ad5455 100644
> --- a/arch/arm64/include/asm/mmu.h
> +++ b/arch/arm64/include/asm/mmu.h
> @@ -34,7 +34,7 @@ extern void __iomem *early_io_map(phys_addr_t phys, unsigned long virt);
>  extern void init_mem_pgprot(void);
>  extern void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
>  			       unsigned long virt, phys_addr_t size,
> -			       pgprot_t prot, bool allow_block_mappings);
> +			       pgprot_t prot, bool page_mappings_only);
>  extern void *fixmap_remap_fdt(phys_addr_t dt_phys);
>  
>  #endif
> diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
> index ba9bee389fd5..5d17f377d905 100644
> --- a/arch/arm64/kernel/efi.c
> +++ b/arch/arm64/kernel/efi.c
> @@ -62,8 +62,8 @@ struct screen_info screen_info __section(.data);
>  int __init efi_create_mapping(struct mm_struct *mm, efi_memory_desc_t *md)
>  {
>  	pteval_t prot_val = create_mapping_protection(md);
> -	bool allow_block_mappings = (md->type != EFI_RUNTIME_SERVICES_CODE &&
> -				     md->type != EFI_RUNTIME_SERVICES_DATA);
> +	bool page_mappings_only = (md->type == EFI_RUNTIME_SERVICES_CODE ||
> +				   md->type == EFI_RUNTIME_SERVICES_DATA);
>  
>  	if (!PAGE_ALIGNED(md->phys_addr) ||
>  	    !PAGE_ALIGNED(md->num_pages << EFI_PAGE_SHIFT)) {
> @@ -76,12 +76,12 @@ int __init efi_create_mapping(struct mm_struct *mm, efi_memory_desc_t *md)
>  		 * from the MMU routines. So avoid block mappings altogether in
>  		 * that case.
>  		 */
> -		allow_block_mappings = false;
> +		page_mappings_only = true;
>  	}
>  
>  	create_pgd_mapping(mm, md->phys_addr, md->virt_addr,
>  			   md->num_pages << EFI_PAGE_SHIFT,
> -			   __pgprot(prot_val | PTE_NG), allow_block_mappings);
> +			   __pgprot(prot_val | PTE_NG), page_mappings_only);
>  	return 0;
>  }
>  
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index e1c34e5a1d7d..bf1d71b62c4f 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -139,7 +139,7 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
>  static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
>  				  phys_addr_t phys, pgprot_t prot,
>  				  phys_addr_t (*pgtable_alloc)(void),
> -				  bool allow_block_mappings)
> +				  bool page_mappings_only)
>  {
>  	pmd_t *pmd;
>  	unsigned long next;
> @@ -166,7 +166,7 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
>  
>  		/* try section mapping first */
>  		if (((addr | next | phys) & ~SECTION_MASK) == 0 &&
> -		      allow_block_mappings) {
> +		      !page_mappings_only) {
>  			pmd_set_huge(pmd, phys, prot);
>  
>  			/*
> @@ -204,7 +204,7 @@ static inline bool use_1G_block(unsigned long addr, unsigned long next,
>  static void alloc_init_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
>  				  phys_addr_t phys, pgprot_t prot,
>  				  phys_addr_t (*pgtable_alloc)(void),
> -				  bool allow_block_mappings)
> +				  bool page_mappings_only)
>  {
>  	pud_t *pud;
>  	unsigned long next;
> @@ -226,7 +226,7 @@ static void alloc_init_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
>  		/*
>  		 * For 4K granule only, attempt to put down a 1GB block
>  		 */
> -		if (use_1G_block(addr, next, phys) && allow_block_mappings) {
> +		if (use_1G_block(addr, next, phys) && !page_mappings_only) {
>  			pud_set_huge(pud, phys, prot);
>  
>  			/*
> @@ -238,7 +238,7 @@ static void alloc_init_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
>  				~modifiable_attr_mask) != 0);
>  		} else {
>  			alloc_init_pmd(pud, addr, next, phys, prot,
> -				       pgtable_alloc, allow_block_mappings);
> +				       pgtable_alloc, page_mappings_only);
>  
>  			BUG_ON(pud_val(old_pud) != 0 &&
>  			       pud_val(old_pud) != pud_val(*pud));
> @@ -253,7 +253,7 @@ static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
>  				 unsigned long virt, phys_addr_t size,
>  				 pgprot_t prot,
>  				 phys_addr_t (*pgtable_alloc)(void),
> -				 bool allow_block_mappings)
> +				 bool page_mappings_only)
>  {
>  	unsigned long addr, length, end, next;
>  	pgd_t *pgd = pgd_offset_raw(pgdir, virt);
> @@ -273,7 +273,7 @@ static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
>  	do {
>  		next = pgd_addr_end(addr, end);
>  		alloc_init_pud(pgd, addr, next, phys, prot, pgtable_alloc,
> -			       allow_block_mappings);
> +			       page_mappings_only);
>  		phys += next - addr;
>  	} while (pgd++, addr = next, addr != end);
>  }
> @@ -302,17 +302,17 @@ static void __init create_mapping_noalloc(phys_addr_t phys, unsigned long virt,
>  			&phys, virt);
>  		return;
>  	}
> -	__create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL, true);
> +	__create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL, false);
>  }
>  
>  void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
>  			       unsigned long virt, phys_addr_t size,
> -			       pgprot_t prot, bool allow_block_mappings)
> +			       pgprot_t prot, bool page_mappings_only)
>  {
>  	BUG_ON(mm == &init_mm);
>  
>  	__create_pgd_mapping(mm->pgd, phys, virt, size, prot,
> -			     pgd_pgtable_alloc, allow_block_mappings);
> +			     pgd_pgtable_alloc, page_mappings_only);
>  }
>  
>  static void create_mapping_late(phys_addr_t phys, unsigned long virt,
> @@ -325,7 +325,7 @@ static void create_mapping_late(phys_addr_t phys, unsigned long virt,
>  	}
>  
>  	__create_pgd_mapping(init_mm.pgd, phys, virt, size, prot,
> -			     NULL, !debug_pagealloc_enabled());
> +			     NULL, debug_pagealloc_enabled());
>  }
>  
>  static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end)
> @@ -343,7 +343,7 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
>  		__create_pgd_mapping(pgd, start, __phys_to_virt(start),
>  				     end - start, PAGE_KERNEL,
>  				     early_pgtable_alloc,
> -				     !debug_pagealloc_enabled());
> +				     debug_pagealloc_enabled());
>  		return;
>  	}
>  
> @@ -356,13 +356,13 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
>  				     __phys_to_virt(start),
>  				     kernel_start - start, PAGE_KERNEL,
>  				     early_pgtable_alloc,
> -				     !debug_pagealloc_enabled());
> +				     debug_pagealloc_enabled());
>  	if (kernel_end < end)
>  		__create_pgd_mapping(pgd, kernel_end,
>  				     __phys_to_virt(kernel_end),
>  				     end - kernel_end, PAGE_KERNEL,
>  				     early_pgtable_alloc,
> -				     !debug_pagealloc_enabled());
> +				     debug_pagealloc_enabled());
>  
>  	/*
>  	 * Map the linear alias of the [_text, __init_begin) interval as
> @@ -372,7 +372,7 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
>  	 */
>  	__create_pgd_mapping(pgd, kernel_start, __phys_to_virt(kernel_start),
>  			     kernel_end - kernel_start, PAGE_KERNEL_RO,
> -			     early_pgtable_alloc, !debug_pagealloc_enabled());
> +			     early_pgtable_alloc, debug_pagealloc_enabled());
>  }
>  
>  static void __init map_mem(pgd_t *pgd)
> @@ -422,7 +422,7 @@ static void __init map_kernel_segment(pgd_t *pgd, void *va_start, void *va_end,
>  	BUG_ON(!PAGE_ALIGNED(size));
>  
>  	__create_pgd_mapping(pgd, pa_start, (unsigned long)va_start, size, prot,
> -			     early_pgtable_alloc, !debug_pagealloc_enabled());
> +			     early_pgtable_alloc, debug_pagealloc_enabled());
>  
>  	vma->addr	= va_start;
>  	vma->phys_addr	= pa_start;
> -- 
> 2.7.4
> 


* [PATCH v3 1/5] arm64: mm: BUG on unsupported manipulations of live kernel mappings
  2016-10-12 15:04   ` Catalin Marinas
@ 2016-10-13 12:25     ` Ard Biesheuvel
  2016-10-13 14:44       ` Catalin Marinas
  0 siblings, 1 reply; 18+ messages in thread
From: Ard Biesheuvel @ 2016-10-13 12:25 UTC (permalink / raw)
  To: linux-arm-kernel

On 12 October 2016 at 16:04, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Wed, Oct 12, 2016 at 12:23:41PM +0100, Ard Biesheuvel wrote:
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -28,8 +28,6 @@
>>  #include <linux/memblock.h>
>>  #include <linux/fs.h>
>>  #include <linux/io.h>
>> -#include <linux/slab.h>
>> -#include <linux/stop_machine.h>
>>
>>  #include <asm/barrier.h>
>>  #include <asm/cputype.h>
>> @@ -95,6 +93,12 @@ static phys_addr_t __init early_pgtable_alloc(void)
>>       return phys;
>>  }
>>
>> +/*
>> + * The following mapping attributes may be updated in live
>> + * kernel mappings without the need for break-before-make.
>> + */
>> +static const pteval_t modifiable_attr_mask = PTE_PXN | PTE_RDONLY | PTE_WRITE;
>> +
>>  static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
>>                                 unsigned long end, unsigned long pfn,
>>                                 pgprot_t prot,
>> @@ -115,8 +119,18 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
>>
>>       pte = pte_set_fixmap_offset(pmd, addr);
>>       do {
>> +             pte_t old_pte = *pte;
>> +
>>               set_pte(pte, pfn_pte(pfn, prot));
>>               pfn++;
>> +
>> +             /*
>> +              * After the PTE entry has been populated once, we
>> +              * only allow updates to the permission attributes.
>> +              */
>> +             BUG_ON(pte_val(old_pte) != 0 &&
>> +                    ((pte_val(old_pte) ^ pte_val(*pte)) &
>> +                     ~modifiable_attr_mask) != 0);
>
> Please turn this check into a single macro. You have it in three places
> already (though with different types, a macro would still do). Something
> like below (feel free to come up with a better macro name):
>
>                 BUG_ON(!safe_pgattr_change(old_pte, *pte));
>

Something like below? With that, I can also fold the PMD and PUD
versions of the BUG() together.

"""
/*
 * Returns whether updating a live page table entry is safe:
 * - if the old and new values are identical,
 * - if an invalid mapping is turned into a valid one (or vice versa),
 * - if the entry is a block or page mapping, and the old and new values
 *   only differ in the PXN/RDONLY/WRITE bits.
 *
 * NOTE: 'safe' does not imply that no TLB maintenance is required, it only
 *       means that no TLB conflicts should occur as a result of the update.
 */
#define __set_pgattr_is_safe(type, old, new, blocktype) \
(type ## _val(old) == type ## _val(new) || \
((type ## _val(old) ^ type ## _val(new)) & PTE_VALID) != 0 || \
(((type ## _val(old) & PTE_TYPE_MASK) == blocktype) && \
 (((type ## _val(old) ^ type ## _val(new)) & \
   ~(PTE_PXN | PTE_RDONLY | PTE_WRITE)) == 0)))

static inline bool set_live_pte_is_safe(pte_t old, pte_t new)
{
return __set_pgattr_is_safe(pte, old, new, PTE_TYPE_PAGE);
}

static inline bool set_live_pmd_is_safe(pmd_t old, pmd_t new)
{
return __set_pgattr_is_safe(pmd, old, new, PMD_TYPE_SECT);
}

static inline bool set_live_pud_is_safe(pud_t old, pud_t new)
{
return __set_pgattr_is_safe(pud, old, new, PUD_TYPE_SECT);
}
"""


* [PATCH v3 1/5] arm64: mm: BUG on unsupported manipulations of live kernel mappings
  2016-10-13 12:25     ` Ard Biesheuvel
@ 2016-10-13 14:44       ` Catalin Marinas
  2016-10-13 14:48         ` Ard Biesheuvel
  0 siblings, 1 reply; 18+ messages in thread
From: Catalin Marinas @ 2016-10-13 14:44 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Oct 13, 2016 at 01:25:53PM +0100, Ard Biesheuvel wrote:
> On 12 October 2016 at 16:04, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Wed, Oct 12, 2016 at 12:23:41PM +0100, Ard Biesheuvel wrote:
> >> +/*
> >> + * The following mapping attributes may be updated in live
> >> + * kernel mappings without the need for break-before-make.
> >> + */
> >> +static const pteval_t modifiable_attr_mask = PTE_PXN | PTE_RDONLY | PTE_WRITE;
> >> +
> >>  static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
> >>                                 unsigned long end, unsigned long pfn,
> >>                                 pgprot_t prot,
> >> @@ -115,8 +119,18 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
> >>
> >>       pte = pte_set_fixmap_offset(pmd, addr);
> >>       do {
> >> +             pte_t old_pte = *pte;
> >> +
> >>               set_pte(pte, pfn_pte(pfn, prot));
> >>               pfn++;
> >> +
> >> +             /*
> >> +              * After the PTE entry has been populated once, we
> >> +              * only allow updates to the permission attributes.
> >> +              */
> >> +             BUG_ON(pte_val(old_pte) != 0 &&
> >> +                    ((pte_val(old_pte) ^ pte_val(*pte)) &
> >> +                     ~modifiable_attr_mask) != 0);
> >
> > Please turn this check into a single macro. You have it in three places
> > already (though with different types, a macro would still do). Something
> > like below (feel free to come up with a better macro name):
> >
> >                 BUG_ON(!safe_pgattr_change(old_pte, *pte));
> 
> Something like below? With that, I can also fold the PMD and PUD
> versions of the BUG() together.

(fixing up alignment to make it readable)

> """
> /*
>  * Returns whether updating a live page table entry is safe:
>  * - if the old and new values are identical,
>  * - if an invalid mapping is turned into a valid one (or vice versa),
>  * - if the entry is a block or page mapping, and the old and new values
>  *   only differ in the PXN/RDONLY/WRITE bits.
>  *
>  * NOTE: 'safe' does not imply that no TLB maintenance is required, it only
>  *       means that no TLB conflicts should occur as a result of the update.
>  */
> #define __set_pgattr_is_safe(type, old, new, blocktype) \
>	(type ## _val(old) == type ## _val(new) || \
>	 ((type ## _val(old) ^ type ## _val(new)) & PTE_VALID) != 0 || \
>	 (((type ## _val(old) & PTE_TYPE_MASK) == blocktype) && \
>	  (((type ## _val(old) ^ type ## _val(new)) & \
>	 ~(PTE_PXN | PTE_RDONLY | PTE_WRITE)) == 0)))
> 
> static inline bool set_live_pte_is_safe(pte_t old, pte_t new)
> {
>	return __set_pgattr_is_safe(pte, old, new, PTE_TYPE_PAGE);
> }
> 
> static inline bool set_live_pmd_is_safe(pmd_t old, pmd_t new)
> {
>	return __set_pgattr_is_safe(pmd, old, new, PMD_TYPE_SECT);
> }
> 
> static inline bool set_live_pud_is_safe(pud_t old, pud_t new)
> {
>	return __set_pgattr_is_safe(pud, old, new, PUD_TYPE_SECT);
> }

The set_ prefix is slightly confusing as it suggests (to me) having a
side effect. Maybe pgattr_set_is_safe()?

But it looks like we make it more complicated than needed by using pte_t
instead of pteval_t as argument. How about just using the pteval_t as
argument (and it's fine to call it with pmdval_t, pudval_t as well):

#define pgattr_set_is_safe(oldval, newval) \
	...

-- 
Catalin


* [PATCH v3 1/5] arm64: mm: BUG on unsupported manipulations of live kernel mappings
  2016-10-13 14:44       ` Catalin Marinas
@ 2016-10-13 14:48         ` Ard Biesheuvel
  2016-10-13 16:51           ` Catalin Marinas
  0 siblings, 1 reply; 18+ messages in thread
From: Ard Biesheuvel @ 2016-10-13 14:48 UTC (permalink / raw)
  To: linux-arm-kernel

On 13 October 2016 at 15:44, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Thu, Oct 13, 2016 at 01:25:53PM +0100, Ard Biesheuvel wrote:
>
> (fixing up alignment to make it readable)
>

I apologise on Gmail's behalf

>> """
>> /*
>>  * Returns whether updating a live page table entry is safe:
>>  * - if the old and new values are identical,
>>  * - if an invalid mapping is turned into a valid one (or vice versa),
>>  * - if the entry is a block or page mapping, and the old and new values
>>  *   only differ in the PXN/RDONLY/WRITE bits.
>>  *
>>  * NOTE: 'safe' does not imply that no TLB maintenance is required, it only
>>  *       means that no TLB conflicts should occur as a result of the update.
>>  */
>> #define __set_pgattr_is_safe(type, old, new, blocktype) \
>>       (type ## _val(old) == type ## _val(new) || \
>>        ((type ## _val(old) ^ type ## _val(new)) & PTE_VALID) != 0 || \
>>        (((type ## _val(old) & PTE_TYPE_MASK) == blocktype) && \
>>         (((type ## _val(old) ^ type ## _val(new)) & \
>>        ~(PTE_PXN | PTE_RDONLY | PTE_WRITE)) == 0)))
>>
>> static inline bool set_live_pte_is_safe(pte_t old, pte_t new)
>> {
>>       return __set_pgattr_is_safe(pte, old, new, PTE_TYPE_PAGE);
>> }
>>
>> static inline bool set_live_pmd_is_safe(pmd_t old, pmd_t new)
>> {
>>       return __set_pgattr_is_safe(pmd, old, new, PMD_TYPE_SECT);
>> }
>>
>> static inline bool set_live_pud_is_safe(pud_t old, pud_t new)
>> {
>>       return __set_pgattr_is_safe(pud, old, new, PUD_TYPE_SECT);
>> }
>
> The set_ prefix is slightly confusing as it suggests (to me) having a
> side effect. Maybe pgattr_set_is_safe()?
>
> > But it looks like we make it more complicated than needed by using pte_t
> instead of pteval_t as argument. How about just using the pteval_t as
> argument (and it's fine to call it with pmdval_t, pudval_t as well):
>
> #define pgattr_set_is_safe(oldval, newval) \
>         ...
>

Well, the only problem there is that the permission bit check should
only apply to level 3 page mappings (bit[1] == 1) and level 1/2 block
mappings (bit[1] == 0), so we would still need two versions
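
(the relevant encodings from pgtable-hwdef.h being

        PMD_TYPE_SECT : bits[1:0] == 0b01  /* level 1/2 block */
        PMD_TYPE_TABLE: bits[1:0] == 0b11  /* level 1/2 table */
        PTE_TYPE_PAGE : bits[1:0] == 0b11  /* level 3 page    */

so a bare value cannot distinguish a table entry from a page entry without
knowing the level)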


* [PATCH v3 3/5] arm64: mm: set the contiguous bit for kernel mappings where appropriate
  2016-10-12 11:23 ` [PATCH v3 3/5] arm64: mm: set the contiguous bit for kernel mappings where appropriate Ard Biesheuvel
@ 2016-10-13 16:28   ` Catalin Marinas
  2016-10-13 16:57     ` Ard Biesheuvel
  0 siblings, 1 reply; 18+ messages in thread
From: Catalin Marinas @ 2016-10-13 16:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Oct 12, 2016 at 12:23:43PM +0100, Ard Biesheuvel wrote:
> Now that we no longer allow live kernel PMDs to be split, it is safe to
> start using the contiguous bit for kernel mappings. So set the contiguous
> bit in the kernel page mappings for regions whose size and alignment are
> suitable for this.
> 
> This enables the following contiguous range sizes for the virtual mapping
> of the kernel image, and for the linear mapping:
> 
>           granule size |  cont PTE  |  cont PMD  |
>           -------------+------------+------------+
>                4 KB    |    64 KB   |   32 MB    |
>               16 KB    |     2 MB   |    1 GB*   |
>               64 KB    |     2 MB   |   16 GB*   |
> 
> * only when built for 3 or more levels of translation

I assume the limitation to have contiguous PMD only with 3 or more
levels is because of the way p*d folding was implemented in the kernel.
With nopmd, looping over pmds is done in __create_pgd_mapping() rather
than alloc_init_pmd().

A potential solution would be to replicate the contiguous pmd code to
the pud and pgd level, though we probably won't benefit from any
contiguous entries at higher level (when more than 2 levels).
Alternatively, with an #ifdef __PGTABLE_PMD_FOLDED, we could set the
PMD_CONT in prot in __create_pgd_mapping() directly (given the right
addr/phys alignment).

Anyway, it's probably not worth the effort given that 42-bit VA with 64K
pages is becoming a less likely configuration (36-bit VA with 16K pages
is even less likely, also depending on EXPERT).

-- 
Catalin


* [PATCH v3 1/5] arm64: mm: BUG on unsupported manipulations of live kernel mappings
  2016-10-13 14:48         ` Ard Biesheuvel
@ 2016-10-13 16:51           ` Catalin Marinas
  2016-10-13 16:58             ` Ard Biesheuvel
  0 siblings, 1 reply; 18+ messages in thread
From: Catalin Marinas @ 2016-10-13 16:51 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Oct 13, 2016 at 03:48:04PM +0100, Ard Biesheuvel wrote:
> On 13 October 2016 at 15:44, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Thu, Oct 13, 2016 at 01:25:53PM +0100, Ard Biesheuvel wrote:
> >
> > (fixing up alignment to make it readable)
> >
> 
> I apologise on Gmail's behalf
> 
> >> """
> >> /*
> >>  * Returns whether updating a live page table entry is safe:
> >>  * - if the old and new values are identical,
> >>  * - if an invalid mapping is turned into a valid one (or vice versa),
> >>  * - if the entry is a block or page mapping, and the old and new values
> >>  *   only differ in the PXN/RDONLY/WRITE bits.
> >>  *
> >>  * NOTE: 'safe' does not imply that no TLB maintenance is required, it only
> >>  *       means that no TLB conflicts should occur as a result of the update.
> >>  */
> >> #define __set_pgattr_is_safe(type, old, new, blocktype) \
> >>       (type ## _val(old) == type ## _val(new) || \
> >>        ((type ## _val(old) ^ type ## _val(new)) & PTE_VALID) != 0 || \
> >>        (((type ## _val(old) & PTE_TYPE_MASK) == blocktype) && \
> >>         (((type ## _val(old) ^ type ## _val(new)) & \
> >>        ~(PTE_PXN | PTE_RDONLY | PTE_WRITE)) == 0)))
> >>
> >> static inline bool set_live_pte_is_safe(pte_t old, pte_t new)
> >> {
> >>       return __set_pgattr_is_safe(pte, old, new, PTE_TYPE_PAGE);
> >> }
> >>
> >> static inline bool set_live_pmd_is_safe(pmd_t old, pmd_t new)
> >> {
> >>       return __set_pgattr_is_safe(pmd, old, new, PMD_TYPE_SECT);
> >> }
> >>
> >> static inline bool set_live_pud_is_safe(pud_t old, pud_t new)
> >> {
> >>       return __set_pgattr_is_safe(pud, old, new, PUD_TYPE_SECT);
> >> }
> >
> > The set_ prefix is slightly confusing as it suggests (to me) having a
> > side effect. Maybe pgattr_set_is_safe()?
> >
> > But it looks like we make it more complicated than needed by using pte_t
> > instead of pteval_t as argument. How about just using the pteval_t as
> > argument (and it's fine to call it with pmdval_t, pudval_t as well):
> >
> > #define pgattr_set_is_safe(oldval, newval) \
> >         ...
> 
> Well, the only problem there is that the permission bit check should
> only apply to level 3 page mappings (bit[1] == 1) and level 1/2 block
> mappings (bit[1] == 0), so we would still need two versions

I was suggesting that you only replace the "... & ~modifiable_attr_mask"
check in your patch to avoid writing it three times. The macro that you
proposed above does more, but it is also less readable.
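
i.e., something along the lines of (a sketch)

        #define pgattr_change_is_safe(old, new) \
                ((((old) ^ (new)) & ~modifiable_attr_mask) == 0)

with the old_p*d != 0 and type checks staying at the call sites.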

-- 
Catalin


* [PATCH v3 3/5] arm64: mm: set the contiguous bit for kernel mappings where appropriate
  2016-10-13 16:28   ` Catalin Marinas
@ 2016-10-13 16:57     ` Ard Biesheuvel
  2016-10-13 17:27       ` Catalin Marinas
  0 siblings, 1 reply; 18+ messages in thread
From: Ard Biesheuvel @ 2016-10-13 16:57 UTC (permalink / raw)
  To: linux-arm-kernel

On 13 October 2016 at 17:28, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Wed, Oct 12, 2016 at 12:23:43PM +0100, Ard Biesheuvel wrote:
>> Now that we no longer allow live kernel PMDs to be split, it is safe to
>> start using the contiguous bit for kernel mappings. So set the contiguous
>> bit in the kernel page mappings for regions whose size and alignment are
>> suitable for this.
>>
>> This enables the following contiguous range sizes for the virtual mapping
>> of the kernel image, and for the linear mapping:
>>
>>           granule size |  cont PTE  |  cont PMD  |
>>           -------------+------------+------------+
>>                4 KB    |    64 KB   |   32 MB    |
>>               16 KB    |     2 MB   |    1 GB*   |
>>               64 KB    |     2 MB   |   16 GB*   |
>>
>> * only when built for 3 or more levels of translation
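
For reference, these sizes follow directly from the number of entries the
architecture groups under a single contiguous hint; a worked sketch using
the ARM ARM entry counts (not taken from the patch):

	 4K granule:  16 PTEs *  4KB = 64KB    16 PMDs *   2MB = 32MB
	16K granule: 128 PTEs * 16KB =  2MB    32 PMDs *  32MB =  1GB
	64K granule:  32 PTEs * 64KB =  2MB    32 PMDs * 512MB = 16GB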
>
> > I assume the limitation to have contiguous PMDs only with 3 or more
> levels is because of the way p*d folding was implemented in the kernel.
> With nopmd, looping over pmds is done in __create_pgd_mapping() rather
> than alloc_init_pmd().
>
> A potential solution would be to replicate the contiguous pmd code to
> the pud and pgd level, though we probably won't benefit from any
> > contiguous entries at higher levels (when using more than 2 levels).
> Alternatively, with an #ifdef __PGTABLE_PMD_FOLDED, we could set the
> PMD_CONT in prot in __create_pgd_mapping() directly (if the right
> addr/phys alignment).
>

Indeed. See the next patch :-)

> Anyway, it's probably not worth the effort given that 42-bit VA with 64K
> pages is becoming a less likely configuration (36-bit VA with 16K pages
> is even less likely, also depending on EXPERT).
>

This is the reason I put it in a separate patch: this one contains the
most useful combinations, and the next patch adds the missing ones,
but clutters up the code significantly. I'm perfectly happy to drop 4
and 5 if you don't think it is worth the trouble.
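
As a rough sketch of the #ifdef alternative Catalin floats above — setting
the contiguous bit in the prot directly from __create_pgd_mapping() when
the pmd is folded; untested, and CONT_PMD_MASK here is an assumed name
rather than code from the series:

	#ifdef __PGTABLE_PMD_FOLDED
		/* with folded pmds, the pgd loop already walks pmd-sized chunks */
		if (((addr | next | phys) & ~CONT_PMD_MASK) == 0)
			prot = __pgprot(pgprot_val(prot) | PTE_CONT);
	#endif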

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v3 1/5] arm64: mm: BUG on unsupported manipulations of live kernel mappings
  2016-10-13 16:51           ` Catalin Marinas
@ 2016-10-13 16:58             ` Ard Biesheuvel
  0 siblings, 0 replies; 18+ messages in thread
From: Ard Biesheuvel @ 2016-10-13 16:58 UTC (permalink / raw)
  To: linux-arm-kernel

On 13 October 2016 at 17:51, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Thu, Oct 13, 2016 at 03:48:04PM +0100, Ard Biesheuvel wrote:
>> On 13 October 2016 at 15:44, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> > On Thu, Oct 13, 2016 at 01:25:53PM +0100, Ard Biesheuvel wrote:
>> >
>> > (fixing up alignment to make it readable)
>> >
>>
>> I apologise on Gmail's behalf
>>
>> >> """
>> >> /*
>> >>  * Returns whether updating a live page table entry is safe:
>> >>  * - if the old and new values are identical,
>> >>  * - if an invalid mapping is turned into a valid one (or vice versa),
>> >>  * - if the entry is a block or page mapping, and the old and new values
>> >>  *   only differ in the PXN/RDONLY/WRITE bits.
>> >>  *
>> >>  * NOTE: 'safe' does not imply that no TLB maintenance is required, it only
>> >>  *       means that no TLB conflicts should occur as a result of the update.
>> >>  */
>> >> #define __set_pgattr_is_safe(type, old, new, blocktype) \
>> >>       (type ## _val(old) == type ## _val(new) || \
>> >>        ((type ## _val(old) ^ type ## _val(new)) & PTE_VALID) != 0 || \
>> >>        (((type ## _val(old) & PTE_TYPE_MASK) == blocktype) && \
>> >>         (((type ## _val(old) ^ type ## _val(new)) & \
>> >>        ~(PTE_PXN | PTE_RDONLY | PTE_WRITE)) == 0)))
>> >>
>> >> static inline bool set_live_pte_is_safe(pte_t old, pte_t new)
>> >> {
>> >>       return __set_pgattr_is_safe(pte, old, new, PTE_TYPE_PAGE);
>> >> }
>> >>
>> >> static inline bool set_live_pmd_is_safe(pmd_t old, pmd_t new)
>> >> {
>> >>       return __set_pgattr_is_safe(pmd, old, new, PMD_TYPE_SECT);
>> >> }
>> >>
>> >> static inline bool set_live_pud_is_safe(pud_t old, pud_t new)
>> >> {
>> >>       return __set_pgattr_is_safe(pud, old, new, PUD_TYPE_SECT);
>> >> }
>> >
>> > The set_ prefix is slightly confusing as it suggests (to me) having a
>> > side effect. Maybe pgattr_set_is_safe()?
>> >
>> > But it looks like we make it more complicated than needed by using pte_t
>> > instead of pteval_t as argument. How about just using the pteval_t as
>> > argument (and it's fine to call it with pmdval_t, pudval_t as well):
>> >
>> > #define pgattr_set_is_safe(oldval, newval) \
>> >         ...
>>
>> Well, the only problem there is that the permission bit check should
>> only apply to level 3 page mappings (bit[1] == 1) and level 1/2 block
>> mappings (bit[1] == 0), so we would still need two versions.
>
> I was suggesting that you only replace the "... & ~modifiable_attr_mask"
> check in your patch to avoid writing it three times. The macro that you
> proposed above does more, but it is also harder to read.
>

Ah ok, fair enough

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v3 3/5] arm64: mm: set the contiguous bit for kernel mappings where appropriate
  2016-10-13 16:57     ` Ard Biesheuvel
@ 2016-10-13 17:27       ` Catalin Marinas
  0 siblings, 0 replies; 18+ messages in thread
From: Catalin Marinas @ 2016-10-13 17:27 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Oct 13, 2016 at 05:57:33PM +0100, Ard Biesheuvel wrote:
> On 13 October 2016 at 17:28, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Wed, Oct 12, 2016 at 12:23:43PM +0100, Ard Biesheuvel wrote:
> >> Now that we no longer allow live kernel PMDs to be split, it is safe to
> >> start using the contiguous bit for kernel mappings. So set the contiguous
> >> bit in the kernel page mappings for regions whose size and alignment are
> >> suitable for this.
> >>
> >> This enables the following contiguous range sizes for the virtual mapping
> >> of the kernel image, and for the linear mapping:
> >>
> >>           granule size |  cont PTE  |  cont PMD  |
> >>           -------------+------------+------------+
> >>                4 KB    |    64 KB   |   32 MB    |
> >>               16 KB    |     2 MB   |    1 GB*   |
> >>               64 KB    |     2 MB   |   16 GB*   |
> >>
> >> * only when built for 3 or more levels of translation
> >
> > I assume the limitation to have contiguous PMDs only with 3 or more
> > levels is because of the way p*d folding was implemented in the kernel.
> > With nopmd, looping over pmds is done in __create_pgd_mapping() rather
> > than alloc_init_pmd().
> >
> > A potential solution would be to replicate the contiguous pmd code to
> > the pud and pgd level, though we probably won't benefit from any
> > contiguous entries at higher levels (when using more than 2 levels).
> > Alternatively, with an #ifdef __PGTABLE_PMD_FOLDED, we could set the
> > PMD_CONT in prot in __create_pgd_mapping() directly (if the right
> > addr/phys alignment).
> 
> Indeed. See the next patch :-)

I got there eventually ;).

> > Anyway, it's probably not worth the effort given that 42-bit VA with 64K
> > pages is becoming a less likely configuration (36-bit VA with 16K pages
> > is even less likely, also depending on EXPERT).
> 
> This is the reason I put it in a separate patch: this one contains the
> most useful combinations, and the next patch adds the missing ones,
> but clutters up the code significantly. I'm perfectly happy to drop 4
> and 5 if you don't think it is worth the trouble.

I'll have a look at patch 4 first.

Both 64KB contiguous pmd and 4K contiguous pud give us a 16GB range,
which (AFAIK) is less likely to be optimised in hardware.

-- 
Catalin
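
(For reference, both figures work out the same way assuming the ARM ARM
contiguous-entry counts: 32 contiguous 512MB pmds with 64K pages and 16
contiguous 1GB puds with 4K pages each cover 16GB.)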

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v3 4/5] arm64: mm: support additional contiguous kernel mapping region sizes
  2016-10-12 11:23 ` [PATCH v3 4/5] arm64: mm: support additional contiguous kernel mapping region sizes Ard Biesheuvel
@ 2016-10-14 10:28   ` Catalin Marinas
  2016-10-14 17:51     ` Ard Biesheuvel
  0 siblings, 1 reply; 18+ messages in thread
From: Catalin Marinas @ 2016-10-14 10:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Oct 12, 2016 at 12:23:44PM +0100, Ard Biesheuvel wrote:
> Extend the basic support for kernel mappings using contiguous regions
> by adding support for contiguous PUDs (4k granule only), either as a
> discrete level or folded into the PGDs. In the same way, handle folded
> PMDs so that contiguous PMDs (for 16k and 64k granule kernels) will
> work as expected for 2 levels of translation as well.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
>  arch/arm64/include/asm/pgtable-hwdef.h |  6 +++
>  arch/arm64/mm/mmu.c                    | 40 +++++++++++++++++++-
>  2 files changed, 44 insertions(+), 2 deletions(-)

After looking at this patch, I concluded it's not worth the hassle, as no
hardware I'm aware of currently would benefit from it. We can revisit
it in the future if we hear otherwise.

-- 
Catalin
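
For context, the six pgtable-hwdef.h lines in the diffstat would plausibly
define the contiguous PUD geometry along these lines; the names and the
16-entry count are inferred from the 16GB figure above, not taken from the
patch itself:

	#define CONT_PUD_SHIFT	(PUD_SHIFT + 4)		/* 16 contiguous puds */
	#define CONT_PUDS	(1 << (CONT_PUD_SHIFT - PUD_SHIFT))
	#define CONT_PUD_SIZE	(_AC(1, UL) << CONT_PUD_SHIFT)
	#define CONT_PUD_MASK	(~(CONT_PUD_SIZE - 1))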

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v3 4/5] arm64: mm: support additional contiguous kernel mapping region sizes
  2016-10-14 10:28   ` Catalin Marinas
@ 2016-10-14 17:51     ` Ard Biesheuvel
  0 siblings, 0 replies; 18+ messages in thread
From: Ard Biesheuvel @ 2016-10-14 17:51 UTC (permalink / raw)
  To: linux-arm-kernel

On 14 October 2016 at 11:28, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Wed, Oct 12, 2016 at 12:23:44PM +0100, Ard Biesheuvel wrote:
>> Extend the basic support for kernel mappings using contiguous regions
>> by adding support for contiguous PUDs (4k granule only), either as a
>> discrete level or folded into the PGDs. In the same way, handle folded
>> PMDs so that contiguous PMDs (for 16k and 64k granule kernels) will
>> work as expected for 2 levels of translation as well.
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> ---
>>  arch/arm64/include/asm/pgtable-hwdef.h |  6 +++
>>  arch/arm64/mm/mmu.c                    | 40 +++++++++++++++++++-
>>  2 files changed, 44 insertions(+), 2 deletions(-)
>
> After looking at this patch, I concluded it's not worth the hassle, as no
> hardware I'm aware of currently would benefit from it. We can revisit
> it in the future if we hear otherwise.
>

That is what I expected, which is why this is a separate patch. I will
drop this patch and the next one from the next version of the series.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2016-10-14 17:51 UTC | newest]

Thread overview: 18+ messages
2016-10-12 11:23 [PATCH v3 0/5] arm64/mm: use the contiguous attribute for kernel mappings Ard Biesheuvel
2016-10-12 11:23 ` [PATCH v3 1/5] arm64: mm: BUG on unsupported manipulations of live " Ard Biesheuvel
2016-10-12 15:04   ` Catalin Marinas
2016-10-13 12:25     ` Ard Biesheuvel
2016-10-13 14:44       ` Catalin Marinas
2016-10-13 14:48         ` Ard Biesheuvel
2016-10-13 16:51           ` Catalin Marinas
2016-10-13 16:58             ` Ard Biesheuvel
2016-10-12 11:23 ` [PATCH v3 2/5] arm64: mm: replace 'block_mappings_allowed' with 'page_mappings_only' Ard Biesheuvel
2016-10-12 15:07   ` Mark Rutland
2016-10-12 11:23 ` [PATCH v3 3/5] arm64: mm: set the contiguous bit for kernel mappings where appropriate Ard Biesheuvel
2016-10-13 16:28   ` Catalin Marinas
2016-10-13 16:57     ` Ard Biesheuvel
2016-10-13 17:27       ` Catalin Marinas
2016-10-12 11:23 ` [PATCH v3 4/5] arm64: mm: support additional contiguous kernel mapping region sizes Ard Biesheuvel
2016-10-14 10:28   ` Catalin Marinas
2016-10-14 17:51     ` Ard Biesheuvel
2016-10-12 11:23 ` [PATCH v3 5/5] arm64: mm: round memstart_addr to contiguous PUD/PMD size Ard Biesheuvel
