* [PATCH 0/5] arm64: avoid block entries that we need to split later
@ 2016-06-29 12:51 Ard Biesheuvel
2016-06-29 12:51 ` [PATCH 1/5] arm64: mm: add param to force create_pgd_mapping() to use page mappings Ard Biesheuvel
` (5 more replies)
0 siblings, 6 replies; 20+ messages in thread
From: Ard Biesheuvel @ 2016-06-29 12:51 UTC (permalink / raw)
To: linux-arm-kernel
This is a followup to Catalin's series 'arm64: Avoid overlapping EFI regions' [0],
whose aim was to get rid of the split_pmd() and split_pud() routines in
mm/mmu.c, which may violate the architecture when used in the wrong way.
This series updates the MMU mapping routines invoked by the code that sets
up the UEFI runtime services environment so that regions that may be subject
to splitting are always mapped down to pages. On the one hand, these are
regions that are covered by the Memory Attributes table as well as the UEFI
memory map, in which case we need to map down to pages in the first pass to
ensure that the more granular permission attributes do not require splitting
block entries. On the other hand, any region that is not aligned to the OS's
page size is mapped down to pages as well, so that the page such a region may
share with the next region is always mapped using pages, and no splitting is
required for mapping the second region.
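Concretely, the combined effect of patches #2 and #3 on efi_create_mapping()
boils down to a policy along the following lines (a sketch only; the helper
below is made up and does not appear in the series as such):

static bool region_may_use_block_mappings(efi_memory_desc_t *md)
{
	/* code/data regions get stricter permissions applied in a second pass */
	if (md->type == EFI_RUNTIME_SERVICES_CODE ||
	    md->type == EFI_RUNTIME_SERVICES_DATA)
		return false;

	/*
	 * Regions that are not aligned to the OS page size may share a page
	 * frame with an adjacent region, so the shared page must be mapped
	 * with pages on both sides.
	 */
	if (!PAGE_ALIGNED(md->phys_addr) ||
	    !PAGE_ALIGNED(md->num_pages << EFI_PAGE_SHIFT))
		return false;

	return true;
}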
Finally, remove the split_pmd() and split_pud() routines. Patch #5 is a cleanup
patch that folds init_pgd() into __create_pgd_mapping().
[0] http://thread.gmane.org/gmane.linux.kernel.efi/8649
Ard Biesheuvel (4):
arm64: mm: add param to force create_pgd_mapping() to use page
mappings
arm64: efi: always map runtime services code and data regions down to
pages
arm64: efi: avoid block mappings for unaligned UEFI memory regions
arm64: mm: fold init_pgd() into __create_pgd_mapping()
Catalin Marinas (1):
arm64: mm: Remove split_p*d() functions
arch/arm64/include/asm/efi.h | 3 +-
arch/arm64/include/asm/mmu.h | 2 +-
arch/arm64/kernel/efi.c | 50 +++++++-
arch/arm64/mm/mmu.c | 126 +++++---------------
4 files changed, 82 insertions(+), 99 deletions(-)
--
2.7.4
* [PATCH 1/5] arm64: mm: add param to force create_pgd_mapping() to use page mappings
2016-06-29 12:51 [PATCH 0/5] arm64: avoid block entries that we need to split later Ard Biesheuvel
@ 2016-06-29 12:51 ` Ard Biesheuvel
2016-06-29 12:51 ` [PATCH 2/5] arm64: efi: always map runtime services code and data regions down to pages Ard Biesheuvel
` (4 subsequent siblings)
5 siblings, 0 replies; 20+ messages in thread
From: Ard Biesheuvel @ 2016-06-29 12:51 UTC (permalink / raw)
To: linux-arm-kernel
Add a bool parameter 'allow_block_mappings' to create_pgd_mapping() and
the various helper functions that it descends into, to give the caller
control over whether block entries may be used to create the mapping.
The UEFI runtime mapping routines will use this to avoid creating block
entries that would need to split up into page entries when applying the
permissions listed in the Memory Attributes firmware table.
This also replaces the block_mappings_allowed() helper function that was
added for DEBUG_PAGEALLOC functionality; the resulting code is functionally
equivalent (given that DEBUG_PAGEALLOC does not operate on the EFI page
table entries anyway).
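For illustration, a caller that needs a region to end up mapped at page
granularity simply passes 'false' for the new parameter, e.g. (a sketch, not
a hunk from this patch; variable names as in efi_create_mapping()):

	/*
	 * Map the region strictly down to pages, so it can later be
	 * remapped at PTE granularity without splitting block entries.
	 */
	create_pgd_mapping(mm, md->phys_addr, md->virt_addr,
			   md->num_pages << EFI_PAGE_SHIFT,
			   __pgprot(prot_val | PTE_NG),
			   false /* allow_block_mappings */);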
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/include/asm/mmu.h | 2 +-
arch/arm64/kernel/efi.c | 2 +-
arch/arm64/mm/mmu.c | 67 ++++++++------------
3 files changed, 29 insertions(+), 42 deletions(-)
diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 97b1d8f26b9c..8d9fce037b2f 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -34,7 +34,7 @@ extern void __iomem *early_io_map(phys_addr_t phys, unsigned long virt);
extern void init_mem_pgprot(void);
extern void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
unsigned long virt, phys_addr_t size,
- pgprot_t prot);
+ pgprot_t prot, bool allow_block_mappings);
extern void *fixmap_remap_fdt(phys_addr_t dt_phys);
#endif
diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
index 78f52488f9ff..981604948521 100644
--- a/arch/arm64/kernel/efi.c
+++ b/arch/arm64/kernel/efi.c
@@ -65,7 +65,7 @@ int __init efi_create_mapping(struct mm_struct *mm, efi_memory_desc_t *md)
create_pgd_mapping(mm, md->phys_addr, md->virt_addr,
md->num_pages << EFI_PAGE_SHIFT,
- __pgprot(prot_val | PTE_NG));
+ __pgprot(prot_val | PTE_NG), true);
return 0;
}
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 0f85a46c3e18..a86d1acb3a7b 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -156,29 +156,10 @@ static void split_pud(pud_t *old_pud, pmd_t *pmd)
} while (pmd++, i++, i < PTRS_PER_PMD);
}
-#ifdef CONFIG_DEBUG_PAGEALLOC
-static bool block_mappings_allowed(phys_addr_t (*pgtable_alloc)(void))
-{
-
- /*
- * If debug_page_alloc is enabled we must map the linear map
- * using pages. However, other mappings created by
- * create_mapping_noalloc must use sections in some cases. Allow
- * sections to be used in those cases, where no pgtable_alloc
- * function is provided.
- */
- return !pgtable_alloc || !debug_pagealloc_enabled();
-}
-#else
-static bool block_mappings_allowed(phys_addr_t (*pgtable_alloc)(void))
-{
- return true;
-}
-#endif
-
static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
phys_addr_t phys, pgprot_t prot,
- phys_addr_t (*pgtable_alloc)(void))
+ phys_addr_t (*pgtable_alloc)(void),
+ bool allow_block_mappings)
{
pmd_t *pmd;
unsigned long next;
@@ -209,7 +190,7 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
next = pmd_addr_end(addr, end);
/* try section mapping first */
if (((addr | next | phys) & ~SECTION_MASK) == 0 &&
- block_mappings_allowed(pgtable_alloc)) {
+ allow_block_mappings) {
pmd_t old_pmd =*pmd;
pmd_set_huge(pmd, phys, prot);
/*
@@ -248,7 +229,8 @@ static inline bool use_1G_block(unsigned long addr, unsigned long next,
static void alloc_init_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
phys_addr_t phys, pgprot_t prot,
- phys_addr_t (*pgtable_alloc)(void))
+ phys_addr_t (*pgtable_alloc)(void),
+ bool allow_block_mappings)
{
pud_t *pud;
unsigned long next;
@@ -268,8 +250,7 @@ static void alloc_init_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
/*
* For 4K granule only, attempt to put down a 1GB block
*/
- if (use_1G_block(addr, next, phys) &&
- block_mappings_allowed(pgtable_alloc)) {
+ if (use_1G_block(addr, next, phys) && allow_block_mappings) {
pud_t old_pud = *pud;
pud_set_huge(pud, phys, prot);
@@ -290,7 +271,7 @@ static void alloc_init_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
}
} else {
alloc_init_pmd(pud, addr, next, phys, prot,
- pgtable_alloc);
+ pgtable_alloc, allow_block_mappings);
}
phys += next - addr;
} while (pud++, addr = next, addr != end);
@@ -304,7 +285,8 @@ static void alloc_init_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
*/
static void init_pgd(pgd_t *pgd, phys_addr_t phys, unsigned long virt,
phys_addr_t size, pgprot_t prot,
- phys_addr_t (*pgtable_alloc)(void))
+ phys_addr_t (*pgtable_alloc)(void),
+ bool allow_block_mappings)
{
unsigned long addr, length, end, next;
@@ -322,7 +304,8 @@ static void init_pgd(pgd_t *pgd, phys_addr_t phys, unsigned long virt,
end = addr + length;
do {
next = pgd_addr_end(addr, end);
- alloc_init_pud(pgd, addr, next, phys, prot, pgtable_alloc);
+ alloc_init_pud(pgd, addr, next, phys, prot, pgtable_alloc,
+ allow_block_mappings);
phys += next - addr;
} while (pgd++, addr = next, addr != end);
}
@@ -340,9 +323,11 @@ static phys_addr_t late_pgtable_alloc(void)
static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
unsigned long virt, phys_addr_t size,
pgprot_t prot,
- phys_addr_t (*alloc)(void))
+ phys_addr_t (*alloc)(void),
+ bool allow_block_mappings)
{
- init_pgd(pgd_offset_raw(pgdir, virt), phys, virt, size, prot, alloc);
+ init_pgd(pgd_offset_raw(pgdir, virt), phys, virt, size, prot, alloc,
+ allow_block_mappings);
}
/*
@@ -358,16 +343,15 @@ static void __init create_mapping_noalloc(phys_addr_t phys, unsigned long virt,
&phys, virt);
return;
}
- __create_pgd_mapping(init_mm.pgd, phys, virt, size, prot,
- NULL);
+ __create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL, true);
}
void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
unsigned long virt, phys_addr_t size,
- pgprot_t prot)
+ pgprot_t prot, bool allow_block_mappings)
{
__create_pgd_mapping(mm->pgd, phys, virt, size, prot,
- late_pgtable_alloc);
+ late_pgtable_alloc, allow_block_mappings);
}
static void create_mapping_late(phys_addr_t phys, unsigned long virt,
@@ -380,7 +364,7 @@ static void create_mapping_late(phys_addr_t phys, unsigned long virt,
}
__create_pgd_mapping(init_mm.pgd, phys, virt, size, prot,
- late_pgtable_alloc);
+ late_pgtable_alloc, !debug_pagealloc_enabled());
}
static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end)
@@ -397,7 +381,8 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
if (end < kernel_start || start >= kernel_end) {
__create_pgd_mapping(pgd, start, __phys_to_virt(start),
end - start, PAGE_KERNEL,
- early_pgtable_alloc);
+ early_pgtable_alloc,
+ !debug_pagealloc_enabled());
return;
}
@@ -409,12 +394,14 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
__create_pgd_mapping(pgd, start,
__phys_to_virt(start),
kernel_start - start, PAGE_KERNEL,
- early_pgtable_alloc);
+ early_pgtable_alloc,
+ !debug_pagealloc_enabled());
if (kernel_end < end)
__create_pgd_mapping(pgd, kernel_end,
__phys_to_virt(kernel_end),
end - kernel_end, PAGE_KERNEL,
- early_pgtable_alloc);
+ early_pgtable_alloc,
+ !debug_pagealloc_enabled());
/*
* Map the linear alias of the [_text, _etext) interval as
@@ -424,7 +411,7 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
*/
__create_pgd_mapping(pgd, kernel_start, __phys_to_virt(kernel_start),
kernel_end - kernel_start, PAGE_KERNEL_RO,
- early_pgtable_alloc);
+ early_pgtable_alloc, !debug_pagealloc_enabled());
}
static void __init map_mem(pgd_t *pgd)
@@ -481,7 +468,7 @@ static void __init map_kernel_segment(pgd_t *pgd, void *va_start, void *va_end,
BUG_ON(!PAGE_ALIGNED(size));
__create_pgd_mapping(pgd, pa_start, (unsigned long)va_start, size, prot,
- early_pgtable_alloc);
+ early_pgtable_alloc, !debug_pagealloc_enabled());
vma->addr = va_start;
vma->phys_addr = pa_start;
--
2.7.4
* [PATCH 2/5] arm64: efi: always map runtime services code and data regions down to pages
2016-06-29 12:51 [PATCH 0/5] arm64: avoid block entries that we need to split later Ard Biesheuvel
2016-06-29 12:51 ` [PATCH 1/5] arm64: mm: add param to force create_pgd_mapping() to use page mappings Ard Biesheuvel
@ 2016-06-29 12:51 ` Ard Biesheuvel
2016-07-22 14:30 ` Sudeep Holla
2016-06-29 12:51 ` [PATCH 3/5] arm64: efi: avoid block mappings for unaligned UEFI memory regions Ard Biesheuvel
` (3 subsequent siblings)
5 siblings, 1 reply; 20+ messages in thread
From: Ard Biesheuvel @ 2016-06-29 12:51 UTC (permalink / raw)
To: linux-arm-kernel
To avoid triggering diagnostics in the MMU code that are finicky about
splitting block mappings into more granular mappings, ensure that regions
that are likely to appear in the Memory Attributes table as well as the
UEFI memory map are always mapped down to pages. This way, we can use
apply_to_page_range() instead of create_pgd_mapping() for the second pass,
which cannot split or merge block entries, and operates strictly on PTEs.
Note that this aligns the arm64 Memory Attributes table handling code with
the ARM code, which already uses apply_to_page_range() to set the strict
permissions.
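For reference, apply_to_page_range() simply invokes a pte_fn_t callback on
every PTE covering the requested range of the given mm, along these lines
(a sketch only; the callback name below is made up, the one actually added
by this patch is set_permissions()):

static int make_nonexec(pte_t *ptep, pgtable_t token,
			unsigned long addr, void *data)
{
	set_pte(ptep, set_pte_bit(*ptep, __pgprot(PTE_PXN)));
	return 0;
}

static int mark_region_nonexec(struct mm_struct *mm, u64 virt, u64 size)
{
	/* visits every PTE in [virt, virt + size); never splits or
	   merges block entries */
	return apply_to_page_range(mm, virt, size, make_nonexec, NULL);
}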
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/include/asm/efi.h | 3 +-
arch/arm64/kernel/efi.c | 36 +++++++++++++++++++-
2 files changed, 36 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h
index 622db3c6474e..8b13476cdf96 100644
--- a/arch/arm64/include/asm/efi.h
+++ b/arch/arm64/include/asm/efi.h
@@ -14,8 +14,7 @@ extern void efi_init(void);
#endif
int efi_create_mapping(struct mm_struct *mm, efi_memory_desc_t *md);
-
-#define efi_set_mapping_permissions efi_create_mapping
+int efi_set_mapping_permissions(struct mm_struct *mm, efi_memory_desc_t *md);
#define arch_efi_call_virt_setup() \
({ \
diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
index 981604948521..4aef89f37049 100644
--- a/arch/arm64/kernel/efi.c
+++ b/arch/arm64/kernel/efi.c
@@ -62,13 +62,47 @@ struct screen_info screen_info __section(.data);
int __init efi_create_mapping(struct mm_struct *mm, efi_memory_desc_t *md)
{
pteval_t prot_val = create_mapping_protection(md);
+ bool allow_block_mappings = (md->type != EFI_RUNTIME_SERVICES_CODE &&
+ md->type != EFI_RUNTIME_SERVICES_DATA);
create_pgd_mapping(mm, md->phys_addr, md->virt_addr,
md->num_pages << EFI_PAGE_SHIFT,
- __pgprot(prot_val | PTE_NG), true);
+ __pgprot(prot_val | PTE_NG), allow_block_mappings);
return 0;
}
+static int __init set_permissions(pte_t *ptep, pgtable_t token,
+ unsigned long addr, void *data)
+{
+ efi_memory_desc_t *md = data;
+ pte_t pte = *ptep;
+
+ if (md->attribute & EFI_MEMORY_RO)
+ pte = set_pte_bit(pte, __pgprot(PTE_RDONLY));
+ if (md->attribute & EFI_MEMORY_XP)
+ pte = set_pte_bit(pte, __pgprot(PTE_PXN));
+ set_pte(ptep, pte);
+ return 0;
+}
+
+int __init efi_set_mapping_permissions(struct mm_struct *mm,
+ efi_memory_desc_t *md)
+{
+ BUG_ON(md->type != EFI_RUNTIME_SERVICES_CODE &&
+ md->type != EFI_RUNTIME_SERVICES_DATA);
+
+ /*
+ * Calling apply_to_page_range() is only safe on regions that are
+ * guaranteed to be mapped down to pages. Since we are only called
+ * for regions that have been mapped using efi_create_mapping() above
+ * (and this is checked by the generic Memory Attributes table parsing
+ * routines), there is no need to check that again here.
+ */
+ return apply_to_page_range(mm, md->virt_addr,
+ md->num_pages << EFI_PAGE_SHIFT,
+ set_permissions, md);
+}
+
static int __init arm64_dmi_init(void)
{
/*
--
2.7.4
* [PATCH 3/5] arm64: efi: avoid block mappings for unaligned UEFI memory regions
2016-06-29 12:51 [PATCH 0/5] arm64: avoid block entries that we need to split later Ard Biesheuvel
2016-06-29 12:51 ` [PATCH 1/5] arm64: mm: add param to force create_pgd_mapping() to use page mappings Ard Biesheuvel
2016-06-29 12:51 ` [PATCH 2/5] arm64: efi: always map runtime services code and data regions down to pages Ard Biesheuvel
@ 2016-06-29 12:51 ` Ard Biesheuvel
2016-06-29 16:45 ` Catalin Marinas
2016-06-29 12:51 ` [PATCH 4/5] arm64: mm: Remove split_p*d() functions Ard Biesheuvel
` (2 subsequent siblings)
5 siblings, 1 reply; 20+ messages in thread
From: Ard Biesheuvel @ 2016-06-29 12:51 UTC (permalink / raw)
To: linux-arm-kernel
When running the OS with a page size > 4 KB, we need to round up mappings
for regions that are not aligned to the OS's page size. We already avoid
block mappings for EfiRuntimeServicesCode/Data regions for other reasons,
but in the unlikely event that other unaligned regions exist that have the
EFI_MEMORY_RUNTIME attribute set, ensure that unaligned regions are always
mapped down to pages. This way, the overlapping page is guaranteed not to
be covered by a block mapping that needs to be split.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/kernel/efi.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
index 4aef89f37049..ba9bee389fd5 100644
--- a/arch/arm64/kernel/efi.c
+++ b/arch/arm64/kernel/efi.c
@@ -65,6 +65,20 @@ int __init efi_create_mapping(struct mm_struct *mm, efi_memory_desc_t *md)
bool allow_block_mappings = (md->type != EFI_RUNTIME_SERVICES_CODE &&
md->type != EFI_RUNTIME_SERVICES_DATA);
+ if (!PAGE_ALIGNED(md->phys_addr) ||
+ !PAGE_ALIGNED(md->num_pages << EFI_PAGE_SHIFT)) {
+ /*
+ * If the end address of this region is not aligned to page
+ * size, the mapping is rounded up, and may end up sharing a
+ * page frame with the next UEFI memory region. If we create
+ * a block entry now, we may need to split it again when mapping
+ * the next region, and support for that is going to be removed
+ * from the MMU routines. So avoid block mappings altogether in
+ * that case.
+ */
+ allow_block_mappings = false;
+ }
+
create_pgd_mapping(mm, md->phys_addr, md->virt_addr,
md->num_pages << EFI_PAGE_SHIFT,
__pgprot(prot_val | PTE_NG), allow_block_mappings);
--
2.7.4
* [PATCH 4/5] arm64: mm: Remove split_p*d() functions
2016-06-29 12:51 [PATCH 0/5] arm64: avoid block entries that we need to split later Ard Biesheuvel
` (2 preceding siblings ...)
2016-06-29 12:51 ` [PATCH 3/5] arm64: efi: avoid block mappings for unaligned UEFI memory regions Ard Biesheuvel
@ 2016-06-29 12:51 ` Ard Biesheuvel
2016-06-29 12:51 ` [PATCH 5/5] arm64: mm: fold init_pgd() into __create_pgd_mapping() Ard Biesheuvel
2016-06-30 16:13 ` [PATCH 0/5] arm64: avoid block entries that we need to split later Catalin Marinas
5 siblings, 0 replies; 20+ messages in thread
From: Ard Biesheuvel @ 2016-06-29 12:51 UTC (permalink / raw)
To: linux-arm-kernel
From: Catalin Marinas <catalin.marinas@arm.com>
Since efi_create_mapping() no longer generates block mappings and was the
last user of the split_p*d() code, remove these functions and the
corresponding TLBI.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
[ardb: replace 'overlapping regions' with 'block mappings' in commit log]
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/mm/mmu.c | 47 ++------------------
1 file changed, 4 insertions(+), 43 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index a86d1acb3a7b..f233c885042d 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -97,24 +97,6 @@ static phys_addr_t __init early_pgtable_alloc(void)
return phys;
}
-/*
- * remap a PMD into pages
- */
-static void split_pmd(pmd_t *pmd, pte_t *pte)
-{
- unsigned long pfn = pmd_pfn(*pmd);
- int i = 0;
-
- do {
- /*
- * Need to have the least restrictive permissions available
- * permissions will be fixed up later
- */
- set_pte(pte, pfn_pte(pfn, PAGE_KERNEL_EXEC));
- pfn++;
- } while (pte++, i++, i < PTRS_PER_PTE);
-}
-
static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
unsigned long end, unsigned long pfn,
pgprot_t prot,
@@ -122,15 +104,13 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
{
pte_t *pte;
- if (pmd_none(*pmd) || pmd_sect(*pmd)) {
+ BUG_ON(pmd_sect(*pmd));
+ if (pmd_none(*pmd)) {
phys_addr_t pte_phys;
BUG_ON(!pgtable_alloc);
pte_phys = pgtable_alloc();
pte = pte_set_fixmap(pte_phys);
- if (pmd_sect(*pmd))
- split_pmd(pmd, pte);
__pmd_populate(pmd, pte_phys, PMD_TYPE_TABLE);
- flush_tlb_all();
pte_clear_fixmap();
}
BUG_ON(pmd_bad(*pmd));
@@ -144,18 +124,6 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
pte_clear_fixmap();
}
-static void split_pud(pud_t *old_pud, pmd_t *pmd)
-{
- unsigned long addr = pud_pfn(*old_pud) << PAGE_SHIFT;
- pgprot_t prot = __pgprot(pud_val(*old_pud) ^ addr);
- int i = 0;
-
- do {
- set_pmd(pmd, __pmd(addr | pgprot_val(prot)));
- addr += PMD_SIZE;
- } while (pmd++, i++, i < PTRS_PER_PMD);
-}
-
static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
phys_addr_t phys, pgprot_t prot,
phys_addr_t (*pgtable_alloc)(void),
@@ -167,20 +135,13 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
/*
* Check for initial section mappings in the pgd/pud and remove them.
*/
- if (pud_none(*pud) || pud_sect(*pud)) {
+ BUG_ON(pud_sect(*pud));
+ if (pud_none(*pud)) {
phys_addr_t pmd_phys;
BUG_ON(!pgtable_alloc);
pmd_phys = pgtable_alloc();
pmd = pmd_set_fixmap(pmd_phys);
- if (pud_sect(*pud)) {
- /*
- * need to have the 1G of mappings continue to be
- * present
- */
- split_pud(pud, pmd);
- }
__pud_populate(pud, pmd_phys, PUD_TYPE_TABLE);
- flush_tlb_all();
pmd_clear_fixmap();
}
BUG_ON(pud_bad(*pud));
--
2.7.4
* [PATCH 5/5] arm64: mm: fold init_pgd() into __create_pgd_mapping()
2016-06-29 12:51 [PATCH 0/5] arm64: avoid block entries that we need to split later Ard Biesheuvel
` (3 preceding siblings ...)
2016-06-29 12:51 ` [PATCH 4/5] arm64: mm: Remove split_p*d() functions Ard Biesheuvel
@ 2016-06-29 12:51 ` Ard Biesheuvel
2016-06-30 16:13 ` [PATCH 0/5] arm64: avoid block entries that we need to split later Catalin Marinas
5 siblings, 0 replies; 20+ messages in thread
From: Ard Biesheuvel @ 2016-06-29 12:51 UTC (permalink / raw)
To: linux-arm-kernel
The routine __create_pgd_mapping() does nothing except call init_pgd(),
which has no other callers. So fold the latter into the former. Also, drop
a comment that has gone stale.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/mm/mmu.c | 24 +++++---------------
1 file changed, 6 insertions(+), 18 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index f233c885042d..8cf4da36909a 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -240,16 +240,14 @@ static void alloc_init_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
pud_clear_fixmap();
}
-/*
- * Create the page directory entries and any necessary page tables for the
- * mapping specified by 'md'.
- */
-static void init_pgd(pgd_t *pgd, phys_addr_t phys, unsigned long virt,
- phys_addr_t size, pgprot_t prot,
- phys_addr_t (*pgtable_alloc)(void),
- bool allow_block_mappings)
+static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
+ unsigned long virt, phys_addr_t size,
+ pgprot_t prot,
+ phys_addr_t (*pgtable_alloc)(void),
+ bool allow_block_mappings)
{
unsigned long addr, length, end, next;
+ pgd_t *pgd = pgd_offset_raw(pgdir, virt);
/*
* If the virtual and physical address don't have the same offset
@@ -281,16 +279,6 @@ static phys_addr_t late_pgtable_alloc(void)
return __pa(ptr);
}
-static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
- unsigned long virt, phys_addr_t size,
- pgprot_t prot,
- phys_addr_t (*alloc)(void),
- bool allow_block_mappings)
-{
- init_pgd(pgd_offset_raw(pgdir, virt), phys, virt, size, prot, alloc,
- allow_block_mappings);
-}
-
/*
* This function can only be used to modify existing table entries,
* without allocating new levels of table. Note that this permits the
--
2.7.4
* [PATCH 3/5] arm64: efi: avoid block mappings for unaligned UEFI memory regions
2016-06-29 12:51 ` [PATCH 3/5] arm64: efi: avoid block mappings for unaligned UEFI memory regions Ard Biesheuvel
@ 2016-06-29 16:45 ` Catalin Marinas
2016-06-29 16:50 ` Ard Biesheuvel
0 siblings, 1 reply; 20+ messages in thread
From: Catalin Marinas @ 2016-06-29 16:45 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Jun 29, 2016 at 02:51:28PM +0200, Ard Biesheuvel wrote:
> + if (!PAGE_ALIGNED(md->phys_addr) ||
> + !PAGE_ALIGNED(md->num_pages << EFI_PAGE_SHIFT)) {
> + /*
> + * If the end address of this region is not aligned to page
> + * size, the mapping is rounded up, and may end up sharing a
> + * page frame with the next UEFI memory region. If we create
> + * a block entry now, we may need to split it again when mapping
> + * the next region, and support for that is going to be removed
> + * from the MMU routines. So avoid block mappings altogether in
> + * that case.
> + */
> + allow_block_mappings = false;
> + }
How common is it for large areas to have unaligned start/end? I wonder
whether it's worth implementing my approach to look ahead and explicitly
check the overlap with the next section instead of disabling block
mappings altogether for this region.
--
Catalin
* [PATCH 3/5] arm64: efi: avoid block mappings for unaligned UEFI memory regions
2016-06-29 16:45 ` Catalin Marinas
@ 2016-06-29 16:50 ` Ard Biesheuvel
2016-06-29 16:53 ` Ard Biesheuvel
0 siblings, 1 reply; 20+ messages in thread
From: Ard Biesheuvel @ 2016-06-29 16:50 UTC (permalink / raw)
To: linux-arm-kernel
On 29 June 2016 at 18:45, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Wed, Jun 29, 2016 at 02:51:28PM +0200, Ard Biesheuvel wrote:
>> + if (!PAGE_ALIGNED(md->phys_addr) ||
>> + !PAGE_ALIGNED(md->num_pages << EFI_PAGE_SHIFT)) {
>> + /*
>> + * If the end address of this region is not aligned to page
>> + * size, the mapping is rounded up, and may end up sharing a
>> + * page frame with the next UEFI memory region. If we create
>> + * a block entry now, we may need to split it again when mapping
>> + * the next region, and support for that is going to be removed
>> + * from the MMU routines. So avoid block mappings altogether in
>> + * that case.
>> + */
>> + allow_block_mappings = false;
>> + }
>
> How common is it for large areas to have unaligned start/end? I wonder
> whether it's worth implementing my approach to look ahead and explicitly
> check the overlap with the next section instead of disabling block
> mappings altogether for this region.
>
Very uncommon. Typically, only MMIO regions that represent NOR flash
are larger than a couple of pages. Taken from QEMU:
* [PATCH 3/5] arm64: efi: avoid block mappings for unaligned UEFI memory regions
2016-06-29 16:50 ` Ard Biesheuvel
@ 2016-06-29 16:53 ` Ard Biesheuvel
2016-06-29 17:00 ` Leif Lindholm
0 siblings, 1 reply; 20+ messages in thread
From: Ard Biesheuvel @ 2016-06-29 16:53 UTC (permalink / raw)
To: linux-arm-kernel
On 29 June 2016 at 18:50, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 29 June 2016 at 18:45, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> On Wed, Jun 29, 2016 at 02:51:28PM +0200, Ard Biesheuvel wrote:
>>> + if (!PAGE_ALIGNED(md->phys_addr) ||
>>> + !PAGE_ALIGNED(md->num_pages << EFI_PAGE_SHIFT)) {
>>> + /*
>>> + * If the end address of this region is not aligned to page
>>> + * size, the mapping is rounded up, and may end up sharing a
>>> + * page frame with the next UEFI memory region. If we create
>>> + * a block entry now, we may need to split it again when mapping
>>> + * the next region, and support for that is going to be removed
>>> + * from the MMU routines. So avoid block mappings altogether in
>>> + * that case.
>>> + */
>>> + allow_block_mappings = false;
>>> + }
>>
>> How common is it for large areas to have unaligned start/end? I wonder
>> whether it's worth implementing my approach to look ahead and explicitly
>> check the overlap with the next section instead of disabling block
>> mappings altogether for this region.
>>
>
> Very uncommon. Typically, only MMIO regions that represent NOR flash
> are larger than a couple of pages. Taken from QEMU:
RT_Code : 640 Pages (2,621,440 Bytes)
RT_Data : 880 Pages (3,604,480 Bytes)
so all RT_Code regions *combined* are 2.5 MB in total, and all RT_Data
regions 3.5 MB. Ideally, they are grouped together, but in reality,
there are always a couple of regions of each type, so there is little
to gain here from using block mappings
* [PATCH 3/5] arm64: efi: avoid block mappings for unaligned UEFI memory regions
2016-06-29 16:53 ` Ard Biesheuvel
@ 2016-06-29 17:00 ` Leif Lindholm
2016-06-29 17:04 ` Ard Biesheuvel
0 siblings, 1 reply; 20+ messages in thread
From: Leif Lindholm @ 2016-06-29 17:00 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Jun 29, 2016 at 06:53:18PM +0200, Ard Biesheuvel wrote:
> On 29 June 2016 at 18:50, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> > On 29 June 2016 at 18:45, Catalin Marinas <catalin.marinas@arm.com> wrote:
> >> On Wed, Jun 29, 2016 at 02:51:28PM +0200, Ard Biesheuvel wrote:
> >>> + if (!PAGE_ALIGNED(md->phys_addr) ||
> >>> + !PAGE_ALIGNED(md->num_pages << EFI_PAGE_SHIFT)) {
> >>> + /*
> >>> + * If the end address of this region is not aligned to page
> >>> + * size, the mapping is rounded up, and may end up sharing a
> >>> + * page frame with the next UEFI memory region. If we create
> >>> + * a block entry now, we may need to split it again when mapping
> >>> + * the next region, and support for that is going to be removed
> >>> + * from the MMU routines. So avoid block mappings altogether in
> >>> + * that case.
> >>> + */
> >>> + allow_block_mappings = false;
> >>> + }
> >>
> >> How common is it for large areas to have unaligned start/end? I wonder
> >> whether it's worth implementing my approach to look ahead and explicitly
> >> check the overlap with the next section instead of disabling block
> >> mappings altogether for this region.
> >>
> >
> > Very uncommon. Typically, only MMIO regions that represent NOR flash
> > are larger than a couple of pages. Taken from QEMU:
>
> RT_Code : 640 Pages (2,621,440 Bytes)
> RT_Data : 880 Pages (3,604,480 Bytes)
>
> so all RT_Code regions *combined* are 2.5 MB in total, and all RT_Data
> regions 3.5 MB. Ideally, they are grouped together, but in reality,
> there are always a couple of regions of each type, so there is little
> to gain here from using block mappings
Is this representative for real platforms?
What about efifb and reserved regions?
My (x86) Lenovo workstation has one 64MB and one 16MB Runtime/Uncached
MMIO region. As well as a 3MB and a 4MB RT_Data one.
/
Leif
* [PATCH 3/5] arm64: efi: avoid block mappings for unaligned UEFI memory regions
2016-06-29 17:00 ` Leif Lindholm
@ 2016-06-29 17:04 ` Ard Biesheuvel
2016-06-29 17:40 ` Ard Biesheuvel
2016-06-29 17:54 ` Leif Lindholm
0 siblings, 2 replies; 20+ messages in thread
From: Ard Biesheuvel @ 2016-06-29 17:04 UTC (permalink / raw)
To: linux-arm-kernel
On 29 June 2016 at 19:00, Leif Lindholm <leif.lindholm@linaro.org> wrote:
> On Wed, Jun 29, 2016 at 06:53:18PM +0200, Ard Biesheuvel wrote:
>> On 29 June 2016 at 18:50, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> > On 29 June 2016 at 18:45, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> >> On Wed, Jun 29, 2016 at 02:51:28PM +0200, Ard Biesheuvel wrote:
>> >>> + if (!PAGE_ALIGNED(md->phys_addr) ||
>> >>> + !PAGE_ALIGNED(md->num_pages << EFI_PAGE_SHIFT)) {
>> >>> + /*
>> >>> + * If the end address of this region is not aligned to page
>> >>> + * size, the mapping is rounded up, and may end up sharing a
>> >>> + * page frame with the next UEFI memory region. If we create
>> >>> + * a block entry now, we may need to split it again when mapping
>> >>> + * the next region, and support for that is going to be removed
>> >>> + * from the MMU routines. So avoid block mappings altogether in
>> >>> + * that case.
>> >>> + */
>> >>> + allow_block_mappings = false;
>> >>> + }
>> >>
>> >> How common is it for large areas to have unaligned start/end? I wonder
>> >> whether it's worth implementing my approach to look ahead and explicitly
>> >> check the overlap with the next section instead of disabling block
>> >> mappings altogether for this region.
>> >>
>> >
>> > Very uncommon. Typically, only MMIO regions that represent NOR flash
>> > are larger than a couple of pages. Taken from QEMU:
>>
>> RT_Code : 640 Pages (2,621,440 Bytes)
>> RT_Data : 880 Pages (3,604,480 Bytes)
>>
>> so all RT_Code regions *combined* are 2.5 MB in total, and all RT_Data
>> regions 3.5 MB. Ideally, they are grouped together, but in reality,
>> there are always a couple of regions of each type, so there is little
>> to gain here from using block mappings
>
> Is this representative for real platforms?
>
I think it is a reasonable ballpark figure
> What about efifb and reserved regions?
>
Those are not tagged as EFI_MEMORY_RUNTIME so they are not covered by
the UEFI runtime mappings, and not relevant to this discussion.
> My (x86) Lenovo workstation has one 64MB and one 16MB Runtime/Uncached
> MMIO region. As well as a 3MB and a 4MB RT_Data one.
>
Are those MMIO regions naturally aligned? And how about the RT_Data ones?
* [PATCH 3/5] arm64: efi: avoid block mappings for unaligned UEFI memory regions
2016-06-29 17:04 ` Ard Biesheuvel
@ 2016-06-29 17:40 ` Ard Biesheuvel
2016-06-29 17:54 ` Leif Lindholm
1 sibling, 0 replies; 20+ messages in thread
From: Ard Biesheuvel @ 2016-06-29 17:40 UTC (permalink / raw)
To: linux-arm-kernel
On 29 June 2016 at 19:04, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 29 June 2016 at 19:00, Leif Lindholm <leif.lindholm@linaro.org> wrote:
>> On Wed, Jun 29, 2016 at 06:53:18PM +0200, Ard Biesheuvel wrote:
>>> On 29 June 2016 at 18:50, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>>> > On 29 June 2016 at 18:45, Catalin Marinas <catalin.marinas@arm.com> wrote:
>>> >> On Wed, Jun 29, 2016 at 02:51:28PM +0200, Ard Biesheuvel wrote:
>>> >>> + if (!PAGE_ALIGNED(md->phys_addr) ||
>>> >>> + !PAGE_ALIGNED(md->num_pages << EFI_PAGE_SHIFT)) {
>>> >>> + /*
>>> >>> + * If the end address of this region is not aligned to page
>>> >>> + * size, the mapping is rounded up, and may end up sharing a
>>> >>> + * page frame with the next UEFI memory region. If we create
>>> >>> + * a block entry now, we may need to split it again when mapping
>>> >>> + * the next region, and support for that is going to be removed
>>> >>> + * from the MMU routines. So avoid block mappings altogether in
>>> >>> + * that case.
>>> >>> + */
>>> >>> + allow_block_mappings = false;
>>> >>> + }
>>> >>
>>> >> How common is it for large areas to have unaligned start/end? I wonder
>>> >> whether it's worth implementing my approach to look ahead and explicitly
>>> >> check the overlap with the next section instead of disabling block
>>> >> mappings altogether for this region.
>>> >>
>>> >
>>> > Very uncommon. Typically, only MMIO regions that represent NOR flash
>>> > are larger than a couple of pages. Taken from QEMU:
>>>
>>> RT_Code : 640 Pages (2,621,440 Bytes)
>>> RT_Data : 880 Pages (3,604,480 Bytes)
>>>
>>> so all RT_Code regions *combined* are 2.5 MB in total, and all RT_Data
>>> regions 3.5 MB. Ideally, they are grouped together, but in reality,
>>> there are always a couple of regions of each type, so there is little
>>> to gain here from using block mappings
>>
>> Is this representative for real platforms?
>>
>
> I think it is a reasonable ballpark figure
>
>> What about efifb and reserved regions?
>>
>
> Those are not tagged as EFI_MEMORY_RUNTIME so they are not covered by
> the UEFI runtime mappings, and not relevant to this discussion.
>
>> My (x86) Lenovo workstation has one 64MB and one 16MB Runtime/Uncached
>> MMIO region. As well as a 3MB and a 4MB RT_Data one.
>>
>
> Are those MMIO regions naturally aligned? And how about the RT_Data ones?
Just to be clear: we are talking about regions that
a) are larger than 2 MB
b) whose naturally aligned 2 MB subregions are relatively aligned with
their physical mapping
c) share a 16 KB or 64 KB page frame at either end with an adjacent region
So first of all, the virtual mapping code in the stub does take the 2
MB alignment into account, but only if the physical base of the region
is aligned to 2 MB and the size of the region is at least 2 MB. This
is intended for things like the NOR flash mapping (or the MMIO regions
that Leif refers to). But a region that happens to exceed 2 MB is not
mapped with the relative physical alignment in mind, and usually does
not end up aligned correctly for block mappings to be usable.
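In other words, the stub only bumps the alignment when both conditions hold,
roughly (a sketch, the helper name is made up):

	static bool stub_wants_2m_alignment(u64 phys_base, u64 size)
	{
		/* only regions that start 2 MB aligned and span at least
		   2 MB are candidates for block mappings at all */
		return IS_ALIGNED(phys_base, SZ_2M) && size >= SZ_2M;
	}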
On top of that, the v2.6 Memory Attributes table feature (which I
think should be recommended in all cases, but is optional in the spec)
breaks RT_Code regions into read-only and non-exec slices, so even if
a RT_Code region existed whose size and alignment would happen to
allow a block mapping to be used, it will subsequently be split
anyway.
The bottom line is that I don't think it makes a huge amount of sense
to go out of our way to deal with large regions with unaligned
boundaries until we encounter any in real life.
--
Ard.
* [PATCH 3/5] arm64: efi: avoid block mappings for unaligned UEFI memory regions
2016-06-29 17:04 ` Ard Biesheuvel
2016-06-29 17:40 ` Ard Biesheuvel
@ 2016-06-29 17:54 ` Leif Lindholm
2016-06-29 17:56 ` Ard Biesheuvel
1 sibling, 1 reply; 20+ messages in thread
From: Leif Lindholm @ 2016-06-29 17:54 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Jun 29, 2016 at 07:04:57PM +0200, Ard Biesheuvel wrote:
> On 29 June 2016 at 19:00, Leif Lindholm <leif.lindholm@linaro.org> wrote:
> > On Wed, Jun 29, 2016 at 06:53:18PM +0200, Ard Biesheuvel wrote:
> >> On 29 June 2016 at 18:50, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> >> > On 29 June 2016 at 18:45, Catalin Marinas <catalin.marinas@arm.com> wrote:
> >> >> On Wed, Jun 29, 2016 at 02:51:28PM +0200, Ard Biesheuvel wrote:
> >> >>> + if (!PAGE_ALIGNED(md->phys_addr) ||
> >> >>> + !PAGE_ALIGNED(md->num_pages << EFI_PAGE_SHIFT)) {
> >> >>> + /*
> >> >>> + * If the end address of this region is not aligned to page
> >> >>> + * size, the mapping is rounded up, and may end up sharing a
> >> >>> + * page frame with the next UEFI memory region. If we create
> >> >>> + * a block entry now, we may need to split it again when mapping
> >> >>> + * the next region, and support for that is going to be removed
> >> >>> + * from the MMU routines. So avoid block mappings altogether in
> >> >>> + * that case.
> >> >>> + */
> >> >>> + allow_block_mappings = false;
> >> >>> + }
> >> >>
> >> >> How common is it for large areas to have unaligned start/end? I wonder
> >> >> whether it's worth implementing my approach to look ahead and explicitly
> >> >> check the overlap with the next section instead of disabling block
> >> >> mappings altogether for this region.
> >> >>
> >> >
> >> > Very uncommon. Typically, only MMIO regions that represent NOR flash
> >> > are larger than a couple of pages. Taken from QEMU:
> >>
> >> RT_Code : 640 Pages (2,621,440 Bytes)
> >> RT_Data : 880 Pages (3,604,480 Bytes)
> >>
> >> so all RT_Code regions *combined* are 2.5 MB in total, and all RT_Data
> >> regions 3.5 MB. Ideally, they are grouped together, but in reality,
> >> there are always a couple of regions of each type, so there is little
> >> to gain here from using block mappings
> >
> > Is this representative for real platforms?
>
> I think it is a reasonable ballpark figure
>
> > What about efifb and reserved regions?
>
> Those are not tagged as EFI_MEMORY_RUNTIME so they are not covered by
> the UEFI runtime mappings, and not relevant to this discussion.
OK.
> > My (x86) Lenovo workstation has one 64MB and one 16MB Runtime/Uncached
> > MMIO region. As well as a 3MB and a 4MB RT_Data one.
>
> Are those MMIO regions naturally aligned? And how about the RT_Data ones?
So, I've now gone home and don't have access to the Lenovo, however I
have a machine at home also with an AMI UEFI implementation, and
identical MMIO regions. And they do look naturally aligned.
The RT_Data ones are not naturally aligned.
/
Leif
* [PATCH 3/5] arm64: efi: avoid block mappings for unaligned UEFI memory regions
2016-06-29 17:54 ` Leif Lindholm
@ 2016-06-29 17:56 ` Ard Biesheuvel
0 siblings, 0 replies; 20+ messages in thread
From: Ard Biesheuvel @ 2016-06-29 17:56 UTC (permalink / raw)
To: linux-arm-kernel
On 29 June 2016 at 19:54, Leif Lindholm <leif.lindholm@linaro.org> wrote:
> On Wed, Jun 29, 2016 at 07:04:57PM +0200, Ard Biesheuvel wrote:
>> On 29 June 2016 at 19:00, Leif Lindholm <leif.lindholm@linaro.org> wrote:
>> > On Wed, Jun 29, 2016 at 06:53:18PM +0200, Ard Biesheuvel wrote:
>> >> On 29 June 2016 at 18:50, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> >> > On 29 June 2016 at 18:45, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> >> >> On Wed, Jun 29, 2016 at 02:51:28PM +0200, Ard Biesheuvel wrote:
>> >> >>> + if (!PAGE_ALIGNED(md->phys_addr) ||
>> >> >>> + !PAGE_ALIGNED(md->num_pages << EFI_PAGE_SHIFT)) {
>> >> >>> + /*
>> >> >>> + * If the end address of this region is not aligned to page
>> >> >>> + * size, the mapping is rounded up, and may end up sharing a
>> >> >>> + * page frame with the next UEFI memory region. If we create
>> >> >>> + * a block entry now, we may need to split it again when mapping
>> >> >>> + * the next region, and support for that is going to be removed
>> >> >>> + * from the MMU routines. So avoid block mappings altogether in
>> >> >>> + * that case.
>> >> >>> + */
>> >> >>> + allow_block_mappings = false;
>> >> >>> + }
>> >> >>
>> >> >> How common is it for large areas to have unaligned start/end? I wonder
>> >> >> whether it's worth implementing my approach to look ahead and explicitly
>> >> >> check the overlap with the next section instead of disabling block
>> >> >> mappings altogether for this region.
>> >> >>
>> >> >
>> >> > Very uncommon. Typically, only MMIO regions that represent NOR flash
>> >> > are larger than a couple of pages. Taken from QEMU:
>> >>
>> >> RT_Code : 640 Pages (2,621,440 Bytes)
>> >> RT_Data : 880 Pages (3,604,480 Bytes)
>> >>
>> >> so all RT_Code regions *combined* are 2.5 MB in total, and all RT_Data
>> >> regions 3.5 MB. Ideally, they are grouped together, but in reality,
>> >> there are always a couple of regions of each type, so there is little
>> >> to gain here from using block mappings
>> >
>> > Is this representative for real platforms?
>>
>> I think it is a reasonable ballpark figure
>>
>> > What about efifb and reserved regions?
>>
>> Those are not tagged as EFI_MEMORY_RUNTIME so they are not covered by
>> the UEFI runtime mappings, and not relevant to this discussion.
>
> OK.
>
>> > My (x86) Lenovo workstation has one 64MB and one 16MB Runtime/Uncached
>> > MMIO region. As well as a 3MB and a 4MB RT_Data one.
>>
>> Are those MMIO regions naturally aligned? And how about the RT_Data ones?
>
> So, I've now gone home and don't have access to the Lenovo, however I
> have a machine at home also with an AMI UEFI implementation, and
> identical MMIO regions. And they do look naturally aligned.
>
> The RT_Data ones are not naturally aligned.
>
OK, that confirms my suspicion. See other email
* [PATCH 0/5] arm64: avoid block entries that we need to split later
2016-06-29 12:51 [PATCH 0/5] arm64: avoid block entries that we need to split later Ard Biesheuvel
` (4 preceding siblings ...)
2016-06-29 12:51 ` [PATCH 5/5] arm64: mm: fold init_pgd() into __create_pgd_mapping() Ard Biesheuvel
@ 2016-06-30 16:13 ` Catalin Marinas
2016-06-30 16:16 ` Ard Biesheuvel
5 siblings, 1 reply; 20+ messages in thread
From: Catalin Marinas @ 2016-06-30 16:13 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Jun 29, 2016 at 02:51:25PM +0200, Ard Biesheuvel wrote:
> Ard Biesheuvel (4):
> arm64: mm: add param to force create_pgd_mapping() to use page
> mappings
> arm64: efi: always map runtime services code and data regions down to
> pages
> arm64: efi: avoid block mappings for unaligned UEFI memory regions
> arm64: mm: fold init_pgd() into __create_pgd_mapping()
>
> Catalin Marinas (1):
> arm64: mm: Remove split_p*d() functions
I plan to queue these patches for 4.8. Do you have any other patches
based on these so that I need to create a separate branch? Otherwise,
I'll just push them on top of arm64 for-next/core branch.
Thanks.
--
Catalin
* [PATCH 0/5] arm64: avoid block entries that we need to split later
2016-06-30 16:13 ` [PATCH 0/5] arm64: avoid block entries that we need to split later Catalin Marinas
@ 2016-06-30 16:16 ` Ard Biesheuvel
0 siblings, 0 replies; 20+ messages in thread
From: Ard Biesheuvel @ 2016-06-30 16:16 UTC (permalink / raw)
To: linux-arm-kernel
On 30 June 2016 at 18:13, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Wed, Jun 29, 2016 at 02:51:25PM +0200, Ard Biesheuvel wrote:
>> Ard Biesheuvel (4):
>> arm64: mm: add param to force create_pgd_mapping() to use page
>> mappings
>> arm64: efi: always map runtime services code and data regions down to
>> pages
>> arm64: efi: avoid block mappings for unaligned UEFI memory regions
>> arm64: mm: fold init_pgd() into __create_pgd_mapping()
>>
>> Catalin Marinas (1):
>> arm64: mm: Remove split_p*d() functions
>
> I plan to queue these patches for 4.8. Do you have any other patches
> based on these so that I need to create a separate branch? Otherwise,
> I'll just push them on top of arm64 for-next/core branch.
>
No, they can go right on top of arm64/for-next/core AFAICT
* [PATCH 2/5] arm64: efi: always map runtime services code and data regions down to pages
2016-06-29 12:51 ` [PATCH 2/5] arm64: efi: always map runtime services code and data regions down to pages Ard Biesheuvel
@ 2016-07-22 14:30 ` Sudeep Holla
2016-07-22 16:27 ` Ard Biesheuvel
0 siblings, 1 reply; 20+ messages in thread
From: Sudeep Holla @ 2016-07-22 14:30 UTC (permalink / raw)
To: linux-arm-kernel
Hi Ard,
On 29/06/16 13:51, Ard Biesheuvel wrote:
> To avoid triggering diagnostics in the MMU code that are finicky about
> splitting block mappings into more granular mappings, ensure that regions
> that are likely to appear in the Memory Attributes table as well as the
> UEFI memory map are always mapped down to pages. This way, we can use
> apply_to_page_range() instead of create_pgd_mapping() for the second pass,
> which cannot split or merge block entries, and operates strictly on PTEs.
>
> Note that this aligns the arm64 Memory Attributes table handling code with
> the ARM code, which already uses apply_to_page_range() to set the strict
> permissions.
>
This patch is merged in arm64/for-next/core now and when I try that
branch with defconfig + CONFIG_PROVE_LOCKING, I get the following splat
on boot and it fails to boot further on Juno.
I could bisect that to this patch (commit bd264d046aad ("arm64: efi:
always map runtime services code and data regions down to pages") in
that branch).
Regards,
Sudeep
-->8
efi: memattr: Processing EFI Memory Attributes table:
efi: memattr: 0x0000f9400000-0x0000f942ffff [Runtime Data |RUN| |
|XP| | | | | | | | ]
Unable to handle kernel NULL pointer dereference at virtual address 00000018
pgd = ffff000009aa4000
[00000018] *pgd=00000009ffffe003, *pud=00000009ffffd003,
*pmd=0000000000000000
Internal error: Oops: 96000004 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.7.0-rc7-next-20160722 #134
Hardware name: ARM Juno development board (r2) (DT)
task: ffff800976ca0000 task.stack: ffff800976c3c000
PC is at __lock_acquire+0x13c/0x19e0
LR is at lock_acquire+0x4c/0x68
pc : [<ffff000008104544>] lr : [<ffff000008106114>] pstate: 200000c5
....
__lock_acquire+0x13c/0x19e0
lock_acquire+0x4c/0x68
_raw_spin_lock+0x40/0x58
apply_to_page_range+0x18c/0x388
efi_set_mapping_permissions+0x34/0x44
efi_memattr_apply_permissions+0x200/0x2a8
arm_enable_runtime_services+0x1b4/0x1fc
do_one_initcall+0x38/0x128
kernel_init_freeable+0x84/0x1f0
kernel_init+0x10/0x100
ret_from_fork+0x10/0x40
Code: 5280003c 79004401 140000b5 b000b880 (f9400282)
---[ end trace 892120beb6681b4e ]---
* [PATCH 2/5] arm64: efi: always map runtime services code and data regions down to pages
2016-07-22 14:30 ` Sudeep Holla
@ 2016-07-22 16:27 ` Ard Biesheuvel
2016-07-22 16:36 ` Suzuki K Poulose
0 siblings, 1 reply; 20+ messages in thread
From: Ard Biesheuvel @ 2016-07-22 16:27 UTC (permalink / raw)
To: linux-arm-kernel
On 22 July 2016 at 16:30, Sudeep Holla <sudeep.holla@arm.com> wrote:
> Hi Ard,
>
> On 29/06/16 13:51, Ard Biesheuvel wrote:
>>
>> To avoid triggering diagnostics in the MMU code that are finicky about
>> splitting block mappings into more granular mappings, ensure that regions
>> that are likely to appear in the Memory Attributes table as well as the
>> UEFI memory map are always mapped down to pages. This way, we can use
>> apply_to_page_range() instead of create_pgd_mapping() for the second pass,
>> which cannot split or merge block entries, and operates strictly on PTEs.
>>
>> Note that this aligns the arm64 Memory Attributes table handling code with
>> the ARM code, which already uses apply_to_page_range() to set the strict
>> permissions.
>>
>
> This patch is merged in arm64/for-next/core now and when I try that
> branch with defconfig + CONFIG_PROVE_LOCKING, I get the following splat
> on boot and it fails to boot further on Juno.
>
> I could bisect that to this patch(Commit bd264d046aad ("arm64: efi:
> always map runtime services code and data regions down to pages") in
> that branch)
>
Hi Sudeep,
I can reproduce this on QEMU as well. It appears that
apply_to_page_range() expects pages containing translation tables to
have their per-page spinlock initialized if they are not part of
init_mm.
This
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -272,6 +272,7 @@ static phys_addr_t late_pgtable_alloc(void)
{
void *ptr = (void *)__get_free_page(PGALLOC_GFP);
BUG_ON(!ptr);
+ BUG_ON(!pgtable_page_ctor(virt_to_page(ptr)));
/* Ensure the zeroed page is visible to the page table walker */
dsb(ishst);
makes the problem go away for me (just as a temporary hack) but I will
try to come up with something more appropriate, and check if ARM has
the same issue (since it uses apply_to_page_range() as well)
--
Ard.
> -->8
>
> efi: memattr: Processing EFI Memory Attributes table:
> efi: memattr: 0x0000f9400000-0x0000f942ffff [Runtime Data |RUN| |
> |XP| | | | | | | | ]
> Unable to handle kernel NULL pointer dereference at virtual address 00000018
> pgd = ffff000009aa4000
> [00000018] *pgd=00000009ffffe003, *pud=00000009ffffd003,
> *pmd=0000000000000000
> Internal error: Oops: 96000004 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.7.0-rc7-next-20160722 #134
> Hardware name: ARM Juno development board (r2) (DT)
> task: ffff800976ca0000 task.stack: ffff800976c3c000
> PC is at __lock_acquire+0x13c/0x19e0
> LR is at lock_acquire+0x4c/0x68
> pc : [<ffff000008104544>] lr : [<ffff000008106114>] pstate: 200000c5
> ....
> __lock_acquire+0x13c/0x19e0
> lock_acquire+0x4c/0x68
> _raw_spin_lock+0x40/0x58
> apply_to_page_range+0x18c/0x388
> efi_set_mapping_permissions+0x34/0x44
> efi_memattr_apply_permissions+0x200/0x2a8
> arm_enable_runtime_services+0x1b4/0x1fc
> do_one_initcall+0x38/0x128
> kernel_init_freeable+0x84/0x1f0
> kernel_init+0x10/0x100
> ret_from_fork+0x10/0x40
> Code: 5280003c 79004401 140000b5 b000b880 (f9400282)
> ---[ end trace 892120beb6681b4e ]---
>
* [PATCH 2/5] arm64: efi: always map runtime services code and data regions down to pages
2016-07-22 16:27 ` Ard Biesheuvel
@ 2016-07-22 16:36 ` Suzuki K Poulose
2016-07-22 16:45 ` Ard Biesheuvel
0 siblings, 1 reply; 20+ messages in thread
From: Suzuki K Poulose @ 2016-07-22 16:36 UTC (permalink / raw)
To: linux-arm-kernel
On 22/07/16 17:27, Ard Biesheuvel wrote:
> On 22 July 2016 at 16:30, Sudeep Holla <sudeep.holla@arm.com> wrote:
>> Hi Ard,
>>
>> On 29/06/16 13:51, Ard Biesheuvel wrote:
>>>
>>> To avoid triggering diagnostics in the MMU code that are finicky about
>>> splitting block mappings into more granular mappings, ensure that regions
>>> that are likely to appear in the Memory Attributes table as well as the
>>> UEFI memory map are always mapped down to pages. This way, we can use
>>> apply_to_page_range() instead of create_pgd_mapping() for the second pass,
>>> which cannot split or merge block entries, and operates strictly on PTEs.
>>>
>>> Note that this aligns the arm64 Memory Attributes table handling code with
>>> the ARM code, which already uses apply_to_page_range() to set the strict
>>> permissions.
>>>
>>
>> This patch is merged in arm64/for-next/core now and when I try that
>> branch with defconfig + CONFIG_PROVE_LOCKING, I get the following splat
>> on boot and it fails to boot further on Juno.
>>
>> I could bisect that to this patch(Commit bd264d046aad ("arm64: efi:
>> always map runtime services code and data regions down to pages") in
>> that branch)
>>
>
> Hi Sudeep,
>
> I can reproduce this on QEMU as well. It appears that
> apply_to_page_range() expects pages containing translation tables to
> have their per-page spinlock initialized if they are not part of
> init_mm.
>
> This
>
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -272,6 +272,7 @@ static phys_addr_t late_pgtable_alloc(void)
> {
> void *ptr = (void *)__get_free_page(PGALLOC_GFP);
> BUG_ON(!ptr);
> + BUG_ON(!pgtable_page_ctor(virt_to_page(ptr)));
>
> /* Ensure the zeroed page is visible to the page table walker */
> dsb(ishst);
>
> makes the problem go away for me (just as a temporary hack) but I will
> try to come up with something more appropriate, and check if ARM has
> the same issue (since it uses apply_to_page_range() as well)
>
Ard,
I took a quick look at it. Looks like we don't initialise the page-table
pages allocated via late_pgtable_alloc(). Since we allocate them for an
mm != init_mm, the lock validator comes into the picture and finds a lock
which is not initialised. The following patch fixes the issue, but it is
not a perfect one and may need a little polishing.
----8>----
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index a96a241..d312667 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -270,12 +270,12 @@ static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
static phys_addr_t late_pgtable_alloc(void)
{
- void *ptr = (void *)__get_free_page(PGALLOC_GFP);
- BUG_ON(!ptr);
+ struct page *page = pte_alloc_one(NULL, 0);
+ BUG_ON(!page);
/* Ensure the zeroed page is visible to the page table walker */
dsb(ishst);
- return __pa(ptr);
+ return __pa(page_address(page));
}
Thanks
Suzuki
* [PATCH 2/5] arm64: efi: always map runtime services code and data regions down to pages
2016-07-22 16:36 ` Suzuki K Poulose
@ 2016-07-22 16:45 ` Ard Biesheuvel
0 siblings, 0 replies; 20+ messages in thread
From: Ard Biesheuvel @ 2016-07-22 16:45 UTC (permalink / raw)
To: linux-arm-kernel
On 22 July 2016 at 18:36, Suzuki K Poulose <Suzuki.Poulose@arm.com> wrote:
> On 22/07/16 17:27, Ard Biesheuvel wrote:
>>
>> On 22 July 2016 at 16:30, Sudeep Holla <sudeep.holla@arm.com> wrote:
>>>
>>> Hi Ard,
>>>
>>> On 29/06/16 13:51, Ard Biesheuvel wrote:
>>>>
>>>>
>>>> To avoid triggering diagnostics in the MMU code that are finicky about
>>>> splitting block mappings into more granular mappings, ensure that
>>>> regions
>>>> that are likely to appear in the Memory Attributes table as well as the
>>>> UEFI memory map are always mapped down to pages. This way, we can use
>>>> apply_to_page_range() instead of create_pgd_mapping() for the second
>>>> pass,
>>>> which cannot split or merge block entries, and operates strictly on
>>>> PTEs.
>>>>
>>>> Note that this aligns the arm64 Memory Attributes table handling code
>>>> with
>>>> the ARM code, which already uses apply_to_page_range() to set the strict
>>>> permissions.
>>>>
>>>
>>> This patch is merged in arm64/for-next/core now and when I try that
>>> branch with defconfig + CONFIG_PROVE_LOCKING, I get the following splat
>>> on boot and it fails to boot further on Juno.
>>>
>>> I could bisect that to this patch(Commit bd264d046aad ("arm64: efi:
>>> always map runtime services code and data regions down to pages") in
>>> that branch)
>>>
>>
>> Hi Sudeep,
>>
>> I can reproduce this on QEMU as well. It appears that
>> apply_to_page_range() expects pages containing translation tables to
>> have their per-page spinlock initialized if they are not part of
>> init_mm.
>>
>> This
>>
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -272,6 +272,7 @@ static phys_addr_t late_pgtable_alloc(void)
>> {
>> void *ptr = (void *)__get_free_page(PGALLOC_GFP);
>> BUG_ON(!ptr);
>> + BUG_ON(!pgtable_page_ctor(virt_to_page(ptr)));
>>
>> /* Ensure the zeroed page is visible to the page table walker */
>> dsb(ishst);
>>
>> makes the problem go away for me (just as a temporary hack) but I will
>> try to come up with something more appropriate, and check if ARM has
>> the same issue (since it uses apply_to_page_range() as well)
>>
>
> Ard,
>
> I took a quick look at it. Looks like we don't initialise the page-table
> pages allocated via late_pgtable_alloc. Since we allocate it for an mm !=
> init_mm,
> the lock validator comes into picture and finds a lock which is not
> initialised.
> The following patch fixes the issue. But is not a perfect one. May need to
> polish it
> a little bit.
>
> ----8>----
>
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index a96a241..d312667 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -270,12 +270,12 @@ static void __create_pgd_mapping(pgd_t *pgdir,
> phys_addr_t phys,
> static phys_addr_t late_pgtable_alloc(void)
> {
> - void *ptr = (void *)__get_free_page(PGALLOC_GFP);
> - BUG_ON(!ptr);
> + struct page *page = pte_alloc_one(NULL, 0);
> + BUG_ON(!page);
> /* Ensure the zeroed page is visible to the page table walker */
> dsb(ishst);
> - return __pa(ptr);
> + return __pa(page_address(page));
> }
>
Actually, I just sent a response to the same effect. Alternatively, we
could educate apply_to_page_range() to treat the EFI page tables
specially (or simply all statically allocated mm_struct instances),
e.g.,
diff --git a/mm/memory.c b/mm/memory.c
index 15322b73636b..dc6145129170 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1852,8 +1852,11 @@ static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd,
int err;
pgtable_t token;
spinlock_t *uninitialized_var(ptl);
+ bool is_kernel_mm;
- pte = (mm == &init_mm) ?
+ is_kernel_mm = (mm == &init_mm || core_kernel_data((unsigned long)mm));
+
+ pte = is_kernel_mm ?
pte_alloc_kernel(pmd, addr) :
pte_alloc_map_lock(mm, pmd, addr, &ptl);
if (!pte)
@@ -1873,7 +1876,7 @@ static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd,
arch_leave_lazy_mmu_mode();
- if (mm != &init_mm)
+ if (!is_kernel_mm)
pte_unmap_unlock(pte-1, ptl);
return err;
}
but it seems more appropriate to initialize the struct pages
correctly, to preemptively deal with other code that may make similar
assumptions.
I also noticed that create_mapping_late() and create_mapping_noalloc()
are essentially the same, since the only two invocations of the former
should not split block entries, and simply remap regions that have
already been mapped with stricter permissions. This means
late_pgtable_alloc() is only used by create_pgd_mapping(), which is only
used by the EFI code. So that allows for some shortcuts to be taken, I
would think. The only downside is that I will need to fix it in two
places (arm64 and ARM).
Anyway, thanks for the suggestion.
--
Ard.