linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH 0/6] arm64: stable UEFI mappings for kexec
@ 2014-10-24 12:39 Ard Biesheuvel
  2014-10-24 12:39 ` [PATCH 1/6] arm64/mm: add explicit struct_mm argument to __create_mapping() Ard Biesheuvel
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Ard Biesheuvel @ 2014-10-24 12:39 UTC (permalink / raw)
  To: linux-arm-kernel

This series, which can be applied on top of the 'efi for 3.19' series I posted a
couple of days ago, reworks the UEFI virtual remapping logic so that we can
support kexec cleanly.

The main premise of these patches is that, in order to support kexec, we need
to add code to the kernel that is able to deal with the state of the firmware
after SetVirtualAddressMap() [SVAM] has been called. However, if we are going to
deal with that state anyway, why not make it the default, and have only a
single code path for both cases?

This means SVAM() needs to move to the stub, and hence the code that invents
the virtual layout needs to move with it. The result is that the kernel proper
is entered with the virt_addr members of all EFI_MEMORY_RUNTIME regions
assigned, and with the mapping already installed into the firmware. The kernel
proper then only needs to set up the page tables, and switch to them while
performing Runtime Services calls. Note that there is also an efi_to_phys()
helper to translate the values of the fw_vendor and tables fields of the EFI
system table back to physical addresses. Again, this is something we need to do
anyway under kexec, or we end up handing over state from one kernel to the
next, which implies different code paths between non-kexec and kexec.
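
As an illustration (made-up addresses, not taken from any particular firmware):
if the stub assigned virt_addr 0x40020000 to the RUNTIME region starting at
phys_addr 0x809e0000, and the firmware rewrote efi.systab->tables to 0x40021000
during SVAM(), then

  efi_to_phys(0x40021000) == 0x809e0000 + (0x40021000 - 0x40020000)
                          == 0x809e1000

which is the address we feed to early_memremap() before parsing the config
tables.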

The layout is chosen such that it uses the userland half of the virtual address
space (TTBR0), which means no additional alignment or reservation is required
to ensure that it will always be available.
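
To make that concrete, this is roughly what happens around every Runtime
Services call once the series is applied (paraphrasing the efi_call_virt()
changes in patch 4, with get_time() as an arbitrary example; not literal code):

  kernel_neon_begin();                  /* also disables preemption */
  efi_set_pgd(&efi_mm);                 /* install the EFI pgd in TTBR0 */
  status = efi.systab->runtime->get_time(&tm, NULL);
  efi_set_pgd(current->active_mm);      /* restore the task's pgd */
  kernel_neon_end();

Since the runtime regions live entirely in the TTBR0 half, the kernel's own
TTBR1 mappings are never disturbed.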

One thing that may stand out is the reordering of the memory map. The reason
for doing this is that we can use the same memory map as input to SVAM(). The
alternative is allocating memory for it using boot services, but that clutters
up the existing logic a bit between getting the memory map, populating the fdt,
and looping again if it didn't fit. The current code works perfectly fine, but
I am aware that it is an acquired taste :-)
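
Concretely, after the stub has reordered the map, the buffer returned by
GetMemoryMap() (and stashed in the FDT) looks like this, so its head can be
handed to SVAM() verbatim:

  [ RUNTIME ][ RUNTIME ] ... [ RUNTIME ][ other ][ other ] ...
   `-- first runtime_entry_count descriptors --'

  status = svam(runtime_entry_count * desc_size, desc_size,
                desc_ver, memory_map);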

Ard Biesheuvel (6):
  arm64/mm: add explicit struct_mm argument to __create_mapping()
  arm64/mm: add create_pgd_mapping() to create private page tables
  efi: split off remapping code from efi_config_init()
  arm64/efi: move SetVirtualAddressMap() to UEFI stub
  arm64/efi: remove free_boot_services() and friends
  arm64/efi: remove idmap manipulations from UEFI code

 arch/arm64/include/asm/efi.h       |  23 ++-
 arch/arm64/include/asm/mmu.h       |   5 +-
 arch/arm64/kernel/efi.c            | 336 +++++++++++--------------------------
 arch/arm64/kernel/setup.c          |   2 +-
 arch/arm64/mm/mmu.c                |  47 +++---
 drivers/firmware/efi/efi.c         |  49 ++++--
 drivers/firmware/efi/libstub/fdt.c | 104 +++++++++++-
 include/linux/efi.h                |   2 +
 8 files changed, 281 insertions(+), 287 deletions(-)

-- 
1.8.3.2

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/6] arm64/mm: add explicit struct_mm argument to __create_mapping()
  2014-10-24 12:39 [PATCH 0/6] arm64: stable UEFI mappings for kexec Ard Biesheuvel
@ 2014-10-24 12:39 ` Ard Biesheuvel
  2014-10-24 15:23   ` Steve Capper
  2014-10-24 12:39 ` [PATCH 2/6] arm64/mm: add create_pgd_mapping() to create private page tables Ard Biesheuvel
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Ard Biesheuvel @ 2014-10-24 12:39 UTC (permalink / raw)
  To: linux-arm-kernel

Currently, swapper_pg_dir and idmap_pg_dir share the init_mm mm_struct
instance. To allow the introduction of other pg_dir instances, e.g., for
UEFI's mapping of Runtime Services, make the mm_struct instance an
explicit argument that gets passed down to the pmd and pte instantiation
functions. Note that the consumers (pud_populate/pgd_populate) of the
mm_struct argument don't actually inspect it, but let's pass the correct
one for correctness' sake.
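
The point of threading the mm_struct through becomes visible later in the
series, where a private set of page tables is populated along these lines
(sketch only -- 'efi_mm' is introduced in a later patch):

	__create_mapping(&efi_mm, pgd_offset(&efi_mm, virt), phys, virt,
			 size, map_io);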

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/mm/mmu.c | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 6894ef3e6234..7eaa6a8c8467 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -155,9 +155,9 @@ static void __init alloc_init_pte(pmd_t *pmd, unsigned long addr,
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 }
 
-static void __init alloc_init_pmd(pud_t *pud, unsigned long addr,
-				  unsigned long end, phys_addr_t phys,
-				  int map_io)
+static void __init alloc_init_pmd(struct mm_struct *mm, pud_t *pud,
+				  unsigned long addr, unsigned long end,
+				  phys_addr_t phys, int map_io)
 {
 	pmd_t *pmd;
 	unsigned long next;
@@ -177,7 +177,7 @@ static void __init alloc_init_pmd(pud_t *pud, unsigned long addr,
 	 */
 	if (pud_none(*pud) || pud_bad(*pud)) {
 		pmd = early_alloc(PTRS_PER_PMD * sizeof(pmd_t));
-		pud_populate(&init_mm, pud, pmd);
+		pud_populate(mm, pud, pmd);
 	}
 
 	pmd = pmd_offset(pud, addr);
@@ -201,16 +201,16 @@ static void __init alloc_init_pmd(pud_t *pud, unsigned long addr,
 	} while (pmd++, addr = next, addr != end);
 }
 
-static void __init alloc_init_pud(pgd_t *pgd, unsigned long addr,
-				  unsigned long end, unsigned long phys,
-				  int map_io)
+static void __init alloc_init_pud(struct mm_struct *mm, pgd_t *pgd,
+				  unsigned long addr, unsigned long end,
+				  unsigned long phys, int map_io)
 {
 	pud_t *pud;
 	unsigned long next;
 
 	if (pgd_none(*pgd)) {
 		pud = early_alloc(PTRS_PER_PUD * sizeof(pud_t));
-		pgd_populate(&init_mm, pgd, pud);
+		pgd_populate(mm, pgd, pud);
 	}
 	BUG_ON(pgd_bad(*pgd));
 
@@ -239,7 +239,7 @@ static void __init alloc_init_pud(pgd_t *pgd, unsigned long addr,
 				flush_tlb_all();
 			}
 		} else {
-			alloc_init_pmd(pud, addr, next, phys, map_io);
+			alloc_init_pmd(mm, pud, addr, next, phys, map_io);
 		}
 		phys += next - addr;
 	} while (pud++, addr = next, addr != end);
@@ -249,9 +249,9 @@ static void __init alloc_init_pud(pgd_t *pgd, unsigned long addr,
  * Create the page directory entries and any necessary page tables for the
  * mapping specified by 'md'.
  */
-static void __init __create_mapping(pgd_t *pgd, phys_addr_t phys,
-				    unsigned long virt, phys_addr_t size,
-				    int map_io)
+static void __init __create_mapping(struct mm_struct *mm, pgd_t *pgd,
+				    phys_addr_t phys, unsigned long virt,
+				    phys_addr_t size, int map_io)
 {
 	unsigned long addr, length, end, next;
 
@@ -261,7 +261,7 @@ static void __init __create_mapping(pgd_t *pgd, phys_addr_t phys,
 	end = addr + length;
 	do {
 		next = pgd_addr_end(addr, end);
-		alloc_init_pud(pgd, addr, next, phys, map_io);
+		alloc_init_pud(mm, pgd, addr, next, phys, map_io);
 		phys += next - addr;
 	} while (pgd++, addr = next, addr != end);
 }
@@ -274,7 +274,8 @@ static void __init create_mapping(phys_addr_t phys, unsigned long virt,
 			&phys, virt);
 		return;
 	}
-	__create_mapping(pgd_offset_k(virt & PAGE_MASK), phys, virt, size, 0);
+	__create_mapping(&init_mm, pgd_offset_k(virt & PAGE_MASK), phys, virt,
+			 size, 0);
 }
 
 void __init create_id_mapping(phys_addr_t addr, phys_addr_t size, int map_io)
@@ -283,7 +284,7 @@ void __init create_id_mapping(phys_addr_t addr, phys_addr_t size, int map_io)
 		pr_warn("BUG: not creating id mapping for %pa\n", &addr);
 		return;
 	}
-	__create_mapping(&idmap_pg_dir[pgd_index(addr)],
+	__create_mapping(&init_mm, &idmap_pg_dir[pgd_index(addr)],
 			 addr, addr, size, map_io);
 }
 
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/6] arm64/mm: add create_pgd_mapping() to create private page tables
  2014-10-24 12:39 [PATCH 0/6] arm64: stable UEFI mappings for kexec Ard Biesheuvel
  2014-10-24 12:39 ` [PATCH 1/6] arm64/mm: add explicit struct_mm argument to __create_mapping() Ard Biesheuvel
@ 2014-10-24 12:39 ` Ard Biesheuvel
  2014-10-24 15:17   ` Steve Capper
  2014-10-24 12:39 ` [PATCH 3/6] efi: split off remapping code from efi_config_init() Ard Biesheuvel
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Ard Biesheuvel @ 2014-10-24 12:39 UTC (permalink / raw)
  To: linux-arm-kernel

For UEFI, we need to install the memory mappings used for Runtime Services
in a dedicated set of page tables. Add create_pgd_mapping(), which allows
us to allocate and install those page table entries early.
This also adds a 'map_xn' option that creates mappings with the PXN and
UXN bits set.
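
For reference, the intended use (as it appears later in this series when the
UEFI runtime regions are mapped) looks roughly like this:

	create_pgd_mapping(&efi_mm, paddr, md->virt_addr, size,
			   !is_normal_ram(md),		/* map_io */
			   md->type != EFI_RUNTIME_SERVICES_CODE); /* map_xn */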

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/mmu.h |  3 +++
 arch/arm64/mm/mmu.c          | 28 ++++++++++++++++++++--------
 2 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index c2f006c48bdb..bcf166043a8b 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -33,5 +33,8 @@ extern void __iomem *early_io_map(phys_addr_t phys, unsigned long virt);
 extern void init_mem_pgprot(void);
 /* create an identity mapping for memory (or io if map_io is true) */
 extern void create_id_mapping(phys_addr_t addr, phys_addr_t size, int map_io);
+extern void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
+			       unsigned long virt, phys_addr_t size,
+			       int map_io, int map_xn);
 
 #endif
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 7eaa6a8c8467..f7d17a5a1f56 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -157,7 +157,7 @@ static void __init alloc_init_pte(pmd_t *pmd, unsigned long addr,
 
 static void __init alloc_init_pmd(struct mm_struct *mm, pud_t *pud,
 				  unsigned long addr, unsigned long end,
-				  phys_addr_t phys, int map_io)
+				  phys_addr_t phys, int map_io, int map_xn)
 {
 	pmd_t *pmd;
 	unsigned long next;
@@ -167,6 +167,9 @@ static void __init alloc_init_pmd(struct mm_struct *mm, pud_t *pud,
 	if (map_io) {
 		prot_sect = PROT_SECT_DEVICE_nGnRE;
 		prot_pte = __pgprot(PROT_DEVICE_nGnRE);
+	} else if (map_xn) {
+		prot_sect = PROT_SECT_NORMAL;
+		prot_pte = PAGE_KERNEL;
 	} else {
 		prot_sect = PROT_SECT_NORMAL_EXEC;
 		prot_pte = PAGE_KERNEL_EXEC;
@@ -203,7 +206,7 @@ static void __init alloc_init_pmd(struct mm_struct *mm, pud_t *pud,
 
 static void __init alloc_init_pud(struct mm_struct *mm, pgd_t *pgd,
 				  unsigned long addr, unsigned long end,
-				  unsigned long phys, int map_io)
+				  unsigned long phys, int map_io, int map_xn)
 {
 	pud_t *pud;
 	unsigned long next;
@@ -221,7 +224,7 @@ static void __init alloc_init_pud(struct mm_struct *mm, pgd_t *pgd,
 		/*
 		 * For 4K granule only, attempt to put down a 1GB block
 		 */
-		if (!map_io && (PAGE_SHIFT == 12) &&
+		if (!map_io && !map_xn && (PAGE_SHIFT == 12) &&
 		    ((addr | next | phys) & ~PUD_MASK) == 0) {
 			pud_t old_pud = *pud;
 			set_pud(pud, __pud(phys | PROT_SECT_NORMAL_EXEC));
@@ -239,7 +242,8 @@ static void __init alloc_init_pud(struct mm_struct *mm, pgd_t *pgd,
 				flush_tlb_all();
 			}
 		} else {
-			alloc_init_pmd(mm, pud, addr, next, phys, map_io);
+			alloc_init_pmd(mm, pud, addr, next, phys, map_io,
+				       map_xn);
 		}
 		phys += next - addr;
 	} while (pud++, addr = next, addr != end);
@@ -251,7 +255,7 @@ static void __init alloc_init_pud(struct mm_struct *mm, pgd_t *pgd,
  */
 static void __init __create_mapping(struct mm_struct *mm, pgd_t *pgd,
 				    phys_addr_t phys, unsigned long virt,
-				    phys_addr_t size, int map_io)
+				    phys_addr_t size, int map_io, int map_xn)
 {
 	unsigned long addr, length, end, next;
 
@@ -261,7 +265,7 @@ static void __init __create_mapping(struct mm_struct *mm, pgd_t *pgd,
 	end = addr + length;
 	do {
 		next = pgd_addr_end(addr, end);
-		alloc_init_pud(mm, pgd, addr, next, phys, map_io);
+		alloc_init_pud(mm, pgd, addr, next, phys, map_io, map_xn);
 		phys += next - addr;
 	} while (pgd++, addr = next, addr != end);
 }
@@ -275,7 +279,7 @@ static void __init create_mapping(phys_addr_t phys, unsigned long virt,
 		return;
 	}
 	__create_mapping(&init_mm, pgd_offset_k(virt & PAGE_MASK), phys, virt,
-			 size, 0);
+			 size, 0, 0);
 }
 
 void __init create_id_mapping(phys_addr_t addr, phys_addr_t size, int map_io)
@@ -285,7 +289,15 @@ void __init create_id_mapping(phys_addr_t addr, phys_addr_t size, int map_io)
 		return;
 	}
 	__create_mapping(&init_mm, &idmap_pg_dir[pgd_index(addr)],
-			 addr, addr, size, map_io);
+			 addr, addr, size, map_io, 0);
+}
+
+void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
+			       unsigned long virt, phys_addr_t size,
+			       int map_io, int map_xn)
+{
+	__create_mapping(mm, pgd_offset(mm, virt), phys, virt, size, map_io,
+			 map_xn);
 }
 
 static void __init map_mem(void)
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/6] efi: split off remapping code from efi_config_init()
  2014-10-24 12:39 [PATCH 0/6] arm64: stable UEFI mappings for kexec Ard Biesheuvel
  2014-10-24 12:39 ` [PATCH 1/6] arm64/mm: add explicit struct_mm argument to __create_mapping() Ard Biesheuvel
  2014-10-24 12:39 ` [PATCH 2/6] arm64/mm: add create_pgd_mapping() to create private page tables Ard Biesheuvel
@ 2014-10-24 12:39 ` Ard Biesheuvel
  2014-10-24 12:39 ` [PATCH 4/6] arm64/efi: move SetVirtualAddressMap() to UEFI stub Ard Biesheuvel
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Ard Biesheuvel @ 2014-10-24 12:39 UTC (permalink / raw)
  To: linux-arm-kernel

Split off the remapping code from efi_config_init() so that the caller
can perform its own remapping. This is necessary to correctly handle
virtually remapped UEFI memory regions under kexec, as efi.systab will
have been updated to a virtual address.
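
A caller that needs to perform its own remapping (as the arm64 code will do in
the next patch, using its efi_to_phys() helper) can then do something along
these lines:

	table_size = sizeof(efi_config_table_64_t) * efi.systab->nr_tables;
	config_tables = early_memremap(efi_to_phys(efi.systab->tables),
				       table_size);
	retval = efi_config_parse_tables(config_tables,
					 efi.systab->nr_tables, NULL);
	early_memunmap(config_tables, table_size);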

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 drivers/firmware/efi/efi.c | 49 +++++++++++++++++++++++++++++-----------------
 include/linux/efi.h        |  2 ++
 2 files changed, 33 insertions(+), 18 deletions(-)

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 9035c1b74d58..0de77db4fb88 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -293,9 +293,10 @@ static __init int match_config_table(efi_guid_t *guid,
 	return 0;
 }
 
-int __init efi_config_init(efi_config_table_type_t *arch_tables)
+int __init efi_config_parse_tables(void *config_tables, int count,
+				   efi_config_table_type_t *arch_tables)
 {
-	void *config_tables, *tablep;
+	void *tablep;
 	int i, sz;
 
 	if (efi_enabled(EFI_64BIT))
@@ -303,19 +304,9 @@ int __init efi_config_init(efi_config_table_type_t *arch_tables)
 	else
 		sz = sizeof(efi_config_table_32_t);
 
-	/*
-	 * Let's see what config tables the firmware passed to us.
-	 */
-	config_tables = early_memremap(efi.systab->tables,
-				       efi.systab->nr_tables * sz);
-	if (config_tables == NULL) {
-		pr_err("Could not map Configuration table!\n");
-		return -ENOMEM;
-	}
-
 	tablep = config_tables;
 	pr_info("");
-	for (i = 0; i < efi.systab->nr_tables; i++) {
+	for (i = 0; i < count; i++) {
 		efi_guid_t guid;
 		unsigned long table;
 
@@ -328,8 +319,6 @@ int __init efi_config_init(efi_config_table_type_t *arch_tables)
 			if (table64 >> 32) {
 				pr_cont("\n");
 				pr_err("Table located above 4GB, disabling EFI.\n");
-				early_memunmap(config_tables,
-					       efi.systab->nr_tables * sz);
 				return -EINVAL;
 			}
 #endif
@@ -344,13 +333,37 @@ int __init efi_config_init(efi_config_table_type_t *arch_tables)
 		tablep += sz;
 	}
 	pr_cont("\n");
-	early_memunmap(config_tables, efi.systab->nr_tables * sz);
-
 	set_bit(EFI_CONFIG_TABLES, &efi.flags);
-
 	return 0;
 }
 
+int __init efi_config_init(efi_config_table_type_t *arch_tables)
+{
+	void *config_tables;
+	int sz, ret;
+
+	if (efi_enabled(EFI_64BIT))
+		sz = sizeof(efi_config_table_64_t);
+	else
+		sz = sizeof(efi_config_table_32_t);
+
+	/*
+	 * Let's see what config tables the firmware passed to us.
+	 */
+	config_tables = early_memremap(efi.systab->tables,
+				       efi.systab->nr_tables * sz);
+	if (config_tables == NULL) {
+		pr_err("Could not map Configuration table!\n");
+		return -ENOMEM;
+	}
+
+	ret = efi_config_parse_tables(config_tables, efi.systab->nr_tables,
+				      arch_tables);
+
+	early_memunmap(config_tables, efi.systab->nr_tables * sz);
+	return ret;
+}
+
 #ifdef CONFIG_EFI_VARS_MODULE
 static int __init efi_load_efivars(void)
 {
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 27d69388ded0..abd3e91964a6 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -875,6 +875,8 @@ static inline efi_status_t efi_query_variable_store(u32 attributes, unsigned lon
 #endif
 extern void __iomem *efi_lookup_mapped_addr(u64 phys_addr);
 extern int efi_config_init(efi_config_table_type_t *arch_tables);
+extern int efi_config_parse_tables(void *config_tables, int count,
+				   efi_config_table_type_t *arch_tables);
 extern u64 efi_get_iobase (void);
 extern u32 efi_mem_type (unsigned long phys_addr);
 extern u64 efi_mem_attributes (unsigned long phys_addr);
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/6] arm64/efi: move SetVirtualAddressMap() to UEFI stub
  2014-10-24 12:39 [PATCH 0/6] arm64: stable UEFI mappings for kexec Ard Biesheuvel
                   ` (2 preceding siblings ...)
  2014-10-24 12:39 ` [PATCH 3/6] efi: split off remapping code from efi_config_init() Ard Biesheuvel
@ 2014-10-24 12:39 ` Ard Biesheuvel
  2014-10-24 13:39   ` Grant Likely
  2014-10-28  7:47   ` Dave Young
  2014-10-24 12:39 ` [PATCH 5/6] arm64/efi: remove free_boot_services() and friends Ard Biesheuvel
  2014-10-24 12:39 ` [PATCH 6/6] arm64/efi: remove idmap manipulations from UEFI code Ard Biesheuvel
  5 siblings, 2 replies; 18+ messages in thread
From: Ard Biesheuvel @ 2014-10-24 12:39 UTC (permalink / raw)
  To: linux-arm-kernel

In order to support kexec, the kernel needs to be able to deal with the
state of the UEFI firmware after SetVirtualAddressMap() has been called.
To avoid having separate code paths for non-kexec and kexec, let's move
the call to SetVirtualAddressMap() to the stub: this will guarantee us
that it will only be called once (since the stub is not executed during
kexec), and ensures that the UEFI state is identical between kexec and
normal boot.

This implies that the layout of the virtual mapping needs to be created
by the stub as well. All regions are rounded up to a naturally aligned
multiple of 64 KB (for compatibility with 64k pages kernels) and recorded
in the UEFI memory map. The kernel proper reads those values and installs
the mappings in a dedicated set of page tables that are swapped in during
UEFI Runtime Services calls.
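
A worked example of the rounding (illustrative numbers): a runtime region with
phys_addr 0x8007a000 and num_pages 3 (i.e., 12 KB) has its start rounded down
to 0x80070000 and its size rounded up to 64 KB, so the stub assigns

  virt_addr = efi_virt_base + (0x8007a000 - 0x80070000)
            = efi_virt_base + 0xa000

and bumps efi_virt_base by 64 KB for the next region. Every mapping is thereby
expressible with 64 KB pages, whichever page size the current or kexec'd
kernel happens to use.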

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/efi.h       |  19 +++-
 arch/arm64/kernel/efi.c            | 205 +++++++++++++++++++------------------
 drivers/firmware/efi/libstub/fdt.c | 104 ++++++++++++++++++-
 3 files changed, 224 insertions(+), 104 deletions(-)

diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h
index a34fd3b12e2b..d752e5480096 100644
--- a/arch/arm64/include/asm/efi.h
+++ b/arch/arm64/include/asm/efi.h
@@ -12,23 +12,32 @@ extern void efi_idmap_init(void);
 #define efi_idmap_init()
 #endif
 
+void efi_load_rt_mapping(void);
+void efi_unload_rt_mapping(void);
+
 #define efi_call_virt(f, ...)						\
 ({									\
-	efi_##f##_t *__f = efi.systab->runtime->f;			\
+	efi_##f##_t *__f;						\
 	efi_status_t __s;						\
 									\
-	kernel_neon_begin();						\
+	kernel_neon_begin();	/* disables preemption */		\
+	efi_load_rt_mapping();						\
+	__f = efi.systab->runtime->f;					\
 	__s = __f(__VA_ARGS__);						\
+	efi_unload_rt_mapping();					\
 	kernel_neon_end();						\
 	__s;								\
 })
 
 #define __efi_call_virt(f, ...)						\
 ({									\
-	efi_##f##_t *__f = efi.systab->runtime->f;			\
+	efi_##f##_t *__f;						\
 									\
-	kernel_neon_begin();						\
+	kernel_neon_begin();	/* disables preemption */		\
+	efi_load_rt_mapping();						\
+	__f = efi.systab->runtime->f;					\
 	__f(__VA_ARGS__);						\
+	efi_unload_rt_mapping();					\
 	kernel_neon_end();						\
 })
 
@@ -44,4 +53,6 @@ extern void efi_idmap_init(void);
 
 #define efi_call_early(f, ...) sys_table_arg->boottime->f(__VA_ARGS__)
 
+#define EFI_VIRTMAP		EFI_ARCH_1
+
 #endif /* _ASM_EFI_H */
diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
index baab9344a32b..324398c03acd 100644
--- a/arch/arm64/kernel/efi.c
+++ b/arch/arm64/kernel/efi.c
@@ -20,16 +20,21 @@
 #include <linux/of_fdt.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
+#include <linux/mm_types.h>
+#include <linux/rbtree.h>
+#include <linux/rwsem.h>
+#include <linux/spinlock.h>
+#include <linux/atomic.h>
 
 #include <asm/cacheflush.h>
 #include <asm/efi.h>
 #include <asm/tlbflush.h>
+#include <asm/pgtable.h>
 #include <asm/mmu_context.h>
+#include <asm/mmu.h>
 
 struct efi_memory_map memmap;
 
-static efi_runtime_services_t *runtime;
-
 static u64 efi_system_table;
 
 static int uefi_debug __initdata;
@@ -69,9 +74,36 @@ static void __init efi_setup_idmap(void)
 	}
 }
 
+/*
+ * Translate a EFI virtual address into a physical address: this is necessary,
+ * as some data members of the EFI system table are virtually remapped after
+ * SetVirtualAddressMap() has been called.
+ */
+static phys_addr_t __init efi_to_phys(unsigned long addr)
+{
+	efi_memory_desc_t *md;
+
+	for_each_efi_memory_desc(&memmap, md) {
+		if (!(md->attribute & EFI_MEMORY_RUNTIME))
+			continue;
+		if (md->virt_addr == 0)
+			/* no virtual mapping has been installed by the stub */
+			break;
+		if (md->virt_addr < addr &&
+		    (addr - md->virt_addr) < (md->num_pages << EFI_PAGE_SHIFT))
+			return md->phys_addr + addr - md->virt_addr;
+	}
+
+	WARN_ONCE(1, "UEFI virtual mapping incomplete or missing -- no entry found for 0x%lx\n",
+		  addr);
+	return addr;
+}
+
 static int __init uefi_init(void)
 {
 	efi_char16_t *c16;
+	void *config_tables;
+	u64 table_size;
 	char vendor[100] = "unknown";
 	int i, retval;
 
@@ -99,7 +131,7 @@ static int __init uefi_init(void)
 			efi.systab->hdr.revision & 0xffff);
 
 	/* Show what we know for posterity */
-	c16 = early_memremap(efi.systab->fw_vendor,
+	c16 = early_memremap(efi_to_phys(efi.systab->fw_vendor),
 			     sizeof(vendor));
 	if (c16) {
 		for (i = 0; i < (int) sizeof(vendor) - 1 && *c16; ++i)
@@ -112,8 +144,14 @@ static int __init uefi_init(void)
 		efi.systab->hdr.revision >> 16,
 		efi.systab->hdr.revision & 0xffff, vendor);
 
-	retval = efi_config_init(NULL);
+	table_size = sizeof(efi_config_table_64_t) * efi.systab->nr_tables;
+	config_tables = early_memremap(efi_to_phys(efi.systab->tables),
+				       table_size);
+
+	retval = efi_config_parse_tables(config_tables,
+					 efi.systab->nr_tables, NULL);
 
+	early_memunmap(config_tables, table_size);
 out:
 	early_memunmap(efi.systab,  sizeof(efi_system_table_t));
 	return retval;
@@ -319,60 +357,64 @@ void __init efi_init(void)
 	reserve_regions();
 }
 
+static pgd_t efi_pgd[PTRS_PER_PGD] __page_aligned_bss;
+
+static struct mm_struct efi_mm = {
+	.mm_rb			= RB_ROOT,
+	.pgd			= efi_pgd,
+	.mm_users		= ATOMIC_INIT(2),
+	.mm_count		= ATOMIC_INIT(1),
+	.mmap_sem		= __RWSEM_INITIALIZER(efi_mm.mmap_sem),
+	.page_table_lock	= __SPIN_LOCK_UNLOCKED(efi_mm.page_table_lock),
+	.mmlist			= LIST_HEAD_INIT(efi_mm.mmlist),
+	INIT_MM_CONTEXT(efi_mm)
+};
+
 void __init efi_idmap_init(void)
 {
+	efi_memory_desc_t *md;
+
 	if (!efi_enabled(EFI_BOOT))
 		return;
 
 	/* boot time idmap_pg_dir is incomplete, so fill in missing parts */
 	efi_setup_idmap();
-}
-
-static int __init remap_region(efi_memory_desc_t *md, void **new)
-{
-	u64 paddr, vaddr, npages, size;
-
-	paddr = md->phys_addr;
-	npages = md->num_pages;
-	memrange_efi_to_native(&paddr, &npages);
-	size = npages << PAGE_SHIFT;
-
-	if (is_normal_ram(md))
-		vaddr = (__force u64)ioremap_cache(paddr, size);
-	else
-		vaddr = (__force u64)ioremap(paddr, size);
 
-	if (!vaddr) {
-		pr_err("Unable to remap 0x%llx pages @ %p\n",
-		       npages, (void *)paddr);
-		return 0;
-	}
+	for_each_efi_memory_desc(&memmap, md) {
+		u64 paddr, npages, size;
 
-	/* adjust for any rounding when EFI and system pagesize differs */
-	md->virt_addr = vaddr + (md->phys_addr - paddr);
+		if (!(md->attribute & EFI_MEMORY_RUNTIME))
+			continue;
+		if (md->virt_addr == 0)
+			/* no virtual mapping has been installed by the stub */
+			return;
 
-	if (uefi_debug)
-		pr_info("  EFI remap 0x%012llx => %p\n",
-			md->phys_addr, (void *)md->virt_addr);
+		paddr = md->phys_addr;
+		npages = md->num_pages;
+		memrange_efi_to_native(&paddr, &npages);
+		size = npages << PAGE_SHIFT;
 
-	memcpy(*new, md, memmap.desc_size);
-	*new += memmap.desc_size;
+		if (uefi_debug)
+			pr_info("  EFI remap 0x%012llx => %p\n",
+				md->phys_addr, (void *)md->virt_addr);
 
-	return 1;
+		/*
+		 * Only regions of type EFI_RUNTIME_SERVICES_CODE need to be
+		 * executable, everything else can be mapped with the XN bits
+		 * set. Unfortunately, we cannot map those code regions write
+		 * protect, as they may contain read-write .data sections as
+		 * well.
+		 */
+		create_pgd_mapping(&efi_mm, paddr, md->virt_addr, size,
+				   !is_normal_ram(md),
+				   md->type != EFI_RUNTIME_SERVICES_CODE);
+	}
+	set_bit(EFI_VIRTMAP, &efi.flags);
 }
 
-/*
- * Switch UEFI from an identity map to a kernel virtual map
- */
 static int __init arm64_enter_virtual_mode(void)
 {
-	efi_memory_desc_t *md;
-	phys_addr_t virtmap_phys;
-	void *virtmap, *virt_md;
-	efi_status_t status;
 	u64 mapsize;
-	int count = 0;
-	unsigned long flags;
 
 	if (!efi_enabled(EFI_MEMMAP))
 		return 0;
@@ -393,79 +435,28 @@ static int __init arm64_enter_virtual_mode(void)
 
 	efi.memmap = &memmap;
 
-	/* Map the runtime regions */
-	virtmap = kmalloc(mapsize, GFP_KERNEL);
-	if (!virtmap) {
-		pr_err("Failed to allocate EFI virtual memmap\n");
-		return -1;
-	}
-	virtmap_phys = virt_to_phys(virtmap);
-	virt_md = virtmap;
-
-	for_each_efi_memory_desc(&memmap, md) {
-		if (!(md->attribute & EFI_MEMORY_RUNTIME))
-			continue;
-		if (!remap_region(md, &virt_md))
-			goto err_unmap;
-		++count;
-	}
-
-	efi.systab = (__force void *)efi_lookup_mapped_addr(efi_system_table);
+	efi.systab = (__force void *)ioremap_cache(efi_system_table,
+						   sizeof(efi_system_table_t));
 	if (!efi.systab) {
-		/*
-		 * If we have no virtual mapping for the System Table at this
-		 * point, the memory map doesn't cover the physical offset where
-		 * it resides. This means the System Table will be inaccessible
-		 * to Runtime Services themselves once the virtual mapping is
-		 * installed.
-		 */
-		pr_err("Failed to remap EFI System Table -- buggy firmware?\n");
-		goto err_unmap;
+		pr_err("Failed to remap EFI System Table\n");
+		return -1;
 	}
 	set_bit(EFI_SYSTEM_TABLES, &efi.flags);
 
-	local_irq_save(flags);
-	cpu_switch_mm(idmap_pg_dir, &init_mm);
-
-	/* Call SetVirtualAddressMap with the physical address of the map */
-	runtime = efi.systab->runtime;
-	efi.set_virtual_address_map = runtime->set_virtual_address_map;
-
-	status = efi.set_virtual_address_map(count * memmap.desc_size,
-					     memmap.desc_size,
-					     memmap.desc_version,
-					     (efi_memory_desc_t *)virtmap_phys);
-	cpu_set_reserved_ttbr0();
-	flush_tlb_all();
-	local_irq_restore(flags);
-
-	kfree(virtmap);
-
 	free_boot_services();
 
-	if (status != EFI_SUCCESS) {
-		pr_err("Failed to set EFI virtual address map! [%lx]\n",
-			status);
+	if (!efi_enabled(EFI_VIRTMAP)) {
+		pr_err("No UEFI virtual mapping was installed -- runtime services will not be available\n");
 		return -1;
 	}
 
 	/* Set up runtime services function pointers */
-	runtime = efi.systab->runtime;
 	efi_native_runtime_setup();
 	set_bit(EFI_RUNTIME_SERVICES, &efi.flags);
 
 	efi.runtime_version = efi.systab->hdr.revision;
 
 	return 0;
-
-err_unmap:
-	/* unmap all mappings that succeeded: there are 'count' of those */
-	for (virt_md = virtmap; count--; virt_md += memmap.desc_size) {
-		md = virt_md;
-		iounmap((__force void __iomem *)md->virt_addr);
-	}
-	kfree(virtmap);
-	return -1;
 }
 early_initcall(arm64_enter_virtual_mode);
 
@@ -482,3 +473,21 @@ static int __init arm64_dmi_init(void)
 	return 0;
 }
 core_initcall(arm64_dmi_init);
+
+static void efi_set_pgd(struct mm_struct *mm)
+{
+	cpu_switch_mm(mm->pgd, mm);
+	flush_tlb_all();
+	if (icache_is_aivivt())
+		__flush_icache_all();
+}
+
+void efi_load_rt_mapping(void)
+{
+	efi_set_pgd(&efi_mm);
+}
+
+void efi_unload_rt_mapping(void)
+{
+	efi_set_pgd(current->active_mm);
+}
diff --git a/drivers/firmware/efi/libstub/fdt.c b/drivers/firmware/efi/libstub/fdt.c
index c846a9608cbd..9a5f6e54a423 100644
--- a/drivers/firmware/efi/libstub/fdt.c
+++ b/drivers/firmware/efi/libstub/fdt.c
@@ -168,6 +168,68 @@ fdt_set_fail:
 #endif
 
 /*
+ * This is the base address at which to start allocating virtual memory ranges
+ * for UEFI Runtime Services. This is a userland range so that we can use any
+ * allocation we choose, and eliminate the risk of a conflict after kexec.
+ */
+#define EFI_RT_VIRTUAL_BASE	0x40000000
+
+static void update_memory_map(efi_memory_desc_t *memory_map,
+			      unsigned long map_size, unsigned long desc_size,
+			      int *count)
+{
+	u64 efi_virt_base = EFI_RT_VIRTUAL_BASE;
+	union {
+		efi_memory_desc_t entry;
+		u8 pad[desc_size];
+	} *p, *q, tmp;
+	int i = map_size / desc_size;
+
+	p = (void *)memory_map;
+	for (q = p; i >= 0; i--, q++) {
+		u64 paddr, size;
+
+		if (!(q->entry.attribute & EFI_MEMORY_RUNTIME))
+			continue;
+
+		/*
+		 * Swap the entries around so that all EFI_MEMORY_RUNTIME
+		 * entries bubble to the top. This will allow us to reuse the
+		 * table as input to SetVirtualAddressMap().
+		 */
+		if (q != p) {
+			tmp = *p;
+			*p = *q;
+			*q = tmp;
+		}
+
+		/*
+		 * Make the mapping compatible with 64k pages: this allows
+		 * a 4k page size kernel to kexec a 64k page size kernel and
+		 * vice versa.
+		 */
+		paddr = round_down(p->entry.phys_addr, SZ_64K);
+		size = round_up(p->entry.num_pages * EFI_PAGE_SIZE +
+				p->entry.phys_addr - paddr, SZ_64K);
+
+		/*
+		 * Avoid wasting memory on PTEs by choosing a virtual base that
+		 * is compatible with section mappings if this region has the
+		 * appropriate size and physical alignment. (Sections are 2 MB
+		 * on 4k granule kernels)
+		 */
+		if (IS_ALIGNED(p->entry.phys_addr, SZ_2M) && size >= SZ_2M)
+			efi_virt_base = round_up(efi_virt_base, SZ_2M);
+
+		p->entry.virt_addr = efi_virt_base + p->entry.phys_addr - paddr;
+		efi_virt_base += size;
+
+		++p;
+		++*count;
+	}
+}
+
+/*
  * Allocate memory for a new FDT, then add EFI, commandline, and
  * initrd related fields to the FDT.  This routine increases the
  * FDT allocation size until the allocated memory is large
@@ -196,6 +258,7 @@ efi_status_t allocate_new_fdt_and_exit_boot(efi_system_table_t *sys_table,
 	efi_memory_desc_t *memory_map;
 	unsigned long new_fdt_size;
 	efi_status_t status;
+	int runtime_entry_count = 0;
 
 	/*
 	 * Estimate size of new FDT, and allocate memory for it. We
@@ -248,12 +311,49 @@ efi_status_t allocate_new_fdt_and_exit_boot(efi_system_table_t *sys_table,
 		}
 	}
 
+	/*
+	 * Update the memory map with virtual addresses, and reorder the entries
+	 * so that we can pass it straight into SetVirtualAddressMap()
+	 */
+	update_memory_map(memory_map, map_size, desc_size,
+			  &runtime_entry_count);
+
 	/* Now we are ready to exit_boot_services.*/
 	status = sys_table->boottime->exit_boot_services(handle, mmap_key);
 
+	if (status == EFI_SUCCESS) {
+		efi_set_virtual_address_map_t *svam;
+
+		/* Install the new virtual address map */
+		svam = sys_table->runtime->set_virtual_address_map;
+		status = svam(runtime_entry_count * desc_size, desc_size,
+			      desc_ver, memory_map);
 
-	if (status == EFI_SUCCESS)
-		return status;
+		/*
+		 * We are beyond the point of no return here, so if the call to
+		 * SetVirtualAddressMap() failed, we need to signal that to the
+		 * incoming kernel but proceed normally otherwise.
+		 */
+		if (status != EFI_SUCCESS) {
+			int i;
+
+			/*
+			 * Set the virtual address field of all
+			 * EFI_MEMORY_RUNTIME entries to 0. This will signal
+			 * the incoming kernel that no virtual translation has
+			 * been installed.
+			 */
+			for (i = 0; i < map_size; i += desc_size) {
+				efi_memory_desc_t *p;
+
+				p = (efi_memory_desc_t *)((u8 *)memory_map + i);
+				if (!(p->attribute & EFI_MEMORY_RUNTIME))
+					break;
+				p->virt_addr = 0;
+			}
+		}
+		return EFI_SUCCESS;
+	}
 
 	pr_efi_err(sys_table, "Exit boot services failed.\n");
 
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 5/6] arm64/efi: remove free_boot_services() and friends
  2014-10-24 12:39 [PATCH 0/6] arm64: stable UEFI mappings for kexec Ard Biesheuvel
                   ` (3 preceding siblings ...)
  2014-10-24 12:39 ` [PATCH 4/6] arm64/efi: move SetVirtualAddressMap() to UEFI stub Ard Biesheuvel
@ 2014-10-24 12:39 ` Ard Biesheuvel
  2014-10-24 12:39 ` [PATCH 6/6] arm64/efi: remove idmap manipulations from UEFI code Ard Biesheuvel
  5 siblings, 0 replies; 18+ messages in thread
From: Ard Biesheuvel @ 2014-10-24 12:39 UTC (permalink / raw)
  To: linux-arm-kernel

Now that we are calling SetVirtualAddressMap() from the stub, there is no
need to reserve boot-only memory regions, which implies that there is also
no reason to free them again later.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/kernel/efi.c | 123 +-----------------------------------------------
 1 file changed, 1 insertion(+), 122 deletions(-)

diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
index 324398c03acd..cbff3e8eff11 100644
--- a/arch/arm64/kernel/efi.c
+++ b/arch/arm64/kernel/efi.c
@@ -202,9 +202,7 @@ static __init void reserve_regions(void)
 		if (is_normal_ram(md))
 			early_init_dt_add_memory_arch(paddr, size);
 
-		if (is_reserve_region(md) ||
-		    md->type == EFI_BOOT_SERVICES_CODE ||
-		    md->type == EFI_BOOT_SERVICES_DATA) {
+		if (is_reserve_region(md)) {
 			memblock_reserve(paddr, size);
 			if (uefi_debug)
 				pr_cont("*");
@@ -217,123 +215,6 @@ static __init void reserve_regions(void)
 	set_bit(EFI_MEMMAP, &efi.flags);
 }
 
-
-static u64 __init free_one_region(u64 start, u64 end)
-{
-	u64 size = end - start;
-
-	if (uefi_debug)
-		pr_info("  EFI freeing: 0x%012llx-0x%012llx\n",	start, end - 1);
-
-	free_bootmem_late(start, size);
-	return size;
-}
-
-static u64 __init free_region(u64 start, u64 end)
-{
-	u64 map_start, map_end, total = 0;
-
-	if (end <= start)
-		return total;
-
-	map_start = (u64)memmap.phys_map;
-	map_end = PAGE_ALIGN(map_start + (memmap.map_end - memmap.map));
-	map_start &= PAGE_MASK;
-
-	if (start < map_end && end > map_start) {
-		/* region overlaps UEFI memmap */
-		if (start < map_start)
-			total += free_one_region(start, map_start);
-
-		if (map_end < end)
-			total += free_one_region(map_end, end);
-	} else
-		total += free_one_region(start, end);
-
-	return total;
-}
-
-static void __init free_boot_services(void)
-{
-	u64 total_freed = 0;
-	u64 keep_end, free_start, free_end;
-	efi_memory_desc_t *md;
-
-	/*
-	 * If kernel uses larger pages than UEFI, we have to be careful
-	 * not to inadvertantly free memory we want to keep if there is
-	 * overlap at the kernel page size alignment. We do not want to
-	 * free is_reserve_region() memory nor the UEFI memmap itself.
-	 *
-	 * The memory map is sorted, so we keep track of the end of
-	 * any previous region we want to keep, remember any region
-	 * we want to free and defer freeing it until we encounter
-	 * the next region we want to keep. This way, before freeing
-	 * it, we can clip it as needed to avoid freeing memory we
-	 * want to keep for UEFI.
-	 */
-
-	keep_end = 0;
-	free_start = 0;
-
-	for_each_efi_memory_desc(&memmap, md) {
-		u64 paddr, npages, size;
-
-		if (is_reserve_region(md)) {
-			/*
-			 * We don't want to free any memory from this region.
-			 */
-			if (free_start) {
-				/* adjust free_end then free region */
-				if (free_end > md->phys_addr)
-					free_end -= PAGE_SIZE;
-				total_freed += free_region(free_start, free_end);
-				free_start = 0;
-			}
-			keep_end = md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT);
-			continue;
-		}
-
-		if (md->type != EFI_BOOT_SERVICES_CODE &&
-		    md->type != EFI_BOOT_SERVICES_DATA) {
-			/* no need to free this region */
-			continue;
-		}
-
-		/*
-		 * We want to free memory from this region.
-		 */
-		paddr = md->phys_addr;
-		npages = md->num_pages;
-		memrange_efi_to_native(&paddr, &npages);
-		size = npages << PAGE_SHIFT;
-
-		if (free_start) {
-			if (paddr <= free_end)
-				free_end = paddr + size;
-			else {
-				total_freed += free_region(free_start, free_end);
-				free_start = paddr;
-				free_end = paddr + size;
-			}
-		} else {
-			free_start = paddr;
-			free_end = paddr + size;
-		}
-		if (free_start < keep_end) {
-			free_start += PAGE_SIZE;
-			if (free_start >= free_end)
-				free_start = 0;
-		}
-	}
-	if (free_start)
-		total_freed += free_region(free_start, free_end);
-
-	if (total_freed)
-		pr_info("Freed 0x%llx bytes of EFI boot services memory",
-			total_freed);
-}
-
 void __init efi_init(void)
 {
 	struct efi_fdt_params params;
@@ -443,8 +324,6 @@ static int __init arm64_enter_virtual_mode(void)
 	}
 	set_bit(EFI_SYSTEM_TABLES, &efi.flags);
 
-	free_boot_services();
-
 	if (!efi_enabled(EFI_VIRTMAP)) {
 		pr_err("No UEFI virtual mapping was installed -- runtime services will not be available\n");
 		return -1;
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 6/6] arm64/efi: remove idmap manipulations from UEFI code
  2014-10-24 12:39 [PATCH 0/6] arm64: stable UEFI mappings for kexec Ard Biesheuvel
                   ` (4 preceding siblings ...)
  2014-10-24 12:39 ` [PATCH 5/6] arm64/efi: remove free_boot_services() and friends Ard Biesheuvel
@ 2014-10-24 12:39 ` Ard Biesheuvel
  2014-10-24 13:41   ` Grant Likely
  5 siblings, 1 reply; 18+ messages in thread
From: Ard Biesheuvel @ 2014-10-24 12:39 UTC (permalink / raw)
  To: linux-arm-kernel

Now that we have moved the call to SetVirtualAddressMap() to the stub,
UEFI has no use for the ID map, so we can drop the code that installs
ID mappings for UEFI memory regions.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/efi.h |  4 ++--
 arch/arm64/include/asm/mmu.h |  2 --
 arch/arm64/kernel/efi.c      | 26 +-------------------------
 arch/arm64/kernel/setup.c    |  2 +-
 arch/arm64/mm/mmu.c          | 10 ----------
 5 files changed, 4 insertions(+), 40 deletions(-)

diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h
index d752e5480096..09854df5a2a1 100644
--- a/arch/arm64/include/asm/efi.h
+++ b/arch/arm64/include/asm/efi.h
@@ -6,10 +6,10 @@
 
 #ifdef CONFIG_EFI
 extern void efi_init(void);
-extern void efi_idmap_init(void);
+extern void efi_virtmap_init(void);
 #else
 #define efi_init()
-#define efi_idmap_init()
+#define efi_virtmap_init()
 #endif
 
 void efi_load_rt_mapping(void);
diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index bcf166043a8b..df33cf02300a 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -31,8 +31,6 @@ extern void paging_init(void);
 extern void setup_mm_for_reboot(void);
 extern void __iomem *early_io_map(phys_addr_t phys, unsigned long virt);
 extern void init_mem_pgprot(void);
-/* create an identity mapping for memory (or io if map_io is true) */
-extern void create_id_mapping(phys_addr_t addr, phys_addr_t size, int map_io);
 extern void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
 			       unsigned long virt, phys_addr_t size,
 			       int map_io, int map_xn);
diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
index cbff3e8eff11..02bd7e877f0b 100644
--- a/arch/arm64/kernel/efi.c
+++ b/arch/arm64/kernel/efi.c
@@ -53,27 +53,6 @@ static int __init is_normal_ram(efi_memory_desc_t *md)
 	return 0;
 }
 
-static void __init efi_setup_idmap(void)
-{
-	struct memblock_region *r;
-	efi_memory_desc_t *md;
-	u64 paddr, npages, size;
-
-	for_each_memblock(memory, r)
-		create_id_mapping(r->base, r->size, 0);
-
-	/* map runtime io spaces */
-	for_each_efi_memory_desc(&memmap, md) {
-		if (!(md->attribute & EFI_MEMORY_RUNTIME) || is_normal_ram(md))
-			continue;
-		paddr = md->phys_addr;
-		npages = md->num_pages;
-		memrange_efi_to_native(&paddr, &npages);
-		size = npages << PAGE_SHIFT;
-		create_id_mapping(paddr, size, 1);
-	}
-}
-
 /*
  * Translate a EFI virtual address into a physical address: this is necessary,
  * as some data members of the EFI system table are virtually remapped after
@@ -251,16 +230,13 @@ static struct mm_struct efi_mm = {
 	INIT_MM_CONTEXT(efi_mm)
 };
 
-void __init efi_idmap_init(void)
+void __init efi_virtmap_init(void)
 {
 	efi_memory_desc_t *md;
 
 	if (!efi_enabled(EFI_BOOT))
 		return;
 
-	/* boot time idmap_pg_dir is incomplete, so fill in missing parts */
-	efi_setup_idmap();
-
 	for_each_efi_memory_desc(&memmap, md) {
 		u64 paddr, npages, size;
 
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 2437196cc5d4..a6028490a28f 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -392,7 +392,7 @@ void __init setup_arch(char **cmdline_p)
 	paging_init();
 	request_standard_resources();
 
-	efi_idmap_init();
+	efi_virtmap_init();
 
 	unflatten_device_tree();
 
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index f7d17a5a1f56..19eae06aab81 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -282,16 +282,6 @@ static void __init create_mapping(phys_addr_t phys, unsigned long virt,
 			 size, 0, 0);
 }
 
-void __init create_id_mapping(phys_addr_t addr, phys_addr_t size, int map_io)
-{
-	if ((addr >> PGDIR_SHIFT) >= ARRAY_SIZE(idmap_pg_dir)) {
-		pr_warn("BUG: not creating id mapping for %pa\n", &addr);
-		return;
-	}
-	__create_mapping(&init_mm, &idmap_pg_dir[pgd_index(addr)],
-			 addr, addr, size, map_io, 0);
-}
-
 void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
 			       unsigned long virt, phys_addr_t size,
 			       int map_io, int map_xn)
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/6] arm64/efi: move SetVirtualAddressMap() to UEFI stub
  2014-10-24 12:39 ` [PATCH 4/6] arm64/efi: move SetVirtualAddressMap() to UEFI stub Ard Biesheuvel
@ 2014-10-24 13:39   ` Grant Likely
  2014-10-24 13:47     ` Ard Biesheuvel
  2014-10-28  7:47   ` Dave Young
  1 sibling, 1 reply; 18+ messages in thread
From: Grant Likely @ 2014-10-24 13:39 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Oct 24, 2014 at 1:39 PM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> In order to support kexec, the kernel needs to be able to deal with the
> state of the UEFI firmware after SetVirtualAddressMap() has been called.
> To avoid having separate code paths for non-kexec and kexec, let's move
> the call to SetVirtualAddressMap() to the stub: this will guarantee us
> that it will only be called once (since the stub is not executed during
> kexec), and ensures that the UEFI state is identical between kexec and
> normal boot.
>
> This implies that the layout of the virtual mapping needs to be created
> by the stub as well. All regions are rounded up to a naturally aligned
> multiple of 64 KB (for compatibility with 64k pages kernels) and recorded
> in the UEFI memory map. The kernel proper reads those values and installs
> the mappings in a dedicated set of page tables that are swapped in during
> UEFI Runtime Services calls.
>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

A question grounded in my ignorance below...

> ---
>  arch/arm64/include/asm/efi.h       |  19 +++-
>  arch/arm64/kernel/efi.c            | 205 +++++++++++++++++++------------------
>  drivers/firmware/efi/libstub/fdt.c | 104 ++++++++++++++++++-
>  3 files changed, 224 insertions(+), 104 deletions(-)
>
> @@ -69,9 +74,36 @@ static void __init efi_setup_idmap(void)
>         }
>  }
>
> +/*
> + * Translate a EFI virtual address into a physical address: this is necessary,
> + * as some data members of the EFI system table are virtually remapped after
> + * SetVirtualAddressMap() has been called.
> + */
> +static phys_addr_t __init efi_to_phys(unsigned long addr)
> +{
> +       efi_memory_desc_t *md;
> +
> +       for_each_efi_memory_desc(&memmap, md) {
> +               if (!(md->attribute & EFI_MEMORY_RUNTIME))
> +                       continue;
> +               if (md->virt_addr == 0)
> +                       /* no virtual mapping has been installed by the stub */
> +                       break;
> +               if (md->virt_addr < addr &&
> +                   (addr - md->virt_addr) < (md->num_pages << EFI_PAGE_SHIFT))
> +                       return md->phys_addr + addr - md->virt_addr;
> +       }
> +
> +       WARN_ONCE(1, "UEFI virtual mapping incomplete or missing -- no entry found for 0x%lx\n",
> +                 addr);
> +       return addr;
> +}
> +

Is this function applicable to all EFI implementations? Should it be in the
common EFI code?

How much of this patch series is (theoretically) applicable to aarch32?

g.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 6/6] arm64/efi: remove idmap manipulations from UEFI code
  2014-10-24 12:39 ` [PATCH 6/6] arm64/efi: remove idmap manipulations from UEFI code Ard Biesheuvel
@ 2014-10-24 13:41   ` Grant Likely
  2014-10-24 13:53     ` Ard Biesheuvel
  0 siblings, 1 reply; 18+ messages in thread
From: Grant Likely @ 2014-10-24 13:41 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Oct 24, 2014 at 1:39 PM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> Now that we have moved the call to SetVirtualAddressMap() to the stub,
> UEFI has no use for the ID map, so we can drop the code that installs
> ID mappings for UEFI memory regions.
>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

I have to say, this series makes me happy. :-)

This method will go a long way toward catching UEFI implementations that
do incorrect things after ExitBootServices() is called. I'm assuming
that any attempt by the firmware to access a region that has not been
mapped for Runtime Services will get trapped by the kernel, correct?

g.

> ---
>  arch/arm64/include/asm/efi.h |  4 ++--
>  arch/arm64/include/asm/mmu.h |  2 --
>  arch/arm64/kernel/efi.c      | 26 +-------------------------
>  arch/arm64/kernel/setup.c    |  2 +-
>  arch/arm64/mm/mmu.c          | 10 ----------
>  5 files changed, 4 insertions(+), 40 deletions(-)
>
> diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h
> index d752e5480096..09854df5a2a1 100644
> --- a/arch/arm64/include/asm/efi.h
> +++ b/arch/arm64/include/asm/efi.h
> @@ -6,10 +6,10 @@
>
>  #ifdef CONFIG_EFI
>  extern void efi_init(void);
> -extern void efi_idmap_init(void);
> +extern void efi_virtmap_init(void);
>  #else
>  #define efi_init()
> -#define efi_idmap_init()
> +#define efi_virtmap_init()
>  #endif
>
>  void efi_load_rt_mapping(void);
> diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
> index bcf166043a8b..df33cf02300a 100644
> --- a/arch/arm64/include/asm/mmu.h
> +++ b/arch/arm64/include/asm/mmu.h
> @@ -31,8 +31,6 @@ extern void paging_init(void);
>  extern void setup_mm_for_reboot(void);
>  extern void __iomem *early_io_map(phys_addr_t phys, unsigned long virt);
>  extern void init_mem_pgprot(void);
> -/* create an identity mapping for memory (or io if map_io is true) */
> -extern void create_id_mapping(phys_addr_t addr, phys_addr_t size, int map_io);
>  extern void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
>                                unsigned long virt, phys_addr_t size,
>                                int map_io, int map_xn);
> diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
> index cbff3e8eff11..02bd7e877f0b 100644
> --- a/arch/arm64/kernel/efi.c
> +++ b/arch/arm64/kernel/efi.c
> @@ -53,27 +53,6 @@ static int __init is_normal_ram(efi_memory_desc_t *md)
>         return 0;
>  }
>
> -static void __init efi_setup_idmap(void)
> -{
> -       struct memblock_region *r;
> -       efi_memory_desc_t *md;
> -       u64 paddr, npages, size;
> -
> -       for_each_memblock(memory, r)
> -               create_id_mapping(r->base, r->size, 0);
> -
> -       /* map runtime io spaces */
> -       for_each_efi_memory_desc(&memmap, md) {
> -               if (!(md->attribute & EFI_MEMORY_RUNTIME) || is_normal_ram(md))
> -                       continue;
> -               paddr = md->phys_addr;
> -               npages = md->num_pages;
> -               memrange_efi_to_native(&paddr, &npages);
> -               size = npages << PAGE_SHIFT;
> -               create_id_mapping(paddr, size, 1);
> -       }
> -}
> -
>  /*
>   * Translate a EFI virtual address into a physical address: this is necessary,
>   * as some data members of the EFI system table are virtually remapped after
> @@ -251,16 +230,13 @@ static struct mm_struct efi_mm = {
>         INIT_MM_CONTEXT(efi_mm)
>  };
>
> -void __init efi_idmap_init(void)
> +void __init efi_virtmap_init(void)
>  {
>         efi_memory_desc_t *md;
>
>         if (!efi_enabled(EFI_BOOT))
>                 return;
>
> -       /* boot time idmap_pg_dir is incomplete, so fill in missing parts */
> -       efi_setup_idmap();
> -
>         for_each_efi_memory_desc(&memmap, md) {
>                 u64 paddr, npages, size;
>
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index 2437196cc5d4..a6028490a28f 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -392,7 +392,7 @@ void __init setup_arch(char **cmdline_p)
>         paging_init();
>         request_standard_resources();
>
> -       efi_idmap_init();
> +       efi_virtmap_init();
>
>         unflatten_device_tree();
>
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index f7d17a5a1f56..19eae06aab81 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -282,16 +282,6 @@ static void __init create_mapping(phys_addr_t phys, unsigned long virt,
>                          size, 0, 0);
>  }
>
> -void __init create_id_mapping(phys_addr_t addr, phys_addr_t size, int map_io)
> -{
> -       if ((addr >> PGDIR_SHIFT) >= ARRAY_SIZE(idmap_pg_dir)) {
> -               pr_warn("BUG: not creating id mapping for %pa\n", &addr);
> -               return;
> -       }
> -       __create_mapping(&init_mm, &idmap_pg_dir[pgd_index(addr)],
> -                        addr, addr, size, map_io, 0);
> -}
> -
>  void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
>                                unsigned long virt, phys_addr_t size,
>                                int map_io, int map_xn)
> --
> 1.8.3.2
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 4/6] arm64/efi: move SetVirtualAddressMap() to UEFI stub
  2014-10-24 13:39   ` Grant Likely
@ 2014-10-24 13:47     ` Ard Biesheuvel
  0 siblings, 0 replies; 18+ messages in thread
From: Ard Biesheuvel @ 2014-10-24 13:47 UTC (permalink / raw)
  To: linux-arm-kernel

On 24 October 2014 15:39, Grant Likely <grant.likely@linaro.org> wrote:
> On Fri, Oct 24, 2014 at 1:39 PM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>> In order to support kexec, the kernel needs to be able to deal with the
>> state of the UEFI firmware after SetVirtualAddressMap() has been called.
>> To avoid having separate code paths for non-kexec and kexec, let's move
>> the call to SetVirtualAddressMap() to the stub: this will guarantee us
>> that it will only be called once (since the stub is not executed during
>> kexec), and ensures that the UEFI state is identical between kexec and
>> normal boot.
>>
>> This implies that the layout of the virtual mapping needs to be created
>> by the stub as well. All regions are rounded up to a naturally aligned
>> multiple of 64 KB (for compatibility with 64k pages kernels) and recorded
>> in the UEFI memory map. The kernel proper reads those values and installs
>> the mappings in a dedicated set of page tables that are swapped in during
>> UEFI Runtime Services calls.
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>
> A questions grounded in my ignorance below...
>
>> ---
>>  arch/arm64/include/asm/efi.h       |  19 +++-
>>  arch/arm64/kernel/efi.c            | 205 +++++++++++++++++++------------------
>>  drivers/firmware/efi/libstub/fdt.c | 104 ++++++++++++++++++-
>>  3 files changed, 224 insertions(+), 104 deletions(-)
>>
>> @@ -69,9 +74,36 @@ static void __init efi_setup_idmap(void)
>>         }
>>  }
>>
>> +/*
>> + * Translate a EFI virtual address into a physical address: this is necessary,
>> + * as some data members of the EFI system table are virtually remapped after
>> + * SetVirtualAddressMap() has been called.
>> + */
>> +static phys_addr_t __init efi_to_phys(unsigned long addr)
>> +{
>> +       efi_memory_desc_t *md;
>> +
>> +       for_each_efi_memory_desc(&memmap, md) {
>> +               if (!(md->attribute & EFI_MEMORY_RUNTIME))
>> +                       continue;
>> +               if (md->virt_addr == 0)
>> +                       /* no virtual mapping has been installed by the stub */
>> +                       break;
>> +               if (md->virt_addr < addr &&
>> +                   (addr - md->virt_addr) < (md->num_pages << EFI_PAGE_SHIFT))
>> +                       return md->phys_addr + addr - md->virt_addr;
>> +       }
>> +
>> +       WARN_ONCE(1, "UEFI virtual mapping incomplete or missing -- no entry found for 0x%lx\n",
>> +                 addr);
>> +       return addr;
>> +}
>> +
>
> Is function applicable to all EFI implementations? Should it be in the
> common EFI code?
>

This could potentially be used on any arch after
SetVirtualAddressMap() is called. It is essentially the inverse of
efi_lookup_mapped_addr(). So yes, it should probably move to generic
code as well, especially if we will be needing it for 32-bit ARM.

> How much of this patch series is (theoretically) applicable to aarch32?
>

I chose the virtual base (0x4000_0000) such that 32-bit ARM can use
the exact same approach, even with a 2/2 split. I haven't tried
implementing it yet, though.

-- 
Ard.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 6/6] arm64/efi: remove idmap manipulations from UEFI code
  2014-10-24 13:41   ` Grant Likely
@ 2014-10-24 13:53     ` Ard Biesheuvel
  2014-10-24 13:55       ` Grant Likely
  0 siblings, 1 reply; 18+ messages in thread
From: Ard Biesheuvel @ 2014-10-24 13:53 UTC (permalink / raw)
  To: linux-arm-kernel

On 24 October 2014 15:41, Grant Likely <grant.likely@linaro.org> wrote:
> On Fri, Oct 24, 2014 at 1:39 PM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>> Now that we have moved the call to SetVirtualAddressMap() to the stub,
>> UEFI has no use for the ID map, so we can drop the code that installs
>> ID mappings for UEFI memory regions.
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>
> I have to say, this series makes me happy. :-)
>
> This method will go a long way to catching UEFI implementations that
> do incorrect things after exitbootservices is called. I'm assuming
> that any attempt to access a region that boot services has not
> requested will get trapped by the kernel, correct?
>

If we really want to catch firmware problems, we should probably wipe
all boot services regions between the calls to ExitBootServices() and
SetVirtualAddressMap(). Mark Salter's original approach was fairly
cautious here, i.e., reserving boot services regions until after the
call to SetVirtualAddressMap(), but there is no point in doing that for
kexec, which is why I removed it.

-- 
Ard.



>> ---
>>  arch/arm64/include/asm/efi.h |  4 ++--
>>  arch/arm64/include/asm/mmu.h |  2 --
>>  arch/arm64/kernel/efi.c      | 26 +-------------------------
>>  arch/arm64/kernel/setup.c    |  2 +-
>>  arch/arm64/mm/mmu.c          | 10 ----------
>>  5 files changed, 4 insertions(+), 40 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h
>> index d752e5480096..09854df5a2a1 100644
>> --- a/arch/arm64/include/asm/efi.h
>> +++ b/arch/arm64/include/asm/efi.h
>> @@ -6,10 +6,10 @@
>>
>>  #ifdef CONFIG_EFI
>>  extern void efi_init(void);
>> -extern void efi_idmap_init(void);
>> +extern void efi_virtmap_init(void);
>>  #else
>>  #define efi_init()
>> -#define efi_idmap_init()
>> +#define efi_virtmap_init()
>>  #endif
>>
>>  void efi_load_rt_mapping(void);
>> diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
>> index bcf166043a8b..df33cf02300a 100644
>> --- a/arch/arm64/include/asm/mmu.h
>> +++ b/arch/arm64/include/asm/mmu.h
>> @@ -31,8 +31,6 @@ extern void paging_init(void);
>>  extern void setup_mm_for_reboot(void);
>>  extern void __iomem *early_io_map(phys_addr_t phys, unsigned long virt);
>>  extern void init_mem_pgprot(void);
>> -/* create an identity mapping for memory (or io if map_io is true) */
>> -extern void create_id_mapping(phys_addr_t addr, phys_addr_t size, int map_io);
>>  extern void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
>>                                unsigned long virt, phys_addr_t size,
>>                                int map_io, int map_xn);
>> diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
>> index cbff3e8eff11..02bd7e877f0b 100644
>> --- a/arch/arm64/kernel/efi.c
>> +++ b/arch/arm64/kernel/efi.c
>> @@ -53,27 +53,6 @@ static int __init is_normal_ram(efi_memory_desc_t *md)
>>         return 0;
>>  }
>>
>> -static void __init efi_setup_idmap(void)
>> -{
>> -       struct memblock_region *r;
>> -       efi_memory_desc_t *md;
>> -       u64 paddr, npages, size;
>> -
>> -       for_each_memblock(memory, r)
>> -               create_id_mapping(r->base, r->size, 0);
>> -
>> -       /* map runtime io spaces */
>> -       for_each_efi_memory_desc(&memmap, md) {
>> -               if (!(md->attribute & EFI_MEMORY_RUNTIME) || is_normal_ram(md))
>> -                       continue;
>> -               paddr = md->phys_addr;
>> -               npages = md->num_pages;
>> -               memrange_efi_to_native(&paddr, &npages);
>> -               size = npages << PAGE_SHIFT;
>> -               create_id_mapping(paddr, size, 1);
>> -       }
>> -}
>> -
>>  /*
>>   * Translate a EFI virtual address into a physical address: this is necessary,
>>   * as some data members of the EFI system table are virtually remapped after
>> @@ -251,16 +230,13 @@ static struct mm_struct efi_mm = {
>>         INIT_MM_CONTEXT(efi_mm)
>>  };
>>
>> -void __init efi_idmap_init(void)
>> +void __init efi_virtmap_init(void)
>>  {
>>         efi_memory_desc_t *md;
>>
>>         if (!efi_enabled(EFI_BOOT))
>>                 return;
>>
>> -       /* boot time idmap_pg_dir is incomplete, so fill in missing parts */
>> -       efi_setup_idmap();
>> -
>>         for_each_efi_memory_desc(&memmap, md) {
>>                 u64 paddr, npages, size;
>>
>> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
>> index 2437196cc5d4..a6028490a28f 100644
>> --- a/arch/arm64/kernel/setup.c
>> +++ b/arch/arm64/kernel/setup.c
>> @@ -392,7 +392,7 @@ void __init setup_arch(char **cmdline_p)
>>         paging_init();
>>         request_standard_resources();
>>
>> -       efi_idmap_init();
>> +       efi_virtmap_init();
>>
>>         unflatten_device_tree();
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index f7d17a5a1f56..19eae06aab81 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -282,16 +282,6 @@ static void __init create_mapping(phys_addr_t phys, unsigned long virt,
>>                          size, 0, 0);
>>  }
>>
>> -void __init create_id_mapping(phys_addr_t addr, phys_addr_t size, int map_io)
>> -{
>> -       if ((addr >> PGDIR_SHIFT) >= ARRAY_SIZE(idmap_pg_dir)) {
>> -               pr_warn("BUG: not creating id mapping for %pa\n", &addr);
>> -               return;
>> -       }
>> -       __create_mapping(&init_mm, &idmap_pg_dir[pgd_index(addr)],
>> -                        addr, addr, size, map_io, 0);
>> -}
>> -
>>  void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
>>                                unsigned long virt, phys_addr_t size,
>>                                int map_io, int map_xn)
>> --
>> 1.8.3.2
>>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 6/6] arm64/efi: remove idmap manipulations from UEFI code
  2014-10-24 13:53     ` Ard Biesheuvel
@ 2014-10-24 13:55       ` Grant Likely
  0 siblings, 0 replies; 18+ messages in thread
From: Grant Likely @ 2014-10-24 13:55 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Oct 24, 2014 at 2:53 PM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> On 24 October 2014 15:41, Grant Likely <grant.likely@linaro.org> wrote:
>> On Fri, Oct 24, 2014 at 1:39 PM, Ard Biesheuvel
>> <ard.biesheuvel@linaro.org> wrote:
>>> Now that we have moved the call to SetVirtualAddressMap() to the stub,
>>> UEFI has no use for the ID map, so we can drop the code that installs
>>> ID mappings for UEFI memory regions.
>>>
>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>
>> I have to say, this series makes me happy. :-)
>>
>> This method will go a long way to catching UEFI implementations that
>> do incorrect things after exitbootservices is called. I'm assuming
>> that any attempt to access a region that boot services has not
>> requested will get trapped by the kernel, correct?
>>
>
> If we really want to catch firmware problems, we should probably wipe
> all boot services regions between the calls to ExitBootServices() and
> SetVirtualAddressMap(). Mark Salter's original approach was fairly
> cautious, i.e., reserving boot services regions until after the call
> to SetVirtualAddressMap(), but there is no point in doing that for
> kexec, which is why I removed it.

I quite like that idea. Let's do that and see if anyone screams in agony.

g.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 2/6] arm64/mm: add create_pgd_mapping() to create private page tables
  2014-10-24 12:39 ` [PATCH 2/6] arm64/mm: add create_pgd_mapping() to create private page tables Ard Biesheuvel
@ 2014-10-24 15:17   ` Steve Capper
  2014-10-24 15:42     ` Ard Biesheuvel
  0 siblings, 1 reply; 18+ messages in thread
From: Steve Capper @ 2014-10-24 15:17 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Oct 24, 2014 at 02:39:40PM +0200, Ard Biesheuvel wrote:
> For UEFI, we need to install the memory mappings used for Runtime Services
> in a dedicated set of page tables. Add create_pgd_mapping(), which allows
> us to allocate and install those page table entries early.
> This also adds a 'map_xn' option, that creates regions with the PXN and
> UXN bits set.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

Hi Ard,

Some comments below...

> ---
>  arch/arm64/include/asm/mmu.h |  3 +++
>  arch/arm64/mm/mmu.c          | 28 ++++++++++++++++++++--------
>  2 files changed, 23 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
> index c2f006c48bdb..bcf166043a8b 100644
> --- a/arch/arm64/include/asm/mmu.h
> +++ b/arch/arm64/include/asm/mmu.h
> @@ -33,5 +33,8 @@ extern void __iomem *early_io_map(phys_addr_t phys, unsigned long virt);
>  extern void init_mem_pgprot(void);
>  /* create an identity mapping for memory (or io if map_io is true) */
>  extern void create_id_mapping(phys_addr_t addr, phys_addr_t size, int map_io);
> +extern void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
> +			       unsigned long virt, phys_addr_t size,
> +			       int map_io, int map_xn);

I really don't like these map_io and map_xn parameters.
Further down we have logic "if (map_io)...", which is out of place, and
doesn't match the normal style of mapping page table entries.

Could we not instead have something like:
extern void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
			       unsigned long virt, phys_addr_t size,
			       pgprot_t prot_sect, pgprot_t prot_pte);

Then we can remove all the conditional logic for map_io, map_xn?

>  
>  #endif
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 7eaa6a8c8467..f7d17a5a1f56 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -157,7 +157,7 @@ static void __init alloc_init_pte(pmd_t *pmd, unsigned long addr,
>  
>  static void __init alloc_init_pmd(struct mm_struct *mm, pud_t *pud,
>  				  unsigned long addr, unsigned long end,
> -				  phys_addr_t phys, int map_io)
> +				  phys_addr_t phys, int map_io, int map_xn)
>  {
>  	pmd_t *pmd;
>  	unsigned long next;
> @@ -167,6 +167,9 @@ static void __init alloc_init_pmd(struct mm_struct *mm, pud_t *pud,
>  	if (map_io) {
>  		prot_sect = PROT_SECT_DEVICE_nGnRE;
>  		prot_pte = __pgprot(PROT_DEVICE_nGnRE);
> +	} else if (map_xn) {
> +		prot_sect = PROT_SECT_NORMAL;
> +		prot_pte = PAGE_KERNEL;
>  	} else {
>  		prot_sect = PROT_SECT_NORMAL_EXEC;
>  		prot_pte = PAGE_KERNEL_EXEC;

Ideally, this if block should be completely removed.

> @@ -203,7 +206,7 @@ static void __init alloc_init_pmd(struct mm_struct *mm, pud_t *pud,
>  
>  static void __init alloc_init_pud(struct mm_struct *mm, pgd_t *pgd,
>  				  unsigned long addr, unsigned long end,
> -				  unsigned long phys, int map_io)
> +				  unsigned long phys, int map_io, int map_xn)
>  {
>  	pud_t *pud;
>  	unsigned long next;
> @@ -221,7 +224,7 @@ static void __init alloc_init_pud(struct mm_struct *mm, pgd_t *pgd,
>  		/*
>  		 * For 4K granule only, attempt to put down a 1GB block
>  		 */
> -		if (!map_io && (PAGE_SHIFT == 12) &&
> +		if (!map_io && !map_xn && (PAGE_SHIFT == 12) &&

Presumably the !map_io and !map_xn tests are here because of the
PROT_SECT_NORMAL_EXEC below? Is there another reason why a pud mapping
would be unsuitable for these?

>  		    ((addr | next | phys) & ~PUD_MASK) == 0) {
>  			pud_t old_pud = *pud;
>  			set_pud(pud, __pud(phys | PROT_SECT_NORMAL_EXEC));
> @@ -239,7 +242,8 @@ static void __init alloc_init_pud(struct mm_struct *mm, pgd_t *pgd,
>  				flush_tlb_all();
>  			}
>  		} else {
> -			alloc_init_pmd(mm, pud, addr, next, phys, map_io);
> +			alloc_init_pmd(mm, pud, addr, next, phys, map_io,
> +				       map_xn);
>  		}
>  		phys += next - addr;
>  	} while (pud++, addr = next, addr != end);
> @@ -251,7 +255,7 @@ static void __init alloc_init_pud(struct mm_struct *mm, pgd_t *pgd,
>   */
>  static void __init __create_mapping(struct mm_struct *mm, pgd_t *pgd,
>  				    phys_addr_t phys, unsigned long virt,
> -				    phys_addr_t size, int map_io)
> +				    phys_addr_t size, int map_io, int map_xn)
>  {
>  	unsigned long addr, length, end, next;
>  
> @@ -261,7 +265,7 @@ static void __init __create_mapping(struct mm_struct *mm, pgd_t *pgd,
>  	end = addr + length;
>  	do {
>  		next = pgd_addr_end(addr, end);
> -		alloc_init_pud(mm, pgd, addr, next, phys, map_io);
> +		alloc_init_pud(mm, pgd, addr, next, phys, map_io, map_xn);
>  		phys += next - addr;
>  	} while (pgd++, addr = next, addr != end);
>  }
> @@ -275,7 +279,7 @@ static void __init create_mapping(phys_addr_t phys, unsigned long virt,
>  		return;
>  	}
>  	__create_mapping(&init_mm, pgd_offset_k(virt & PAGE_MASK), phys, virt,
> -			 size, 0);
> +			 size, 0, 0);
>  }
>  
>  void __init create_id_mapping(phys_addr_t addr, phys_addr_t size, int map_io)
> @@ -285,7 +289,15 @@ void __init create_id_mapping(phys_addr_t addr, phys_addr_t size, int map_io)
>  		return;
>  	}
>  	__create_mapping(&init_mm, &idmap_pg_dir[pgd_index(addr)],
> -			 addr, addr, size, map_io);
> +			 addr, addr, size, map_io, 0);
> +}
> +
> +void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
> +			       unsigned long virt, phys_addr_t size,
> +			       int map_io, int map_xn)
> +{
> +	__create_mapping(mm, pgd_offset(mm, virt), phys, virt, size, map_io,
> +			 map_xn);
>  }
>  
>  static void __init map_mem(void)
> -- 
> 1.8.3.2
> 

Cheers,
-- 
Steve

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/6] arm64/mm: add explicit struct_mm argument to __create_mapping()
  2014-10-24 12:39 ` [PATCH 1/6] arm64/mm: add explicit struct_mm argument to __create_mapping() Ard Biesheuvel
@ 2014-10-24 15:23   ` Steve Capper
  0 siblings, 0 replies; 18+ messages in thread
From: Steve Capper @ 2014-10-24 15:23 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Oct 24, 2014 at 02:39:39PM +0200, Ard Biesheuvel wrote:
> Currently, swapper_pg_dir and idmap_pg_dir share the init_mm mm_struct
> instance. To allow the introduction of other pg_dir instances, for instance,
> for UEFI's mapping of Runtime Services, make the struct_mm instance an
> explicit argument that gets passed down to the pmd and pte instantiation
> functions. Note that the consumers (pmd_populate/pgd_populate) of the
> mm_struct argument don't actually inspect it, but let's fix it for
> correctness' sake.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

Acked-by: Steve Capper <steve.capper@linaro.org>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 2/6] arm64/mm: add create_pgd_mapping() to create private page tables
  2014-10-24 15:17   ` Steve Capper
@ 2014-10-24 15:42     ` Ard Biesheuvel
  0 siblings, 0 replies; 18+ messages in thread
From: Ard Biesheuvel @ 2014-10-24 15:42 UTC (permalink / raw)
  To: linux-arm-kernel

On 24 October 2014 17:17, Steve Capper <steve.capper@linaro.org> wrote:
> On Fri, Oct 24, 2014 at 02:39:40PM +0200, Ard Biesheuvel wrote:
>> For UEFI, we need to install the memory mappings used for Runtime Services
>> in a dedicated set of page tables. Add create_pgd_mapping(), which allows
>> us to allocate and install those page table entries early.
>> This also adds a 'map_xn' option, that creates regions with the PXN and
>> UXN bits set.
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>
> Hi Ard,
>
> Some comments below...
>
>> ---
>>  arch/arm64/include/asm/mmu.h |  3 +++
>>  arch/arm64/mm/mmu.c          | 28 ++++++++++++++++++++--------
>>  2 files changed, 23 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
>> index c2f006c48bdb..bcf166043a8b 100644
>> --- a/arch/arm64/include/asm/mmu.h
>> +++ b/arch/arm64/include/asm/mmu.h
>> @@ -33,5 +33,8 @@ extern void __iomem *early_io_map(phys_addr_t phys, unsigned long virt);
>>  extern void init_mem_pgprot(void);
>>  /* create an identity mapping for memory (or io if map_io is true) */
>>  extern void create_id_mapping(phys_addr_t addr, phys_addr_t size, int map_io);
>> +extern void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
>> +                            unsigned long virt, phys_addr_t size,
>> +                            int map_io, int map_xn);
>
> I really don't like these map_io and map_xn parameters.
> Further down we have logic "if (map_io)...", which is out of place, and
> doesn't match the normal style of mapping page table entries.
>
> Could we not instead have something like:
> extern void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
>                                unsigned long virt, phys_addr_t size,
>                                pgprot_t prot_sect, pgprot_t prot_pte);
>
> Then we can remove all the conditional logic for map_io, map_xn?
>

Yes, you are quite right. I was a bit lazy and made some incremental
changes that let me use this code the way I need to, i.e., for creating
non-executable mappings for EFI data and config tables etc. But I
agree it would be much better to clean it up the way you suggest.
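
Concretely, that could look something like this (sketch only, the exact
prototype may well end up different in the respin):

/* sketch of a possible prototype, not what is in this series */
extern void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
                               unsigned long virt, phys_addr_t size,
                               pgprot_t prot);

with the EFI caller choosing the attributes itself, along the lines of:

        pgprot_t prot;

        if (!is_normal_ram(md))
                prot = __pgprot(PROT_DEVICE_nGnRE);
        else if (md->type == EFI_RUNTIME_SERVICES_CODE)
                prot = PAGE_KERNEL_EXEC;
        else
                prot = PAGE_KERNEL;

        create_pgd_mapping(&efi_mm, paddr, md->virt_addr, size, prot);

which would get rid of the map_io/map_xn conditionals in mmu.c entirely.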

>>
>>  #endif
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 7eaa6a8c8467..f7d17a5a1f56 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -157,7 +157,7 @@ static void __init alloc_init_pte(pmd_t *pmd, unsigned long addr,
>>
>>  static void __init alloc_init_pmd(struct mm_struct *mm, pud_t *pud,
>>                                 unsigned long addr, unsigned long end,
>> -                               phys_addr_t phys, int map_io)
>> +                               phys_addr_t phys, int map_io, int map_xn)
>>  {
>>       pmd_t *pmd;
>>       unsigned long next;
>> @@ -167,6 +167,9 @@ static void __init alloc_init_pmd(struct mm_struct *mm, pud_t *pud,
>>       if (map_io) {
>>               prot_sect = PROT_SECT_DEVICE_nGnRE;
>>               prot_pte = __pgprot(PROT_DEVICE_nGnRE);
>> +     } else if (map_xn) {
>> +             prot_sect = PROT_SECT_NORMAL;
>> +             prot_pte = PAGE_KERNEL;
>>       } else {
>>               prot_sect = PROT_SECT_NORMAL_EXEC;
>>               prot_pte = PAGE_KERNEL_EXEC;
>
> Ideally, this if block should be completely removed.
>
>> @@ -203,7 +206,7 @@ static void __init alloc_init_pmd(struct mm_struct *mm, pud_t *pud,
>>
>>  static void __init alloc_init_pud(struct mm_struct *mm, pgd_t *pgd,
>>                                 unsigned long addr, unsigned long end,
>> -                               unsigned long phys, int map_io)
>> +                               unsigned long phys, int map_io, int map_xn)
>>  {
>>       pud_t *pud;
>>       unsigned long next;
>> @@ -221,7 +224,7 @@ static void __init alloc_init_pud(struct mm_struct *mm, pgd_t *pgd,
>>               /*
>>                * For 4K granule only, attempt to put down a 1GB block
>>                */
>> -             if (!map_io && (PAGE_SHIFT == 12) &&
>> +             if (!map_io && !map_xn && (PAGE_SHIFT == 12) &&
>
> Presumably the !map_io and !map_xn tests are here because of the
> PROT_SECT_NORMAL_EXEC below? Is there another reason why a pud mapping
> would be unsuitable for these?
>

No, it's just highly unlikely that we would encounter one of that size
with map_io or map_xn set.

Cheers,
Ard.



>>                   ((addr | next | phys) & ~PUD_MASK) == 0) {
>>                       pud_t old_pud = *pud;
>>                       set_pud(pud, __pud(phys | PROT_SECT_NORMAL_EXEC));
>> @@ -239,7 +242,8 @@ static void __init alloc_init_pud(struct mm_struct *mm, pgd_t *pgd,
>>                               flush_tlb_all();
>>                       }
>>               } else {
>> -                     alloc_init_pmd(mm, pud, addr, next, phys, map_io);
>> +                     alloc_init_pmd(mm, pud, addr, next, phys, map_io,
>> +                                    map_xn);
>>               }
>>               phys += next - addr;
>>       } while (pud++, addr = next, addr != end);
>> @@ -251,7 +255,7 @@ static void __init alloc_init_pud(struct mm_struct *mm, pgd_t *pgd,
>>   */
>>  static void __init __create_mapping(struct mm_struct *mm, pgd_t *pgd,
>>                                   phys_addr_t phys, unsigned long virt,
>> -                                 phys_addr_t size, int map_io)
>> +                                 phys_addr_t size, int map_io, int map_xn)
>>  {
>>       unsigned long addr, length, end, next;
>>
>> @@ -261,7 +265,7 @@ static void __init __create_mapping(struct mm_struct *mm, pgd_t *pgd,
>>       end = addr + length;
>>       do {
>>               next = pgd_addr_end(addr, end);
>> -             alloc_init_pud(mm, pgd, addr, next, phys, map_io);
>> +             alloc_init_pud(mm, pgd, addr, next, phys, map_io, map_xn);
>>               phys += next - addr;
>>       } while (pgd++, addr = next, addr != end);
>>  }
>> @@ -275,7 +279,7 @@ static void __init create_mapping(phys_addr_t phys, unsigned long virt,
>>               return;
>>       }
>>       __create_mapping(&init_mm, pgd_offset_k(virt & PAGE_MASK), phys, virt,
>> -                      size, 0);
>> +                      size, 0, 0);
>>  }
>>
>>  void __init create_id_mapping(phys_addr_t addr, phys_addr_t size, int map_io)
>> @@ -285,7 +289,15 @@ void __init create_id_mapping(phys_addr_t addr, phys_addr_t size, int map_io)
>>               return;
>>       }
>>       __create_mapping(&init_mm, &idmap_pg_dir[pgd_index(addr)],
>> -                      addr, addr, size, map_io);
>> +                      addr, addr, size, map_io, 0);
>> +}
>> +
>> +void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
>> +                            unsigned long virt, phys_addr_t size,
>> +                            int map_io, int map_xn)
>> +{
>> +     __create_mapping(mm, pgd_offset(mm, virt), phys, virt, size, map_io,
>> +                      map_xn);
>>  }
>>
>>  static void __init map_mem(void)
>> --
>> 1.8.3.2
>>
>
> Cheers,
> --
> Steve

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 4/6] arm64/efi: move SetVirtualAddressMap() to UEFI stub
  2014-10-24 12:39 ` [PATCH 4/6] arm64/efi: move SetVirtualAddressMap() to UEFI stub Ard Biesheuvel
  2014-10-24 13:39   ` Grant Likely
@ 2014-10-28  7:47   ` Dave Young
  2014-10-28  7:57     ` Ard Biesheuvel
  1 sibling, 1 reply; 18+ messages in thread
From: Dave Young @ 2014-10-28  7:47 UTC (permalink / raw)
  To: linux-arm-kernel

Hi, Ard

The approach looks good, thanks for the patchset.

Please see a few code comments inline..

On 10/24/14 at 02:39pm, Ard Biesheuvel wrote:
> In order to support kexec, the kernel needs to be able to deal with the
> state of the UEFI firmware after SetVirtualAddressMap() has been called.
> To avoid having separate code paths for non-kexec and kexec, let's move
> the call to SetVirtualAddressMap() to the stub: this will guarantee us
> that it will only be called once (since the stub is not executed during
> kexec), and ensures that the UEFI state is identical between kexec and
> normal boot.
> 
> This implies that the layout of the virtual mapping needs to be created
> by the stub as well. All regions are rounded up to a naturally aligned
> multiple of 64 KB (for compatibility with 64k pages kernels) and recorded
> in the UEFI memory map. The kernel proper reads those values and installs
> the mappings in a dedicated set of page tables that are swapped in during
> UEFI Runtime Services calls.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
>  arch/arm64/include/asm/efi.h       |  19 +++-
>  arch/arm64/kernel/efi.c            | 205 +++++++++++++++++++------------------
>  drivers/firmware/efi/libstub/fdt.c | 104 ++++++++++++++++++-
>  3 files changed, 224 insertions(+), 104 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h
> index a34fd3b12e2b..d752e5480096 100644
> --- a/arch/arm64/include/asm/efi.h
> +++ b/arch/arm64/include/asm/efi.h
> @@ -12,23 +12,32 @@ extern void efi_idmap_init(void);
>  #define efi_idmap_init()
>  #endif
>  
> +void efi_load_rt_mapping(void);
> +void efi_unload_rt_mapping(void);
> +
>  #define efi_call_virt(f, ...)						\
>  ({									\
> -	efi_##f##_t *__f = efi.systab->runtime->f;			\
> +	efi_##f##_t *__f;						\
>  	efi_status_t __s;						\
>  									\
> -	kernel_neon_begin();						\
> +	kernel_neon_begin();	/* disables preemption */		\
> +	efi_load_rt_mapping();						\
> +	__f = efi.systab->runtime->f;					\
>  	__s = __f(__VA_ARGS__);						\
> +	efi_unload_rt_mapping();					\
>  	kernel_neon_end();						\
>  	__s;								\
>  })
>  
>  #define __efi_call_virt(f, ...)						\
>  ({									\
> -	efi_##f##_t *__f = efi.systab->runtime->f;			\
> +	efi_##f##_t *__f;						\
>  									\
> -	kernel_neon_begin();						\
> +	kernel_neon_begin();	/* disables preemption */		\
> +	efi_load_rt_mapping();						\
> +	__f = efi.systab->runtime->f;					\
>  	__f(__VA_ARGS__);						\
> +	efi_unload_rt_mapping();					\
>  	kernel_neon_end();						\
>  })
>  
> @@ -44,4 +53,6 @@ extern void efi_idmap_init(void);
>  
>  #define efi_call_early(f, ...) sys_table_arg->boottime->f(__VA_ARGS__)
>  
> +#define EFI_VIRTMAP		EFI_ARCH_1
> +
>  #endif /* _ASM_EFI_H */
> diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
> index baab9344a32b..324398c03acd 100644
> --- a/arch/arm64/kernel/efi.c
> +++ b/arch/arm64/kernel/efi.c
> @@ -20,16 +20,21 @@
>  #include <linux/of_fdt.h>
>  #include <linux/sched.h>
>  #include <linux/slab.h>
> +#include <linux/mm_types.h>
> +#include <linux/rbtree.h>
> +#include <linux/rwsem.h>
> +#include <linux/spinlock.h>
> +#include <linux/atomic.h>
>  
>  #include <asm/cacheflush.h>
>  #include <asm/efi.h>
>  #include <asm/tlbflush.h>
> +#include <asm/pgtable.h>
>  #include <asm/mmu_context.h>
> +#include <asm/mmu.h>
>  
>  struct efi_memory_map memmap;
>  
> -static efi_runtime_services_t *runtime;
> -
>  static u64 efi_system_table;
>  
>  static int uefi_debug __initdata;
> @@ -69,9 +74,36 @@ static void __init efi_setup_idmap(void)
>  	}
>  }
>  
> +/*
> + * Translate a EFI virtual address into a physical address: this is necessary,
> + * as some data members of the EFI system table are virtually remapped after
> + * SetVirtualAddressMap() has been called.
> + */
> +static phys_addr_t __init efi_to_phys(unsigned long addr)
> +{
> +	efi_memory_desc_t *md;
> +
> +	for_each_efi_memory_desc(&memmap, md) {
> +		if (!(md->attribute & EFI_MEMORY_RUNTIME))
> +			continue;
> +		if (md->virt_addr == 0)
> +			/* no virtual mapping has been installed by the stub */
> +			break;
> +		if (md->virt_addr < addr &&
> +		    (addr - md->virt_addr) < (md->num_pages << EFI_PAGE_SHIFT))
> +			return md->phys_addr + addr - md->virt_addr;
> +	}
> +
> +	WARN_ONCE(1, "UEFI virtual mapping incomplete or missing -- no entry found for 0x%lx\n",
> +		  addr);

die here instead of warn looks better, addr could be anything which does not match the virt maps.

> +	return addr;
> +}
> +
>  static int __init uefi_init(void)
>  {
>  	efi_char16_t *c16;
> +	void *config_tables;
> +	u64 table_size;
>  	char vendor[100] = "unknown";
>  	int i, retval;
>  
> @@ -99,7 +131,7 @@ static int __init uefi_init(void)
>  			efi.systab->hdr.revision & 0xffff);
>  
>  	/* Show what we know for posterity */
> -	c16 = early_memremap(efi.systab->fw_vendor,
> +	c16 = early_memremap(efi_to_phys(efi.systab->fw_vendor),
>  			     sizeof(vendor));
>  	if (c16) {
>  		for (i = 0; i < (int) sizeof(vendor) - 1 && *c16; ++i)
> @@ -112,8 +144,14 @@ static int __init uefi_init(void)
>  		efi.systab->hdr.revision >> 16,
>  		efi.systab->hdr.revision & 0xffff, vendor);
>  
> -	retval = efi_config_init(NULL);
> +	table_size = sizeof(efi_config_table_64_t) * efi.systab->nr_tables;
> +	config_tables = early_memremap(efi_to_phys(efi.systab->tables),
> +				       table_size);

There should be error handling for early_memremap.

> +
> +	retval = efi_config_parse_tables(config_tables,
> +					 efi.systab->nr_tables, NULL);
>  
> +	early_memunmap(config_tables, table_size);
>  out:
>  	early_memunmap(efi.systab,  sizeof(efi_system_table_t));
>  	return retval;
> @@ -319,60 +357,64 @@ void __init efi_init(void)
>  	reserve_regions();
>  }
>  
> +static pgd_t efi_pgd[PTRS_PER_PGD] __page_aligned_bss;
> +
> +static struct mm_struct efi_mm = {
> +	.mm_rb			= RB_ROOT,
> +	.pgd			= efi_pgd,
> +	.mm_users		= ATOMIC_INIT(2),
> +	.mm_count		= ATOMIC_INIT(1),
> +	.mmap_sem		= __RWSEM_INITIALIZER(efi_mm.mmap_sem),
> +	.page_table_lock	= __SPIN_LOCK_UNLOCKED(efi_mm.page_table_lock),
> +	.mmlist			= LIST_HEAD_INIT(efi_mm.mmlist),
> +	INIT_MM_CONTEXT(efi_mm)
> +};
> +
>  void __init efi_idmap_init(void)
>  {
> +	efi_memory_desc_t *md;
> +
>  	if (!efi_enabled(EFI_BOOT))
>  		return;
>  
>  	/* boot time idmap_pg_dir is incomplete, so fill in missing parts */
>  	efi_setup_idmap();
> -}
> -
> -static int __init remap_region(efi_memory_desc_t *md, void **new)
> -{
> -	u64 paddr, vaddr, npages, size;
> -
> -	paddr = md->phys_addr;
> -	npages = md->num_pages;
> -	memrange_efi_to_native(&paddr, &npages);
> -	size = npages << PAGE_SHIFT;
> -
> -	if (is_normal_ram(md))
> -		vaddr = (__force u64)ioremap_cache(paddr, size);
> -	else
> -		vaddr = (__force u64)ioremap(paddr, size);
>  
> -	if (!vaddr) {
> -		pr_err("Unable to remap 0x%llx pages @ %p\n",
> -		       npages, (void *)paddr);
> -		return 0;
> -	}
> +	for_each_efi_memory_desc(&memmap, md) {
> +		u64 paddr, npages, size;
>  
> -	/* adjust for any rounding when EFI and system pagesize differs */
> -	md->virt_addr = vaddr + (md->phys_addr - paddr);
> +		if (!(md->attribute & EFI_MEMORY_RUNTIME))
> +			continue;
> +		if (md->virt_addr == 0)
> +			/* no virtual mapping has been installed by the stub */
> +			return;
>  
> -	if (uefi_debug)
> -		pr_info("  EFI remap 0x%012llx => %p\n",
> -			md->phys_addr, (void *)md->virt_addr);
> +		paddr = md->phys_addr;
> +		npages = md->num_pages;
> +		memrange_efi_to_native(&paddr, &npages);
> +		size = npages << PAGE_SHIFT;
>  
> -	memcpy(*new, md, memmap.desc_size);
> -	*new += memmap.desc_size;
> +		if (uefi_debug)
> +			pr_info("  EFI remap 0x%012llx => %p\n",
> +				md->phys_addr, (void *)md->virt_addr);
>  
> -	return 1;
> +		/*
> +		 * Only regions of type EFI_RUNTIME_SERVICES_CODE need to be
> +		 * executable, everything else can be mapped with the XN bits
> +		 * set. Unfortunately, we cannot map those code regions write
> +		 * protect, as they may contain read-write .data sections as
> +		 * well.
> +		 */
> +		create_pgd_mapping(&efi_mm, paddr, md->virt_addr, size,
> +				   !is_normal_ram(md),
> +				   md->type != EFI_RUNTIME_SERVICES_CODE);
> +	}
> +	set_bit(EFI_VIRTMAP, &efi.flags);
>  }
>  
> -/*
> - * Switch UEFI from an identity map to a kernel virtual map
> - */
>  static int __init arm64_enter_virtual_mode(void)
>  {
> -	efi_memory_desc_t *md;
> -	phys_addr_t virtmap_phys;
> -	void *virtmap, *virt_md;
> -	efi_status_t status;
>  	u64 mapsize;
> -	int count = 0;
> -	unsigned long flags;
>  
>  	if (!efi_enabled(EFI_MEMMAP))
>  		return 0;
> @@ -393,79 +435,28 @@ static int __init arm64_enter_virtual_mode(void)
>  
>  	efi.memmap = &memmap;
>  
> -	/* Map the runtime regions */
> -	virtmap = kmalloc(mapsize, GFP_KERNEL);
> -	if (!virtmap) {
> -		pr_err("Failed to allocate EFI virtual memmap\n");
> -		return -1;
> -	}
> -	virtmap_phys = virt_to_phys(virtmap);
> -	virt_md = virtmap;
> -
> -	for_each_efi_memory_desc(&memmap, md) {
> -		if (!(md->attribute & EFI_MEMORY_RUNTIME))
> -			continue;
> -		if (!remap_region(md, &virt_md))
> -			goto err_unmap;
> -		++count;
> -	}
> -
> -	efi.systab = (__force void *)efi_lookup_mapped_addr(efi_system_table);
> +	efi.systab = (__force void *)ioremap_cache(efi_system_table,
> +						   sizeof(efi_system_table_t));
>  	if (!efi.systab) {
> -		/*
> -		 * If we have no virtual mapping for the System Table at this
> -		 * point, the memory map doesn't cover the physical offset where
> -		 * it resides. This means the System Table will be inaccessible
> -		 * to Runtime Services themselves once the virtual mapping is
> -		 * installed.
> -		 */
> -		pr_err("Failed to remap EFI System Table -- buggy firmware?\n");
> -		goto err_unmap;
> +		pr_err("Failed to remap EFI System Table\n");
> +		return -1;
>  	}
>  	set_bit(EFI_SYSTEM_TABLES, &efi.flags);
>  
> -	local_irq_save(flags);
> -	cpu_switch_mm(idmap_pg_dir, &init_mm);
> -
> -	/* Call SetVirtualAddressMap with the physical address of the map */
> -	runtime = efi.systab->runtime;
> -	efi.set_virtual_address_map = runtime->set_virtual_address_map;
> -
> -	status = efi.set_virtual_address_map(count * memmap.desc_size,
> -					     memmap.desc_size,
> -					     memmap.desc_version,
> -					     (efi_memory_desc_t *)virtmap_phys);
> -	cpu_set_reserved_ttbr0();
> -	flush_tlb_all();
> -	local_irq_restore(flags);
> -
> -	kfree(virtmap);
> -
>  	free_boot_services();
>  
> -	if (status != EFI_SUCCESS) {
> -		pr_err("Failed to set EFI virtual address map! [%lx]\n",
> -			status);
> +	if (!efi_enabled(EFI_VIRTMAP)) {
> +		pr_err("No UEFI virtual mapping was installed -- runtime services will not be available\n");
>  		return -1;
>  	}
>  
>  	/* Set up runtime services function pointers */
> -	runtime = efi.systab->runtime;
>  	efi_native_runtime_setup();
>  	set_bit(EFI_RUNTIME_SERVICES, &efi.flags);
>  
>  	efi.runtime_version = efi.systab->hdr.revision;
>  
>  	return 0;
> -
> -err_unmap:
> -	/* unmap all mappings that succeeded: there are 'count' of those */
> -	for (virt_md = virtmap; count--; virt_md += memmap.desc_size) {
> -		md = virt_md;
> -		iounmap((__force void __iomem *)md->virt_addr);
> -	}
> -	kfree(virtmap);
> -	return -1;
>  }
>  early_initcall(arm64_enter_virtual_mode);
>  
> @@ -482,3 +473,21 @@ static int __init arm64_dmi_init(void)
>  	return 0;
>  }
>  core_initcall(arm64_dmi_init);
> +
> +static void efi_set_pgd(struct mm_struct *mm)
> +{
> +	cpu_switch_mm(mm->pgd, mm);
> +	flush_tlb_all();
> +	if (icache_is_aivivt())
> +		__flush_icache_all();
> +}
> +
> +void efi_load_rt_mapping(void)
> +{
> +	efi_set_pgd(&efi_mm);
> +}
> +
> +void efi_unload_rt_mapping(void)
> +{
> +	efi_set_pgd(current->active_mm);
> +}
> diff --git a/drivers/firmware/efi/libstub/fdt.c b/drivers/firmware/efi/libstub/fdt.c
> index c846a9608cbd..9a5f6e54a423 100644
> --- a/drivers/firmware/efi/libstub/fdt.c
> +++ b/drivers/firmware/efi/libstub/fdt.c
> @@ -168,6 +168,68 @@ fdt_set_fail:
>  #endif
>  
>  /*
> + * This is the base address at which to start allocating virtual memory ranges
> + * for UEFI Runtime Services. This is a userland range so that we can use any
> + * allocation we choose, and eliminate the risk of a conflict after kexec.
> + */
> +#define EFI_RT_VIRTUAL_BASE	0x40000000
> +
> +static void update_memory_map(efi_memory_desc_t *memory_map,
> +			      unsigned long map_size, unsigned long desc_size,
> +			      int *count)
> +{
> +	u64 efi_virt_base = EFI_RT_VIRTUAL_BASE;
> +	union {
> +		efi_memory_desc_t entry;
> +		u8 pad[desc_size];
> +	} *p, *q, tmp;
> +	int i = map_size / desc_size;
> +
> +	p = (void *)memory_map;
> +	for (q = p; i >= 0; i--, q++) {
> +		u64 paddr, size;
> +
> +		if (!(q->entry.attribute & EFI_MEMORY_RUNTIME))
> +			continue;
> +
> +		/*
> +		 * Swap the entries around so that all EFI_MEMORY_RUNTIME
> +		 * entries bubble to the top. This will allow us to reuse the
> +		 * table as input to SetVirtualAddressMap().
> +		 */
> +		if (q != p) {
> +			tmp = *p;
> +			*p = *q;
> +			*q = tmp;
> +		}
> +
> +		/*
> +		 * Make the mapping compatible with 64k pages: this allows
> +		 * a 4k page size kernel to kexec a 64k page size kernel and
> +		 * vice versa.
> +		 */
> +		paddr = round_down(p->entry.phys_addr, SZ_64K);
> +		size = round_up(p->entry.num_pages * EFI_PAGE_SIZE +
> +				p->entry.phys_addr - paddr, SZ_64K);
> +
> +		/*
> +		 * Avoid wasting memory on PTEs by choosing a virtual base that
> +		 * is compatible with section mappings if this region has the
> +		 * appropriate size and physical alignment. (Sections are 2 MB
> +		 * on 4k granule kernels)
> +		 */
> +		if (IS_ALIGNED(p->entry.phys_addr, SZ_2M) && size >= SZ_2M)
> +			efi_virt_base = round_up(efi_virt_base, SZ_2M);
> +
> +		p->entry.virt_addr = efi_virt_base + p->entry.phys_addr - paddr;
> +		efi_virt_base += size;
> +
> +		++p;
> +		++*count;
> +	}
> +}
> +
> +/*
>   * Allocate memory for a new FDT, then add EFI, commandline, and
>   * initrd related fields to the FDT.  This routine increases the
>   * FDT allocation size until the allocated memory is large
> @@ -196,6 +258,7 @@ efi_status_t allocate_new_fdt_and_exit_boot(efi_system_table_t *sys_table,
>  	efi_memory_desc_t *memory_map;
>  	unsigned long new_fdt_size;
>  	efi_status_t status;
> +	int runtime_entry_count = 0;
>  
>  	/*
>  	 * Estimate size of new FDT, and allocate memory for it. We
> @@ -248,12 +311,49 @@ efi_status_t allocate_new_fdt_and_exit_boot(efi_system_table_t *sys_table,
>  		}
>  	}
>  
> +	/*
> +	 * Update the memory map with virtual addresses, and reorder the entries
> +	 * so that we can pass it straight into SetVirtualAddressMap()
> +	 */
> +	update_memory_map(memory_map, map_size, desc_size,
> +			  &runtime_entry_count);
> +
>  	/* Now we are ready to exit_boot_services.*/
>  	status = sys_table->boottime->exit_boot_services(handle, mmap_key);
>  
> +	if (status == EFI_SUCCESS) {
> +		efi_set_virtual_address_map_t *svam;
> +
> +		/* Install the new virtual address map */
> +		svam = sys_table->runtime->set_virtual_address_map;
> +		status = svam(runtime_entry_count * desc_size, desc_size,
> +			      desc_ver, memory_map);
>  
> -	if (status == EFI_SUCCESS)
> -		return status;
> +		/*
> +		 * We are beyond the point of no return here, so if the call to
> +		 * SetVirtualAddressMap() failed, we need to signal that to the
> +		 * incoming kernel but proceed normally otherwise.
> +		 */
> +		if (status != EFI_SUCCESS) {
> +			int i;
> +
> +			/*
> +			 * Set the virtual address field of all
> +			 * EFI_MEMORY_RUNTIME entries to 0. This will signal
> +			 * the incoming kernel that no virtual translation has
> +			 * been installed.
> +			 */
> +			for (i = 0; i < map_size; i += desc_size) {
> +				efi_memory_desc_t *p;
> +
> +				p = (efi_memory_desc_t *)((u8 *)memory_map + i);
> +				if (!(p->attribute & EFI_MEMORY_RUNTIME))
> +					break;

should it be continue instead of break?

> +				p->virt_addr = 0;

On x86 the kernel panics if SetVirtualAddressMap() fails; Matt mentioned that
it's not reasonable to continue. But for arm64 I'm not sure it's the same case.

> +			}
> +		}
> +		return EFI_SUCCESS;
> +	}
>  
>  	pr_efi_err(sys_table, "Exit boot services failed.\n");
>  
> -- 
> 1.8.3.2
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 4/6] arm64/efi: move SetVirtualAddressMap() to UEFI stub
  2014-10-28  7:47   ` Dave Young
@ 2014-10-28  7:57     ` Ard Biesheuvel
  2014-10-28  8:16       ` Dave Young
  0 siblings, 1 reply; 18+ messages in thread
From: Ard Biesheuvel @ 2014-10-28  7:57 UTC (permalink / raw)
  To: linux-arm-kernel

On 28 October 2014 08:47, Dave Young <dyoung@redhat.com> wrote:
> Hi, Ard
>
> The approach looks good, thanks for the patchset.
>
> Please see a few code comments inline..
>

Hi Dave,

Thanks for the review.

> On 10/24/14 at 02:39pm, Ard Biesheuvel wrote:
>> In order to support kexec, the kernel needs to be able to deal with the
>> state of the UEFI firmware after SetVirtualAddressMap() has been called.
>> To avoid having separate code paths for non-kexec and kexec, let's move
>> the call to SetVirtualAddressMap() to the stub: this will guarantee us
>> that it will only be called once (since the stub is not executed during
>> kexec), and ensures that the UEFI state is identical between kexec and
>> normal boot.
>>
>> This implies that the layout of the virtual mapping needs to be created
>> by the stub as well. All regions are rounded up to a naturally aligned
>> multiple of 64 KB (for compatibility with 64k pages kernels) and recorded
>> in the UEFI memory map. The kernel proper reads those values and installs
>> the mappings in a dedicated set of page tables that are swapped in during
>> UEFI Runtime Services calls.
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> ---
>>  arch/arm64/include/asm/efi.h       |  19 +++-
>>  arch/arm64/kernel/efi.c            | 205 +++++++++++++++++++------------------
>>  drivers/firmware/efi/libstub/fdt.c | 104 ++++++++++++++++++-
>>  3 files changed, 224 insertions(+), 104 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h
>> index a34fd3b12e2b..d752e5480096 100644
>> --- a/arch/arm64/include/asm/efi.h
>> +++ b/arch/arm64/include/asm/efi.h
>> @@ -12,23 +12,32 @@ extern void efi_idmap_init(void);
>>  #define efi_idmap_init()
>>  #endif
>>
>> +void efi_load_rt_mapping(void);
>> +void efi_unload_rt_mapping(void);
>> +
>>  #define efi_call_virt(f, ...)                                                \
>>  ({                                                                   \
>> -     efi_##f##_t *__f = efi.systab->runtime->f;                      \
>> +     efi_##f##_t *__f;                                               \
>>       efi_status_t __s;                                               \
>>                                                                       \
>> -     kernel_neon_begin();                                            \
>> +     kernel_neon_begin();    /* disables preemption */               \
>> +     efi_load_rt_mapping();                                          \
>> +     __f = efi.systab->runtime->f;                                   \
>>       __s = __f(__VA_ARGS__);                                         \
>> +     efi_unload_rt_mapping();                                        \
>>       kernel_neon_end();                                              \
>>       __s;                                                            \
>>  })
>>
>>  #define __efi_call_virt(f, ...)                                              \
>>  ({                                                                   \
>> -     efi_##f##_t *__f = efi.systab->runtime->f;                      \
>> +     efi_##f##_t *__f;                                               \
>>                                                                       \
>> -     kernel_neon_begin();                                            \
>> +     kernel_neon_begin();    /* disables preemption */               \
>> +     efi_load_rt_mapping();                                          \
>> +     __f = efi.systab->runtime->f;                                   \
>>       __f(__VA_ARGS__);                                               \
>> +     efi_unload_rt_mapping();                                        \
>>       kernel_neon_end();                                              \
>>  })
>>
>> @@ -44,4 +53,6 @@ extern void efi_idmap_init(void);
>>
>>  #define efi_call_early(f, ...) sys_table_arg->boottime->f(__VA_ARGS__)
>>
>> +#define EFI_VIRTMAP          EFI_ARCH_1
>> +
>>  #endif /* _ASM_EFI_H */
>> diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
>> index baab9344a32b..324398c03acd 100644
>> --- a/arch/arm64/kernel/efi.c
>> +++ b/arch/arm64/kernel/efi.c
>> @@ -20,16 +20,21 @@
>>  #include <linux/of_fdt.h>
>>  #include <linux/sched.h>
>>  #include <linux/slab.h>
>> +#include <linux/mm_types.h>
>> +#include <linux/rbtree.h>
>> +#include <linux/rwsem.h>
>> +#include <linux/spinlock.h>
>> +#include <linux/atomic.h>
>>
>>  #include <asm/cacheflush.h>
>>  #include <asm/efi.h>
>>  #include <asm/tlbflush.h>
>> +#include <asm/pgtable.h>
>>  #include <asm/mmu_context.h>
>> +#include <asm/mmu.h>
>>
>>  struct efi_memory_map memmap;
>>
>> -static efi_runtime_services_t *runtime;
>> -
>>  static u64 efi_system_table;
>>
>>  static int uefi_debug __initdata;
>> @@ -69,9 +74,36 @@ static void __init efi_setup_idmap(void)
>>       }
>>  }
>>
>> +/*
>> + * Translate a EFI virtual address into a physical address: this is necessary,
>> + * as some data members of the EFI system table are virtually remapped after
>> + * SetVirtualAddressMap() has been called.
>> + */
>> +static phys_addr_t __init efi_to_phys(unsigned long addr)
>> +{
>> +     efi_memory_desc_t *md;
>> +
>> +     for_each_efi_memory_desc(&memmap, md) {
>> +             if (!(md->attribute & EFI_MEMORY_RUNTIME))
>> +                     continue;
>> +             if (md->virt_addr == 0)
>> +                     /* no virtual mapping has been installed by the stub */
>> +                     break;
>> +             if (md->virt_addr < addr &&
>> +                 (addr - md->virt_addr) < (md->num_pages << EFI_PAGE_SHIFT))
>> +                     return md->phys_addr + addr - md->virt_addr;
>> +     }
>> +
>> +     WARN_ONCE(1, "UEFI virtual mapping incomplete or missing -- no entry found for 0x%lx\n",
>> +               addr);
>
> die here instead of warn looks better, addr could be anything which does not match the virt maps.
>

The general idea (here and below as well) is that SetVirtualAddressMap()
is called from the stub, at a point where we can't easily panic or error
out in a meaningful way, as we have just called ExitBootServices() so
there is no console etc. So instead, I zero out all the virt_addr
fields, which indicates that no virtual mapping has been installed. On
arm64 at least, that does not necessarily prevent us from bringing up
ACPI etc. and complaining in a way that is more likely to catch
someone's attention than a message on the console alone.

I chose a WARN_ONCE() because it gives a nice fat stack dump in the
kernel log, but as the RT services regions and fw_vendor and
config_table are still id mapped, I don't see a compelling reason to
panic right here.

>> +     return addr;
>> +}
>> +
>>  static int __init uefi_init(void)
>>  {
>>       efi_char16_t *c16;
>> +     void *config_tables;
>> +     u64 table_size;
>>       char vendor[100] = "unknown";
>>       int i, retval;
>>
>> @@ -99,7 +131,7 @@ static int __init uefi_init(void)
>>                       efi.systab->hdr.revision & 0xffff);
>>
>>       /* Show what we know for posterity */
>> -     c16 = early_memremap(efi.systab->fw_vendor,
>> +     c16 = early_memremap(efi_to_phys(efi.systab->fw_vendor),
>>                            sizeof(vendor));
>>       if (c16) {
>>               for (i = 0; i < (int) sizeof(vendor) - 1 && *c16; ++i)
>> @@ -112,8 +144,14 @@ static int __init uefi_init(void)
>>               efi.systab->hdr.revision >> 16,
>>               efi.systab->hdr.revision & 0xffff, vendor);
>>
>> -     retval = efi_config_init(NULL);
>> +     table_size = sizeof(efi_config_table_64_t) * efi.systab->nr_tables;
>> +     config_tables = early_memremap(efi_to_phys(efi.systab->tables),
>> +                                    table_size);
>
> There should be error handling for early_memremap.
>
>> +
>> +     retval = efi_config_parse_tables(config_tables,
>> +                                      efi.systab->nr_tables, NULL);
>>
>> +     early_memunmap(config_tables, table_size);
>>  out:
>>       early_memunmap(efi.systab,  sizeof(efi_system_table_t));
>>       return retval;
>> @@ -319,60 +357,64 @@ void __init efi_init(void)
>>       reserve_regions();
>>  }
>>
>> +static pgd_t efi_pgd[PTRS_PER_PGD] __page_aligned_bss;
>> +
>> +static struct mm_struct efi_mm = {
>> +     .mm_rb                  = RB_ROOT,
>> +     .pgd                    = efi_pgd,
>> +     .mm_users               = ATOMIC_INIT(2),
>> +     .mm_count               = ATOMIC_INIT(1),
>> +     .mmap_sem               = __RWSEM_INITIALIZER(efi_mm.mmap_sem),
>> +     .page_table_lock        = __SPIN_LOCK_UNLOCKED(efi_mm.page_table_lock),
>> +     .mmlist                 = LIST_HEAD_INIT(efi_mm.mmlist),
>> +     INIT_MM_CONTEXT(efi_mm)
>> +};
>> +
>>  void __init efi_idmap_init(void)
>>  {
>> +     efi_memory_desc_t *md;
>> +
>>       if (!efi_enabled(EFI_BOOT))
>>               return;
>>
>>       /* boot time idmap_pg_dir is incomplete, so fill in missing parts */
>>       efi_setup_idmap();
>> -}
>> -
>> -static int __init remap_region(efi_memory_desc_t *md, void **new)
>> -{
>> -     u64 paddr, vaddr, npages, size;
>> -
>> -     paddr = md->phys_addr;
>> -     npages = md->num_pages;
>> -     memrange_efi_to_native(&paddr, &npages);
>> -     size = npages << PAGE_SHIFT;
>> -
>> -     if (is_normal_ram(md))
>> -             vaddr = (__force u64)ioremap_cache(paddr, size);
>> -     else
>> -             vaddr = (__force u64)ioremap(paddr, size);
>>
>> -     if (!vaddr) {
>> -             pr_err("Unable to remap 0x%llx pages @ %p\n",
>> -                    npages, (void *)paddr);
>> -             return 0;
>> -     }
>> +     for_each_efi_memory_desc(&memmap, md) {
>> +             u64 paddr, npages, size;
>>
>> -     /* adjust for any rounding when EFI and system pagesize differs */
>> -     md->virt_addr = vaddr + (md->phys_addr - paddr);
>> +             if (!(md->attribute & EFI_MEMORY_RUNTIME))
>> +                     continue;
>> +             if (md->virt_addr == 0)
>> +                     /* no virtual mapping has been installed by the stub */
>> +                     return;
>>
>> -     if (uefi_debug)
>> -             pr_info("  EFI remap 0x%012llx => %p\n",
>> -                     md->phys_addr, (void *)md->virt_addr);
>> +             paddr = md->phys_addr;
>> +             npages = md->num_pages;
>> +             memrange_efi_to_native(&paddr, &npages);
>> +             size = npages << PAGE_SHIFT;
>>
>> -     memcpy(*new, md, memmap.desc_size);
>> -     *new += memmap.desc_size;
>> +             if (uefi_debug)
>> +                     pr_info("  EFI remap 0x%012llx => %p\n",
>> +                             md->phys_addr, (void *)md->virt_addr);
>>
>> -     return 1;
>> +             /*
>> +              * Only regions of type EFI_RUNTIME_SERVICES_CODE need to be
>> +              * executable, everything else can be mapped with the XN bits
>> +              * set. Unfortunately, we cannot map those code regions write
>> +              * protect, as they may contain read-write .data sections as
>> +              * well.
>> +              */
>> +             create_pgd_mapping(&efi_mm, paddr, md->virt_addr, size,
>> +                                !is_normal_ram(md),
>> +                                md->type != EFI_RUNTIME_SERVICES_CODE);
>> +     }
>> +     set_bit(EFI_VIRTMAP, &efi.flags);
>>  }
>>
>> -/*
>> - * Switch UEFI from an identity map to a kernel virtual map
>> - */
>>  static int __init arm64_enter_virtual_mode(void)
>>  {
>> -     efi_memory_desc_t *md;
>> -     phys_addr_t virtmap_phys;
>> -     void *virtmap, *virt_md;
>> -     efi_status_t status;
>>       u64 mapsize;
>> -     int count = 0;
>> -     unsigned long flags;
>>
>>       if (!efi_enabled(EFI_MEMMAP))
>>               return 0;
>> @@ -393,79 +435,28 @@ static int __init arm64_enter_virtual_mode(void)
>>
>>       efi.memmap = &memmap;
>>
>> -     /* Map the runtime regions */
>> -     virtmap = kmalloc(mapsize, GFP_KERNEL);
>> -     if (!virtmap) {
>> -             pr_err("Failed to allocate EFI virtual memmap\n");
>> -             return -1;
>> -     }
>> -     virtmap_phys = virt_to_phys(virtmap);
>> -     virt_md = virtmap;
>> -
>> -     for_each_efi_memory_desc(&memmap, md) {
>> -             if (!(md->attribute & EFI_MEMORY_RUNTIME))
>> -                     continue;
>> -             if (!remap_region(md, &virt_md))
>> -                     goto err_unmap;
>> -             ++count;
>> -     }
>> -
>> -     efi.systab = (__force void *)efi_lookup_mapped_addr(efi_system_table);
>> +     efi.systab = (__force void *)ioremap_cache(efi_system_table,
>> +                                                sizeof(efi_system_table_t));
>>       if (!efi.systab) {
>> -             /*
>> -              * If we have no virtual mapping for the System Table at this
>> -              * point, the memory map doesn't cover the physical offset where
>> -              * it resides. This means the System Table will be inaccessible
>> -              * to Runtime Services themselves once the virtual mapping is
>> -              * installed.
>> -              */
>> -             pr_err("Failed to remap EFI System Table -- buggy firmware?\n");
>> -             goto err_unmap;
>> +             pr_err("Failed to remap EFI System Table\n");
>> +             return -1;
>>       }
>>       set_bit(EFI_SYSTEM_TABLES, &efi.flags);
>>
>> -     local_irq_save(flags);
>> -     cpu_switch_mm(idmap_pg_dir, &init_mm);
>> -
>> -     /* Call SetVirtualAddressMap with the physical address of the map */
>> -     runtime = efi.systab->runtime;
>> -     efi.set_virtual_address_map = runtime->set_virtual_address_map;
>> -
>> -     status = efi.set_virtual_address_map(count * memmap.desc_size,
>> -                                          memmap.desc_size,
>> -                                          memmap.desc_version,
>> -                                          (efi_memory_desc_t *)virtmap_phys);
>> -     cpu_set_reserved_ttbr0();
>> -     flush_tlb_all();
>> -     local_irq_restore(flags);
>> -
>> -     kfree(virtmap);
>> -
>>       free_boot_services();
>>
>> -     if (status != EFI_SUCCESS) {
>> -             pr_err("Failed to set EFI virtual address map! [%lx]\n",
>> -                     status);
>> +     if (!efi_enabled(EFI_VIRTMAP)) {
>> +             pr_err("No UEFI virtual mapping was installed -- runtime services will not be available\n");
>>               return -1;
>>       }
>>
>>       /* Set up runtime services function pointers */
>> -     runtime = efi.systab->runtime;
>>       efi_native_runtime_setup();
>>       set_bit(EFI_RUNTIME_SERVICES, &efi.flags);
>>
>>       efi.runtime_version = efi.systab->hdr.revision;
>>
>>       return 0;
>> -
>> -err_unmap:
>> -     /* unmap all mappings that succeeded: there are 'count' of those */
>> -     for (virt_md = virtmap; count--; virt_md += memmap.desc_size) {
>> -             md = virt_md;
>> -             iounmap((__force void __iomem *)md->virt_addr);
>> -     }
>> -     kfree(virtmap);
>> -     return -1;
>>  }
>>  early_initcall(arm64_enter_virtual_mode);
>>
>> @@ -482,3 +473,21 @@ static int __init arm64_dmi_init(void)
>>       return 0;
>>  }
>>  core_initcall(arm64_dmi_init);
>> +
>> +static void efi_set_pgd(struct mm_struct *mm)
>> +{
>> +     cpu_switch_mm(mm->pgd, mm);
>> +     flush_tlb_all();
>> +     if (icache_is_aivivt())
>> +             __flush_icache_all();
>> +}
>> +
>> +void efi_load_rt_mapping(void)
>> +{
>> +     efi_set_pgd(&efi_mm);
>> +}
>> +
>> +void efi_unload_rt_mapping(void)
>> +{
>> +     efi_set_pgd(current->active_mm);
>> +}
>> diff --git a/drivers/firmware/efi/libstub/fdt.c b/drivers/firmware/efi/libstub/fdt.c
>> index c846a9608cbd..9a5f6e54a423 100644
>> --- a/drivers/firmware/efi/libstub/fdt.c
>> +++ b/drivers/firmware/efi/libstub/fdt.c
>> @@ -168,6 +168,68 @@ fdt_set_fail:
>>  #endif
>>
>>  /*
>> + * This is the base address at which to start allocating virtual memory ranges
>> + * for UEFI Runtime Services. This is a userland range so that we can use any
>> + * allocation we choose, and eliminate the risk of a conflict after kexec.
>> + */
>> +#define EFI_RT_VIRTUAL_BASE  0x40000000
>> +
>> +static void update_memory_map(efi_memory_desc_t *memory_map,
>> +                           unsigned long map_size, unsigned long desc_size,
>> +                           int *count)
>> +{
>> +     u64 efi_virt_base = EFI_RT_VIRTUAL_BASE;
>> +     union {
>> +             efi_memory_desc_t entry;
>> +             u8 pad[desc_size];
>> +     } *p, *q, tmp;
>> +     int i = map_size / desc_size;
>> +
>> +     p = (void *)memory_map;
>> +     for (q = p; i >= 0; i--, q++) {
>> +             u64 paddr, size;
>> +
>> +             if (!(q->entry.attribute & EFI_MEMORY_RUNTIME))
>> +                     continue;
>> +
>> +             /*
>> +              * Swap the entries around so that all EFI_MEMORY_RUNTIME
>> +              * entries bubble to the top. This will allow us to reuse the
>> +              * table as input to SetVirtualAddressMap().
>> +              */
>> +             if (q != p) {
>> +                     tmp = *p;
>> +                     *p = *q;
>> +                     *q = tmp;
>> +             }
>> +
>> +             /*
>> +              * Make the mapping compatible with 64k pages: this allows
>> +              * a 4k page size kernel to kexec a 64k page size kernel and
>> +              * vice versa.
>> +              */
>> +             paddr = round_down(p->entry.phys_addr, SZ_64K);
>> +             size = round_up(p->entry.num_pages * EFI_PAGE_SIZE +
>> +                             p->entry.phys_addr - paddr, SZ_64K);
>> +
>> +             /*
>> +              * Avoid wasting memory on PTEs by choosing a virtual base that
>> +              * is compatible with section mappings if this region has the
>> +              * appropriate size and physical alignment. (Sections are 2 MB
>> +              * on 4k granule kernels)
>> +              */
>> +             if (IS_ALIGNED(p->entry.phys_addr, SZ_2M) && size >= SZ_2M)
>> +                     efi_virt_base = round_up(efi_virt_base, SZ_2M);
>> +
>> +             p->entry.virt_addr = efi_virt_base + p->entry.phys_addr - paddr;
>> +             efi_virt_base += size;
>> +
>> +             ++p;
>> +             ++*count;
>> +     }
>> +}
>> +
>> +/*
>>   * Allocate memory for a new FDT, then add EFI, commandline, and
>>   * initrd related fields to the FDT.  This routine increases the
>>   * FDT allocation size until the allocated memory is large
>> @@ -196,6 +258,7 @@ efi_status_t allocate_new_fdt_and_exit_boot(efi_system_table_t *sys_table,
>>       efi_memory_desc_t *memory_map;
>>       unsigned long new_fdt_size;
>>       efi_status_t status;
>> +     int runtime_entry_count = 0;
>>
>>       /*
>>        * Estimate size of new FDT, and allocate memory for it. We
>> @@ -248,12 +311,49 @@ efi_status_t allocate_new_fdt_and_exit_boot(efi_system_table_t *sys_table,
>>               }
>>       }
>>
>> +     /*
>> +      * Update the memory map with virtual addresses, and reorder the entries
>> +      * so that we can pass it straight into SetVirtualAddressMap()
>> +      */
>> +     update_memory_map(memory_map, map_size, desc_size,
>> +                       &runtime_entry_count);
>> +
>>       /* Now we are ready to exit_boot_services.*/
>>       status = sys_table->boottime->exit_boot_services(handle, mmap_key);
>>
>> +     if (status == EFI_SUCCESS) {
>> +             efi_set_virtual_address_map_t *svam;
>> +
>> +             /* Install the new virtual address map */
>> +             svam = sys_table->runtime->set_virtual_address_map;
>> +             status = svam(runtime_entry_count * desc_size, desc_size,
>> +                           desc_ver, memory_map);
>>
>> -     if (status == EFI_SUCCESS)
>> -             return status;
>> +             /*
>> +              * We are beyond the point of no return here, so if the call to
>> +              * SetVirtualAddressMap() failed, we need to signal that to the
>> +              * incoming kernel but proceed normally otherwise.
>> +              */
>> +             if (status != EFI_SUCCESS) {
>> +                     int i;
>> +
>> +                     /*
>> +                      * Set the virtual address field of all
>> +                      * EFI_MEMORY_RUNTIME entries to 0. This will signal
>> +                      * the incoming kernel that no virtual translation has
>> +                      * been installed.
>> +                      */
>> +                     for (i = 0; i < map_size; i += desc_size) {
>> +                             efi_memory_desc_t *p;
>> +
>> +                             p = (efi_memory_desc_t *)((u8 *)memory_map + i);
>> +                             if (!(p->attribute & EFI_MEMORY_RUNTIME))
>> +                                     break;
>
> should it be continue instead of break?
>

The memory map is sorted, so we can break as soon as we encounter the
first one without the attribute set.
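
To make the invariant concrete, here is a minimal sketch of a consumer that
relies on the reordering done by update_memory_map() (the helper name is made
up, it is not part of the series):

static int count_runtime_entries(void *map, unsigned long map_size,
				 unsigned long desc_size)
{
	unsigned long off;
	int count = 0;

	for (off = 0; off < map_size; off += desc_size) {
		efi_memory_desc_t *md = (efi_memory_desc_t *)((u8 *)map + off);

		/* all EFI_MEMORY_RUNTIME entries sit at the start of the map */
		if (!(md->attribute & EFI_MEMORY_RUNTIME))
			break;		/* nothing after this point is runtime */
		count++;
	}
	return count;
}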

>> +                             p->virt_addr = 0;
>
> On x86 the kernel panics when set_virtual_address_map() fails; Matt mentioned
> that it's not reasonable to continue. But for arm64 I'm not sure it's the
> same case.
>

As I said, perhaps it is not reasonable to continue, or perhaps it is, but
panicking here is maybe not the most productive thing to do.
I could add a printk() before calling ExitBootServices(), saying 'if you can
read this line, and nothing more, we may have crashed in/after
SetVirtualAddressMap()'.
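
Roughly like this, assuming the stub's efi_printk() helper is reachable from
this file (a sketch only, wording and placement still to be decided):

	/*
	 * Last console output before ExitBootServices(): if the boot hangs
	 * right after this line, SetVirtualAddressMap() (or the exit itself)
	 * is the likely culprit.
	 */
	efi_printk(sys_table,
		   "Exiting boot services and installing virtual address map...\n");

	/* Now we are ready to exit_boot_services.*/
	status = sys_table->boottime->exit_boot_services(handle, mmap_key);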

>> +                     }
>> +             }
>> +             return EFI_SUCCESS;
>> +     }
>>
>>       pr_efi_err(sys_table, "Exit boot services failed.\n");
>>
>> --
>> 1.8.3.2
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-efi" in
>> the body of a message to majordomo at vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 4/6] arm64/efi: move SetVirtualAddressMap() to UEFI stub
  2014-10-28  7:57     ` Ard Biesheuvel
@ 2014-10-28  8:16       ` Dave Young
  0 siblings, 0 replies; 18+ messages in thread
From: Dave Young @ 2014-10-28  8:16 UTC (permalink / raw)
  To: linux-arm-kernel

Hi, Ard

[snip]
> >> +static phys_addr_t __init efi_to_phys(unsigned long addr)
> >> +{
> >> +     efi_memory_desc_t *md;
> >> +
> >> +     for_each_efi_memory_desc(&memmap, md) {
> >> +             if (!(md->attribute & EFI_MEMORY_RUNTIME))
> >> +                     continue;
> >> +             if (md->virt_addr == 0)
> >> +                     /* no virtual mapping has been installed by the stub */
> >> +                     break;
> >> +             if (md->virt_addr <= addr &&
> >> +                 (addr - md->virt_addr) < (md->num_pages << EFI_PAGE_SHIFT))
> >> +                     return md->phys_addr + addr - md->virt_addr;
> >> +     }
> >> +
> >> +     WARN_ONCE(1, "UEFI virtual mapping incomplete or missing -- no entry found for 0x%lx\n",
> >> +               addr);
> >
> > Dying here instead of warning looks better; addr could be anything which does not match the virt maps.
> >
> 
> The general idea was (and below as well) that SetVirtualAddressMap()
> is called from the stub, at a point where we can't easily panic/error
> out in a meaningful way, as we have just called ExitBootServices() so
> there is no console etc. So instead, I zero out all the virt_addr
> fields, which indicates that no virtual mapping is installed. That
> does not necessarily mean (on arm64 at least) that we cannot still bring
> up ACPI etc. and complain in a way that could catch someone's attention,
> rather than only on the console.

Ok, thanks for explanation.
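
For my own understanding, a minimal sketch of the consumer side (the helper
name is hypothetical and not part of the series; it just mirrors the check
already done in efi_to_phys()):

static bool __init uefi_virtmap_installed(void)
{
	efi_memory_desc_t *md;

	for_each_efi_memory_desc(&memmap, md) {
		if (!(md->attribute & EFI_MEMORY_RUNTIME))
			continue;
		/* the stub zeroes virt_addr when SetVirtualAddressMap() fails */
		if (md->virt_addr == 0)
			return false;
	}
	return true;
}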

> 
> I chose a WARN_ONCE() because it gives a nice fat stack dump in the
> kernel log, but as the RT services regions and fw_vendor and
> config_table are still id mapped, I don't see a compelling reason to
> panic right here.

I worried that the addr might not be the original physical address; it could
get corrupted in the middle of SVAM because of firmware bugs etc.

If we can ensure it is the same address as before the SVAM callback, then
using WARN will be fine.

[snip]
> >>       /* Now we are ready to exit_boot_services.*/
> >>       status = sys_table->boottime->exit_boot_services(handle, mmap_key);
> >>
> >> +     if (status == EFI_SUCCESS) {
> >> +             efi_set_virtual_address_map_t *svam;
> >> +
> >> +             /* Install the new virtual address map */
> >> +             svam = sys_table->runtime->set_virtual_address_map;
> >> +             status = svam(runtime_entry_count * desc_size, desc_size,
> >> +                           desc_ver, memory_map);
> >>
> >> -     if (status == EFI_SUCCESS)
> >> -             return status;
> >> +             /*
> >> +              * We are beyond the point of no return here, so if the call to
> >> +              * SetVirtualAddressMap() failed, we need to signal that to the
> >> +              * incoming kernel but proceed normally otherwise.
> >> +              */
> >> +             if (status != EFI_SUCCESS) {
> >> +                     int i;
> >> +
> >> +                     /*
> >> +                      * Set the virtual address field of all
> >> +                      * EFI_MEMORY_RUNTIME entries to 0. This will signal
> >> +                      * the incoming kernel that no virtual translation has
> >> +                      * been installed.
> >> +                      */
> >> +                     for (i = 0; i < map_size; i += desc_size) {
> >> +                             efi_memory_desc_t *p;
> >> +
> >> +                             p = (efi_memory_desc_t *)((u8 *)memory_map + i);
> >> +                             if (!(p->attribute & EFI_MEMORY_RUNTIME))
> >> +                                     break;
> >
> > should it be continue instead of break?
> >
> 
> The memory map is sorted, so we can break as soon as we encounter the
> first one without the attribute set.

OK, that's fine, I did not know that.

> 
> >> +                             p->virt_addr = 0;
> >
> > On x86 the kernel panics when set_virtual_address_map() fails; Matt mentioned
> > that it's not reasonable to continue. But for arm64 I'm not sure it's the
> > same case.
> >
> 
> As I said, perhaps it is not reasonable to continue, or perhaps it is, but
> panicking here is maybe not the most productive thing to do.
> I could add a printk() before calling ExitBootServices(), saying 'if you can
> read this line, and nothing more, we may have crashed in/after
> SetVirtualAddressMap()'.
> 

I tend to agree with you. Adding a printk would be better.

Thanks
Dave 

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2014-10-28  8:16 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-10-24 12:39 [PATCH 0/6] arm64: stable UEFI mappings for kexec Ard Biesheuvel
2014-10-24 12:39 ` [PATCH 1/6] arm64/mm: add explicit struct_mm argument to __create_mapping() Ard Biesheuvel
2014-10-24 15:23   ` Steve Capper
2014-10-24 12:39 ` [PATCH 2/6] arm64/mm: add create_pgd_mapping() to create private page tables Ard Biesheuvel
2014-10-24 15:17   ` Steve Capper
2014-10-24 15:42     ` Ard Biesheuvel
2014-10-24 12:39 ` [PATCH 3/6] efi: split off remapping code from efi_config_init() Ard Biesheuvel
2014-10-24 12:39 ` [PATCH 4/6] arm64/efi: move SetVirtualAddressMap() to UEFI stub Ard Biesheuvel
2014-10-24 13:39   ` Grant Likely
2014-10-24 13:47     ` Ard Biesheuvel
2014-10-28  7:47   ` Dave Young
2014-10-28  7:57     ` Ard Biesheuvel
2014-10-28  8:16       ` Dave Young
2014-10-24 12:39 ` [PATCH 5/6] arm64/efi: remove free_boot_services() and friends Ard Biesheuvel
2014-10-24 12:39 ` [PATCH 6/6] arm64/efi: remove idmap manipulations from UEFI code Ard Biesheuvel
2014-10-24 13:41   ` Grant Likely
2014-10-24 13:53     ` Ard Biesheuvel
2014-10-24 13:55       ` Grant Likely
