* [PATCH] x86/efi: Map EFI memmap entries in-order at runtime
@ 2015-09-04 13:14 Matt Fleming
2015-09-04 13:24 ` Ard Biesheuvel
2015-09-07 4:07 ` joeyli
0 siblings, 2 replies; 23+ messages in thread
From: Matt Fleming @ 2015-09-04 13:14 UTC (permalink / raw)
To: linux-efi
Cc: linux-kernel, x86, Matt Fleming, Borislav Petkov, Leif Lindholm,
Peter Jones, James Bottomley, Matthew Garrett, H. Peter Anvin,
Dave Young, stable, Ard Biesheuvel
From: Matt Fleming <matt.fleming@intel.com>
UEFI v2.5 introduced EFI_PROPERTIES_TABLE, which signals that the
firmware PE/COFF loader supports splitting the code and data sections
of PE/COFF images into separate EFI memory map entries.
This allows the kernel to map those regions with strict memory
protections, e.g. EFI_MEMORY_RO for code, EFI_MEMORY_XP for data, etc.
Unfortunately, an unwritten requirement of this new feature is that
the regions need to be mapped with the same offsets relative to each
other as observed in the EFI memory map. If this is not done, crashes
like this may occur:
[ 0.006391] BUG: unable to handle kernel paging request at fffffffefe6086dd
[ 0.006923] IP: [<fffffffefe6086dd>] 0xfffffffefe6086dd
[ 0.007000] Call Trace:
[ 0.007000] [<ffffffff8104c90e>] efi_call+0x7e/0x100
[ 0.007000] [<ffffffff81602091>] ? virt_efi_set_variable+0x61/0x90
[ 0.007000] [<ffffffff8104c583>] efi_delete_dummy_variable+0x63/0x70
[ 0.007000] [<ffffffff81f4e4aa>] efi_enter_virtual_mode+0x383/0x392
[ 0.007000] [<ffffffff81f37e1b>] start_kernel+0x38a/0x417
[ 0.007000] [<ffffffff81f37495>] x86_64_start_reservations+0x2a/0x2c
[ 0.007000] [<ffffffff81f37582>] x86_64_start_kernel+0xeb/0xef
Here 0xfffffffefe6086dd refers to an address the firmware expects to
be mapped but which the OS never claimed was mapped. The issue is that
these regions contain relative references to other regions, emitted by
the firmware toolchain before the sections were "split" at runtime.
Needless to say, we don't satisfy this unwritten requirement on x86_64
and instead map the EFI memory map entries in reverse order. The above
crash is almost certainly triggerable with any kernel newer than v3.13
because that's when we rewrote the EFI runtime region mapping code, in
commit d2f7cbe7b26a ("x86/efi: Runtime services virtual mapping"). For
kernel versions before v3.13 things may work by pure luck depending on
the fragmentation of the kernel virtual address space at the time we
map the EFI regions.
Instead of mapping the EFI memory map entries in reverse order, where
entry N has a higher virtual address than entry N+1, map them in the
same order as they appear in the EFI memory map to preserve this
relative offset between regions.
This patch has been kept as small as possible with the intention that
it should be applied aggressively to stable and distribution kernels.
It is very much a bugfix rather than support for a new feature, since
when EFI_PROPERTIES_TABLE is enabled we must map things as outlined
above to even boot - we have no way of asking the firmware not to
split the code/data regions.
In fact, this patch doesn't even make use of the more strict memory
protections available in UEFI v2.5. That will come later.
Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Peter Jones <pjones@redhat.com>
Cc: James Bottomley <JBottomley@Odin.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
---
arch/x86/include/asm/efi.h | 1 +
arch/x86/platform/efi/efi.c | 2 +
arch/x86/platform/efi/efi_32.c | 1 +
arch/x86/platform/efi/efi_64.c | 109 ++++++++++++++++++++++++++++++++++-------
4 files changed, 96 insertions(+), 17 deletions(-)
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 155162ea0e00..fe988599c5e1 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -96,6 +96,7 @@ extern void __init efi_call_phys_epilog(pgd_t *save_pgd);
extern void __init efi_unmap_memmap(void);
extern void __init efi_memory_uc(u64 addr, unsigned long size);
extern void __init efi_map_region(efi_memory_desc_t *md);
+extern void __init efi_map_calculate_base(void);
extern void __init efi_map_region_fixed(efi_memory_desc_t *md);
extern void efi_sync_low_kernel_mappings(void);
extern int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages);
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index e4308fe6afe8..5276ec6eefef 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -714,6 +714,8 @@ static void * __init efi_map_regions(int *count, int *pg_shift)
unsigned long left = 0;
efi_memory_desc_t *md;
+ efi_map_calculate_base();
+
for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
md = p;
if (!(md->attribute & EFI_MEMORY_RUNTIME)) {
diff --git a/arch/x86/platform/efi/efi_32.c b/arch/x86/platform/efi/efi_32.c
index ed5b67338294..8a80baa877fb 100644
--- a/arch/x86/platform/efi/efi_32.c
+++ b/arch/x86/platform/efi/efi_32.c
@@ -55,6 +55,7 @@ void __init efi_map_region(efi_memory_desc_t *md)
void __init efi_map_region_fixed(efi_memory_desc_t *md) {}
void __init parse_efi_setup(u64 phys_addr, u32 data_len) {}
+void __init efi_map_calculate_base(void) {}
pgd_t * __init efi_call_phys_prolog(void)
{
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index a0ac0f9c307f..4c1d15984a79 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -214,14 +214,52 @@ static void __init __map_region(efi_memory_desc_t *md, u64 va)
md->phys_addr, va);
}
+/**
+ * efi_map_start_addr - Find an address to map an EFI region
+ * @md: EFI memory map region descriptor
+ * @addr: Begin searching from this address
+ *
+ * Calculate the next highest virtual address at which to map
+ * @md->phys_addr while abiding by any alignment restrictions. For
+ * example, if @md->phys_addr is 2M-aligned we keep the virtual
+ * address 2M-aligned.
+ */
+static u64 efi_map_start_addr(efi_memory_desc_t *md, u64 addr)
+{
+ u64 tail;
+
+ tail = md->phys_addr & (PMD_SIZE - 1);
+
+ /* Is physical address 2M-aligned? */
+ if (!tail) {
+ addr = roundup(addr, PMD_SIZE);
+ } else {
+ u64 prev_addr = addr;
+
+ /* get us the same offset within this 2M page */
+ addr = (addr & PMD_MASK) + tail;
+
+ /*
+ * Roll forward to the next PMD.
+ */
+ if (addr < prev_addr)
+ addr += PMD_SIZE;
+ }
+
+ return addr;
+}
+
void __init efi_map_region(efi_memory_desc_t *md)
{
unsigned long size = md->num_pages << PAGE_SHIFT;
- u64 pa = md->phys_addr;
if (efi_enabled(EFI_OLD_MEMMAP))
return old_map_region(md);
+ /* Has efi_map_calculate_base() been called? */
+ if (WARN_ON_ONCE(efi_va == EFI_VA_START))
+ return;
+
/*
* Make sure the 1:1 mappings are present as a catch-all for b0rked
* firmware which doesn't update all internal pointers after switching
@@ -239,23 +277,9 @@ void __init efi_map_region(efi_memory_desc_t *md)
return;
}
- efi_va -= size;
-
- /* Is PA 2M-aligned? */
- if (!(pa & (PMD_SIZE - 1))) {
- efi_va &= PMD_MASK;
- } else {
- u64 pa_offset = pa & (PMD_SIZE - 1);
- u64 prev_va = efi_va;
-
- /* get us the same offset within this 2M page */
- efi_va = (efi_va & PMD_MASK) + pa_offset;
-
- if (efi_va > prev_va)
- efi_va -= PMD_SIZE;
- }
+ efi_va = efi_map_start_addr(md, efi_va);
- if (efi_va < EFI_VA_END) {
+ if (efi_va > EFI_VA_START) {
pr_warn(FW_WARN "VA address range overflow!\n");
return;
}
@@ -263,6 +287,57 @@ void __init efi_map_region(efi_memory_desc_t *md)
/* Do the VA map */
__map_region(md, efi_va);
md->virt_addr = efi_va;
+ efi_va += size;
+}
+
+/**
+ * efi_map_calculate_base - Find the base address to map EFI regions
+ *
+ * This function calculates how much virtual address space is required
+ * to map all the EFI regions for runtime. We map those regions as
+ * close to each other as possible while sticking with the PMD
+ * alignment and ensuring we end at EFI_VA_START.
+ *
+ * On return we set 'efi_va' to the start address of the virtual
+ * address space where efi_map_region() will begin mapping things.
+ *
+ * Beginning in UEFI v2.5 the EFI_PROPERTIES_TABLE config table
+ * feature requires us to map all entries in the same order as they
+ * appear in the EFI memory map. That is to say, entry N must have a
+ * lower virtual address than entry N+1. This is because the firmware
+ * toolchain leaves relative references in the code/data sections,
+ * which are split and become separate EFI memory regions. Mapping
+ * things out-of-order leads to the firmware accessing unmapped
+ * addresses.
+ *
+ * We need to map things this way whether or not we actually make use
+ * of the EFI_PROPERTIES_TABLE feature.
+ *
+ * Call this function before invoking efi_map_region().
+ */
+void __init efi_map_calculate_base(void)
+{
+ efi_memory_desc_t *md;
+ u64 size = 0;
+
+ /*
+ * We don't need to place the mappings this carefully for the
+ * old mapping scheme.
+ */
+ if (efi_enabled(EFI_OLD_MEMMAP))
+ return;
+
+ for_each_efi_memory_desc(&memmap, md) {
+ if (!(md->attribute & EFI_MEMORY_RUNTIME) &&
+ md->type != EFI_BOOT_SERVICES_CODE &&
+ md->type != EFI_BOOT_SERVICES_DATA)
+ continue;
+
+ size = efi_map_start_addr(md, size);
+ size += md->num_pages << PAGE_SHIFT;
+ }
+
+ efi_va -= size;
}
/*
--
2.1.0
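
For readers following the arithmetic, the alignment rule in
efi_map_start_addr() and the sizing loop in efi_map_calculate_base() can
be modelled in userspace. This is only a sketch under stated
assumptions: struct region is a hypothetical stand-in for the two
efi_memory_desc_t fields used, the names map_start_addr() and
required_va_size() are made up for illustration, 4 KiB pages and a 2 MiB
PMD are assumed, and the attribute/type filter from the patch is left
out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12                       /* assumption: 4 KiB pages */
#define PMD_SIZE   (UINT64_C(1) << 21)      /* assumption: 2 MiB PMD */
#define PMD_MASK   (~(PMD_SIZE - 1))

/* Hypothetical stand-in for the efi_memory_desc_t fields used here. */
struct region {
	uint64_t phys_addr;
	uint64_t num_pages;
};

/*
 * Model of efi_map_start_addr(): return the lowest address >= addr
 * that has the same offset within a 2M page as phys_addr.
 */
static uint64_t map_start_addr(uint64_t phys_addr, uint64_t addr)
{
	uint64_t tail = phys_addr & (PMD_SIZE - 1);
	uint64_t prev_addr = addr;

	/* Physical address is 2M-aligned: round the VA up to 2M. */
	if (!tail)
		return (addr + PMD_SIZE - 1) & PMD_MASK;

	/* Same offset within this 2M page... */
	addr = (addr & PMD_MASK) + tail;

	/* ...rolling forward to the next PMD if that moved us backwards. */
	if (addr < prev_addr)
		addr += PMD_SIZE;

	return addr;
}

/*
 * Model of the sizing loop in efi_map_calculate_base(): walk the
 * entries in map order, starting from offset 0, and return the total
 * amount of address space consumed.
 */
static uint64_t required_va_size(const struct region *r, size_t n)
{
	uint64_t size = 0;
	size_t i;

	assert(n == 0 || r != NULL);

	for (i = 0; i < n; i++) {
		size = map_start_addr(r[i].phys_addr, size);
		size += r[i].num_pages << PAGE_SHIFT;
	}

	return size;
}
```

With these definitions, a region at physical 0x601000 placed at or after
0x3000 lands at 0x201000: the 0x1000 offset within the 2M page is kept,
and the address rolls forward by one PMD because 0x1000 lies below the
starting point.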
* Re: [PATCH] x86/efi: Map EFI memmap entries in-order at runtime
2015-09-04 13:14 [PATCH] x86/efi: Map EFI memmap entries in-order at runtime Matt Fleming
@ 2015-09-04 13:24 ` Ard Biesheuvel
2015-09-04 18:23 ` Matt Fleming
2015-09-07 4:07 ` joeyli
1 sibling, 1 reply; 23+ messages in thread
From: Ard Biesheuvel @ 2015-09-04 13:24 UTC (permalink / raw)
To: Matt Fleming
Cc: linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org,
x86@kernel.org, Matt Fleming, Borislav Petkov, Leif Lindholm,
Peter Jones, James Bottomley, Matthew Garrett, H. Peter Anvin,
Dave Young, stable@vger.kernel.org
On 4 September 2015 at 15:14, Matt Fleming <matt@codeblueprint.co.uk> wrote:
> From: Matt Fleming <matt.fleming@intel.com>
>
> Beginning with UEFI v2.5 EFI_PROPERTIES_TABLE was introduced that
> signals that the firmware PE/COFF loader supports splitting code and
> data sections of PE/COFF images into separate EFI memory map entries.
> This allows the kernel to map those regions with strict memory
> protections, e.g. EFI_MEMORY_RO for code, EFI_MEMORY_XP for data, etc.
>
> Unfortunately, an unwritten requirement of this new feature is that
> the regions need to be mapped with the same offsets relative to each
> other as observed in the EFI memory map. If this is not done crashes
> like this may occur,
>
> [ 0.006391] BUG: unable to handle kernel paging request at fffffffefe6086dd
> [ 0.006923] IP: [<fffffffefe6086dd>] 0xfffffffefe6086dd
> [ 0.007000] Call Trace:
> [ 0.007000] [<ffffffff8104c90e>] efi_call+0x7e/0x100
> [ 0.007000] [<ffffffff81602091>] ? virt_efi_set_variable+0x61/0x90
> [ 0.007000] [<ffffffff8104c583>] efi_delete_dummy_variable+0x63/0x70
> [ 0.007000] [<ffffffff81f4e4aa>] efi_enter_virtual_mode+0x383/0x392
> [ 0.007000] [<ffffffff81f37e1b>] start_kernel+0x38a/0x417
> [ 0.007000] [<ffffffff81f37495>] x86_64_start_reservations+0x2a/0x2c
> [ 0.007000] [<ffffffff81f37582>] x86_64_start_kernel+0xeb/0xef
>
> Here 0xfffffffefe6086dd refers to an address the firmware expects to
> be mapped but which the OS never claimed was mapped. The issue is that
> included in these regions are relative addresses to other regions
> which were emitted by the firmware toolchain before the "splitting" of
> sections occurred at runtime.
>
> Needless to say, we don't satisfy this unwritten requirement on x86_64
> and instead map the EFI memory map entries in reverse order. The above
> crash is almost certainly triggerable with any kernel newer than v3.13
> because that's when we rewrote the EFI runtime region mapping code, in
> commit d2f7cbe7b26a ("x86/efi: Runtime services virtual mapping"). For
> kernel versions before v3.13 things may work by pure luck depending on
> the fragmentation of the kernel virtual address space at the time we
> map the EFI regions.
>
> Instead of mapping the EFI memory map entries in reverse order, where
> entry N has a higher virtual address than entry N+1, map them in the
> same order as they appear in the EFI memory map to preserve this
> relative offset between regions.
>
Since the UEFI spec does not mandate an enumeration order for
GetMemoryMap(), it seems to me that you still need to sort its output
before laying out the VA space. Since you need to sort it anyway, why
not simply sort it in reverse order and keep all the original code?
Considering that this is meant for stable, that would keep the delta
*much* smaller.
--
Ard.
> This patch has been kept as small as possible with the intention that
> it should be applied aggressively to stable and distribution kernels.
> It is very much a bugfix rather than support for a new feature, since
> when EFI_PROPERTIES_TABLE is enabled we must map things as outlined
> above to even boot - we have no way of asking the firmware not to
> split the code/data regions.
>
> In fact, this patch doesn't even make use of the more strict memory
> protections available in UEFI v2.5. That will come later.
>
> Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Leif Lindholm <leif.lindholm@linaro.org>
> Cc: Peter Jones <pjones@redhat.com>
> Cc: James Bottomley <JBottomley@Odin.com>
> Cc: Matthew Garrett <mjg59@srcf.ucam.org>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Dave Young <dyoung@redhat.com>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
> ---
> arch/x86/include/asm/efi.h | 1 +
> arch/x86/platform/efi/efi.c | 2 +
> arch/x86/platform/efi/efi_32.c | 1 +
> arch/x86/platform/efi/efi_64.c | 109 ++++++++++++++++++++++++++++++++++-------
> 4 files changed, 96 insertions(+), 17 deletions(-)
>
> diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
> index 155162ea0e00..fe988599c5e1 100644
> --- a/arch/x86/include/asm/efi.h
> +++ b/arch/x86/include/asm/efi.h
> @@ -96,6 +96,7 @@ extern void __init efi_call_phys_epilog(pgd_t *save_pgd);
> extern void __init efi_unmap_memmap(void);
> extern void __init efi_memory_uc(u64 addr, unsigned long size);
> extern void __init efi_map_region(efi_memory_desc_t *md);
> +extern void __init efi_map_calculate_base(void);
> extern void __init efi_map_region_fixed(efi_memory_desc_t *md);
> extern void efi_sync_low_kernel_mappings(void);
> extern int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages);
> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> index e4308fe6afe8..5276ec6eefef 100644
> --- a/arch/x86/platform/efi/efi.c
> +++ b/arch/x86/platform/efi/efi.c
> @@ -714,6 +714,8 @@ static void * __init efi_map_regions(int *count, int *pg_shift)
> unsigned long left = 0;
> efi_memory_desc_t *md;
>
> + efi_map_calculate_base();
> +
> for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
> md = p;
> if (!(md->attribute & EFI_MEMORY_RUNTIME)) {
> diff --git a/arch/x86/platform/efi/efi_32.c b/arch/x86/platform/efi/efi_32.c
> index ed5b67338294..8a80baa877fb 100644
> --- a/arch/x86/platform/efi/efi_32.c
> +++ b/arch/x86/platform/efi/efi_32.c
> @@ -55,6 +55,7 @@ void __init efi_map_region(efi_memory_desc_t *md)
>
> void __init efi_map_region_fixed(efi_memory_desc_t *md) {}
> void __init parse_efi_setup(u64 phys_addr, u32 data_len) {}
> +void __init efi_map_calculate_base(void) {}
>
> pgd_t * __init efi_call_phys_prolog(void)
> {
> diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
> index a0ac0f9c307f..4c1d15984a79 100644
> --- a/arch/x86/platform/efi/efi_64.c
> +++ b/arch/x86/platform/efi/efi_64.c
> @@ -214,14 +214,52 @@ static void __init __map_region(efi_memory_desc_t *md, u64 va)
> md->phys_addr, va);
> }
>
> +/**
> + * efi_map_start_addr - Find an address to map an EFI region
> + * @md: EFI memory map region descriptor
> + * @addr: Begin searching from this address
> + *
> + * Calculate the next highest virtual address at which to map
> + * @md->phys_addr while abiding by any alignment restrictions. For
> + * example, if @md->phys_addr is 2M-aligned we keep the virtual
> + * address 2M-aligned.
> + */
> +static u64 efi_map_start_addr(efi_memory_desc_t *md, u64 addr)
> +{
> + u64 tail;
> +
> + tail = md->phys_addr & (PMD_SIZE - 1);
> +
> + /* Is physical address 2M-aligned? */
> + if (!tail) {
> + addr = roundup(addr, PMD_SIZE);
> + } else {
> + u64 prev_addr = addr;
> +
> + /* get us the same offset within this 2M page */
> + addr = (addr & PMD_MASK) + tail;
> +
> + /*
> + * Roll forward to the next PMD.
> + */
> + if (addr < prev_addr)
> + addr += PMD_SIZE;
> + }
> +
> + return addr;
> +}
> +
> void __init efi_map_region(efi_memory_desc_t *md)
> {
> unsigned long size = md->num_pages << PAGE_SHIFT;
> - u64 pa = md->phys_addr;
>
> if (efi_enabled(EFI_OLD_MEMMAP))
> return old_map_region(md);
>
> + /* Has efi_map_calculate_base() been called? */
> + if (WARN_ON_ONCE(efi_va == EFI_VA_START))
> + return;
> +
> /*
> * Make sure the 1:1 mappings are present as a catch-all for b0rked
> * firmware which doesn't update all internal pointers after switching
> @@ -239,23 +277,9 @@ void __init efi_map_region(efi_memory_desc_t *md)
> return;
> }
>
> - efi_va -= size;
> -
> - /* Is PA 2M-aligned? */
> - if (!(pa & (PMD_SIZE - 1))) {
> - efi_va &= PMD_MASK;
> - } else {
> - u64 pa_offset = pa & (PMD_SIZE - 1);
> - u64 prev_va = efi_va;
> -
> - /* get us the same offset within this 2M page */
> - efi_va = (efi_va & PMD_MASK) + pa_offset;
> -
> - if (efi_va > prev_va)
> - efi_va -= PMD_SIZE;
> - }
> + efi_va = efi_map_start_addr(md, efi_va);
>
> - if (efi_va < EFI_VA_END) {
> + if (efi_va > EFI_VA_START) {
> pr_warn(FW_WARN "VA address range overflow!\n");
> return;
> }
> @@ -263,6 +287,57 @@ void __init efi_map_region(efi_memory_desc_t *md)
> /* Do the VA map */
> __map_region(md, efi_va);
> md->virt_addr = efi_va;
> + efi_va += size;
> +}
> +
> +/**
> + * efi_map_calculate_base - Find the base address to map EFI regions
> + *
> + * This function calculates how much virtual address space is required
> + * to map all the EFI regions for runtime. We map those regions as
> + * close to each other as possible while sticking with the PMD
> + * alignment and ensuring we end at EFI_VA_START.
> + *
> + * On return we set 'efi_va' to the start address of the virtual
> + * address space where efi_map_region() will begin mapping things.
> + *
> + * Beginning in UEFI v2.5 the EFI_PROPERTIES_TABLE config table
> + * feature requires us to map all entries in the same order as they
> + * appear in the EFI memory map. That is to say, entry N must have a
> + * lower virtual address than entry N+1. This is because the firmware
> + * toolchain leaves relative references in the code/data sections,
> + * which are split and become separate EFI memory regions. Mapping
> + * things out-of-order leads to the firmware accessing unmapped
> + * addresses.
> + *
> + * We need to map things this way whether or not we actually make use
> + * of the EFI_PROPERTIES_TABLE feature.
> + *
> + * Call this function before invoking efi_map_region().
> + */
> +void __init efi_map_calculate_base(void)
> +{
> + efi_memory_desc_t *md;
> + u64 size = 0;
> +
> + /*
> + * We don't need to place the mappings this carefully for the
> + * old mapping scheme.
> + */
> + if (efi_enabled(EFI_OLD_MEMMAP))
> + return;
> +
> + for_each_efi_memory_desc(&memmap, md) {
> + if (!(md->attribute & EFI_MEMORY_RUNTIME) &&
> + md->type != EFI_BOOT_SERVICES_CODE &&
> + md->type != EFI_BOOT_SERVICES_DATA)
> + continue;
> +
> + size = efi_map_start_addr(md, size);
> + size += md->num_pages << PAGE_SHIFT;
> + }
> +
> + efi_va -= size;
> }
>
> /*
> --
> 2.1.0
>
* Re: [PATCH] x86/efi: Map EFI memmap entries in-order at runtime
2015-09-04 13:24 ` Ard Biesheuvel
@ 2015-09-04 18:23 ` Matt Fleming
2015-09-04 18:53 ` Ard Biesheuvel
0 siblings, 1 reply; 23+ messages in thread
From: Matt Fleming @ 2015-09-04 18:23 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org,
x86@kernel.org, Matt Fleming, Borislav Petkov, Leif Lindholm,
Peter Jones, James Bottomley, Matthew Garrett, H. Peter Anvin,
Dave Young, stable@vger.kernel.org
On Fri, 04 Sep, at 03:24:21PM, Ard Biesheuvel wrote:
>
> Since the UEFI spec does not mandate an enumeration order for
> GetMemoryMap(), it seems to me that you still need to sort its output
> before laying out the VA space. Since you need to sort it anyway, why
> not simply sort it in reverse order and keep all the original code?
> Considering that this is meant for stable, that would keep the delta
> *much* smaller.
Hmm... that'd be a neat trick and while it would save on the diff
size, I don't think it would be smaller in terms of change complexity.
EDK2 sorts the memory map when EFI_PROPERTIES_TABLE is enabled, so we
can be reasonably sure the entry order returned by GetMemoryMap() is
compatible with the split regions, even if it's not mandated by the
spec.
For the non-EFI_PROPERTIES_TABLE case, things have been working fine
without the sorting, so I'm reluctant to introduce it now (it's also
much less of an issue there).
--
Matt Fleming, Intel Open Source Technology Center
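
For reference, the reverse sort discussed above might look roughly like
this in userspace C. It is a sketch only, not something taken from the
patch: struct region is a hypothetical stand-in for efi_memory_desc_t,
the function names are invented, and qsort() stands in for whatever
sorting helper the kernel would actually use. The real memory map is
also walked with a desc_size stride rather than as a plain array.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the efi_memory_desc_t fields used here. */
struct region {
	uint64_t phys_addr;
	uint64_t num_pages;
};

/* Order by physical address, highest first. */
static int cmp_phys_desc(const void *a, const void *b)
{
	const struct region *x = a;
	const struct region *y = b;

	if (x->phys_addr < y->phys_addr)
		return 1;
	if (x->phys_addr > y->phys_addr)
		return -1;
	return 0;
}

/*
 * Sort the map so that walking it front to back visits the highest
 * physical address first. A top-down allocator that hands out lower
 * virtual addresses on each step then places lower physical
 * addresses at lower virtual addresses.
 */
static void sort_regions_desc(struct region *r, size_t n)
{
	assert(n == 0 || r != NULL);
	qsort(r, n, sizeof(*r), cmp_phys_desc);
}
```

Comparing with explicit less-than/greater-than tests rather than
subtracting the two addresses avoids truncating a 64-bit difference to
int.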
* Re: [PATCH] x86/efi: Map EFI memmap entries in-order at runtime
2015-09-04 18:23 ` Matt Fleming
@ 2015-09-04 18:53 ` Ard Biesheuvel
2015-09-06 14:06 ` Ard Biesheuvel
2015-09-08 13:16 ` Matt Fleming
0 siblings, 2 replies; 23+ messages in thread
From: Ard Biesheuvel @ 2015-09-04 18:53 UTC (permalink / raw)
To: Matt Fleming
Cc: linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org,
x86@kernel.org, Matt Fleming, Borislav Petkov, Leif Lindholm,
Peter Jones, James Bottomley, Matthew Garrett, H. Peter Anvin,
Dave Young, stable@vger.kernel.org
On 4 September 2015 at 20:23, Matt Fleming <matt@codeblueprint.co.uk> wrote:
> On Fri, 04 Sep, at 03:24:21PM, Ard Biesheuvel wrote:
>>
>> Since the UEFI spec does not mandate an enumeration order for
>> GetMemoryMap(), it seems to me that you still need to sort its output
>> before laying out the VA space. Since you need to sort it anyway, why
>> not simply sort it in reverse order and keep all the original code?
>> Considering that this is meant for stable, that would keep the delta
>> *much* smaller.
>
> Hmm... that'd be a neat trick and while it would save on the diff
> size, I don't think it would be smaller in terms of change complexity.
>
> EDK2 sorts the memory map when EFI_PROPERTIES_TABLE is enabled, so we
> can be reasonably sure the entry order returned by GetMemoryMap() is
> compatible with the split regions, even if it's not mandated by the
> spec.
>
EDK2 does sort it, but the spec does not mandate it so another
implementation may do something different entirely.
> For the non-EFI_PROPERTIES_TABLE case, things have been working fine
> without the sorting, so I'm reluctant to introduce it now (it's also
> much less of an issue there).
>
I see. I do wonder, since the VA mapping preserves the modulo 2 MB
alignment of each region, aren't you using much more VA space when
mapping in reverse order as you are doing now?
--
Ard.
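
To make the two layouts concrete, here is a small userspace model of
both placement schemes: the old top-down one removed by the patch and
the new bottom-up one. It is a sketch under stated assumptions: struct
region and both function names are hypothetical, a 2 MiB PMD is
assumed, and the starting addresses passed in are chosen by the caller
for illustration rather than taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PMD_SIZE (UINT64_C(1) << 21)        /* assumption: 2 MiB PMD */
#define PMD_MASK (~(PMD_SIZE - 1))

/* Hypothetical stand-in for an EFI memory map entry (size in bytes). */
struct region {
	uint64_t phys;
	uint64_t size;
};

/*
 * Old scheme: walk down from 'top', so entry N ends up above entry
 * N+1. Fills va[] and returns the lowest address used.
 */
static uint64_t layout_reverse(const struct region *r, size_t n,
			       uint64_t top, uint64_t *va)
{
	uint64_t cur = top;
	size_t i;

	assert(n == 0 || (r != NULL && va != NULL));

	for (i = 0; i < n; i++) {
		uint64_t off = r[i].phys & (PMD_SIZE - 1);
		uint64_t prev;

		cur -= r[i].size;
		prev = cur;

		/* Same offset within the 2M page, moving down if needed. */
		cur = (cur & PMD_MASK) + off;
		if (cur > prev)
			cur -= PMD_SIZE;

		va[i] = cur;
	}

	return cur;
}

/*
 * New scheme: walk up from 'base', so entry N ends up below entry
 * N+1. Fills va[] and returns the first address past the last entry.
 */
static uint64_t layout_forward(const struct region *r, size_t n,
			       uint64_t base, uint64_t *va)
{
	uint64_t cur = base;
	size_t i;

	assert(n == 0 || (r != NULL && va != NULL));

	for (i = 0; i < n; i++) {
		uint64_t off = r[i].phys & (PMD_SIZE - 1);
		uint64_t prev = cur;

		/* Same offset within the 2M page, moving up if needed. */
		cur = (cur & PMD_MASK) + off;
		if (cur < prev)
			cur += PMD_SIZE;

		va[i] = cur;
		cur += r[i].size;
	}

	return cur;
}
```

Take two physically adjacent entries, 0x2000 bytes at 0x1000 followed
by 0x1000 bytes at 0x3000. The bottom-up walk from base 0 places them
at 0x1000 and 0x3000, so the 0x2000 distance between them is preserved
and 0x4000 bytes are consumed. The top-down walk from
0xffffffff00000000 places them at 0xfffffffeffe01000 and
0xfffffffeffc03000: the second entry sits below the first, and the
layout spans 0x3fd000 bytes. That is one sample only; other maps will
give different numbers.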
* Re: [PATCH] x86/efi: Map EFI memmap entries in-order at runtime
2015-09-04 18:53 ` Ard Biesheuvel
@ 2015-09-06 14:06 ` Ard Biesheuvel
2015-09-08 13:16 ` Matt Fleming
1 sibling, 0 replies; 23+ messages in thread
From: Ard Biesheuvel @ 2015-09-06 14:06 UTC (permalink / raw)
To: Matt Fleming
Cc: linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org,
x86@kernel.org, Matt Fleming, Borislav Petkov, Leif Lindholm,
Peter Jones, James Bottomley, Matthew Garrett, H. Peter Anvin,
Dave Young, stable@vger.kernel.org
On 4 September 2015 at 20:53, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 4 September 2015 at 20:23, Matt Fleming <matt@codeblueprint.co.uk> wrote:
>> On Fri, 04 Sep, at 03:24:21PM, Ard Biesheuvel wrote:
>>>
>>> Since the UEFI spec does not mandate an enumeration order for
>>> GetMemoryMap(), it seems to me that you still need to sort its output
>>> before laying out the VA space. Since you need to sort it anyway, why
>>> not simply sort it in reverse order and keep all the original code?
>>> Considering that this is meant for stable, that would keep the delta
>>> *much* smaller.
>>
>> Hmm... that'd be a neat trick and while it would save on the diff
>> size, I don't think it would be smaller in terms of change complexity.
>>
>> EDK2 sorts the memory map when EFI_PROPERTIES_TABLE is enabled, so we
>> can be reasonably sure the entry order returned by GetMemoryMap() is
>> compatible with the split regions, even if it's not mandated by the
>> spec.
>>
>
> EDK2 does sort it, but the spec does not mandate it so another
> implementation may do something different entirely.
>
>> For the non-EFI_PROPERTIES_TABLE case, things have been working fine
>> without the sorting, so I'm reluctant to introduce it now (it's also
>> much less of an issue there).
>>
>
> I see. I do wonder, since the VA mapping preserves the modulo 2 MB
> alignment of each region, aren't you using much more VA space when
> mapping in reverse order as you are doing now?
>
BTW if you are going to rely on the sortedness of the memory map if
the feature is enabled, you could still simply traverse the memory map
in reverse order and keep most of the old code.
--
Ard.
* Re: [PATCH] x86/efi: Map EFI memmap entries in-order at runtime
2015-09-04 13:14 [PATCH] x86/efi: Map EFI memmap entries in-order at runtime Matt Fleming
2015-09-04 13:24 ` Ard Biesheuvel
@ 2015-09-07 4:07 ` joeyli
2015-09-08 20:41 ` Matt Fleming
1 sibling, 1 reply; 23+ messages in thread
From: joeyli @ 2015-09-07 4:07 UTC (permalink / raw)
To: Matt Fleming
Cc: linux-efi, linux-kernel, x86, Matt Fleming, Borislav Petkov,
Leif Lindholm, Peter Jones, James Bottomley, Matthew Garrett,
H. Peter Anvin, Dave Young, stable, Ard Biesheuvel
Hi,
On Fri, Sep 04, 2015 at 02:14:07PM +0100, Matt Fleming wrote:
> From: Matt Fleming <matt.fleming@intel.com>
>
> Beginning with UEFI v2.5 EFI_PROPERTIES_TABLE was introduced that
> signals that the firmware PE/COFF loader supports splitting code and
> data sections of PE/COFF images into separate EFI memory map entries.
> This allows the kernel to map those regions with strict memory
> protections, e.g. EFI_MEMORY_RO for code, EFI_MEMORY_XP for data, etc.
>
> Unfortunately, an unwritten requirement of this new feature is that
> the regions need to be mapped with the same offsets relative to each
> other as observed in the EFI memory map. If this is not done crashes
> like this may occur,
>
> [ 0.006391] BUG: unable to handle kernel paging request at fffffffefe6086dd
> [ 0.006923] IP: [<fffffffefe6086dd>] 0xfffffffefe6086dd
> [ 0.007000] Call Trace:
> [ 0.007000] [<ffffffff8104c90e>] efi_call+0x7e/0x100
> [ 0.007000] [<ffffffff81602091>] ? virt_efi_set_variable+0x61/0x90
> [ 0.007000] [<ffffffff8104c583>] efi_delete_dummy_variable+0x63/0x70
> [ 0.007000] [<ffffffff81f4e4aa>] efi_enter_virtual_mode+0x383/0x392
> [ 0.007000] [<ffffffff81f37e1b>] start_kernel+0x38a/0x417
> [ 0.007000] [<ffffffff81f37495>] x86_64_start_reservations+0x2a/0x2c
> [ 0.007000] [<ffffffff81f37582>] x86_64_start_kernel+0xeb/0xef
>
> Here 0xfffffffefe6086dd refers to an address the firmware expects to
> be mapped but which the OS never claimed was mapped. The issue is that
> included in these regions are relative addresses to other regions
> which were emitted by the firmware toolchain before the "splitting" of
> sections occurred at runtime.
>
> Needless to say, we don't satisfy this unwritten requirement on x86_64
> and instead map the EFI memory map entries in reverse order. The above
> crash is almost certainly triggerable with any kernel newer than v3.13
> because that's when we rewrote the EFI runtime region mapping code, in
> commit d2f7cbe7b26a ("x86/efi: Runtime services virtual mapping"). For
> kernel versions before v3.13 things may work by pure luck depending on
> the fragmentation of the kernel virtual address space at the time we
> map the EFI regions.
>
> Instead of mapping the EFI memory map entries in reverse order, where
> entry N has a higher virtual address than entry N+1, map them in the
> same order as they appear in the EFI memory map to preserve this
> relative offset between regions.
>
> This patch has been kept as small as possible with the intention that
> it should be applied aggressively to stable and distribution kernels.
> It is very much a bugfix rather than support for a new feature, since
> when EFI_PROPERTIES_TABLE is enabled we must map things as outlined
> above to even boot - we have no way of asking the firmware not to
> split the code/data regions.
>
> In fact, this patch doesn't even make use of the more strict memory
> protections available in UEFI v2.5. That will come later.
>
> Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Leif Lindholm <leif.lindholm@linaro.org>
> Cc: Peter Jones <pjones@redhat.com>
> Cc: James Bottomley <JBottomley@Odin.com>
> Cc: Matthew Garrett <mjg59@srcf.ucam.org>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Dave Young <dyoung@redhat.com>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
This patch works for me on an Intel S1200V3RPS and fixes the issue:
DMI: Intel Corporation (uefidk.com) Intel Server Board S1200V3RPS UEFI Development Kit/ROMLEY, BIOS 2.0
Tested-by: Lee, Chun-Yi <jlee@suse.com>
> ---
> arch/x86/include/asm/efi.h | 1 +
> arch/x86/platform/efi/efi.c | 2 +
> arch/x86/platform/efi/efi_32.c | 1 +
> arch/x86/platform/efi/efi_64.c | 109 ++++++++++++++++++++++++++++++++++-------
> 4 files changed, 96 insertions(+), 17 deletions(-)
>
> diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
> index 155162ea0e00..fe988599c5e1 100644
> --- a/arch/x86/include/asm/efi.h
> +++ b/arch/x86/include/asm/efi.h
> @@ -96,6 +96,7 @@ extern void __init efi_call_phys_epilog(pgd_t *save_pgd);
> extern void __init efi_unmap_memmap(void);
> extern void __init efi_memory_uc(u64 addr, unsigned long size);
> extern void __init efi_map_region(efi_memory_desc_t *md);
> +extern void __init efi_map_calculate_base(void);
> extern void __init efi_map_region_fixed(efi_memory_desc_t *md);
> extern void efi_sync_low_kernel_mappings(void);
> extern int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages);
> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> index e4308fe6afe8..5276ec6eefef 100644
> --- a/arch/x86/platform/efi/efi.c
> +++ b/arch/x86/platform/efi/efi.c
> @@ -714,6 +714,8 @@ static void * __init efi_map_regions(int *count, int *pg_shift)
> unsigned long left = 0;
> efi_memory_desc_t *md;
>
> + efi_map_calculate_base();
> +
> for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
> md = p;
> if (!(md->attribute & EFI_MEMORY_RUNTIME)) {
> diff --git a/arch/x86/platform/efi/efi_32.c b/arch/x86/platform/efi/efi_32.c
> index ed5b67338294..8a80baa877fb 100644
> --- a/arch/x86/platform/efi/efi_32.c
> +++ b/arch/x86/platform/efi/efi_32.c
> @@ -55,6 +55,7 @@ void __init efi_map_region(efi_memory_desc_t *md)
>
> void __init efi_map_region_fixed(efi_memory_desc_t *md) {}
> void __init parse_efi_setup(u64 phys_addr, u32 data_len) {}
> +void __init efi_map_calculate_base(void) {}
>
> pgd_t * __init efi_call_phys_prolog(void)
> {
> diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
> index a0ac0f9c307f..4c1d15984a79 100644
> --- a/arch/x86/platform/efi/efi_64.c
> +++ b/arch/x86/platform/efi/efi_64.c
> @@ -214,14 +214,52 @@ static void __init __map_region(efi_memory_desc_t *md, u64 va)
> md->phys_addr, va);
> }
>
> +/**
> + * efi_map_start_addr - Find an address to map an EFI region
> + * @md: EFI memory map region descriptor
> + * @addr: Begin searching from this address
> + *
> + * Calculate the next highest virtual address at which to map
> + * @md->phys_addr while abiding by any alignment restrictions. For
> + * example, if @md->phys_addr is 2M-aligned we keep the virtual
> + * address 2M-aligned.
> + */
> +static u64 efi_map_start_addr(efi_memory_desc_t *md, u64 addr)
> +{
> + u64 tail;
> +
> + tail = md->phys_addr & (PMD_SIZE - 1);
> +
> + /* Is physical address 2M-aligned? */
> + if (!tail) {
> + addr = roundup(addr, PMD_SIZE);
> + } else {
> + u64 prev_addr = addr;
> +
> + /* get us the same offset within this 2M page */
> + addr = (addr & PMD_MASK) + tail;
> +
> + /*
> + * Roll forward to the next PMD.
> + */
> + if (addr < prev_addr)
> + addr += PMD_SIZE;
> + }
> +
> + return addr;
> +}
> +
> void __init efi_map_region(efi_memory_desc_t *md)
> {
> unsigned long size = md->num_pages << PAGE_SHIFT;
> - u64 pa = md->phys_addr;
>
> if (efi_enabled(EFI_OLD_MEMMAP))
> return old_map_region(md);
>
> + /* Has efi_map_calculate_base() been called? */
> + if (WARN_ON_ONCE(efi_va == EFI_VA_START))
> + return;
> +
> /*
> * Make sure the 1:1 mappings are present as a catch-all for b0rked
> * firmware which doesn't update all internal pointers after switching
> @@ -239,23 +277,9 @@ void __init efi_map_region(efi_memory_desc_t *md)
> return;
> }
>
> - efi_va -= size;
> -
> - /* Is PA 2M-aligned? */
> - if (!(pa & (PMD_SIZE - 1))) {
> - efi_va &= PMD_MASK;
> - } else {
> - u64 pa_offset = pa & (PMD_SIZE - 1);
> - u64 prev_va = efi_va;
> -
> - /* get us the same offset within this 2M page */
> - efi_va = (efi_va & PMD_MASK) + pa_offset;
> -
> - if (efi_va > prev_va)
> - efi_va -= PMD_SIZE;
> - }
> + efi_va = efi_map_start_addr(md, efi_va);
>
> - if (efi_va < EFI_VA_END) {
> + if (efi_va > EFI_VA_START) {
> pr_warn(FW_WARN "VA address range overflow!\n");
> return;
> }
> @@ -263,6 +287,57 @@ void __init efi_map_region(efi_memory_desc_t *md)
> /* Do the VA map */
> __map_region(md, efi_va);
> md->virt_addr = efi_va;
> + efi_va += size;
> +}
> +
> +/**
> + * efi_map_calculate_base - Find the base address to map EFI regions
> + *
> + * This function calculates how much virtual address space is required
> + * to map all the EFI regions for runtime. We map those regions as
> + * close to each other as possible while sticking with the PMD
> + * alignment and ensuring we end at EFI_VA_START.
> + *
> + * On return we set 'efi_va' to the start address of the virtual
> + * address space where efi_map_region() will begin mapping things.
> + *
> + * Beginning in UEFI v2.5 the EFI_PROPERTIES_TABLE config table
> + * feature requires us to map all entries in the same order as they
> + * appear in the EFI memory map. That is to say, entry N must have a
> + * lower virtual address than entry N+1. This is because the firmware
> + * toolchain leaves relative references in the code/data sections,
> + * which are split and become separate EFI memory regions. Mapping
> + * things out-of-order leads to the firmware accessing unmapped
> + * addresses.
> + *
> + * We need to map things this way whether or not we actually make use
> + * of the EFI_PROPERTIES_TABLE feature.
> + *
> + * Call this function before invoking efi_map_region().
> + */
> +void __init efi_map_calculate_base(void)
> +{
> + efi_memory_desc_t *md;
> + u64 size = 0;
> +
> + /*
> + * We don't need to place the mappings this carefully for the
> + * old mapping scheme.
> + */
> + if (efi_enabled(EFI_OLD_MEMMAP))
> + return;
> +
> + for_each_efi_memory_desc(&memmap, md) {
> + if (!(md->attribute & EFI_MEMORY_RUNTIME) &&
> + md->type != EFI_BOOT_SERVICES_CODE &&
> + md->type != EFI_BOOT_SERVICES_DATA)
> + continue;
> +
> + size = efi_map_start_addr(md, size);
> + size += md->num_pages << PAGE_SHIFT;
> + }
> +
> + efi_va -= size;
> }
>
> /*
> --
> 2.1.0
>
> --
Thanks a lot!
Joey Lee
* Re: [PATCH] x86/efi: Map EFI memmap entries in-order at runtime
2015-09-04 18:53 ` Ard Biesheuvel
2015-09-06 14:06 ` Ard Biesheuvel
@ 2015-09-08 13:16 ` Matt Fleming
2015-09-08 13:21 ` Ard Biesheuvel
1 sibling, 1 reply; 23+ messages in thread
From: Matt Fleming @ 2015-09-08 13:16 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org,
x86@kernel.org, Matt Fleming, Borislav Petkov, Leif Lindholm,
Peter Jones, James Bottomley, Matthew Garrett, H. Peter Anvin,
Dave Young, stable@vger.kernel.org
On Fri, 04 Sep, at 08:53:36PM, Ard Biesheuvel wrote:
> On 4 September 2015 at 20:23, Matt Fleming <matt@codeblueprint.co.uk> wrote:
> > On Fri, 04 Sep, at 03:24:21PM, Ard Biesheuvel wrote:
> >>
> >> Since the UEFI spec does not mandate an enumeration order for
> >> GetMemoryMap(), it seems to me that you still need to sort its output
> >> before laying out the VA space. Since you need to sort it anyway, why
> >> not simply sort it in reverse order and keep all the original code?
> >> Considering that this is meant for stable, that would keep the delta
> >> *much* smaller.
> >
> > Hmm... that'd be a neat trick and while it would save on the diff
> > size, I don't think it would be smaller in terms of change complexity.
> >
> > EDK2 sorts the memory map when EFI_PROPERTIES_TABLE is enabled, so we
> > can be reasonably sure the entry order returned by GetMemoryMap() is
> > compatible with the split regions, even if it's not mandated by the
> > spec.
> >
>
> EDK2 does sort it, but the spec does not mandate it so another
> implementation may do something different entirely.
Yeah, we should get that requirement added to the spec.
> > For the non-EFI_PROPERTIES_TABLE case, things have been working fine
> > without the sorting, so I'm reluctant to introduce it now (it's also
> > much less of an issue there).
> >
>
> I see. I do wonder, since the VA mapping preserves the modulo 2 MB
> alignment of each region, aren't you using much more VA space when
> mapping in reverse order as you are doing now?
It doesn't enforce a 2MB alignment for every entry, just those that
are actually 2MB aligned. This should be exactly what was done in the
previous version of the code. Do you see a bug?
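To make the alignment rule concrete, here is a minimal userspace sketch
of the arithmetic in efi_map_start_addr() from the patch. It is an
illustration rather than kernel code: map_start_addr() is a hypothetical
stand-in that takes the physical address directly instead of an
efi_memory_desc_t, and PMD_SIZE is assumed to be 2M as on x86_64.

```c
#include <stdint.h>

#define PMD_SIZE (UINT64_C(1) << 21)	/* assumed: 2M, as on x86_64 */
#define PMD_MASK (~(PMD_SIZE - 1))

/*
 * Hypothetical userspace stand-in for efi_map_start_addr(): return the
 * lowest address >= addr that has the same offset within a 2M page as
 * the physical address pa.
 */
static uint64_t map_start_addr(uint64_t pa, uint64_t addr)
{
	uint64_t tail = pa & (PMD_SIZE - 1);
	uint64_t va;

	/* 2M-aligned physical address: round up to the next 2M boundary */
	if (!tail)
		return (addr + PMD_SIZE - 1) & PMD_MASK;

	/* otherwise keep only the offset within the 2M page */
	va = (addr & PMD_MASK) + tail;
	if (va < addr)
		va += PMD_SIZE;	/* roll forward to the next 2M page */

	return va;
}
```

Only a 2M-aligned physical address forces 2M alignment of the virtual
address. For example, map_start_addr(0x80000000, 0x1000) gives 0x200000,
while map_start_addr(0x80003000, 0x5000) gives 0x203000.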
--
Matt Fleming, Intel Open Source Technology Center
* Re: [PATCH] x86/efi: Map EFI memmap entries in-order at runtime
2015-09-08 13:16 ` Matt Fleming
@ 2015-09-08 13:21 ` Ard Biesheuvel
2015-09-08 20:37 ` Matt Fleming
0 siblings, 1 reply; 23+ messages in thread
From: Ard Biesheuvel @ 2015-09-08 13:21 UTC (permalink / raw)
To: Matt Fleming
Cc: linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org,
x86@kernel.org, Matt Fleming, Borislav Petkov, Leif Lindholm,
Peter Jones, James Bottomley, Matthew Garrett, H. Peter Anvin,
Dave Young, stable@vger.kernel.org
On 8 September 2015 at 15:16, Matt Fleming <matt@codeblueprint.co.uk> wrote:
> On Fri, 04 Sep, at 08:53:36PM, Ard Biesheuvel wrote:
>> On 4 September 2015 at 20:23, Matt Fleming <matt@codeblueprint.co.uk> wrote:
>> > On Fri, 04 Sep, at 03:24:21PM, Ard Biesheuvel wrote:
>> >>
>> >> Since the UEFI spec does not mandate an enumeration order for
>> >> GetMemoryMap(), it seems to me that you still need to sort its output
>> >> before laying out the VA space. Since you need to sort it anyway, why
>> >> not simply sort it in reverse order and keep all the original code?
>> >> Considering that this is meant for stable, that would keep the delta
>> >> *much* smaller.
>> >
>> > Hmm... that'd be a neat trick and while it would save on the diff
>> > size, I don't think it would be smaller in terms of change complexity.
>> >
>> > EDK2 sorts the memory map when EFI_PROPERTIES_TABLE is enabled, so we
>> > can be reasonably sure the entry order returned by GetMemoryMap() is
>> > compatible with the split regions, even if it's not mandated by the
>> > spec.
>> >
>>
>> EDK2 does sort it, but the spec does not mandate it so another
>> implementation may do something different entirely.
>
> Yeah, we should get that requirement added to the spec.
>
>> > For the non-EFI_PROPERTIES_TABLE case, things have been working fine
>> > without the sorting, so I'm reluctant to introduce it now (it's also
>> > much less of an issue there).
>> >
>>
>> I see. I do wonder, since the VA mapping preserves the modulo 2 MB
>> alignment of each region, aren't you using much more VA space when
>> mapping in reverse order as you are doing now?
>
> It doesn't enforce a 2MB alignment for every entry, just those that
> are actually 2MB aligned. This should be exactly what was done in the
> previous version of the code. Do you see a bug?
>
I noticed that the 64-bit version of efi_map_region() preserves the
relative alignment with respect to a 2 MB boundary for /each/ region.
Since the regions are mapped in reverse order, it is highly unlikely
that each region starts at the same 2 MB relative alignment that the
previous region ended at, so you are likely wasting quite a bit of VA
space.
I don't think it is a bug, though, but it does not seem intentional.
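A rough way to see the effect is to replay the top-down arithmetic of
the existing efi_map_region() in userspace. The sketch below is an
illustration under stated assumptions, not kernel code: struct region,
map_top_down(), va_span() and the three sample regions (one physically
contiguous image split into three entries) are made up for the example,
and PMD_SIZE is taken to be 2M.

```c
#include <stddef.h>
#include <stdint.h>

#define PMD_SIZE (UINT64_C(1) << 21)	/* assumed: 2M, as on x86_64 */
#define PMD_MASK (~(PMD_SIZE - 1))

struct region {
	uint64_t pa;	/* physical start address */
	uint64_t size;	/* size in bytes */
};

/* Hypothetical sample: one image split into three contiguous entries */
static const struct region sample[3] = {
	{ UINT64_C(0x80001000), 0x2000 },
	{ UINT64_C(0x80003000), 0x1000 },
	{ UINT64_C(0x80004000), 0x3000 },
};

/* Same arithmetic as the existing top-down efi_map_region() */
static uint64_t map_top_down(uint64_t efi_va, const struct region *r)
{
	uint64_t off = r->pa & (PMD_SIZE - 1);
	uint64_t prev;

	efi_va -= r->size;
	if (!off)
		return efi_va & PMD_MASK;

	prev = efi_va;
	efi_va = (efi_va & PMD_MASK) + off;
	if (efi_va > prev)
		efi_va -= PMD_SIZE;

	return efi_va;
}

/* VA space consumed below 'top' after mapping all n regions */
static uint64_t va_span(const struct region *r, size_t n, int reverse,
			uint64_t top)
{
	uint64_t va = top;
	size_t i;

	for (i = 0; i < n; i++)
		va = map_top_down(va, &r[reverse ? n - 1 - i : i]);

	return top - va;
}
```

Starting from a 2M-aligned top of 0xffffffff00000000, walking the sample
in memory map order consumes 0x5fc000 bytes of VA space for 0x6000 bytes
of regions, while walking it in reverse consumes 0x1ff000.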
--
Ard.
* Re: [PATCH] x86/efi: Map EFI memmap entries in-order at runtime
2015-09-08 13:21 ` Ard Biesheuvel
@ 2015-09-08 20:37 ` Matt Fleming
2015-09-09 7:37 ` Ard Biesheuvel
0 siblings, 1 reply; 23+ messages in thread
From: Matt Fleming @ 2015-09-08 20:37 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org,
x86@kernel.org, Matt Fleming, Borislav Petkov, Leif Lindholm,
Peter Jones, James Bottomley, Matthew Garrett, H. Peter Anvin,
Dave Young, stable@vger.kernel.org
On Tue, 08 Sep, at 03:21:17PM, Ard Biesheuvel wrote:
>
> I noticed that the 64-bit version of efi_map_region() preserves the
> relative alignment with respect to a 2 MB boundary for /each/ region.
> Since the regions are mapped in reverse order, it is highly unlikely
> that each region starts at the same 2 MB relative alignment that the
> previous region ended at, so you are likely wasting quite a bit of VA
> space.
>
> I don't think it is a bug, though, but it does not seem intentional.
Yeah, that's a very good catch. The existing code, that is, top-down
allocation scheme where we map earlier EFI memmap entries at higher
virtual addresses, does incur quite a bit of wasted address space.
That's not true of this patch, though, and it's also not true if we
map the entries in reverse order of the EFI memmap, that is, mapping
the last memmap entry at the highest virtual address.
So it's a bug in the original code, or rather an unintended feature.
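The property that matters to the firmware is that neighbouring entries
keep the same distance in virtual space as in physical space. Here is a
minimal sketch of that check, replaying the top-down arithmetic in
userspace. struct region, layout_top_down(), offsets_preserved() and the
sample data are hypothetical names chosen for illustration, PMD_SIZE is
assumed to be 2M, and the regions are assumed to be physically
contiguous.

```c
#include <stddef.h>
#include <stdint.h>

#define PMD_SIZE (UINT64_C(1) << 21)	/* assumed: 2M, as on x86_64 */
#define PMD_MASK (~(PMD_SIZE - 1))

struct region {
	uint64_t pa;	/* physical start address */
	uint64_t size;	/* size in bytes */
};

/* Hypothetical sample: code/data of one image split into three entries */
static const struct region split_image[3] = {
	{ UINT64_C(0x80001000), 0x2000 },
	{ UINT64_C(0x80003000), 0x1000 },
	{ UINT64_C(0x80004000), 0x3000 },
};

/* One step of the top-down allocation used by efi_map_region() */
static uint64_t step_top_down(uint64_t efi_va, uint64_t pa, uint64_t size)
{
	uint64_t off = pa & (PMD_SIZE - 1);
	uint64_t prev;

	efi_va -= size;
	if (!off)
		return efi_va & PMD_MASK;

	prev = efi_va;
	efi_va = (efi_va & PMD_MASK) + off;
	return efi_va > prev ? efi_va - PMD_SIZE : efi_va;
}

/* Fill va[i] with the address chosen for r[i], walking in either order */
static void layout_top_down(const struct region *r, size_t n, int reverse,
			    uint64_t top, uint64_t *va)
{
	uint64_t cur = top;
	size_t i, idx;

	for (i = 0; i < n; i++) {
		idx = reverse ? n - 1 - i : i;
		cur = step_top_down(cur, r[idx].pa, r[idx].size);
		va[idx] = cur;
	}
}

/* 1 if every neighbouring pair is as far apart in VA as in PA */
static int offsets_preserved(const struct region *r, const uint64_t *va,
			     size_t n)
{
	size_t i;

	for (i = 0; i + 1 < n; i++)
		if (va[i + 1] - va[i] != r[i + 1].pa - r[i].pa)
			return 0;

	return 1;
}
```

For the sample, the reverse walk keeps va[i+1] - va[i] equal to
pa[i+1] - pa[i], and the forward walk does not.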
Ard, based on your suggestion I cooked this patch up to show what
iterating the EFI memmap in reverse looks like in terms of code. The
below diff and the original patch from this thread give me identical
virtual address space layouts.
Admittedly the diff below is missing a whole bunch of comments, which
makes it look smaller, but something like this could work:
---
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 691b333e0038..a2af35f6093a 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -704,6 +704,44 @@ out:
return ret;
}
+static inline void *efi_map_next_entry_reverse(void *entry)
+{
+ if (!entry)
+ return memmap.map_end - memmap.desc_size;
+
+ entry -= memmap.desc_size;
+ if (entry < memmap.map)
+ return NULL;
+
+ return entry;
+}
+
+static void *efi_map_next_entry(void *entry)
+{
+ bool reverse = false;
+
+ if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {
+ /*
+ * Iterate the EFI memory map in reverse order because
+ * the regions will be mapped top-down. The end result
+ * is the same as if we had mapped things forward, but
+ * doesn't require us to change the implementation of
+ * efi_map_region().
+ */
+ return efi_map_next_entry_reverse(entry);
+ }
+
+ /* Initial call */
+ if (!entry)
+ return memmap.map;
+
+ entry += memmap.desc_size;
+ if (entry >= memmap.map_end)
+ return NULL;
+
+ return entry;
+}
+
/*
* Map the efi memory ranges of the runtime services and update new_mmap with
* virtual addresses.
@@ -718,7 +756,8 @@ static void * __init efi_map_regions(int *count, int *pg_shift)
start = -1UL;
end = 0;
- for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
+ p = NULL;
+ while ((p = efi_map_next_entry(p))) {
md = p;
if (!(md->attribute & EFI_MEMORY_RUNTIME)) {
#ifdef CONFIG_X86_64
--
Matt Fleming, Intel Open Source Technology Center
* Re: [PATCH] x86/efi: Map EFI memmap entries in-order at runtime
2015-09-07 4:07 ` joeyli
@ 2015-09-08 20:41 ` Matt Fleming
2015-09-09 0:33 ` joeyli
0 siblings, 1 reply; 23+ messages in thread
From: Matt Fleming @ 2015-09-08 20:41 UTC (permalink / raw)
To: joeyli
Cc: linux-efi, linux-kernel, x86, Matt Fleming, Borislav Petkov,
Leif Lindholm, Peter Jones, James Bottomley, Matthew Garrett,
H. Peter Anvin, Dave Young, stable, Ard Biesheuvel
On Mon, 07 Sep, at 12:07:52PM, joeyli wrote:
>
> This patch works to me on Intel S1200V3RPS to fix issue:
> DMI: Intel Corporation (uefidk.com) Intel Server Board S1200V3RPS UEFI Development Kit/ROMLEY, BIOS 2.0
>
> Tested-by: Lee, Chun-Yi <jlee@suse.com>
When you say "fix issue", do you mean that your machine has the
EFI_PROPERTIES_TABLE feature enabled, and that it doesn't boot without
this patch?
--
Matt Fleming, Intel Open Source Technology Center
* Re: [PATCH] x86/efi: Map EFI memmap entries in-order at runtime
2015-09-08 20:41 ` Matt Fleming
@ 2015-09-09 0:33 ` joeyli
2015-09-09 11:21 ` Matt Fleming
0 siblings, 1 reply; 23+ messages in thread
From: joeyli @ 2015-09-09 0:33 UTC (permalink / raw)
To: Matt Fleming
Cc: linux-efi, linux-kernel, x86, Matt Fleming, Borislav Petkov,
LeifLindholm, leif.lindholm, Peter Jones, James Bottomley,
Matthew Garrett, H. Peter Anvin, Dave Young, stable,
Ard Biesheuvel
Hi Matt,
On Tue, Sep 08, 2015 at 09:41:47PM +0100, Matt Fleming wrote:
> On Mon, 07 Sep, at 12:07:52PM, joeyli wrote:
> >
> > This patch works to me on Intel S1200V3RPS to fix issue:
> > DMI: Intel Corporation (uefidk.com) Intel Server Board S1200V3RPS UEFI Development Kit/ROMLEY, BIOS 2.0
> >
> > Tested-by: Lee, Chun-Yi <jlee@suse.com>
>
> When you say "fix issue", do you mean that your machine has the
> EFI_PROPERTIES_TABLE feature enabled, and that it doesn't boot without
> this patch?
>
> --
> Matt Fleming, Intel Open Source Technology Center
Yes, the machine I have at hand has EFI_PROPERTIES_TABLE enabled, and it
doesn't boot without your patch.
I captured a similar kernel oops over the serial port:
[ 0.037745] ACPI: All ACPI Tables successfully acquired
[ 0.044666] BUG: unable to handle kernel paging request at fffffffef0e5d450
[ 0.052451] IP: [<fffffffef0e5d450>] 0xfffffffef0e5d450
[ 0.058291] PGD 1c0d067 PUD 17fcfd063 PMD 17fd6f063 PTE 0
[ 0.064355] Oops: 0010 [#1] SMP
[ 0.067972] Modules linked in:
[ 0.071388] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.2.0-rc8-2.gc02428d-default #1
[ 0.080121] Hardware name: Intel Corporation (uefidk.com) Intel Server Board S1200V3RPS UEFI Development Kit/ROMLEY, BIOS 2.0
[ 0.092827] task: ffffffff81c114c0 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[ 0.101172] RIP: 0010:[<fffffffef0e5d450>] [<fffffffef0e5d450>] 0xfffffffef0e5d450
[ 0.109724] RSP: 0000:ffffffff81c03d38 EFLAGS: 00010082
[ 0.115647] RAX: fffffffef0c5d100 RBX: fffffffef1d66040 RCX: fffffffef0c5d100
[ 0.123604] RDX: 00000000be366018 RSI: 0000000000000000 RDI: ffffffff81c38b00
[ 0.131560] RBP: 000000000000000c R08: ffffffff81c03d70 R09: ffffffff81c38b0b
[ 0.139517] R10: 0000000000000078 R11: 0000000000000002 R12: 0000000000000296
[ 0.147475] R13: ffffffff81c03eb8 R14: 0000000000000030 R15: 0000000000000007
[ 0.155432] FS: 0000000000000000(0000) GS:ffff88042e600000(0000) knlGS:0000000000000000
[ 0.164457] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.170864] CR2: fffffffef0e5d450 CR3: 000000000009b000 CR4: 00000000000406b0
[ 0.178819] Stack:
[ 0.181059] fffffffef10574c9 0000000000000002 0000000000000000 0000000000000000
[ 0.189351] ffffffff81c38b00 0000000000000000 fffffffef1057a4b 0000000000000058
[ 0.197645] 000000078000203c 000000000009b000 ffff8801ffd70000 000000000009b000
[ 0.205939] Call Trace:
[ 0.208669] [<ffffffff8105dd9e>] ? efi_call+0x7e/0x100
[ 0.214497] [<ffffffff81523006>] ? virt_efi_set_variable+0x66/0x90
[ 0.221487] [<ffffffff8105cd67>] ? efi_delete_dummy_variable+0x77/0x90
[ 0.228866] [<ffffffff81d41ff5>] ? efi_enter_virtual_mode+0x3ac/0x3bb
[ 0.236147] [<ffffffff81d26f24>] ? start_kernel+0x3f4/0x484
[ 0.242459] [<ffffffff81d26120>] ? early_idt_handler_array+0x120/0x120
[ 0.249835] [<ffffffff81d26315>] ? x86_64_start_reservations+0x2a/0x2c
[ 0.257212] [<ffffffff81d26452>] ? x86_64_start_kernel+0x13b/0x14a
[ 0.264200] Code: Bad RIP value.
[ 0.267916] RIP [<fffffffef0e5d450>] 0xfffffffef0e5d450
[ 0.273851] RSP <ffffffff81c03d38>
[ 0.277739] CR2: fffffffef0e5d450
[ 0.281436] ---[ end trace 19be7a419bfa9401 ]---
[ 0.286575] Kernel panic - not syncing: Attempted to kill the idle task!
[ 0.294049] Rebooting in 90 seconds..
[ 0.299184] ACPI MEMORY or I/O RESET_REG.
Thanks a lot!
Joey Lee
* Re: [PATCH] x86/efi: Map EFI memmap entries in-order at runtime
2015-09-08 20:37 ` Matt Fleming
@ 2015-09-09 7:37 ` Ard Biesheuvel
2015-09-09 9:58 ` Matt Fleming
0 siblings, 1 reply; 23+ messages in thread
From: Ard Biesheuvel @ 2015-09-09 7:37 UTC (permalink / raw)
To: Matt Fleming
Cc: linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org,
x86@kernel.org, Matt Fleming, Borislav Petkov, Leif Lindholm,
Peter Jones, James Bottomley, Matthew Garrett, H. Peter Anvin,
Dave Young, stable@vger.kernel.org
On 8 September 2015 at 22:37, Matt Fleming <matt@codeblueprint.co.uk> wrote:
> On Tue, 08 Sep, at 03:21:17PM, Ard Biesheuvel wrote:
>>
>> I noticed that the 64-bit version of efi_map_region() preserves the
>> relative alignment with respect to a 2 MB boundary for /each/ region.
>> Since the regions are mapped in reverse order, it is highly unlikely
>> that each region starts at the same 2 MB relative alignment that the
>> previous region ended at, so you are likely wasting quite a bit of VA
>> space.
>>
>> I don't think it is a bug, though, but it does not seem intentional.
>
> Yeah, that's a very good catch. The existing code, that is, top-down
> allocation scheme where we map ealier EFI memmap entries at higher
> virtual addresses, does incur quite a bit of wasted address space.
>
> That's not true of this patch, though, and it's also not true if we
> map the entries in reverse order of the EFI memmap, that is, mapping
> the last memmap entry at the highest virtual address.
>
> So it's a bug in the original code, or rather an unintended feature.
>
Indeed. It does deserve a mention, since the point of this patch is to
prevent reordering and/or rounding up of regions.
> Ard, based on your suggestion I cooked this patch up to show what
> iterating the EFI memmap in reverse looks like in terms of code. The
> below diff and the original patch from this thread give me identical
> virtual address space layouts.
>
Good, as expected.
> Admittedly the below is missing a whole bunch of comments so makes the
> diff look smaller, but something like this could work,
>
> ---
>
> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> index 691b333e0038..a2af35f6093a 100644
> --- a/arch/x86/platform/efi/efi.c
> +++ b/arch/x86/platform/efi/efi.c
> @@ -704,6 +704,44 @@ out:
> return ret;
> }
>
> +static inline void *efi_map_next_entry_reverse(void *entry)
> +{
> + if (!entry)
> + return memmap.map_end - memmap.desc_size;
> +
> + entry -= memmap.desc_size;
> + if (entry < memmap.map)
> + return NULL;
> +
> + return entry;
> +}
> +
> +static void *efi_map_next_entry(void *entry)
> +{
> + bool reverse = false;
> +
> + if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {
Here, you could also test whether the
EFI_PROPERTIES_RUNTIME_MEMORY_PROTECTION_NON_EXECUTABLE_PE_DATA bit
(sigh) is set
> + /*
> + * Iterate the EFI memory map in reverse order because
> + * the regions will be mapped top-down. The end result
> + * is the same as if we had mapped things forward, but
> + * doesn't require us to change the implementation of
> + * efi_map_region().
> + */
> + return efi_map_next_entry_reverse(entry);
> + }
> +
> + /* Initial call */
> + if (!entry)
> + return memmap.map;
> +
> + entry += memmap.desc_size;
> + if (entry >= memmap.map_end)
> + return NULL;
> +
> + return entry;
> +}
> +
> /*
> * Map the efi memory ranges of the runtime services and update new_mmap with
> * virtual addresses.
> @@ -718,7 +756,8 @@ static void * __init efi_map_regions(int *count, int *pg_shift)
> start = -1UL;
> end = 0;
>
> - for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
> + p = NULL;
> + while ((p = efi_map_next_entry(p))) {
> md = p;
> if (!(md->attribute & EFI_MEMORY_RUNTIME)) {
> #ifdef CONFIG_X86_64
>
> --
> Matt Fleming, Intel Open Source Technology Center
* Re: [PATCH] x86/efi: Map EFI memmap entries in-order at runtime
2015-09-09 7:37 ` Ard Biesheuvel
@ 2015-09-09 9:58 ` Matt Fleming
2015-09-09 9:59 ` Ard Biesheuvel
0 siblings, 1 reply; 23+ messages in thread
From: Matt Fleming @ 2015-09-09 9:58 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org,
x86@kernel.org, Matt Fleming, Borislav Petkov, Leif Lindholm,
Peter Jones, James Bottomley, Matthew Garrett, H. Peter Anvin,
Dave Young, stable@vger.kernel.org
On Wed, 09 Sep, at 09:37:21AM, Ard Biesheuvel wrote:
> On 8 September 2015 at 22:37, Matt Fleming <matt@codeblueprint.co.uk> wrote:
> >
> > diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> > index 691b333e0038..a2af35f6093a 100644
> > --- a/arch/x86/platform/efi/efi.c
> > +++ b/arch/x86/platform/efi/efi.c
> > @@ -704,6 +704,44 @@ out:
> > return ret;
> > }
> >
> > +static inline void *efi_map_next_entry_reverse(void *entry)
> > +{
> > + if (!entry)
> > + return memmap.map_end - memmap.desc_size;
> > +
> > + entry -= memmap.desc_size;
> > + if (entry < memmap.map)
> > + return NULL;
> > +
> > + return entry;
> > +}
> > +
> > +static void *efi_map_next_entry(void *entry)
> > +{
> > + bool reverse = false;
> > +
> > + if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {
>
> Here, you could also test whether the
> EFI_PROPERTIES_RUNTIME_MEMORY_PROTECTION_NON_EXECUTABLE_PE_DATA bit
> (sigh) is set
No, leaving this out was intentional because we're already suffering
from the combinatorial explosion of config options. Introducing more
code paths is very much the wrong thing to do unless absolutely
necessary.
If we can get away with using one mapping scheme here, we should.
When trying to debug this code in the future I do not want to be
thinking "Do you have EFI_PROPERTIES_RUNTIME_OMG_THIS_IS_SILLY bit
set? because that means we're mapping the runtime regions in a
different order".
--
Matt Fleming, Intel Open Source Technology Center
* Re: [PATCH] x86/efi: Map EFI memmap entries in-order at runtime
2015-09-09 9:58 ` Matt Fleming
@ 2015-09-09 9:59 ` Ard Biesheuvel
0 siblings, 0 replies; 23+ messages in thread
From: Ard Biesheuvel @ 2015-09-09 9:59 UTC (permalink / raw)
To: Matt Fleming
Cc: linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org,
x86@kernel.org, Matt Fleming, Borislav Petkov, Leif Lindholm,
Peter Jones, James Bottomley, Matthew Garrett, H. Peter Anvin,
Dave Young, stable@vger.kernel.org
On 9 September 2015 at 11:58, Matt Fleming <matt@codeblueprint.co.uk> wrote:
> On Wed, 09 Sep, at 09:37:21AM, Ard Biesheuvel wrote:
>> On 8 September 2015 at 22:37, Matt Fleming <matt@codeblueprint.co.uk> wrote:
>> >
>> > diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
>> > index 691b333e0038..a2af35f6093a 100644
>> > --- a/arch/x86/platform/efi/efi.c
>> > +++ b/arch/x86/platform/efi/efi.c
>> > @@ -704,6 +704,44 @@ out:
>> > return ret;
>> > }
>> >
>> > +static inline void *efi_map_next_entry_reverse(void *entry)
>> > +{
>> > + if (!entry)
>> > + return memmap.map_end - memmap.desc_size;
>> > +
>> > + entry -= memmap.desc_size;
>> > + if (entry < memmap.map)
>> > + return NULL;
>> > +
>> > + return entry;
>> > +}
>> > +
>> > +static void *efi_map_next_entry(void *entry)
>> > +{
>> > + bool reverse = false;
>> > +
>> > + if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {
>>
>> Here, you could also test whether the
>> EFI_PROPERTIES_RUNTIME_MEMORY_PROTECTION_NON_EXECUTABLE_PE_DATA bit
>> (sigh) is set
>
> No, leaving this out was intentional because we're already suffering
> from the combinatoral explosion of config options. Introducing more
> code paths is very much the wrong thing to do unless absolutely
> necessary.
>
> If we can get away with using one mapping scheme here, we should.
>
> When trying to debug this code in the future I do not want to be
> thinking "Do you have EFI_PROPERTIES_RUNTIME_OMG_THIS_IS_SILLY bit
> set? because that means we're mapping the runtime regions in a
> different order".
>
OK, point taken. I suppose buggy firmware already has the option of
using EFI_OLD_MEMMAP as a fallback.
--
Ard.
* Re: [PATCH] x86/efi: Map EFI memmap entries in-order at runtime
2015-09-09 0:33 ` joeyli
@ 2015-09-09 11:21 ` Matt Fleming
2015-09-10 3:38 ` joeyli
2015-09-16 10:08 ` Borislav Petkov
0 siblings, 2 replies; 23+ messages in thread
From: Matt Fleming @ 2015-09-09 11:21 UTC (permalink / raw)
To: joeyli
Cc: linux-efi, linux-kernel, x86, Matt Fleming, Borislav Petkov,
LeifLindholm, leif.lindholm, Peter Jones, James Bottomley,
Matthew Garrett, H. Peter Anvin, Dave Young, stable,
Ard Biesheuvel
On Wed, 09 Sep, at 08:33:07AM, joeyli wrote:
>
> Yes, the machine on my hand has EFI_PROPERTIES_TABLE enabled, and it doesn't
> boot without your patch.
Awesome. Could you test the following patch instead?
---
From 24d324b781a3b688dcc265995949a9cf4e8af687 Mon Sep 17 00:00:00 2001
From: Matt Fleming <matt.fleming@intel.com>
Date: Thu, 3 Sep 2015 15:56:25 +0100
Subject: [PATCH v2] x86/efi: Map EFI memmap entries in-order at runtime
UEFI v2.5 introduced EFI_PROPERTIES_TABLE, a config table which
signals that the firmware PE/COFF loader supports splitting code and
data sections of PE/COFF images into separate EFI memory map entries.
This allows the kernel to map those regions with strict memory
protections, e.g. EFI_MEMORY_RO for code, EFI_MEMORY_XP for data, etc.
Unfortunately, an unwritten requirement of this new feature is that
the regions need to be mapped with the same offsets relative to each
other as observed in the EFI memory map. If this is not done crashes
like this may occur,
[ 0.006391] BUG: unable to handle kernel paging request at fffffffefe6086dd
[ 0.006923] IP: [<fffffffefe6086dd>] 0xfffffffefe6086dd
[ 0.007000] Call Trace:
[ 0.007000] [<ffffffff8104c90e>] efi_call+0x7e/0x100
[ 0.007000] [<ffffffff81602091>] ? virt_efi_set_variable+0x61/0x90
[ 0.007000] [<ffffffff8104c583>] efi_delete_dummy_variable+0x63/0x70
[ 0.007000] [<ffffffff81f4e4aa>] efi_enter_virtual_mode+0x383/0x392
[ 0.007000] [<ffffffff81f37e1b>] start_kernel+0x38a/0x417
[ 0.007000] [<ffffffff81f37495>] x86_64_start_reservations+0x2a/0x2c
[ 0.007000] [<ffffffff81f37582>] x86_64_start_kernel+0xeb/0xef
Here 0xfffffffefe6086dd refers to an address the firmware expects to
be mapped but which the OS never claimed was mapped. The issue is that
included in these regions are relative addresses to other regions
which were emitted by the firmware toolchain before the "splitting" of
sections occurred at runtime.
Needless to say, we don't satisfy this unwritten requirement on x86_64
and instead map the EFI memory map entries in reverse order. The above
crash is almost certainly triggerable with any kernel newer than v3.13
because that's when we rewrote the EFI runtime region mapping code, in
commit d2f7cbe7b26a ("x86/efi: Runtime services virtual mapping"). For
kernel versions before v3.13 things may work by pure luck depending on
the fragmentation of the kernel virtual address space at the time we
map the EFI regions.
Instead of mapping the EFI memory map entries in reverse order, where
entry N has a higher virtual address than entry N+1, map them in the
same order as they appear in the EFI memory map to preserve this
relative offset between regions.
This patch has been kept as small as possible with the intention that
it should be applied aggressively to stable and distribution kernels.
It is very much a bugfix rather than support for a new feature, since
when EFI_PROPERTIES_TABLE is enabled we must map things as outlined
above to even boot - we have no way of asking the firmware not to
split the code/data regions.
In fact, this patch doesn't even make use of the more strict memory
protections available in UEFI v2.5. That will come later.
Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Lee, Chun-Yi <jlee@suse.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Peter Jones <pjones@redhat.com>
Cc: James Bottomley <JBottomley@Odin.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
---
v2: Use Ard's reverse iteration scheme so that we can reuse the
existing efi_map_region() implementation that maps things top-down.
arch/x86/platform/efi/efi.c | 67 ++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 66 insertions(+), 1 deletion(-)
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index e4308fe6afe8..c6835bfad3a1 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -705,6 +705,70 @@ out:
}
/*
+ * Iterate the EFI memory map in reverse order because the regions
+ * will be mapped top-down. The end result is the same as if we had
+ * mapped things forward, but doesn't require us to change the
+ * existing implementation of efi_map_region().
+ */
+static inline void *efi_map_next_entry_reverse(void *entry)
+{
+ /* Initial call */
+ if (!entry)
+ return memmap.map_end - memmap.desc_size;
+
+ entry -= memmap.desc_size;
+ if (entry < memmap.map)
+ return NULL;
+
+ return entry;
+}
+
+/*
+ * efi_map_next_entry - Return the next EFI memory map descriptor
+ * @entry: Previous EFI memory map descriptor
+ *
+ * This is a helper function to iterate over the EFI memory map, which
+ * we do in different orders depending on the current configuration.
+ *
+ * To begin traversing the memory map @entry must be %NULL.
+ *
+ * Returns %NULL when we reach the end of the memory map.
+ */
+static void *efi_map_next_entry(void *entry)
+{
+	if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {
+		/*
+		 * Starting in UEFI v2.5 the EFI_PROPERTIES_TABLE
+		 * config table feature requires us to map all entries
+		 * in the same order as they appear in the EFI memory
+		 * map. That is to say, entry N must have a lower
+		 * virtual address than entry N+1. This is because the
+		 * firmware toolchain leaves relative references in
+		 * the code/data sections, which are split and become
+		 * separate EFI memory regions. Mapping things
+		 * out-of-order leads to the firmware accessing
+		 * unmapped addresses.
+		 *
+		 * Since we need to map things this way whether or not
+		 * the kernel actually makes use of
+		 * EFI_PROPERTIES_TABLE, let's just switch to this
+		 * scheme by default for 64-bit.
+		 */
+		return efi_map_next_entry_reverse(entry);
+	}
+
+	/* Initial call */
+	if (!entry)
+		return memmap.map;
+
+	entry += memmap.desc_size;
+	if (entry >= memmap.map_end)
+		return NULL;
+
+	return entry;
+}
+
+/*
* Map the efi memory ranges of the runtime services and update new_mmap with
* virtual addresses.
*/
@@ -714,7 +778,8 @@ static void * __init efi_map_regions(int *count, int *pg_shift)
 	unsigned long left = 0;
 	efi_memory_desc_t *md;
 
-	for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
+	p = NULL;
+	while ((p = efi_map_next_entry(p))) {
 		md = p;
 		if (!(md->attribute & EFI_MEMORY_RUNTIME)) {
 #ifdef CONFIG_X86_64
--
2.1.0
--
Matt Fleming, Intel Open Source Technology Center
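
[Editor's illustration, not part of the patch.] The comment in the patch says that walking the memory map in reverse while mapping top-down gives the same layout as mapping forward bottom-up. The following is a minimal sketch of that claim under simplifying assumptions: struct demo_desc, demo_map_region() and the other demo_* names are hypothetical stand-ins, every entry is treated as a runtime region, and the alignment handling of the real efi_map_region() is ignored.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096ULL

/* Hypothetical, simplified descriptor: only the fields this sketch needs. */
struct demo_desc {
	uint64_t num_pages;	/* size of the region in 4 KiB pages */
	uint64_t virt_addr;	/* filled in by demo_map_region() */
};

/* Top-down allocator: each call places the region just below the last one. */
static void demo_map_region(struct demo_desc *md, uint64_t *va_cursor)
{
	*va_cursor -= md->num_pages * DEMO_PAGE_SIZE;
	md->virt_addr = *va_cursor;
}

/* Map every entry, walking the array last-to-first when 'reverse' is set. */
static void demo_map_all(struct demo_desc *map, size_t n, uint64_t va_top,
			 int reverse)
{
	uint64_t cursor = va_top;
	size_t i;

	for (i = 0; i < n; i++)
		demo_map_region(&map[reverse ? n - 1 - i : i], &cursor);
}

/* Returns 1 if every entry N ends exactly where entry N+1 begins. */
static int demo_is_in_order(const struct demo_desc *map, size_t n)
{
	size_t i;

	for (i = 0; i + 1 < n; i++)
		if (map[i].virt_addr + map[i].num_pages * DEMO_PAGE_SIZE !=
		    map[i + 1].virt_addr)
			return 0;
	return 1;
}
```

With the reverse walk, the last entry is placed highest and each earlier entry lands directly below it, so entry N ends up below entry N+1. With the forward walk and the same top-down allocator the order is inverted, which is the layout the patch moves away from.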
* Re: [PATCH] x86/efi: Map EFI memmap entries in-order at runtime
2015-09-09 11:21 ` Matt Fleming
@ 2015-09-10 3:38 ` joeyli
2015-09-16 10:08 ` Borislav Petkov
1 sibling, 0 replies; 23+ messages in thread
From: joeyli @ 2015-09-10 3:38 UTC (permalink / raw)
To: Matt Fleming
Cc: linux-efi, linux-kernel, x86, Matt Fleming, Borislav Petkov,
LeifLindholm, leif.lindholm, Peter Jones, James Bottomley,
Matthew Garrett, H. Peter Anvin, Dave Young, stable,
Ard Biesheuvel
Hi,
On Wed, Sep 09, 2015 at 12:21:23PM +0100, Matt Fleming wrote:
> On Wed, 09 Sep, at 08:33:07AM, joeyli wrote:
> >
> > Yes, the machine on my hand has EFI_PROPERTIES_TABLE enabled, and it doesn't
> > boot without your patch.
>
> Awesome. Could you test the following patch instead?
>
> ---
Yes, like the first version, this patch works on my S1200V3RPS machine.
Tested-by: Lee, Chun-Yi <jlee@suse.com>
Regards
Joey Lee
>
> From 24d324b781a3b688dcc265995949a9cf4e8af687 Mon Sep 17 00:00:00 2001
> From: Matt Fleming <matt.fleming@intel.com>
> Date: Thu, 3 Sep 2015 15:56:25 +0100
> Subject: [PATCH v2] x86/efi: Map EFI memmap entries in-order at runtime
>
> Beginning with UEFI v2.5 EFI_PROPERTIES_TABLE was introduced that
> signals that the firmware PE/COFF loader supports splitting code and
> data sections of PE/COFF images into separate EFI memory map entries.
> This allows the kernel to map those regions with strict memory
> protections, e.g. EFI_MEMORY_RO for code, EFI_MEMORY_XP for data, etc.
>
> Unfortunately, an unwritten requirement of this new feature is that
> the regions need to be mapped with the same offsets relative to each
> other as observed in the EFI memory map. If this is not done crashes
> like this may occur,
>
> [ 0.006391] BUG: unable to handle kernel paging request at fffffffefe6086dd
> [ 0.006923] IP: [<fffffffefe6086dd>] 0xfffffffefe6086dd
> [ 0.007000] Call Trace:
> [ 0.007000] [<ffffffff8104c90e>] efi_call+0x7e/0x100
> [ 0.007000] [<ffffffff81602091>] ? virt_efi_set_variable+0x61/0x90
> [ 0.007000] [<ffffffff8104c583>] efi_delete_dummy_variable+0x63/0x70
> [ 0.007000] [<ffffffff81f4e4aa>] efi_enter_virtual_mode+0x383/0x392
> [ 0.007000] [<ffffffff81f37e1b>] start_kernel+0x38a/0x417
> [ 0.007000] [<ffffffff81f37495>] x86_64_start_reservations+0x2a/0x2c
> [ 0.007000] [<ffffffff81f37582>] x86_64_start_kernel+0xeb/0xef
>
> Here 0xfffffffefe6086dd refers to an address the firmware expects to
> be mapped but which the OS never claimed was mapped. The issue is that
> included in these regions are relative addresses to other regions
> which were emitted by the firmware toolchain before the "splitting" of
> sections occurred at runtime.
>
> Needless to say, we don't satisfy this unwritten requirement on x86_64
> and instead map the EFI memory map entries in reverse order. The above
> crash is almost certainly triggerable with any kernel newer than v3.13
> because that's when we rewrote the EFI runtime region mapping code, in
> commit d2f7cbe7b26a ("x86/efi: Runtime services virtual mapping"). For
> kernel versions before v3.13 things may work by pure luck depending on
> the fragmentation of the kernel virtual address space at the time we
> map the EFI regions.
>
> Instead of mapping the EFI memory map entries in reverse order, where
> entry N has a higher virtual address than entry N+1, map them in the
> same order as they appear in the EFI memory map to preserve this
> relative offset between regions.
>
> This patch has been kept as small as possible with the intention that
> it should be applied aggressively to stable and distribution kernels.
> It is very much a bugfix rather than support for a new feature, since
> when EFI_PROPERTIES_TABLE is enabled we must map things as outlined
> above to even boot - we have no way of asking the firmware not to
> split the code/data regions.
>
> In fact, this patch doesn't even make use of the more strict memory
> protections available in UEFI v2.5. That will come later.
>
> Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Lee, Chun-Yi <jlee@suse.com>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Leif Lindholm <leif.lindholm@linaro.org>
> Cc: Peter Jones <pjones@redhat.com>
> Cc: James Bottomley <JBottomley@Odin.com>
> Cc: Matthew Garrett <mjg59@srcf.ucam.org>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Dave Young <dyoung@redhat.com>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
> ---
>
> v2: Use Ard's reverse iteration scheme so that we can reuse the
> existing efi_map_region() implementation that maps things top-down.
>
> arch/x86/platform/efi/efi.c | 67 ++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 66 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> index e4308fe6afe8..c6835bfad3a1 100644
> --- a/arch/x86/platform/efi/efi.c
> +++ b/arch/x86/platform/efi/efi.c
> @@ -705,6 +705,70 @@ out:
> }
>
> /*
> + * Iterate the EFI memory map in reverse order because the regions
> + * will be mapped top-down. The end result is the same as if we had
> + * mapped things forward, but doesn't require us to change the
> + * existing implementation of efi_map_region().
> + */
> +static inline void *efi_map_next_entry_reverse(void *entry)
> +{
> + /* Initial call */
> + if (!entry)
> + return memmap.map_end - memmap.desc_size;
> +
> + entry -= memmap.desc_size;
> + if (entry < memmap.map)
> + return NULL;
> +
> + return entry;
> +}
> +
> +/*
> + * efi_map_next_entry - Return the next EFI memory map descriptor
> + * @entry: Previous EFI memory map descriptor
> + *
> + * This is a helper function to iterate over the EFI memory map, which
> + * we do in different orders depending on the current configuration.
> + *
> + * To begin traversing the memory map @entry must be %NULL.
> + *
> + * Returns %NULL when we reach the end of the memory map.
> + */
> +static void *efi_map_next_entry(void *entry)
> +{
> + if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {
> + /*
> + * Starting in UEFI v2.5 the EFI_PROPERTIES_TABLE
> + * config table feature requires us to map all entries
> + * in the same order as they appear in the EFI memory
> + * map. That is to say, entry N must have a lower
> + * virtual address than entry N+1. This is because the
> + * firmware toolchain leaves relative references in
> + * the code/data sections, which are split and become
> + * separate EFI memory regions. Mapping things
> + * out-of-order leads to the firmware accessing
> + * unmapped addresses.
> + *
> + * Since we need to map things this way whether or not
> + * the kernel actually makes use of
> + * EFI_PROPERTIES_TABLE, let's just switch to this
> + * scheme by default for 64-bit.
> + */
> + return efi_map_next_entry_reverse(entry);
> + }
> +
> + /* Initial call */
> + if (!entry)
> + return memmap.map;
> +
> + entry += memmap.desc_size;
> + if (entry >= memmap.map_end)
> + return NULL;
> +
> + return entry;
> +}
> +
> +/*
> * Map the efi memory ranges of the runtime services and update new_mmap with
> * virtual addresses.
> */
> @@ -714,7 +778,8 @@ static void * __init efi_map_regions(int *count, int *pg_shift)
> unsigned long left = 0;
> efi_memory_desc_t *md;
>
> - for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
> + p = NULL;
> + while ((p = efi_map_next_entry(p))) {
> md = p;
> if (!(md->attribute & EFI_MEMORY_RUNTIME)) {
> #ifdef CONFIG_X86_64
> --
> 2.1.0
>
> --
> Matt Fleming, Intel Open Source Technology Center
* Re: [PATCH] x86/efi: Map EFI memmap entries in-order at runtime
2015-09-09 11:21 ` Matt Fleming
2015-09-10 3:38 ` joeyli
@ 2015-09-16 10:08 ` Borislav Petkov
2015-09-16 11:25 ` Ard Biesheuvel
1 sibling, 1 reply; 23+ messages in thread
From: Borislav Petkov @ 2015-09-16 10:08 UTC (permalink / raw)
To: Matt Fleming
Cc: joeyli, linux-efi, linux-kernel, x86, Matt Fleming, LeifLindholm,
leif.lindholm, Peter Jones, James Bottomley, Matthew Garrett,
H. Peter Anvin, Dave Young, stable, Ard Biesheuvel
On Wed, Sep 09, 2015 at 12:21:23PM +0100, Matt Fleming wrote:
> On Wed, 09 Sep, at 08:33:07AM, joeyli wrote:
> >
> > Yes, the machine on my hand has EFI_PROPERTIES_TABLE enabled, and it doesn't
> > boot without your patch.
>
> Awesome. Could you test the following patch instead?
>
> ---
>
> From 24d324b781a3b688dcc265995949a9cf4e8af687 Mon Sep 17 00:00:00 2001
> From: Matt Fleming <matt.fleming@intel.com>
> Date: Thu, 3 Sep 2015 15:56:25 +0100
> Subject: [PATCH v2] x86/efi: Map EFI memmap entries in-order at runtime
>
> Beginning with UEFI v2.5 EFI_PROPERTIES_TABLE was introduced that
> signals that the firmware PE/COFF loader supports splitting code and
> data sections of PE/COFF images into separate EFI memory map entries.
> This allows the kernel to map those regions with strict memory
> protections, e.g. EFI_MEMORY_RO for code, EFI_MEMORY_XP for data, etc.
>
> Unfortunately, an unwritten requirement of this new feature is that
> the regions need to be mapped with the same offsets relative to each
> other as observed in the EFI memory map. If this is not done crashes
Let me get this straight: this looks like the next EFI screwup which
practically requires specific mapping placement in VA space just
because it uses relative addresses? And since you say "unwritten", this
practical requirement is not even in the spec?
Can we state explicitly in the spec NOT to rely on mapping VA placement?
I mean, this "unwritten" requirement is seriously screwed on soo many
levels...
What else are we to expect? Spelled out virtual addresses which are
going to be the EFI-allowed ones only??!
--
Regards/Gruss,
Boris.
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
* Re: [PATCH] x86/efi: Map EFI memmap entries in-order at runtime
2015-09-16 10:08 ` Borislav Petkov
@ 2015-09-16 11:25 ` Ard Biesheuvel
2015-09-16 13:28 ` Borislav Petkov
2015-09-16 13:37 ` James Bottomley
0 siblings, 2 replies; 23+ messages in thread
From: Ard Biesheuvel @ 2015-09-16 11:25 UTC (permalink / raw)
To: Borislav Petkov
Cc: Matt Fleming, joeyli, linux-efi@vger.kernel.org,
linux-kernel@vger.kernel.org, x86@kernel.org, Matt Fleming,
LeifLindholm, Leif Lindholm, Peter Jones, James Bottomley,
Matthew Garrett, H. Peter Anvin, Dave Young,
stable@vger.kernel.org
On 16 September 2015 at 12:08, Borislav Petkov <bp@suse.de> wrote:
> On Wed, Sep 09, 2015 at 12:21:23PM +0100, Matt Fleming wrote:
>> On Wed, 09 Sep, at 08:33:07AM, joeyli wrote:
>> >
>> > Yes, the machine on my hand has EFI_PROPERTIES_TABLE enabled, and it doesn't
>> > boot without your patch.
>>
>> Awesome. Could you test the following patch instead?
>>
>> ---
>>
>> From 24d324b781a3b688dcc265995949a9cf4e8af687 Mon Sep 17 00:00:00 2001
>> From: Matt Fleming <matt.fleming@intel.com>
>> Date: Thu, 3 Sep 2015 15:56:25 +0100
>> Subject: [PATCH v2] x86/efi: Map EFI memmap entries in-order at runtime
>>
>> Beginning with UEFI v2.5 EFI_PROPERTIES_TABLE was introduced that
>> signals that the firmware PE/COFF loader supports splitting code and
>> data sections of PE/COFF images into separate EFI memory map entries.
>> This allows the kernel to map those regions with strict memory
>> protections, e.g. EFI_MEMORY_RO for code, EFI_MEMORY_XP for data, etc.
>>
>> Unfortunately, an unwritten requirement of this new feature is that
>> the regions need to be mapped with the same offsets relative to each
>> other as observed in the EFI memory map. If this is not done crashes
>
> Let me get this straight: this looks like the next EFI screwup which
> practically requires specific mapping placement in VA space just
> because it uses relative addresses?
Both relative and absolute references, currently. The latter are also
affected since the relocation offset that is applied to all PE/COFF
relocation entries is based on the displacement of ImageBase, and
absolute references to symbols in .data need to be treated specially
(since it may be shifted relative to the .text section containing
ImageBase). This could be worked around by converting each absolute
reference individually using ConvertPointer () [and I have a proof of
concept that actually makes the problem go away on x86] but it would
still be only a partial solution, since relative references are not
tracked in the PE/COFF metadata, so even if we wanted to, it would be
intractable to find each cross-section relative reference and do the
fixup.
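
[Editor's illustration, not part of the original message.] The point about absolute references can be sketched as follows. A relocation that applies one image-wide delta (derived from where ImageBase moved) only gives the right answer for a pointer into .data if .data moved by the same amount as .text. Converting each pointer individually, by finding the region that contains it, does not have that limitation. All names here (struct demo_region, demo_relocate_by_image_delta(), demo_convert_pointer()) are hypothetical; demo_convert_pointer() is only a stand-in for the idea behind the ConvertPointer() runtime service and does not reproduce its signature.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region record: a physical range and where the OS mapped it. */
struct demo_region {
	uint64_t phys_start;
	uint64_t size;
	uint64_t virt_start;
};

/* Relocate a pointer by one image-wide delta taken from ImageBase. */
static uint64_t demo_relocate_by_image_delta(uint64_t phys_ref,
					      uint64_t image_base_phys,
					      uint64_t image_base_virt)
{
	return phys_ref + (image_base_virt - image_base_phys);
}

/*
 * Convert one pointer by looking up the region that contains it.
 * Returns 0 if no region covers the address.
 */
static uint64_t demo_convert_pointer(const struct demo_region *regions,
				     size_t n, uint64_t phys_ref)
{
	size_t i;

	for (i = 0; i < n; i++) {
		const struct demo_region *r = &regions[i];

		if (phys_ref >= r->phys_start &&
		    phys_ref - r->phys_start < r->size)
			return r->virt_start + (phys_ref - r->phys_start);
	}
	return 0;
}
```

When the two regions keep their relative offset, both functions return the same virtual address. When .data is mapped at a different offset from .text, only the per-pointer lookup lands inside the .data mapping. Neither approach helps with relative references, since nothing records where those are.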
> And since you say "unwritten" this
> practically a requirement is not even in the spec?
>
No, it seems nobody thought of this when designing the feature.
> Can we state explicitly in the spec NOT to rely on mapping VA placement?
> I mean, this "unwritten" requirement is seriously screwed on soo many
> levels...
>
Several solutions and/or workarounds are currently under discussion.
--
Ard.
> What else are we to expect? Spelled out virtual addresses which are
> going to be the EFI-allowed ones only??!
>
> --
> Regards/Gruss,
> Boris.
>
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
> --
* Re: [PATCH] x86/efi: Map EFI memmap entries in-order at runtime
2015-09-16 11:25 ` Ard Biesheuvel
@ 2015-09-16 13:28 ` Borislav Petkov
2015-09-16 13:38 ` Ard Biesheuvel
2015-09-16 13:37 ` James Bottomley
1 sibling, 1 reply; 23+ messages in thread
From: Borislav Petkov @ 2015-09-16 13:28 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Matt Fleming, joeyli, linux-efi@vger.kernel.org,
linux-kernel@vger.kernel.org, x86@kernel.org, Matt Fleming,
Leif Lindholm, Peter Jones, James Bottomley, Matthew Garrett,
H. Peter Anvin, Dave Young, stable@vger.kernel.org
On Wed, Sep 16, 2015 at 01:25:06PM +0200, Ard Biesheuvel wrote:
> ... so even if we wanted to, it would be intractable to find each
> cross-section relative reference and do the fixup.
Hmm, maybe we should go and patch EFI code segments and fixup all
relative references after mapping. I mean, if you want something done
right, you better do it yourself. :-\
> No, it seems nobody thought of this when designing the feature.
Not surprised at all, to be honest.
> Several solutions and/or work arounds are currently under discussion.
And requiring for code segments not to refer to each other with relative
offsets and holding that down in the spec post-factum is not possible
anymore...?
[ I can already imagine what the answer to that question would be though... ]
Thanks.
--
Regards/Gruss,
Boris.
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
* Re: [PATCH] x86/efi: Map EFI memmap entries in-order at runtime
2015-09-16 11:25 ` Ard Biesheuvel
2015-09-16 13:28 ` Borislav Petkov
@ 2015-09-16 13:37 ` James Bottomley
2015-09-16 14:07 ` Ard Biesheuvel
1 sibling, 1 reply; 23+ messages in thread
From: James Bottomley @ 2015-09-16 13:37 UTC (permalink / raw)
To: ard.biesheuvel@linaro.org
Cc: matt@codeblueprint.co.uk, linux-kernel@vger.kernel.org,
pjones@redhat.com, jlee@suse.com, bp@suse.de, dyoung@redhat.com,
stable@vger.kernel.org, x86@kernel.org, hpa@zytor.com,
linux-efi@vger.kernel.org, leif.lindholm@linaro.org,
matt.fleming@intel.com, LeifLindholm@linux-rxt1.site,
mjg59@srcf.ucam.org
On Wed, 2015-09-16 at 13:25 +0200, Ard Biesheuvel wrote:
> On 16 September 2015 at 12:08, Borislav Petkov <bp@suse.de> wrote:
> > On Wed, Sep 09, 2015 at 12:21:23PM +0100, Matt Fleming wrote:
> >> On Wed, 09 Sep, at 08:33:07AM, joeyli wrote:
> >> >
> >> > Yes, the machine on my hand has EFI_PROPERTIES_TABLE enabled, and it doesn't
> >> > boot without your patch.
> >>
> >> Awesome. Could you test the following patch instead?
> >>
> >> ---
> >>
> >> From 24d324b781a3b688dcc265995949a9cf4e8af687 Mon Sep 17 00:00:00 2001
> >> From: Matt Fleming <matt.fleming@intel.com>
> >> Date: Thu, 3 Sep 2015 15:56:25 +0100
> >> Subject: [PATCH v2] x86/efi: Map EFI memmap entries in-order at runtime
> >>
> >> Beginning with UEFI v2.5 EFI_PROPERTIES_TABLE was introduced that
> >> signals that the firmware PE/COFF loader supports splitting code and
> >> data sections of PE/COFF images into separate EFI memory map entries.
> >> This allows the kernel to map those regions with strict memory
> >> protections, e.g. EFI_MEMORY_RO for code, EFI_MEMORY_XP for data, etc.
> >>
> >> Unfortunately, an unwritten requirement of this new feature is that
> >> the regions need to be mapped with the same offsets relative to each
> >> other as observed in the EFI memory map. If this is not done crashes
> >
> > Let me get this straight: this looks like the next EFI screwup which
> > practically requires specific mapping placement in VA space just
> > because it uses relative addresses?
>
> Both relative and absolute references, currently. The latter are also
> affected since the relocation offset that is applied to all PE/COFF
> relocation entries is based on the displacement of ImageBase, and
> absolute references to symbols in .data need to be treated specially
> (since it may be shifted relative to the .text section containing
> ImageBase). This could be worked around by converting each absolute
> reference individually using ConvertPointer () [and I have a proof of
> concept that actually makes the problem go away on x86] but it would
> still be only a partial solution, since relative references are not
> tracked in the PE/COFF metadata, so even if we wanted to, it would be
> intractable to find each cross-section relative reference and do the
> fixup.
>
> > And since you say "unwritten" this
> > practically a requirement is not even in the spec?
> >
>
> No, it seems nobody thought of this when designing the feature.
To add colour: our problem is section relative references (either loads
or jumps). The PE/COFF linker is allowed not to emit relocations for
section relative references because it expects that the sections will
always be loaded at the same relative offset. It looks like it is
possible to force them to have relocation entries, so it would be
possible to add to the standard language requiring this for UEFI
compatible binaries, but that won't help with any of the existing
PE/COFF stuff in the field.
The problem is that to apply the various protections UEFI is
introducing, we're trying to relocate the sections and that's what's
causing the issue. Before this, no-one really thought of mapping the
sections at different relative addresses. It's really an unexpected
weakness in the PE/COFF spec that we can't fix, so we have to work
around it.
James
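
[Editor's illustration, not part of the original message.] The section-relative problem described above can be reduced to a few lines of arithmetic. The linker bakes a fixed displacement into the instruction; at run time the CPU adds that displacement to wherever the instruction now lives. This is a simplified sketch: the demo_* helpers are hypothetical, and the displacement is measured from the instruction's own address rather than from the following instruction as real RIP-relative addressing does.

```c
#include <stdint.h>

/* Displacement the linker stores in the instruction at build time. */
static int64_t demo_link_displacement(uint64_t insn_addr, uint64_t target_addr)
{
	return (int64_t)(target_addr - insn_addr);
}

/* Address the CPU computes at run time from the instruction's new location. */
static uint64_t demo_resolve(uint64_t insn_virt, int64_t disp)
{
	return insn_virt + (uint64_t)disp;
}
```

If the code and data regions are mapped with the same distance between them as in the physical layout, demo_resolve() lands on the data's new virtual address. If the data region is mapped anywhere else, the result points at the old relative position, which may not be mapped at all, and there is no relocation entry that would let a loader correct it.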
> > Can we state explicitly in the spec NOT to rely on mapping VA placement?
> > I mean, this "unwritten" requirement is seriously screwed on soo many
> > levels...
> >
>
> Several solutions and/or work arounds are currently under discussion
>
* Re: [PATCH] x86/efi: Map EFI memmap entries in-order at runtime
2015-09-16 13:28 ` Borislav Petkov
@ 2015-09-16 13:38 ` Ard Biesheuvel
2015-09-17 8:05 ` Borislav Petkov
0 siblings, 1 reply; 23+ messages in thread
From: Ard Biesheuvel @ 2015-09-16 13:38 UTC (permalink / raw)
To: Borislav Petkov
Cc: Matt Fleming, joeyli, linux-efi@vger.kernel.org,
linux-kernel@vger.kernel.org, x86@kernel.org, Matt Fleming,
Leif Lindholm, Peter Jones, James Bottomley, Matthew Garrett,
H. Peter Anvin, Dave Young, stable@vger.kernel.org
On 16 September 2015 at 15:28, Borislav Petkov <bp@suse.de> wrote:
> On Wed, Sep 16, 2015 at 01:25:06PM +0200, Ard Biesheuvel wrote:
>> ... so even if we wanted to, it would be intractable to find each
>> cross-section relative reference and do the fixup.
>
> Hmm, maybe we should go and patch EFI code segments and fixup all
> relative references after mapping. I mean, if you want something done
> right, you better do it yourself. :-\
>
That is a can of worms I'd rather keep closed, if you don't mind ...
>> No, it seems nobody thought of this when designing the feature.
>
> Not surprised at all, to be honest.
>
>> Several solutions and/or work arounds are currently under discussion.
>
> And requiring for code segments not to refer to each other with relative
> offsets and holding that down in the spec post-factum is not possible
> anymore...?
>
> [ I can already imagine what the answer to that question would be though... ]
>
Fixing the spec is easy. Modifying all the toolchains out there to add
an option that inhibits cross-section relative references is the
problem. Note that, at the object level, it is not necessarily obvious
to the compiler whether a symbol reference will end up referring to
another section than its own. So this basically means 'no relative
references at all', and most compilers don't have that option yet.
[Note that GCC's large code model, which makes no assumption about the
proximity of external symbols, may still emit PC relative literals
(e.g., '.quad sym - .') on ARM. Not sure about X64.]
But in general, since we are already violating the PE/COFF spec by
relocating each runtime image once, then invoke its entry point, then
fire an event which it should catch to manually update its pointers,
and then relocate it again into the OS VA space.
* Re: [PATCH] x86/efi: Map EFI memmap entries in-order at runtime
2015-09-16 13:37 ` James Bottomley
@ 2015-09-16 14:07 ` Ard Biesheuvel
0 siblings, 0 replies; 23+ messages in thread
From: Ard Biesheuvel @ 2015-09-16 14:07 UTC (permalink / raw)
To: James Bottomley
Cc: matt@codeblueprint.co.uk, linux-kernel@vger.kernel.org,
pjones@redhat.com, jlee@suse.com, bp@suse.de, dyoung@redhat.com,
stable@vger.kernel.org, x86@kernel.org, hpa@zytor.com,
linux-efi@vger.kernel.org, leif.lindholm@linaro.org,
matt.fleming@intel.com, LeifLindholm@linux-rxt1.site,
mjg59@srcf.ucam.org
On 16 September 2015 at 15:37, James Bottomley <jbottomley@odin.com> wrote:
> On Wed, 2015-09-16 at 13:25 +0200, Ard Biesheuvel wrote:
>> On 16 September 2015 at 12:08, Borislav Petkov <bp@suse.de> wrote:
>> > On Wed, Sep 09, 2015 at 12:21:23PM +0100, Matt Fleming wrote:
>> >> On Wed, 09 Sep, at 08:33:07AM, joeyli wrote:
>> >> >
>> >> > Yes, the machine on my hand has EFI_PROPERTIES_TABLE enabled, and it doesn't
>> >> > boot without your patch.
>> >>
>> >> Awesome. Could you test the following patch instead?
>> >>
>> >> ---
>> >>
>> >> From 24d324b781a3b688dcc265995949a9cf4e8af687 Mon Sep 17 00:00:00 2001
>> >> From: Matt Fleming <matt.fleming@intel.com>
>> >> Date: Thu, 3 Sep 2015 15:56:25 +0100
>> >> Subject: [PATCH v2] x86/efi: Map EFI memmap entries in-order at runtime
>> >>
>> >> Beginning with UEFI v2.5 EFI_PROPERTIES_TABLE was introduced that
>> >> signals that the firmware PE/COFF loader supports splitting code and
>> >> data sections of PE/COFF images into separate EFI memory map entries.
>> >> This allows the kernel to map those regions with strict memory
>> >> protections, e.g. EFI_MEMORY_RO for code, EFI_MEMORY_XP for data, etc.
>> >>
>> >> Unfortunately, an unwritten requirement of this new feature is that
>> >> the regions need to be mapped with the same offsets relative to each
>> >> other as observed in the EFI memory map. If this is not done crashes
>> >
>> > Let me get this straight: this looks like the next EFI screwup which
>> > practically requires specific mapping placement in VA space just
>> > because it uses relative addresses?
>>
>> Both relative and absolute references, currently. The latter are also
>> affected since the relocation offset that is applied to all PE/COFF
>> relocation entries is based on the displacement of ImageBase, and
>> absolute references to symbols in .data need to be treated specially
>> (since it may be shifted relative to the .text section containing
>> ImageBase). This could be worked around by converting each absolute
>> reference individually using ConvertPointer () [and I have a proof of
>> concept that actually makes the problem go away on x86] but it would
>> still be only a partial solution, since relative references are not
>> tracked in the PE/COFF metadata, so even if we wanted to, it would be
>> intractable to find each cross-section relative reference and do the
>> fixup.
>>
>> > And since you say "unwritten" this
>> > practically a requirement is not even in the spec?
>> >
>>
>> No, it seems nobody thought of this when designing the feature.
>
> To add colour: our problem is section relative references (either loads
> or jumps). The PE/COFF linker is allowed not to emit relocations for
> section relative references because it expects that the sections will
> always be loaded at the same relative offset.
The PE/COFF spec does not define any relative relocation types to be
used in the .reloc section (which is what is used for runtime
relocations). It does define such relocation types as COFF
relocations, which are more like static relocations in ELF, i.e., what
the linker uses at build time to combine object files into an
executable, but those cannot appear in an executable, only in object
files.
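
[Editor's illustration, not part of the original message.] To make the .reloc point concrete, here is a rough sketch of how a loader applies one block of base relocations. Each 16-bit entry carries a type in its top 4 bits and a page offset in its low 12 bits; type 0 is padding and type 10 is a 64-bit absolute address. Every fixup receives the same delta, and there is no entry type that describes a distance between two sections. The function name, parameters and DEMO_* macros are hypothetical, and only the two entry types named above are handled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_REL_BASED_ABSOLUTE	0	/* padding entry, skipped */
#define DEMO_REL_BASED_DIR64	10	/* 64-bit absolute address */

/*
 * Apply one block of base relocations to an image held in memory.
 * Returns the number of fixups applied, or -1 on an unsupported type.
 */
static int demo_apply_reloc_block(uint8_t *image, uint32_t page_rva,
				  const uint16_t *entries, size_t count,
				  uint64_t delta)
{
	int applied = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		unsigned int type = entries[i] >> 12;
		uint32_t offset = entries[i] & 0x0fff;
		uint64_t value;

		if (type == DEMO_REL_BASED_ABSOLUTE)
			continue;
		if (type != DEMO_REL_BASED_DIR64)
			return -1;

		/* memcpy avoids unaligned access to the stored pointer. */
		memcpy(&value, image + page_rva + offset, sizeof(value));
		value += delta;
		memcpy(image + page_rva + offset, &value, sizeof(value));
		applied++;
	}
	return applied;
}
```

Because the delta is a single value for the whole image, this scheme assumes the image moves as one unit, which is the assumption that splitting code and data into separately mapped regions breaks.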
> It looks like it is
> possible to force them to have relocation entries, so it would be
> possible to add to the standard language requiring this for UEFI
> compatible binaries, but that won't help with any of the existing
> PE/COFF stuff in the field.
>
Sadly, no. The PE/COFF spec is clear about COFF relocations appearing
only in object files, and the .reloc section only tracks absolute
references.
> The problem is that to apply the various protections UEFI is
> introducing, we're trying to relocate the sections and that's what's
> causing the issue. Before this, no-one really thought of mapping the
> sections at different relative addresses. It's really an unexpected
> weakness in the PE/COFF spec that we can't fix, so we have to work
> around it.
>
To be honest, while I am not a big fan of PE/COFF, I don't think it is
reasonable to expect that an image can be simply split up and moved
apart. I think we (UEFI forum) have dropped the ball here.
I won't go into too much detail about how I think it should be
implemented instead, let's save that for the conf call. But splitting
memory regions that belong together without /any/ annotations
whatsoever in the memory map is just sloppy design.
--
Ard.
* Re: [PATCH] x86/efi: Map EFI memmap entries in-order at runtime
2015-09-16 13:38 ` Ard Biesheuvel
@ 2015-09-17 8:05 ` Borislav Petkov
0 siblings, 0 replies; 23+ messages in thread
From: Borislav Petkov @ 2015-09-17 8:05 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Matt Fleming, joeyli, linux-efi@vger.kernel.org,
linux-kernel@vger.kernel.org, x86@kernel.org, Matt Fleming,
Leif Lindholm, Peter Jones, James Bottomley, Matthew Garrett,
H. Peter Anvin, Dave Young, stable@vger.kernel.org
On Wed, Sep 16, 2015 at 03:38:45PM +0200, Ard Biesheuvel wrote:
> That is a can of worms I'd rather keep closed, if you don't mind ...
Same here.
> But in general, since we are already violating the PE/COFF spec by
> relocating each runtime image once, then invoke its entry point, then
> fire an event which it should catch to manually update its pointers,
> and then relocate it again into the OS VA space.
Yeah, I vaguely remember at the time hpa proposing an EFI-specific
page fault handler or so. I guess we should consider such or similar
technique as it should be most flexible to deal with such screwups. And
with whatever funky fw stuff comes our way in the future...
Thanks.
--
Regards/Gruss,
Boris.
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
End of thread (newest message: 2015-09-17 8:05 UTC)
Thread overview: 23+ messages
2015-09-04 13:14 [PATCH] x86/efi: Map EFI memmap entries in-order at runtime Matt Fleming
2015-09-04 13:24 ` Ard Biesheuvel
2015-09-04 18:23 ` Matt Fleming
2015-09-04 18:53 ` Ard Biesheuvel
2015-09-06 14:06 ` Ard Biesheuvel
2015-09-08 13:16 ` Matt Fleming
2015-09-08 13:21 ` Ard Biesheuvel
2015-09-08 20:37 ` Matt Fleming
2015-09-09 7:37 ` Ard Biesheuvel
2015-09-09 9:58 ` Matt Fleming
2015-09-09 9:59 ` Ard Biesheuvel
2015-09-07 4:07 ` joeyli
2015-09-08 20:41 ` Matt Fleming
2015-09-09 0:33 ` joeyli
2015-09-09 11:21 ` Matt Fleming
2015-09-10 3:38 ` joeyli
2015-09-16 10:08 ` Borislav Petkov
2015-09-16 11:25 ` Ard Biesheuvel
2015-09-16 13:28 ` Borislav Petkov
2015-09-16 13:38 ` Ard Biesheuvel
2015-09-17 8:05 ` Borislav Petkov
2015-09-16 13:37 ` James Bottomley
2015-09-16 14:07 ` Ard Biesheuvel