[PATCH v1 0/2] arm64: runtime workaround for erratum #843419

linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v1 0/2] arm64: runtime workaround for erratum #843419
@ 2015-09-18 15:39 Ard Biesheuvel
  2015-09-18 15:39 ` [PATCH v1 1/2] arm64: introduce infrastructure for emitting veneers at module reloc time Ard Biesheuvel
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Ard Biesheuvel @ 2015-09-18 15:39 UTC (permalink / raw)
  To: linux-arm-kernel

This implements a workaround for the A53 erratum that may result in adrp
instructions to produce incorrect values if they appear in either of
the last two instruction slots of a 4 KB page.

Instead of building modules using the large code model, this approach
uses veneers to call the adrp instructions at unaffected offsets, unless
they can be converted into adr instructions, which is even better (this
depends on whether the symbol is with 1 MB of the place)

Changes since RFC/v0:
- use correct relocation entry type (RELA not REL)
- put veneers in .text or .init.text instead of dedicated sections: this way,
  modules remain compatible regardless of whether they were built with
  CONFIG_ARM64_MOD_VENEERS enabled
- allocate veneer space for all adrp instructions we encounter, since the check
  is performed before the module is at its final vmalloc address
- fix Kconfig text
- some cosmetic changes

Comments welcome.

Ard Biesheuvel (2):
  arm64: introduce infrastructure for emitting veneers at module reloc
    time
  arm64: errata: add module load workaround for erratum #843419

 arch/arm64/Kconfig                   |  24 ++++
 arch/arm64/include/asm/mod_veneers.h |  20 +++
 arch/arm64/include/asm/module.h      |  11 ++
 arch/arm64/kernel/Makefile           |   1 +
 arch/arm64/kernel/mod_veneers.c      | 138 ++++++++++++++++++++
 arch/arm64/kernel/module.c           |  34 +++++
 6 files changed, 228 insertions(+)
 create mode 100644 arch/arm64/include/asm/mod_veneers.h
 create mode 100644 arch/arm64/kernel/mod_veneers.c

-- 
1.9.1

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v1 1/2] arm64: introduce infrastructure for emitting veneers at module reloc time
  2015-09-18 15:39 [PATCH v1 0/2] arm64: runtime workaround for erratum #843419 Ard Biesheuvel
@ 2015-09-18 15:39 ` Ard Biesheuvel
  2015-09-18 15:40 ` [PATCH v1 2/2] arm64: errata: add module load workaround for erratum #843419 Ard Biesheuvel
  2015-10-13 15:13 ` [PATCH v1 0/2] arm64: runtime " Will Deacon
  2 siblings, 0 replies; 5+ messages in thread
From: Ard Biesheuvel @ 2015-09-18 15:39 UTC (permalink / raw)
  To: linux-arm-kernel

Introduce a framework for arm64 that allows errata fixups to be implemented
by replacing problematic instruction sequences with calls into veneers that
are generated on the fly.

This is based on the module PLT support for ARM.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/Kconfig              |   3 +
 arch/arm64/include/asm/module.h |  11 ++
 arch/arm64/kernel/Makefile      |   1 +
 arch/arm64/kernel/mod_veneers.c | 105 ++++++++++++++++++++
 4 files changed, 120 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7d95663c0160..b591aac06fed 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -68,6 +68,7 @@ config ARM64
 	select HAVE_GENERIC_DMA_COHERENT
 	select HAVE_HW_BREAKPOINT if PERF_EVENTS
 	select HAVE_MEMBLOCK
+	select HAVE_MOD_ARCH_SPECIFIC if ARM64_MOD_VENEERS
 	select HAVE_PATA_PLATFORM
 	select HAVE_PERF_EVENTS
 	select HAVE_PERF_REGS
@@ -333,6 +334,8 @@ config ARM64_ERRATUM_845719
 
 endmenu
 
+config ARM64_MOD_VENEERS
+	bool
 
 choice
 	prompt "Page size"
diff --git a/arch/arm64/include/asm/module.h b/arch/arm64/include/asm/module.h
index e80e232b730e..d9ce63386e33 100644
--- a/arch/arm64/include/asm/module.h
+++ b/arch/arm64/include/asm/module.h
@@ -20,4 +20,15 @@
 
 #define MODULE_ARCH_VERMAGIC	"aarch64"
 
+#ifdef CONFIG_HAVE_MOD_ARCH_SPECIFIC
+struct mod_arch_specific {
+#ifdef CONFIG_ARM64_MOD_VENEERS
+	struct veneer_section {
+		int		sec_index;
+		unsigned int	sec_offset;
+	} core, init;
+#endif
+};
+#endif
+
 #endif /* __ASM_MODULE_H */
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 22dc9bc781be..29eb0c8c33a8 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -25,6 +25,7 @@ arm64-obj-$(CONFIG_COMPAT)		+= sys32.o kuser32.o signal32.o 	\
 					   ../../arm/kernel/opcodes.o
 arm64-obj-$(CONFIG_FUNCTION_TRACER)	+= ftrace.o entry-ftrace.o
 arm64-obj-$(CONFIG_MODULES)		+= arm64ksyms.o module.o
+arm64-obj-$(CONFIG_ARM64_MOD_VENEERS)	+= mod_veneers.o
 arm64-obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o perf_callchain.o
 arm64-obj-$(CONFIG_HW_PERF_EVENTS)	+= perf_event.o
 arm64-obj-$(CONFIG_HAVE_HW_BREAKPOINT)	+= hw_breakpoint.o
diff --git a/arch/arm64/kernel/mod_veneers.c b/arch/arm64/kernel/mod_veneers.c
new file mode 100644
index 000000000000..9649c4f305bd
--- /dev/null
+++ b/arch/arm64/kernel/mod_veneers.c
@@ -0,0 +1,105 @@
+/*
+ * Copyright (C) 2015 Linaro Ltd. <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/elf.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+
+static bool in_init(const struct module *mod, u64 addr)
+{
+	return addr - (u64)mod->module_init < mod->init_size;
+}
+
+static void *alloc_veneer(struct module *mod, u64 loc, int size,
+			  Elf64_Shdr *sechdrs)
+{
+	struct veneer_section *vs;
+	void *ret;
+
+	if (in_init(mod, loc) && mod->arch.init.sec_index != -1)
+		vs = &mod->arch.init;
+	else
+		vs = &mod->arch.core;
+
+	ret = (void*)sechdrs[vs->sec_index].sh_addr + vs->sec_offset;
+	vs->sec_offset += size;
+
+	return ret;
+}
+
+/* estimate the maximum size of the veneer for this relocation */
+static unsigned long get_veneers_size(Elf64_Addr base, const Elf64_Rela *rel,
+				      int num)
+{
+	unsigned long ret = 0;
+	int i;
+
+	for (i = 0; i < num; i++)
+		switch (ELF64_R_TYPE(rel[i].r_info)) {
+		}
+	return ret;
+}
+
+int module_frob_arch_sections(Elf64_Ehdr *ehdr, Elf64_Shdr *sechdrs,
+			      char *secstrings, struct module *mod)
+{
+	unsigned long core_veneers_maxsize = 0, init_veneers_maxsize = 0;
+	int core_sec_idx = -1, init_sec_idx = -1;
+	int i;
+
+	/* find the .text and .init.text sections */
+	for (i = 0; i < ehdr->e_shnum; i++) {
+		const char *sec_name = secstrings + sechdrs[i].sh_name;
+
+		if (strcmp(sec_name, ".text") == 0)
+			core_sec_idx = i;
+		else if (strcmp(sec_name, ".init.text") == 0)
+			init_sec_idx = i;
+
+		if (core_sec_idx != -1 && init_sec_idx != -1)
+			break;
+	}
+
+	for (i = 0; i < ehdr->e_shnum; i++) {
+		Elf64_Shdr *s = &sechdrs[i];
+		Elf64_Rela *rels = (void *)ehdr + s->sh_offset;
+		int numrels = s->sh_size / sizeof(Elf64_Rela);
+		Elf64_Shdr *dstsec = sechdrs + s->sh_info;
+
+		if (s->sh_type != SHT_RELA)
+			continue;
+
+		if (strstr(secstrings + s->sh_name, ".init"))
+			init_veneers_maxsize += get_veneers_size(
+						dstsec->sh_addr, rels, numrels);
+		else
+			core_veneers_maxsize += get_veneers_size(
+						dstsec->sh_addr, rels, numrels);
+	}
+
+	if (init_sec_idx == -1 && init_veneers_maxsize > 0)
+		core_veneers_maxsize += init_veneers_maxsize;
+
+	if (core_sec_idx == -1 && core_veneers_maxsize > 0) {
+		pr_err("%s: .text section missing\n", mod->name);
+		return -ENOEXEC;
+	}
+
+	mod->arch.core.sec_index = core_sec_idx;
+	mod->arch.core.sec_offset = sechdrs[core_sec_idx].sh_size;
+	sechdrs[core_sec_idx].sh_size += core_veneers_maxsize;
+
+	mod->arch.init.sec_index = init_sec_idx;
+	if (init_sec_idx != -1) {
+		mod->arch.init.sec_offset = sechdrs[init_sec_idx].sh_size;
+		sechdrs[init_sec_idx].sh_size += init_veneers_maxsize;
+	}
+	pr_debug("module %s: core_veneers_maxsize == %lu, init_veneers_maxsize == %lu\n",
+		 mod->name, core_veneers_maxsize, init_veneers_maxsize);
+	return 0;
+}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v1 2/2] arm64: errata: add module load workaround for erratum #843419
  2015-09-18 15:39 [PATCH v1 0/2] arm64: runtime workaround for erratum #843419 Ard Biesheuvel
  2015-09-18 15:39 ` [PATCH v1 1/2] arm64: introduce infrastructure for emitting veneers at module reloc time Ard Biesheuvel
@ 2015-09-18 15:40 ` Ard Biesheuvel
  2015-10-13 15:13 ` [PATCH v1 0/2] arm64: runtime " Will Deacon
  2 siblings, 0 replies; 5+ messages in thread
From: Ard Biesheuvel @ 2015-09-18 15:40 UTC (permalink / raw)
  To: linux-arm-kernel

In order to work around Cortex-A53 erratum #843419, this patch updates
the module loading logic to either change potentially problematic adrp
instructions into adr instructions (if the symbol turns out to be in
range), or emit a veneer that is guaranteed to be at an offset that does
not trigger the issue.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/Kconfig                   | 21 ++++++++++++
 arch/arm64/include/asm/mod_veneers.h | 20 ++++++++++++
 arch/arm64/kernel/mod_veneers.c      | 33 +++++++++++++++++++
 arch/arm64/kernel/module.c           | 34 ++++++++++++++++++++
 4 files changed, 108 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index b591aac06fed..0f9e1d1235cb 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -337,6 +337,27 @@ endmenu
 config ARM64_MOD_VENEERS
 	bool
 
+config ARM64_ERRATUM_843419
+	bool "Cortex-A53: 843419: A load or store might access an incorrect address"
+	depends on MODULES
+	select ARM64_MOD_VENEERS
+	default y
+	help
+	  This option applies special relocation logic when loading kernel
+	  modules to prevent ADRP instructions from appearing in either of the
+	  last two instruction slots in a 4 KB page. This can cause a subsequent
+	  memory access to use an incorrect address on Cortex-A53 parts up to
+	  r0p4. If such an instruction is subject to relocation, we either
+	  change the ADRP instruction into an ADR instruction (if the symbol it
+	  refers to is within 1 MB), or else we replace it by a call to a veneer
+	  which executes the same instruction at a safe offset before branching
+	  back past the original location.
+
+	  Note that the kernel itself must be linked with a version of ld which
+	  fixes potentially affected ADRP instructions at build time.
+
+	  If unsure, say Y.
+
 choice
 	prompt "Page size"
 	default ARM64_4K_PAGES
diff --git a/arch/arm64/include/asm/mod_veneers.h b/arch/arm64/include/asm/mod_veneers.h
new file mode 100644
index 000000000000..3cf36ce91549
--- /dev/null
+++ b/arch/arm64/include/asm/mod_veneers.h
@@ -0,0 +1,20 @@
+
+#include <linux/types.h>
+
+struct veneer_erratum_843419 {
+	u32	adrp;
+	u32	branch;
+};
+
+static inline bool erratum_843419_affects_adrp_insn(void *addr)
+{
+	/*
+	 * The workaround for erratum 843419 only needs to be
+	 * applied if the adrp instruction appears in either of
+	 * the last two instruction slots in the 4 KB page.
+	 */
+	return ((u64)addr % SZ_4K) >= (SZ_4K - 8);
+}
+
+void *emit_erratum_843419_veneer(struct module *mod, u32 *insn,
+				 Elf64_Shdr *sechdrs);
diff --git a/arch/arm64/kernel/mod_veneers.c b/arch/arm64/kernel/mod_veneers.c
index 9649c4f305bd..7da300e59749 100644
--- a/arch/arm64/kernel/mod_veneers.c
+++ b/arch/arm64/kernel/mod_veneers.c
@@ -10,6 +10,8 @@
 #include <linux/kernel.h>
 #include <linux/module.h>
 
+#include <asm/mod_veneers.h>
+
 static bool in_init(const struct module *mod, u64 addr)
 {
 	return addr - (u64)mod->module_init < mod->init_size;
@@ -32,6 +34,31 @@ static void *alloc_veneer(struct module *mod, u64 loc, int size,
 	return ret;
 }
 
+#ifdef CONFIG_ARM64_ERRATUM_843419
+void *emit_erratum_843419_veneer(struct module *mod, u32 *insn,
+				 Elf64_Shdr *sechdrs)
+{
+	struct veneer_erratum_843419 *veneer;
+
+	veneer = alloc_veneer(mod, (u64)insn, 2 * sizeof(*veneer), sechdrs);
+	if (erratum_843419_affects_adrp_insn(&veneer->adrp))
+		/*
+		 * We allocated a veneer that is susceptible to the same problem
+		 * as the original location. We allocated twice the space, so
+		 * just advance to the next slot.
+		 */
+		veneer++;
+
+	veneer->adrp = *insn;
+	veneer->branch = aarch64_insn_gen_branch_imm(
+				(unsigned long)&veneer->branch,
+				(unsigned long)(insn + 1),
+				AARCH64_INSN_BRANCH_NOLINK);
+
+	return veneer;
+}
+#endif
+
 /* estimate the maximum size of the veneer for this relocation */
 static unsigned long get_veneers_size(Elf64_Addr base, const Elf64_Rela *rel,
 				      int num)
@@ -41,6 +68,12 @@ static unsigned long get_veneers_size(Elf64_Addr base, const Elf64_Rela *rel,
 
 	for (i = 0; i < num; i++)
 		switch (ELF64_R_TYPE(rel[i].r_info)) {
+		case R_AARCH64_ADR_PREL_PG_HI21_NC:
+		case R_AARCH64_ADR_PREL_PG_HI21:
+#ifdef CONFIG_ARM64_ERRATUM_843419
+			ret += 2 * sizeof(struct veneer_erratum_843419);
+#endif
+			break;
 		}
 	return ret;
 }
diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index 67bf4107f6ef..5d46f63c0111 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -27,6 +27,7 @@
 #include <linux/vmalloc.h>
 #include <asm/alternative.h>
 #include <asm/insn.h>
+#include <asm/mod_veneers.h>
 #include <asm/sections.h>
 
 #define	AARCH64_INSN_IMM_MOVNZ		AARCH64_INSN_IMM_MAX
@@ -335,6 +336,39 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 		case R_AARCH64_ADR_PREL_PG_HI21_NC:
 			overflow_check = false;
 		case R_AARCH64_ADR_PREL_PG_HI21:
+#ifdef CONFIG_ARM64_ERRATUM_843419
+			/*
+			 * TODO check for presence of affected A53 cores
+			 */
+			if (erratum_843419_affects_adrp_insn(loc)) {
+				struct veneer_erratum_843419 *v;
+
+				/*
+				 * This adrp instruction appears at an offset
+				 * that may be problematic on older Cortex-A53
+				 * cores. So first, try to convert it into a
+				 * simple adr instruction.
+				 */
+				ovf = reloc_insn_imm(RELOC_OP_PREL, loc,
+						     val & ~(SZ_4K - 1), 0, 21,
+						     AARCH64_INSN_IMM_ADR);
+				if (ovf == 0) {
+					/* success! convert adrp -> adr */
+					*(u32 *)loc &= 0x7fffffff;
+					break;
+				}
+
+				/* symbol out of range for adr -> emit veneer */
+				v = emit_erratum_843419_veneer(me, loc,
+							       sechdrs);
+				*(u32 *)loc = aarch64_insn_gen_branch_imm(
+						(unsigned long)loc,
+						(unsigned long)v,
+						AARCH64_INSN_BRANCH_NOLINK);
+				loc = &v->adrp;
+				/* fall through */
+			}
+#endif
 			ovf = reloc_insn_imm(RELOC_OP_PAGE, loc, val, 12, 21,
 					     AARCH64_INSN_IMM_ADR);
 			break;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v1 0/2] arm64: runtime workaround for erratum #843419
  2015-09-18 15:39 [PATCH v1 0/2] arm64: runtime workaround for erratum #843419 Ard Biesheuvel
  2015-09-18 15:39 ` [PATCH v1 1/2] arm64: introduce infrastructure for emitting veneers at module reloc time Ard Biesheuvel
  2015-09-18 15:40 ` [PATCH v1 2/2] arm64: errata: add module load workaround for erratum #843419 Ard Biesheuvel
@ 2015-10-13 15:13 ` Will Deacon
  2015-10-13 19:22   ` Ard Biesheuvel
  2 siblings, 1 reply; 5+ messages in thread
From: Will Deacon @ 2015-10-13 15:13 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Ard,

On Fri, Sep 18, 2015 at 04:39:58PM +0100, Ard Biesheuvel wrote:
> This implements a workaround for the A53 erratum that may result in adrp
> instructions to produce incorrect values if they appear in either of
> the last two instruction slots of a 4 KB page.
> 
> Instead of building modules using the large code model, this approach
> uses veneers to call the adrp instructions at unaffected offsets, unless
> they can be converted into adr instructions, which is even better (this
> depends on whether the symbol is with 1 MB of the place)

As discussed at Connect, we should evaluate the performance penalty of
the existing workaround before making this more complicated. I tried to
do that today by building xfs as a module, then copying an xfs disk
image containing a git clone of the kernel into tmpfs, loop mounting
that and md5summing all of the files.

The performance eppears to be the same with and without the errata
workaround enabled, but I wondered if you had a better idea for a test
(short of writing a synthetic module)?

Will

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v1 0/2] arm64: runtime workaround for erratum #843419
  2015-10-13 15:13 ` [PATCH v1 0/2] arm64: runtime " Will Deacon
@ 2015-10-13 19:22   ` Ard Biesheuvel
  0 siblings, 0 replies; 5+ messages in thread
From: Ard Biesheuvel @ 2015-10-13 19:22 UTC (permalink / raw)
  To: linux-arm-kernel

On 13 October 2015 at 16:13, Will Deacon <will.deacon@arm.com> wrote:
> Hi Ard,
>
> On Fri, Sep 18, 2015 at 04:39:58PM +0100, Ard Biesheuvel wrote:
>> This implements a workaround for the A53 erratum that may result in adrp
>> instructions to produce incorrect values if they appear in either of
>> the last two instruction slots of a 4 KB page.
>>
>> Instead of building modules using the large code model, this approach
>> uses veneers to call the adrp instructions at unaffected offsets, unless
>> they can be converted into adr instructions, which is even better (this
>> depends on whether the symbol is with 1 MB of the place)
>
> As discussed at Connect, we should evaluate the performance penalty of
> the existing workaround before making this more complicated. I tried to
> do that today by building xfs as a module, then copying an xfs disk
> image containing a git clone of the kernel into tmpfs, loop mounting
> that and md5summing all of the files.
>
> The performance eppears to be the same with and without the errata
> workaround enabled, but I wondered if you had a better idea for a test
> (short of writing a synthetic module)?
>

Good question. The AArch64 large model is essentially the 64-bit
equivalent of the pre-v7 AArch32 approach of adr instructions combined
with literal pools that are interspersed with the instructions in
.text. Even though movt/movw pairs carry absolute values in their
immediate fields (and thus can be used for any value and not just
nearby addresses), the use of v7 movt/movw pairs can be considered the
equivalent of the small model with its adrp/add pairs.

This means that any justification for preferring movt/movw over
literal pools should carry over to this case, and I'd be interested if
there is any knowledge internally at ARM regarding the use cases that
get a significant speedup. Obviously, workloads that are sensitive to
L1 efficiency are likely to be affected (since the literals go via the
D-cache) but I don't have any numbers to back up the claim that the
large model is slower in the real world.

-- 
Ard.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2015-10-13 19:22 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-09-18 15:39 [PATCH v1 0/2] arm64: runtime workaround for erratum #843419 Ard Biesheuvel
2015-09-18 15:39 ` [PATCH v1 1/2] arm64: introduce infrastructure for emitting veneers at module reloc time Ard Biesheuvel
2015-09-18 15:40 ` [PATCH v1 2/2] arm64: errata: add module load workaround for erratum #843419 Ard Biesheuvel
2015-10-13 15:13 ` [PATCH v1 0/2] arm64: runtime " Will Deacon
2015-10-13 19:22   ` Ard Biesheuvel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).