* [PATCH v4 0/3] arm64: hibernate: Support DEBUG_PAGEALLOC
@ 2016-08-15 17:11 James Morse
2016-08-15 17:12 ` [PATCH v4 1/3] arm64: Create sections.h James Morse
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: James Morse @ 2016-08-15 17:11 UTC (permalink / raw)
To: linux-arm-kernel
Hi all,
These patches improve arm64's hibernate support by fixing the
DEBUG_PAGEALLOC problem reported by Will.
This also addresses the outstanding comment from Catalin [0] regarding
cleaning of the whole kernel to the PoC: we now clean just the two new
.mmuoff.{data,text} sections.
This series is based on v4.8-rc2, and can be retrieved from:
git://linux-arm.org/linux-jm.git -b hibernate/debug_pagealloc/v4
Changes since v3:
* Pad mmuoff.data section to CWG.
* Specified the .mmuoff.data section for secondary_holding_pen_release in C
* Added irqentry_text start/end to sections.h
* Updated patch 1 for kprobes idmap/hyp_idmap declarations.
Changes since v2:
* Added a mmuoff.data section for secondary_holding_pen_release etc.
Changes since v1:
* Added .mmuoff.text section and gathered functions together.
* Put sections.h in alphabetical order.
James Morse (3):
arm64: Create sections.h
arm64: vmlinux.ld: Add .mmuoff.{text,data} sections
arm64: hibernate: Support DEBUG_PAGEALLOC
arch/arm64/Kconfig | 1 -
arch/arm64/include/asm/Kbuild | 1 -
arch/arm64/include/asm/sections.h | 31 +++++++++++++++++++++++
arch/arm64/include/asm/traps.h | 6 +----
arch/arm64/include/asm/virt.h | 9 +------
arch/arm64/kernel/alternative.c | 7 +++---
arch/arm64/kernel/head.S | 26 +++++++++++++------
arch/arm64/kernel/hibernate.c | 51 +++++++++++++++++++++++++++-----------
arch/arm64/kernel/probes/kprobes.c | 5 +---
arch/arm64/kernel/sleep.S | 2 ++
arch/arm64/kernel/smp_spin_table.c | 3 ++-
arch/arm64/kernel/vmlinux.lds.S | 8 ++++++
arch/arm64/mm/pageattr.c | 40 +++++++++++++++++++++++++++++-
arch/arm64/mm/proc.S | 4 +++
14 files changed, 146 insertions(+), 48 deletions(-)
create mode 100644 arch/arm64/include/asm/sections.h
--
2.8.0.rc3
* [PATCH v4 1/3] arm64: Create sections.h
2016-08-15 17:11 [PATCH v4 0/3] arm64: hibernate: Support DEBUG_PAGEALLOC James Morse
@ 2016-08-15 17:12 ` James Morse
2016-08-15 17:12 ` [PATCH v4 2/3] arm64: vmlinux.ld: Add .mmuoff.{text,data} sections James Morse
2016-08-15 17:12 ` [PATCH v4 3/3] arm64: hibernate: Support DEBUG_PAGEALLOC James Morse
2 siblings, 0 replies; 8+ messages in thread
From: James Morse @ 2016-08-15 17:12 UTC (permalink / raw)
To: linux-arm-kernel
Each time new section markers are added, kernel/vmlinux.lds.S is updated,
and new extern char __start_foo[] definitions are scattered through the
tree.
Create asm/sections.h to collect these definitions (and include
the existing asm-generic version).
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
---
Changes since v3:
* Added irqentry_text start/end to sections.h
* Updated for kprobes idmap/hyp_idmap declarations.
arch/arm64/include/asm/Kbuild | 1 -
arch/arm64/include/asm/sections.h | 29 +++++++++++++++++++++++++++++
arch/arm64/include/asm/traps.h | 6 +-----
arch/arm64/include/asm/virt.h | 9 +--------
arch/arm64/kernel/alternative.c | 7 +++----
arch/arm64/kernel/hibernate.c | 6 ------
arch/arm64/kernel/probes/kprobes.c | 5 +----
7 files changed, 35 insertions(+), 28 deletions(-)
create mode 100644 arch/arm64/include/asm/sections.h
diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild
index f43d2c44c765..2b3d2d24acba 100644
--- a/arch/arm64/include/asm/Kbuild
+++ b/arch/arm64/include/asm/Kbuild
@@ -32,7 +32,6 @@ generic-y += poll.h
generic-y += preempt.h
generic-y += resource.h
generic-y += rwsem.h
-generic-y += sections.h
generic-y += segment.h
generic-y += sembuf.h
generic-y += serial.h
diff --git a/arch/arm64/include/asm/sections.h b/arch/arm64/include/asm/sections.h
new file mode 100644
index 000000000000..237fcdd13445
--- /dev/null
+++ b/arch/arm64/include/asm/sections.h
@@ -0,0 +1,29 @@
+/*
+ * Copyright (C) 2016 ARM Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_SECTIONS_H
+#define __ASM_SECTIONS_H
+
+#include <asm-generic/sections.h>
+
+extern char __alt_instructions[], __alt_instructions_end[];
+extern char __exception_text_start[], __exception_text_end[];
+extern char __hibernate_exit_text_start[], __hibernate_exit_text_end[];
+extern char __hyp_idmap_text_start[], __hyp_idmap_text_end[];
+extern char __hyp_text_start[], __hyp_text_end[];
+extern char __idmap_text_start[], __idmap_text_end[];
+extern char __irqentry_text_start[], __irqentry_text_end[];
+
+#endif /* __ASM_SECTIONS_H */
diff --git a/arch/arm64/include/asm/traps.h b/arch/arm64/include/asm/traps.h
index 9cd03f3e812f..02e9035b0685 100644
--- a/arch/arm64/include/asm/traps.h
+++ b/arch/arm64/include/asm/traps.h
@@ -19,6 +19,7 @@
#define __ASM_TRAP_H
#include <linux/list.h>
+#include <asm/sections.h>
struct pt_regs;
@@ -39,9 +40,6 @@ void arm64_notify_segfault(struct pt_regs *regs, unsigned long addr);
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
static inline int __in_irqentry_text(unsigned long ptr)
{
- extern char __irqentry_text_start[];
- extern char __irqentry_text_end[];
-
return ptr >= (unsigned long)&__irqentry_text_start &&
ptr < (unsigned long)&__irqentry_text_end;
}
@@ -54,8 +52,6 @@ static inline int __in_irqentry_text(unsigned long ptr)
static inline int in_exception_text(unsigned long ptr)
{
- extern char __exception_text_start[];
- extern char __exception_text_end[];
int in;
in = ptr >= (unsigned long)&__exception_text_start &&
diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
index 1788545f25bc..db5739413677 100644
--- a/arch/arm64/include/asm/virt.h
+++ b/arch/arm64/include/asm/virt.h
@@ -45,6 +45,7 @@
#ifndef __ASSEMBLY__
#include <asm/ptrace.h>
+#include <asm/sections.h>
/*
* __boot_cpu_mode records what mode CPUs were booted in.
@@ -87,14 +88,6 @@ extern void verify_cpu_run_el(void);
static inline void verify_cpu_run_el(void) {}
#endif
-/* The section containing the hypervisor idmap text */
-extern char __hyp_idmap_text_start[];
-extern char __hyp_idmap_text_end[];
-
-/* The section containing the hypervisor text */
-extern char __hyp_text_start[];
-extern char __hyp_text_end[];
-
#endif /* __ASSEMBLY__ */
#endif /* ! __ASM__VIRT_H */
diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index d2ee1b21a10d..4434dabde898 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -25,14 +25,13 @@
#include <asm/alternative.h>
#include <asm/cpufeature.h>
#include <asm/insn.h>
+#include <asm/sections.h>
#include <linux/stop_machine.h>
#define __ALT_PTR(a,f) (u32 *)((void *)&(a)->f + (a)->f)
#define ALT_ORIG_PTR(a) __ALT_PTR(a, orig_offset)
#define ALT_REPL_PTR(a) __ALT_PTR(a, alt_offset)
-extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
-
struct alt_region {
struct alt_instr *begin;
struct alt_instr *end;
@@ -124,8 +123,8 @@ static int __apply_alternatives_multi_stop(void *unused)
{
static int patched = 0;
struct alt_region region = {
- .begin = __alt_instructions,
- .end = __alt_instructions_end,
+ .begin = (struct alt_instr *)__alt_instructions,
+ .end = (struct alt_instr *)__alt_instructions_end,
};
/* We always have a CPU 0 at this point (__init) */
diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
index 65d81f965e74..b4082017c4cb 100644
--- a/arch/arm64/kernel/hibernate.c
+++ b/arch/arm64/kernel/hibernate.c
@@ -54,12 +54,6 @@ extern int in_suspend;
/* Do we need to reset el2? */
#define el2_reset_needed() (is_hyp_mode_available() && !is_kernel_in_hyp_mode())
-/*
- * Start/end of the hibernate exit code, this must be copied to a 'safe'
- * location in memory, and executed from there.
- */
-extern char __hibernate_exit_text_start[], __hibernate_exit_text_end[];
-
/* temporary el2 vectors in the __hibernate_exit_text section. */
extern char hibernate_el2_vectors[];
diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
index c6b0f40620d8..2dc1b42afaaa 100644
--- a/arch/arm64/kernel/probes/kprobes.c
+++ b/arch/arm64/kernel/probes/kprobes.c
@@ -31,7 +31,7 @@
#include <asm/insn.h>
#include <asm/uaccess.h>
#include <asm/irq.h>
-#include <asm-generic/sections.h>
+#include <asm/sections.h>
#include "decode-insn.h"
@@ -543,9 +543,6 @@ int __kprobes longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
bool arch_within_kprobe_blacklist(unsigned long addr)
{
- extern char __idmap_text_start[], __idmap_text_end[];
- extern char __hyp_idmap_text_start[], __hyp_idmap_text_end[];
-
if ((addr >= (unsigned long)__kprobes_text_start &&
addr < (unsigned long)__kprobes_text_end) ||
(addr >= (unsigned long)__entry_text_start &&
--
2.8.0.rc3
* [PATCH v4 2/3] arm64: vmlinux.ld: Add .mmuoff.{text,data} sections
2016-08-15 17:11 [PATCH v4 0/3] arm64: hibernate: Support DEBUG_PAGEALLOC James Morse
2016-08-15 17:12 ` [PATCH v4 1/3] arm64: Create sections.h James Morse
@ 2016-08-15 17:12 ` James Morse
2016-08-17 17:50 ` [PATCH v4 2/3] arm64: vmlinux.ld: Add .mmuoff.{text, data} sections Ard Biesheuvel
2016-08-15 17:12 ` [PATCH v4 3/3] arm64: hibernate: Support DEBUG_PAGEALLOC James Morse
2 siblings, 1 reply; 8+ messages in thread
From: James Morse @ 2016-08-15 17:12 UTC (permalink / raw)
To: linux-arm-kernel
Resume from hibernate needs to clean any text executed by the kernel with
the MMU off to the PoC. Collect these functions together into a new
.mmuoff.text section. __boot_cpu_mode and secondary_holding_pen_release
are data that is read or written with the MMU off. Add these to a new
.mmuoff.data section.
This covers booting of secondary cores and the cpu_suspend() path used
by cpu-idle and suspend-to-ram.
The bulk of head.S is not included, as the primary boot code is only ever
executed once, so the kernel never needs to ensure it is cleaned to a
particular point in the cache.
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v3:
* Pad mmuoff.data section to CWG.
* Specified the .mmuoff.data section for secondary_holding_pen_release in C
arch/arm64/include/asm/sections.h | 2 ++
arch/arm64/kernel/head.S | 26 ++++++++++++++++++--------
arch/arm64/kernel/sleep.S | 2 ++
arch/arm64/kernel/smp_spin_table.c | 3 ++-
arch/arm64/kernel/vmlinux.lds.S | 8 ++++++++
arch/arm64/mm/proc.S | 4 ++++
6 files changed, 36 insertions(+), 9 deletions(-)
diff --git a/arch/arm64/include/asm/sections.h b/arch/arm64/include/asm/sections.h
index 237fcdd13445..fb824a71fbb2 100644
--- a/arch/arm64/include/asm/sections.h
+++ b/arch/arm64/include/asm/sections.h
@@ -25,5 +25,7 @@ extern char __hyp_idmap_text_start[], __hyp_idmap_text_end[];
extern char __hyp_text_start[], __hyp_text_end[];
extern char __idmap_text_start[], __idmap_text_end[];
extern char __irqentry_text_start[], __irqentry_text_end[];
+extern char __mmuoff_data_start[], __mmuoff_data_end[];
+extern char __mmuoff_text_start[], __mmuoff_text_end[];
#endif /* __ASM_SECTIONS_H */
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index b77f58355da1..4230eeeeabf5 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -477,6 +477,7 @@ ENTRY(kimage_vaddr)
* Returns either BOOT_CPU_MODE_EL1 or BOOT_CPU_MODE_EL2 in x20 if
* booted in EL1 or EL2 respectively.
*/
+ .pushsection ".mmuoff.text", "ax"
ENTRY(el2_setup)
mrs x0, CurrentEL
cmp x0, #CurrentEL_EL2
@@ -621,17 +622,31 @@ set_cpu_boot_mode_flag:
ENDPROC(set_cpu_boot_mode_flag)
/*
+ * Values in this section are written with the MMU off, but read with the
+ * MMU on. Writers will invalidate the corresponding address, discarding
+ * a 'Cache Writeback Granule' (CWG) worth of data. Align these variables
+ * to the architectural maximum of 2K.
+ */
+ .pushsection ".mmuoff.data", "aw"
+ .align 11
+/*
* We need to find out the CPU boot mode long after boot, so we need to
* store it in a writable variable.
*
* This is not in .bss, because we set it sufficiently early that the boot-time
* zeroing of .bss would clobber it.
*/
- .pushsection .data..cacheline_aligned
- .align L1_CACHE_SHIFT
ENTRY(__boot_cpu_mode)
.long BOOT_CPU_MODE_EL2
.long BOOT_CPU_MODE_EL1
+/*
+ * The booting CPU updates the failed status @__early_cpu_boot_status,
+ * with MMU turned off.
+ */
+ENTRY(__early_cpu_boot_status)
+ .long 0
+
+ .align 11
.popsection
/*
@@ -687,6 +702,7 @@ __secondary_switched:
mov x29, #0
b secondary_start_kernel
ENDPROC(__secondary_switched)
+ .popsection
/*
* The booting CPU updates the failed status @__early_cpu_boot_status,
@@ -706,12 +722,6 @@ ENDPROC(__secondary_switched)
dc ivac, \tmp1 // Invalidate potentially stale cache line
.endm
- .pushsection .data..cacheline_aligned
- .align L1_CACHE_SHIFT
-ENTRY(__early_cpu_boot_status)
- .long 0
- .popsection
-
/*
* Enable the MMU.
*
diff --git a/arch/arm64/kernel/sleep.S b/arch/arm64/kernel/sleep.S
index 9a3aec97ac09..e66ce9b7bbde 100644
--- a/arch/arm64/kernel/sleep.S
+++ b/arch/arm64/kernel/sleep.S
@@ -97,6 +97,7 @@ ENTRY(__cpu_suspend_enter)
ENDPROC(__cpu_suspend_enter)
.ltorg
+ .pushsection ".mmuoff.text", "ax"
ENTRY(cpu_resume)
bl el2_setup // if in EL2 drop to EL1 cleanly
/* enable the MMU early - so we can access sleep_save_stash by va */
@@ -106,6 +107,7 @@ ENTRY(cpu_resume)
adrp x26, swapper_pg_dir
b __cpu_setup
ENDPROC(cpu_resume)
+ .popsection
ENTRY(_cpu_resume)
mrs x1, mpidr_el1
diff --git a/arch/arm64/kernel/smp_spin_table.c b/arch/arm64/kernel/smp_spin_table.c
index 18a71bcd26ee..9db2471e1eed 100644
--- a/arch/arm64/kernel/smp_spin_table.c
+++ b/arch/arm64/kernel/smp_spin_table.c
@@ -29,7 +29,8 @@
#include <asm/smp_plat.h>
extern void secondary_holding_pen(void);
-volatile unsigned long secondary_holding_pen_release = INVALID_HWID;
+volatile unsigned long __section(".mmuoff.data")
+secondary_holding_pen_release = INVALID_HWID;
static phys_addr_t cpu_release_addr[NR_CPUS];
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 659963d40bb4..bbab3d886516 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -120,6 +120,9 @@ SECTIONS
IRQENTRY_TEXT
SOFTIRQENTRY_TEXT
ENTRY_TEXT
+ __mmuoff_text_start = .;
+ *(.mmuoff.text)
+ __mmuoff_text_end = .;
TEXT_TEXT
SCHED_TEXT
LOCK_TEXT
@@ -186,6 +189,11 @@ SECTIONS
_sdata = .;
RW_DATA_SECTION(L1_CACHE_BYTES, PAGE_SIZE, THREAD_SIZE)
PECOFF_EDATA_PADDING
+ .mmuoff.data : {
+ __mmuoff_data_start = .;
+ *(.mmuoff.data)
+ __mmuoff_data_end = .;
+ }
_edata = .;
BSS_SECTION(0, 0, 0)
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 5bb61de23201..a709e95d68ff 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -83,6 +83,7 @@ ENDPROC(cpu_do_suspend)
*
* x0: Address of context pointer
*/
+ .pushsection ".mmuoff.text", "ax"
ENTRY(cpu_do_resume)
ldp x2, x3, [x0]
ldp x4, x5, [x0, #16]
@@ -111,6 +112,7 @@ ENTRY(cpu_do_resume)
isb
ret
ENDPROC(cpu_do_resume)
+ .popsection
#endif
/*
@@ -172,6 +174,7 @@ ENDPROC(idmap_cpu_replace_ttbr1)
* Initialise the processor for turning the MMU on. Return in x0 the
* value of the SCTLR_EL1 register.
*/
+ .pushsection ".mmuoff.text", "ax"
ENTRY(__cpu_setup)
tlbi vmalle1 // Invalidate local TLB
dsb nsh
@@ -257,3 +260,4 @@ ENDPROC(__cpu_setup)
crval:
.word 0xfcffffff // clear
.word 0x34d5d91d // set
+ .popsection
--
2.8.0.rc3
* [PATCH v4 3/3] arm64: hibernate: Support DEBUG_PAGEALLOC
2016-08-15 17:11 [PATCH v4 0/3] arm64: hibernate: Support DEBUG_PAGEALLOC James Morse
2016-08-15 17:12 ` [PATCH v4 1/3] arm64: Create sections.h James Morse
2016-08-15 17:12 ` [PATCH v4 2/3] arm64: vmlinux.ld: Add .mmuoff.{text,data} sections James Morse
@ 2016-08-15 17:12 ` James Morse
2 siblings, 0 replies; 8+ messages in thread
From: James Morse @ 2016-08-15 17:12 UTC (permalink / raw)
To: linux-arm-kernel
DEBUG_PAGEALLOC removes the valid bit of page table entries to prevent
any access to unallocated memory. Hibernate uses this as a hint that those
pages don't need to be saved/restored. This patch adds the
kernel_page_present() function it uses.
hibernate.c copies the resume kernel's linear map for use during restore.
Add _copy_pte() to fill in the holes made by DEBUG_PAGEALLOC in the resume
kernel, so we can restore data the original kernel had at these addresses.
Finally, DEBUG_PAGEALLOC means the linear-map alias of KERNEL_START to
KERNEL_END may have holes in it, so we can't lazily clean this whole
area to the PoC. Only clean the new mmuoff regions, and the kernel/kvm
idmaps.
This reverts commit da24eb1f3f9e2c7b75c5f8c40d8e48e2c4789596.
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
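[Editor's note] For context: the hibernate core consumes kernel_page_present()
when copying pages; a page that DEBUG_PAGEALLOC has unmapped is temporarily
mapped back before being copied. Roughly (paraphrased from
kernel/power/snapshot.c, so details of the surrounding code may differ):

static void safe_copy_page(void *dst, struct page *s_page)
{
	if (kernel_page_present(s_page)) {
		do_copy_page(dst, page_address(s_page));
	} else {
		/* DEBUG_PAGEALLOC removed the mapping: map, copy, unmap */
		kernel_map_pages(s_page, 1, 1);
		do_copy_page(dst, page_address(s_page));
		kernel_map_pages(s_page, 1, 0);
	}
}
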
arch/arm64/Kconfig | 1 -
arch/arm64/kernel/hibernate.c | 45 ++++++++++++++++++++++++++++++++++---------
arch/arm64/mm/pageattr.c | 40 +++++++++++++++++++++++++++++++++++++-
3 files changed, 75 insertions(+), 11 deletions(-)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index bc3f00f586f1..9be0c164df4e 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -607,7 +607,6 @@ source kernel/Kconfig.preempt
source kernel/Kconfig.hz
config ARCH_SUPPORTS_DEBUG_PAGEALLOC
- depends on !HIBERNATION
def_bool y
config ARCH_HAS_HOLES_MEMORYMODEL
diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
index b4082017c4cb..da4470de1807 100644
--- a/arch/arm64/kernel/hibernate.c
+++ b/arch/arm64/kernel/hibernate.c
@@ -235,6 +235,7 @@ out:
return rc;
}
+#define dcache_clean_range(start, end) __flush_dcache_area(start, (end - start))
int swsusp_arch_suspend(void)
{
@@ -252,8 +253,14 @@ int swsusp_arch_suspend(void)
if (__cpu_suspend_enter(&state)) {
ret = swsusp_save();
} else {
- /* Clean kernel to PoC for secondary core startup */
- __flush_dcache_area(LMADDR(KERNEL_START), KERNEL_END - KERNEL_START);
+ /* Clean kernel core startup/idle code to PoC*/
+ dcache_clean_range(__mmuoff_text_start, __mmuoff_text_end);
+ dcache_clean_range(__mmuoff_data_start, __mmuoff_data_end);
+ dcache_clean_range(__idmap_text_start, __idmap_text_end);
+
+ /* Clean kvm setup code to PoC? */
+ if (el2_reset_needed())
+ dcache_clean_range(__hyp_idmap_text_start, __hyp_idmap_text_end);
/*
* Tell the hibernation core that we've just restored
@@ -269,6 +276,32 @@ int swsusp_arch_suspend(void)
return ret;
}
+static void _copy_pte(pte_t *dst_pte, pte_t *src_pte, unsigned long addr)
+{
+ unsigned long pfn = virt_to_pfn(addr);
+
+ if (pte_valid(*src_pte)) {
+ /*
+ * Resume will overwrite areas that may be marked
+ * read only (code, rodata). Clear the RDONLY bit from
+ * the temporary mappings we use during restore.
+ */
+ set_pte(dst_pte, __pte(pte_val(*src_pte) & ~PTE_RDONLY));
+ } else if (debug_pagealloc_enabled()) {
+ /*
+ * debug_pagealloc may have removed the PTE_VALID bit if
+ * the page isn't in use by the resume kernel. It may have
+ * been in use by the original kernel, in which case we need
+ * to put it back in our copy to do the restore.
+ *
+ * Check for mappable memory that gives us a translation
+ * like part of the linear map.
+ */
+ if (pfn_valid(pfn) && pte_pfn(*src_pte) == pfn)
+ set_pte(dst_pte, __pte((pte_val(*src_pte) & ~PTE_RDONLY) | PTE_VALID));
+ }
+}
+
static int copy_pte(pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long start,
unsigned long end)
{
@@ -284,13 +317,7 @@ static int copy_pte(pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long start,
src_pte = pte_offset_kernel(src_pmd, start);
do {
- if (!pte_none(*src_pte))
- /*
- * Resume will overwrite areas that may be marked
- * read only (code, rodata). Clear the RDONLY bit from
- * the temporary mappings we use during restore.
- */
- set_pte(dst_pte, __pte(pte_val(*src_pte) & ~PTE_RDONLY));
+ _copy_pte(dst_pte, src_pte, addr);
} while (dst_pte++, src_pte++, addr += PAGE_SIZE, addr != end);
return 0;
diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
index ca6d268e3313..b6c0da84258c 100644
--- a/arch/arm64/mm/pageattr.c
+++ b/arch/arm64/mm/pageattr.c
@@ -139,4 +139,42 @@ void __kernel_map_pages(struct page *page, int numpages, int enable)
__pgprot(0),
__pgprot(PTE_VALID));
}
-#endif
+#ifdef CONFIG_HIBERNATION
+/*
+ * When built with CONFIG_DEBUG_PAGEALLOC and CONFIG_HIBERNATION, this function
+ * is used to determine if a linear map page has been marked as not-present by
+ * CONFIG_DEBUG_PAGEALLOC. Walk the page table and check the PTE_VALID bit.
+ * This is based on kern_addr_valid(), which almost does what we need.
+ */
+bool kernel_page_present(struct page *page)
+{
+ pgd_t *pgd;
+ pud_t *pud;
+ pmd_t *pmd;
+ pte_t *pte;
+ unsigned long addr = (unsigned long)page_address(page);
+
+ pgd = pgd_offset_k(addr);
+ if (pgd_none(*pgd))
+ return false;
+
+ pud = pud_offset(pgd, addr);
+ if (pud_none(*pud))
+ return false;
+ if (pud_sect(*pud))
+ return true;
+
+ pmd = pmd_offset(pud, addr);
+ if (pmd_none(*pmd))
+ return false;
+ if (pmd_sect(*pmd))
+ return true;
+
+ pte = pte_offset_kernel(pmd, addr);
+ if (pte_none(*pte))
+ return false;
+
+ return pte_valid(*pte);
+}
+#endif /* CONFIG_HIBERNATION */
+#endif /* CONFIG_DEBUG_PAGEALLOC */
--
2.8.0.rc3
* [PATCH v4 2/3] arm64: vmlinux.ld: Add .mmuoff.{text, data} sections
2016-08-15 17:12 ` [PATCH v4 2/3] arm64: vmlinux.ld: Add .mmuoff.{text,data} sections James Morse
@ 2016-08-17 17:50 ` Ard Biesheuvel
2016-08-18 11:39 ` James Morse
0 siblings, 1 reply; 8+ messages in thread
From: Ard Biesheuvel @ 2016-08-17 17:50 UTC (permalink / raw)
To: linux-arm-kernel
Hi James,
On 15 August 2016 at 19:12, James Morse <james.morse@arm.com> wrote:
> Resume from hibernate needs to clean any text executed by the kernel with
> the MMU off to the PoC. Collect these functions together into a new
> .mmuoff.text section. __boot_cpu_mode and secondary_holding_pen_release
> are data that is read or written with the MMU off. Add these to a new
> .mmuoff.data section.
>
> This covers booting of secondary cores and the cpu_suspend() path used
> by cpu-idle and suspend-to-ram.
>
> The bulk of head.S is not included, as the primary boot code is only ever
> executed once, the kernel never needs to ensure it is cleaned to a
> particular point in the cache.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v3:
> * Pad mmuoff.data section to CWG.
> * Specified the .mmuoff.data section for secondary_holding_pen_release in C
>
> arch/arm64/include/asm/sections.h | 2 ++
> arch/arm64/kernel/head.S | 26 ++++++++++++++++++--------
> arch/arm64/kernel/sleep.S | 2 ++
> arch/arm64/kernel/smp_spin_table.c | 3 ++-
> arch/arm64/kernel/vmlinux.lds.S | 8 ++++++++
> arch/arm64/mm/proc.S | 4 ++++
> 6 files changed, 36 insertions(+), 9 deletions(-)
>
> diff --git a/arch/arm64/include/asm/sections.h b/arch/arm64/include/asm/sections.h
> index 237fcdd13445..fb824a71fbb2 100644
> --- a/arch/arm64/include/asm/sections.h
> +++ b/arch/arm64/include/asm/sections.h
> @@ -25,5 +25,7 @@ extern char __hyp_idmap_text_start[], __hyp_idmap_text_end[];
> extern char __hyp_text_start[], __hyp_text_end[];
> extern char __idmap_text_start[], __idmap_text_end[];
> extern char __irqentry_text_start[], __irqentry_text_end[];
> +extern char __mmuoff_data_start[], __mmuoff_data_end[];
> +extern char __mmuoff_text_start[], __mmuoff_text_end[];
>
> #endif /* __ASM_SECTIONS_H */
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index b77f58355da1..4230eeeeabf5 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -477,6 +477,7 @@ ENTRY(kimage_vaddr)
> * Returns either BOOT_CPU_MODE_EL1 or BOOT_CPU_MODE_EL2 in x20 if
> * booted in EL1 or EL2 respectively.
> */
> + .pushsection ".mmuoff.text", "ax"
> ENTRY(el2_setup)
> mrs x0, CurrentEL
> cmp x0, #CurrentEL_EL2
> @@ -621,17 +622,31 @@ set_cpu_boot_mode_flag:
> ENDPROC(set_cpu_boot_mode_flag)
>
> /*
> + * Values in this section are written with the MMU off, but read with the
> + * MMU on. Writers will invalidate the corresponding address, discarding
> + * a 'Cache Writeback Granule' (CWG) worth of data. Align these variables
> + * to the architectural maximum of 2K.
> + */
> + .pushsection ".mmuoff.data", "aw"
> + .align 11
> +/*
> * We need to find out the CPU boot mode long after boot, so we need to
> * store it in a writable variable.
> *
> * This is not in .bss, because we set it sufficiently early that the boot-time
> * zeroing of .bss would clobber it.
> */
> - .pushsection .data..cacheline_aligned
> - .align L1_CACHE_SHIFT
> ENTRY(__boot_cpu_mode)
> .long BOOT_CPU_MODE_EL2
> .long BOOT_CPU_MODE_EL1
> +/*
> + * The booting CPU updates the failed status @__early_cpu_boot_status,
> + * with MMU turned off.
> + */
> +ENTRY(__early_cpu_boot_status)
> + .long 0
> +
> + .align 11
How is this supposed to work? Is secondary_holding_pen_release
expected to be covered by this region as well?
Wouldn't it be better to handle this alignment in the linker script?
(if you even need it, but see below)
> .popsection
>
> /*
> @@ -687,6 +702,7 @@ __secondary_switched:
> mov x29, #0
> b secondary_start_kernel
> ENDPROC(__secondary_switched)
> + .popsection
>
> /*
> * The booting CPU updates the failed status @__early_cpu_boot_status,
> @@ -706,12 +722,6 @@ ENDPROC(__secondary_switched)
> dc ivac, \tmp1 // Invalidate potentially stale cache line
> .endm
>
> - .pushsection .data..cacheline_aligned
> - .align L1_CACHE_SHIFT
> -ENTRY(__early_cpu_boot_status)
> - .long 0
> - .popsection
> -
> /*
> * Enable the MMU.
> *
> diff --git a/arch/arm64/kernel/sleep.S b/arch/arm64/kernel/sleep.S
> index 9a3aec97ac09..e66ce9b7bbde 100644
> --- a/arch/arm64/kernel/sleep.S
> +++ b/arch/arm64/kernel/sleep.S
> @@ -97,6 +97,7 @@ ENTRY(__cpu_suspend_enter)
> ENDPROC(__cpu_suspend_enter)
> .ltorg
>
> + .pushsection ".mmuoff.text", "ax"
> ENTRY(cpu_resume)
> bl el2_setup // if in EL2 drop to EL1 cleanly
> /* enable the MMU early - so we can access sleep_save_stash by va */
> @@ -106,6 +107,7 @@ ENTRY(cpu_resume)
> adrp x26, swapper_pg_dir
> b __cpu_setup
> ENDPROC(cpu_resume)
> + .popsection
>
> ENTRY(_cpu_resume)
> mrs x1, mpidr_el1
> diff --git a/arch/arm64/kernel/smp_spin_table.c b/arch/arm64/kernel/smp_spin_table.c
> index 18a71bcd26ee..9db2471e1eed 100644
> --- a/arch/arm64/kernel/smp_spin_table.c
> +++ b/arch/arm64/kernel/smp_spin_table.c
> @@ -29,7 +29,8 @@
> #include <asm/smp_plat.h>
>
> extern void secondary_holding_pen(void);
> -volatile unsigned long secondary_holding_pen_release = INVALID_HWID;
> +volatile unsigned long __section(".mmuoff.data")
> +secondary_holding_pen_release = INVALID_HWID;
>
> static phys_addr_t cpu_release_addr[NR_CPUS];
>
> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> index 659963d40bb4..bbab3d886516 100644
> --- a/arch/arm64/kernel/vmlinux.lds.S
> +++ b/arch/arm64/kernel/vmlinux.lds.S
> @@ -120,6 +120,9 @@ SECTIONS
> IRQENTRY_TEXT
> SOFTIRQENTRY_TEXT
> ENTRY_TEXT
> + __mmuoff_text_start = .;
> + *(.mmuoff.text)
> + __mmuoff_text_end = .;
> TEXT_TEXT
> SCHED_TEXT
> LOCK_TEXT
> @@ -186,6 +189,11 @@ SECTIONS
> _sdata = .;
> RW_DATA_SECTION(L1_CACHE_BYTES, PAGE_SIZE, THREAD_SIZE)
> PECOFF_EDATA_PADDING
This padding needs to be at the end; it is intended to make the size
of Image a multiple of 512. Alternatively, you could get rid of it
completely, I guess, if the end of .mmuoff.text is expected to be 2 KB
aligned (but I wonder if you need to)
> + .mmuoff.data : {
.mmuoff.data : ALIGN (SZ_2K) {
> + __mmuoff_data_start = .;
> + *(.mmuoff.data)
. = ALIGN(SZ_2K);
However, if the invalidation occurs before .bss is cleared (with the
caches on), perhaps there is no need to align the end of this section?
(and there is also no need to round up the part of it that lives in
head.S?)
> + __mmuoff_data_end = .;
> + }
> _edata = .;
>
> BSS_SECTION(0, 0, 0)
> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> index 5bb61de23201..a709e95d68ff 100644
> --- a/arch/arm64/mm/proc.S
> +++ b/arch/arm64/mm/proc.S
> @@ -83,6 +83,7 @@ ENDPROC(cpu_do_suspend)
> *
> * x0: Address of context pointer
> */
> + .pushsection ".mmuoff.text", "ax"
> ENTRY(cpu_do_resume)
> ldp x2, x3, [x0]
> ldp x4, x5, [x0, #16]
> @@ -111,6 +112,7 @@ ENTRY(cpu_do_resume)
> isb
> ret
> ENDPROC(cpu_do_resume)
> + .popsection
> #endif
>
> /*
> @@ -172,6 +174,7 @@ ENDPROC(idmap_cpu_replace_ttbr1)
> * Initialise the processor for turning the MMU on. Return in x0 the
> * value of the SCTLR_EL1 register.
> */
> + .pushsection ".mmuoff.text", "ax"
> ENTRY(__cpu_setup)
> tlbi vmalle1 // Invalidate local TLB
> dsb nsh
> @@ -257,3 +260,4 @@ ENDPROC(__cpu_setup)
> crval:
> .word 0xfcffffff // clear
> .word 0x34d5d91d // set
> + .popsection
> --
> 2.8.0.rc3
>
>
* [PATCH v4 2/3] arm64: vmlinux.ld: Add .mmuoff.{text, data} sections
2016-08-17 17:50 ` [PATCH v4 2/3] arm64: vmlinux.ld: Add .mmuoff.{text, data} sections Ard Biesheuvel
@ 2016-08-18 11:39 ` James Morse
2016-08-18 11:55 ` Ard Biesheuvel
0 siblings, 1 reply; 8+ messages in thread
From: James Morse @ 2016-08-18 11:39 UTC (permalink / raw)
To: linux-arm-kernel
Hi Ard,
On 17/08/16 18:50, Ard Biesheuvel wrote:
> On 15 August 2016 at 19:12, James Morse <james.morse@arm.com> wrote:
>> Resume from hibernate needs to clean any text executed by the kernel with
>> the MMU off to the PoC. Collect these functions together into a new
>> .mmuoff.text section. __boot_cpu_mode and secondary_holding_pen_release
>> are data that is read or written with the MMU off. Add these to a new
>> .mmuoff.data section.
>>
>> This covers booting of secondary cores and the cpu_suspend() path used
>> by cpu-idle and suspend-to-ram.
>>
>> The bulk of head.S is not included, as the primary boot code is only ever
>> executed once, the kernel never needs to ensure it is cleaned to a
>> particular point in the cache.
>> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
>> index b77f58355da1..4230eeeeabf5 100644
>> --- a/arch/arm64/kernel/head.S
>> +++ b/arch/arm64/kernel/head.S
>> @@ -621,17 +622,31 @@ set_cpu_boot_mode_flag:
>> ENDPROC(set_cpu_boot_mode_flag)
>>
>> /*
>> + * Values in this section are written with the MMU off, but read with the
>> + * MMU on. Writers will invalidate the corresponding address, discarding
>> + * a 'Cache Writeback Granule' (CWG) worth of data. Align these variables
>> + * to the architectural maximum of 2K.
>> + */
>> + .pushsection ".mmuoff.data", "aw"
>> + .align 11
>> +/*
>> * We need to find out the CPU boot mode long after boot, so we need to
>> * store it in a writable variable.
>> *
>> * This is not in .bss, because we set it sufficiently early that the boot-time
>> * zeroing of .bss would clobber it.
>> */
>> - .pushsection .data..cacheline_aligned
>> - .align L1_CACHE_SHIFT
>> ENTRY(__boot_cpu_mode)
>> .long BOOT_CPU_MODE_EL2
>> .long BOOT_CPU_MODE_EL1
>> +/*
>> + * The booting CPU updates the failed status @__early_cpu_boot_status,
>> + * with MMU turned off.
>> + */
>> +ENTRY(__early_cpu_boot_status)
>> + .long 0
>> +
>> + .align 11
>
> How is this supposed to work? Is secondary_holding_pen_release
> expected to be covered by this region as well?
In this section, but not in the same CWG:
__boot_cpu_mode and __early_cpu_boot_status are written with the mmu off, then
the corresponding cache area is invalidated.
secondary_holding_pen_release works the other way round: it is written with the
mmu on, then cleaned+invalidated, to be read with the mmu off.
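(For reference, the clean+invalidate side is write_pen_release() in
smp_spin_table.c, roughly as below; only the __section annotation on the
variable is new in this series:

static void write_pen_release(u64 val)
{
	void *start = (void *)&secondary_holding_pen_release;
	unsigned long size = sizeof(secondary_holding_pen_release);

	secondary_holding_pen_release = val;
	__flush_dcache_area(start, size);	/* clean+invalidate to PoC */
}

whereas __early_cpu_boot_status is written with a 'dc ivac' from the
update_early_cpu_boot_status macro in head.S, i.e. invalidate only.)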
I grouped them together in an older version, but Mark pointed out that the
maintenance of one could corrupt the other if they fall within a CWG of each
other. [0]
> Wouldn't it be better to handle this alignment in the linker script?
It's not just alignment of the section, but the alignment between mmuoff:read and
mmuoff:write variables. Maybe it would be clearer if they were in separate sections?
(I should at least smother it with more comments)
> (if you even need it, but see below)
>
>> .popsection
>>
>> /*
>> @@ -687,6 +702,7 @@ __secondary_switched:
>> mov x29, #0
>> b secondary_start_kernel
>> ENDPROC(__secondary_switched)
>> + .popsection
>>
>> /*
>> * The booting CPU updates the failed status @__early_cpu_boot_status,
>> @@ -706,12 +722,6 @@ ENDPROC(__secondary_switched)
>> dc ivac, \tmp1 // Invalidate potentially stale cache line
>> .endm
>>
>> - .pushsection .data..cacheline_aligned
>> - .align L1_CACHE_SHIFT
>> -ENTRY(__early_cpu_boot_status)
>> - .long 0
>> - .popsection
>> -
>> /*
>> * Enable the MMU.
>> *
>> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
>> index 659963d40bb4..bbab3d886516 100644
>> --- a/arch/arm64/kernel/vmlinux.lds.S
>> +++ b/arch/arm64/kernel/vmlinux.lds.S
>> @@ -120,6 +120,9 @@ SECTIONS
>> IRQENTRY_TEXT
>> SOFTIRQENTRY_TEXT
>> ENTRY_TEXT
>> + __mmuoff_text_start = .;
>> + *(.mmuoff.text)
>> + __mmuoff_text_end = .;
>> TEXT_TEXT
>> SCHED_TEXT
>> LOCK_TEXT
>> @@ -186,6 +189,11 @@ SECTIONS
>> _sdata = .;
>> RW_DATA_SECTION(L1_CACHE_BYTES, PAGE_SIZE, THREAD_SIZE)
>> PECOFF_EDATA_PADDING
>
> This padding needs to be at the end, it is intended to make the size
> of Image a multiple of 512.
Ah, I didn't realise that; readelf says I broke this.
I will move it to be before.
(secondary_holding_pen_release appears after head.S's .align 11s, so the next
symbol isn't 2KB aligned.)
> Alternatively, you could get rid of it
> completely, I guess, if the end of .mmuoff.text is expected to be 2 KB
> aligned (but I wonder if you need to)
>
>> + .mmuoff.data : {
>
> .mmuoff.data : ALIGN (SZ_2K) {
>
>> + __mmuoff_data_start = .;
>> + *(.mmuoff.data)
>
> . = ALIGN(SZ_2K);
>
> However, if the invalidation occurs before .bss is cleared (with the
> caches on), perhaps there is no need to align the end of this section?
We invalidate something in this section via secondary_entry() ->
set_cpu_boot_mode_flag(), so this can happen at any time.
My understanding is that the CWG is the maximum amount of data that will be
invalidated, and at compile time we assume its worst case of 2KB. Aligning the
end stops anything else being located in this worst-case range.
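As an aside, the 2KB figure is the architectural ceiling for CTR_EL0.CWG: the
field encodes the granule as 4 << CWG bytes, and its largest defined encoding
gives 2048 bytes. A runtime check could look roughly like this (a sketch only,
using read_cpuid_cachetype() from asm/cputype.h and SZ_2K from linux/sizes.h;
falling back to the 2KB worst case when the field is zero is a choice made for
this sketch, not necessarily what the kernel does):

static u32 cwg_bytes(void)
{
	u32 ctr = read_cpuid_cachetype();	/* CTR_EL0 */
	u32 cwg = (ctr >> 24) & 0xf;		/* CWG, bits [27:24] */

	return cwg ? 4U << cwg : SZ_2K;		/* assume the worst case if unreported */
}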
>
>> + __mmuoff_data_end = .;
>> + }
>> _edata = .;
>>
>> BSS_SECTION(0, 0, 0)
Thanks,
James
[0] https://patchwork.kernel.org/patch/9203423/
* [PATCH v4 2/3] arm64: vmlinux.ld: Add .mmuoff.{text, data} sections
2016-08-18 11:39 ` James Morse
@ 2016-08-18 11:55 ` Ard Biesheuvel
2016-08-22 13:15 ` James Morse
0 siblings, 1 reply; 8+ messages in thread
From: Ard Biesheuvel @ 2016-08-18 11:55 UTC (permalink / raw)
To: linux-arm-kernel
On 18 August 2016 at 13:39, James Morse <james.morse@arm.com> wrote:
> Hi Ard,
>
> On 17/08/16 18:50, Ard Biesheuvel wrote:
>> On 15 August 2016 at 19:12, James Morse <james.morse@arm.com> wrote:
>>> Resume from hibernate needs to clean any text executed by the kernel with
>>> the MMU off to the PoC. Collect these functions together into a new
>>> .mmuoff.text section. __boot_cpu_mode and secondary_holding_pen_release
>>> are data that is read or written with the MMU off. Add these to a new
>>> .mmuoff.data section.
>>>
>>> This covers booting of secondary cores and the cpu_suspend() path used
>>> by cpu-idle and suspend-to-ram.
>>>
>>> The bulk of head.S is not included, as the primary boot code is only ever
>>> executed once, the kernel never needs to ensure it is cleaned to a
>>> particular point in the cache.
>
>>> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
>>> index b77f58355da1..4230eeeeabf5 100644
>>> --- a/arch/arm64/kernel/head.S
>>> +++ b/arch/arm64/kernel/head.S
>>> @@ -621,17 +622,31 @@ set_cpu_boot_mode_flag:
>>> ENDPROC(set_cpu_boot_mode_flag)
>>>
>>> /*
>>> + * Values in this section are written with the MMU off, but read with the
>>> + * MMU on. Writers will invalidate the corresponding address, discarding
>>> + * a 'Cache Writeback Granule' (CWG) worth of data. Align these variables
>>> + * to the architectural maximum of 2K.
>>> + */
>>> + .pushsection ".mmuoff.data", "aw"
>>> + .align 11
>>> +/*
>>> * We need to find out the CPU boot mode long after boot, so we need to
>>> * store it in a writable variable.
>>> *
>>> * This is not in .bss, because we set it sufficiently early that the boot-time
>>> * zeroing of .bss would clobber it.
>>> */
>>> - .pushsection .data..cacheline_aligned
>>> - .align L1_CACHE_SHIFT
>>> ENTRY(__boot_cpu_mode)
>>> .long BOOT_CPU_MODE_EL2
>>> .long BOOT_CPU_MODE_EL1
>>> +/*
>>> + * The booting CPU updates the failed status @__early_cpu_boot_status,
>>> + * with MMU turned off.
>>> + */
>>> +ENTRY(__early_cpu_boot_status)
>>> + .long 0
>>> +
>>> + .align 11
>>
>> How is this supposed to work? Is secondary_holding_pen_release
>> expected to be covered by this region as well?
>
> In this section, but not in the same CWG:
> __boot_cpu_mode and __early_cpu_boot_status are written with the mmu off, then
> the corresponding cache area is invalidated.
>
> secondary_holding_pen_release works the other way round, it is written with the
> mmu on, then clean+invalidated, to be read with the mmu off.
>
> I grouped them together in an older version, Mark pointed out that the
> maintenance of one could corrupt the other if they fall within a CWG of each
> other. [0]
>
>
>> Wouldn't it be better to handle this alignment in the linker script?
>
> Its not just alignment of the section, but the alignment between mmuoff:read and
> mmuoff:write variables. Maybe it would be cleared if they were in separate sections?
>
Yes, that would make sense. Also, the section containing
secondary_holding_pen_release may still overlap with something else,
afaict, if it is emitted after the one in head.S.
> (I should at least smother it with more comments)
>
That too :-)
>
>> (if you even need it, but see below)
>>
>>> .popsection
>>>
>>> /*
>>> @@ -687,6 +702,7 @@ __secondary_switched:
>>> mov x29, #0
>>> b secondary_start_kernel
>>> ENDPROC(__secondary_switched)
>>> + .popsection
>>>
>>> /*
>>> * The booting CPU updates the failed status @__early_cpu_boot_status,
>>> @@ -706,12 +722,6 @@ ENDPROC(__secondary_switched)
>>> dc ivac, \tmp1 // Invalidate potentially stale cache line
>>> .endm
>>>
>>> - .pushsection .data..cacheline_aligned
>>> - .align L1_CACHE_SHIFT
>>> -ENTRY(__early_cpu_boot_status)
>>> - .long 0
>>> - .popsection
>>> -
>>> /*
>>> * Enable the MMU.
>>> *
>
>>> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
>>> index 659963d40bb4..bbab3d886516 100644
>>> --- a/arch/arm64/kernel/vmlinux.lds.S
>>> +++ b/arch/arm64/kernel/vmlinux.lds.S
>>> @@ -120,6 +120,9 @@ SECTIONS
>>> IRQENTRY_TEXT
>>> SOFTIRQENTRY_TEXT
>>> ENTRY_TEXT
>>> + __mmuoff_text_start = .;
>>> + *(.mmuoff.text)
>>> + __mmuoff_text_end = .;
>>> TEXT_TEXT
>>> SCHED_TEXT
>>> LOCK_TEXT
>>> @@ -186,6 +189,11 @@ SECTIONS
>>> _sdata = .;
>>> RW_DATA_SECTION(L1_CACHE_BYTES, PAGE_SIZE, THREAD_SIZE)
>>> PECOFF_EDATA_PADDING
>>
>> This padding needs to be at the end, it is intended to make the size
>> of Image a multiple of 512.
>
> Ah, I didn't realise that, readelf says I broke this.
> I will move it to be before.
>
Yes, please. This is required for the PE/COFF signing tools (for UEFI
secure boot)
> (secondary_holding_pen_rel appears after head.S's align 11s, so the next symbol
> isn't 2KB aligned)
>
>
>> Alternatively, you could get rid of it
>> completely, I guess, if the end of .mmuoff.text is expected to be 2 KB
>> aligned (but I wonder if you need to)
>>
>>> + .mmuoff.data : {
>>
>> .mmuoff.data : ALIGN (SZ_2K) {
>>
>>> + __mmuoff_data_start = .;
>>> + *(.mmuoff.data)
>>
>> . = ALIGN(SZ_2K);
>>
>> However, if the invalidation occurs before .bss is cleared (with the
>> caches on), perhaps there is no need to align the end of this section?
>
> We invalidate something in this section via secondary_entry() ->
> set_cpu_boot_mode_flag(), so this can happen any time.
> My understanding is that CWG is maximum amount of data that will be invalidated
> and at compile time we assume its worst case of 2KB. Aligning the end is to stop
> anything else being located in this worst case range.
>
Actually, it is not even necessary to align the end of .mmuoff.data,
as long as the next section starts at a 2 KB aligned boundary (which
is guaranteed for .bss since it covers page-aligned data, although it
would make sense to make that explicit). I.e., something like
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 659963d40bb4..70aa77060729 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -185,10 +185,18 @@ SECTIONS
_data = .;
_sdata = .;
RW_DATA_SECTION(L1_CACHE_BYTES, PAGE_SIZE, THREAD_SIZE)
+
+ .mmuoff.read : ALIGN(SZ_2K) {
+ *(.mmuoff.read)
+ }
+ .mmuoff.write : ALIGN(SZ_2K) {
+ *(.mmuoff.write)
+ }
+
PECOFF_EDATA_PADDING
_edata = .;
- BSS_SECTION(0, 0, 0)
+ BSS_SECTION(SZ_2K, SZ_2K, 0)
. = ALIGN(PAGE_SIZE);
idmap_pg_dir = .;
AFAICT, this should allow you to drop the alignments in the code. This
is also more future-proof, since you can simply emit variables into
these sections anywhere, whereas the explicit .align directive aligns
that particular variable, which could lead to more wasted space.
--
Ard.
* [PATCH v4 2/3] arm64: vmlinux.ld: Add .mmuoff.{text, data} sections
2016-08-18 11:55 ` Ard Biesheuvel
@ 2016-08-22 13:15 ` James Morse
0 siblings, 0 replies; 8+ messages in thread
From: James Morse @ 2016-08-22 13:15 UTC (permalink / raw)
To: linux-arm-kernel
Hi Ard,
On 18/08/16 12:55, Ard Biesheuvel wrote:
> Actually, it is not even necessary to align the end of .mmuoff.data,
> as long as the next section starts at at 2 KB aligned boundary (which
> is guaranteed for .bss since it covers page aligned data, although it
> would make sense to make that explicit) I.e., something like
>
> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> index 659963d40bb4..70aa77060729 100644
> --- a/arch/arm64/kernel/vmlinux.lds.S
> +++ b/arch/arm64/kernel/vmlinux.lds.S
> @@ -185,10 +185,18 @@ SECTIONS
> _data = .;
> _sdata = .;
> RW_DATA_SECTION(L1_CACHE_BYTES, PAGE_SIZE, THREAD_SIZE)
> +
> + .mmuoff.read : ALIGN(SZ_2K) {
> + *(.mmuoff.read)
> + }
> + .mmuoff.write : ALIGN(SZ_2K) {
> + *(.mmuoff.write)
> + }
> +
> PECOFF_EDATA_PADDING
> _edata = .;
>
> - BSS_SECTION(0, 0, 0)
> + BSS_SECTION(SZ_2K, SZ_2K, 0)
>
> . = ALIGN(PAGE_SIZE);
> idmap_pg_dir = .;
>
> AFAICT, this should allow you to drop the alignments in the code. This
> is also more future proof, since you can simply emit variables into
> these sections anywhere, whereas the explicit .align directive aligns
> that particular variable, which could lead to more waste of space.
Thanks! This looks a lot better.
I think the 2K alignment is only needed for the mmuoff.write section, as it
invalidates cache lines. I don't see a problem with the mmuoff.read section's
CWG overlapping with the BSS or other data, as the cache maintenance there is
clean+invalidate.
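Something like this, maybe (illustrative only: the section names and the
choice to pad only the written-with-MMU-off half are mine, not what this
series posts):

	.mmuoff.data.write : ALIGN(SZ_2K) {
		__mmuoff_data_start = .;
		*(.mmuoff.data.write)
	}
	. = ALIGN(SZ_2K);
	.mmuoff.data.read : {
		*(.mmuoff.data.read)
		__mmuoff_data_end = .;
	}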
Thanks,
James