* [PATCH v5 00/10] arm64: mmu: avoid W+X mappings and re-enable PTE_CONT for kernel
From: Ard Biesheuvel @ 2017-03-09 8:25 UTC
To: linux-arm-kernel
Having memory that is writable and executable at the same time is a
security hazard, and so we tend to avoid such mappings when we can. However,
at boot time, we keep .text mapped writable during the entire init
phase, and the init region itself is mapped RWX as well.
Let's improve the situation by:
- making the alternatives patching use the linear mapping
- splitting the init region into separate text and data regions
This removes all RWX mappings except the really early one created
in head.S (which we could perhaps fix in the future as well).
Changes since v4:
- the PTE_CONT patch has now spawned four more preparatory patches that clean
up some of the page table creation code before reintroducing the contiguous
attribute management
- add Mark's R-b to #4 and #5
Changes since v3:
- use linear alias only when patching the core kernel, and not for modules
- add patch to reintroduce the use of PTE_CONT for kernel mappings, except
for regions that are remapped read-only later on (i.e., .rodata and the
linear alias of .text+.rodata)
Changes since v2:
- ensure that text mappings remain writable under rodata=off
- rename create_mapping_late() to update_mapping_prot()
- clarify commit log of #2
- add acks
Ard Biesheuvel (10):
arm: kvm: move kvm_vgic_global_state out of .text section
arm64: mmu: move TLB maintenance from callers to create_mapping_late()
arm64: alternatives: apply boot time fixups via the linear mapping
arm64: mmu: map .text as read-only from the outset
arm64: mmu: apply strict permissions to .init.text and .init.data
arm64/mmu: align alloc_init_pte prototype with pmd/pud versions
arm64/mmu: ignore debug_pagealloc for kernel segments
arm64/mmu: add contiguous bit to sanity bug check
arm64/mmu: replace 'page_mappings_only' parameter with flags argument
arm64: mm: set the contiguous bit for kernel mappings where
appropriate
arch/arm64/include/asm/mmu.h | 1 +
arch/arm64/include/asm/pgtable.h | 10 +
arch/arm64/include/asm/sections.h | 2 +
arch/arm64/kernel/alternative.c | 11 +-
arch/arm64/kernel/smp.c | 1 +
arch/arm64/kernel/vmlinux.lds.S | 25 +-
arch/arm64/mm/mmu.c | 250 ++++++++++++++------
virt/kvm/arm/vgic/vgic.c | 4 +-
8 files changed, 212 insertions(+), 92 deletions(-)
--
2.7.4
* [PATCH v5 01/10] arm: kvm: move kvm_vgic_global_state out of .text section
From: Ard Biesheuvel @ 2017-03-09 8:25 UTC
To: linux-arm-kernel
The kvm_vgic_global_state struct contains a static key which is
written to by jump_label_init() at boot time. So in preparation for
making .text regions truly (well, almost truly) read-only, mark
kvm_vgic_global_state __ro_after_init so it moves to the .rodata
section instead.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Laura Abbott <labbott@redhat.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
virt/kvm/arm/vgic/vgic.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index 654dfd40e449..7713d96e85b7 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -29,7 +29,9 @@
#define DEBUG_SPINLOCK_BUG_ON(p)
#endif
-struct vgic_global __section(.hyp.text) kvm_vgic_global_state = {.gicv3_cpuif = STATIC_KEY_FALSE_INIT,};
+struct vgic_global kvm_vgic_global_state __ro_after_init = {
+ .gicv3_cpuif = STATIC_KEY_FALSE_INIT,
+};
/*
* Locking order is always:
--
2.7.4
* [PATCH v5 02/10] arm64: mmu: move TLB maintenance from callers to create_mapping_late()
From: Ard Biesheuvel @ 2017-03-09 8:25 UTC
To: linux-arm-kernel
In preparation for refactoring the kernel mapping logic so that text regions
are never mapped writable, which would require adding explicit TLB
maintenance to new call sites of create_mapping_late() (which is currently
invoked twice from the same function), move the TLB maintenance from the
call sites into create_mapping_late() itself, and change it from a full
TLB flush into a flush by VA, which is more appropriate here.
Also, given that create_mapping_late() has evolved into a routine that only
updates protection bits on existing mappings, rename it to
update_mapping_prot().
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/mm/mmu.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index d28dbcf596b6..6cafd8723d1a 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -319,17 +319,20 @@ void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
pgd_pgtable_alloc, page_mappings_only);
}
-static void create_mapping_late(phys_addr_t phys, unsigned long virt,
- phys_addr_t size, pgprot_t prot)
+static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
+ phys_addr_t size, pgprot_t prot)
{
if (virt < VMALLOC_START) {
- pr_warn("BUG: not creating mapping for %pa at 0x%016lx - outside kernel range\n",
+ pr_warn("BUG: not updating mapping for %pa@0x%016lx - outside kernel range\n",
&phys, virt);
return;
}
__create_pgd_mapping(init_mm.pgd, phys, virt, size, prot,
NULL, debug_pagealloc_enabled());
+
+ /* flush the TLBs after updating live kernel mappings */
+ flush_tlb_kernel_range(virt, virt + size);
}
static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end)
@@ -402,19 +405,16 @@ void mark_rodata_ro(void)
unsigned long section_size;
section_size = (unsigned long)_etext - (unsigned long)_text;
- create_mapping_late(__pa_symbol(_text), (unsigned long)_text,
+ update_mapping_prot(__pa_symbol(_text), (unsigned long)_text,
section_size, PAGE_KERNEL_ROX);
/*
* mark .rodata as read only. Use __init_begin rather than __end_rodata
* to cover NOTES and EXCEPTION_TABLE.
*/
section_size = (unsigned long)__init_begin - (unsigned long)__start_rodata;
- create_mapping_late(__pa_symbol(__start_rodata), (unsigned long)__start_rodata,
+ update_mapping_prot(__pa_symbol(__start_rodata), (unsigned long)__start_rodata,
section_size, PAGE_KERNEL_RO);
- /* flush the TLBs after updating live kernel mappings */
- flush_tlb_all();
-
debug_checkwx();
}
--
2.7.4
* [PATCH v5 03/10] arm64: alternatives: apply boot time fixups via the linear mapping
From: Ard Biesheuvel @ 2017-03-09 8:25 UTC
To: linux-arm-kernel
One important rule of thumb when designing a secure software system is
that memory should never be writable and executable at the same time.
We mostly adhere to this rule in the kernel, except at boot time, when
regions may be mapped RWX until after we are done applying alternatives
or making other one-off changes.
For the alternative patching, we can improve the situation by applying
the fixups via the linear mapping, which is never mapped with executable
permissions. So map the linear alias of .text with RW- permissions
initially, and remove the write permissions as soon as alternative
patching has completed.
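For reference, the lm_alias() helper used in the hunk below simply translates
an address in the kernel image to its alias in the linear region; a minimal
sketch of the idea (not the exact kernel macro):

  /*
   * Illustrative only: convert the image address to a physical address
   * and back through the linear map, which is mapped RW but never
   * executable, so instructions can be patched there without ever
   * needing a writable+executable mapping.
   */
  static u32 *lm_alias_of(u32 *image_addr)
  {
          return (u32 *)__va(__pa_symbol(image_addr));
  }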
Reviewed-by: Laura Abbott <labbott@redhat.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/include/asm/mmu.h | 1 +
arch/arm64/kernel/alternative.c | 11 +++++-----
arch/arm64/kernel/smp.c | 1 +
arch/arm64/mm/mmu.c | 22 +++++++++++++++-----
4 files changed, 25 insertions(+), 10 deletions(-)
diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 47619411f0ff..5468c834b072 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -37,5 +37,6 @@ extern void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
unsigned long virt, phys_addr_t size,
pgprot_t prot, bool page_mappings_only);
extern void *fixmap_remap_fdt(phys_addr_t dt_phys);
+extern void mark_linear_text_alias_ro(void);
#endif
diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index 06d650f61da7..8840c109c5d6 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -105,11 +105,11 @@ static u32 get_alt_insn(struct alt_instr *alt, u32 *insnptr, u32 *altinsnptr)
return insn;
}
-static void __apply_alternatives(void *alt_region)
+static void __apply_alternatives(void *alt_region, bool use_linear_alias)
{
struct alt_instr *alt;
struct alt_region *region = alt_region;
- u32 *origptr, *replptr;
+ u32 *origptr, *replptr, *updptr;
for (alt = region->begin; alt < region->end; alt++) {
u32 insn;
@@ -124,11 +124,12 @@ static void __apply_alternatives(void *alt_region)
origptr = ALT_ORIG_PTR(alt);
replptr = ALT_REPL_PTR(alt);
+ updptr = use_linear_alias ? (u32 *)lm_alias(origptr) : origptr;
nr_inst = alt->alt_len / sizeof(insn);
for (i = 0; i < nr_inst; i++) {
insn = get_alt_insn(alt, origptr + i, replptr + i);
- *(origptr + i) = cpu_to_le32(insn);
+ updptr[i] = cpu_to_le32(insn);
}
flush_icache_range((uintptr_t)origptr,
@@ -155,7 +156,7 @@ static int __apply_alternatives_multi_stop(void *unused)
isb();
} else {
BUG_ON(patched);
- __apply_alternatives(&region);
+ __apply_alternatives(&region, true);
/* Barriers provided by the cache flushing */
WRITE_ONCE(patched, 1);
}
@@ -176,5 +177,5 @@ void apply_alternatives(void *start, size_t length)
.end = start + length,
};
- __apply_alternatives(&region);
+ __apply_alternatives(&region, false);
}
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index ef1caae02110..d4739552da28 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -434,6 +434,7 @@ void __init smp_cpus_done(unsigned int max_cpus)
setup_cpu_features();
hyp_mode_check();
apply_alternatives_all();
+ mark_linear_text_alias_ro();
}
void __init smp_prepare_boot_cpu(void)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 6cafd8723d1a..df377fbe464e 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -372,16 +372,28 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
debug_pagealloc_enabled());
/*
- * Map the linear alias of the [_text, __init_begin) interval as
- * read-only/non-executable. This makes the contents of the
- * region accessible to subsystems such as hibernate, but
- * protects it from inadvertent modification or execution.
+ * Map the linear alias of the [_text, __init_begin) interval
+ * as non-executable now, and remove the write permission in
+ * mark_linear_text_alias_ro() below (which will be called after
+ * alternative patching has completed). This makes the contents
+ * of the region accessible to subsystems such as hibernate,
+ * but protects it from inadvertent modification or execution.
*/
__create_pgd_mapping(pgd, kernel_start, __phys_to_virt(kernel_start),
- kernel_end - kernel_start, PAGE_KERNEL_RO,
+ kernel_end - kernel_start, PAGE_KERNEL,
early_pgtable_alloc, debug_pagealloc_enabled());
}
+void __init mark_linear_text_alias_ro(void)
+{
+ /*
+ * Remove the write permissions from the linear alias of .text/.rodata
+ */
+ update_mapping_prot(__pa_symbol(_text), (unsigned long)lm_alias(_text),
+ (unsigned long)__init_begin - (unsigned long)_text,
+ PAGE_KERNEL_RO);
+}
+
static void __init map_mem(pgd_t *pgd)
{
struct memblock_region *reg;
--
2.7.4
* [PATCH v5 04/10] arm64: mmu: map .text as read-only from the outset
From: Ard Biesheuvel @ 2017-03-09 8:25 UTC
To: linux-arm-kernel
Now that the alternatives patching code no longer relies on the primary
mapping of .text being writable, we can remove the logic that strips the
write permission from it after init, and map it read-only from the outset.
To preserve the existing behavior under rodata=off, which is relied
upon by external debuggers to manage software breakpoints (as pointed
out by Mark), add an early_param() check for rodata=, and use RWX
permissions if it is set to 'off'.
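As an illustration, an external debugger use case would boot with a command
line along these lines (only the rodata= part is relevant here; everything
else is a made-up example):

  console=ttyAMA0 root=/dev/vda2 rodata=off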
Reviewed-by: Laura Abbott <labbott@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/mm/mmu.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index df377fbe464e..300e98e8cd63 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -416,9 +416,6 @@ void mark_rodata_ro(void)
{
unsigned long section_size;
- section_size = (unsigned long)_etext - (unsigned long)_text;
- update_mapping_prot(__pa_symbol(_text), (unsigned long)_text,
- section_size, PAGE_KERNEL_ROX);
/*
* mark .rodata as read only. Use __init_begin rather than __end_rodata
* to cover NOTES and EXCEPTION_TABLE.
@@ -451,6 +448,12 @@ static void __init map_kernel_segment(pgd_t *pgd, void *va_start, void *va_end,
vm_area_add_early(vma);
}
+static int __init parse_rodata(char *arg)
+{
+ return strtobool(arg, &rodata_enabled);
+}
+early_param("rodata", parse_rodata);
+
/*
* Create fine-grained mappings for the kernel.
*/
@@ -458,7 +461,14 @@ static void __init map_kernel(pgd_t *pgd)
{
static struct vm_struct vmlinux_text, vmlinux_rodata, vmlinux_init, vmlinux_data;
- map_kernel_segment(pgd, _text, _etext, PAGE_KERNEL_EXEC, &vmlinux_text);
+ /*
+ * External debuggers may need to write directly to the text
+ * mapping to install SW breakpoints. Allow this (only) when
+ * explicitly requested with rodata=off.
+ */
+ pgprot_t text_prot = rodata_enabled ? PAGE_KERNEL_ROX : PAGE_KERNEL_EXEC;
+
+ map_kernel_segment(pgd, _text, _etext, text_prot, &vmlinux_text);
map_kernel_segment(pgd, __start_rodata, __init_begin, PAGE_KERNEL, &vmlinux_rodata);
map_kernel_segment(pgd, __init_begin, __init_end, PAGE_KERNEL_EXEC,
&vmlinux_init);
--
2.7.4
* [PATCH v5 05/10] arm64: mmu: apply strict permissions to .init.text and .init.data
From: Ard Biesheuvel @ 2017-03-09 8:25 UTC
To: linux-arm-kernel
To avoid having mappings that are writable and executable at the same
time, split the init region into a .init.text region that is mapped
read-only, and a .init.data region that is mapped non-executable.
This is possible now that the alternative patching occurs via the linear
mapping, and the linear alias of the init region is always mapped writable
(but never executable).
Since the alternatives descriptions themselves are read-only data, move
those into the .init.text region.
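For reference, the intended permissions of the kernel segments after this
patch (derived from the hunks below together with the earlier patches in
this series) are roughly:

  _text            .. _etext            r-x  (rwx under rodata=off)
  __start_rodata   .. __inittext_begin  rw-, remapped r-- by mark_rodata_ro()
  __inittext_begin .. __inittext_end    r-x  (rwx under rodata=off)
  __initdata_begin .. __initdata_end    rw-
  _data            .. _end              rw-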
Reviewed-by: Laura Abbott <labbott@redhat.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/include/asm/sections.h | 2 ++
arch/arm64/kernel/vmlinux.lds.S | 25 +++++++++++++-------
arch/arm64/mm/mmu.c | 12 ++++++----
3 files changed, 26 insertions(+), 13 deletions(-)
diff --git a/arch/arm64/include/asm/sections.h b/arch/arm64/include/asm/sections.h
index 4e7e7067afdb..941267caa39c 100644
--- a/arch/arm64/include/asm/sections.h
+++ b/arch/arm64/include/asm/sections.h
@@ -24,6 +24,8 @@ extern char __hibernate_exit_text_start[], __hibernate_exit_text_end[];
extern char __hyp_idmap_text_start[], __hyp_idmap_text_end[];
extern char __hyp_text_start[], __hyp_text_end[];
extern char __idmap_text_start[], __idmap_text_end[];
+extern char __initdata_begin[], __initdata_end[];
+extern char __inittext_begin[], __inittext_end[];
extern char __irqentry_text_start[], __irqentry_text_end[];
extern char __mmuoff_data_start[], __mmuoff_data_end[];
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index b8deffa9e1bf..2c93d259046c 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -143,12 +143,27 @@ SECTIONS
. = ALIGN(SEGMENT_ALIGN);
__init_begin = .;
+ __inittext_begin = .;
INIT_TEXT_SECTION(8)
.exit.text : {
ARM_EXIT_KEEP(EXIT_TEXT)
}
+ . = ALIGN(4);
+ .altinstructions : {
+ __alt_instructions = .;
+ *(.altinstructions)
+ __alt_instructions_end = .;
+ }
+ .altinstr_replacement : {
+ *(.altinstr_replacement)
+ }
+
+ . = ALIGN(PAGE_SIZE);
+ __inittext_end = .;
+ __initdata_begin = .;
+
.init.data : {
INIT_DATA
INIT_SETUP(16)
@@ -164,15 +179,6 @@ SECTIONS
PERCPU_SECTION(L1_CACHE_BYTES)
- . = ALIGN(4);
- .altinstructions : {
- __alt_instructions = .;
- *(.altinstructions)
- __alt_instructions_end = .;
- }
- .altinstr_replacement : {
- *(.altinstr_replacement)
- }
.rela : ALIGN(8) {
*(.rela .rela*)
}
@@ -181,6 +187,7 @@ SECTIONS
__rela_size = SIZEOF(.rela);
. = ALIGN(SEGMENT_ALIGN);
+ __initdata_end = .;
__init_end = .;
_data = .;
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 300e98e8cd63..75e21c33caff 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -459,7 +459,8 @@ early_param("rodata", parse_rodata);
*/
static void __init map_kernel(pgd_t *pgd)
{
- static struct vm_struct vmlinux_text, vmlinux_rodata, vmlinux_init, vmlinux_data;
+ static struct vm_struct vmlinux_text, vmlinux_rodata, vmlinux_inittext,
+ vmlinux_initdata, vmlinux_data;
/*
* External debuggers may need to write directly to the text
@@ -469,9 +470,12 @@ static void __init map_kernel(pgd_t *pgd)
pgprot_t text_prot = rodata_enabled ? PAGE_KERNEL_ROX : PAGE_KERNEL_EXEC;
map_kernel_segment(pgd, _text, _etext, text_prot, &vmlinux_text);
- map_kernel_segment(pgd, __start_rodata, __init_begin, PAGE_KERNEL, &vmlinux_rodata);
- map_kernel_segment(pgd, __init_begin, __init_end, PAGE_KERNEL_EXEC,
- &vmlinux_init);
+ map_kernel_segment(pgd, __start_rodata, __inittext_begin, PAGE_KERNEL,
+ &vmlinux_rodata);
+ map_kernel_segment(pgd, __inittext_begin, __inittext_end, text_prot,
+ &vmlinux_inittext);
+ map_kernel_segment(pgd, __initdata_begin, __initdata_end, PAGE_KERNEL,
+ &vmlinux_initdata);
map_kernel_segment(pgd, _data, _end, PAGE_KERNEL, &vmlinux_data);
if (!pgd_val(*pgd_offset_raw(pgd, FIXADDR_START))) {
--
2.7.4
* [PATCH v5 06/10] arm64/mmu: align alloc_init_pte prototype with pmd/pud versions
From: Ard Biesheuvel @ 2017-03-09 8:25 UTC
To: linux-arm-kernel
Align the function prototype of alloc_init_pte() with its pmd and pud
counterparts by replacing the pfn parameter with the equivalent physical
address.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/mm/mmu.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 75e21c33caff..c3963c592ec3 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -107,7 +107,7 @@ static bool pgattr_change_is_safe(u64 old, u64 new)
}
static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
- unsigned long end, unsigned long pfn,
+ unsigned long end, phys_addr_t phys,
pgprot_t prot,
phys_addr_t (*pgtable_alloc)(void))
{
@@ -128,8 +128,8 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
do {
pte_t old_pte = *pte;
- set_pte(pte, pfn_pte(pfn, prot));
- pfn++;
+ set_pte(pte, pfn_pte(__phys_to_pfn(phys), prot));
+ phys += PAGE_SIZE;
/*
* After the PTE entry has been populated once, we
@@ -182,7 +182,7 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
BUG_ON(!pgattr_change_is_safe(pmd_val(old_pmd),
pmd_val(*pmd)));
} else {
- alloc_init_pte(pmd, addr, next, __phys_to_pfn(phys),
+ alloc_init_pte(pmd, addr, next, phys,
prot, pgtable_alloc);
BUG_ON(pmd_val(old_pmd) != 0 &&
--
2.7.4
* [PATCH v5 07/10] arm64/mmu: ignore debug_pagealloc for kernel segments
From: Ard Biesheuvel @ 2017-03-09 8:25 UTC
To: linux-arm-kernel
The debug_pagealloc facility manipulates kernel mappings in the linear
region at page granularity to detect out of bounds or use-after-free
accesses. Since the kernel segments are not allocated dynamically,
there is no point in taking the debug_pagealloc_enabled flag into
account for them, and we can use block mappings unconditionally.
Note that this applies equally to the linear alias of text/rodata:
we will never have dynamic allocations there given that the same
memory is statically in use by the kernel image.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/mm/mmu.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index c3963c592ec3..d3fecd20a136 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -328,8 +328,7 @@ static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
return;
}
- __create_pgd_mapping(init_mm.pgd, phys, virt, size, prot,
- NULL, debug_pagealloc_enabled());
+ __create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL, false);
/* flush the TLBs after updating live kernel mappings */
flush_tlb_kernel_range(virt, virt + size);
@@ -381,7 +380,7 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
*/
__create_pgd_mapping(pgd, kernel_start, __phys_to_virt(kernel_start),
kernel_end - kernel_start, PAGE_KERNEL,
- early_pgtable_alloc, debug_pagealloc_enabled());
+ early_pgtable_alloc, false);
}
void __init mark_linear_text_alias_ro(void)
@@ -437,7 +436,7 @@ static void __init map_kernel_segment(pgd_t *pgd, void *va_start, void *va_end,
BUG_ON(!PAGE_ALIGNED(size));
__create_pgd_mapping(pgd, pa_start, (unsigned long)va_start, size, prot,
- early_pgtable_alloc, debug_pagealloc_enabled());
+ early_pgtable_alloc, false);
vma->addr = va_start;
vma->phys_addr = pa_start;
--
2.7.4
* [PATCH v5 08/10] arm64/mmu: add contiguous bit to sanity bug check
From: Ard Biesheuvel @ 2017-03-09 8:25 UTC
To: linux-arm-kernel
A mapping with the contiguous bit cannot be safely manipulated while
live, regardless of whether the bit changes between the old and new
mapping. So take this into account when deciding whether the change
is safe.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/mm/mmu.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index d3fecd20a136..a6d7a86dd2b8 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -103,7 +103,15 @@ static bool pgattr_change_is_safe(u64 old, u64 new)
*/
static const pteval_t mask = PTE_PXN | PTE_RDONLY | PTE_WRITE;
- return old == 0 || new == 0 || ((old ^ new) & ~mask) == 0;
+ /* creating or taking down mappings is always safe */
+ if (old == 0 || new == 0)
+ return true;
+
+ /* live contiguous mappings may not be manipulated at all */
+ if ((old | new) & PTE_CONT)
+ return false;
+
+ return ((old ^ new) & ~mask) == 0;
}
static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
--
2.7.4
* [PATCH v5 09/10] arm64/mmu: replace 'page_mappings_only' parameter with flags argument
From: Ard Biesheuvel @ 2017-03-09 8:25 UTC
To: linux-arm-kernel
In preparation for extending the policy for manipulating kernel mappings
so that it also covers whether or not contiguous hints may be used in the
page tables, replace the bool 'page_mappings_only' with a flags field and
a NO_BLOCK_MAPPINGS flag.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/mm/mmu.c | 45 ++++++++++++--------
1 file changed, 27 insertions(+), 18 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index a6d7a86dd2b8..9babafa253cf 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -43,6 +43,8 @@
#include <asm/mmu_context.h>
#include <asm/ptdump.h>
+#define NO_BLOCK_MAPPINGS BIT(0)
+
u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
u64 kimage_voffset __ro_after_init;
@@ -153,7 +155,7 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
phys_addr_t phys, pgprot_t prot,
phys_addr_t (*pgtable_alloc)(void),
- bool page_mappings_only)
+ int flags)
{
pmd_t *pmd;
unsigned long next;
@@ -180,7 +182,7 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
/* try section mapping first */
if (((addr | next | phys) & ~SECTION_MASK) == 0 &&
- !page_mappings_only) {
+ (flags & NO_BLOCK_MAPPINGS) == 0) {
pmd_set_huge(pmd, phys, prot);
/*
@@ -217,7 +219,7 @@ static inline bool use_1G_block(unsigned long addr, unsigned long next,
static void alloc_init_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
phys_addr_t phys, pgprot_t prot,
phys_addr_t (*pgtable_alloc)(void),
- bool page_mappings_only)
+ int flags)
{
pud_t *pud;
unsigned long next;
@@ -239,7 +241,8 @@ static void alloc_init_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
/*
* For 4K granule only, attempt to put down a 1GB block
*/
- if (use_1G_block(addr, next, phys) && !page_mappings_only) {
+ if (use_1G_block(addr, next, phys) &&
+ (flags & NO_BLOCK_MAPPINGS) == 0) {
pud_set_huge(pud, phys, prot);
/*
@@ -250,7 +253,7 @@ static void alloc_init_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
pud_val(*pud)));
} else {
alloc_init_pmd(pud, addr, next, phys, prot,
- pgtable_alloc, page_mappings_only);
+ pgtable_alloc, flags);
BUG_ON(pud_val(old_pud) != 0 &&
pud_val(old_pud) != pud_val(*pud));
@@ -265,7 +268,7 @@ static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
unsigned long virt, phys_addr_t size,
pgprot_t prot,
phys_addr_t (*pgtable_alloc)(void),
- bool page_mappings_only)
+ int flags)
{
unsigned long addr, length, end, next;
pgd_t *pgd = pgd_offset_raw(pgdir, virt);
@@ -285,7 +288,7 @@ static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
do {
next = pgd_addr_end(addr, end);
alloc_init_pud(pgd, addr, next, phys, prot, pgtable_alloc,
- page_mappings_only);
+ flags);
phys += next - addr;
} while (pgd++, addr = next, addr != end);
}
@@ -314,17 +317,22 @@ static void __init create_mapping_noalloc(phys_addr_t phys, unsigned long virt,
&phys, virt);
return;
}
- __create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL, false);
+ __create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL, 0);
}
void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
unsigned long virt, phys_addr_t size,
pgprot_t prot, bool page_mappings_only)
{
+ int flags;
+
BUG_ON(mm == &init_mm);
+ if (page_mappings_only)
+ flags = NO_BLOCK_MAPPINGS;
+
__create_pgd_mapping(mm->pgd, phys, virt, size, prot,
- pgd_pgtable_alloc, page_mappings_only);
+ pgd_pgtable_alloc, flags);
}
static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
@@ -336,7 +344,7 @@ static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
return;
}
- __create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL, false);
+ __create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL, 0);
/* flush the TLBs after updating live kernel mappings */
flush_tlb_kernel_range(virt, virt + size);
@@ -346,6 +354,10 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
{
phys_addr_t kernel_start = __pa_symbol(_text);
phys_addr_t kernel_end = __pa_symbol(__init_begin);
+ int flags = 0;
+
+ if (debug_pagealloc_enabled())
+ flags = NO_BLOCK_MAPPINGS;
/*
* Take care not to create a writable alias for the
@@ -356,8 +368,7 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
if (end < kernel_start || start >= kernel_end) {
__create_pgd_mapping(pgd, start, __phys_to_virt(start),
end - start, PAGE_KERNEL,
- early_pgtable_alloc,
- debug_pagealloc_enabled());
+ early_pgtable_alloc, flags);
return;
}
@@ -369,14 +380,12 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
__create_pgd_mapping(pgd, start,
__phys_to_virt(start),
kernel_start - start, PAGE_KERNEL,
- early_pgtable_alloc,
- debug_pagealloc_enabled());
+ early_pgtable_alloc, flags);
if (kernel_end < end)
__create_pgd_mapping(pgd, kernel_end,
__phys_to_virt(kernel_end),
end - kernel_end, PAGE_KERNEL,
- early_pgtable_alloc,
- debug_pagealloc_enabled());
+ early_pgtable_alloc, flags);
/*
* Map the linear alias of the [_text, __init_begin) interval
@@ -388,7 +397,7 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
*/
__create_pgd_mapping(pgd, kernel_start, __phys_to_virt(kernel_start),
kernel_end - kernel_start, PAGE_KERNEL,
- early_pgtable_alloc, false);
+ early_pgtable_alloc, 0);
}
void __init mark_linear_text_alias_ro(void)
@@ -444,7 +453,7 @@ static void __init map_kernel_segment(pgd_t *pgd, void *va_start, void *va_end,
BUG_ON(!PAGE_ALIGNED(size));
__create_pgd_mapping(pgd, pa_start, (unsigned long)va_start, size, prot,
- early_pgtable_alloc, false);
+ early_pgtable_alloc, 0);
vma->addr = va_start;
vma->phys_addr = pa_start;
--
2.7.4
* [PATCH v5 10/10] arm64: mm: set the contiguous bit for kernel mappings where appropriate
From: Ard Biesheuvel @ 2017-03-09 8:25 UTC
To: linux-arm-kernel
This is the third attempt at enabling the use of contiguous hints for
kernel mappings. The most recent attempt, commit 0bfc445dec9d, was reverted
after it turned out that updating permission attributes on live contiguous
ranges may result in TLB conflicts. So this time, the contiguous hint is not
set for .rodata or for the linear alias of .text/.rodata, both of which are
mapped read-write initially, and remapped read-only at a later stage.
(Note that the latter region could also be unmapped and remapped again
with updated permission attributes, given that the region, while live, is
only mapped for the convenience of the hibernation code; but that also
means its TLB footprint is negligible anyway, so it is hardly worth the
bother.)
This enables the following contiguous range sizes for the virtual mapping
of the kernel image, and for the linear mapping:
  granule size |  cont PTE  |  cont PMD  |
  -------------+------------+------------+
       4 KB    |    64 KB   |    32 MB   |
      16 KB    |     2 MB   |     1 GB*  |
      64 KB    |     2 MB   |    16 GB*  |
* Only when built for 3 or more levels of translation. This is due to the
fact that a 2-level configuration only consists of PGDs and PTEs, and the
added complexity of dealing with folded PMDs is not justified considering
that 16 GB contiguous ranges are likely to be ignored by the hardware (and
16k/2 levels is a niche configuration).
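(For reference, these spans are just the number of contiguous entries at each
level multiplied by the entry size for that granule; a quick back-of-the-envelope
check rather than a quote from pgtable-hwdef.h:

   4 KB granule:  16 x  4 KB = 64 KB     16 x   2 MB = 32 MB
  16 KB granule: 128 x 16 KB =  2 MB     32 x  32 MB =  1 GB
  64 KB granule:  32 x 64 KB =  2 MB     32 x 512 MB = 16 GB)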
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/include/asm/pgtable.h | 10 ++
arch/arm64/mm/mmu.c | 154 +++++++++++++-------
2 files changed, 114 insertions(+), 50 deletions(-)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 0eef6064bf3b..f10a7bf81849 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -74,6 +74,16 @@ extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)];
#define pte_user_exec(pte) (!(pte_val(pte) & PTE_UXN))
#define pte_cont(pte) (!!(pte_val(pte) & PTE_CONT))
+static inline u64 pte_cont_addr_end(u64 addr, u64 end)
+{
+ return min((addr + CONT_PTE_SIZE) & CONT_PTE_MASK, end);
+}
+
+static inline u64 pmd_cont_addr_end(u64 addr, u64 end)
+{
+ return min((addr + CONT_PMD_SIZE) & CONT_PMD_MASK, end);
+}
+
#ifdef CONFIG_ARM64_HW_AFDBM
#define pte_hw_dirty(pte) (pte_write(pte) && !(pte_val(pte) & PTE_RDONLY))
#else
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 9babafa253cf..e2ffab56c1a6 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -44,6 +44,7 @@
#include <asm/ptdump.h>
#define NO_BLOCK_MAPPINGS BIT(0)
+#define NO_CONT_MAPPINGS BIT(1)
u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
@@ -116,11 +117,30 @@ static bool pgattr_change_is_safe(u64 old, u64 new)
return ((old ^ new) & ~mask) == 0;
}
-static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
- unsigned long end, phys_addr_t phys,
- pgprot_t prot,
- phys_addr_t (*pgtable_alloc)(void))
+static void init_pte(pte_t *pte, unsigned long addr, unsigned long end,
+ phys_addr_t phys, pgprot_t prot)
{
+ do {
+ pte_t old_pte = *pte;
+
+ set_pte(pte, pfn_pte(__phys_to_pfn(phys), prot));
+
+ /*
+ * After the PTE entry has been populated once, we
+ * only allow updates to the permission attributes.
+ */
+ BUG_ON(!pgattr_change_is_safe(pte_val(old_pte), pte_val(*pte)));
+
+ } while (pte++, addr += PAGE_SIZE, phys += PAGE_SIZE, addr != end);
+}
+
+static void alloc_init_cont_pte(pmd_t *pmd, unsigned long addr,
+ unsigned long end, phys_addr_t phys,
+ pgprot_t prot,
+ phys_addr_t (*pgtable_alloc)(void),
+ int flags)
+{
+ unsigned long next;
pte_t *pte;
BUG_ON(pmd_sect(*pmd));
@@ -136,45 +156,30 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
pte = pte_set_fixmap_offset(pmd, addr);
do {
- pte_t old_pte = *pte;
+ pgprot_t __prot = prot;
- set_pte(pte, pfn_pte(__phys_to_pfn(phys), prot));
- phys += PAGE_SIZE;
+ next = pte_cont_addr_end(addr, end);
- /*
- * After the PTE entry has been populated once, we
- * only allow updates to the permission attributes.
- */
- BUG_ON(!pgattr_change_is_safe(pte_val(old_pte), pte_val(*pte)));
+ /* use a contiguous mapping if the range is suitably aligned */
+ if ((((addr | next | phys) & ~CONT_PTE_MASK) == 0) &&
+ (flags & NO_CONT_MAPPINGS) == 0)
+ __prot = __pgprot(pgprot_val(prot) | PTE_CONT);
- } while (pte++, addr += PAGE_SIZE, addr != end);
+ init_pte(pte, addr, next, phys, __prot);
+
+ phys += next - addr;
+ pte += (next - addr) / PAGE_SIZE;
+ } while (addr = next, addr != end);
pte_clear_fixmap();
}
-static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
- phys_addr_t phys, pgprot_t prot,
- phys_addr_t (*pgtable_alloc)(void),
- int flags)
+static void init_pmd(pmd_t *pmd, unsigned long addr, unsigned long end,
+ phys_addr_t phys, pgprot_t prot,
+ phys_addr_t (*pgtable_alloc)(void), int flags)
{
- pmd_t *pmd;
unsigned long next;
- /*
- * Check for initial section mappings in the pgd/pud and remove them.
- */
- BUG_ON(pud_sect(*pud));
- if (pud_none(*pud)) {
- phys_addr_t pmd_phys;
- BUG_ON(!pgtable_alloc);
- pmd_phys = pgtable_alloc();
- pmd = pmd_set_fixmap(pmd_phys);
- __pud_populate(pud, pmd_phys, PUD_TYPE_TABLE);
- pmd_clear_fixmap();
- }
- BUG_ON(pud_bad(*pud));
-
- pmd = pmd_set_fixmap_offset(pud, addr);
do {
pmd_t old_pmd = *pmd;
@@ -192,14 +197,54 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
BUG_ON(!pgattr_change_is_safe(pmd_val(old_pmd),
pmd_val(*pmd)));
} else {
- alloc_init_pte(pmd, addr, next, phys,
- prot, pgtable_alloc);
+ alloc_init_cont_pte(pmd, addr, next, phys, prot,
+ pgtable_alloc, flags);
BUG_ON(pmd_val(old_pmd) != 0 &&
pmd_val(old_pmd) != pmd_val(*pmd));
}
phys += next - addr;
} while (pmd++, addr = next, addr != end);
+}
+
+static void alloc_init_cont_pmd(pud_t *pud, unsigned long addr,
+ unsigned long end, phys_addr_t phys,
+ pgprot_t prot,
+ phys_addr_t (*pgtable_alloc)(void), int flags)
+{
+ unsigned long next;
+ pmd_t *pmd;
+
+ /*
+ * Check for initial section mappings in the pgd/pud.
+ */
+ BUG_ON(pud_sect(*pud));
+ if (pud_none(*pud)) {
+ phys_addr_t pmd_phys;
+ BUG_ON(!pgtable_alloc);
+ pmd_phys = pgtable_alloc();
+ pmd = pmd_set_fixmap(pmd_phys);
+ __pud_populate(pud, pmd_phys, PUD_TYPE_TABLE);
+ pmd_clear_fixmap();
+ }
+ BUG_ON(pud_bad(*pud));
+
+ pmd = pmd_set_fixmap_offset(pud, addr);
+ do {
+ pgprot_t __prot = prot;
+
+ next = pmd_cont_addr_end(addr, end);
+
+ /* use a contiguous mapping if the range is suitably aligned */
+ if ((((addr | next | phys) & ~CONT_PMD_MASK) == 0) &&
+ (flags & NO_CONT_MAPPINGS) == 0)
+ __prot = __pgprot(pgprot_val(prot) | PTE_CONT);
+
+ init_pmd(pmd, addr, next, phys, __prot, pgtable_alloc, flags);
+
+ phys += next - addr;
+ pmd += (next - (addr & PMD_MASK)) / PMD_SIZE;
+ } while (addr = next, addr != end);
pmd_clear_fixmap();
}
@@ -252,8 +297,8 @@ static void alloc_init_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
BUG_ON(!pgattr_change_is_safe(pud_val(old_pud),
pud_val(*pud)));
} else {
- alloc_init_pmd(pud, addr, next, phys, prot,
- pgtable_alloc, flags);
+ alloc_init_cont_pmd(pud, addr, next, phys, prot,
+ pgtable_alloc, flags);
BUG_ON(pud_val(old_pud) != 0 &&
pud_val(old_pud) != pud_val(*pud));
@@ -317,19 +362,20 @@ static void __init create_mapping_noalloc(phys_addr_t phys, unsigned long virt,
&phys, virt);
return;
}
- __create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL, 0);
+ __create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL,
+ NO_CONT_MAPPINGS);
}
void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
unsigned long virt, phys_addr_t size,
pgprot_t prot, bool page_mappings_only)
{
- int flags;
+ int flags = NO_CONT_MAPPINGS;
BUG_ON(mm == &init_mm);
if (page_mappings_only)
- flags = NO_BLOCK_MAPPINGS;
+ flags |= NO_BLOCK_MAPPINGS;
__create_pgd_mapping(mm->pgd, phys, virt, size, prot,
pgd_pgtable_alloc, flags);
@@ -344,7 +390,8 @@ static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
return;
}
- __create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL, 0);
+ __create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL,
+ NO_CONT_MAPPINGS);
/* flush the TLBs after updating live kernel mappings */
flush_tlb_kernel_range(virt, virt + size);
@@ -357,7 +404,7 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
int flags = 0;
if (debug_pagealloc_enabled())
- flags = NO_BLOCK_MAPPINGS;
+ flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
/*
* Take care not to create a writable alias for the
@@ -394,10 +441,12 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
* alternative patching has completed). This makes the contents
* of the region accessible to subsystems such as hibernate,
* but protects it from inadvertent modification or execution.
+ * Note that contiguous mappings cannot be remapped in this way,
+ * so we should avoid them here.
*/
__create_pgd_mapping(pgd, kernel_start, __phys_to_virt(kernel_start),
kernel_end - kernel_start, PAGE_KERNEL,
- early_pgtable_alloc, 0);
+ early_pgtable_alloc, NO_CONT_MAPPINGS);
}
void __init mark_linear_text_alias_ro(void)
@@ -444,7 +493,8 @@ void mark_rodata_ro(void)
}
static void __init map_kernel_segment(pgd_t *pgd, void *va_start, void *va_end,
- pgprot_t prot, struct vm_struct *vma)
+ pgprot_t prot, struct vm_struct *vma,
+ int flags)
{
phys_addr_t pa_start = __pa_symbol(va_start);
unsigned long size = va_end - va_start;
@@ -453,7 +503,7 @@ static void __init map_kernel_segment(pgd_t *pgd, void *va_start, void *va_end,
BUG_ON(!PAGE_ALIGNED(size));
__create_pgd_mapping(pgd, pa_start, (unsigned long)va_start, size, prot,
- early_pgtable_alloc, 0);
+ early_pgtable_alloc, flags);
vma->addr = va_start;
vma->phys_addr = pa_start;
@@ -485,14 +535,18 @@ static void __init map_kernel(pgd_t *pgd)
*/
pgprot_t text_prot = rodata_enabled ? PAGE_KERNEL_ROX : PAGE_KERNEL_EXEC;
- map_kernel_segment(pgd, _text, _etext, text_prot, &vmlinux_text);
+ /*
+ * Only rodata will be remapped with different permissions later on,
+ * all other segments are allowed to use contiguous mappings.
+ */
+ map_kernel_segment(pgd, _text, _etext, text_prot, &vmlinux_text, 0);
map_kernel_segment(pgd, __start_rodata, __inittext_begin, PAGE_KERNEL,
- &vmlinux_rodata);
+ &vmlinux_rodata, NO_CONT_MAPPINGS);
map_kernel_segment(pgd, __inittext_begin, __inittext_end, text_prot,
- &vmlinux_inittext);
+ &vmlinux_inittext, 0);
map_kernel_segment(pgd, __initdata_begin, __initdata_end, PAGE_KERNEL,
- &vmlinux_initdata);
- map_kernel_segment(pgd, _data, _end, PAGE_KERNEL, &vmlinux_data);
+ &vmlinux_initdata, 0);
+ map_kernel_segment(pgd, _data, _end, PAGE_KERNEL, &vmlinux_data, 0);
if (!pgd_val(*pgd_offset_raw(pgd, FIXADDR_START))) {
/*
--
2.7.4
* [PATCH v5 06/10] arm64/mmu: align alloc_init_pte prototype with pmd/pud versions
From: Mark Rutland @ 2017-03-09 15:53 UTC
To: linux-arm-kernel
On Thu, Mar 09, 2017 at 09:25:08AM +0100, Ard Biesheuvel wrote:
> Align the function prototype of alloc_init_pte() with its pmd and pud
> counterparts by replacing the pfn parameter with the equivalent physical
> address.
>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
> arch/arm64/mm/mmu.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 75e21c33caff..c3963c592ec3 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -107,7 +107,7 @@ static bool pgattr_change_is_safe(u64 old, u64 new)
> }
>
> static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
> - unsigned long end, unsigned long pfn,
> + unsigned long end, phys_addr_t phys,
> pgprot_t prot,
> phys_addr_t (*pgtable_alloc)(void))
> {
> @@ -128,8 +128,8 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
> do {
> pte_t old_pte = *pte;
>
> - set_pte(pte, pfn_pte(pfn, prot));
> - pfn++;
> + set_pte(pte, pfn_pte(__phys_to_pfn(phys), prot));
> + phys += PAGE_SIZE;
Minor nit: so as to align the structure of the loop with the other
functions, it'd be nice to have this on the final line of the loop body.
Either way:
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Mark.
>
> /*
> * After the PTE entry has been populated once, we
> @@ -182,7 +182,7 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
> BUG_ON(!pgattr_change_is_safe(pmd_val(old_pmd),
> pmd_val(*pmd)));
> } else {
> - alloc_init_pte(pmd, addr, next, __phys_to_pfn(phys),
> + alloc_init_pte(pmd, addr, next, phys,
> prot, pgtable_alloc);
>
> BUG_ON(pmd_val(old_pmd) != 0 &&
> --
> 2.7.4
>
* [PATCH v5 07/10] arm64/mmu: ignore debug_pagealloc for kernel segments
From: Mark Rutland @ 2017-03-09 17:51 UTC
To: linux-arm-kernel
On Thu, Mar 09, 2017 at 09:25:09AM +0100, Ard Biesheuvel wrote:
> The debug_pagealloc facility manipulates kernel mappings in the linear
> region at page granularity to detect out of bounds or use-after-free
> accesses. Since the kernel segments are not allocated dynamically,
> there is no point in taking the debug_pagealloc_enabled flag into
> account for them, and we can use block mappings unconditionally.
>
> Note that this applies equally to the linear alias of text/rodata:
> we will never have dynamic allocations there given that the same
> memory is statically in use by the kernel image.
>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
This makes sense to me, and I haven't found anything this breaks.
It may be worth noting that a similar reasoning already applies to the
FDT mapping, where we use create_mapping_noalloc(), and never mandate
page mappings.
Regardless:
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Mark.
> ---
> arch/arm64/mm/mmu.c | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index c3963c592ec3..d3fecd20a136 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -328,8 +328,7 @@ static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
> return;
> }
>
> - __create_pgd_mapping(init_mm.pgd, phys, virt, size, prot,
> - NULL, debug_pagealloc_enabled());
> + __create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL, false);
>
> /* flush the TLBs after updating live kernel mappings */
> flush_tlb_kernel_range(virt, virt + size);
> @@ -381,7 +380,7 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
> */
> __create_pgd_mapping(pgd, kernel_start, __phys_to_virt(kernel_start),
> kernel_end - kernel_start, PAGE_KERNEL,
> - early_pgtable_alloc, debug_pagealloc_enabled());
> + early_pgtable_alloc, false);
> }
>
> void __init mark_linear_text_alias_ro(void)
> @@ -437,7 +436,7 @@ static void __init map_kernel_segment(pgd_t *pgd, void *va_start, void *va_end,
> BUG_ON(!PAGE_ALIGNED(size));
>
> __create_pgd_mapping(pgd, pa_start, (unsigned long)va_start, size, prot,
> - early_pgtable_alloc, debug_pagealloc_enabled());
> + early_pgtable_alloc, false);
>
> vma->addr = va_start;
> vma->phys_addr = pa_start;
> --
> 2.7.4
>
* [PATCH v5 08/10] arm64/mmu: add contiguous bit to sanity bug check
From: Mark Rutland @ 2017-03-09 18:04 UTC
To: linux-arm-kernel
On Thu, Mar 09, 2017 at 09:25:10AM +0100, Ard Biesheuvel wrote:
> A mapping with the contiguous bit cannot be safely manipulated while
> live, regardless of whether the bit changes between the old and new
> mapping. So take this into account when deciding whether the change
> is safe.
>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Strictly speaking, I think this is marginally more stringent than what
the ARM ARM describes. My reading is that the "Misprogramming of the
Contiguous bit" rules only apply when there are multiple valid entries,
and hence if you had a contiguous span with only a single valid entry
(and TLBs up-to-date), you could modify that in-place so long as you
followed the usual BBM rules.
However, I don't see us ever (deliberately) doing that, given it would
require more work, and there's no guarantee that the CPU would consider
the whole span as being mapped. It's also possible my reading of the ARM
ARM is flawed.
Thanks,
Mark.
> ---
> arch/arm64/mm/mmu.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index d3fecd20a136..a6d7a86dd2b8 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -103,7 +103,15 @@ static bool pgattr_change_is_safe(u64 old, u64 new)
> */
> static const pteval_t mask = PTE_PXN | PTE_RDONLY | PTE_WRITE;
>
> - return old == 0 || new == 0 || ((old ^ new) & ~mask) == 0;
> + /* creating or taking down mappings is always safe */
> + if (old == 0 || new == 0)
> + return true;
> +
> + /* live contiguous mappings may not be manipulated at all */
> + if ((old | new) & PTE_CONT)
> + return false;
> +
> + return ((old ^ new) & ~mask) == 0;
> }
>
> static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
> --
> 2.7.4
>
* [PATCH v5 09/10] arm64/mmu: replace 'page_mappings_only' parameter with flags argument
From: Mark Rutland @ 2017-03-09 18:19 UTC
To: linux-arm-kernel
On Thu, Mar 09, 2017 at 09:25:11AM +0100, Ard Biesheuvel wrote:
> In preparation of extending the policy for manipulating kernel mappings
> with whether or not contiguous hints may be used in the page tables,
> replace the bool 'page_mappings_only' with a flags field and a flag
> NO_BLOCK_MAPPINGS.
>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Thanks for attacking this.
I was going to comment on the name change, but I see that the next patch
introduces and uses NO_CONT_MAPPINGS, so that's fine by me.
> void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
> unsigned long virt, phys_addr_t size,
> pgprot_t prot, bool page_mappings_only)
> {
> + int flags;
> +
> BUG_ON(mm == &init_mm);
>
> + if (page_mappings_only)
> + flags = NO_BLOCK_MAPPINGS;
> +
> __create_pgd_mapping(mm->pgd, phys, virt, size, prot,
> - pgd_pgtable_alloc, page_mappings_only);
> + pgd_pgtable_alloc, flags);
> }
Given we can't pass the flags in to create_pgd_mapping() without
exposing those more generally, this also looks fine.
FWIW:
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Mark.
* [PATCH v5 10/10] arm64: mm: set the contiguous bit for kernel mappings where appropriate
From: Mark Rutland @ 2017-03-09 19:33 UTC
To: linux-arm-kernel
On Thu, Mar 09, 2017 at 09:25:12AM +0100, Ard Biesheuvel wrote:
> +static inline u64 pte_cont_addr_end(u64 addr, u64 end)
> +{
> + return min((addr + CONT_PTE_SIZE) & CONT_PTE_MASK, end);
> +}
> +
> +static inline u64 pmd_cont_addr_end(u64 addr, u64 end)
> +{
> + return min((addr + CONT_PMD_SIZE) & CONT_PMD_MASK, end);
> +}
These differ structurally from the usual p??_addr_end() macros defined
in include/asm-generic/pgtable.h. I agree the asm-generic macros aren't
pretty, but it would be nice to be consistent.
I don't think the above handles a partial contiguous span at the end of
the address space (e.g. where end is initially PAGE_SIZE away from 2^64),
whereas the asm-generic form does, AFAICT.
Can we please use:
#define pte_cont_addr_end(addr, end) \
({ unsigned long __boundary = ((addr) + CONT_PTE_SIZE) & CONT_PTE_MASK; \
(__boundary - 1 < (end) - 1)? __boundary: (end); \
})
#define pmd_cont_addr_end(addr, end) \
({ unsigned long __boundary = ((addr) + CONT_PMD_SIZE) & CONT_PMD_MASK; \
(__boundary - 1 < (end) - 1)? __boundary: (end); \
})
... instead?
[...]
> +static void init_pte(pte_t *pte, unsigned long addr, unsigned long end,
> + phys_addr_t phys, pgprot_t prot)
> {
> + do {
> + pte_t old_pte = *pte;
> +
> + set_pte(pte, pfn_pte(__phys_to_pfn(phys), prot));
> +
> + /*
> + * After the PTE entry has been populated once, we
> + * only allow updates to the permission attributes.
> + */
> + BUG_ON(!pgattr_change_is_safe(pte_val(old_pte), pte_val(*pte)));
> +
> + } while (pte++, addr += PAGE_SIZE, phys += PAGE_SIZE, addr != end);
> +}
> +
> +static void alloc_init_cont_pte(pmd_t *pmd, unsigned long addr,
> + unsigned long end, phys_addr_t phys,
> + pgprot_t prot,
> + phys_addr_t (*pgtable_alloc)(void),
> + int flags)
> +{
> + unsigned long next;
> pte_t *pte;
>
> BUG_ON(pmd_sect(*pmd));
> @@ -136,45 +156,30 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
>
> pte = pte_set_fixmap_offset(pmd, addr);
> do {
> - pte_t old_pte = *pte;
> + pgprot_t __prot = prot;
>
> - set_pte(pte, pfn_pte(__phys_to_pfn(phys), prot));
> - phys += PAGE_SIZE;
> + next = pte_cont_addr_end(addr, end);
>
> - /*
> - * After the PTE entry has been populated once, we
> - * only allow updates to the permission attributes.
> - */
> - BUG_ON(!pgattr_change_is_safe(pte_val(old_pte), pte_val(*pte)));
> + /* use a contiguous mapping if the range is suitably aligned */
> + if ((((addr | next | phys) & ~CONT_PTE_MASK) == 0) &&
> + (flags & NO_CONT_MAPPINGS) == 0)
> + __prot = __pgprot(pgprot_val(prot) | PTE_CONT);
>
> - } while (pte++, addr += PAGE_SIZE, addr != end);
> + init_pte(pte, addr, next, phys, __prot);
> +
> + phys += next - addr;
> + pte += (next - addr) / PAGE_SIZE;
> + } while (addr = next, addr != end);
>
> pte_clear_fixmap();
> }
I think it would be preferable to pass the pmd down into
alloc_init_pte(), so that we don't have to mess with the pte in both
alloc_init_cont_pte() and alloc_init_pte().
Likewise for alloc_init_cont_pmd() and alloc_init_pmd(), regarding the
pmd.
I realise we'll redundantly map/unmap the PTE for each contiguous span,
but I doubt there's a case where it has a noticeable impact.
With lots of memory we'll use blocks at a higher level, and for
debug_pagealloc we'll pass the whole pte down to init_pte() as we
currently do.
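
(A rough sketch of that shape, reusing the names from the quoted patch; the
exact signatures are of course up to Ard, so treat this as illustrative only:)

static void init_pte(pmd_t *pmd, unsigned long addr, unsigned long end,
		     phys_addr_t phys, pgprot_t prot)
{
	pte_t *pte = pte_set_fixmap_offset(pmd, addr);

	do {
		pte_t old_pte = *pte;

		set_pte(pte, pfn_pte(__phys_to_pfn(phys), prot));

		/*
		 * After the PTE entry has been populated once, we
		 * only allow updates to the permission attributes.
		 */
		BUG_ON(!pgattr_change_is_safe(pte_val(old_pte), pte_val(*pte)));

		phys += PAGE_SIZE;
	} while (pte++, addr += PAGE_SIZE, addr != end);

	pte_clear_fixmap();
}

The cont-level caller then simply calls init_pte(pmd, addr, next, phys, __prot)
once per span and never has to advance a pte pointer itself.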
[...]
> + if (pud_none(*pud)) {
> + phys_addr_t pmd_phys;
> + BUG_ON(!pgtable_alloc);
> + pmd_phys = pgtable_alloc();
> + pmd = pmd_set_fixmap(pmd_phys);
> + __pud_populate(pud, pmd_phys, PUD_TYPE_TABLE);
> + pmd_clear_fixmap();
> + }
It looks like when the splitting logic was removed, we forgot to remove
the fixmapping here (and for the pmd_none() case). The __p?d_populate
functions don't touch the next level table, so there's no reason to
fixmap it.
Would you mind spinning a patch to rip those out?
[...]
> void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
> unsigned long virt, phys_addr_t size,
> pgprot_t prot, bool page_mappings_only)
> {
> - int flags;
> + int flags = NO_CONT_MAPPINGS;
>
> BUG_ON(mm == &init_mm);
>
> if (page_mappings_only)
> - flags = NO_BLOCK_MAPPINGS;
> + flags |= NO_BLOCK_MAPPINGS;
Why is it never safe to use cont mappings here?
EFI's the only caller of this, and the only case I can see where we need
to avoid contiguous entries is the runtime services data/code, due
to efi_set_mapping_permissions(). We map those with page_mappings_only
set.
I couldn't spot why we'd need to avoid cont entries otherwise.
What am I missing?
Thanks,
Mark.
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v5 10/10] arm64: mm: set the contiguous bit for kernel mappings where appropriate
2017-03-09 19:33 ` Mark Rutland
@ 2017-03-09 19:40 ` Ard Biesheuvel
0 siblings, 0 replies; 17+ messages in thread
From: Ard Biesheuvel @ 2017-03-09 19:40 UTC (permalink / raw)
To: linux-arm-kernel
On 9 March 2017 at 20:33, Mark Rutland <mark.rutland@arm.com> wrote:
> On Thu, Mar 09, 2017 at 09:25:12AM +0100, Ard Biesheuvel wrote:
>> +static inline u64 pte_cont_addr_end(u64 addr, u64 end)
>> +{
>> + return min((addr + CONT_PTE_SIZE) & CONT_PTE_MASK, end);
>> +}
>> +
>> +static inline u64 pmd_cont_addr_end(u64 addr, u64 end)
>> +{
>> + return min((addr + CONT_PMD_SIZE) & CONT_PMD_MASK, end);
>> +}
>
> These differ structurally from the usual p??_addr_end() macros defined
> in include/asm-generic/pgtable.h. I agree the asm-generic macros aren't
> pretty, but it would be nice to be consistent.
>
> I don't think the above handle a partial contiguous span at the end of
> the address space (e.g. where end is just PAGE_SIZE away from 2^64),
> whereas the asm-generic form does, AFAICT.
>
> Can we please use:
>
> #define pte_cont_addr_end(addr, end) \
> ({ unsigned long __boundary = ((addr) + CONT_PTE_SIZE) & CONT_PTE_MASK; \
> (__boundary - 1 < (end) - 1)? __boundary: (end); \
> })
>
> #define pmd_cont_addr_end(addr, end) \
> ({ unsigned long __boundary = ((addr) + CONT_PMD_SIZE) & CONT_PMD_MASK; \
> (__boundary - 1 < (end) - 1)? __boundary: (end); \
> })
>
> ... instead?
>
OK, so that's what the -1 is for. Either version is fine by me.
> [...]
>
>> +static void init_pte(pte_t *pte, unsigned long addr, unsigned long end,
>> + phys_addr_t phys, pgprot_t prot)
>> {
>> + do {
>> + pte_t old_pte = *pte;
>> +
>> + set_pte(pte, pfn_pte(__phys_to_pfn(phys), prot));
>> +
>> + /*
>> + * After the PTE entry has been populated once, we
>> + * only allow updates to the permission attributes.
>> + */
>> + BUG_ON(!pgattr_change_is_safe(pte_val(old_pte), pte_val(*pte)));
>> +
>> + } while (pte++, addr += PAGE_SIZE, phys += PAGE_SIZE, addr != end);
>> +}
>> +
>> +static void alloc_init_cont_pte(pmd_t *pmd, unsigned long addr,
>> + unsigned long end, phys_addr_t phys,
>> + pgprot_t prot,
>> + phys_addr_t (*pgtable_alloc)(void),
>> + int flags)
>> +{
>> + unsigned long next;
>> pte_t *pte;
>>
>> BUG_ON(pmd_sect(*pmd));
>> @@ -136,45 +156,30 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
>>
>> pte = pte_set_fixmap_offset(pmd, addr);
>> do {
>> - pte_t old_pte = *pte;
>> + pgprot_t __prot = prot;
>>
>> - set_pte(pte, pfn_pte(__phys_to_pfn(phys), prot));
>> - phys += PAGE_SIZE;
>> + next = pte_cont_addr_end(addr, end);
>>
>> - /*
>> - * After the PTE entry has been populated once, we
>> - * only allow updates to the permission attributes.
>> - */
>> - BUG_ON(!pgattr_change_is_safe(pte_val(old_pte), pte_val(*pte)));
>> + /* use a contiguous mapping if the range is suitably aligned */
>> + if ((((addr | next | phys) & ~CONT_PTE_MASK) == 0) &&
>> + (flags & NO_CONT_MAPPINGS) == 0)
>> + __prot = __pgprot(pgprot_val(prot) | PTE_CONT);
>>
>> - } while (pte++, addr += PAGE_SIZE, addr != end);
>> + init_pte(pte, addr, next, phys, __prot);
>> +
>> + phys += next - addr;
>> + pte += (next - addr) / PAGE_SIZE;
>> + } while (addr = next, addr != end);
>>
>> pte_clear_fixmap();
>> }
>
> I think it would be preferable to pass the pmd down into
> alloc_init_pte(), so that we don't have to mess with the pte in both
> alloc_init_cont_pte() and alloc_init_pte().
>
> Likewise for alloc_init_cont_pmd() and alloc_init_pmd(), regarding the
> pmd.
>
> I realise we'll redundantly map/unmap the PTE for each contiguous span,
> but I doubt there's a case where it has a noticeable impact.
>
OK
> With lots of memory we'll use blocks at a higher level, and for
> debug_pagealloc we'll pass the whole pte down to init_pte() as we
> currently do.
>
> [...]
>
>> + if (pud_none(*pud)) {
>> + phys_addr_t pmd_phys;
>> + BUG_ON(!pgtable_alloc);
>> + pmd_phys = pgtable_alloc();
>> + pmd = pmd_set_fixmap(pmd_phys);
>> + __pud_populate(pud, pmd_phys, PUD_TYPE_TABLE);
>> + pmd_clear_fixmap();
>> + }
>
> It looks like when the splitting logic was removed, we forgot to remove
> the fixmapping here (and for the pmd_none() case). The __p?d_populate
> functions don't touch the next level table, so there's no reason to
> fixmap it.
>
> Would you mind spinning a patch to rip those out?
>
Ah right, pmd is not even referenced in the __pud_populate invocation.
Yes, I will add a patch before this one to remove that.
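
(For the record, the cleanup amounts to dropping the map/unmap around the
populate call — roughly, and subject to how the actual patch ends up looking:

	if (pud_none(*pud)) {
		phys_addr_t pmd_phys;

		BUG_ON(!pgtable_alloc);
		pmd_phys = pgtable_alloc();
		/* only writes the pud entry; the new table is never touched */
		__pud_populate(pud, pmd_phys, PUD_TYPE_TABLE);
	}

and the same for the pmd_none() case one level down.)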
> [...]
>
>> void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
>> unsigned long virt, phys_addr_t size,
>> pgprot_t prot, bool page_mappings_only)
>> {
>> - int flags;
>> + int flags = NO_CONT_MAPPINGS;
>>
>> BUG_ON(mm == &init_mm);
>>
>> if (page_mappings_only)
>> - flags = NO_BLOCK_MAPPINGS;
>> + flags |= NO_BLOCK_MAPPINGS;
>
> Why is it never safe to use cont mappings here?
>
> EFI's the only caller of this, and the only case I can see where we need
> to avoid contiguous entries is the runtime services data/code, due
> to efi_set_mapping_permissions(). We map those with page_mappings_only
> set.
>
> I couldn't spot why we'd need to avoid cont entries otherwise.
>
> What am I missing?
>
Nothing. It is erring on the side of caution, really, since there is
no performance concern here.
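
(For completeness, the less conservative variant Mark describes would only
rule out contiguous entries together with block mappings, i.e. for the
page_mappings_only regions whose permissions may be changed later. A sketch
of that alternative, not what the series actually does:

	int flags = 0;

	BUG_ON(mm == &init_mm);

	if (page_mappings_only)
		flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;

The series keeps NO_CONT_MAPPINGS unconditionally here, as noted above, since
the EFI mappings are not performance-critical.)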
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2017-03-09 19:40 UTC | newest]
Thread overview: 17+ messages
2017-03-09 8:25 [PATCH v5 00/10] arm64: mmu: avoid W+X mappings and re-enable PTE_CONT for kernel Ard Biesheuvel
2017-03-09 8:25 ` [PATCH v5 01/10] arm: kvm: move kvm_vgic_global_state out of .text section Ard Biesheuvel
2017-03-09 8:25 ` [PATCH v5 02/10] arm64: mmu: move TLB maintenance from callers to create_mapping_late() Ard Biesheuvel
2017-03-09 8:25 ` [PATCH v5 03/10] arm64: alternatives: apply boot time fixups via the linear mapping Ard Biesheuvel
2017-03-09 8:25 ` [PATCH v5 04/10] arm64: mmu: map .text as read-only from the outset Ard Biesheuvel
2017-03-09 8:25 ` [PATCH v5 05/10] arm64: mmu: apply strict permissions to .init.text and .init.data Ard Biesheuvel
2017-03-09 8:25 ` [PATCH v5 06/10] arm64/mmu: align alloc_init_pte prototype with pmd/pud versions Ard Biesheuvel
2017-03-09 15:53 ` Mark Rutland
2017-03-09 8:25 ` [PATCH v5 07/10] arm64/mmu: ignore debug_pagealloc for kernel segments Ard Biesheuvel
2017-03-09 17:51 ` Mark Rutland
2017-03-09 8:25 ` [PATCH v5 08/10] arm64/mmu: add contiguous bit to sanity bug check Ard Biesheuvel
2017-03-09 18:04 ` Mark Rutland
2017-03-09 8:25 ` [PATCH v5 09/10] arm64/mmu: replace 'page_mappings_only' parameter with flags argument Ard Biesheuvel
2017-03-09 18:19 ` Mark Rutland
2017-03-09 8:25 ` [PATCH v5 10/10] arm64: mm: set the contiguous bit for kernel mappings where appropriate Ard Biesheuvel
2017-03-09 19:33 ` Mark Rutland
2017-03-09 19:40 ` Ard Biesheuvel