linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2
@ 2023-11-29 11:15 Ard Biesheuvel
  2023-11-29 11:15 ` [PATCH v6 01/41] arm64: kernel: Disable latent_entropy GCC plugin in early C runtime Ard Biesheuvel
                   ` (41 more replies)
  0 siblings, 42 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:15 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

At the request of Catalin, this series was split off from my LPA2 series
[0] in order to make the changes a bit more manageable.

This series reorganizes the kernel VA space, and refactors/replaces the
early mapping code so that:
- everything is done only once, in the appropriate order;
- everything is done with the MMU and caches enabled (*);
- everything is done from C code (notably, 100s of lines of
  incomprehensible asm code are removed from head.S).

(*) the initial ID map will be populated with the MMU and caches
disabled if that is how we entered from the bootloader.

This is important for LPA2, but also for other future extensions to the
page table format, as managing this entirely in early asm code as we do
today would become intractable. The same applies to things such as
copying the KASAN shadow or the fixmap from the early page tables into
the permanent ones - all of that is removed by this series. This
approach also ensures that we never execute from writable memory, or
parse the DT (which is external input) while the text/rodata segments
are mapped writable; this is an important general hardening principle,
but also a prerequisite for adding WXN support (which is implemented in
the second half of the series, omitted from this drop).

Another notable change implemented by this series is that the permanent
ID map always covers 48 bits of VA space, and is no longer tied to the
size of the kernel VA space. This removes awkward logic to add a
translation level above the PGD level, and will be beneficial for other
reasons too (it permits future changes in the EFI logic to get rid of
SetVirtualAddressMap() entirely).

Changes since v5 [1]:
- add helpers to deal with CPU feature overrides, rather than applying
  the value and mask directly - this is necessary because an override
  may be invalid for the field that we care about or for another field
  that shares the same CPUID system register
- add missing patch to make __cpu_replace_ttbr1() out of line and insert
  it in the correct place in the series to ensure bisectability
- incorporate maz's strict type changes (and more) into the prel64
  handling in the early idreg override code

Changes since v4:
- merge a couple of followup tweaks for issues that were reported while
  the v4 was briefly queued up and pulled into -next
- rebase onto v6.7-rc1
- omit LVA/LPA2 and WXN related changes

[0] https://lore.kernel.org/all/20230912141549.278777-63-ardb@google.com/
[1] https://lore.kernel.org/all/20231124101840.944737-41-ardb@google.com/

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Kees Cook <keescook@chromium.org>

Ard Biesheuvel (41):
  arm64: kernel: Disable latent_entropy GCC plugin in early C runtime
  arm64: mm: Take potential load offset into account when KASLR is off
  arm64: mm: get rid of kimage_vaddr global variable
  arm64: mm: Move PCI I/O emulation region above the vmemmap region
  arm64: mm: Move fixmap region above vmemmap region
  arm64: ptdump: Allow all region boundaries to be defined at boot time
  arm64: ptdump: Discover start of vmemmap region at runtime
  arm64: vmemmap: Avoid base2 order of struct page size to dimension
    region
  arm64: mm: Reclaim unused vmemmap region for vmalloc use
  arm64: kaslr: Adjust randomization range dynamically
  arm64: kernel: Manage absolute relocations in code built under pi/
  arm64: kernel: Don't rely on objcopy to make code under pi/ __init
  arm64: head: move relocation handling to C code
  arm64: idreg-override: Omit non-NULL checks for override pointer
  arm64: idreg-override: Prepare for place relative reloc patching
  arm64: idreg-override: Avoid parameq() and parameqn()
  arm64: idreg-override: avoid strlen() to check for empty strings
  arm64: idreg-override: Avoid sprintf() for simple string concatenation
  arm64: idreg-override: Avoid kstrtou64() to parse a single hex digit
  arm64/kernel: Move 'nokaslr' parsing out of early idreg code
  arm64: idreg-override: Move to early mini C runtime
  arm64: kernel: Remove early fdt remap code
  arm64: head: Clear BSS and the kernel page tables in one go
  arm64: Move feature overrides into the BSS section
  arm64: head: Run feature override detection before mapping the kernel
  arm64: head: move dynamic shadow call stack patching into early C
    runtime
  arm64: cpufeature: Add helper to test for CPU feature overrides
  arm64: kaslr: Use feature override instead of parsing the cmdline
    again
  arm64: idreg-override: Create a pseudo feature for rodata=off
  arm64: Add helpers to probe local CPU for PAC and BTI support
  arm64: head: allocate more pages for the kernel mapping
  arm64: head: move memstart_offset_seed handling to C code
  arm64: mm: Make kaslr_requires_kpti() a static inline
  arm64: mmu: Make __cpu_replace_ttbr1() out of line
  arm64: head: Move early kernel mapping routines into C code
  arm64: mm: Use 48-bit virtual addressing for the permanent ID map
  arm64: pgtable: Decouple PGDIR size macros from PGD/PUD/PMD levels
  arm64: kernel: Create initial ID map from C code
  arm64: mm: avoid fixmap for early swapper_pg_dir updates
  arm64: mm: omit redundant remap of kernel image
  arm64: Revert "mm: provide idmap pointer to cpu_replace_ttbr1()"

 arch/arm64/include/asm/archrandom.h         |   2 -
 arch/arm64/include/asm/assembler.h          |  14 -
 arch/arm64/include/asm/cpufeature.h         |  77 ++++
 arch/arm64/include/asm/fixmap.h             |   1 -
 arch/arm64/include/asm/kasan.h              |   2 -
 arch/arm64/include/asm/kernel-pgtable.h     | 128 +++---
 arch/arm64/include/asm/memory.h             |  20 +-
 arch/arm64/include/asm/mmu.h                |  40 +-
 arch/arm64/include/asm/mmu_context.h        |  53 +--
 arch/arm64/include/asm/pgtable.h            |  10 +-
 arch/arm64/include/asm/scs.h                |  36 +-
 arch/arm64/include/asm/setup.h              |   3 -
 arch/arm64/kernel/Makefile                  |   7 +-
 arch/arm64/kernel/cpufeature.c              |  65 +--
 arch/arm64/kernel/head.S                    | 428 ++------------------
 arch/arm64/kernel/image-vars.h              |  33 ++
 arch/arm64/kernel/kaslr.c                   |  11 +-
 arch/arm64/kernel/module.c                  |   2 +-
 arch/arm64/kernel/pi/Makefile               |  28 +-
 arch/arm64/kernel/{ => pi}/idreg-override.c | 182 +++++----
 arch/arm64/kernel/pi/kaslr_early.c          |  78 +---
 arch/arm64/kernel/pi/map_kernel.c           | 186 +++++++++
 arch/arm64/kernel/pi/map_range.c            | 100 +++++
 arch/arm64/kernel/{ => pi}/patch-scs.c      |  36 +-
 arch/arm64/kernel/pi/pi.h                   |  36 ++
 arch/arm64/kernel/pi/relacheck.c            | 130 ++++++
 arch/arm64/kernel/pi/relocate.c             |  64 +++
 arch/arm64/kernel/setup.c                   |  22 -
 arch/arm64/kernel/vmlinux.lds.S             |  17 +-
 arch/arm64/kvm/mmu.c                        |  15 +-
 arch/arm64/mm/fixmap.c                      |  34 --
 arch/arm64/mm/kasan_init.c                  |  19 +-
 arch/arm64/mm/mmu.c                         | 167 ++++----
 arch/arm64/mm/proc.S                        |  13 +-
 arch/arm64/mm/ptdump.c                      |  56 ++-
 35 files changed, 1070 insertions(+), 1045 deletions(-)
 rename arch/arm64/kernel/{ => pi}/idreg-override.c (59%)
 create mode 100644 arch/arm64/kernel/pi/map_kernel.c
 create mode 100644 arch/arm64/kernel/pi/map_range.c
 rename arch/arm64/kernel/{ => pi}/patch-scs.c (89%)
 create mode 100644 arch/arm64/kernel/pi/pi.h
 create mode 100644 arch/arm64/kernel/pi/relacheck.c
 create mode 100644 arch/arm64/kernel/pi/relocate.c

-- 
2.43.0.rc1.413.gea7ed67945-goog



* [PATCH v6 01/41] arm64: kernel: Disable latent_entropy GCC plugin in early C runtime
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
@ 2023-11-29 11:15 ` Ard Biesheuvel
  2023-11-30  4:44   ` Anshuman Khandual
  2023-11-29 11:15 ` [PATCH v6 02/41] arm64: mm: Take potential load offset into account when KASLR is off Ard Biesheuvel
                   ` (40 subsequent siblings)
  41 siblings, 1 reply; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:15 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

In subsequent patches, portions of the early C code will be marked as
__init.  Unfortunately, __init implies __latent_entropy, and this would
result in the early C code being instrumented in an unsafe manner.

Disable the latent entropy plugin for the early C code.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/pi/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/kernel/pi/Makefile b/arch/arm64/kernel/pi/Makefile
index 4c0ea3cd4ea4..c844a0546d7f 100644
--- a/arch/arm64/kernel/pi/Makefile
+++ b/arch/arm64/kernel/pi/Makefile
@@ -3,6 +3,7 @@
 
 KBUILD_CFLAGS	:= $(subst $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) -fpie \
 		   -Os -DDISABLE_BRANCH_PROFILING $(DISABLE_STACKLEAK_PLUGIN) \
+		   $(DISABLE_LATENT_ENTROPY_PLUGIN) \
 		   $(call cc-option,-mbranch-protection=none) \
 		   -I$(srctree)/scripts/dtc/libfdt -fno-stack-protector \
 		   -include $(srctree)/include/linux/hidden.h \
-- 
2.43.0.rc1.413.gea7ed67945-goog



* [PATCH v6 02/41] arm64: mm: Take potential load offset into account when KASLR is off
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
  2023-11-29 11:15 ` [PATCH v6 01/41] arm64: kernel: Disable latent_entropy GCC plugin in early C runtime Ard Biesheuvel
@ 2023-11-29 11:15 ` Ard Biesheuvel
  2023-11-30  5:23   ` Anshuman Khandual
  2023-12-04 14:12   ` Mark Rutland
  2023-11-29 11:15 ` [PATCH v6 03/41] arm64: mm: get rid of kimage_vaddr global variable Ard Biesheuvel
                   ` (39 subsequent siblings)
  41 siblings, 2 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:15 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

We enable CONFIG_RELOCATABLE even when CONFIG_RANDOMIZE_BASE is
disabled, and this permits the loader (i.e., EFI) to place the kernel
anywhere in physical memory as long as the base address is 64k aligned.

This means that the 'KASLR' case described in the header that defines
the size of the statically allocated page tables could take effect even
when CONFIG_RANDOMIZE_BASE=n. So check for CONFIG_RELOCATABLE instead.
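
To illustrate the worst case with the SPAN_NR_ENTRIES() helper used in
this header (a standalone sketch, not part of the patch; it assumes a
4k granule with a PGDIR shift of 39, and the placement values are made
up): a 64 MiB image whose load offset pushes its end just across a
512 GiB boundary spans two PGDIR entries instead of one, which is what
the extra page per level accounts for.

#include <stdio.h>

#define SPAN_NR_ENTRIES(vstart, vend, shift) \
	((((vend) - 1) >> (shift)) - ((vstart) >> (shift)) + 1)

int main(void)
{
	unsigned long long size   = 64ULL << 20;	/* 64 MiB image */
	unsigned long long pgdir  = 1ULL << 39;		/* PGDIR coverage */
	unsigned long long linked = 16ULL * pgdir;	/* link-time start */
	/* relocated so that the image ends 32 MiB past a PGDIR boundary */
	unsigned long long moved  = linked + pgdir - (32ULL << 20);

	printf("linked:    %llu PGDIR entries\n",
	       SPAN_NR_ENTRIES(linked, linked + size, 39));
	printf("relocated: %llu PGDIR entries\n",
	       SPAN_NR_ENTRIES(moved, moved + size, 39));
	return 0;
}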

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/kernel-pgtable.h | 27 +++++---------------
 1 file changed, 6 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
index 85d26143faa5..83ddb14b95a5 100644
--- a/arch/arm64/include/asm/kernel-pgtable.h
+++ b/arch/arm64/include/asm/kernel-pgtable.h
@@ -37,27 +37,12 @@
 
 
 /*
- * If KASLR is enabled, then an offset K is added to the kernel address
- * space. The bottom 21 bits of this offset are zero to guarantee 2MB
- * alignment for PA and VA.
- *
- * For each pagetable level of the swapper, we know that the shift will
- * be larger than 21 (for the 4KB granule case we use section maps thus
- * the smallest shift is actually 30) thus there is the possibility that
- * KASLR can increase the number of pagetable entries by 1, so we make
- * room for this extra entry.
- *
- * Note KASLR cannot increase the number of required entries for a level
- * by more than one because it increments both the virtual start and end
- * addresses equally (the extra entry comes from the case where the end
- * address is just pushed over a boundary and the start address isn't).
+ * A relocatable kernel may execute from an address that differs from the one at
+ * which it was linked. In the worst case, its runtime placement may intersect
+ * with two adjacent PGDIR entries, which means that an additional page table
+ * may be needed at each subordinate level.
  */
-
-#ifdef CONFIG_RANDOMIZE_BASE
-#define EARLY_KASLR	(1)
-#else
-#define EARLY_KASLR	(0)
-#endif
+#define EXTRA_PAGE	__is_defined(CONFIG_RELOCATABLE)
 
 #define SPAN_NR_ENTRIES(vstart, vend, shift) \
 	((((vend) - 1) >> (shift)) - ((vstart) >> (shift)) + 1)
@@ -83,7 +68,7 @@
 			+ EARLY_PGDS((vstart), (vend), add) 	/* each PGDIR needs a next level page table */	\
 			+ EARLY_PUDS((vstart), (vend), add)	/* each PUD needs a next level page table */	\
 			+ EARLY_PMDS((vstart), (vend), add))	/* each PMD needs a next level page table */
-#define INIT_DIR_SIZE (PAGE_SIZE * EARLY_PAGES(KIMAGE_VADDR, _end, EARLY_KASLR))
+#define INIT_DIR_SIZE (PAGE_SIZE * EARLY_PAGES(KIMAGE_VADDR, _end, EXTRA_PAGE))
 
 /* the initial ID map may need two extra pages if it needs to be extended */
 #if VA_BITS < 48
-- 
2.43.0.rc1.413.gea7ed67945-goog



* [PATCH v6 03/41] arm64: mm: get rid of kimage_vaddr global variable
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
  2023-11-29 11:15 ` [PATCH v6 01/41] arm64: kernel: Disable latent_entropy GCC plugin in early C runtime Ard Biesheuvel
  2023-11-29 11:15 ` [PATCH v6 02/41] arm64: mm: Take potential load offset into account when KASLR is off Ard Biesheuvel
@ 2023-11-29 11:15 ` Ard Biesheuvel
  2023-11-30  5:38   ` Anshuman Khandual
  2023-11-29 11:16 ` [PATCH v6 04/41] arm64: mm: Move PCI I/O emulation region above the vmemmap region Ard Biesheuvel
                   ` (38 subsequent siblings)
  41 siblings, 1 reply; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:15 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

We store the address of _text in kimage_vaddr, but since commit
09e3c22a86f6889d ("arm64: Use a variable to store non-global mappings
decision"), we no longer reference this variable from modules so we no
longer need to export it.

In fact, we don't need it at all so let's just get rid of it.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/memory.h | 6 ++----
 arch/arm64/kernel/head.S        | 2 +-
 arch/arm64/mm/mmu.c             | 3 ---
 3 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index fde4186cc387..b8d726f951ae 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -182,6 +182,7 @@
 #include <linux/types.h>
 #include <asm/boot.h>
 #include <asm/bug.h>
+#include <asm/sections.h>
 
 #if VA_BITS > 48
 extern u64			vabits_actual;
@@ -193,15 +194,12 @@ extern s64			memstart_addr;
 /* PHYS_OFFSET - the physical address of the start of memory. */
 #define PHYS_OFFSET		({ VM_BUG_ON(memstart_addr & 1); memstart_addr; })
 
-/* the virtual base of the kernel image */
-extern u64			kimage_vaddr;
-
 /* the offset between the kernel virtual and physical mappings */
 extern u64			kimage_voffset;
 
 static inline unsigned long kaslr_offset(void)
 {
-	return kimage_vaddr - KIMAGE_VADDR;
+	return (u64)&_text - KIMAGE_VADDR;
 }
 
 #ifdef CONFIG_RANDOMIZE_BASE
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 7b236994f0e1..cab7f91949d8 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -482,7 +482,7 @@ SYM_FUNC_START_LOCAL(__primary_switched)
 
 	str_l	x21, __fdt_pointer, x5		// Save FDT pointer
 
-	ldr_l	x4, kimage_vaddr		// Save the offset between
+	adrp	x4, _text			// Save the offset between
 	sub	x4, x4, x0			// the kernel virtual and
 	str_l	x4, kimage_voffset, x5		// physical mappings
 
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 15f6347d23b6..03c73e9197ac 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -52,9 +52,6 @@ u64 vabits_actual __ro_after_init = VA_BITS_MIN;
 EXPORT_SYMBOL(vabits_actual);
 #endif
 
-u64 kimage_vaddr __ro_after_init = (u64)&_text;
-EXPORT_SYMBOL(kimage_vaddr);
-
 u64 kimage_voffset __ro_after_init;
 EXPORT_SYMBOL(kimage_voffset);
 
-- 
2.43.0.rc1.413.gea7ed67945-goog



* [PATCH v6 04/41] arm64: mm: Move PCI I/O emulation region above the vmemmap region
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (2 preceding siblings ...)
  2023-11-29 11:15 ` [PATCH v6 03/41] arm64: mm: get rid of kimage_vaddr global variable Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-11-30  7:59   ` Anshuman Khandual
  2023-12-11 13:57   ` Mark Rutland
  2023-11-29 11:16 ` [PATCH v6 05/41] arm64: mm: Move fixmap region above " Ard Biesheuvel
                   ` (37 subsequent siblings)
  41 siblings, 2 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Move the PCI I/O region above the vmemmap region in the kernel's VA
space. This will permit us to reclaim the lower part of the vmemmap
region for vmalloc/vmap allocations when running a 52-bit VA capable
build on a 48-bit VA capable system.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/memory.h | 4 ++--
 arch/arm64/mm/ptdump.c          | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index b8d726f951ae..99caeff78e1a 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -49,8 +49,8 @@
 #define MODULES_VSIZE		(SZ_2G)
 #define VMEMMAP_START		(-(UL(1) << (VA_BITS - VMEMMAP_SHIFT)))
 #define VMEMMAP_END		(VMEMMAP_START + VMEMMAP_SIZE)
-#define PCI_IO_END		(VMEMMAP_START - SZ_8M)
-#define PCI_IO_START		(PCI_IO_END - PCI_IO_SIZE)
+#define PCI_IO_START		(VMEMMAP_END + SZ_8M)
+#define PCI_IO_END		(PCI_IO_START + PCI_IO_SIZE)
 #define FIXADDR_TOP		(VMEMMAP_START - SZ_32M)
 
 #if VA_BITS > 48
diff --git a/arch/arm64/mm/ptdump.c b/arch/arm64/mm/ptdump.c
index e305b6593c4e..d1df56d44f8a 100644
--- a/arch/arm64/mm/ptdump.c
+++ b/arch/arm64/mm/ptdump.c
@@ -47,10 +47,10 @@ static struct addr_marker address_markers[] = {
 	{ VMALLOC_END,			"vmalloc() end" },
 	{ FIXADDR_TOT_START,		"Fixmap start" },
 	{ FIXADDR_TOP,			"Fixmap end" },
-	{ PCI_IO_START,			"PCI I/O start" },
-	{ PCI_IO_END,			"PCI I/O end" },
 	{ VMEMMAP_START,		"vmemmap start" },
 	{ VMEMMAP_START + VMEMMAP_SIZE,	"vmemmap end" },
+	{ PCI_IO_START,			"PCI I/O start" },
+	{ PCI_IO_END,			"PCI I/O end" },
 	{ -1,				NULL },
 };
 
-- 
2.43.0.rc1.413.gea7ed67945-goog



* [PATCH v6 05/41] arm64: mm: Move fixmap region above vmemmap region
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (3 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 04/41] arm64: mm: Move PCI I/O emulation region above the vmemmap region Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-12-11 14:23   ` Mark Rutland
  2023-11-29 11:16 ` [PATCH v6 06/41] arm64: ptdump: Allow all region boundaries to be defined at boot time Ard Biesheuvel
                   ` (36 subsequent siblings)
  41 siblings, 1 reply; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Move the fixmap region above the vmemmap region, so that the start of
the vmemmap delineates the end of the region available for vmalloc and
vmap allocations and the randomized placement of the kernel and modules.

In a subsequent patch, we will take advantage of this to reclaim most of
the vmemmap area when running a 52-bit VA capable build with 52-bit
virtual addressing disabled at runtime.

Note that the existing guard region of 256 MiB covers the fixmap and PCI
I/O regions as well, so we can reduce it to 8 MiB, which is what we use
in other places too.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/memory.h  | 2 +-
 arch/arm64/include/asm/pgtable.h | 2 +-
 arch/arm64/mm/ptdump.c           | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 99caeff78e1a..2745bed8ae5b 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -51,7 +51,7 @@
 #define VMEMMAP_END		(VMEMMAP_START + VMEMMAP_SIZE)
 #define PCI_IO_START		(VMEMMAP_END + SZ_8M)
 #define PCI_IO_END		(PCI_IO_START + PCI_IO_SIZE)
-#define FIXADDR_TOP		(VMEMMAP_START - SZ_32M)
+#define FIXADDR_TOP		(-UL(SZ_8M))
 
 #if VA_BITS > 48
 #define VA_BITS_MIN		(48)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index b19a8aee684c..8d30e2787b1f 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -22,7 +22,7 @@
  *	and fixed mappings
  */
 #define VMALLOC_START		(MODULES_END)
-#define VMALLOC_END		(VMEMMAP_START - SZ_256M)
+#define VMALLOC_END		(VMEMMAP_START - SZ_8M)
 
 #define vmemmap			((struct page *)VMEMMAP_START - (memstart_addr >> PAGE_SHIFT))
 
diff --git a/arch/arm64/mm/ptdump.c b/arch/arm64/mm/ptdump.c
index d1df56d44f8a..3958b008f908 100644
--- a/arch/arm64/mm/ptdump.c
+++ b/arch/arm64/mm/ptdump.c
@@ -45,12 +45,12 @@ static struct addr_marker address_markers[] = {
 	{ MODULES_END,			"Modules end" },
 	{ VMALLOC_START,		"vmalloc() area" },
 	{ VMALLOC_END,			"vmalloc() end" },
-	{ FIXADDR_TOT_START,		"Fixmap start" },
-	{ FIXADDR_TOP,			"Fixmap end" },
 	{ VMEMMAP_START,		"vmemmap start" },
 	{ VMEMMAP_START + VMEMMAP_SIZE,	"vmemmap end" },
 	{ PCI_IO_START,			"PCI I/O start" },
 	{ PCI_IO_END,			"PCI I/O end" },
+	{ FIXADDR_TOT_START,		"Fixmap start" },
+	{ FIXADDR_TOP,			"Fixmap end" },
 	{ -1,				NULL },
 };
 
-- 
2.43.0.rc1.413.gea7ed67945-goog



* [PATCH v6 06/41] arm64: ptdump: Allow all region boundaries to be defined at boot time
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (4 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 05/41] arm64: mm: Move fixmap region above " Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-12-11 14:15   ` Mark Rutland
  2023-11-29 11:16 ` [PATCH v6 07/41] arm64: ptdump: Discover start of vmemmap region at runtime Ard Biesheuvel
                   ` (35 subsequent siblings)
  41 siblings, 1 reply; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Rework the way the address_markers array is populated so that we can
tolerate values that are not compile time constants in general, rather
than manually keeping track of the array indexes in question and poking
new values into them. This will be needed for VMALLOC_END, which will
cease to be a compile time constant after a subsequent patch.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/mm/ptdump.c | 54 ++++++++------------
 1 file changed, 22 insertions(+), 32 deletions(-)

diff --git a/arch/arm64/mm/ptdump.c b/arch/arm64/mm/ptdump.c
index 3958b008f908..bfc307890344 100644
--- a/arch/arm64/mm/ptdump.c
+++ b/arch/arm64/mm/ptdump.c
@@ -26,34 +26,6 @@
 #include <asm/ptdump.h>
 
 
-enum address_markers_idx {
-	PAGE_OFFSET_NR = 0,
-	PAGE_END_NR,
-#if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
-	KASAN_START_NR,
-#endif
-};
-
-static struct addr_marker address_markers[] = {
-	{ PAGE_OFFSET,			"Linear Mapping start" },
-	{ 0 /* PAGE_END */,		"Linear Mapping end" },
-#if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
-	{ 0 /* KASAN_SHADOW_START */,	"Kasan shadow start" },
-	{ KASAN_SHADOW_END,		"Kasan shadow end" },
-#endif
-	{ MODULES_VADDR,		"Modules start" },
-	{ MODULES_END,			"Modules end" },
-	{ VMALLOC_START,		"vmalloc() area" },
-	{ VMALLOC_END,			"vmalloc() end" },
-	{ VMEMMAP_START,		"vmemmap start" },
-	{ VMEMMAP_START + VMEMMAP_SIZE,	"vmemmap end" },
-	{ PCI_IO_START,			"PCI I/O start" },
-	{ PCI_IO_END,			"PCI I/O end" },
-	{ FIXADDR_TOT_START,		"Fixmap start" },
-	{ FIXADDR_TOP,			"Fixmap end" },
-	{ -1,				NULL },
-};
-
 #define pt_dump_seq_printf(m, fmt, args...)	\
 ({						\
 	if (m)					\
@@ -339,9 +311,8 @@ static void __init ptdump_initialize(void)
 				pg_level[i].mask |= pg_level[i].bits[j].mask;
 }
 
-static struct ptdump_info kernel_ptdump_info = {
+static struct ptdump_info kernel_ptdump_info __ro_after_init = {
 	.mm		= &init_mm,
-	.markers	= address_markers,
 	.base_addr	= PAGE_OFFSET,
 };
 
@@ -375,10 +346,29 @@ void ptdump_check_wx(void)
 
 static int __init ptdump_init(void)
 {
-	address_markers[PAGE_END_NR].start_address = PAGE_END;
+	struct addr_marker m[] = {
+		{ PAGE_OFFSET,			"Linear Mapping start" },
+		{ PAGE_END,			"Linear Mapping end" },
 #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
-	address_markers[KASAN_START_NR].start_address = KASAN_SHADOW_START;
+		{ KASAN_SHADOW_START,		"Kasan shadow start" },
+		{ KASAN_SHADOW_END,		"Kasan shadow end" },
 #endif
+		{ MODULES_VADDR,		"Modules start" },
+		{ MODULES_END,			"Modules end" },
+		{ VMALLOC_START,		"vmalloc() area" },
+		{ VMALLOC_END,			"vmalloc() end" },
+		{ VMEMMAP_START,		"vmemmap start" },
+		{ VMEMMAP_START + VMEMMAP_SIZE,	"vmemmap end" },
+		{ PCI_IO_START,			"PCI I/O start" },
+		{ PCI_IO_END,			"PCI I/O end" },
+		{ FIXADDR_TOT_START,		"Fixmap start" },
+		{ FIXADDR_TOP,			"Fixmap end" },
+		{ -1,				NULL },
+	};
+	static struct addr_marker address_markers[ARRAY_SIZE(m)] __ro_after_init;
+
+	kernel_ptdump_info.markers = memcpy(address_markers, m, sizeof(m));
+
 	ptdump_initialize();
 	ptdump_debugfs_register(&kernel_ptdump_info, "kernel_page_tables");
 	return 0;
-- 
2.43.0.rc1.413.gea7ed67945-goog



* [PATCH v6 07/41] arm64: ptdump: Discover start of vmemmap region at runtime
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (5 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 06/41] arm64: ptdump: Allow all region boundaries to be defined at boot time Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-11-29 11:16 ` [PATCH v6 08/41] arm64: vmemmap: Avoid base2 order of struct page size to dimension region Ard Biesheuvel
                   ` (34 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

We will soon reclaim the part of the vmemmap region that covers VA space
that is not addressable by the hardware. To avoid confusion, ensure that
the 'vmemmap start' marker points at the start of the region that is
actually being used for the struct page array, rather than the start of
the region we set aside for it at build time.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/mm/ptdump.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/ptdump.c b/arch/arm64/mm/ptdump.c
index bfc307890344..f3fdbf3bb6ad 100644
--- a/arch/arm64/mm/ptdump.c
+++ b/arch/arm64/mm/ptdump.c
@@ -346,6 +346,8 @@ void ptdump_check_wx(void)
 
 static int __init ptdump_init(void)
 {
+	u64 page_offset = _PAGE_OFFSET(vabits_actual);
+	u64 vmemmap_start = (u64)virt_to_page((void *)page_offset);
 	struct addr_marker m[] = {
 		{ PAGE_OFFSET,			"Linear Mapping start" },
 		{ PAGE_END,			"Linear Mapping end" },
@@ -357,7 +359,7 @@ static int __init ptdump_init(void)
 		{ MODULES_END,			"Modules end" },
 		{ VMALLOC_START,		"vmalloc() area" },
 		{ VMALLOC_END,			"vmalloc() end" },
-		{ VMEMMAP_START,		"vmemmap start" },
+		{ vmemmap_start,		"vmemmap start" },
 		{ VMEMMAP_START + VMEMMAP_SIZE,	"vmemmap end" },
 		{ PCI_IO_START,			"PCI I/O start" },
 		{ PCI_IO_END,			"PCI I/O end" },
-- 
2.43.0.rc1.413.gea7ed67945-goog



* [PATCH v6 08/41] arm64: vmemmap: Avoid base2 order of struct page size to dimension region
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (6 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 07/41] arm64: ptdump: Discover start of vmemmap region at runtime Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-12-11 14:35   ` Mark Rutland
  2023-11-29 11:16 ` [PATCH v6 09/41] arm64: mm: Reclaim unused vmemmap region for vmalloc use Ard Biesheuvel
                   ` (33 subsequent siblings)
  41 siblings, 1 reply; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

The placement and size of the vmemmap region in the kernel virtual
address space are currently derived from the base2 order of the size of
a struct page. This makes for nicely aligned constants with lots of
leading 0xf and trailing 0x0 digits, but given that the actual struct
pages are indexed as an ordinary array, the resulting region is severely
overdimensioned when the size of a struct page is just over a power
of 2.

This doesn't matter today, but once we enable 52-bit virtual addressing
for 4k page configurations, the vmemmap region may take up almost half
of the upper VA region with the current struct page upper bound at 64
bytes. And once we enable KMSAN or other features that push the size of
a struct page over 64 bytes, we will run out of VMALLOC space entirely.

So instead, let's derive the region size from the actual size of a
struct page, and place the entire region 1 GB from the top of the VA
space, where it still doesn't share any lower level translation table
entries with the fixmap.
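
For a rough feel of the difference, a standalone sketch with made-up
numbers (the VA range is arbitrary, and the 72-byte struct page is a
hypothetical size just over a power of 2): the old scheme sizes the
region by the base2 order of struct page, the new one by its actual
size.

#include <stdio.h>

int main(void)
{
	unsigned long long range = 1ULL << 47;	/* VA range covered, arbitrary */
	unsigned int page_shift = 12;		/* 4k pages */
	unsigned int struct_page_size = 72;	/* hypothetical */
	unsigned int max_shift = 7;		/* base2 order of 72 */

	/* old scheme: range >> (PAGE_SHIFT - STRUCT_PAGE_MAX_SHIFT) */
	unsigned long long old_size = range >> (page_shift - max_shift);
	/* new scheme: one struct page per page in the range */
	unsigned long long new_size = (range >> page_shift) * struct_page_size;

	printf("base2 order sizing: %llu GiB\n", old_size >> 30);
	printf("exact sizing:       %llu GiB\n", new_size >> 30);
	return 0;
}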

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/memory.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 2745bed8ae5b..b49575a92afc 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -30,8 +30,8 @@
  * keep a constant PAGE_OFFSET and "fallback" to using the higher end
  * of the VMEMMAP where 52-bit support is not available in hardware.
  */
-#define VMEMMAP_SHIFT	(PAGE_SHIFT - STRUCT_PAGE_MAX_SHIFT)
-#define VMEMMAP_SIZE	((_PAGE_END(VA_BITS_MIN) - PAGE_OFFSET) >> VMEMMAP_SHIFT)
+#define VMEMMAP_RANGE	(_PAGE_END(VA_BITS_MIN) - PAGE_OFFSET)
+#define VMEMMAP_SIZE	((VMEMMAP_RANGE >> PAGE_SHIFT) * sizeof(struct page))
 
 /*
  * PAGE_OFFSET - the virtual address of the start of the linear map, at the
@@ -47,8 +47,8 @@
 #define MODULES_END		(MODULES_VADDR + MODULES_VSIZE)
 #define MODULES_VADDR		(_PAGE_END(VA_BITS_MIN))
 #define MODULES_VSIZE		(SZ_2G)
-#define VMEMMAP_START		(-(UL(1) << (VA_BITS - VMEMMAP_SHIFT)))
-#define VMEMMAP_END		(VMEMMAP_START + VMEMMAP_SIZE)
+#define VMEMMAP_START		(VMEMMAP_END - VMEMMAP_SIZE)
+#define VMEMMAP_END		(-UL(SZ_1G))
 #define PCI_IO_START		(VMEMMAP_END + SZ_8M)
 #define PCI_IO_END		(PCI_IO_START + PCI_IO_SIZE)
 #define FIXADDR_TOP		(-UL(SZ_8M))
-- 
2.43.0.rc1.413.gea7ed67945-goog



* [PATCH v6 09/41] arm64: mm: Reclaim unused vmemmap region for vmalloc use
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (7 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 08/41] arm64: vmemmap: Avoid base2 order of struct page size to dimension region Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-11-29 11:16 ` [PATCH v6 10/41] arm64: kaslr: Adjust randomization range dynamically Ard Biesheuvel
                   ` (32 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

The vmemmap array is statically sized based on the maximum supported
size of the virtual address space, but it is located inside the upper VA
region, which is statically sized based on the *minimum* supported size
of the VA space. This doesn't matter much when using 64k pages, which is
the only configuration that currently supports 52-bit virtual
addressing.

However, upcoming LPA2 support will change this picture somewhat, as in
that case, the vmemmap array will take up more than 25% of the upper VA
region when using 4k pages. Given that most of this space is never used
when running on a system that does not support 52-bit virtual
addressing, let's reclaim the unused vmemmap area in that case.
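
For a feel of the numbers involved, an arithmetic sketch only (it
assumes 4k pages, a 64-byte struct page, a VA_BITS=52 build running on
48-bit capable hardware, and mirrors the _PAGE_OFFSET()/_PAGE_END()
definitions from memory.h):

#include <stdio.h>

#define _PAGE_OFFSET(va)	(-(1ULL << (va)))
#define _PAGE_END(va)		(-(1ULL << ((va) - 1)))

int main(void)
{
	unsigned int page_shift = 12, struct_page_size = 64;	/* assumptions */
	unsigned long long page_offset = _PAGE_OFFSET(52);
	unsigned long long vmemmap_range = _PAGE_END(48) - page_offset;
	/* linear VA that is only reachable with 52-bit capable hardware */
	unsigned long long unused_range = _PAGE_OFFSET(48) - page_offset;

	unsigned long long vmemmap_size =
		(vmemmap_range >> page_shift) * struct_page_size;
	unsigned long long reclaimed =
		(unused_range >> page_shift) * struct_page_size;

	printf("vmemmap region:                  %llu GiB\n", vmemmap_size >> 30);
	printf("reclaimed for vmalloc on 48-bit: %llu GiB\n", reclaimed >> 30);
	return 0;
}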

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/pgtable.h | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 8d30e2787b1f..5b5bedfa320e 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -18,11 +18,15 @@
  * VMALLOC range.
  *
  * VMALLOC_START: beginning of the kernel vmalloc space
- * VMALLOC_END: extends to the available space below vmemmap, PCI I/O space
- *	and fixed mappings
+ * VMALLOC_END: extends to the available space below vmemmap
  */
 #define VMALLOC_START		(MODULES_END)
+#if VA_BITS == VA_BITS_MIN
 #define VMALLOC_END		(VMEMMAP_START - SZ_8M)
+#else
+#define VMEMMAP_UNUSED_NPAGES	((_PAGE_OFFSET(vabits_actual) - PAGE_OFFSET) >> PAGE_SHIFT)
+#define VMALLOC_END		(VMEMMAP_START + VMEMMAP_UNUSED_NPAGES * sizeof(struct page) - SZ_8M)
+#endif
 
 #define vmemmap			((struct page *)VMEMMAP_START - (memstart_addr >> PAGE_SHIFT))
 
-- 
2.43.0.rc1.413.gea7ed67945-goog



* [PATCH v6 10/41] arm64: kaslr: Adjust randomization range dynamically
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (8 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 09/41] arm64: mm: Reclaim unused vmemmap region for vmalloc use Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-11-29 11:16 ` [PATCH v6 11/41] arm64: kernel: Manage absolute relocations in code built under pi/ Ard Biesheuvel
                   ` (31 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Currently, we base the KASLR randomization range on a rough estimate of
the available space in the upper VA region: the lower 1/4th has the
module region and the upper 1/4th has the fixmap, vmemmap and PCI I/O
ranges, and so we pick a random location in the remaining space in the
middle.

Once we enable support for 5-level paging with 4k pages, this no longer
works: the vmemmap region, being dimensioned to cover a 52-bit linear
region, takes up so much space in the upper VA region (the size of which
is based on a 48-bit VA space for compatibility with non-LVA hardware)
that the region above the vmalloc region takes up more than a quarter of
the available space.

So instead of a heuristic, let's derive the randomization range from the
actual boundaries of the vmalloc region.
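
The scaling used in the hunk below maps a 64-bit seed onto the middle
half of the span between KIMAGE_VADDR and VMALLOC_END: with range set to
half that span, (range * seed) >> 64 is uniform over [0, range), and
adding range / 2 keeps the result clear of the lower and upper quarters.
A standalone sketch (the range value is made up):

#include <stdio.h>

static unsigned long long scale_seed(unsigned long long range,
				     unsigned long long seed)
{
	/* maps seed uniformly onto [range / 2, 3 * range / 2) */
	return range / 2 +
	       (unsigned long long)(((__uint128_t)range * seed) >> 64);
}

int main(void)
{
	unsigned long long range = 1ULL << 45;	/* hypothetical: half the span */

	printf("min offset: %#llx\n", scale_seed(range, 0));
	printf("max offset: %#llx\n", scale_seed(range, ~0ULL));
	return 0;
}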

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/image-vars.h     |  2 ++
 arch/arm64/kernel/pi/kaslr_early.c | 11 ++++++-----
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index 5e4dc72ab1bd..e931ce078a00 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -36,6 +36,8 @@ PROVIDE(__pi___memcpy			= __pi_memcpy);
 PROVIDE(__pi___memmove			= __pi_memmove);
 PROVIDE(__pi___memset			= __pi_memset);
 
+PROVIDE(__pi_vabits_actual		= vabits_actual);
+
 #ifdef CONFIG_KVM
 
 /*
diff --git a/arch/arm64/kernel/pi/kaslr_early.c b/arch/arm64/kernel/pi/kaslr_early.c
index 17bff6e399e4..b9e0bb4bc6a9 100644
--- a/arch/arm64/kernel/pi/kaslr_early.c
+++ b/arch/arm64/kernel/pi/kaslr_early.c
@@ -14,6 +14,7 @@
 
 #include <asm/archrandom.h>
 #include <asm/memory.h>
+#include <asm/pgtable.h>
 
 /* taken from lib/string.c */
 static char *__strstr(const char *s1, const char *s2)
@@ -87,7 +88,7 @@ static u64 get_kaslr_seed(void *fdt)
 
 asmlinkage u64 kaslr_early_init(void *fdt)
 {
-	u64 seed;
+	u64 seed, range;
 
 	if (is_kaslr_disabled_cmdline(fdt))
 		return 0;
@@ -102,9 +103,9 @@ asmlinkage u64 kaslr_early_init(void *fdt)
 	/*
 	 * OK, so we are proceeding with KASLR enabled. Calculate a suitable
 	 * kernel image offset from the seed. Let's place the kernel in the
-	 * middle half of the VMALLOC area (VA_BITS_MIN - 2), and stay clear of
-	 * the lower and upper quarters to avoid colliding with other
-	 * allocations.
+	 * 'middle' half of the VMALLOC area, and stay clear of the lower and
+	 * upper quarters to avoid colliding with other allocations.
 	 */
-	return BIT(VA_BITS_MIN - 3) + (seed & GENMASK(VA_BITS_MIN - 3, 0));
+	range = (VMALLOC_END - KIMAGE_VADDR) / 2;
+	return range / 2 + (((__uint128_t)range * seed) >> 64);
 }
-- 
2.43.0.rc1.413.gea7ed67945-goog



* [PATCH v6 11/41] arm64: kernel: Manage absolute relocations in code built under pi/
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (9 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 10/41] arm64: kaslr: Adjust randomization range dynamically Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-11-29 12:27   ` Marc Zyngier
  2023-11-29 11:16 ` [PATCH v6 12/41] arm64: kernel: Don't rely on objcopy to make code under pi/ __init Ard Biesheuvel
                   ` (30 subsequent siblings)
  41 siblings, 1 reply; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

The mini C runtime runs before relocations are processed, and so it
cannot rely on statically initialized pointer variables.

Add a check to ensure that such code does not get introduced by
accident, by going over the relocations in each object, identifying the
ones that operate on data sections that are part of the executable
image, and raising an error if any relocations of type R_AARCH64_ABS64
exist. Note that such relocations are permitted in other places (e.g.,
debug sections) and will never occur in compiler generated code sections
when using the small code model, so only check sections that have
SHF_ALLOC set and SHF_EXECINSTR cleared.

To accommodate cases where statically initialized symbol references are
unavoidable, introduce a special case for ELF input data sections that
have ".rodata.prel64" in their names, and in these cases, instead of
rejecting any encountered ABS64 relocations, convert them into PREL64
relocations, which don't require any runtime fixups. Note that the code
in question must still be modified to deal with this, as it needs to
convert the 64-bit signed offsets into absolute addresses before use.
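
As a usage sketch (hypothetical code relying on the PREL64() and
prel64_pointer() helpers added to pi.h below; the struct, symbol and
function names are made up for illustration): a pointer table placed in
.init.rodata.prel64 gets its ABS64 relocations converted to PREL64 by
relacheck, and the consuming code recovers the pointers from the
place-relative offsets.

struct override_desc {
	PREL64(struct arm64_ftr_override, override);
	const char *name;
};

static const struct override_desc descs[] __prel64_initconst = {
	{ .override = &id_aa64mmfr1_override, .name = "mmfr1" },
};

static int __init count_active_overrides(void)
{
	int n = 0;

	for (int i = 0; i < ARRAY_SIZE(descs); i++)
		if (prel64_pointer(descs[i].override))	/* offset -> pointer */
			n++;
	return n;
}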

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/pi/Makefile    |   9 +-
 arch/arm64/kernel/pi/pi.h        |  18 +++
 arch/arm64/kernel/pi/relacheck.c | 130 ++++++++++++++++++++
 3 files changed, 155 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/pi/Makefile b/arch/arm64/kernel/pi/Makefile
index c844a0546d7f..bc32a431fe35 100644
--- a/arch/arm64/kernel/pi/Makefile
+++ b/arch/arm64/kernel/pi/Makefile
@@ -22,11 +22,16 @@ KCSAN_SANITIZE	:= n
 UBSAN_SANITIZE	:= n
 KCOV_INSTRUMENT	:= n
 
+hostprogs	:= relacheck
+
+quiet_cmd_piobjcopy = $(quiet_cmd_objcopy)
+      cmd_piobjcopy = $(cmd_objcopy) && $(obj)/relacheck $(@) $(<)
+
 $(obj)/%.pi.o: OBJCOPYFLAGS := --prefix-symbols=__pi_ \
 			       --remove-section=.note.gnu.property \
 			       --prefix-alloc-sections=.init
-$(obj)/%.pi.o: $(obj)/%.o FORCE
-	$(call if_changed,objcopy)
+$(obj)/%.pi.o: $(obj)/%.o $(obj)/relacheck FORCE
+	$(call if_changed,piobjcopy)
 
 $(obj)/lib-%.o: $(srctree)/lib/%.c FORCE
 	$(call if_changed_rule,cc_o_c)
diff --git a/arch/arm64/kernel/pi/pi.h b/arch/arm64/kernel/pi/pi.h
new file mode 100644
index 000000000000..7c2d9bbf0ff9
--- /dev/null
+++ b/arch/arm64/kernel/pi/pi.h
@@ -0,0 +1,18 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright 2023 Google LLC
+// Author: Ard Biesheuvel <ardb@google.com>
+
+#define __prel64_initconst	__section(".init.rodata.prel64")
+
+#define PREL64(type, name)	union { type *name; prel64_t name ## _prel; }
+
+#define prel64_pointer(__d)	(typeof(__d))prel64_to_pointer(&__d##_prel)
+
+typedef volatile signed long prel64_t;
+
+static inline void *prel64_to_pointer(const prel64_t *offset)
+{
+	if (!*offset)
+		return NULL;
+	return (void *)offset + *offset;
+}
diff --git a/arch/arm64/kernel/pi/relacheck.c b/arch/arm64/kernel/pi/relacheck.c
new file mode 100644
index 000000000000..b0cd4d0d275b
--- /dev/null
+++ b/arch/arm64/kernel/pi/relacheck.c
@@ -0,0 +1,130 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2023 - Google LLC
+ * Author: Ard Biesheuvel <ardb@google.com>
+ */
+
+#include <elf.h>
+#include <fcntl.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+#define HOST_ORDER ELFDATA2LSB
+#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+#define HOST_ORDER ELFDATA2MSB
+#endif
+
+static Elf64_Ehdr *ehdr;
+static Elf64_Shdr *shdr;
+static const char *strtab;
+static bool swap;
+
+static uint64_t swab_elfxword(uint64_t val)
+{
+	return swap ? __builtin_bswap64(val) : val;
+}
+
+static uint32_t swab_elfword(uint32_t val)
+{
+	return swap ? __builtin_bswap32(val) : val;
+}
+
+static uint16_t swab_elfhword(uint16_t val)
+{
+	return swap ? __builtin_bswap16(val) : val;
+}
+
+int main(int argc, char *argv[])
+{
+	struct stat stat;
+	int fd, ret;
+
+	if (argc < 3) {
+		fprintf(stderr, "file arguments missing\n");
+		exit(EXIT_FAILURE);
+	}
+
+	fd = open(argv[1], O_RDWR);
+	if (fd < 0) {
+		fprintf(stderr, "failed to open %s\n", argv[1]);
+		exit(EXIT_FAILURE);
+	}
+
+	ret = fstat(fd, &stat);
+	if (ret < 0) {
+		fprintf(stderr, "failed to stat() %s\n", argv[1]);
+		exit(EXIT_FAILURE);
+	}
+
+	ehdr = mmap(0, stat.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	if (ehdr == MAP_FAILED) {
+		fprintf(stderr, "failed to mmap() %s\n", argv[1]);
+		exit(EXIT_FAILURE);
+	}
+
+	swap = ehdr->e_ident[EI_DATA] != HOST_ORDER;
+	shdr = (void *)ehdr + swab_elfxword(ehdr->e_shoff);
+	strtab = (void *)ehdr +
+		 swab_elfxword(shdr[swab_elfhword(ehdr->e_shstrndx)].sh_offset);
+
+	for (int i = 0; i < swab_elfhword(ehdr->e_shnum); i++) {
+		unsigned long info, flags;
+		bool prel64 = false;
+		Elf64_Rela *rela;
+		int numrela;
+
+		if (swab_elfword(shdr[i].sh_type) != SHT_RELA)
+			continue;
+
+		/* only consider RELA sections operating on data */
+		info = swab_elfword(shdr[i].sh_info);
+		flags = swab_elfxword(shdr[info].sh_flags);
+		if ((flags & (SHF_ALLOC | SHF_EXECINSTR)) != SHF_ALLOC)
+			continue;
+
+		/*
+		 * We generally don't permit ABS64 relocations in the code that
+		 * runs before relocation processing occurs. If statically
+		 * initialized absolute symbol references are unavoidable, they
+		 * may be emitted into a *.rodata.prel64 section and they will
+		 * be converted to place-relative 64-bit references. This
+		 * requires special handling in the referring code.
+		 */
+		if (strstr(strtab + swab_elfword(shdr[info].sh_name),
+			   ".rodata.prel64")) {
+			prel64 = true;
+		}
+
+		rela = (void *)ehdr + swab_elfxword(shdr[i].sh_offset);
+		numrela = swab_elfxword(shdr[i].sh_size) / sizeof(*rela);
+
+		for (int j = 0; j < numrela; j++) {
+			uint64_t info = swab_elfxword(rela[j].r_info);
+
+			if (ELF64_R_TYPE(info) != R_AARCH64_ABS64)
+				continue;
+
+			if (prel64) {
+				/* convert ABS64 into PREL64 */
+				info ^= R_AARCH64_ABS64 ^ R_AARCH64_PREL64;
+				rela[j].r_info = swab_elfxword(info);
+			} else {
+				fprintf(stderr,
+					"Unexpected absolute relocations detected in %s\n",
+					argv[2]);
+				close(fd);
+				unlink(argv[1]);
+				exit(EXIT_FAILURE);
+			}
+		}
+	}
+	close(fd);
+	return 0;
+}
-- 
2.43.0.rc1.413.gea7ed67945-goog



* [PATCH v6 12/41] arm64: kernel: Don't rely on objcopy to make code under pi/ __init
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (10 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 11/41] arm64: kernel: Manage absolute relocations in code built under pi/ Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-11-29 11:16 ` [PATCH v6 13/41] arm64: head: move relocation handling to C code Ard Biesheuvel
                   ` (29 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

We will add some code under pi/ that contains global variables that
should not end up in __initdata, as they will not be writable via the
initial ID map. So only rely on objcopy for making the libfdt code
__init, and use explicit annotations for the rest.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/pi/Makefile      |  6 ++++--
 arch/arm64/kernel/pi/kaslr_early.c | 16 +++++++++-------
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kernel/pi/Makefile b/arch/arm64/kernel/pi/Makefile
index bc32a431fe35..2bbe866417d4 100644
--- a/arch/arm64/kernel/pi/Makefile
+++ b/arch/arm64/kernel/pi/Makefile
@@ -28,11 +28,13 @@ quiet_cmd_piobjcopy = $(quiet_cmd_objcopy)
       cmd_piobjcopy = $(cmd_objcopy) && $(obj)/relacheck $(@) $(<)
 
 $(obj)/%.pi.o: OBJCOPYFLAGS := --prefix-symbols=__pi_ \
-			       --remove-section=.note.gnu.property \
-			       --prefix-alloc-sections=.init
+			       --remove-section=.note.gnu.property
 $(obj)/%.pi.o: $(obj)/%.o $(obj)/relacheck FORCE
 	$(call if_changed,piobjcopy)
 
+# ensure that all the lib- code ends up as __init code and data
+$(obj)/lib-%.pi.o: OBJCOPYFLAGS += --prefix-alloc-sections=.init
+
 $(obj)/lib-%.o: $(srctree)/lib/%.c FORCE
 	$(call if_changed_rule,cc_o_c)
 
diff --git a/arch/arm64/kernel/pi/kaslr_early.c b/arch/arm64/kernel/pi/kaslr_early.c
index b9e0bb4bc6a9..167081b30a15 100644
--- a/arch/arm64/kernel/pi/kaslr_early.c
+++ b/arch/arm64/kernel/pi/kaslr_early.c
@@ -17,7 +17,7 @@
 #include <asm/pgtable.h>
 
 /* taken from lib/string.c */
-static char *__strstr(const char *s1, const char *s2)
+static char *__init __strstr(const char *s1, const char *s2)
 {
 	size_t l1, l2;
 
@@ -33,7 +33,7 @@ static char *__strstr(const char *s1, const char *s2)
 	}
 	return NULL;
 }
-static bool cmdline_contains_nokaslr(const u8 *cmdline)
+static bool __init cmdline_contains_nokaslr(const u8 *cmdline)
 {
 	const u8 *str;
 
@@ -41,7 +41,7 @@ static bool cmdline_contains_nokaslr(const u8 *cmdline)
 	return str == cmdline || (str > cmdline && *(str - 1) == ' ');
 }
 
-static bool is_kaslr_disabled_cmdline(void *fdt)
+static bool __init is_kaslr_disabled_cmdline(void *fdt)
 {
 	if (!IS_ENABLED(CONFIG_CMDLINE_FORCE)) {
 		int node;
@@ -67,17 +67,19 @@ static bool is_kaslr_disabled_cmdline(void *fdt)
 	return cmdline_contains_nokaslr(CONFIG_CMDLINE);
 }
 
-static u64 get_kaslr_seed(void *fdt)
+static u64 __init get_kaslr_seed(void *fdt)
 {
+	static char const chosen_str[] __initconst = "chosen";
+	static char const seed_str[] __initconst = "kaslr-seed";
 	int node, len;
 	fdt64_t *prop;
 	u64 ret;
 
-	node = fdt_path_offset(fdt, "/chosen");
+	node = fdt_path_offset(fdt, chosen_str);
 	if (node < 0)
 		return 0;
 
-	prop = fdt_getprop_w(fdt, node, "kaslr-seed", &len);
+	prop = fdt_getprop_w(fdt, node, seed_str, &len);
 	if (!prop || len != sizeof(u64))
 		return 0;
 
@@ -86,7 +88,7 @@ static u64 get_kaslr_seed(void *fdt)
 	return ret;
 }
 
-asmlinkage u64 kaslr_early_init(void *fdt)
+asmlinkage u64 __init kaslr_early_init(void *fdt)
 {
 	u64 seed, range;
 
-- 
2.43.0.rc1.413.gea7ed67945-goog



* [PATCH v6 13/41] arm64: head: move relocation handling to C code
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (11 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 12/41] arm64: kernel: Don't rely on objcopy to make code under pi/ __init Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-11-29 11:16 ` [PATCH v6 14/41] arm64: idreg-override: Omit non-NULL checks for override pointer Ard Biesheuvel
                   ` (28 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Now that we have a mini C runtime before the kernel mapping is up, we
can move the non-trivial relocation processing code out of head.S and
reimplement it in C.
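
For reference, the core of the RELA processing that moves out of head.S
boils down to the loop below. This is a simplified sketch, not the
actual relocate.c added by this patch: the real file also handles the
RELR format described in the removed assembly comment, and the
__rela_start/__rela_end bounds are assumed to come from the linker
script as before.

#include <linux/elf.h>
#include <linux/init.h>
#include <linux/types.h>

#ifndef R_AARCH64_RELATIVE
#define R_AARCH64_RELATIVE	1027	/* per the AArch64 ELF ABI */
#endif

extern const Elf64_Rela __rela_start[], __rela_end[];

static void __init apply_rela(u64 offset)
{
	for (const Elf64_Rela *r = __rela_start; r < __rela_end; r++) {
		if (ELF64_R_TYPE(r->r_info) != R_AARCH64_RELATIVE)
			continue;
		/* B + A: patch the target word with its relocated value */
		*(u64 *)(r->r_offset + offset) = r->r_addend + offset;
	}
}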

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/Makefile      |   3 +-
 arch/arm64/kernel/head.S        | 104 ++------------------
 arch/arm64/kernel/pi/Makefile   |   5 +-
 arch/arm64/kernel/pi/relocate.c |  62 ++++++++++++
 arch/arm64/kernel/vmlinux.lds.S |  12 ++-
 5 files changed, 82 insertions(+), 104 deletions(-)

diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index d95b3d6b471a..a8487749cabd 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -57,7 +57,8 @@ obj-$(CONFIG_ACPI)			+= acpi.o
 obj-$(CONFIG_ACPI_NUMA)			+= acpi_numa.o
 obj-$(CONFIG_ARM64_ACPI_PARKING_PROTOCOL)	+= acpi_parking_protocol.o
 obj-$(CONFIG_PARAVIRT)			+= paravirt.o
-obj-$(CONFIG_RANDOMIZE_BASE)		+= kaslr.o pi/
+obj-$(CONFIG_RELOCATABLE)		+= pi/
+obj-$(CONFIG_RANDOMIZE_BASE)		+= kaslr.o
 obj-$(CONFIG_HIBERNATION)		+= hibernate.o hibernate-asm.o
 obj-$(CONFIG_ELF_CORE)			+= elfcore.o
 obj-$(CONFIG_KEXEC_CORE)		+= machine_kexec.o relocate_kernel.o	\
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index cab7f91949d8..a8fa64fc30d7 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -81,7 +81,7 @@
 	 *  x20        primary_entry() .. __primary_switch()    CPU boot mode
 	 *  x21        primary_entry() .. start_kernel()        FDT pointer passed at boot in x0
 	 *  x22        create_idmap() .. start_kernel()         ID map VA of the DT blob
-	 *  x23        primary_entry() .. start_kernel()        physical misalignment/KASLR offset
+	 *  x23        __primary_switch()                       physical misalignment/KASLR offset
 	 *  x24        __primary_switch()                       linear map KASLR seed
 	 *  x25        primary_entry() .. start_kernel()        supported VA size
 	 *  x28        create_idmap()                           callee preserved temp register
@@ -389,7 +389,7 @@ SYM_FUNC_START_LOCAL(create_idmap)
 	/* Remap the kernel page tables r/w in the ID map */
 	adrp	x1, _text
 	adrp	x2, init_pg_dir
-	adrp	x3, init_pg_end
+	adrp	x3, _end
 	bic	x4, x2, #SWAPPER_BLOCK_SIZE - 1
 	mov_q	x5, SWAPPER_RW_MMUFLAGS
 	mov	x6, #SWAPPER_BLOCK_SHIFT
@@ -779,97 +779,6 @@ SYM_FUNC_START_LOCAL(__no_granule_support)
 	b	1b
 SYM_FUNC_END(__no_granule_support)
 
-#ifdef CONFIG_RELOCATABLE
-SYM_FUNC_START_LOCAL(__relocate_kernel)
-	/*
-	 * Iterate over each entry in the relocation table, and apply the
-	 * relocations in place.
-	 */
-	adr_l	x9, __rela_start
-	adr_l	x10, __rela_end
-	mov_q	x11, KIMAGE_VADDR		// default virtual offset
-	add	x11, x11, x23			// actual virtual offset
-
-0:	cmp	x9, x10
-	b.hs	1f
-	ldp	x12, x13, [x9], #24
-	ldr	x14, [x9, #-8]
-	cmp	w13, #R_AARCH64_RELATIVE
-	b.ne	0b
-	add	x14, x14, x23			// relocate
-	str	x14, [x12, x23]
-	b	0b
-
-1:
-#ifdef CONFIG_RELR
-	/*
-	 * Apply RELR relocations.
-	 *
-	 * RELR is a compressed format for storing relative relocations. The
-	 * encoded sequence of entries looks like:
-	 * [ AAAAAAAA BBBBBBB1 BBBBBBB1 ... AAAAAAAA BBBBBB1 ... ]
-	 *
-	 * i.e. start with an address, followed by any number of bitmaps. The
-	 * address entry encodes 1 relocation. The subsequent bitmap entries
-	 * encode up to 63 relocations each, at subsequent offsets following
-	 * the last address entry.
-	 *
-	 * The bitmap entries must have 1 in the least significant bit. The
-	 * assumption here is that an address cannot have 1 in lsb. Odd
-	 * addresses are not supported. Any odd addresses are stored in the RELA
-	 * section, which is handled above.
-	 *
-	 * Excluding the least significant bit in the bitmap, each non-zero
-	 * bit in the bitmap represents a relocation to be applied to
-	 * a corresponding machine word that follows the base address
-	 * word. The second least significant bit represents the machine
-	 * word immediately following the initial address, and each bit
-	 * that follows represents the next word, in linear order. As such,
-	 * a single bitmap can encode up to 63 relocations in a 64-bit object.
-	 *
-	 * In this implementation we store the address of the next RELR table
-	 * entry in x9, the address being relocated by the current address or
-	 * bitmap entry in x13 and the address being relocated by the current
-	 * bit in x14.
-	 */
-	adr_l	x9, __relr_start
-	adr_l	x10, __relr_end
-
-2:	cmp	x9, x10
-	b.hs	7f
-	ldr	x11, [x9], #8
-	tbnz	x11, #0, 3f			// branch to handle bitmaps
-	add	x13, x11, x23
-	ldr	x12, [x13]			// relocate address entry
-	add	x12, x12, x23
-	str	x12, [x13], #8			// adjust to start of bitmap
-	b	2b
-
-3:	mov	x14, x13
-4:	lsr	x11, x11, #1
-	cbz	x11, 6f
-	tbz	x11, #0, 5f			// skip bit if not set
-	ldr	x12, [x14]			// relocate bit
-	add	x12, x12, x23
-	str	x12, [x14]
-
-5:	add	x14, x14, #8			// move to next bit's address
-	b	4b
-
-6:	/*
-	 * Move to the next bitmap's address. 8 is the word size, and 63 is the
-	 * number of significant bits in a bitmap entry.
-	 */
-	add	x13, x13, #(8 * 63)
-	b	2b
-
-7:
-#endif
-	ret
-
-SYM_FUNC_END(__relocate_kernel)
-#endif
-
 SYM_FUNC_START_LOCAL(__primary_switch)
 	adrp	x1, reserved_pg_dir
 	adrp	x2, init_idmap_pg_dir
@@ -877,11 +786,11 @@ SYM_FUNC_START_LOCAL(__primary_switch)
 #ifdef CONFIG_RELOCATABLE
 	adrp	x23, KERNEL_START
 	and	x23, x23, MIN_KIMG_ALIGN - 1
-#ifdef CONFIG_RANDOMIZE_BASE
-	mov	x0, x22
-	adrp	x1, init_pg_end
+	adrp	x1, early_init_stack
 	mov	sp, x1
 	mov	x29, xzr
+#ifdef CONFIG_RANDOMIZE_BASE
+	mov	x0, x22
 	bl	__pi_kaslr_early_init
 	and	x24, x0, #SZ_2M - 1		// capture memstart offset seed
 	bic	x0, x0, #SZ_2M - 1
@@ -894,7 +803,8 @@ SYM_FUNC_START_LOCAL(__primary_switch)
 	adrp	x1, init_pg_dir
 	load_ttbr1 x1, x1, x2
 #ifdef CONFIG_RELOCATABLE
-	bl	__relocate_kernel
+	mov	x0, x23
+	bl	__pi_relocate_kernel
 #endif
 	ldr	x8, =__primary_switched
 	adrp	x0, KERNEL_START		// __pa(KERNEL_START)
diff --git a/arch/arm64/kernel/pi/Makefile b/arch/arm64/kernel/pi/Makefile
index 2bbe866417d4..d084c1dcf416 100644
--- a/arch/arm64/kernel/pi/Makefile
+++ b/arch/arm64/kernel/pi/Makefile
@@ -38,5 +38,6 @@ $(obj)/lib-%.pi.o: OBJCOPYFLAGS += --prefix-alloc-sections=.init
 $(obj)/lib-%.o: $(srctree)/lib/%.c FORCE
 	$(call if_changed_rule,cc_o_c)
 
-obj-y		:= kaslr_early.pi.o lib-fdt.pi.o lib-fdt_ro.pi.o
-extra-y		:= $(patsubst %.pi.o,%.o,$(obj-y))
+obj-y				:= relocate.pi.o
+obj-$(CONFIG_RANDOMIZE_BASE)	+= kaslr_early.pi.o lib-fdt.pi.o lib-fdt_ro.pi.o
+extra-y				:= $(patsubst %.pi.o,%.o,$(obj-y))
diff --git a/arch/arm64/kernel/pi/relocate.c b/arch/arm64/kernel/pi/relocate.c
new file mode 100644
index 000000000000..1853408ea76b
--- /dev/null
+++ b/arch/arm64/kernel/pi/relocate.c
@@ -0,0 +1,62 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright 2023 Google LLC
+// Authors: Ard Biesheuvel <ardb@google.com>
+//          Peter Collingbourne <pcc@google.com>
+
+#include <linux/elf.h>
+#include <linux/init.h>
+#include <linux/types.h>
+
+extern const Elf64_Rela rela_start[], rela_end[];
+extern const u64 relr_start[], relr_end[];
+
+void __init relocate_kernel(u64 offset)
+{
+	u64 *place = NULL;
+
+	for (const Elf64_Rela *rela = rela_start; rela < rela_end; rela++) {
+		if (ELF64_R_TYPE(rela->r_info) != R_AARCH64_RELATIVE)
+			continue;
+		*(u64 *)(rela->r_offset + offset) = rela->r_addend + offset;
+	}
+
+	if (!IS_ENABLED(CONFIG_RELR) || !offset)
+		return;
+
+	/*
+	 * Apply RELR relocations.
+	 *
+	 * RELR is a compressed format for storing relative relocations. The
+	 * encoded sequence of entries looks like:
+	 * [ AAAAAAAA BBBBBBB1 BBBBBBB1 ... AAAAAAAA BBBBBB1 ... ]
+	 *
+	 * i.e. start with an address, followed by any number of bitmaps. The
+	 * address entry encodes 1 relocation. The subsequent bitmap entries
+	 * encode up to 63 relocations each, at subsequent offsets following
+	 * the last address entry.
+	 *
+	 * The bitmap entries must have 1 in the least significant bit. The
+	 * assumption here is that an address cannot have 1 in lsb. Odd
+	 * addresses are not supported. Any odd addresses are stored in the
+	 * RELA section, which is handled above.
+	 *
+	 * With the exception of the least significant bit, each bit in the
+	 * bitmap corresponds with a machine word that follows the base address
+	 * word, and the bit value indicates whether or not a relocation needs
+	 * to be applied to it. The second least significant bit represents the
+	 * machine word immediately following the initial address, and each bit
+	 * that follows represents the next word, in linear order. As such, a
+	 * single bitmap can encode up to 63 relocations in a 64-bit object.
+	 */
+	for (const u64 *relr = relr_start; relr < relr_end; relr++) {
+		if ((*relr & 1) == 0) {
+			place = (u64 *)(*relr + offset);
+			*place++ += offset;
+		} else {
+			for (u64 *p = place, r = *relr >> 1; r; p++, r >>= 1)
+				if (r & 1)
+					*p += offset;
+			place += 63;
+		}
+	}
+}
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 3cd7e76cc562..8dd5dda66f7c 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -270,15 +270,15 @@ SECTIONS
 	HYPERVISOR_RELOC_SECTION
 
 	.rela.dyn : ALIGN(8) {
-		__rela_start = .;
+		__pi_rela_start = .;
 		*(.rela .rela*)
-		__rela_end = .;
+		__pi_rela_end = .;
 	}
 
 	.relr.dyn : ALIGN(8) {
-		__relr_start = .;
+		__pi_relr_start = .;
 		*(.relr.dyn)
-		__relr_end = .;
+		__pi_relr_end = .;
 	}
 
 	. = ALIGN(SEGMENT_ALIGN);
@@ -317,6 +317,10 @@ SECTIONS
 	init_pg_dir = .;
 	. += INIT_DIR_SIZE;
 	init_pg_end = .;
+#ifdef CONFIG_RELOCATABLE
+	. += SZ_4K;		/* stack for the early relocation code */
+	early_init_stack = .;
+#endif
 
 	. = ALIGN(SEGMENT_ALIGN);
 	__pecoff_data_size = ABSOLUTE(. - __initdata_begin);
-- 
2.43.0.rc1.413.gea7ed67945-goog



* [PATCH v6 14/41] arm64: idreg-override: Omit non-NULL checks for override pointer
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (12 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 13/41] arm64: head: move relocation handling to C code Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-11-29 11:16 ` [PATCH v6 15/41] arm64: idreg-override: Prepare for place relative reloc patching Ard Biesheuvel
                   ` (27 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Now that override pointers are always set, we can drop the various
non-NULL checks that we have in the code.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/idreg-override.c | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c
index 3addc09f8746..536bc33859bc 100644
--- a/arch/arm64/kernel/idreg-override.c
+++ b/arch/arm64/kernel/idreg-override.c
@@ -216,9 +216,6 @@ static void __init match_options(const char *cmdline)
 	for (i = 0; i < ARRAY_SIZE(regs); i++) {
 		int f;
 
-		if (!regs[i]->override)
-			continue;
-
 		for (f = 0; strlen(regs[i]->fields[f].name); f++) {
 			u64 shift = regs[i]->fields[f].shift;
 			u64 width = regs[i]->fields[f].width ?: 4;
@@ -319,10 +316,8 @@ asmlinkage void __init init_feature_override(u64 boot_status)
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(regs); i++) {
-		if (regs[i]->override) {
-			regs[i]->override->val  = 0;
-			regs[i]->override->mask = 0;
-		}
+		regs[i]->override->val  = 0;
+		regs[i]->override->mask = 0;
 	}
 
 	__boot_status = boot_status;
@@ -330,9 +325,8 @@ asmlinkage void __init init_feature_override(u64 boot_status)
 	parse_cmdline();
 
 	for (i = 0; i < ARRAY_SIZE(regs); i++) {
-		if (regs[i]->override)
-			dcache_clean_inval_poc((unsigned long)regs[i]->override,
-					    (unsigned long)regs[i]->override +
-					    sizeof(*regs[i]->override));
+		dcache_clean_inval_poc((unsigned long)regs[i]->override,
+				       (unsigned long)regs[i]->override +
+				       sizeof(*regs[i]->override));
 	}
 }
-- 
2.43.0.rc1.413.gea7ed67945-goog



* [PATCH v6 15/41] arm64: idreg-override: Prepare for place relative reloc patching
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (13 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 14/41] arm64: idreg-override: Omit non-NULL checks for override pointer Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-11-29 11:16 ` [PATCH v6 16/41] arm64: idreg-override: Avoid parameq() and parameqn() Ard Biesheuvel
                   ` (26 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

The ID reg override handling code uses a rather elaborate data structure
that relies on statically initialized absolute address values in pointer
fields. This means that this code cannot run until relocation fixups
have been applied, and this is unfortunate, because it means we cannot
discover overrides for KASLR or LVA/LPA without creating the kernel
mapping and performing the relocations first.

This can be solved by switching to place-relative relocations, which can
be applied by the linker at build time. This means some additional
arithmetic is required when dereferencing these pointers, as we can no
longer dereference the pointer members directly.

So let's implement this for idreg-override.c in a preliminary way, i.e.,
convert all the references in code to use a special accessor that
produces the correct absolute value at runtime.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
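
Not part of the patch: a minimal userspace sketch of the place-relative
pointer idea, with made-up names. In the real PREL64()/prel64_pointer()
implementation (introduced once this code moves under pi/), the linker emits
the place-relative value at build time; here it is computed by hand just to
show why an accessor is needed when dereferencing.

#include <stdint.h>
#include <stdio.h>

/* a 'pointer' stored as the signed distance from its own location */
typedef int64_t prel64_t;

#define prel64_set(p, target) \
        (*(p) = (prel64_t)((uintptr_t)(target) - (uintptr_t)(p)))

static void *prel64_deref(const prel64_t *p)
{
        return (void *)((uintptr_t)p + *p);
}

static int some_override;       /* stand-in for a struct arm64_ftr_override */
static prel64_t override_ref;   /* place-relative reference to it */

int main(void)
{
        /* done by the linker at build time in the real implementation */
        prel64_set(&override_ref, &some_override);

        /* the stored value needs no fixup, but dereferencing needs arithmetic */
        *(int *)prel64_deref(&override_ref) = 42;
        printf("%d\n", some_override);  /* prints 42 */
        return 0;
}

Because the stored quantity is a distance rather than an address, it is
correct regardless of where the image is loaded, which is what allows this
code to run before any relocation processing.
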
 arch/arm64/kernel/idreg-override.c | 89 ++++++++++++--------
 1 file changed, 56 insertions(+), 33 deletions(-)

diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c
index 536bc33859bc..ca1b8d2dbe99 100644
--- a/arch/arm64/kernel/idreg-override.c
+++ b/arch/arm64/kernel/idreg-override.c
@@ -21,14 +21,25 @@
 
 static u64 __boot_status __initdata;
 
+// temporary __prel64 related definitions
+// to be removed when this code is moved under pi/
+
+#define __prel64_initconst	__initconst
+
+#define PREL64(type, name)	union { type *name; }
+
+#define prel64_pointer(__d)	(__d)
+
+typedef bool filter_t(u64 val);
+
 struct ftr_set_desc {
 	char 				name[FTR_DESC_NAME_LEN];
-	struct arm64_ftr_override	*override;
+	PREL64(struct arm64_ftr_override, override);
 	struct {
 		char			name[FTR_DESC_FIELD_LEN];
 		u8			shift;
 		u8			width;
-		bool			(*filter)(u64 val);
+		PREL64(filter_t,	filter);
 	} 				fields[];
 };
 
@@ -46,7 +57,7 @@ static bool __init mmfr1_vh_filter(u64 val)
 		 val == 0);
 }
 
-static const struct ftr_set_desc mmfr1 __initconst = {
+static const struct ftr_set_desc mmfr1 __prel64_initconst = {
 	.name		= "id_aa64mmfr1",
 	.override	= &id_aa64mmfr1_override,
 	.fields		= {
@@ -70,7 +81,7 @@ static bool __init pfr0_sve_filter(u64 val)
 	return true;
 }
 
-static const struct ftr_set_desc pfr0 __initconst = {
+static const struct ftr_set_desc pfr0 __prel64_initconst = {
 	.name		= "id_aa64pfr0",
 	.override	= &id_aa64pfr0_override,
 	.fields		= {
@@ -94,7 +105,7 @@ static bool __init pfr1_sme_filter(u64 val)
 	return true;
 }
 
-static const struct ftr_set_desc pfr1 __initconst = {
+static const struct ftr_set_desc pfr1 __prel64_initconst = {
 	.name		= "id_aa64pfr1",
 	.override	= &id_aa64pfr1_override,
 	.fields		= {
@@ -105,7 +116,7 @@ static const struct ftr_set_desc pfr1 __initconst = {
 	},
 };
 
-static const struct ftr_set_desc isar1 __initconst = {
+static const struct ftr_set_desc isar1 __prel64_initconst = {
 	.name		= "id_aa64isar1",
 	.override	= &id_aa64isar1_override,
 	.fields		= {
@@ -117,7 +128,7 @@ static const struct ftr_set_desc isar1 __initconst = {
 	},
 };
 
-static const struct ftr_set_desc isar2 __initconst = {
+static const struct ftr_set_desc isar2 __prel64_initconst = {
 	.name		= "id_aa64isar2",
 	.override	= &id_aa64isar2_override,
 	.fields		= {
@@ -128,7 +139,7 @@ static const struct ftr_set_desc isar2 __initconst = {
 	},
 };
 
-static const struct ftr_set_desc smfr0 __initconst = {
+static const struct ftr_set_desc smfr0 __prel64_initconst = {
 	.name		= "id_aa64smfr0",
 	.override	= &id_aa64smfr0_override,
 	.fields		= {
@@ -149,7 +160,7 @@ static bool __init hvhe_filter(u64 val)
 						     ID_AA64MMFR1_EL1_VH_SHIFT));
 }
 
-static const struct ftr_set_desc sw_features __initconst = {
+static const struct ftr_set_desc sw_features __prel64_initconst = {
 	.name		= "arm64_sw",
 	.override	= &arm64_sw_feature_override,
 	.fields		= {
@@ -159,14 +170,15 @@ static const struct ftr_set_desc sw_features __initconst = {
 	},
 };
 
-static const struct ftr_set_desc * const regs[] __initconst = {
-	&mmfr1,
-	&pfr0,
-	&pfr1,
-	&isar1,
-	&isar2,
-	&smfr0,
-	&sw_features,
+static const
+PREL64(const struct ftr_set_desc, reg) regs[] __prel64_initconst = {
+	{ &mmfr1	},
+	{ &pfr0 	},
+	{ &pfr1 	},
+	{ &isar1	},
+	{ &isar2	},
+	{ &smfr0	},
+	{ &sw_features	},
 };
 
 static const struct {
@@ -214,15 +226,20 @@ static void __init match_options(const char *cmdline)
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(regs); i++) {
+		const struct ftr_set_desc *reg = prel64_pointer(regs[i].reg);
+		struct arm64_ftr_override *override;
 		int f;
 
-		for (f = 0; strlen(regs[i]->fields[f].name); f++) {
-			u64 shift = regs[i]->fields[f].shift;
-			u64 width = regs[i]->fields[f].width ?: 4;
+		override = prel64_pointer(reg->override);
+
+		for (f = 0; strlen(reg->fields[f].name); f++) {
+			u64 shift = reg->fields[f].shift;
+			u64 width = reg->fields[f].width ?: 4;
 			u64 mask = GENMASK_ULL(shift + width - 1, shift);
+			bool (*filter)(u64 val);
 			u64 v;
 
-			if (find_field(cmdline, regs[i], f, &v))
+			if (find_field(cmdline, reg, f, &v))
 				continue;
 
 			/*
@@ -230,16 +247,16 @@ static void __init match_options(const char *cmdline)
 			 * it by setting the value to the all-ones while
 			 * clearing the mask... Yes, this is fragile.
 			 */
-			if (regs[i]->fields[f].filter &&
-			    !regs[i]->fields[f].filter(v)) {
-				regs[i]->override->val  |= mask;
-				regs[i]->override->mask &= ~mask;
+			filter = prel64_pointer(reg->fields[f].filter);
+			if (filter && !filter(v)) {
+				override->val  |= mask;
+				override->mask &= ~mask;
 				continue;
 			}
 
-			regs[i]->override->val  &= ~mask;
-			regs[i]->override->val  |= (v << shift) & mask;
-			regs[i]->override->mask |= mask;
+			override->val  &= ~mask;
+			override->val  |= (v << shift) & mask;
+			override->mask |= mask;
 
 			return;
 		}
@@ -313,11 +330,16 @@ void init_feature_override(u64 boot_status);
 
 asmlinkage void __init init_feature_override(u64 boot_status)
 {
+	struct arm64_ftr_override *override;
+	const struct ftr_set_desc *reg;
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(regs); i++) {
-		regs[i]->override->val  = 0;
-		regs[i]->override->mask = 0;
+		reg = prel64_pointer(regs[i].reg);
+		override = prel64_pointer(reg->override);
+
+		override->val  = 0;
+		override->mask = 0;
 	}
 
 	__boot_status = boot_status;
@@ -325,8 +347,9 @@ asmlinkage void __init init_feature_override(u64 boot_status)
 	parse_cmdline();
 
 	for (i = 0; i < ARRAY_SIZE(regs); i++) {
-		dcache_clean_inval_poc((unsigned long)regs[i]->override,
-				       (unsigned long)regs[i]->override +
-				       sizeof(*regs[i]->override));
+		reg = prel64_pointer(regs[i].reg);
+		override = prel64_pointer(reg->override);
+		dcache_clean_inval_poc((unsigned long)override,
+				       (unsigned long)(override + 1));
 	}
 }
-- 
2.43.0.rc1.413.gea7ed67945-goog



* [PATCH v6 16/41] arm64: idreg-override: Avoid parameq() and parameqn()
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (14 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 15/41] arm64: idreg-override: Prepare for place relative reloc patching Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-11-29 11:16 ` [PATCH v6 17/41] arm64: idreg-override: avoid strlen() to check for empty strings Ard Biesheuvel
                   ` (25 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

The only way parameq() and parameqn() deviate from the ordinary string
and memory routines is that they ignore the difference between dashes
and underscores.

Since we copy each command line argument into a buffer before passing it
to parameq() and parameqn() numerous times, let's just convert all
dashes to underscores just once, and update the alias array accordingly.

This also helps reduce the dependency on kernel APIs that are no longer
available once we move this code into the early mini C runtime.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
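
Not part of the patch: a stand-alone sketch of the copy-and-normalize step
(the helper name and test string are made up). Once dashes become underscores
while the token is copied into the local buffer, plain memcmp()/strcmp()
against the underscore-only alias table is sufficient.

#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* copy one whitespace-delimited token, turning '-' into '_' */
static size_t copy_token(char *buf, size_t bufsz, const char *cmdline)
{
        size_t len;

        for (len = 0; cmdline[len] && !isspace((unsigned char)cmdline[len]); len++) {
                if (len >= bufsz - 1)
                        break;
                buf[len] = (cmdline[len] == '-') ? '_' : cmdline[len];
        }
        buf[len] = '\0';
        return len;
}

int main(void)
{
        char buf[64];

        copy_token(buf, sizeof(buf), "kvm-arm.mode=nvhe root=/dev/vda");
        printf("%s matches alias: %d\n", buf,
               strcmp(buf, "kvm_arm.mode=nvhe") == 0);  /* prints 1 */
        return 0;
}
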
 arch/arm64/kernel/idreg-override.c | 28 ++++++++++++--------
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c
index ca1b8d2dbe99..1eca93446345 100644
--- a/arch/arm64/kernel/idreg-override.c
+++ b/arch/arm64/kernel/idreg-override.c
@@ -185,8 +185,8 @@ static const struct {
 	char	alias[FTR_ALIAS_NAME_LEN];
 	char	feature[FTR_ALIAS_OPTION_LEN];
 } aliases[] __initconst = {
-	{ "kvm-arm.mode=nvhe",		"id_aa64mmfr1.vh=0" },
-	{ "kvm-arm.mode=protected",	"id_aa64mmfr1.vh=0" },
+	{ "kvm_arm.mode=nvhe",		"id_aa64mmfr1.vh=0" },
+	{ "kvm_arm.mode=protected",	"id_aa64mmfr1.vh=0" },
 	{ "arm64.nosve",		"id_aa64pfr0.sve=0" },
 	{ "arm64.nosme",		"id_aa64pfr1.sme=0" },
 	{ "arm64.nobti",		"id_aa64pfr1.bt=0" },
@@ -215,7 +215,7 @@ static int __init find_field(const char *cmdline,
 	len = snprintf(opt, ARRAY_SIZE(opt), "%s.%s=",
 		       reg->name, reg->fields[f].name);
 
-	if (!parameqn(cmdline, opt, len))
+	if (memcmp(cmdline, opt, len))
 		return -1;
 
 	return kstrtou64(cmdline + len, 0, v);
@@ -272,23 +272,29 @@ static __init void __parse_cmdline(const char *cmdline, bool parse_aliases)
 
 		cmdline = skip_spaces(cmdline);
 
-		for (len = 0; cmdline[len] && !isspace(cmdline[len]); len++);
-		if (!len)
+		/* terminate on "--" appearing on the command line by itself */
+		if (cmdline[0] == '-' && cmdline[1] == '-' && isspace(cmdline[2]))
 			return;
 
-		len = min(len, ARRAY_SIZE(buf) - 1);
-		memcpy(buf, cmdline, len);
-		buf[len] = '\0';
-
-		if (strcmp(buf, "--") == 0)
+		for (len = 0; cmdline[len] && !isspace(cmdline[len]); len++) {
+			if (len >= sizeof(buf) - 1)
+				break;
+			if (cmdline[len] == '-')
+				buf[len] = '_';
+			else
+				buf[len] = cmdline[len];
+		}
+		if (!len)
 			return;
 
+		buf[len] = 0;
+
 		cmdline += len;
 
 		match_options(buf);
 
 		for (i = 0; parse_aliases && i < ARRAY_SIZE(aliases); i++)
-			if (parameq(buf, aliases[i].alias))
+			if (!memcmp(buf, aliases[i].alias, len + 1))
 				__parse_cmdline(aliases[i].feature, false);
 	} while (1);
 }
-- 
2.43.0.rc1.413.gea7ed67945-goog



* [PATCH v6 17/41] arm64: idreg-override: avoid strlen() to check for empty strings
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (15 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 16/41] arm64: idreg-override: Avoid parameq() and parameqn() Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-11-29 11:16 ` [PATCH v6 18/41] arm64: idreg-override: Avoid sprintf() for simple string concatenation Ard Biesheuvel
                   ` (24 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

strlen() is a costly way to decide whether a string is empty: if it is, the
first character will be NUL, so we can check for that directly.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/idreg-override.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c
index 1eca93446345..8b22ca523186 100644
--- a/arch/arm64/kernel/idreg-override.c
+++ b/arch/arm64/kernel/idreg-override.c
@@ -232,7 +232,7 @@ static void __init match_options(const char *cmdline)
 
 		override = prel64_pointer(reg->override);
 
-		for (f = 0; strlen(reg->fields[f].name); f++) {
+		for (f = 0; reg->fields[f].name[0] != '\0'; f++) {
 			u64 shift = reg->fields[f].shift;
 			u64 width = reg->fields[f].width ?: 4;
 			u64 mask = GENMASK_ULL(shift + width - 1, shift);
-- 
2.43.0.rc1.413.gea7ed67945-goog



* [PATCH v6 18/41] arm64: idreg-override: Avoid sprintf() for simple string concatenation
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (16 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 17/41] arm64: idreg-override: avoid strlen() to check for empty strings Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-11-29 11:16 ` [PATCH v6 19/41] arm64: idreg-override: Avoid kstrtou64() to parse a single hex digit Ard Biesheuvel
                   ` (23 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Instead of using sprintf() with the "%s.%s=" format, where the first
string argument is always the same in the inner loop of match_options(),
use simple memcpy() for string concatenation, and move the first copy to
the outer loop. This removes the dependency on sprintf(), which will be
difficult to fulfil when we move this code into the early mini C
runtime.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
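
Not part of the patch: the resulting concatenation scheme in isolation, with
made-up register and field names. The register name and the '.' are written
into opt[] once per register in the outer loop; only the field name and '='
are appended per field, and the buffer is deliberately not NUL-terminated
since the subsequent comparison uses memcmp() with an explicit length.

#include <stdio.h>
#include <string.h>

int main(void)
{
        char opt[32];
        const char *reg_name = "id_aa64mmfr1";          /* outer loop */
        const char *field_names[] = { "vh", "hcx" };    /* inner loop */
        int len = (int)strlen(reg_name);

        /* set opt[] to "<name>." once */
        memcpy(opt, reg_name, len);
        opt[len++] = '.';

        for (int f = 0; f < 2; f++) {
                int flen = (int)strlen(field_names[f]);
                int olen = len;

                /* append "<fieldname>=" to obtain "<name>.<fieldname>=" */
                memcpy(opt + olen, field_names[f], flen);
                olen += flen;
                opt[olen++] = '=';

                printf("%.*s\n", olen, opt);    /* "id_aa64mmfr1.vh=", "id_aa64mmfr1.hcx=" */
        }
        return 0;
}
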
 arch/arm64/kernel/idreg-override.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c
index 8b22ca523186..cf1df5f6fbc1 100644
--- a/arch/arm64/kernel/idreg-override.c
+++ b/arch/arm64/kernel/idreg-override.c
@@ -206,14 +206,15 @@ static int __init parse_nokaslr(char *unused)
 }
 early_param("nokaslr", parse_nokaslr);
 
-static int __init find_field(const char *cmdline,
+static int __init find_field(const char *cmdline, char *opt, int len,
 			     const struct ftr_set_desc *reg, int f, u64 *v)
 {
-	char opt[FTR_DESC_NAME_LEN + FTR_DESC_FIELD_LEN + 2];
-	int len;
+	int flen = strlen(reg->fields[f].name);
 
-	len = snprintf(opt, ARRAY_SIZE(opt), "%s.%s=",
-		       reg->name, reg->fields[f].name);
+	// append '<fieldname>=' to obtain '<name>.<fieldname>='
+	memcpy(opt + len, reg->fields[f].name, flen);
+	len += flen;
+	opt[len++] = '=';
 
 	if (memcmp(cmdline, opt, len))
 		return -1;
@@ -223,15 +224,21 @@ static int __init find_field(const char *cmdline,
 
 static void __init match_options(const char *cmdline)
 {
+	char opt[FTR_DESC_NAME_LEN + FTR_DESC_FIELD_LEN + 2];
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(regs); i++) {
 		const struct ftr_set_desc *reg = prel64_pointer(regs[i].reg);
 		struct arm64_ftr_override *override;
+		int len = strlen(reg->name);
 		int f;
 
 		override = prel64_pointer(reg->override);
 
+		// set opt[] to '<name>.'
+		memcpy(opt, reg->name, len);
+		opt[len++] = '.';
+
 		for (f = 0; reg->fields[f].name[0] != '\0'; f++) {
 			u64 shift = reg->fields[f].shift;
 			u64 width = reg->fields[f].width ?: 4;
@@ -239,7 +246,7 @@ static void __init match_options(const char *cmdline)
 			bool (*filter)(u64 val);
 			u64 v;
 
-			if (find_field(cmdline, reg, f, &v))
+			if (find_field(cmdline, opt, len, reg, f, &v))
 				continue;
 
 			/*
-- 
2.43.0.rc1.413.gea7ed67945-goog



* [PATCH v6 19/41] arm64: idreg-override: Avoid kstrtou64() to parse a single hex digit
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (17 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 18/41] arm64: idreg-override: Avoid sprintf() for simple string concatenation Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-11-29 11:16 ` [PATCH v6 20/41] arm64/kernel: Move 'nokaslr' parsing out of early idreg code Ard Biesheuvel
                   ` (22 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

All ID register value overrides are =0 with the exception of the nokaslr
pseudo feature which uses =1. In order to remove the dependency on
kstrtou64(), which is part of the core kernel and no longer usable once
we move idreg-override into the early mini C runtime, let's just parse a
single hex digit (with optional leading 0x) and set the output value
accordingly.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
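
Not part of the patch: a quick userspace check of the same parsing rules (the
harness and sample inputs are made up), showing which right-hand sides are
accepted now that only a single hex digit with an optional 0x prefix is
recognized.

#include <ctype.h>
#include <stdint.h>
#include <stdio.h>

/* same rules as the kernel helper: optional "0x", then exactly one hex digit */
static int parse_hexdigit(const char *p, uint64_t *v)
{
        if (p[0] == '0' && tolower((unsigned char)p[1]) == 'x')
                p += 2;

        if (!isxdigit((unsigned char)p[0]) ||
            (p[1] && !isspace((unsigned char)p[1])))
                return -1;

        *v = (uint64_t)(tolower((unsigned char)*p) -
                        (isdigit((unsigned char)*p) ? '0' : 'a' - 10));
        return 0;
}

int main(void)
{
        /* "1", "0x1" and "f " are accepted; "0x10" and "g" are rejected */
        const char *tests[] = { "1", "0x1", "f ", "0x10", "g" };

        for (unsigned int i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) {
                uint64_t v = 0;
                int ret = parse_hexdigit(tests[i], &v);

                printf("%-5s -> %s %llu\n", tests[i],
                       ret ? "rejected" : "accepted:",
                       (unsigned long long)v);
        }
        return 0;
}
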
 arch/arm64/kernel/idreg-override.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c
index cf1df5f6fbc1..9646f94094ed 100644
--- a/arch/arm64/kernel/idreg-override.c
+++ b/arch/arm64/kernel/idreg-override.c
@@ -206,6 +206,20 @@ static int __init parse_nokaslr(char *unused)
 }
 early_param("nokaslr", parse_nokaslr);
 
+static int __init parse_hexdigit(const char *p, u64 *v)
+{
+	// skip "0x" if it comes next
+	if (p[0] == '0' && tolower(p[1]) == 'x')
+		p += 2;
+
+	// check whether the RHS is a single hex digit
+	if (!isxdigit(p[0]) || (p[1] && !isspace(p[1])))
+		return -EINVAL;
+
+	*v = tolower(*p) - (isdigit(*p) ? '0' : 'a' - 10);
+	return 0;
+}
+
 static int __init find_field(const char *cmdline, char *opt, int len,
 			     const struct ftr_set_desc *reg, int f, u64 *v)
 {
@@ -219,7 +233,7 @@ static int __init find_field(const char *cmdline, char *opt, int len,
 	if (memcmp(cmdline, opt, len))
 		return -1;
 
-	return kstrtou64(cmdline + len, 0, v);
+	return parse_hexdigit(cmdline + len, v);
 }
 
 static void __init match_options(const char *cmdline)
-- 
2.43.0.rc1.413.gea7ed67945-goog



* [PATCH v6 20/41] arm64/kernel: Move 'nokaslr' parsing out of early idreg code
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (18 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 19/41] arm64: idreg-override: Avoid kstrtou64() to parse a single hex digit Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-11-29 11:16 ` [PATCH v6 21/41] arm64: idreg-override: Move to early mini C runtime Ard Biesheuvel
                   ` (21 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Parsing and ignoring 'nokaslr' can be done from anywhere, except from
the code that runs very early and is therefore built with limitations on
the kind of relocations it is permitted to use.

So move it to a source file that is part of the ordinary kernel build.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/idreg-override.c | 7 -------
 arch/arm64/kernel/kaslr.c          | 7 +++++++
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c
index 9646f94094ed..e30fd9e32ef3 100644
--- a/arch/arm64/kernel/idreg-override.c
+++ b/arch/arm64/kernel/idreg-override.c
@@ -199,13 +199,6 @@ static const struct {
 	{ "nokaslr",			"arm64_sw.nokaslr=1" },
 };
 
-static int __init parse_nokaslr(char *unused)
-{
-	/* nokaslr param handling is done by early cpufeature code */
-	return 0;
-}
-early_param("nokaslr", parse_nokaslr);
-
 static int __init parse_hexdigit(const char *p, u64 *v)
 {
 	// skip "0x" if it comes next
diff --git a/arch/arm64/kernel/kaslr.c b/arch/arm64/kernel/kaslr.c
index 94a269cd1f07..12c7f3c8ba76 100644
--- a/arch/arm64/kernel/kaslr.c
+++ b/arch/arm64/kernel/kaslr.c
@@ -36,3 +36,10 @@ void __init kaslr_init(void)
 	pr_info("KASLR enabled\n");
 	__kaslr_is_enabled = true;
 }
+
+static int __init parse_nokaslr(char *unused)
+{
+	/* nokaslr param handling is done by early cpufeature code */
+	return 0;
+}
+early_param("nokaslr", parse_nokaslr);
-- 
2.43.0.rc1.413.gea7ed67945-goog



* [PATCH v6 21/41] arm64: idreg-override: Move to early mini C runtime
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (19 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 20/41] arm64/kernel: Move 'nokaslr' parsing out of early idreg code Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-11-29 11:16 ` [PATCH v6 22/41] arm64: kernel: Remove early fdt remap code Ard Biesheuvel
                   ` (20 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

We will want to parse the ID register overrides even earlier, so that we
can take them into account before creating the kernel mapping. So
migrate the code and make it work in the context of the early C runtime.
We will move the invocation to an earlier stage in a subsequent patch.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/Makefile                  |  4 +--
 arch/arm64/kernel/head.S                    |  5 ++-
 arch/arm64/kernel/image-vars.h              |  9 +++++
 arch/arm64/kernel/pi/Makefile               |  5 +--
 arch/arm64/kernel/{ => pi}/idreg-override.c | 35 +++++++++-----------
 5 files changed, 30 insertions(+), 28 deletions(-)

diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index a8487749cabd..dc85dc2ee4ed 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -33,8 +33,7 @@ obj-y			:= debug-monitors.o entry.o irq.o fpsimd.o		\
 			   return_address.o cpuinfo.o cpu_errata.o		\
 			   cpufeature.o alternative.o cacheinfo.o		\
 			   smp.o smp_spin_table.o topology.o smccc-call.o	\
-			   syscall.o proton-pack.o idreg-override.o idle.o	\
-			   patching.o
+			   syscall.o proton-pack.o idle.o patching.o pi/
 
 obj-$(CONFIG_COMPAT)			+= sys32.o signal32.o			\
 					   sys_compat.o
@@ -57,7 +56,6 @@ obj-$(CONFIG_ACPI)			+= acpi.o
 obj-$(CONFIG_ACPI_NUMA)			+= acpi_numa.o
 obj-$(CONFIG_ARM64_ACPI_PARKING_PROTOCOL)	+= acpi_parking_protocol.o
 obj-$(CONFIG_PARAVIRT)			+= paravirt.o
-obj-$(CONFIG_RELOCATABLE)		+= pi/
 obj-$(CONFIG_RANDOMIZE_BASE)		+= kaslr.o
 obj-$(CONFIG_HIBERNATION)		+= hibernate.o hibernate-asm.o
 obj-$(CONFIG_ELF_CORE)			+= elfcore.o
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index a8fa64fc30d7..ca5e5fbefcd3 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -510,10 +510,9 @@ SYM_FUNC_START_LOCAL(__primary_switched)
 #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
 	bl	kasan_early_init
 #endif
-	mov	x0, x21				// pass FDT address in x0
-	bl	early_fdt_map			// Try mapping the FDT early
 	mov	x0, x20				// pass the full boot status
-	bl	init_feature_override		// Parse cpu feature overrides
+	mov	x1, x22				// pass the low FDT mapping
+	bl	__pi_init_feature_override	// Parse cpu feature overrides
 #ifdef CONFIG_UNWIND_PATCH_PAC_INTO_SCS
 	bl	scs_patch_vmlinux
 #endif
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index e931ce078a00..eacc3d167733 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -37,6 +37,15 @@ PROVIDE(__pi___memmove			= __pi_memmove);
 PROVIDE(__pi___memset			= __pi_memset);
 
 PROVIDE(__pi_vabits_actual		= vabits_actual);
+PROVIDE(__pi_id_aa64isar1_override	= id_aa64isar1_override);
+PROVIDE(__pi_id_aa64isar2_override	= id_aa64isar2_override);
+PROVIDE(__pi_id_aa64mmfr1_override	= id_aa64mmfr1_override);
+PROVIDE(__pi_id_aa64pfr0_override	= id_aa64pfr0_override);
+PROVIDE(__pi_id_aa64pfr1_override	= id_aa64pfr1_override);
+PROVIDE(__pi_id_aa64smfr0_override	= id_aa64smfr0_override);
+PROVIDE(__pi_id_aa64zfr0_override	= id_aa64zfr0_override);
+PROVIDE(__pi_arm64_sw_feature_override	= arm64_sw_feature_override);
+PROVIDE(__pi__ctype			= _ctype);
 
 #ifdef CONFIG_KVM
 
diff --git a/arch/arm64/kernel/pi/Makefile b/arch/arm64/kernel/pi/Makefile
index d084c1dcf416..7f6dfce893c3 100644
--- a/arch/arm64/kernel/pi/Makefile
+++ b/arch/arm64/kernel/pi/Makefile
@@ -38,6 +38,7 @@ $(obj)/lib-%.pi.o: OBJCOPYFLAGS += --prefix-alloc-sections=.init
 $(obj)/lib-%.o: $(srctree)/lib/%.c FORCE
 	$(call if_changed_rule,cc_o_c)
 
-obj-y				:= relocate.pi.o
-obj-$(CONFIG_RANDOMIZE_BASE)	+= kaslr_early.pi.o lib-fdt.pi.o lib-fdt_ro.pi.o
+obj-y				:= idreg-override.pi.o lib-fdt.pi.o lib-fdt_ro.pi.o
+obj-$(CONFIG_RELOCATABLE)	+= relocate.pi.o
+obj-$(CONFIG_RANDOMIZE_BASE)	+= kaslr_early.pi.o
 extra-y				:= $(patsubst %.pi.o,%.o,$(obj-y))
diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/pi/idreg-override.c
similarity index 93%
rename from arch/arm64/kernel/idreg-override.c
rename to arch/arm64/kernel/pi/idreg-override.c
index e30fd9e32ef3..f9e05c10faab 100644
--- a/arch/arm64/kernel/idreg-override.c
+++ b/arch/arm64/kernel/pi/idreg-override.c
@@ -14,6 +14,8 @@
 #include <asm/cpufeature.h>
 #include <asm/setup.h>
 
+#include "pi.h"
+
 #define FTR_DESC_NAME_LEN	20
 #define FTR_DESC_FIELD_LEN	10
 #define FTR_ALIAS_NAME_LEN	30
@@ -21,15 +23,6 @@
 
 static u64 __boot_status __initdata;
 
-// temporary __prel64 related definitions
-// to be removed when this code is moved under pi/
-
-#define __prel64_initconst	__initconst
-
-#define PREL64(type, name)	union { type *name; }
-
-#define prel64_pointer(__d)	(__d)
-
 typedef bool filter_t(u64 val);
 
 struct ftr_set_desc {
@@ -313,16 +306,11 @@ static __init void __parse_cmdline(const char *cmdline, bool parse_aliases)
 	} while (1);
 }
 
-static __init const u8 *get_bootargs_cmdline(void)
+static __init const u8 *get_bootargs_cmdline(const void *fdt)
 {
 	const u8 *prop;
-	void *fdt;
 	int node;
 
-	fdt = get_early_fdt_ptr();
-	if (!fdt)
-		return NULL;
-
 	node = fdt_path_offset(fdt, "/chosen");
 	if (node < 0)
 		return NULL;
@@ -334,9 +322,9 @@ static __init const u8 *get_bootargs_cmdline(void)
 	return strlen(prop) ? prop : NULL;
 }
 
-static __init void parse_cmdline(void)
+static __init void parse_cmdline(const void *fdt)
 {
-	const u8 *prop = get_bootargs_cmdline();
+	const u8 *prop = get_bootargs_cmdline(fdt);
 
 	if (IS_ENABLED(CONFIG_CMDLINE_FORCE) || !prop)
 		__parse_cmdline(CONFIG_CMDLINE, true);
@@ -346,9 +334,9 @@ static __init void parse_cmdline(void)
 }
 
 /* Keep checkers quiet */
-void init_feature_override(u64 boot_status);
+void init_feature_override(u64 boot_status, const void *fdt);
 
-asmlinkage void __init init_feature_override(u64 boot_status)
+asmlinkage void __init init_feature_override(u64 boot_status, const void *fdt)
 {
 	struct arm64_ftr_override *override;
 	const struct ftr_set_desc *reg;
@@ -364,7 +352,7 @@ asmlinkage void __init init_feature_override(u64 boot_status)
 
 	__boot_status = boot_status;
 
-	parse_cmdline();
+	parse_cmdline(fdt);
 
 	for (i = 0; i < ARRAY_SIZE(regs); i++) {
 		reg = prel64_pointer(regs[i].reg);
@@ -373,3 +361,10 @@ asmlinkage void __init init_feature_override(u64 boot_status)
 				       (unsigned long)(override + 1));
 	}
 }
+
+char * __init skip_spaces(const char *str)
+{
+	while (isspace(*str))
+		++str;
+	return (char *)str;
+}
-- 
2.43.0.rc1.413.gea7ed67945-goog



* [PATCH v6 22/41] arm64: kernel: Remove early fdt remap code
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (20 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 21/41] arm64: idreg-override: Move to early mini C runtime Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-11-29 11:16 ` [PATCH v6 23/41] arm64: head: Clear BSS and the kernel page tables in one go Ard Biesheuvel
                   ` (19 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

The early FDT remap code is no longer used so let's drop it.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/setup.h |  3 ---
 arch/arm64/kernel/setup.c      | 15 ---------------
 2 files changed, 18 deletions(-)

diff --git a/arch/arm64/include/asm/setup.h b/arch/arm64/include/asm/setup.h
index 2e4d7da74fb8..ba269a7a3201 100644
--- a/arch/arm64/include/asm/setup.h
+++ b/arch/arm64/include/asm/setup.h
@@ -7,9 +7,6 @@
 
 #include <uapi/asm/setup.h>
 
-void *get_early_fdt_ptr(void);
-void early_fdt_map(u64 dt_phys);
-
 /*
  * These two variables are used in the head.S file.
  */
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 417a8a86b2db..4b0b3515ee20 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -166,21 +166,6 @@ static void __init smp_build_mpidr_hash(void)
 		pr_warn("Large number of MPIDR hash buckets detected\n");
 }
 
-static void *early_fdt_ptr __initdata;
-
-void __init *get_early_fdt_ptr(void)
-{
-	return early_fdt_ptr;
-}
-
-asmlinkage void __init early_fdt_map(u64 dt_phys)
-{
-	int fdt_size;
-
-	early_fixmap_init();
-	early_fdt_ptr = fixmap_remap_fdt(dt_phys, &fdt_size, PAGE_KERNEL);
-}
-
 static void __init setup_machine_fdt(phys_addr_t dt_phys)
 {
 	int size;
-- 
2.43.0.rc1.413.gea7ed67945-goog



* [PATCH v6 23/41] arm64: head: Clear BSS and the kernel page tables in one go
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (21 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 22/41] arm64: kernel: Remove early fdt remap code Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-11-29 11:16 ` [PATCH v6 24/41] arm64: Move feature overrides into the BSS section Ard Biesheuvel
                   ` (18 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

We will move the CPU feature overrides into BSS in a subsequent patch,
and this requires that BSS is zeroed before the feature override
detection code runs. So let's map BSS read-write in the ID map, and zero
it via this mapping.

Since the kernel page tables are right next to it, and also zeroed via
the ID map, let's drop the separate clear_page_tables() function, and
just zero everything in one go.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/head.S        | 33 +++++++-------------
 arch/arm64/kernel/vmlinux.lds.S |  3 ++
 2 files changed, 14 insertions(+), 22 deletions(-)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index ca5e5fbefcd3..2af518161f3a 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -177,17 +177,6 @@ SYM_CODE_START_LOCAL(preserve_boot_args)
 	ret
 SYM_CODE_END(preserve_boot_args)
 
-SYM_FUNC_START_LOCAL(clear_page_tables)
-	/*
-	 * Clear the init page tables.
-	 */
-	adrp	x0, init_pg_dir
-	adrp	x1, init_pg_end
-	sub	x2, x1, x0
-	mov	x1, xzr
-	b	__pi_memset			// tail call
-SYM_FUNC_END(clear_page_tables)
-
 /*
  * Macro to populate page table entries, these entries can be pointers to the next level
  * or last level entries pointing to physical memory.
@@ -386,9 +375,9 @@ SYM_FUNC_START_LOCAL(create_idmap)
 
 	map_memory x0, x1, x3, x6, x7, x3, IDMAP_PGD_ORDER, x10, x11, x12, x13, x14, EXTRA_SHIFT
 
-	/* Remap the kernel page tables r/w in the ID map */
+	/* Remap BSS and the kernel page tables r/w in the ID map */
 	adrp	x1, _text
-	adrp	x2, init_pg_dir
+	adrp	x2, __bss_start
 	adrp	x3, _end
 	bic	x4, x2, #SWAPPER_BLOCK_SIZE - 1
 	mov_q	x5, SWAPPER_RW_MMUFLAGS
@@ -489,14 +478,6 @@ SYM_FUNC_START_LOCAL(__primary_switched)
 	mov	x0, x20
 	bl	set_cpu_boot_mode_flag
 
-	// Clear BSS
-	adr_l	x0, __bss_start
-	mov	x1, xzr
-	adr_l	x2, __bss_stop
-	sub	x2, x2, x0
-	bl	__pi_memset
-	dsb	ishst				// Make zero page visible to PTW
-
 #if VA_BITS > 48
 	adr_l	x8, vabits_actual		// Set this early so KASAN early init
 	str	x25, [x8]			// ... observes the correct value
@@ -782,6 +763,15 @@ SYM_FUNC_START_LOCAL(__primary_switch)
 	adrp	x1, reserved_pg_dir
 	adrp	x2, init_idmap_pg_dir
 	bl	__enable_mmu
+
+	// Clear BSS
+	adrp	x0, __bss_start
+	mov	x1, xzr
+	adrp	x2, init_pg_end
+	sub	x2, x2, x0
+	bl	__pi_memset
+	dsb	ishst				// Make zero page visible to PTW
+
 #ifdef CONFIG_RELOCATABLE
 	adrp	x23, KERNEL_START
 	and	x23, x23, MIN_KIMG_ALIGN - 1
@@ -796,7 +786,6 @@ SYM_FUNC_START_LOCAL(__primary_switch)
 	orr	x23, x23, x0			// record kernel offset
 #endif
 #endif
-	bl	clear_page_tables
 	bl	create_kernel_mapping
 
 	adrp	x1, init_pg_dir
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 8dd5dda66f7c..8a3c6aacc355 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -311,12 +311,15 @@ SECTIONS
 	__pecoff_data_rawsize = ABSOLUTE(. - __initdata_begin);
 	_edata = .;
 
+	/* start of zero-init region */
 	BSS_SECTION(SBSS_ALIGN, 0, 0)
 
 	. = ALIGN(PAGE_SIZE);
 	init_pg_dir = .;
 	. += INIT_DIR_SIZE;
 	init_pg_end = .;
+	/* end of zero-init region */
+
 #ifdef CONFIG_RELOCATABLE
 	. += SZ_4K;		/* stack for the early relocation code */
 	early_init_stack = .;
-- 
2.43.0.rc1.413.gea7ed67945-goog



* [PATCH v6 24/41] arm64: Move feature overrides into the BSS section
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (22 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 23/41] arm64: head: Clear BSS and the kernel page tables in one go Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-11-29 11:16 ` [PATCH v6 25/41] arm64: head: Run feature override detection before mapping the kernel Ard Biesheuvel
                   ` (17 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

In order to allow the CPU feature override detection code to run even
earlier, move the feature override global variables into BSS, which is
the only part of the static kernel image that is mapped read-write in
the initial ID map.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/cpufeature.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 646591c67e7a..b95374387946 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -655,13 +655,13 @@ static const struct arm64_ftr_bits ftr_raz[] = {
 #define ARM64_FTR_REG(id, table)		\
 	__ARM64_FTR_REG_OVERRIDE(#id, id, table, &no_override)
 
-struct arm64_ftr_override __ro_after_init id_aa64mmfr1_override;
-struct arm64_ftr_override __ro_after_init id_aa64pfr0_override;
-struct arm64_ftr_override __ro_after_init id_aa64pfr1_override;
-struct arm64_ftr_override __ro_after_init id_aa64zfr0_override;
-struct arm64_ftr_override __ro_after_init id_aa64smfr0_override;
-struct arm64_ftr_override __ro_after_init id_aa64isar1_override;
-struct arm64_ftr_override __ro_after_init id_aa64isar2_override;
+struct arm64_ftr_override id_aa64mmfr1_override;
+struct arm64_ftr_override id_aa64pfr0_override;
+struct arm64_ftr_override id_aa64pfr1_override;
+struct arm64_ftr_override id_aa64zfr0_override;
+struct arm64_ftr_override id_aa64smfr0_override;
+struct arm64_ftr_override id_aa64isar1_override;
+struct arm64_ftr_override id_aa64isar2_override;
 
 struct arm64_ftr_override arm64_sw_feature_override;
 
-- 
2.43.0.rc1.413.gea7ed67945-goog



* [PATCH v6 25/41] arm64: head: Run feature override detection before mapping the kernel
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (23 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 24/41] arm64: Move feature overrides into the BSS section Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-11-29 11:16 ` [PATCH v6 26/41] arm64: head: move dynamic shadow call stack patching into early C runtime Ard Biesheuvel
                   ` (16 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

To permit the feature overrides to be taken into account before the
KASLR init code runs and the kernel mapping is created, move the
detection code to an earlier stage in the boot.

In a subsequent patch, this will be taken advantage of by merging the
preliminary and permanent mappings of the kernel text and data into a
single one that gets created and relocated before start_kernel() is
called.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/head.S        | 17 +++++++++--------
 arch/arm64/kernel/vmlinux.lds.S |  4 +---
 2 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 2af518161f3a..865ecc1f8255 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -375,9 +375,9 @@ SYM_FUNC_START_LOCAL(create_idmap)
 
 	map_memory x0, x1, x3, x6, x7, x3, IDMAP_PGD_ORDER, x10, x11, x12, x13, x14, EXTRA_SHIFT
 
-	/* Remap BSS and the kernel page tables r/w in the ID map */
+	/* Remap [.init].data, BSS and the kernel page tables r/w in the ID map */
 	adrp	x1, _text
-	adrp	x2, __bss_start
+	adrp	x2, __initdata_begin
 	adrp	x3, _end
 	bic	x4, x2, #SWAPPER_BLOCK_SIZE - 1
 	mov_q	x5, SWAPPER_RW_MMUFLAGS
@@ -491,9 +491,6 @@ SYM_FUNC_START_LOCAL(__primary_switched)
 #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
 	bl	kasan_early_init
 #endif
-	mov	x0, x20				// pass the full boot status
-	mov	x1, x22				// pass the low FDT mapping
-	bl	__pi_init_feature_override	// Parse cpu feature overrides
 #ifdef CONFIG_UNWIND_PATCH_PAC_INTO_SCS
 	bl	scs_patch_vmlinux
 #endif
@@ -772,12 +769,16 @@ SYM_FUNC_START_LOCAL(__primary_switch)
 	bl	__pi_memset
 	dsb	ishst				// Make zero page visible to PTW
 
-#ifdef CONFIG_RELOCATABLE
-	adrp	x23, KERNEL_START
-	and	x23, x23, MIN_KIMG_ALIGN - 1
 	adrp	x1, early_init_stack
 	mov	sp, x1
 	mov	x29, xzr
+	mov	x0, x20				// pass the full boot status
+	mov	x1, x22				// pass the low FDT mapping
+	bl	__pi_init_feature_override	// Parse cpu feature overrides
+
+#ifdef CONFIG_RELOCATABLE
+	adrp	x23, KERNEL_START
+	and	x23, x23, MIN_KIMG_ALIGN - 1
 #ifdef CONFIG_RANDOMIZE_BASE
 	mov	x0, x22
 	bl	__pi_kaslr_early_init
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 8a3c6aacc355..3afb4223a5e8 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -320,10 +320,8 @@ SECTIONS
 	init_pg_end = .;
 	/* end of zero-init region */
 
-#ifdef CONFIG_RELOCATABLE
-	. += SZ_4K;		/* stack for the early relocation code */
+	. += SZ_4K;		/* stack for the early C runtime */
 	early_init_stack = .;
-#endif
 
 	. = ALIGN(SEGMENT_ALIGN);
 	__pecoff_data_size = ABSOLUTE(. - __initdata_begin);
-- 
2.43.0.rc1.413.gea7ed67945-goog



* [PATCH v6 26/41] arm64: head: move dynamic shadow call stack patching into early C runtime
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (24 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 25/41] arm64: head: Run feature override detection before mapping the kernel Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-11-29 11:16 ` [PATCH v6 27/41] arm64: cpufeature: Add helper to test for CPU feature overrides Ard Biesheuvel
                   ` (15 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Once we update the early kernel mapping code to only map the kernel once
with the right permissions, we can no longer perform code patching via
this mapping.

So move this code to an earlier stage of the boot, right after applying
the relocations.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/scs.h           |  4 +--
 arch/arm64/kernel/Makefile             |  2 --
 arch/arm64/kernel/head.S               |  8 +++---
 arch/arm64/kernel/module.c             |  2 +-
 arch/arm64/kernel/pi/Makefile          | 10 +++++---
 arch/arm64/kernel/{ => pi}/patch-scs.c | 26 ++++++++++----------
 6 files changed, 27 insertions(+), 25 deletions(-)

diff --git a/arch/arm64/include/asm/scs.h b/arch/arm64/include/asm/scs.h
index 3fdae5fe3142..eca2ba5a6276 100644
--- a/arch/arm64/include/asm/scs.h
+++ b/arch/arm64/include/asm/scs.h
@@ -72,8 +72,8 @@ static inline void dynamic_scs_init(void)
 static inline void dynamic_scs_init(void) {}
 #endif
 
-int scs_patch(const u8 eh_frame[], int size);
-asmlinkage void scs_patch_vmlinux(void);
+int __pi_scs_patch(const u8 eh_frame[], int size);
+asmlinkage void __pi_scs_patch_vmlinux(void);
 
 #endif /* __ASSEMBLY __ */
 
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index dc85dc2ee4ed..14b4a179bad3 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -71,8 +71,6 @@ obj-$(CONFIG_ARM64_PTR_AUTH)		+= pointer_auth.o
 obj-$(CONFIG_ARM64_MTE)			+= mte.o
 obj-y					+= vdso-wrap.o
 obj-$(CONFIG_COMPAT_VDSO)		+= vdso32-wrap.o
-obj-$(CONFIG_UNWIND_PATCH_PAC_INTO_SCS)	+= patch-scs.o
-CFLAGS_patch-scs.o			+= -mbranch-protection=none
 
 # Force dependency (vdso*-wrap.S includes vdso.so through incbin)
 $(obj)/vdso-wrap.o: $(obj)/vdso/vdso.so
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 865ecc1f8255..b320702032a7 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -490,9 +490,6 @@ SYM_FUNC_START_LOCAL(__primary_switched)
 #endif
 #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
 	bl	kasan_early_init
-#endif
-#ifdef CONFIG_UNWIND_PATCH_PAC_INTO_SCS
-	bl	scs_patch_vmlinux
 #endif
 	mov	x0, x20
 	bl	finalise_el2			// Prefer VHE if possible
@@ -794,6 +791,11 @@ SYM_FUNC_START_LOCAL(__primary_switch)
 #ifdef CONFIG_RELOCATABLE
 	mov	x0, x23
 	bl	__pi_relocate_kernel
+#endif
+#ifdef CONFIG_UNWIND_PATCH_PAC_INTO_SCS
+	ldr	x0, =__eh_frame_start
+	ldr	x1, =__eh_frame_end
+	bl	__pi_scs_patch_vmlinux
 #endif
 	ldr	x8, =__primary_switched
 	adrp	x0, KERNEL_START		// __pa(KERNEL_START)
diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index dd851297596e..47e0be610bb6 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -595,7 +595,7 @@ int module_finalize(const Elf_Ehdr *hdr,
 	if (scs_is_dynamic()) {
 		s = find_section(hdr, sechdrs, ".init.eh_frame");
 		if (s)
-			scs_patch((void *)s->sh_addr, s->sh_size);
+			__pi_scs_patch((void *)s->sh_addr, s->sh_size);
 	}
 
 	return module_init_ftrace_plt(hdr, sechdrs, me);
diff --git a/arch/arm64/kernel/pi/Makefile b/arch/arm64/kernel/pi/Makefile
index 7f6dfce893c3..a8b302245f15 100644
--- a/arch/arm64/kernel/pi/Makefile
+++ b/arch/arm64/kernel/pi/Makefile
@@ -38,7 +38,9 @@ $(obj)/lib-%.pi.o: OBJCOPYFLAGS += --prefix-alloc-sections=.init
 $(obj)/lib-%.o: $(srctree)/lib/%.c FORCE
 	$(call if_changed_rule,cc_o_c)
 
-obj-y				:= idreg-override.pi.o lib-fdt.pi.o lib-fdt_ro.pi.o
-obj-$(CONFIG_RELOCATABLE)	+= relocate.pi.o
-obj-$(CONFIG_RANDOMIZE_BASE)	+= kaslr_early.pi.o
-extra-y				:= $(patsubst %.pi.o,%.o,$(obj-y))
+obj-y					:= idreg-override.pi.o \
+					   lib-fdt.pi.o lib-fdt_ro.pi.o
+obj-$(CONFIG_RELOCATABLE)		+= relocate.pi.o
+obj-$(CONFIG_RANDOMIZE_BASE)		+= kaslr_early.pi.o
+obj-$(CONFIG_UNWIND_PATCH_PAC_INTO_SCS)	+= patch-scs.pi.o
+extra-y					:= $(patsubst %.pi.o,%.o,$(obj-y))
diff --git a/arch/arm64/kernel/patch-scs.c b/arch/arm64/kernel/pi/patch-scs.c
similarity index 91%
rename from arch/arm64/kernel/patch-scs.c
rename to arch/arm64/kernel/pi/patch-scs.c
index a1fe4b4ff591..c65ef40d1e6b 100644
--- a/arch/arm64/kernel/patch-scs.c
+++ b/arch/arm64/kernel/pi/patch-scs.c
@@ -4,14 +4,11 @@
  * Author: Ard Biesheuvel <ardb@google.com>
  */
 
-#include <linux/bug.h>
 #include <linux/errno.h>
 #include <linux/init.h>
 #include <linux/linkage.h>
-#include <linux/printk.h>
 #include <linux/types.h>
 
-#include <asm/cacheflush.h>
 #include <asm/scs.h>
 
 //
@@ -81,7 +78,11 @@ static void __always_inline scs_patch_loc(u64 loc)
 		 */
 		return;
 	}
-	dcache_clean_pou(loc, loc + sizeof(u32));
+	if (IS_ENABLED(CONFIG_ARM64_WORKAROUND_CLEAN_CACHE))
+		asm("dc civac, %0" :: "r"(loc));
+	else
+		asm(ALTERNATIVE("dc cvau, %0", "nop", ARM64_HAS_CACHE_IDC)
+		    :: "r"(loc));
 }
 
 /*
@@ -128,10 +129,10 @@ struct eh_frame {
 	};
 };
 
-static int noinstr scs_handle_fde_frame(const struct eh_frame *frame,
-					bool fde_has_augmentation_data,
-					int code_alignment_factor,
-					bool dry_run)
+static int scs_handle_fde_frame(const struct eh_frame *frame,
+				bool fde_has_augmentation_data,
+				int code_alignment_factor,
+				bool dry_run)
 {
 	int size = frame->size - offsetof(struct eh_frame, opcodes) + 4;
 	u64 loc = (u64)offset_to_ptr(&frame->initial_loc);
@@ -198,14 +199,13 @@ static int noinstr scs_handle_fde_frame(const struct eh_frame *frame,
 			break;
 
 		default:
-			pr_err("unhandled opcode: %02x in FDE frame %lx\n", opcode[-1], (uintptr_t)frame);
 			return -ENOEXEC;
 		}
 	}
 	return 0;
 }
 
-int noinstr scs_patch(const u8 eh_frame[], int size)
+int scs_patch(const u8 eh_frame[], int size)
 {
 	const u8 *p = eh_frame;
 
@@ -251,12 +251,12 @@ int noinstr scs_patch(const u8 eh_frame[], int size)
 	return 0;
 }
 
-asmlinkage void __init scs_patch_vmlinux(void)
+asmlinkage void __init scs_patch_vmlinux(const u8 start[], const u8 end[])
 {
 	if (!should_patch_pac_into_scs())
 		return;
 
-	WARN_ON(scs_patch(__eh_frame_start, __eh_frame_end - __eh_frame_start));
-	icache_inval_all_pou();
+	scs_patch(start, end - start);
+	asm("ic ialluis");
 	isb();
 }
-- 
2.43.0.rc1.413.gea7ed67945-goog



* [PATCH v6 27/41] arm64: cpufeature: Add helper to test for CPU feature overrides
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (25 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 26/41] arm64: head: move dynamic shadow call stack patching into early C runtime Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-11-29 11:16 ` [PATCH v6 28/41] arm64: kaslr: Use feature override instead of parsing the cmdline again Ard Biesheuvel
                   ` (14 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Add some helpers to extract and apply feature overrides to the bare
idreg values. This involves inspecting the value and mask of the
specific field that we are interested in, given that an override
value/mask pair might be invalid for one field but valid for another.

Then, wire up the new helper for the hVHE test - note that we can drop
the sysreg test here, as the override will be invalid when trying to
enable hVHE on non-VHE capable hardware.
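
To illustrate the intended semantics with hypothetical values (a sketch
only): for a 4-bit field at shift 0, an override whose value bits are all
covered by its mask is applied, while one that the early idreg override
code recorded as invalid (mask cleared, value non-zero) is ignored:

  /* illustrative sketch, hypothetical override values */
  struct arm64_ftr_override ok  = { .val = 0x1, .mask = 0xf };
  struct arm64_ftr_override bad = { .val = 0x1, .mask = 0x0 };

  arm64_apply_feature_override(0x2, 0, 4, &ok);   /* returns 0x1 - override applied */
  arm64_apply_feature_override(0x2, 0, 4, &bad);  /* returns 0x2 - override ignored */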

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/cpufeature.h | 39 ++++++++++++++++++++
 arch/arm64/kernel/cpufeature.c      |  9 +----
 2 files changed, 40 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index f6d416fe49b0..a41cb0bc220b 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -909,6 +909,45 @@ extern struct arm64_ftr_override id_aa64isar2_override;
 
 extern struct arm64_ftr_override arm64_sw_feature_override;
 
+static inline
+u64 arm64_apply_feature_override(u64 val, int feat, int width,
+				 const struct arm64_ftr_override *override)
+{
+	u64 oval = override->val;
+
+	/*
+	 * When it encounters an invalid override (e.g., an override that
+	 * cannot be honoured due to a missing CPU feature), the early idreg
+	 * override code will set the mask to 0x0 and the value to non-zero for
+	 * the field in question. In order to determine whether the override is
+	 * valid or not for the field we are interested in, we first need to
+	 * disregard bits belonging to other fields.
+	 */
+	oval &= GENMASK_ULL(feat + width - 1, feat);
+
+	/*
+	 * The override is valid if all value bits are accounted for in the
+	 * mask. If so, replace the masked bits with the override value.
+	 */
+	if (oval == (oval & override->mask)) {
+		val &= ~override->mask;
+		val |= oval;
+	}
+
+	/* Extract the field from the updated value */
+	return cpuid_feature_extract_unsigned_field(val, feat);
+}
+
+static inline bool arm64_test_sw_feature_override(int feat)
+{
+	/*
+	 * Software features are pseudo CPU features that have no underlying
+	 * CPUID system register value to apply the override to.
+	 */
+	return arm64_apply_feature_override(0, feat, 4,
+					    &arm64_sw_feature_override);
+}
+
 u32 get_kvm_ipa_limit(void);
 void dump_cpu_features(void);
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index b95374387946..889411f87e90 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2034,14 +2034,7 @@ static bool has_nested_virt_support(const struct arm64_cpu_capabilities *cap,
 static bool hvhe_possible(const struct arm64_cpu_capabilities *entry,
 			  int __unused)
 {
-	u64 val;
-
-	val = read_sysreg(id_aa64mmfr1_el1);
-	if (!cpuid_feature_extract_unsigned_field(val, ID_AA64MMFR1_EL1_VH_SHIFT))
-		return false;
-
-	val = arm64_sw_feature_override.val & arm64_sw_feature_override.mask;
-	return cpuid_feature_extract_unsigned_field(val, ARM64_SW_FEATURE_OVERRIDE_HVHE);
+	return arm64_test_sw_feature_override(ARM64_SW_FEATURE_OVERRIDE_HVHE);
 }
 
 #ifdef CONFIG_ARM64_PAN
-- 
2.43.0.rc1.413.gea7ed67945-goog



* [PATCH v6 28/41] arm64: kaslr: Use feature override instead of parsing the cmdline again
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (26 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 27/41] arm64: cpufeature: Add helper to test for CPU feature overrides Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-11-29 11:16 ` [PATCH v6 29/41] arm64: idreg-override: Create a pseudo feature for rodata=off Ard Biesheuvel
                   ` (13 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

The early kaslr code open-codes the detection of 'nokaslr' on the kernel
command line, and this is no longer necessary now that the feature
override detection code, which also looks for the same string, runs
before it.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/cpufeature.h |  5 ++
 arch/arm64/kernel/kaslr.c           |  4 +-
 arch/arm64/kernel/pi/kaslr_early.c  | 53 +-------------------
 3 files changed, 7 insertions(+), 55 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index a41cb0bc220b..9e8c3e345951 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -948,6 +948,11 @@ static inline bool arm64_test_sw_feature_override(int feat)
 					    &arm64_sw_feature_override);
 }
 
+static inline bool kaslr_disabled_cmdline(void)
+{
+	return arm64_test_sw_feature_override(ARM64_SW_FEATURE_OVERRIDE_NOKASLR);
+}
+
 u32 get_kvm_ipa_limit(void);
 void dump_cpu_features(void);
 
diff --git a/arch/arm64/kernel/kaslr.c b/arch/arm64/kernel/kaslr.c
index 12c7f3c8ba76..1da3e25f9d9e 100644
--- a/arch/arm64/kernel/kaslr.c
+++ b/arch/arm64/kernel/kaslr.c
@@ -16,9 +16,7 @@ bool __ro_after_init __kaslr_is_enabled = false;
 
 void __init kaslr_init(void)
 {
-	if (cpuid_feature_extract_unsigned_field(arm64_sw_feature_override.val &
-						 arm64_sw_feature_override.mask,
-						 ARM64_SW_FEATURE_OVERRIDE_NOKASLR)) {
+	if (kaslr_disabled_cmdline()) {
 		pr_info("KASLR disabled on command line\n");
 		return;
 	}
diff --git a/arch/arm64/kernel/pi/kaslr_early.c b/arch/arm64/kernel/pi/kaslr_early.c
index 167081b30a15..f2305e276ec3 100644
--- a/arch/arm64/kernel/pi/kaslr_early.c
+++ b/arch/arm64/kernel/pi/kaslr_early.c
@@ -16,57 +16,6 @@
 #include <asm/memory.h>
 #include <asm/pgtable.h>
 
-/* taken from lib/string.c */
-static char *__init __strstr(const char *s1, const char *s2)
-{
-	size_t l1, l2;
-
-	l2 = strlen(s2);
-	if (!l2)
-		return (char *)s1;
-	l1 = strlen(s1);
-	while (l1 >= l2) {
-		l1--;
-		if (!memcmp(s1, s2, l2))
-			return (char *)s1;
-		s1++;
-	}
-	return NULL;
-}
-static bool __init cmdline_contains_nokaslr(const u8 *cmdline)
-{
-	const u8 *str;
-
-	str = __strstr(cmdline, "nokaslr");
-	return str == cmdline || (str > cmdline && *(str - 1) == ' ');
-}
-
-static bool __init is_kaslr_disabled_cmdline(void *fdt)
-{
-	if (!IS_ENABLED(CONFIG_CMDLINE_FORCE)) {
-		int node;
-		const u8 *prop;
-
-		node = fdt_path_offset(fdt, "/chosen");
-		if (node < 0)
-			goto out;
-
-		prop = fdt_getprop(fdt, node, "bootargs", NULL);
-		if (!prop)
-			goto out;
-
-		if (cmdline_contains_nokaslr(prop))
-			return true;
-
-		if (IS_ENABLED(CONFIG_CMDLINE_EXTEND))
-			goto out;
-
-		return false;
-	}
-out:
-	return cmdline_contains_nokaslr(CONFIG_CMDLINE);
-}
-
 static u64 __init get_kaslr_seed(void *fdt)
 {
 	static char const chosen_str[] __initconst = "chosen";
@@ -92,7 +41,7 @@ asmlinkage u64 __init kaslr_early_init(void *fdt)
 {
 	u64 seed, range;
 
-	if (is_kaslr_disabled_cmdline(fdt))
+	if (kaslr_disabled_cmdline())
 		return 0;
 
 	seed = get_kaslr_seed(fdt);
-- 
2.43.0.rc1.413.gea7ed67945-goog



* [PATCH v6 29/41] arm64: idreg-override: Create a pseudo feature for rodata=off
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (27 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 28/41] arm64: kaslr: Use feature override instead of parsing the cmdline again Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-11-29 11:16 ` [PATCH v6 30/41] arm64: Add helpers to probe local CPU for PAC and BTI support Ard Biesheuvel
                   ` (12 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Add rodata=off to the set of kernel command line options that is parsed
early using the CPU feature override detection code, so we can easily
refer to it when creating the kernel mapping.
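
For context, the pseudo feature is consumed later in this series when
creating the early kernel mapping, roughly like this (sketch based on the
map_kernel patch further down the thread):

  /* keep kernel text writable for external debuggers when rodata=off */
  if (arm64_test_sw_feature_override(ARM64_SW_FEATURE_OVERRIDE_RODATA_OFF))
          text_prot = PAGE_KERNEL_EXEC;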

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/cpufeature.h   | 1 +
 arch/arm64/kernel/pi/idreg-override.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 9e8c3e345951..5e71d5ebe918 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -17,6 +17,7 @@
 
 #define ARM64_SW_FEATURE_OVERRIDE_NOKASLR	0
 #define ARM64_SW_FEATURE_OVERRIDE_HVHE		4
+#define ARM64_SW_FEATURE_OVERRIDE_RODATA_OFF	8
 
 #ifndef __ASSEMBLY__
 
diff --git a/arch/arm64/kernel/pi/idreg-override.c b/arch/arm64/kernel/pi/idreg-override.c
index f9e05c10faab..e4bcabcc6860 100644
--- a/arch/arm64/kernel/pi/idreg-override.c
+++ b/arch/arm64/kernel/pi/idreg-override.c
@@ -159,6 +159,7 @@ static const struct ftr_set_desc sw_features __prel64_initconst = {
 	.fields		= {
 		FIELD("nokaslr", ARM64_SW_FEATURE_OVERRIDE_NOKASLR, NULL),
 		FIELD("hvhe", ARM64_SW_FEATURE_OVERRIDE_HVHE, hvhe_filter),
+		FIELD("rodataoff", ARM64_SW_FEATURE_OVERRIDE_RODATA_OFF, NULL),
 		{}
 	},
 };
@@ -190,6 +191,7 @@ static const struct {
 	{ "arm64.nomops",		"id_aa64isar2.mops=0" },
 	{ "arm64.nomte",		"id_aa64pfr1.mte=0" },
 	{ "nokaslr",			"arm64_sw.nokaslr=1" },
+	{ "rodata=off",			"arm64_sw.rodataoff=1" },
 };
 
 static int __init parse_hexdigit(const char *p, u64 *v)
-- 
2.43.0.rc1.413.gea7ed67945-goog



* [PATCH v6 30/41] arm64: Add helpers to probe local CPU for PAC and BTI support
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (28 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 29/41] arm64: idreg-override: Create a pseudo feature for rodata=off Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-11-29 11:16 ` [PATCH v6 31/41] arm64: head: allocate more pages for the kernel mapping Ard Biesheuvel
                   ` (11 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Add some helpers that will be used by the early kernel mapping code to
check feature support on the local CPU. This permits the early kernel
mapping to be created with the right attributes, removing the need for
tearing it down and recreating it.
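
As a rough illustration of how these helpers end up being used by the
early mapping code later in this series (sketch only):

  pgprot_t text_prot = PAGE_KERNEL_ROX;
  bool enable_scs = IS_ENABLED(CONFIG_UNWIND_PATCH_PAC_INTO_SCS);

  /* mark kernel text as guarded pages right away on BTI capable CPUs */
  if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL) && cpu_has_bti())
          text_prot = __pgprot_modify(text_prot, PTE_GP, PTE_GP);

  /* only patch PACIASP into SCS pushes when PAC is not implemented */
  if (IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL) && cpu_has_pac())
          enable_scs = false;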

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/cpufeature.h | 32 ++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 5e71d5ebe918..33b7749313c5 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -957,6 +957,38 @@ static inline bool kaslr_disabled_cmdline(void)
 u32 get_kvm_ipa_limit(void);
 void dump_cpu_features(void);
 
+static inline bool cpu_has_bti(void)
+{
+	if (!IS_ENABLED(CONFIG_ARM64_BTI))
+		return false;
+
+	return arm64_apply_feature_override(read_cpuid(ID_AA64PFR1_EL1),
+					    ID_AA64PFR1_EL1_BT_SHIFT, 4,
+					    &id_aa64pfr1_override);
+}
+
+static inline bool cpu_has_pac(void)
+{
+	u64 isar1, isar2;
+
+	if (!IS_ENABLED(CONFIG_ARM64_PTR_AUTH))
+		return false;
+
+	isar1 = read_cpuid(ID_AA64ISAR1_EL1);
+	isar2 = read_cpuid(ID_AA64ISAR2_EL1);
+
+	if (arm64_apply_feature_override(isar1, ID_AA64ISAR1_EL1_APA_SHIFT, 4,
+					 &id_aa64isar1_override))
+		return true;
+
+	if (arm64_apply_feature_override(isar1, ID_AA64ISAR1_EL1_API_SHIFT, 4,
+					 &id_aa64isar1_override))
+		return true;
+
+	return arm64_apply_feature_override(isar2, ID_AA64ISAR2_EL1_APA3_SHIFT, 4,
+					    &id_aa64isar2_override);
+}
+
 #endif /* __ASSEMBLY__ */
 
 #endif
-- 
2.43.0.rc1.413.gea7ed67945-goog



* [PATCH v6 31/41] arm64: head: allocate more pages for the kernel mapping
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (29 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 30/41] arm64: Add helpers to probe local CPU for PAC and BTI support Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-11-29 11:16 ` [PATCH v6 32/41] arm64: head: move memstart_offset_seed handling to C code Ard Biesheuvel
                   ` (10 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

In preparation for switching to an early kernel mapping routine that
maps each segment according to its precise boundaries, and with the
correct attributes, let's allocate some extra pages for page tables for
the 4k page size configuration. This is necessary because the start and
end of each segment may not be aligned to the block size, and so we'll
need an extra page table at each segment boundary.
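
A back-of-the-envelope way to arrive at the number used below (informal,
and only relevant when the swapper block size exceeds the segment
alignment): five segments laid out back to back have at most six
boundaries that can fall inside a block mapping, and each such boundary
may require one additional next-level table, i.e.

  EARLY_SEGMENT_EXTRA_PAGES = KERNEL_SEGMENT_COUNT + 1 = 5 + 1 = 6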

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/kernel-pgtable.h | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
index 83ddb14b95a5..0631604995ee 100644
--- a/arch/arm64/include/asm/kernel-pgtable.h
+++ b/arch/arm64/include/asm/kernel-pgtable.h
@@ -68,7 +68,7 @@
 			+ EARLY_PGDS((vstart), (vend), add) 	/* each PGDIR needs a next level page table */	\
 			+ EARLY_PUDS((vstart), (vend), add)	/* each PUD needs a next level page table */	\
 			+ EARLY_PMDS((vstart), (vend), add))	/* each PMD needs a next level page table */
-#define INIT_DIR_SIZE (PAGE_SIZE * EARLY_PAGES(KIMAGE_VADDR, _end, EXTRA_PAGE))
+#define INIT_DIR_SIZE (PAGE_SIZE * (EARLY_PAGES(KIMAGE_VADDR, _end, EXTRA_PAGE) + EARLY_SEGMENT_EXTRA_PAGES))
 
 /* the initial ID map may need two extra pages if it needs to be extended */
 #if VA_BITS < 48
@@ -89,6 +89,15 @@
 #define SWAPPER_TABLE_SHIFT	PMD_SHIFT
 #endif
 
+/* The number of segments in the kernel image (text, rodata, inittext, initdata, data+bss) */
+#define KERNEL_SEGMENT_COUNT	5
+
+#if SWAPPER_BLOCK_SIZE > SEGMENT_ALIGN
+#define EARLY_SEGMENT_EXTRA_PAGES (KERNEL_SEGMENT_COUNT + 1)
+#else
+#define EARLY_SEGMENT_EXTRA_PAGES 0
+#endif
+
 /*
  * Initial memory map attributes.
  */
-- 
2.43.0.rc1.413.gea7ed67945-goog



* [PATCH v6 32/41] arm64: head: move memstart_offset_seed handling to C code
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (30 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 31/41] arm64: head: allocate more pages for the kernel mapping Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-11-29 11:16 ` [PATCH v6 33/41] arm64: mm: Make kaslr_requires_kpti() a static inline Ard Biesheuvel
                   ` (9 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Now that we can set BSS variables from the early code running from the
ID map, we can set memstart_offset_seed directly from the C code that
derives the value instead of passing it back and forth between C and asm
code.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/head.S           | 7 -------
 arch/arm64/kernel/image-vars.h     | 1 +
 arch/arm64/kernel/pi/kaslr_early.c | 4 ++++
 3 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index b320702032a7..aa7766dc64d9 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -82,7 +82,6 @@
 	 *  x21        primary_entry() .. start_kernel()        FDT pointer passed at boot in x0
 	 *  x22        create_idmap() .. start_kernel()         ID map VA of the DT blob
 	 *  x23        __primary_switch()                       physical misalignment/KASLR offset
-	 *  x24        __primary_switch()                       linear map KASLR seed
 	 *  x25        primary_entry() .. start_kernel()        supported VA size
 	 *  x28        create_idmap()                           callee preserved temp register
 	 */
@@ -483,11 +482,6 @@ SYM_FUNC_START_LOCAL(__primary_switched)
 	str	x25, [x8]			// ... observes the correct value
 	dc	civac, x8			// Make visible to booting secondaries
 #endif
-
-#ifdef CONFIG_RANDOMIZE_BASE
-	adrp	x5, memstart_offset_seed	// Save KASLR linear map seed
-	strh	w24, [x5, :lo12:memstart_offset_seed]
-#endif
 #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
 	bl	kasan_early_init
 #endif
@@ -779,7 +773,6 @@ SYM_FUNC_START_LOCAL(__primary_switch)
 #ifdef CONFIG_RANDOMIZE_BASE
 	mov	x0, x22
 	bl	__pi_kaslr_early_init
-	and	x24, x0, #SZ_2M - 1		// capture memstart offset seed
 	bic	x0, x0, #SZ_2M - 1
 	orr	x23, x23, x0			// record kernel offset
 #endif
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index eacc3d167733..8d96052079e8 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -46,6 +46,7 @@ PROVIDE(__pi_id_aa64smfr0_override	= id_aa64smfr0_override);
 PROVIDE(__pi_id_aa64zfr0_override	= id_aa64zfr0_override);
 PROVIDE(__pi_arm64_sw_feature_override	= arm64_sw_feature_override);
 PROVIDE(__pi__ctype			= _ctype);
+PROVIDE(__pi_memstart_offset_seed	= memstart_offset_seed);
 
 #ifdef CONFIG_KVM
 
diff --git a/arch/arm64/kernel/pi/kaslr_early.c b/arch/arm64/kernel/pi/kaslr_early.c
index f2305e276ec3..eeecee7ffd6f 100644
--- a/arch/arm64/kernel/pi/kaslr_early.c
+++ b/arch/arm64/kernel/pi/kaslr_early.c
@@ -16,6 +16,8 @@
 #include <asm/memory.h>
 #include <asm/pgtable.h>
 
+extern u16 memstart_offset_seed;
+
 static u64 __init get_kaslr_seed(void *fdt)
 {
 	static char const chosen_str[] __initconst = "chosen";
@@ -51,6 +53,8 @@ asmlinkage u64 __init kaslr_early_init(void *fdt)
 			return 0;
 	}
 
+	memstart_offset_seed = seed & U16_MAX;
+
 	/*
 	 * OK, so we are proceeding with KASLR enabled. Calculate a suitable
 	 * kernel image offset from the seed. Let's place the kernel in the
-- 
2.43.0.rc1.413.gea7ed67945-goog



* [PATCH v6 33/41] arm64: mm: Make kaslr_requires_kpti() a static inline
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (31 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 32/41] arm64: head: move memstart_offset_seed handling to C code Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-11-29 11:16 ` [PATCH v6 34/41] arm64: mmu: Make __cpu_replace_ttbr1() out of line Ard Biesheuvel
                   ` (8 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

In preparation for moving the first assignment of arm64_use_ng_mappings
to an earlier stage in the boot, ensure that kaslr_requires_kpti() is
accessible without relying on the core kernel's view on whether or not
KASLR is enabled. So make it a static inline, and move the
kaslr_enabled() check out of it and into the callers, one of which will
disappear in a subsequent patch.

Once support for the obsolete ThunderX 1 platform is dropped, this
check reduces to an E0PD feature check on the local CPU.
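
With the kaslr_enabled() check moved out, the callers combine the two
tests themselves, as the diff below does in setup_arch():

  arm64_use_ng_mappings = kaslr_enabled() && kaslr_requires_kpti();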

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/mmu.h   | 38 +++++++++++++++++-
 arch/arm64/kernel/cpufeature.c | 42 +-------------------
 arch/arm64/kernel/setup.c      |  2 +-
 3 files changed, 39 insertions(+), 43 deletions(-)

diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 2fcf51231d6e..d0b8b4b413b6 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -71,7 +71,43 @@ extern void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
 			       pgprot_t prot, bool page_mappings_only);
 extern void *fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot);
 extern void mark_linear_text_alias_ro(void);
-extern bool kaslr_requires_kpti(void);
+
+/*
+ * This check is triggered during the early boot before the cpufeature
+ * is initialised. Checking the status on the local CPU allows the boot
+ * CPU to detect the need for non-global mappings and thus avoiding a
+ * pagetable re-write after all the CPUs are booted. This check will be
+ * anyway run on individual CPUs, allowing us to get the consistent
+ * state once the SMP CPUs are up and thus make the switch to non-global
+ * mappings if required.
+ */
+static inline bool kaslr_requires_kpti(void)
+{
+	/*
+	 * E0PD does a similar job to KPTI so can be used instead
+	 * where available.
+	 */
+	if (IS_ENABLED(CONFIG_ARM64_E0PD)) {
+		u64 mmfr2 = read_sysreg_s(SYS_ID_AA64MMFR2_EL1);
+		if (cpuid_feature_extract_unsigned_field(mmfr2,
+						ID_AA64MMFR2_EL1_E0PD_SHIFT))
+			return false;
+	}
+
+	/*
+	 * Systems affected by Cavium erratum 24756 are incompatible
+	 * with KPTI.
+	 */
+	if (IS_ENABLED(CONFIG_CAVIUM_ERRATUM_27456)) {
+		extern const struct midr_range cavium_erratum_27456_cpus[];
+
+		if (is_midr_in_range_list(read_cpuid_id(),
+					  cavium_erratum_27456_cpus))
+			return false;
+	}
+
+	return true;
+}
 
 #define INIT_MM_CONTEXT(name)	\
 	.pgd = init_pg_dir,
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 889411f87e90..39f5df0773cb 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1649,46 +1649,6 @@ has_useable_cnp(const struct arm64_cpu_capabilities *entry, int scope)
 	return has_cpuid_feature(entry, scope);
 }
 
-/*
- * This check is triggered during the early boot before the cpufeature
- * is initialised. Checking the status on the local CPU allows the boot
- * CPU to detect the need for non-global mappings and thus avoiding a
- * pagetable re-write after all the CPUs are booted. This check will be
- * anyway run on individual CPUs, allowing us to get the consistent
- * state once the SMP CPUs are up and thus make the switch to non-global
- * mappings if required.
- */
-bool kaslr_requires_kpti(void)
-{
-	if (!IS_ENABLED(CONFIG_RANDOMIZE_BASE))
-		return false;
-
-	/*
-	 * E0PD does a similar job to KPTI so can be used instead
-	 * where available.
-	 */
-	if (IS_ENABLED(CONFIG_ARM64_E0PD)) {
-		u64 mmfr2 = read_sysreg_s(SYS_ID_AA64MMFR2_EL1);
-		if (cpuid_feature_extract_unsigned_field(mmfr2,
-						ID_AA64MMFR2_EL1_E0PD_SHIFT))
-			return false;
-	}
-
-	/*
-	 * Systems affected by Cavium erratum 24756 are incompatible
-	 * with KPTI.
-	 */
-	if (IS_ENABLED(CONFIG_CAVIUM_ERRATUM_27456)) {
-		extern const struct midr_range cavium_erratum_27456_cpus[];
-
-		if (is_midr_in_range_list(read_cpuid_id(),
-					  cavium_erratum_27456_cpus))
-			return false;
-	}
-
-	return kaslr_enabled();
-}
-
 static bool __meltdown_safe = true;
 static int __kpti_forced; /* 0: not forced, >0: forced on, <0: forced off */
 
@@ -1741,7 +1701,7 @@ static bool unmap_kernel_at_el0(const struct arm64_cpu_capabilities *entry,
 	}
 
 	/* Useful for KASLR robustness */
-	if (kaslr_requires_kpti()) {
+	if (kaslr_enabled() && kaslr_requires_kpti()) {
 		if (!__kpti_forced) {
 			str = "KASLR";
 			__kpti_forced = 1;
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 4b0b3515ee20..c2d6852a4e0c 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -288,7 +288,7 @@ void __init __no_sanitize_address setup_arch(char **cmdline_p)
 	 * mappings from the start, avoiding the cost of rewriting
 	 * everything later.
 	 */
-	arm64_use_ng_mappings = kaslr_requires_kpti();
+	arm64_use_ng_mappings = kaslr_enabled() && kaslr_requires_kpti();
 
 	early_fixmap_init();
 	early_ioremap_init();
-- 
2.43.0.rc1.413.gea7ed67945-goog



* [PATCH v6 34/41] arm64: mmu: Make __cpu_replace_ttbr1() out of line
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (32 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 33/41] arm64: mm: Make kaslr_requires_kpti() a static inline Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-11-29 11:16 ` [PATCH v6 35/41] arm64: head: Move early kernel mapping routines into C code Ard Biesheuvel
                   ` (7 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

__cpu_replace_ttbr1() is a static inline, and so it gets instantiated
wherever it is used. This is not really necessary, as it is never called
on a hot path. It also has the unfortunate side effect that the symbol
idmap_cpu_replace_ttbr1 may never be referenced from kCFI-enabled C
code, which means the type id symbol may not exist either. This will
result in a build error once we start referring to this symbol from asm
code as well. (Note that this problem only occurs when CnP, KAsan and
suspend/resume are all disabled in the Kconfig, but that is a valid
config, if unusual.)

So let's just move it out of line so all callers will share the same
implementation, which will reference idmap_cpu_replace_ttbr1
unconditionally.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/mmu_context.h | 32 +-------------------
 arch/arm64/mm/mmu.c                  | 32 ++++++++++++++++++++
 2 files changed, 33 insertions(+), 31 deletions(-)

diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 9ce4200508b1..926fbbcecbe0 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -148,37 +148,7 @@ static inline void cpu_install_ttbr0(phys_addr_t ttbr0, unsigned long t0sz)
 	isb();
 }
 
-/*
- * Atomically replaces the active TTBR1_EL1 PGD with a new VA-compatible PGD,
- * avoiding the possibility of conflicting TLB entries being allocated.
- */
-static inline void __cpu_replace_ttbr1(pgd_t *pgdp, pgd_t *idmap, bool cnp)
-{
-	typedef void (ttbr_replace_func)(phys_addr_t);
-	extern ttbr_replace_func idmap_cpu_replace_ttbr1;
-	ttbr_replace_func *replace_phys;
-	unsigned long daif;
-
-	/* phys_to_ttbr() zeros lower 2 bits of ttbr with 52-bit PA */
-	phys_addr_t ttbr1 = phys_to_ttbr(virt_to_phys(pgdp));
-
-	if (cnp)
-		ttbr1 |= TTBR_CNP_BIT;
-
-	replace_phys = (void *)__pa_symbol(idmap_cpu_replace_ttbr1);
-
-	__cpu_install_idmap(idmap);
-
-	/*
-	 * We really don't want to take *any* exceptions while TTBR1 is
-	 * in the process of being replaced so mask everything.
-	 */
-	daif = local_daif_save();
-	replace_phys(ttbr1);
-	local_daif_restore(daif);
-
-	cpu_uninstall_idmap();
-}
+void __cpu_replace_ttbr1(pgd_t *pgdp, pgd_t *idmap, bool cnp);
 
 static inline void cpu_enable_swapper_cnp(void)
 {
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 03c73e9197ac..22104febebf2 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1483,3 +1483,35 @@ void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr, pte
 {
 	set_pte_at(vma->vm_mm, addr, ptep, pte);
 }
+
+/*
+ * Atomically replaces the active TTBR1_EL1 PGD with a new VA-compatible PGD,
+ * avoiding the possibility of conflicting TLB entries being allocated.
+ */
+void __cpu_replace_ttbr1(pgd_t *pgdp, pgd_t *idmap, bool cnp)
+{
+	typedef void (ttbr_replace_func)(phys_addr_t);
+	extern ttbr_replace_func idmap_cpu_replace_ttbr1;
+	ttbr_replace_func *replace_phys;
+	unsigned long daif;
+
+	/* phys_to_ttbr() zeros lower 2 bits of ttbr with 52-bit PA */
+	phys_addr_t ttbr1 = phys_to_ttbr(virt_to_phys(pgdp));
+
+	if (cnp)
+		ttbr1 |= TTBR_CNP_BIT;
+
+	replace_phys = (void *)__pa_symbol(idmap_cpu_replace_ttbr1);
+
+	__cpu_install_idmap(idmap);
+
+	/*
+	 * We really don't want to take *any* exceptions while TTBR1 is
+	 * in the process of being replaced so mask everything.
+	 */
+	daif = local_daif_save();
+	replace_phys(ttbr1);
+	local_daif_restore(daif);
+
+	cpu_uninstall_idmap();
+}
-- 
2.43.0.rc1.413.gea7ed67945-goog



* [PATCH v6 35/41] arm64: head: Move early kernel mapping routines into C code
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (33 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 34/41] arm64: mmu: Make __cpu_replace_ttbr1() out of line Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-11-29 11:16 ` [PATCH v6 36/41] arm64: mm: Use 48-bit virtual addressing for the permanent ID map Ard Biesheuvel
                   ` (6 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

The asm version of the kernel mapping code works fine for creating a
coarse-grained identity map, but it is not suitable for mapping the
kernel down to its exact boundaries with the right attributes. This is
why we currently create a preliminary RWX kernel mapping first, and then
rebuild it from scratch later on.

So let's reimplement this in C, in a way that will make it unnecessary
to create the kernel page tables yet another time in paging_init().
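
To give a feel for the new code before the full diff, the central
primitive added here is map_range(); mapping one segment with exact
boundaries boils down to something like the following (informal sketch
mirroring the map_segment() helper in this patch, where va_offset is the
offset between the kernel's physical placement and its virtual base):

  /* sketch: map the kernel text segment read-only/executable */
  u64 pgdp = (u64)init_pg_dir + PAGE_SIZE;	/* next free page table page */

  map_range(&pgdp,
	    ((u64)_stext + va_offset) & ~PAGE_OFFSET,	/* virtual start */
	    ((u64)_etext + va_offset) & ~PAGE_OFFSET,	/* virtual end (exclusive) */
	    (u64)_stext,				/* physical start */
	    PAGE_KERNEL_ROX,				/* attributes */
	    4 - CONFIG_PGTABLE_LEVELS,			/* root level */
	    (pte_t *)init_pg_dir, true, 0);		/* may use contiguous mappings */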

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/archrandom.h   |   2 -
 arch/arm64/include/asm/scs.h          |  32 +---
 arch/arm64/kernel/head.S              |  52 +------
 arch/arm64/kernel/image-vars.h        |  19 +++
 arch/arm64/kernel/pi/Makefile         |   1 +
 arch/arm64/kernel/pi/idreg-override.c |  22 ++-
 arch/arm64/kernel/pi/kaslr_early.c    |  12 +-
 arch/arm64/kernel/pi/map_kernel.c     | 164 ++++++++++++++++++++
 arch/arm64/kernel/pi/map_range.c      |  88 +++++++++++
 arch/arm64/kernel/pi/patch-scs.c      |  16 +-
 arch/arm64/kernel/pi/pi.h             |  14 ++
 arch/arm64/kernel/pi/relocate.c       |   2 +
 arch/arm64/kernel/setup.c             |   7 -
 arch/arm64/kernel/vmlinux.lds.S       |   4 +-
 arch/arm64/mm/proc.S                  |   1 +
 15 files changed, 315 insertions(+), 121 deletions(-)

diff --git a/arch/arm64/include/asm/archrandom.h b/arch/arm64/include/asm/archrandom.h
index ecdb3cfcd0f8..8babfbe31f95 100644
--- a/arch/arm64/include/asm/archrandom.h
+++ b/arch/arm64/include/asm/archrandom.h
@@ -129,6 +129,4 @@ static inline bool __init __early_cpu_has_rndr(void)
 	return (ftr >> ID_AA64ISAR0_EL1_RNDR_SHIFT) & 0xf;
 }
 
-u64 kaslr_early_init(void *fdt);
-
 #endif /* _ASM_ARCHRANDOM_H */
diff --git a/arch/arm64/include/asm/scs.h b/arch/arm64/include/asm/scs.h
index eca2ba5a6276..2e010ea76be2 100644
--- a/arch/arm64/include/asm/scs.h
+++ b/arch/arm64/include/asm/scs.h
@@ -33,37 +33,11 @@
 #include <asm/cpufeature.h>
 
 #ifdef CONFIG_UNWIND_PATCH_PAC_INTO_SCS
-static inline bool should_patch_pac_into_scs(void)
-{
-	u64 reg;
-
-	/*
-	 * We only enable the shadow call stack dynamically if we are running
-	 * on a system that does not implement PAC or BTI. PAC and SCS provide
-	 * roughly the same level of protection, and BTI relies on the PACIASP
-	 * instructions serving as landing pads, preventing us from patching
-	 * those instructions into something else.
-	 */
-	reg = read_sysreg_s(SYS_ID_AA64ISAR1_EL1);
-	if (SYS_FIELD_GET(ID_AA64ISAR1_EL1, APA, reg) |
-	    SYS_FIELD_GET(ID_AA64ISAR1_EL1, API, reg))
-		return false;
-
-	reg = read_sysreg_s(SYS_ID_AA64ISAR2_EL1);
-	if (SYS_FIELD_GET(ID_AA64ISAR2_EL1, APA3, reg))
-		return false;
-
-	if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL)) {
-		reg = read_sysreg_s(SYS_ID_AA64PFR1_EL1);
-		if (reg & (0xf << ID_AA64PFR1_EL1_BT_SHIFT))
-			return false;
-	}
-	return true;
-}
-
 static inline void dynamic_scs_init(void)
 {
-	if (should_patch_pac_into_scs()) {
+	extern bool __pi_dynamic_scs_is_enabled;
+
+	if (__pi_dynamic_scs_is_enabled) {
 		pr_info("Enabling dynamic shadow call stack\n");
 		static_branch_enable(&dynamic_scs_enabled);
 	}
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index aa7766dc64d9..ffacce7b5a02 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -81,7 +81,6 @@
 	 *  x20        primary_entry() .. __primary_switch()    CPU boot mode
 	 *  x21        primary_entry() .. start_kernel()        FDT pointer passed at boot in x0
 	 *  x22        create_idmap() .. start_kernel()         ID map VA of the DT blob
-	 *  x23        __primary_switch()                       physical misalignment/KASLR offset
 	 *  x25        primary_entry() .. start_kernel()        supported VA size
 	 *  x28        create_idmap()                           callee preserved temp register
 	 */
@@ -408,24 +407,6 @@ SYM_FUNC_START_LOCAL(create_idmap)
 0:	ret	x28
 SYM_FUNC_END(create_idmap)
 
-SYM_FUNC_START_LOCAL(create_kernel_mapping)
-	adrp	x0, init_pg_dir
-	mov_q	x5, KIMAGE_VADDR		// compile time __va(_text)
-#ifdef CONFIG_RELOCATABLE
-	add	x5, x5, x23			// add KASLR displacement
-#endif
-	adrp	x6, _end			// runtime __pa(_end)
-	adrp	x3, _text			// runtime __pa(_text)
-	sub	x6, x6, x3			// _end - _text
-	add	x6, x6, x5			// runtime __va(_end)
-	mov_q	x7, SWAPPER_RW_MMUFLAGS
-
-	map_memory x0, x1, x5, x6, x7, x3, (VA_BITS - PGDIR_SHIFT), x10, x11, x12, x13, x14
-
-	dsb	ishst				// sync with page table walker
-	ret
-SYM_FUNC_END(create_kernel_mapping)
-
 	/*
 	 * Initialize CPU registers with task-specific and cpu-specific context.
 	 *
@@ -752,44 +733,13 @@ SYM_FUNC_START_LOCAL(__primary_switch)
 	adrp	x2, init_idmap_pg_dir
 	bl	__enable_mmu
 
-	// Clear BSS
-	adrp	x0, __bss_start
-	mov	x1, xzr
-	adrp	x2, init_pg_end
-	sub	x2, x2, x0
-	bl	__pi_memset
-	dsb	ishst				// Make zero page visible to PTW
-
 	adrp	x1, early_init_stack
 	mov	sp, x1
 	mov	x29, xzr
 	mov	x0, x20				// pass the full boot status
 	mov	x1, x22				// pass the low FDT mapping
-	bl	__pi_init_feature_override	// Parse cpu feature overrides
-
-#ifdef CONFIG_RELOCATABLE
-	adrp	x23, KERNEL_START
-	and	x23, x23, MIN_KIMG_ALIGN - 1
-#ifdef CONFIG_RANDOMIZE_BASE
-	mov	x0, x22
-	bl	__pi_kaslr_early_init
-	bic	x0, x0, #SZ_2M - 1
-	orr	x23, x23, x0			// record kernel offset
-#endif
-#endif
-	bl	create_kernel_mapping
+	bl	__pi_early_map_kernel		// Map and relocate the kernel
 
-	adrp	x1, init_pg_dir
-	load_ttbr1 x1, x1, x2
-#ifdef CONFIG_RELOCATABLE
-	mov	x0, x23
-	bl	__pi_relocate_kernel
-#endif
-#ifdef CONFIG_UNWIND_PATCH_PAC_INTO_SCS
-	ldr	x0, =__eh_frame_start
-	ldr	x1, =__eh_frame_end
-	bl	__pi_scs_patch_vmlinux
-#endif
 	ldr	x8, =__primary_switched
 	adrp	x0, KERNEL_START		// __pa(KERNEL_START)
 	br	x8
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index 8d96052079e8..e566b32f9c22 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -45,9 +45,28 @@ PROVIDE(__pi_id_aa64pfr1_override	= id_aa64pfr1_override);
 PROVIDE(__pi_id_aa64smfr0_override	= id_aa64smfr0_override);
 PROVIDE(__pi_id_aa64zfr0_override	= id_aa64zfr0_override);
 PROVIDE(__pi_arm64_sw_feature_override	= arm64_sw_feature_override);
+PROVIDE(__pi_arm64_use_ng_mappings	= arm64_use_ng_mappings);
+#ifdef CONFIG_CAVIUM_ERRATUM_27456
+PROVIDE(__pi_cavium_erratum_27456_cpus	= cavium_erratum_27456_cpus);
+#endif
 PROVIDE(__pi__ctype			= _ctype);
 PROVIDE(__pi_memstart_offset_seed	= memstart_offset_seed);
 
+PROVIDE(__pi_init_pg_dir		= init_pg_dir);
+PROVIDE(__pi_init_pg_end		= init_pg_end);
+
+PROVIDE(__pi__text			= _text);
+PROVIDE(__pi__stext               	= _stext);
+PROVIDE(__pi__etext               	= _etext);
+PROVIDE(__pi___start_rodata       	= __start_rodata);
+PROVIDE(__pi___inittext_begin     	= __inittext_begin);
+PROVIDE(__pi___inittext_end       	= __inittext_end);
+PROVIDE(__pi___initdata_begin     	= __initdata_begin);
+PROVIDE(__pi___initdata_end       	= __initdata_end);
+PROVIDE(__pi__data                	= _data);
+PROVIDE(__pi___bss_start		= __bss_start);
+PROVIDE(__pi__end			= _end);
+
 #ifdef CONFIG_KVM
 
 /*
diff --git a/arch/arm64/kernel/pi/Makefile b/arch/arm64/kernel/pi/Makefile
index a8b302245f15..8c2f80a46b93 100644
--- a/arch/arm64/kernel/pi/Makefile
+++ b/arch/arm64/kernel/pi/Makefile
@@ -39,6 +39,7 @@ $(obj)/lib-%.o: $(srctree)/lib/%.c FORCE
 	$(call if_changed_rule,cc_o_c)
 
 obj-y					:= idreg-override.pi.o \
+					   map_kernel.pi.o map_range.pi.o \
 					   lib-fdt.pi.o lib-fdt_ro.pi.o
 obj-$(CONFIG_RELOCATABLE)		+= relocate.pi.o
 obj-$(CONFIG_RANDOMIZE_BASE)		+= kaslr_early.pi.o
diff --git a/arch/arm64/kernel/pi/idreg-override.c b/arch/arm64/kernel/pi/idreg-override.c
index e4bcabcc6860..1884bd936c0d 100644
--- a/arch/arm64/kernel/pi/idreg-override.c
+++ b/arch/arm64/kernel/pi/idreg-override.c
@@ -308,37 +308,35 @@ static __init void __parse_cmdline(const char *cmdline, bool parse_aliases)
 	} while (1);
 }
 
-static __init const u8 *get_bootargs_cmdline(const void *fdt)
+static __init const u8 *get_bootargs_cmdline(const void *fdt, int node)
 {
+	static char const bootargs[] __initconst = "bootargs";
 	const u8 *prop;
-	int node;
 
-	node = fdt_path_offset(fdt, "/chosen");
 	if (node < 0)
 		return NULL;
 
-	prop = fdt_getprop(fdt, node, "bootargs", NULL);
+	prop = fdt_getprop(fdt, node, bootargs, NULL);
 	if (!prop)
 		return NULL;
 
 	return strlen(prop) ? prop : NULL;
 }
 
-static __init void parse_cmdline(const void *fdt)
+static __init void parse_cmdline(const void *fdt, int chosen)
 {
-	const u8 *prop = get_bootargs_cmdline(fdt);
+	static char const cmdline[] __initconst = CONFIG_CMDLINE;
+	const u8 *prop = get_bootargs_cmdline(fdt, chosen);
 
 	if (IS_ENABLED(CONFIG_CMDLINE_FORCE) || !prop)
-		__parse_cmdline(CONFIG_CMDLINE, true);
+		__parse_cmdline(cmdline, true);
 
 	if (!IS_ENABLED(CONFIG_CMDLINE_FORCE) && prop)
 		__parse_cmdline(prop, true);
 }
 
-/* Keep checkers quiet */
-void init_feature_override(u64 boot_status, const void *fdt);
-
-asmlinkage void __init init_feature_override(u64 boot_status, const void *fdt)
+void __init init_feature_override(u64 boot_status, const void *fdt,
+				  int chosen)
 {
 	struct arm64_ftr_override *override;
 	const struct ftr_set_desc *reg;
@@ -354,7 +352,7 @@ asmlinkage void __init init_feature_override(u64 boot_status, const void *fdt)
 
 	__boot_status = boot_status;
 
-	parse_cmdline(fdt);
+	parse_cmdline(fdt, chosen);
 
 	for (i = 0; i < ARRAY_SIZE(regs); i++) {
 		reg = prel64_pointer(regs[i].reg);
diff --git a/arch/arm64/kernel/pi/kaslr_early.c b/arch/arm64/kernel/pi/kaslr_early.c
index eeecee7ffd6f..0257b43819db 100644
--- a/arch/arm64/kernel/pi/kaslr_early.c
+++ b/arch/arm64/kernel/pi/kaslr_early.c
@@ -16,17 +16,17 @@
 #include <asm/memory.h>
 #include <asm/pgtable.h>
 
+#include "pi.h"
+
 extern u16 memstart_offset_seed;
 
-static u64 __init get_kaslr_seed(void *fdt)
+static u64 __init get_kaslr_seed(void *fdt, int node)
 {
-	static char const chosen_str[] __initconst = "chosen";
 	static char const seed_str[] __initconst = "kaslr-seed";
-	int node, len;
 	fdt64_t *prop;
 	u64 ret;
+	int len;
 
-	node = fdt_path_offset(fdt, chosen_str);
 	if (node < 0)
 		return 0;
 
@@ -39,14 +39,14 @@ static u64 __init get_kaslr_seed(void *fdt)
 	return ret;
 }
 
-asmlinkage u64 __init kaslr_early_init(void *fdt)
+u64 __init kaslr_early_init(void *fdt, int chosen)
 {
 	u64 seed, range;
 
 	if (kaslr_disabled_cmdline())
 		return 0;
 
-	seed = get_kaslr_seed(fdt);
+	seed = get_kaslr_seed(fdt, chosen);
 	if (!seed) {
 		if (!__early_cpu_has_rndr() ||
 		    !__arm64_rndr((unsigned long *)&seed))
diff --git a/arch/arm64/kernel/pi/map_kernel.c b/arch/arm64/kernel/pi/map_kernel.c
new file mode 100644
index 000000000000..f206373b28b0
--- /dev/null
+++ b/arch/arm64/kernel/pi/map_kernel.c
@@ -0,0 +1,164 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright 2023 Google LLC
+// Author: Ard Biesheuvel <ardb@google.com>
+
+#include <linux/init.h>
+#include <linux/libfdt.h>
+#include <linux/linkage.h>
+#include <linux/types.h>
+#include <linux/sizes.h>
+#include <linux/string.h>
+
+#include <asm/memory.h>
+#include <asm/pgalloc.h>
+#include <asm/pgtable.h>
+#include <asm/tlbflush.h>
+
+#include "pi.h"
+
+extern const u8 __eh_frame_start[], __eh_frame_end[];
+
+extern void idmap_cpu_replace_ttbr1(void *pgdir);
+
+static void __init map_segment(pgd_t *pg_dir, u64 *pgd, u64 va_offset,
+			       void *start, void *end, pgprot_t prot,
+			       bool may_use_cont, int root_level)
+{
+	map_range(pgd, ((u64)start + va_offset) & ~PAGE_OFFSET,
+		  ((u64)end + va_offset) & ~PAGE_OFFSET, (u64)start,
+		  prot, root_level, (pte_t *)pg_dir, may_use_cont, 0);
+}
+
+static void __init unmap_segment(pgd_t *pg_dir, u64 va_offset, void *start,
+				 void *end, int root_level)
+{
+	map_segment(pg_dir, NULL, va_offset, start, end, __pgprot(0),
+		    false, root_level);
+}
+
+static void __init map_kernel(u64 kaslr_offset, u64 va_offset, int root_level)
+{
+	bool enable_scs = IS_ENABLED(CONFIG_UNWIND_PATCH_PAC_INTO_SCS);
+	bool twopass = IS_ENABLED(CONFIG_RELOCATABLE);
+	u64 pgdp = (u64)init_pg_dir + PAGE_SIZE;
+	pgprot_t text_prot = PAGE_KERNEL_ROX;
+	pgprot_t data_prot = PAGE_KERNEL;
+	pgprot_t prot;
+
+	/*
+	 * External debuggers may need to write directly to the text mapping to
+	 * install SW breakpoints. Allow this (only) when explicitly requested
+	 * with rodata=off.
+	 */
+	if (arm64_test_sw_feature_override(ARM64_SW_FEATURE_OVERRIDE_RODATA_OFF))
+		text_prot = PAGE_KERNEL_EXEC;
+
+	/*
+	 * We only enable the shadow call stack dynamically if we are running
+	 * on a system that does not implement PAC or BTI. PAC and SCS provide
+	 * roughly the same level of protection, and BTI relies on the PACIASP
+	 * instructions serving as landing pads, preventing us from patching
+	 * those instructions into something else.
+	 */
+	if (IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL) && cpu_has_pac())
+		enable_scs = false;
+
+	if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL) && cpu_has_bti()) {
+		enable_scs = false;
+
+		/*
+		 * If we have a CPU that supports BTI and a kernel built for
+		 * BTI then mark the kernel executable text as guarded pages
+		 * now so we don't have to rewrite the page tables later.
+		 */
+		text_prot = __pgprot_modify(text_prot, PTE_GP, PTE_GP);
+	}
+
+	/* Map all code read-write on the first pass if needed */
+	twopass |= enable_scs;
+	prot = twopass ? data_prot : text_prot;
+
+	map_segment(init_pg_dir, &pgdp, va_offset, _stext, _etext, prot,
+		    !twopass, root_level);
+	map_segment(init_pg_dir, &pgdp, va_offset, __start_rodata,
+		    __inittext_begin, data_prot, false, root_level);
+	map_segment(init_pg_dir, &pgdp, va_offset, __inittext_begin,
+		    __inittext_end, prot, false, root_level);
+	map_segment(init_pg_dir, &pgdp, va_offset, __initdata_begin,
+		    __initdata_end, data_prot, false, root_level);
+	map_segment(init_pg_dir, &pgdp, va_offset, _data, _end, data_prot,
+		    true, root_level);
+	dsb(ishst);
+
+	idmap_cpu_replace_ttbr1(init_pg_dir);
+
+	if (twopass) {
+		if (IS_ENABLED(CONFIG_RELOCATABLE))
+			relocate_kernel(kaslr_offset);
+
+		if (enable_scs) {
+			scs_patch(__eh_frame_start + va_offset,
+				  __eh_frame_end - __eh_frame_start);
+			asm("ic ialluis");
+
+			dynamic_scs_is_enabled = true;
+		}
+
+		/*
+		 * Unmap the text region before remapping it, to avoid
+		 * potential TLB conflicts when creating the contiguous
+		 * descriptors.
+		 */
+		unmap_segment(init_pg_dir, va_offset, _stext, _etext,
+			      root_level);
+		dsb(ishst);
+		isb();
+		__tlbi(vmalle1);
+		isb();
+
+		/*
+		 * Remap these segments with different permissions
+		 * No new page table allocations should be needed
+		 */
+		map_segment(init_pg_dir, NULL, va_offset, _stext, _etext,
+			    text_prot, true, root_level);
+		map_segment(init_pg_dir, NULL, va_offset, __inittext_begin,
+			    __inittext_end, text_prot, false, root_level);
+		dsb(ishst);
+	}
+}
+
+asmlinkage void __init early_map_kernel(u64 boot_status, void *fdt)
+{
+	static char const chosen_str[] __initconst = "/chosen";
+	u64 va_base, pa_base = (u64)&_text;
+	u64 kaslr_offset = pa_base % MIN_KIMG_ALIGN;
+	int root_level = 4 - CONFIG_PGTABLE_LEVELS;
+	int chosen;
+
+	/* Clear BSS and the initial page tables */
+	memset(__bss_start, 0, (u64)init_pg_end - (u64)__bss_start);
+
+	/* Parse the command line for CPU feature overrides */
+	chosen = fdt_path_offset(fdt, chosen_str);
+	init_feature_override(boot_status, fdt, chosen);
+
+	/*
+	 * The virtual KASLR displacement modulo 2MiB is decided by the
+	 * physical placement of the image, as otherwise, we might not be able
+	 * to create the early kernel mapping using 2 MiB block descriptors. So
+	 * take the low bits of the KASLR offset from the physical address, and
+	 * fill in the high bits from the seed.
+	 */
+	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) {
+		u64 kaslr_seed = kaslr_early_init(fdt, chosen);
+
+		if (kaslr_seed && kaslr_requires_kpti())
+			arm64_use_ng_mappings = true;
+
+		kaslr_offset |= kaslr_seed & ~(MIN_KIMG_ALIGN - 1);
+	}
+
+	va_base = KIMAGE_VADDR + kaslr_offset;
+	map_kernel(kaslr_offset, va_base - pa_base, root_level);
+}
diff --git a/arch/arm64/kernel/pi/map_range.c b/arch/arm64/kernel/pi/map_range.c
new file mode 100644
index 000000000000..c31feda18f47
--- /dev/null
+++ b/arch/arm64/kernel/pi/map_range.c
@@ -0,0 +1,88 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright 2023 Google LLC
+// Author: Ard Biesheuvel <ardb@google.com>
+
+#include <linux/types.h>
+#include <linux/sizes.h>
+
+#include <asm/memory.h>
+#include <asm/pgalloc.h>
+#include <asm/pgtable.h>
+
+#include "pi.h"
+
+/**
+ * map_range - Map a contiguous range of physical pages into virtual memory
+ *
+ * @pte:		Address of physical pointer to array of pages to
+ *			allocate page tables from
+ * @start:		Virtual address of the start of the range
+ * @end:		Virtual address of the end of the range (exclusive)
+ * @pa:			Physical address of the start of the range
+ * @prot:		Access permissions of the range
+ * @level:		Translation level for the mapping
+ * @tbl:		The level @level page table to create the mappings in
+ * @may_use_cont:	Whether the use of the contiguous attribute is allowed
+ * @va_offset:		Offset between a physical page and its current mapping
+ * 			in the VA space
+ */
+void __init map_range(u64 *pte, u64 start, u64 end, u64 pa, pgprot_t prot,
+		      int level, pte_t *tbl, bool may_use_cont, u64 va_offset)
+{
+	u64 cmask = (level == 3) ? CONT_PTE_SIZE - 1 : U64_MAX;
+	u64 protval = pgprot_val(prot) & ~PTE_TYPE_MASK;
+	int lshift = (3 - level) * (PAGE_SHIFT - 3);
+	u64 lmask = (PAGE_SIZE << lshift) - 1;
+
+	start	&= PAGE_MASK;
+	pa	&= PAGE_MASK;
+
+	/* Advance tbl to the entry that covers start */
+	tbl += (start >> (lshift + PAGE_SHIFT)) % PTRS_PER_PTE;
+
+	/*
+	 * Set the right block/page bits for this level unless we are
+	 * clearing the mapping
+	 */
+	if (protval)
+		protval |= (level < 3) ? PMD_TYPE_SECT : PTE_TYPE_PAGE;
+
+	while (start < end) {
+		u64 next = min((start | lmask) + 1, PAGE_ALIGN(end));
+
+		if (level < 3 && (start | next | pa) & lmask) {
+			/*
+			 * This chunk needs a finer grained mapping. Create a
+			 * table mapping if necessary and recurse.
+			 */
+			if (pte_none(*tbl)) {
+				*tbl = __pte(__phys_to_pte_val(*pte) |
+					     PMD_TYPE_TABLE | PMD_TABLE_UXN);
+				*pte += PTRS_PER_PTE * sizeof(pte_t);
+			}
+			map_range(pte, start, next, pa, prot, level + 1,
+				  (pte_t *)(__pte_to_phys(*tbl) + va_offset),
+				  may_use_cont, va_offset);
+		} else {
+			/*
+			 * Start a contiguous range if start and pa are
+			 * suitably aligned
+			 */
+			if (((start | pa) & cmask) == 0 && may_use_cont)
+				protval |= PTE_CONT;
+
+			/*
+			 * Clear the contiguous attribute if the remaining
+			 * range does not cover a contiguous block
+			 */
+			if ((end & ~cmask) <= start)
+				protval &= ~PTE_CONT;
+
+			/* Put down a block or page mapping */
+			*tbl = __pte(__phys_to_pte_val(pa) | protval);
+		}
+		pa += next - start;
+		start = next;
+		tbl++;
+	}
+}
diff --git a/arch/arm64/kernel/pi/patch-scs.c b/arch/arm64/kernel/pi/patch-scs.c
index c65ef40d1e6b..49d8b40e61bc 100644
--- a/arch/arm64/kernel/pi/patch-scs.c
+++ b/arch/arm64/kernel/pi/patch-scs.c
@@ -11,6 +11,10 @@
 
 #include <asm/scs.h>
 
+#include "pi.h"
+
+bool dynamic_scs_is_enabled;
+
 //
 // This minimal DWARF CFI parser is partially based on the code in
 // arch/arc/kernel/unwind.c, and on the document below:
@@ -46,8 +50,6 @@
 #define DW_CFA_GNU_negative_offset_extended 0x2f
 #define DW_CFA_hi_user                      0x3f
 
-extern const u8 __eh_frame_start[], __eh_frame_end[];
-
 enum {
 	PACIASP		= 0xd503233f,
 	AUTIASP		= 0xd50323bf,
@@ -250,13 +252,3 @@ int scs_patch(const u8 eh_frame[], int size)
 	}
 	return 0;
 }
-
-asmlinkage void __init scs_patch_vmlinux(const u8 start[], const u8 end[])
-{
-	if (!should_patch_pac_into_scs())
-		return;
-
-	scs_patch(start, end - start);
-	asm("ic ialluis");
-	isb();
-}
diff --git a/arch/arm64/kernel/pi/pi.h b/arch/arm64/kernel/pi/pi.h
index 7c2d9bbf0ff9..d307c58e9741 100644
--- a/arch/arm64/kernel/pi/pi.h
+++ b/arch/arm64/kernel/pi/pi.h
@@ -2,6 +2,8 @@
 // Copyright 2023 Google LLC
 // Author: Ard Biesheuvel <ardb@google.com>
 
+#include <linux/types.h>
+
 #define __prel64_initconst	__section(".init.rodata.prel64")
 
 #define PREL64(type, name)	union { type *name; prel64_t name ## _prel; }
@@ -16,3 +18,15 @@ static inline void *prel64_to_pointer(const prel64_t *offset)
 		return NULL;
 	return (void *)offset + *offset;
 }
+
+extern bool dynamic_scs_is_enabled;
+
+void init_feature_override(u64 boot_status, const void *fdt, int chosen);
+u64 kaslr_early_init(void *fdt, int chosen);
+void relocate_kernel(u64 offset);
+int scs_patch(const u8 eh_frame[], int size);
+
+void map_range(u64 *pgd, u64 start, u64 end, u64 pa, pgprot_t prot,
+	       int level, pte_t *tbl, bool may_use_cont, u64 va_offset);
+
+asmlinkage void early_map_kernel(u64 boot_status, void *fdt);
diff --git a/arch/arm64/kernel/pi/relocate.c b/arch/arm64/kernel/pi/relocate.c
index 1853408ea76b..2407d2696398 100644
--- a/arch/arm64/kernel/pi/relocate.c
+++ b/arch/arm64/kernel/pi/relocate.c
@@ -7,6 +7,8 @@
 #include <linux/init.h>
 #include <linux/types.h>
 
+#include "pi.h"
+
 extern const Elf64_Rela rela_start[], rela_end[];
 extern const u64 relr_start[], relr_end[];
 
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index c2d6852a4e0c..f75e03b9ad28 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -283,13 +283,6 @@ void __init __no_sanitize_address setup_arch(char **cmdline_p)
 
 	kaslr_init();
 
-	/*
-	 * If know now we are going to need KPTI then use non-global
-	 * mappings from the start, avoiding the cost of rewriting
-	 * everything later.
-	 */
-	arm64_use_ng_mappings = kaslr_enabled() && kaslr_requires_kpti();
-
 	early_fixmap_init();
 	early_ioremap_init();
 
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 3afb4223a5e8..755a22d4f840 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -126,9 +126,9 @@ jiffies = jiffies_64;
 #ifdef CONFIG_UNWIND_TABLES
 #define UNWIND_DATA_SECTIONS				\
 	.eh_frame : {					\
-		__eh_frame_start = .;			\
+		__pi___eh_frame_start = .;		\
 		*(.eh_frame)				\
-		__eh_frame_end = .;			\
+		__pi___eh_frame_end = .;		\
 	}
 #else
 #define UNWIND_DATA_SECTIONS
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index f66c37a1610e..7c1bdaf25408 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -195,6 +195,7 @@ SYM_TYPED_FUNC_START(idmap_cpu_replace_ttbr1)
 
 	ret
 SYM_FUNC_END(idmap_cpu_replace_ttbr1)
+SYM_FUNC_ALIAS(__pi_idmap_cpu_replace_ttbr1, idmap_cpu_replace_ttbr1)
 	.popsection
 
 #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
-- 
2.43.0.rc1.413.gea7ed67945-goog



^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v6 36/41] arm64: mm: Use 48-bit virtual addressing for the permanent ID map
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (34 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 35/41] arm64: head: Move early kernel mapping routines into C code Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-11-29 11:16 ` [PATCH v6 37/41] arm64: pgtable: Decouple PGDIR size macros from PGD/PUD/PMD levels Ard Biesheuvel
                   ` (5 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Even though we support loading kernels anywhere in 48-bit addressable
physical memory, we create the ID maps based on the number of levels
that we happened to configure for the kernel VA and user VA spaces.

The reason for this is that the PGD/PUD/PMD based classification of
translation levels, along with the associated folding when the number of
levels is less than 5, does not permit creating a page table hierarchy
of a set number of levels. This means that, for instance, on 39-bit VA
kernels we need to configure an additional level above PGD level on the
fly, and 36-bit VA kernels still only support 47-bit virtual addressing
with this trick applied.
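
(Worked example: with 4K pages and 39-bit VAs, PGDIR_SHIFT is 30, so one
extra page worth of table entries above the PGD extends the reach to
30 + 9 + 9 = 48 bits; with 16K pages and 36-bit VAs, PGDIR_SHIFT is 25,
and the same trick only gets us to 25 + 11 + 11 = 47 bits.)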

Now that we have a separate helper to populate page table hierarchies
that does not define the levels in terms of PUDs/PMDs/etc at all, let's
reuse it to create the permanent ID map with a fixed VA size of 48 bits.
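
The resulting geometry follows directly from the page size (assuming the
usual ARM64_HW_PGTABLE_LEVELS() formula of (va_bits - 4) / (PAGE_SHIFT - 3)):

  4K pages:  IDMAP_LEVELS = (48 - 4) / 9  = 4, so IDMAP_ROOT_LEVEL = 0
  16K pages: IDMAP_LEVELS = (48 - 4) / 11 = 4, so IDMAP_ROOT_LEVEL = 0
  64K pages: IDMAP_LEVELS = (48 - 4) / 13 = 3, so IDMAP_ROOT_LEVEL = 1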

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/kernel-pgtable.h |  3 ++
 arch/arm64/kernel/head.S                |  5 +++
 arch/arm64/kvm/mmu.c                    | 15 +++------
 arch/arm64/mm/mmu.c                     | 32 +++++++++++---------
 arch/arm64/mm/proc.S                    |  9 ++----
 5 files changed, 32 insertions(+), 32 deletions(-)

diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
index 0631604995ee..742a4b2778f7 100644
--- a/arch/arm64/include/asm/kernel-pgtable.h
+++ b/arch/arm64/include/asm/kernel-pgtable.h
@@ -35,6 +35,9 @@
 #define SWAPPER_PGTABLE_LEVELS	(CONFIG_PGTABLE_LEVELS)
 #endif
 
+#define IDMAP_VA_BITS		48
+#define IDMAP_LEVELS		ARM64_HW_PGTABLE_LEVELS(IDMAP_VA_BITS)
+#define IDMAP_ROOT_LEVEL	(4 - IDMAP_LEVELS)
 
 /*
  * A relocatable kernel may execute from an address that differs from the one at
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index ffacce7b5a02..a1c29d64e875 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -729,6 +729,11 @@ SYM_FUNC_START_LOCAL(__no_granule_support)
 SYM_FUNC_END(__no_granule_support)
 
 SYM_FUNC_START_LOCAL(__primary_switch)
+	mrs		x1, tcr_el1
+	mov		x2, #64 - VA_BITS
+	tcr_set_t0sz	x1, x2
+	msr		tcr_el1, x1
+
 	adrp	x1, reserved_pg_dir
 	adrp	x2, init_idmap_pg_dir
 	bl	__enable_mmu
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index d87c8fcc4c24..234b03e18ff7 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1873,16 +1873,9 @@ int __init kvm_mmu_init(u32 *hyp_va_bits)
 	BUG_ON((hyp_idmap_start ^ (hyp_idmap_end - 1)) & PAGE_MASK);
 
 	/*
-	 * The ID map may be configured to use an extended virtual address
-	 * range. This is only the case if system RAM is out of range for the
-	 * currently configured page size and VA_BITS_MIN, in which case we will
-	 * also need the extended virtual range for the HYP ID map, or we won't
-	 * be able to enable the EL2 MMU.
-	 *
-	 * However, in some cases the ID map may be configured for fewer than
-	 * the number of VA bits used by the regular kernel stage 1. This
-	 * happens when VA_BITS=52 and the kernel image is placed in PA space
-	 * below 48 bits.
+	 * The ID map is always configured for 48 bits of translation, which
+	 * may be fewer than the number of VA bits used by the regular kernel
+	 * stage 1, when VA_BITS=52.
 	 *
 	 * At EL2, there is only one TTBR register, and we can't switch between
 	 * translation tables *and* update TCR_EL2.T0SZ at the same time. Bottom
@@ -1893,7 +1886,7 @@ int __init kvm_mmu_init(u32 *hyp_va_bits)
 	 * 1 VA bits to assure that the hypervisor can both ID map its code page
 	 * and map any kernel memory.
 	 */
-	idmap_bits = 64 - ((idmap_t0sz & TCR_T0SZ_MASK) >> TCR_T0SZ_OFFSET);
+	idmap_bits = IDMAP_VA_BITS;
 	kernel_bits = vabits_actual;
 	*hyp_va_bits = max(idmap_bits, kernel_bits);
 
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 22104febebf2..aae32a6c0428 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -754,22 +754,21 @@ static void __init map_kernel(pgd_t *pgdp)
 	kasan_copy_shadow(pgdp);
 }
 
+void __pi_map_range(u64 *pgd, u64 start, u64 end, u64 pa, pgprot_t prot,
+		    int level, pte_t *tbl, bool may_use_cont, u64 va_offset);
+
+static u8 idmap_ptes[IDMAP_LEVELS - 1][PAGE_SIZE] __aligned(PAGE_SIZE) __ro_after_init,
+	  kpti_ptes[IDMAP_LEVELS - 1][PAGE_SIZE] __aligned(PAGE_SIZE) __ro_after_init;
+
 static void __init create_idmap(void)
 {
 	u64 start = __pa_symbol(__idmap_text_start);
-	u64 size = __pa_symbol(__idmap_text_end) - start;
-	pgd_t *pgd = idmap_pg_dir;
-	u64 pgd_phys;
-
-	/* check if we need an additional level of translation */
-	if (VA_BITS < 48 && idmap_t0sz < (64 - VA_BITS_MIN)) {
-		pgd_phys = early_pgtable_alloc(PAGE_SHIFT);
-		set_pgd(&idmap_pg_dir[start >> VA_BITS],
-			__pgd(pgd_phys | P4D_TYPE_TABLE));
-		pgd = __va(pgd_phys);
-	}
-	__create_pgd_mapping(pgd, start, start, size, PAGE_KERNEL_ROX,
-			     early_pgtable_alloc, 0);
+	u64 end   = __pa_symbol(__idmap_text_end);
+	u64 ptep  = __pa_symbol(idmap_ptes);
+
+	__pi_map_range(&ptep, start, end, start, PAGE_KERNEL_ROX,
+		       IDMAP_ROOT_LEVEL, (pte_t *)idmap_pg_dir, false,
+		       __phys_to_virt(ptep) - ptep);
 
 	if (IS_ENABLED(CONFIG_UNMAP_KERNEL_AT_EL0)) {
 		extern u32 __idmap_kpti_flag;
@@ -779,8 +778,10 @@ static void __init create_idmap(void)
 		 * The KPTI G-to-nG conversion code needs a read-write mapping
 		 * of its synchronization flag in the ID map.
 		 */
-		__create_pgd_mapping(pgd, pa, pa, sizeof(u32), PAGE_KERNEL,
-				     early_pgtable_alloc, 0);
+		ptep = __pa_symbol(kpti_ptes);
+		__pi_map_range(&ptep, pa, pa + sizeof(u32), pa, PAGE_KERNEL,
+			       IDMAP_ROOT_LEVEL, (pte_t *)idmap_pg_dir, false,
+			       __phys_to_virt(ptep) - ptep);
 	}
 }
 
@@ -805,6 +806,7 @@ void __init paging_init(void)
 	memblock_allow_resize();
 
 	create_idmap();
+	idmap_t0sz = TCR_T0SZ(IDMAP_VA_BITS);
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 7c1bdaf25408..47ede52bb900 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -421,9 +421,9 @@ SYM_FUNC_START(__cpu_setup)
 	mair	.req	x17
 	tcr	.req	x16
 	mov_q	mair, MAIR_EL1_SET
-	mov_q	tcr, TCR_TxSZ(VA_BITS) | TCR_CACHE_FLAGS | TCR_SMP_FLAGS | \
-			TCR_TG_FLAGS | TCR_KASLR_FLAGS | TCR_ASID16 | \
-			TCR_TBI0 | TCR_A1 | TCR_KASAN_SW_FLAGS | TCR_MTE_FLAGS
+	mov_q	tcr, TCR_T0SZ(IDMAP_VA_BITS) | TCR_T1SZ(VA_BITS) | TCR_CACHE_FLAGS | \
+		     TCR_SMP_FLAGS | TCR_TG_FLAGS | TCR_KASLR_FLAGS | TCR_ASID16 | \
+		     TCR_TBI0 | TCR_A1 | TCR_KASAN_SW_FLAGS | TCR_MTE_FLAGS
 
 	tcr_clear_errata_bits tcr, x9, x5
 
@@ -431,10 +431,7 @@ SYM_FUNC_START(__cpu_setup)
 	sub		x9, xzr, x0
 	add		x9, x9, #64
 	tcr_set_t1sz	tcr, x9
-#else
-	idmap_get_t0sz	x9
 #endif
-	tcr_set_t0sz	tcr, x9
 
 	/*
 	 * Set the IPS bits in TCR_EL1.
-- 
2.43.0.rc1.413.gea7ed67945-goog



^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v6 37/41] arm64: pgtable: Decouple PGDIR size macros from PGD/PUD/PMD levels
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (35 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 36/41] arm64: mm: Use 48-bit virtual addressing for the permanent ID map Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-11-29 11:16 ` [PATCH v6 38/41] arm64: kernel: Create initial ID map from C code Ard Biesheuvel
                   ` (4 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

The mapping from PGD/PUD/PMD to levels and shifts is very confusing,
given that, due to folding, the shifts may be equal for different
levels, if the macros are even #define'd to begin with.

In a subsequent patch, we will modify the ID mapping code to decouple
the number of levels from the kernel's view of how these types are
folded, so prepare for this by reformulating the macros without the use
of these types.

Instead, use SWAPPER_BLOCK_SHIFT as the base quantity, and derive it
from either PAGE_SHIFT or PMD_SHIFT, which -if defined at all- are
defined unambiguously for a given page size, regardless of the number of
configured levels.
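
For example, with MIN_KIMG_ALIGN at its usual 2 MiB:

  4K pages:  PMD_SIZE (2M) <= 2M  -> SWAPPER_BLOCK_SHIFT = PMD_SHIFT  = 21,
             SWAPPER_TABLE_SHIFT = 21 + 12 - 3 = 30 (the old PUD_SHIFT)
  16K pages: PMD_SIZE (32M) > 2M  -> SWAPPER_BLOCK_SHIFT = PAGE_SHIFT = 14,
             SWAPPER_TABLE_SHIFT = 14 + 14 - 3 = 25 (the old PMD_SHIFT)
  64K pages: PMD_SIZE (512M) > 2M -> SWAPPER_BLOCK_SHIFT = PAGE_SHIFT = 16,
             SWAPPER_TABLE_SHIFT = 16 + 16 - 3 = 29 (the old PMD_SHIFT)

i.e. the same values the PUD_SHIFT/PMD_SHIFT based definitions produced.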

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/kernel-pgtable.h | 65 ++++++--------------
 1 file changed, 19 insertions(+), 46 deletions(-)

diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
index 742a4b2778f7..f1fc98a233d5 100644
--- a/arch/arm64/include/asm/kernel-pgtable.h
+++ b/arch/arm64/include/asm/kernel-pgtable.h
@@ -13,27 +13,22 @@
 #include <asm/sparsemem.h>
 
 /*
- * The linear mapping and the start of memory are both 2M aligned (per
- * the arm64 booting.txt requirements). Hence we can use section mapping
- * with 4K (section size = 2M) but not with 16K (section size = 32M) or
- * 64K (section size = 512M).
+ * The physical and virtual addresses of the start of the kernel image are
+ * equal modulo 2 MiB (per the arm64 booting.txt requirements). Hence we can
+ * use section mapping with 4K (section size = 2M) but not with 16K (section
+ * size = 32M) or 64K (section size = 512M).
  */
-
-/*
- * The idmap and swapper page tables need some space reserved in the kernel
- * image. Both require pgd, pud (4 levels only) and pmd tables to (section)
- * map the kernel. With the 64K page configuration, swapper and idmap need to
- * map to pte level. The swapper also maps the FDT (see __create_page_tables
- * for more information). Note that the number of ID map translation levels
- * could be increased on the fly if system RAM is out of reach for the default
- * VA range, so pages required to map highest possible PA are reserved in all
- * cases.
- */
-#ifdef CONFIG_ARM64_4K_PAGES
-#define SWAPPER_PGTABLE_LEVELS	(CONFIG_PGTABLE_LEVELS - 1)
+#if defined(PMD_SIZE) && PMD_SIZE <= MIN_KIMG_ALIGN
+#define SWAPPER_BLOCK_SHIFT	PMD_SHIFT
+#define SWAPPER_SKIP_LEVEL	1
 #else
-#define SWAPPER_PGTABLE_LEVELS	(CONFIG_PGTABLE_LEVELS)
+#define SWAPPER_BLOCK_SHIFT	PAGE_SHIFT
+#define SWAPPER_SKIP_LEVEL	0
 #endif
+#define SWAPPER_BLOCK_SIZE	(UL(1) << SWAPPER_BLOCK_SHIFT)
+#define SWAPPER_TABLE_SHIFT	(SWAPPER_BLOCK_SHIFT + PAGE_SHIFT - 3)
+
+#define SWAPPER_PGTABLE_LEVELS		(CONFIG_PGTABLE_LEVELS - SWAPPER_SKIP_LEVEL)
 
 #define IDMAP_VA_BITS		48
 #define IDMAP_LEVELS		ARM64_HW_PGTABLE_LEVELS(IDMAP_VA_BITS)
@@ -53,24 +48,13 @@
 #define EARLY_ENTRIES(vstart, vend, shift, add) \
 	(SPAN_NR_ENTRIES(vstart, vend, shift) + (add))
 
-#define EARLY_PGDS(vstart, vend, add) (EARLY_ENTRIES(vstart, vend, PGDIR_SHIFT, add))
-
-#if SWAPPER_PGTABLE_LEVELS > 3
-#define EARLY_PUDS(vstart, vend, add) (EARLY_ENTRIES(vstart, vend, PUD_SHIFT, add))
-#else
-#define EARLY_PUDS(vstart, vend, add) (0)
-#endif
+#define EARLY_LEVEL(lvl, vstart, vend, add)	\
+	(SWAPPER_PGTABLE_LEVELS > lvl ? EARLY_ENTRIES(vstart, vend, SWAPPER_BLOCK_SHIFT + lvl * (PAGE_SHIFT - 3), add) : 0)
 
-#if SWAPPER_PGTABLE_LEVELS > 2
-#define EARLY_PMDS(vstart, vend, add) (EARLY_ENTRIES(vstart, vend, SWAPPER_TABLE_SHIFT, add))
-#else
-#define EARLY_PMDS(vstart, vend, add) (0)
-#endif
-
-#define EARLY_PAGES(vstart, vend, add) ( 1 			/* PGDIR page */				\
-			+ EARLY_PGDS((vstart), (vend), add) 	/* each PGDIR needs a next level page table */	\
-			+ EARLY_PUDS((vstart), (vend), add)	/* each PUD needs a next level page table */	\
-			+ EARLY_PMDS((vstart), (vend), add))	/* each PMD needs a next level page table */
+#define EARLY_PAGES(vstart, vend, add) (1 	/* PGDIR page */				\
+	+ EARLY_LEVEL(3, (vstart), (vend), add) /* each entry needs a next level page table */	\
+	+ EARLY_LEVEL(2, (vstart), (vend), add)	/* each entry needs a next level page table */	\
+	+ EARLY_LEVEL(1, (vstart), (vend), add))/* each entry needs a next level page table */
 #define INIT_DIR_SIZE (PAGE_SIZE * (EARLY_PAGES(KIMAGE_VADDR, _end, EXTRA_PAGE) + EARLY_SEGMENT_EXTRA_PAGES))
 
 /* the initial ID map may need two extra pages if it needs to be extended */
@@ -81,17 +65,6 @@
 #endif
 #define INIT_IDMAP_DIR_PAGES	EARLY_PAGES(KIMAGE_VADDR, _end + MAX_FDT_SIZE + SWAPPER_BLOCK_SIZE, 1)
 
-/* Initial memory map size */
-#ifdef CONFIG_ARM64_4K_PAGES
-#define SWAPPER_BLOCK_SHIFT	PMD_SHIFT
-#define SWAPPER_BLOCK_SIZE	PMD_SIZE
-#define SWAPPER_TABLE_SHIFT	PUD_SHIFT
-#else
-#define SWAPPER_BLOCK_SHIFT	PAGE_SHIFT
-#define SWAPPER_BLOCK_SIZE	PAGE_SIZE
-#define SWAPPER_TABLE_SHIFT	PMD_SHIFT
-#endif
-
 /* The number of segments in the kernel image (text, rodata, inittext, initdata, data+bss) */
 #define KERNEL_SEGMENT_COUNT	5
 
-- 
2.43.0.rc1.413.gea7ed67945-goog



^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v6 38/41] arm64: kernel: Create initial ID map from C code
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (36 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 37/41] arm64: pgtable: Decouple PGDIR size macros from PGD/PUD/PMD levels Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-11-29 11:16 ` [PATCH v6 39/41] arm64: mm: avoid fixmap for early swapper_pg_dir updates Ard Biesheuvel
                   ` (3 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

The asm code that creates the initial ID map is rather intricate and
hard to follow. This is problematic because it makes adding support for
things like LPA2 or WXN more difficult than necessary. Also, it is
parameterized like the rest of the MM code to run with a configurable
number of levels, which is rather pointless, given that all AArch64 CPUs
implement support for 48-bit virtual addressing, and that many systems
exist with DRAM located outside of the 39-bit addressable range, which
is the only smaller VA size that is widely used, and we need additional
tricks to make things work in that combination.

So let's bite the bullet: rip out all the asm macros and fiddly
code, and replace it all with a C implementation based on the newly added
routines for creating the early kernel VA mappings. And while at it,
create the initial ID map based on 48-bit virtual addressing as well,
regardless of the number of configured levels for the kernel proper.

Note that this code may execute with the MMU and caches disabled, and is
therefore not permitted to make unaligned accesses. This shouldn't
generally happen in any case for the algorithm as implemented, but to be
sure, let's pass -mstrict-align to the compiler just in case.
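
With IDMAP_VA_BITS fixed at 48, the initial ID map now uses
INIT_IDMAP_PGTABLE_LEVELS = IDMAP_LEVELS - SWAPPER_SKIP_LEVEL levels of
translation, regardless of CONFIG_PGTABLE_LEVELS:

  4K pages:  4 - 1 = 3 levels, with 2 MiB block mappings at the leaf level
  16K pages: 4 - 0 = 4 levels, with 16 KiB page mappings at the leaf level
  64K pages: 3 - 0 = 3 levels, with 64 KiB page mappings at the leaf level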

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/assembler.h      |  14 -
 arch/arm64/include/asm/kernel-pgtable.h |  50 ++--
 arch/arm64/include/asm/mmu_context.h    |   6 +-
 arch/arm64/kernel/head.S                | 267 ++------------------
 arch/arm64/kernel/image-vars.h          |   1 +
 arch/arm64/kernel/pi/Makefile           |   3 +
 arch/arm64/kernel/pi/map_kernel.c       |  18 ++
 arch/arm64/kernel/pi/map_range.c        |  12 +
 arch/arm64/kernel/pi/pi.h               |   4 +
 arch/arm64/mm/mmu.c                     |   5 -
 arch/arm64/mm/proc.S                    |   3 +-
 11 files changed, 88 insertions(+), 295 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 376a980f2bad..beb53bbd8c19 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -345,20 +345,6 @@ alternative_cb_end
 	bfi	\valreg, \t1sz, #TCR_T1SZ_OFFSET, #TCR_TxSZ_WIDTH
 	.endm
 
-/*
- * idmap_get_t0sz - get the T0SZ value needed to cover the ID map
- *
- * Calculate the maximum allowed value for TCR_EL1.T0SZ so that the
- * entire ID map region can be mapped. As T0SZ == (64 - #bits used),
- * this number conveniently equals the number of leading zeroes in
- * the physical address of _end.
- */
-	.macro	idmap_get_t0sz, reg
-	adrp	\reg, _end
-	orr	\reg, \reg, #(1 << VA_BITS_MIN) - 1
-	clz	\reg, \reg
-	.endm
-
 /*
  * tcr_compute_pa_size - set TCR.(I)PS to the highest supported
  * ID_AA64MMFR0_EL1.PARange value
diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
index f1fc98a233d5..bf05a77873a4 100644
--- a/arch/arm64/include/asm/kernel-pgtable.h
+++ b/arch/arm64/include/asm/kernel-pgtable.h
@@ -29,6 +29,7 @@
 #define SWAPPER_TABLE_SHIFT	(SWAPPER_BLOCK_SHIFT + PAGE_SHIFT - 3)
 
 #define SWAPPER_PGTABLE_LEVELS		(CONFIG_PGTABLE_LEVELS - SWAPPER_SKIP_LEVEL)
+#define INIT_IDMAP_PGTABLE_LEVELS	(IDMAP_LEVELS - SWAPPER_SKIP_LEVEL)
 
 #define IDMAP_VA_BITS		48
 #define IDMAP_LEVELS		ARM64_HW_PGTABLE_LEVELS(IDMAP_VA_BITS)
@@ -48,44 +49,39 @@
 #define EARLY_ENTRIES(vstart, vend, shift, add) \
 	(SPAN_NR_ENTRIES(vstart, vend, shift) + (add))
 
-#define EARLY_LEVEL(lvl, vstart, vend, add)	\
-	(SWAPPER_PGTABLE_LEVELS > lvl ? EARLY_ENTRIES(vstart, vend, SWAPPER_BLOCK_SHIFT + lvl * (PAGE_SHIFT - 3), add) : 0)
+#define EARLY_LEVEL(lvl, lvls, vstart, vend, add)	\
+	(lvls > lvl ? EARLY_ENTRIES(vstart, vend, SWAPPER_BLOCK_SHIFT + lvl * (PAGE_SHIFT - 3), add) : 0)
 
-#define EARLY_PAGES(vstart, vend, add) (1 	/* PGDIR page */				\
-	+ EARLY_LEVEL(3, (vstart), (vend), add) /* each entry needs a next level page table */	\
-	+ EARLY_LEVEL(2, (vstart), (vend), add)	/* each entry needs a next level page table */	\
-	+ EARLY_LEVEL(1, (vstart), (vend), add))/* each entry needs a next level page table */
-#define INIT_DIR_SIZE (PAGE_SIZE * (EARLY_PAGES(KIMAGE_VADDR, _end, EXTRA_PAGE) + EARLY_SEGMENT_EXTRA_PAGES))
+#define EARLY_PAGES(lvls, vstart, vend, add) (1 	/* PGDIR page */				\
+	+ EARLY_LEVEL(3, (lvls), (vstart), (vend), add) /* each entry needs a next level page table */	\
+	+ EARLY_LEVEL(2, (lvls), (vstart), (vend), add)	/* each entry needs a next level page table */	\
+	+ EARLY_LEVEL(1, (lvls), (vstart), (vend), add))/* each entry needs a next level page table */
+#define INIT_DIR_SIZE (PAGE_SIZE * (EARLY_PAGES(SWAPPER_PGTABLE_LEVELS, KIMAGE_VADDR, _end, EXTRA_PAGE) \
+				    + EARLY_SEGMENT_EXTRA_PAGES))
 
-/* the initial ID map may need two extra pages if it needs to be extended */
-#if VA_BITS < 48
-#define INIT_IDMAP_DIR_SIZE	((INIT_IDMAP_DIR_PAGES + 2) * PAGE_SIZE)
-#else
-#define INIT_IDMAP_DIR_SIZE	(INIT_IDMAP_DIR_PAGES * PAGE_SIZE)
-#endif
-#define INIT_IDMAP_DIR_PAGES	EARLY_PAGES(KIMAGE_VADDR, _end + MAX_FDT_SIZE + SWAPPER_BLOCK_SIZE, 1)
+#define INIT_IDMAP_DIR_PAGES	(EARLY_PAGES(INIT_IDMAP_PGTABLE_LEVELS, KIMAGE_VADDR, _end, 1))
+#define INIT_IDMAP_DIR_SIZE	((INIT_IDMAP_DIR_PAGES + EARLY_IDMAP_EXTRA_PAGES) * PAGE_SIZE)
+
+#define INIT_IDMAP_FDT_PAGES	(EARLY_PAGES(INIT_IDMAP_PGTABLE_LEVELS, 0UL, UL(MAX_FDT_SIZE), 1) - 1)
+#define INIT_IDMAP_FDT_SIZE	((INIT_IDMAP_FDT_PAGES + EARLY_IDMAP_EXTRA_FDT_PAGES) * PAGE_SIZE)
 
 /* The number of segments in the kernel image (text, rodata, inittext, initdata, data+bss) */
 #define KERNEL_SEGMENT_COUNT	5
 
 #if SWAPPER_BLOCK_SIZE > SEGMENT_ALIGN
 #define EARLY_SEGMENT_EXTRA_PAGES (KERNEL_SEGMENT_COUNT + 1)
-#else
-#define EARLY_SEGMENT_EXTRA_PAGES 0
-#endif
-
 /*
- * Initial memory map attributes.
+ * The initial ID map consists of the kernel image, mapped as two separate
+ * segments, and may appear misaligned wrt the swapper block size. This means
+ * we need 3 additional pages. The DT could straddle a swapper block boundary,
+ * so it may need 2.
  */
-#define SWAPPER_PTE_FLAGS	(PTE_TYPE_PAGE | PTE_AF | PTE_SHARED | PTE_UXN)
-#define SWAPPER_PMD_FLAGS	(PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S | PTE_UXN)
-
-#ifdef CONFIG_ARM64_4K_PAGES
-#define SWAPPER_RW_MMUFLAGS	(PMD_ATTRINDX(MT_NORMAL) | SWAPPER_PMD_FLAGS | PTE_WRITE)
-#define SWAPPER_RX_MMUFLAGS	(SWAPPER_RW_MMUFLAGS | PMD_SECT_RDONLY)
+#define EARLY_IDMAP_EXTRA_PAGES		3
+#define EARLY_IDMAP_EXTRA_FDT_PAGES	2
 #else
-#define SWAPPER_RW_MMUFLAGS	(PTE_ATTRINDX(MT_NORMAL) | SWAPPER_PTE_FLAGS | PTE_WRITE)
-#define SWAPPER_RX_MMUFLAGS	(SWAPPER_RW_MMUFLAGS | PTE_RDONLY)
+#define EARLY_SEGMENT_EXTRA_PAGES	0
+#define EARLY_IDMAP_EXTRA_PAGES		0
+#define EARLY_IDMAP_EXTRA_FDT_PAGES	0
 #endif
 
 #endif	/* __ASM_KERNEL_PGTABLE_H */
diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 926fbbcecbe0..a8a89a0f2867 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -61,11 +61,9 @@ static inline void cpu_switch_mm(pgd_t *pgd, struct mm_struct *mm)
 }
 
 /*
- * TCR.T0SZ value to use when the ID map is active. Usually equals
- * TCR_T0SZ(VA_BITS), unless system RAM is positioned very high in
- * physical memory, in which case it will be smaller.
+ * TCR.T0SZ value to use when the ID map is active.
  */
-extern int idmap_t0sz;
+#define idmap_t0sz	TCR_T0SZ(IDMAP_VA_BITS)
 
 /*
  * Ensure TCR.T0SZ is set to the provided value.
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index a1c29d64e875..545b5d8976f4 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -80,26 +80,42 @@
 	 *  x19        primary_entry() .. start_kernel()        whether we entered with the MMU on
 	 *  x20        primary_entry() .. __primary_switch()    CPU boot mode
 	 *  x21        primary_entry() .. start_kernel()        FDT pointer passed at boot in x0
-	 *  x22        create_idmap() .. start_kernel()         ID map VA of the DT blob
 	 *  x25        primary_entry() .. start_kernel()        supported VA size
-	 *  x28        create_idmap()                           callee preserved temp register
 	 */
 SYM_CODE_START(primary_entry)
 	bl	record_mmu_state
 	bl	preserve_boot_args
-	bl	create_idmap
+
+	adrp	x1, early_init_stack
+	mov	sp, x1
+	mov	x29, xzr
+	adrp	x0, init_idmap_pg_dir
+	bl	__pi_create_init_idmap
+
+	/*
+	 * If the page tables have been populated with non-cacheable
+	 * accesses (MMU disabled), invalidate those tables again to
+	 * remove any speculatively loaded cache lines.
+	 */
+	cbnz	x19, 0f
+	dmb     sy
+	mov	x1, x0				// end of used region
+	adrp    x0, init_idmap_pg_dir
+	adr_l	x2, dcache_inval_poc
+	blr	x2
+	b	1f
 
 	/*
 	 * If we entered with the MMU and caches on, clean the ID mapped part
 	 * of the primary boot code to the PoC so we can safely execute it with
 	 * the MMU off.
 	 */
-	cbz	x19, 0f
-	adrp	x0, __idmap_text_start
+0:	adrp	x0, __idmap_text_start
 	adr_l	x1, __idmap_text_end
 	adr_l	x2, dcache_clean_poc
 	blr	x2
-0:	mov	x0, x19
+
+1:	mov	x0, x19
 	bl	init_kernel_el			// w0=cpu_boot_mode
 	mov	x20, x0
 
@@ -175,238 +191,6 @@ SYM_CODE_START_LOCAL(preserve_boot_args)
 	ret
 SYM_CODE_END(preserve_boot_args)
 
-/*
- * Macro to populate page table entries, these entries can be pointers to the next level
- * or last level entries pointing to physical memory.
- *
- *	tbl:	page table address
- *	rtbl:	pointer to page table or physical memory
- *	index:	start index to write
- *	eindex:	end index to write - [index, eindex] written to
- *	flags:	flags for pagetable entry to or in
- *	inc:	increment to rtbl between each entry
- *	tmp1:	temporary variable
- *
- * Preserves:	tbl, eindex, flags, inc
- * Corrupts:	index, tmp1
- * Returns:	rtbl
- */
-	.macro populate_entries, tbl, rtbl, index, eindex, flags, inc, tmp1
-.Lpe\@:	phys_to_pte \tmp1, \rtbl
-	orr	\tmp1, \tmp1, \flags	// tmp1 = table entry
-	str	\tmp1, [\tbl, \index, lsl #3]
-	add	\rtbl, \rtbl, \inc	// rtbl = pa next level
-	add	\index, \index, #1
-	cmp	\index, \eindex
-	b.ls	.Lpe\@
-	.endm
-
-/*
- * Compute indices of table entries from virtual address range. If multiple entries
- * were needed in the previous page table level then the next page table level is assumed
- * to be composed of multiple pages. (This effectively scales the end index).
- *
- *	vstart:	virtual address of start of range
- *	vend:	virtual address of end of range - we map [vstart, vend]
- *	shift:	shift used to transform virtual address into index
- *	order:  #imm 2log(number of entries in page table)
- *	istart:	index in table corresponding to vstart
- *	iend:	index in table corresponding to vend
- *	count:	On entry: how many extra entries were required in previous level, scales
- *			  our end index.
- *		On exit: returns how many extra entries required for next page table level
- *
- * Preserves:	vstart, vend
- * Returns:	istart, iend, count
- */
-	.macro compute_indices, vstart, vend, shift, order, istart, iend, count
-	ubfx	\istart, \vstart, \shift, \order
-	ubfx	\iend, \vend, \shift, \order
-	add	\iend, \iend, \count, lsl \order
-	sub	\count, \iend, \istart
-	.endm
-
-/*
- * Map memory for specified virtual address range. Each level of page table needed supports
- * multiple entries. If a level requires n entries the next page table level is assumed to be
- * formed from n pages.
- *
- *	tbl:	location of page table
- *	rtbl:	address to be used for first level page table entry (typically tbl + PAGE_SIZE)
- *	vstart:	virtual address of start of range
- *	vend:	virtual address of end of range - we map [vstart, vend - 1]
- *	flags:	flags to use to map last level entries
- *	phys:	physical address corresponding to vstart - physical memory is contiguous
- *	order:  #imm 2log(number of entries in PGD table)
- *
- * If extra_shift is set, an extra level will be populated if the end address does
- * not fit in 'extra_shift' bits. This assumes vend is in the TTBR0 range.
- *
- * Temporaries:	istart, iend, tmp, count, sv - these need to be different registers
- * Preserves:	vstart, flags
- * Corrupts:	tbl, rtbl, vend, istart, iend, tmp, count, sv
- */
-	.macro map_memory, tbl, rtbl, vstart, vend, flags, phys, order, istart, iend, tmp, count, sv, extra_shift
-	sub \vend, \vend, #1
-	add \rtbl, \tbl, #PAGE_SIZE
-	mov \count, #0
-
-	.ifnb	\extra_shift
-	tst	\vend, #~((1 << (\extra_shift)) - 1)
-	b.eq	.L_\@
-	compute_indices \vstart, \vend, #\extra_shift, #(PAGE_SHIFT - 3), \istart, \iend, \count
-	mov \sv, \rtbl
-	populate_entries \tbl, \rtbl, \istart, \iend, #PMD_TYPE_TABLE, #PAGE_SIZE, \tmp
-	mov \tbl, \sv
-	.endif
-.L_\@:
-	compute_indices \vstart, \vend, #PGDIR_SHIFT, #\order, \istart, \iend, \count
-	mov \sv, \rtbl
-	populate_entries \tbl, \rtbl, \istart, \iend, #PMD_TYPE_TABLE, #PAGE_SIZE, \tmp
-	mov \tbl, \sv
-
-#if SWAPPER_PGTABLE_LEVELS > 3
-	compute_indices \vstart, \vend, #PUD_SHIFT, #(PAGE_SHIFT - 3), \istart, \iend, \count
-	mov \sv, \rtbl
-	populate_entries \tbl, \rtbl, \istart, \iend, #PMD_TYPE_TABLE, #PAGE_SIZE, \tmp
-	mov \tbl, \sv
-#endif
-
-#if SWAPPER_PGTABLE_LEVELS > 2
-	compute_indices \vstart, \vend, #SWAPPER_TABLE_SHIFT, #(PAGE_SHIFT - 3), \istart, \iend, \count
-	mov \sv, \rtbl
-	populate_entries \tbl, \rtbl, \istart, \iend, #PMD_TYPE_TABLE, #PAGE_SIZE, \tmp
-	mov \tbl, \sv
-#endif
-
-	compute_indices \vstart, \vend, #SWAPPER_BLOCK_SHIFT, #(PAGE_SHIFT - 3), \istart, \iend, \count
-	bic \rtbl, \phys, #SWAPPER_BLOCK_SIZE - 1
-	populate_entries \tbl, \rtbl, \istart, \iend, \flags, #SWAPPER_BLOCK_SIZE, \tmp
-	.endm
-
-/*
- * Remap a subregion created with the map_memory macro with modified attributes
- * or output address. The entire remapped region must have been covered in the
- * invocation of map_memory.
- *
- * x0: last level table address (returned in first argument to map_memory)
- * x1: start VA of the existing mapping
- * x2: start VA of the region to update
- * x3: end VA of the region to update (exclusive)
- * x4: start PA associated with the region to update
- * x5: attributes to set on the updated region
- * x6: order of the last level mappings
- */
-SYM_FUNC_START_LOCAL(remap_region)
-	sub	x3, x3, #1		// make end inclusive
-
-	// Get the index offset for the start of the last level table
-	lsr	x1, x1, x6
-	bfi	x1, xzr, #0, #PAGE_SHIFT - 3
-
-	// Derive the start and end indexes into the last level table
-	// associated with the provided region
-	lsr	x2, x2, x6
-	lsr	x3, x3, x6
-	sub	x2, x2, x1
-	sub	x3, x3, x1
-
-	mov	x1, #1
-	lsl	x6, x1, x6		// block size at this level
-
-	populate_entries x0, x4, x2, x3, x5, x6, x7
-	ret
-SYM_FUNC_END(remap_region)
-
-SYM_FUNC_START_LOCAL(create_idmap)
-	mov	x28, lr
-	/*
-	 * The ID map carries a 1:1 mapping of the physical address range
-	 * covered by the loaded image, which could be anywhere in DRAM. This
-	 * means that the required size of the VA (== PA) space is decided at
-	 * boot time, and could be more than the configured size of the VA
-	 * space for ordinary kernel and user space mappings.
-	 *
-	 * There are three cases to consider here:
-	 * - 39 <= VA_BITS < 48, and the ID map needs up to 48 VA bits to cover
-	 *   the placement of the image. In this case, we configure one extra
-	 *   level of translation on the fly for the ID map only. (This case
-	 *   also covers 42-bit VA/52-bit PA on 64k pages).
-	 *
-	 * - VA_BITS == 48, and the ID map needs more than 48 VA bits. This can
-	 *   only happen when using 64k pages, in which case we need to extend
-	 *   the root level table rather than add a level. Note that we can
-	 *   treat this case as 'always extended' as long as we take care not
-	 *   to program an unsupported T0SZ value into the TCR register.
-	 *
-	 * - Combinations that would require two additional levels of
-	 *   translation are not supported, e.g., VA_BITS==36 on 16k pages, or
-	 *   VA_BITS==39/4k pages with 5-level paging, where the input address
-	 *   requires more than 47 or 48 bits, respectively.
-	 */
-#if (VA_BITS < 48)
-#define IDMAP_PGD_ORDER	(VA_BITS - PGDIR_SHIFT)
-#define EXTRA_SHIFT	(PGDIR_SHIFT + PAGE_SHIFT - 3)
-
-	/*
-	 * If VA_BITS < 48, we have to configure an additional table level.
-	 * First, we have to verify our assumption that the current value of
-	 * VA_BITS was chosen such that all translation levels are fully
-	 * utilised, and that lowering T0SZ will always result in an additional
-	 * translation level to be configured.
-	 */
-#if VA_BITS != EXTRA_SHIFT
-#error "Mismatch between VA_BITS and page size/number of translation levels"
-#endif
-#else
-#define IDMAP_PGD_ORDER	(PHYS_MASK_SHIFT - PGDIR_SHIFT)
-#define EXTRA_SHIFT
-	/*
-	 * If VA_BITS == 48, we don't have to configure an additional
-	 * translation level, but the top-level table has more entries.
-	 */
-#endif
-	adrp	x0, init_idmap_pg_dir
-	adrp	x3, _text
-	adrp	x6, _end + MAX_FDT_SIZE + SWAPPER_BLOCK_SIZE
-	mov_q	x7, SWAPPER_RX_MMUFLAGS
-
-	map_memory x0, x1, x3, x6, x7, x3, IDMAP_PGD_ORDER, x10, x11, x12, x13, x14, EXTRA_SHIFT
-
-	/* Remap [.init].data, BSS and the kernel page tables r/w in the ID map */
-	adrp	x1, _text
-	adrp	x2, __initdata_begin
-	adrp	x3, _end
-	bic	x4, x2, #SWAPPER_BLOCK_SIZE - 1
-	mov_q	x5, SWAPPER_RW_MMUFLAGS
-	mov	x6, #SWAPPER_BLOCK_SHIFT
-	bl	remap_region
-
-	/* Remap the FDT after the kernel image */
-	adrp	x1, _text
-	adrp	x22, _end + SWAPPER_BLOCK_SIZE
-	bic	x2, x22, #SWAPPER_BLOCK_SIZE - 1
-	bfi	x22, x21, #0, #SWAPPER_BLOCK_SHIFT		// remapped FDT address
-	add	x3, x2, #MAX_FDT_SIZE + SWAPPER_BLOCK_SIZE
-	bic	x4, x21, #SWAPPER_BLOCK_SIZE - 1
-	mov_q	x5, SWAPPER_RW_MMUFLAGS
-	mov	x6, #SWAPPER_BLOCK_SHIFT
-	bl	remap_region
-
-	/*
-	 * Since the page tables have been populated with non-cacheable
-	 * accesses (MMU disabled), invalidate those tables again to
-	 * remove any speculatively loaded cache lines.
-	 */
-	cbnz	x19, 0f				// skip cache invalidation if MMU is on
-	dmb	sy
-
-	adrp	x0, init_idmap_pg_dir
-	adrp	x1, init_idmap_pg_end
-	bl	dcache_inval_poc
-0:	ret	x28
-SYM_FUNC_END(create_idmap)
-
 	/*
 	 * Initialize CPU registers with task-specific and cpu-specific context.
 	 *
@@ -729,11 +513,6 @@ SYM_FUNC_START_LOCAL(__no_granule_support)
 SYM_FUNC_END(__no_granule_support)
 
 SYM_FUNC_START_LOCAL(__primary_switch)
-	mrs		x1, tcr_el1
-	mov		x2, #64 - VA_BITS
-	tcr_set_t0sz	x1, x2
-	msr		tcr_el1, x1
-
 	adrp	x1, reserved_pg_dir
 	adrp	x2, init_idmap_pg_dir
 	bl	__enable_mmu
@@ -742,7 +521,7 @@ SYM_FUNC_START_LOCAL(__primary_switch)
 	mov	sp, x1
 	mov	x29, xzr
 	mov	x0, x20				// pass the full boot status
-	mov	x1, x22				// pass the low FDT mapping
+	mov	x1, x21				// pass the FDT
 	bl	__pi_early_map_kernel		// Map and relocate the kernel
 
 	ldr	x8, =__primary_switched
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index e566b32f9c22..941a14c05184 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -52,6 +52,7 @@ PROVIDE(__pi_cavium_erratum_27456_cpus	= cavium_erratum_27456_cpus);
 PROVIDE(__pi__ctype			= _ctype);
 PROVIDE(__pi_memstart_offset_seed	= memstart_offset_seed);
 
+PROVIDE(__pi_init_idmap_pg_dir		= init_idmap_pg_dir);
 PROVIDE(__pi_init_pg_dir		= init_pg_dir);
 PROVIDE(__pi_init_pg_end		= init_pg_end);
 
diff --git a/arch/arm64/kernel/pi/Makefile b/arch/arm64/kernel/pi/Makefile
index 8c2f80a46b93..4393b41f0b71 100644
--- a/arch/arm64/kernel/pi/Makefile
+++ b/arch/arm64/kernel/pi/Makefile
@@ -11,6 +11,9 @@ KBUILD_CFLAGS	:= $(subst $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) -fpie \
 		   -fno-asynchronous-unwind-tables -fno-unwind-tables \
 		   $(call cc-option,-fno-addrsig)
 
+# this code may run with the MMU off so disable unaligned accesses
+CFLAGS_map_range.o += -mstrict-align
+
 # remove SCS flags from all objects in this directory
 KBUILD_CFLAGS	:= $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
 # disable LTO
diff --git a/arch/arm64/kernel/pi/map_kernel.c b/arch/arm64/kernel/pi/map_kernel.c
index f206373b28b0..f86e878d366d 100644
--- a/arch/arm64/kernel/pi/map_kernel.c
+++ b/arch/arm64/kernel/pi/map_kernel.c
@@ -128,6 +128,22 @@ static void __init map_kernel(u64 kaslr_offset, u64 va_offset, int root_level)
 	}
 }
 
+static void __init map_fdt(u64 fdt)
+{
+	static u8 ptes[INIT_IDMAP_FDT_SIZE] __initdata __aligned(PAGE_SIZE);
+	u64 efdt = fdt + MAX_FDT_SIZE;
+	u64 ptep = (u64)ptes;
+
+	/*
+	 * Map up to MAX_FDT_SIZE bytes, but avoid overlap with
+	 * the kernel image.
+	 */
+	map_range(&ptep, fdt, (u64)_text > fdt ? min((u64)_text, efdt) : efdt,
+		  fdt, PAGE_KERNEL, IDMAP_ROOT_LEVEL,
+		  (pte_t *)init_idmap_pg_dir, false, 0);
+	dsb(ishst);
+}
+
 asmlinkage void __init early_map_kernel(u64 boot_status, void *fdt)
 {
 	static char const chosen_str[] __initconst = "/chosen";
@@ -136,6 +152,8 @@ asmlinkage void __init early_map_kernel(u64 boot_status, void *fdt)
 	int root_level = 4 - CONFIG_PGTABLE_LEVELS;
 	int chosen;
 
+	map_fdt((u64)fdt);
+
 	/* Clear BSS and the initial page tables */
 	memset(__bss_start, 0, (u64)init_pg_end - (u64)__bss_start);
 
diff --git a/arch/arm64/kernel/pi/map_range.c b/arch/arm64/kernel/pi/map_range.c
index c31feda18f47..79e4f6a2efe1 100644
--- a/arch/arm64/kernel/pi/map_range.c
+++ b/arch/arm64/kernel/pi/map_range.c
@@ -86,3 +86,15 @@ void __init map_range(u64 *pte, u64 start, u64 end, u64 pa, pgprot_t prot,
 		tbl++;
 	}
 }
+
+asmlinkage u64 __init create_init_idmap(pgd_t *pg_dir)
+{
+	u64 ptep = (u64)pg_dir + PAGE_SIZE;
+
+	map_range(&ptep, (u64)_stext, (u64)__initdata_begin, (u64)_stext,
+		  PAGE_KERNEL_ROX, IDMAP_ROOT_LEVEL, (pte_t *)pg_dir, false, 0);
+	map_range(&ptep, (u64)__initdata_begin, (u64)_end, (u64)__initdata_begin,
+		  PAGE_KERNEL, IDMAP_ROOT_LEVEL, (pte_t *)pg_dir, false, 0);
+
+	return ptep;
+}
diff --git a/arch/arm64/kernel/pi/pi.h b/arch/arm64/kernel/pi/pi.h
index d307c58e9741..1ea282a5f96a 100644
--- a/arch/arm64/kernel/pi/pi.h
+++ b/arch/arm64/kernel/pi/pi.h
@@ -21,6 +21,8 @@ static inline void *prel64_to_pointer(const prel64_t *offset)
 
 extern bool dynamic_scs_is_enabled;
 
+extern pgd_t init_idmap_pg_dir[];
+
 void init_feature_override(u64 boot_status, const void *fdt, int chosen);
 u64 kaslr_early_init(void *fdt, int chosen);
 void relocate_kernel(u64 offset);
@@ -30,3 +32,5 @@ void map_range(u64 *pgd, u64 start, u64 end, u64 pa, pgprot_t prot,
 	       int level, pte_t *tbl, bool may_use_cont, u64 va_offset);
 
 asmlinkage void early_map_kernel(u64 boot_status, void *fdt);
+
+asmlinkage u64 create_init_idmap(pgd_t *pgd);
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index aae32a6c0428..cef7573d156d 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -45,8 +45,6 @@
 #define NO_CONT_MAPPINGS	BIT(1)
 #define NO_EXEC_MAPPINGS	BIT(2)	/* assumes FEAT_HPDS is not used */
 
-int idmap_t0sz __ro_after_init;
-
 #if VA_BITS > 48
 u64 vabits_actual __ro_after_init = VA_BITS_MIN;
 EXPORT_SYMBOL(vabits_actual);
@@ -790,8 +788,6 @@ void __init paging_init(void)
 	pgd_t *pgdp = pgd_set_fixmap(__pa_symbol(swapper_pg_dir));
 	extern pgd_t init_idmap_pg_dir[];
 
-	idmap_t0sz = 63UL - __fls(__pa_symbol(_end) | GENMASK(VA_BITS_MIN - 1, 0));
-
 	map_kernel(pgdp);
 	map_mem(pgdp);
 
@@ -806,7 +802,6 @@ void __init paging_init(void)
 	memblock_allow_resize();
 
 	create_idmap();
-	idmap_t0sz = TCR_T0SZ(IDMAP_VA_BITS);
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 47ede52bb900..55c366dbda8f 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -200,7 +200,8 @@ SYM_FUNC_ALIAS(__pi_idmap_cpu_replace_ttbr1, idmap_cpu_replace_ttbr1)
 
 #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
 
-#define KPTI_NG_PTE_FLAGS	(PTE_ATTRINDX(MT_NORMAL) | SWAPPER_PTE_FLAGS | PTE_WRITE)
+#define KPTI_NG_PTE_FLAGS	(PTE_ATTRINDX(MT_NORMAL) | PTE_TYPE_PAGE | \
+				 PTE_AF | PTE_SHARED | PTE_UXN | PTE_WRITE)
 
 	.pushsection ".idmap.text", "a"
 
-- 
2.43.0.rc1.413.gea7ed67945-goog



^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v6 39/41] arm64: mm: avoid fixmap for early swapper_pg_dir updates
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (37 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 38/41] arm64: kernel: Create initial ID map from C code Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-11-29 11:16 ` [PATCH v6 40/41] arm64: mm: omit redundant remap of kernel image Ard Biesheuvel
                   ` (2 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Early in the boot, when .rodata is still writable, we can poke
swapper_pg_dir entries directly, and there is no need to go through the
fixmap. After a future patch, we will enter the kernel with
swapper_pg_dir already active, and early swapper_pg_dir updates for
creating the fixmap page table hierarchy itself cannot go through the
fixmap for obvious reasons. So let's keep track of whether rodata is
writable, and update the descriptor directly in that case.

As the same reasoning applies to early KASAN init, make the function
noinstr as well.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/mm/mmu.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index cef7573d156d..5996b374ff8a 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -55,6 +55,8 @@ EXPORT_SYMBOL(kimage_voffset);
 
 u32 __boot_cpu_mode[] = { BOOT_CPU_MODE_EL2, BOOT_CPU_MODE_EL1 };
 
+static bool rodata_is_rw __ro_after_init = true;
+
 /*
  * The booting CPU updates the failed status @__early_cpu_boot_status,
  * with MMU turned off.
@@ -71,10 +73,21 @@ EXPORT_SYMBOL(empty_zero_page);
 static DEFINE_SPINLOCK(swapper_pgdir_lock);
 static DEFINE_MUTEX(fixmap_lock);
 
-void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd)
+void noinstr set_swapper_pgd(pgd_t *pgdp, pgd_t pgd)
 {
 	pgd_t *fixmap_pgdp;
 
+	/*
+	 * Don't bother with the fixmap if swapper_pg_dir is still mapped
+	 * writable in the kernel mapping.
+	 */
+	if (rodata_is_rw) {
+		WRITE_ONCE(*pgdp, pgd);
+		dsb(ishst);
+		isb();
+		return;
+	}
+
 	spin_lock(&swapper_pgdir_lock);
 	fixmap_pgdp = pgd_set_fixmap(__pa_symbol(pgdp));
 	WRITE_ONCE(*fixmap_pgdp, pgd);
@@ -628,6 +641,7 @@ void mark_rodata_ro(void)
 	 * to cover NOTES and EXCEPTION_TABLE.
 	 */
 	section_size = (unsigned long)__init_begin - (unsigned long)__start_rodata;
+	WRITE_ONCE(rodata_is_rw, false);
 	update_mapping_prot(__pa_symbol(__start_rodata), (unsigned long)__start_rodata,
 			    section_size, PAGE_KERNEL_RO);
 
-- 
2.43.0.rc1.413.gea7ed67945-goog



^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v6 40/41] arm64: mm: omit redundant remap of kernel image
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (38 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 39/41] arm64: mm: avoid fixmap for early swapper_pg_dir updates Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-11-29 11:16 ` [PATCH v6 41/41] arm64: Revert "mm: provide idmap pointer to cpu_replace_ttbr1()" Ard Biesheuvel
  2023-12-12 17:20 ` [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Will Deacon
  41 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Now that the early kernel mapping is created with all the right
attributes and segment boundaries, there is no longer a need to recreate
it and switch to it. This also means we no longer have to copy the kasan
shadow or some parts of the fixmap from one set of page tables to the
other.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/fixmap.h   |  1 -
 arch/arm64/include/asm/kasan.h    |  2 -
 arch/arm64/include/asm/mmu.h      |  2 +-
 arch/arm64/kernel/image-vars.h    |  1 +
 arch/arm64/kernel/pi/map_kernel.c |  6 +-
 arch/arm64/mm/fixmap.c            | 34 --------
 arch/arm64/mm/kasan_init.c        | 15 ----
 arch/arm64/mm/mmu.c               | 85 ++++----------------
 8 files changed, 21 insertions(+), 125 deletions(-)

diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
index 58c294a96676..8aabd45e9a13 100644
--- a/arch/arm64/include/asm/fixmap.h
+++ b/arch/arm64/include/asm/fixmap.h
@@ -100,7 +100,6 @@ enum fixed_addresses {
 #define FIXMAP_PAGE_IO     __pgprot(PROT_DEVICE_nGnRE)
 
 void __init early_fixmap_init(void);
-void __init fixmap_copy(pgd_t *pgdir);
 
 #define __early_set_fixmap __set_fixmap
 
diff --git a/arch/arm64/include/asm/kasan.h b/arch/arm64/include/asm/kasan.h
index 12d5f47f7dbe..ab52688ac4bd 100644
--- a/arch/arm64/include/asm/kasan.h
+++ b/arch/arm64/include/asm/kasan.h
@@ -36,12 +36,10 @@ void kasan_init(void);
 #define _KASAN_SHADOW_START(va)	(KASAN_SHADOW_END - (1UL << ((va) - KASAN_SHADOW_SCALE_SHIFT)))
 #define KASAN_SHADOW_START      _KASAN_SHADOW_START(vabits_actual)
 
-void kasan_copy_shadow(pgd_t *pgdir);
 asmlinkage void kasan_early_init(void);
 
 #else
 static inline void kasan_init(void) { }
-static inline void kasan_copy_shadow(pgd_t *pgdir) { }
 #endif
 
 #endif
diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index d0b8b4b413b6..65977c7783c5 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -110,7 +110,7 @@ static inline bool kaslr_requires_kpti(void)
 }
 
 #define INIT_MM_CONTEXT(name)	\
-	.pgd = init_pg_dir,
+	.pgd = swapper_pg_dir,
 
 #endif	/* !__ASSEMBLY__ */
 #endif
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index 941a14c05184..e140c5bda90b 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -55,6 +55,7 @@ PROVIDE(__pi_memstart_offset_seed	= memstart_offset_seed);
 PROVIDE(__pi_init_idmap_pg_dir		= init_idmap_pg_dir);
 PROVIDE(__pi_init_pg_dir		= init_pg_dir);
 PROVIDE(__pi_init_pg_end		= init_pg_end);
+PROVIDE(__pi_swapper_pg_dir		= swapper_pg_dir);
 
 PROVIDE(__pi__text			= _text);
 PROVIDE(__pi__stext               	= _stext);
diff --git a/arch/arm64/kernel/pi/map_kernel.c b/arch/arm64/kernel/pi/map_kernel.c
index f86e878d366d..4b76a007a50d 100644
--- a/arch/arm64/kernel/pi/map_kernel.c
+++ b/arch/arm64/kernel/pi/map_kernel.c
@@ -124,8 +124,12 @@ static void __init map_kernel(u64 kaslr_offset, u64 va_offset, int root_level)
 			    text_prot, true, root_level);
 		map_segment(init_pg_dir, NULL, va_offset, __inittext_begin,
 			    __inittext_end, text_prot, false, root_level);
-		dsb(ishst);
 	}
+
+	/* Copy the root page table to its final location */
+	memcpy((void *)swapper_pg_dir + va_offset, init_pg_dir, PGD_SIZE);
+	dsb(ishst);
+	idmap_cpu_replace_ttbr1(swapper_pg_dir);
 }
 
 static void __init map_fdt(u64 fdt)
diff --git a/arch/arm64/mm/fixmap.c b/arch/arm64/mm/fixmap.c
index c0a3301203bd..9436a12e1882 100644
--- a/arch/arm64/mm/fixmap.c
+++ b/arch/arm64/mm/fixmap.c
@@ -167,37 +167,3 @@ void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot)
 
 	return dt_virt;
 }
-
-/*
- * Copy the fixmap region into a new pgdir.
- */
-void __init fixmap_copy(pgd_t *pgdir)
-{
-	if (!READ_ONCE(pgd_val(*pgd_offset_pgd(pgdir, FIXADDR_TOT_START)))) {
-		/*
-		 * The fixmap falls in a separate pgd to the kernel, and doesn't
-		 * live in the carveout for the swapper_pg_dir. We can simply
-		 * re-use the existing dir for the fixmap.
-		 */
-		set_pgd(pgd_offset_pgd(pgdir, FIXADDR_TOT_START),
-			READ_ONCE(*pgd_offset_k(FIXADDR_TOT_START)));
-	} else if (CONFIG_PGTABLE_LEVELS > 3) {
-		pgd_t *bm_pgdp;
-		p4d_t *bm_p4dp;
-		pud_t *bm_pudp;
-		/*
-		 * The fixmap shares its top level pgd entry with the kernel
-		 * mapping. This can really only occur when we are running
-		 * with 16k/4 levels, so we can simply reuse the pud level
-		 * entry instead.
-		 */
-		BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES));
-		bm_pgdp = pgd_offset_pgd(pgdir, FIXADDR_TOT_START);
-		bm_p4dp = p4d_offset(bm_pgdp, FIXADDR_TOT_START);
-		bm_pudp = pud_set_fixmap_offset(bm_p4dp, FIXADDR_TOT_START);
-		pud_populate(&init_mm, bm_pudp, lm_alias(bm_pmd));
-		pud_clear_fixmap();
-	} else {
-		BUG();
-	}
-}
diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
index 555285ebd5af..dd91f5942fd2 100644
--- a/arch/arm64/mm/kasan_init.c
+++ b/arch/arm64/mm/kasan_init.c
@@ -184,21 +184,6 @@ static void __init kasan_map_populate(unsigned long start, unsigned long end,
 	kasan_pgd_populate(start & PAGE_MASK, PAGE_ALIGN(end), node, false);
 }
 
-/*
- * Copy the current shadow region into a new pgdir.
- */
-void __init kasan_copy_shadow(pgd_t *pgdir)
-{
-	pgd_t *pgdp, *pgdp_new, *pgdp_end;
-
-	pgdp = pgd_offset_k(KASAN_SHADOW_START);
-	pgdp_end = pgd_offset_k(KASAN_SHADOW_END);
-	pgdp_new = pgd_offset_pgd(pgdir, KASAN_SHADOW_START);
-	do {
-		set_pgd(pgdp_new, READ_ONCE(*pgdp));
-	} while (pgdp++, pgdp_new++, pgdp != pgdp_end);
-}
-
 static void __init clear_pgds(unsigned long start,
 			unsigned long end)
 {
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 5996b374ff8a..07da61c1d7c6 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -648,9 +648,9 @@ void mark_rodata_ro(void)
 	debug_checkwx();
 }
 
-static void __init map_kernel_segment(pgd_t *pgdp, void *va_start, void *va_end,
-				      pgprot_t prot, struct vm_struct *vma,
-				      int flags, unsigned long vm_flags)
+static void __init declare_vma(struct vm_struct *vma,
+			       void *va_start, void *va_end,
+			       unsigned long vm_flags)
 {
 	phys_addr_t pa_start = __pa_symbol(va_start);
 	unsigned long size = va_end - va_start;
@@ -658,9 +658,6 @@ static void __init map_kernel_segment(pgd_t *pgdp, void *va_start, void *va_end,
 	BUG_ON(!PAGE_ALIGNED(pa_start));
 	BUG_ON(!PAGE_ALIGNED(size));
 
-	__create_pgd_mapping(pgdp, pa_start, (unsigned long)va_start, size, prot,
-			     early_pgtable_alloc, flags);
-
 	if (!(vm_flags & VM_NO_GUARD))
 		size += PAGE_SIZE;
 
@@ -673,12 +670,12 @@ static void __init map_kernel_segment(pgd_t *pgdp, void *va_start, void *va_end,
 	vm_area_add_early(vma);
 }
 
+#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
 static pgprot_t kernel_exec_prot(void)
 {
 	return rodata_enabled ? PAGE_KERNEL_ROX : PAGE_KERNEL_EXEC;
 }
 
-#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
 static int __init map_entry_trampoline(void)
 {
 	int i;
@@ -710,60 +707,17 @@ core_initcall(map_entry_trampoline);
 #endif
 
 /*
- * Open coded check for BTI, only for use to determine configuration
- * for early mappings for before the cpufeature code has run.
- */
-static bool arm64_early_this_cpu_has_bti(void)
-{
-	u64 pfr1;
-
-	if (!IS_ENABLED(CONFIG_ARM64_BTI_KERNEL))
-		return false;
-
-	pfr1 = __read_sysreg_by_encoding(SYS_ID_AA64PFR1_EL1);
-	return cpuid_feature_extract_unsigned_field(pfr1,
-						    ID_AA64PFR1_EL1_BT_SHIFT);
-}
-
-/*
- * Create fine-grained mappings for the kernel.
+ * Declare the VMA areas for the kernel
  */
-static void __init map_kernel(pgd_t *pgdp)
+static void __init declare_kernel_vmas(void)
 {
-	static struct vm_struct vmlinux_text, vmlinux_rodata, vmlinux_inittext,
-				vmlinux_initdata, vmlinux_data;
-
-	/*
-	 * External debuggers may need to write directly to the text
-	 * mapping to install SW breakpoints. Allow this (only) when
-	 * explicitly requested with rodata=off.
-	 */
-	pgprot_t text_prot = kernel_exec_prot();
-
-	/*
-	 * If we have a CPU that supports BTI and a kernel built for
-	 * BTI then mark the kernel executable text as guarded pages
-	 * now so we don't have to rewrite the page tables later.
-	 */
-	if (arm64_early_this_cpu_has_bti())
-		text_prot = __pgprot_modify(text_prot, PTE_GP, PTE_GP);
+	static struct vm_struct vmlinux_seg[KERNEL_SEGMENT_COUNT];
 
-	/*
-	 * Only rodata will be remapped with different permissions later on,
-	 * all other segments are allowed to use contiguous mappings.
-	 */
-	map_kernel_segment(pgdp, _stext, _etext, text_prot, &vmlinux_text, 0,
-			   VM_NO_GUARD);
-	map_kernel_segment(pgdp, __start_rodata, __inittext_begin, PAGE_KERNEL,
-			   &vmlinux_rodata, NO_CONT_MAPPINGS, VM_NO_GUARD);
-	map_kernel_segment(pgdp, __inittext_begin, __inittext_end, text_prot,
-			   &vmlinux_inittext, 0, VM_NO_GUARD);
-	map_kernel_segment(pgdp, __initdata_begin, __initdata_end, PAGE_KERNEL,
-			   &vmlinux_initdata, 0, VM_NO_GUARD);
-	map_kernel_segment(pgdp, _data, _end, PAGE_KERNEL, &vmlinux_data, 0, 0);
-
-	fixmap_copy(pgdp);
-	kasan_copy_shadow(pgdp);
+	declare_vma(&vmlinux_seg[0], _stext, _etext, VM_NO_GUARD);
+	declare_vma(&vmlinux_seg[1], __start_rodata, __inittext_begin, VM_NO_GUARD);
+	declare_vma(&vmlinux_seg[2], __inittext_begin, __inittext_end, VM_NO_GUARD);
+	declare_vma(&vmlinux_seg[3], __initdata_begin, __initdata_end, VM_NO_GUARD);
+	declare_vma(&vmlinux_seg[4], _data, _end, 0);
 }
 
 void __pi_map_range(u64 *pgd, u64 start, u64 end, u64 pa, pgprot_t prot,
@@ -799,23 +753,12 @@ static void __init create_idmap(void)
 
 void __init paging_init(void)
 {
-	pgd_t *pgdp = pgd_set_fixmap(__pa_symbol(swapper_pg_dir));
-	extern pgd_t init_idmap_pg_dir[];
-
-	map_kernel(pgdp);
-	map_mem(pgdp);
-
-	pgd_clear_fixmap();
-
-	cpu_replace_ttbr1(lm_alias(swapper_pg_dir), init_idmap_pg_dir);
-	init_mm.pgd = swapper_pg_dir;
-
-	memblock_phys_free(__pa_symbol(init_pg_dir),
-			   __pa_symbol(init_pg_end) - __pa_symbol(init_pg_dir));
+	map_mem(swapper_pg_dir);
 
 	memblock_allow_resize();
 
 	create_idmap();
+	declare_kernel_vmas();
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-- 
2.43.0.rc1.413.gea7ed67945-goog



^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v6 41/41] arm64: Revert "mm: provide idmap pointer to cpu_replace_ttbr1()"
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (39 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 40/41] arm64: mm: omit redundant remap of kernel image Ard Biesheuvel
@ 2023-11-29 11:16 ` Ard Biesheuvel
  2023-12-12 17:20 ` [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Will Deacon
  41 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 11:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

This reverts commit 1682c45b920643c, which is no longer needed now that
we create the permanent kernel mapping directly during early boot.

This is a RINO (revert in name only) given that some of the code has
moved around, but the changes are straightforward.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/mmu_context.h | 17 ++++++-----------
 arch/arm64/mm/kasan_init.c           |  4 ++--
 arch/arm64/mm/mmu.c                  |  4 ++--
 3 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index a8a89a0f2867..c768d16b81a4 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -108,18 +108,13 @@ static inline void cpu_uninstall_idmap(void)
 		cpu_switch_mm(mm->pgd, mm);
 }
 
-static inline void __cpu_install_idmap(pgd_t *idmap)
+static inline void cpu_install_idmap(void)
 {
 	cpu_set_reserved_ttbr0();
 	local_flush_tlb_all();
 	cpu_set_idmap_tcr_t0sz();
 
-	cpu_switch_mm(lm_alias(idmap), &init_mm);
-}
-
-static inline void cpu_install_idmap(void)
-{
-	__cpu_install_idmap(idmap_pg_dir);
+	cpu_switch_mm(lm_alias(idmap_pg_dir), &init_mm);
 }
 
 /*
@@ -146,21 +141,21 @@ static inline void cpu_install_ttbr0(phys_addr_t ttbr0, unsigned long t0sz)
 	isb();
 }
 
-void __cpu_replace_ttbr1(pgd_t *pgdp, pgd_t *idmap, bool cnp);
+void __cpu_replace_ttbr1(pgd_t *pgdp, bool cnp);
 
 static inline void cpu_enable_swapper_cnp(void)
 {
-	__cpu_replace_ttbr1(lm_alias(swapper_pg_dir), idmap_pg_dir, true);
+	__cpu_replace_ttbr1(lm_alias(swapper_pg_dir), true);
 }
 
-static inline void cpu_replace_ttbr1(pgd_t *pgdp, pgd_t *idmap)
+static inline void cpu_replace_ttbr1(pgd_t *pgdp)
 {
 	/*
 	 * Only for early TTBR1 replacement before cpucaps are finalized and
 	 * before we've decided whether to use CNP.
 	 */
 	WARN_ON(system_capabilities_finalized());
-	__cpu_replace_ttbr1(pgdp, idmap, false);
+	__cpu_replace_ttbr1(pgdp, false);
 }
 
 /*
diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
index dd91f5942fd2..2ea0de4a8486 100644
--- a/arch/arm64/mm/kasan_init.c
+++ b/arch/arm64/mm/kasan_init.c
@@ -220,7 +220,7 @@ static void __init kasan_init_shadow(void)
 	 */
 	memcpy(tmp_pg_dir, swapper_pg_dir, sizeof(tmp_pg_dir));
 	dsb(ishst);
-	cpu_replace_ttbr1(lm_alias(tmp_pg_dir), idmap_pg_dir);
+	cpu_replace_ttbr1(lm_alias(tmp_pg_dir));
 
 	clear_pgds(KASAN_SHADOW_START, KASAN_SHADOW_END);
 
@@ -256,7 +256,7 @@ static void __init kasan_init_shadow(void)
 				PAGE_KERNEL_RO));
 
 	memset(kasan_early_shadow_page, KASAN_SHADOW_INIT, PAGE_SIZE);
-	cpu_replace_ttbr1(lm_alias(swapper_pg_dir), idmap_pg_dir);
+	cpu_replace_ttbr1(lm_alias(swapper_pg_dir));
 }
 
 static void __init kasan_init_depth(void)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 07da61c1d7c6..88c6a54b76ac 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1442,7 +1442,7 @@ void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr, pte
  * Atomically replaces the active TTBR1_EL1 PGD with a new VA-compatible PGD,
  * avoiding the possibility of conflicting TLB entries being allocated.
  */
-void __cpu_replace_ttbr1(pgd_t *pgdp, pgd_t *idmap, bool cnp)
+void __cpu_replace_ttbr1(pgd_t *pgdp, bool cnp)
 {
 	typedef void (ttbr_replace_func)(phys_addr_t);
 	extern ttbr_replace_func idmap_cpu_replace_ttbr1;
@@ -1457,7 +1457,7 @@ void __cpu_replace_ttbr1(pgd_t *pgdp, pgd_t *idmap, bool cnp)
 
 	replace_phys = (void *)__pa_symbol(idmap_cpu_replace_ttbr1);
 
-	__cpu_install_idmap(idmap);
+	cpu_install_idmap();
 
 	/*
 	 * We really don't want to take *any* exceptions while TTBR1 is
-- 
2.43.0.rc1.413.gea7ed67945-goog


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [PATCH v6 11/41] arm64: kernel: Manage absolute relocations in code built under pi/
  2023-11-29 11:16 ` [PATCH v6 11/41] arm64: kernel: Manage absolute relocations in code built under pi/ Ard Biesheuvel
@ 2023-11-29 12:27   ` Marc Zyngier
  2023-11-29 12:46     ` Ard Biesheuvel
  0 siblings, 1 reply; 63+ messages in thread
From: Marc Zyngier @ 2023-11-29 12:27 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, Ard Biesheuvel, Catalin Marinas, Will Deacon,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

On Wed, 29 Nov 2023 11:16:07 +0000,
Ard Biesheuvel <ardb@google.com> wrote:
> 
> From: Ard Biesheuvel <ardb@kernel.org>
> 
> The mini C runtime runs before relocations are processed, and so it
> cannot rely on statically initialized pointer variables.
> 
> Add a check to ensure that such code does not get introduced by
> accident, by going over the relocations in each object, identifying the
> ones that operate on data sections that are part of the executable
> image, and raising an error if any relocations of type R_AARCH64_ABS64
> exist. Note that such relocations are permitted in other places (e.g.,
> debug sections) and will never occur in compiler generated code sections
> when using the small code model, so only check sections that have
> SHF_ALLOC set and SHF_EXECINSTR cleared.
> 
> To accommodate cases where statically initialized symbol references are
> unavoidable, introduce a special case for ELF input data sections that
> have ".rodata.prel64" in their names, and in these cases, instead of
> rejecting any encountered ABS64 relocations, convert them into PREL64
> relocations, which don't require any runtime fixups. Note that the code
> in question must still be modified to deal with this, as it needs to
> convert the 64-bit signed offsets into absolute addresses before use.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/arm64/kernel/pi/Makefile    |   9 +-
>  arch/arm64/kernel/pi/pi.h        |  18 +++
>  arch/arm64/kernel/pi/relacheck.c | 130 ++++++++++++++++++++
>  3 files changed, 155 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kernel/pi/Makefile b/arch/arm64/kernel/pi/Makefile
> index c844a0546d7f..bc32a431fe35 100644
> --- a/arch/arm64/kernel/pi/Makefile
> +++ b/arch/arm64/kernel/pi/Makefile
> @@ -22,11 +22,16 @@ KCSAN_SANITIZE	:= n
>  UBSAN_SANITIZE	:= n
>  KCOV_INSTRUMENT	:= n
>  
> +hostprogs	:= relacheck
> +
> +quiet_cmd_piobjcopy = $(quiet_cmd_objcopy)
> +      cmd_piobjcopy = $(cmd_objcopy) && $(obj)/relacheck $(@) $(<)
> +
>  $(obj)/%.pi.o: OBJCOPYFLAGS := --prefix-symbols=__pi_ \
>  			       --remove-section=.note.gnu.property \
>  			       --prefix-alloc-sections=.init
> -$(obj)/%.pi.o: $(obj)/%.o FORCE
> -	$(call if_changed,objcopy)
> +$(obj)/%.pi.o: $(obj)/%.o $(obj)/relacheck FORCE
> +	$(call if_changed,piobjcopy)
>  
>  $(obj)/lib-%.o: $(srctree)/lib/%.c FORCE
>  	$(call if_changed_rule,cc_o_c)
> diff --git a/arch/arm64/kernel/pi/pi.h b/arch/arm64/kernel/pi/pi.h
> new file mode 100644
> index 000000000000..7c2d9bbf0ff9
> --- /dev/null
> +++ b/arch/arm64/kernel/pi/pi.h
> @@ -0,0 +1,18 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +// Copyright 2023 Google LLC
> +// Author: Ard Biesheuvel <ardb@google.com>
> +
> +#define __prel64_initconst	__section(".init.rodata.prel64")
> +
> +#define PREL64(type, name)	union { type *name; prel64_t name ## _prel; }
> +
> +#define prel64_pointer(__d)	(typeof(__d))prel64_to_pointer(&__d##_prel)
> +
> +typedef volatile signed long prel64_t;
> +
> +static inline void *prel64_to_pointer(const prel64_t *offset)
> +{
> +	if (!*offset)
> +		return NULL;
> +	return (void *)offset + *offset;
> +}

Is there any use for these definitions before the override code gets
moved to pi/ in patch 21? It is otherwise a bit odd to see two sets of
definitions between patches 15 and 20.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v6 11/41] arm64: kernel: Manage absolute relocations in code built under pi/
  2023-11-29 12:27   ` Marc Zyngier
@ 2023-11-29 12:46     ` Ard Biesheuvel
  0 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-29 12:46 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Ard Biesheuvel, linux-arm-kernel, Catalin Marinas, Will Deacon,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

On Wed, 29 Nov 2023 at 13:27, Marc Zyngier <maz@kernel.org> wrote:
>
> On Wed, 29 Nov 2023 11:16:07 +0000,
> Ard Biesheuvel <ardb@google.com> wrote:
> >
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > The mini C runtime runs before relocations are processed, and so it
> > cannot rely on statically initialized pointer variables.
> >
> > Add a check to ensure that such code does not get introduced by
> > accident, by going over the relocations in each object, identifying the
> > ones that operate on data sections that are part of the executable
> > image, and raising an error if any relocations of type R_AARCH64_ABS64
> > exist. Note that such relocations are permitted in other places (e.g.,
> > debug sections) and will never occur in compiler generated code sections
> > when using the small code model, so only check sections that have
> > SHF_ALLOC set and SHF_EXECINSTR cleared.
> >
> > To accommodate cases where statically initialized symbol references are
> > unavoidable, introduce a special case for ELF input data sections that
> > have ".rodata.prel64" in their names, and in these cases, instead of
> > rejecting any encountered ABS64 relocations, convert them into PREL64
> > relocations, which don't require any runtime fixups. Note that the code
> > in question must still be modified to deal with this, as it needs to
> > convert the 64-bit signed offsets into absolute addresses before use.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> >  arch/arm64/kernel/pi/Makefile    |   9 +-
> >  arch/arm64/kernel/pi/pi.h        |  18 +++
> >  arch/arm64/kernel/pi/relacheck.c | 130 ++++++++++++++++++++
> >  3 files changed, 155 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm64/kernel/pi/Makefile b/arch/arm64/kernel/pi/Makefile
> > index c844a0546d7f..bc32a431fe35 100644
> > --- a/arch/arm64/kernel/pi/Makefile
> > +++ b/arch/arm64/kernel/pi/Makefile
> > @@ -22,11 +22,16 @@ KCSAN_SANITIZE    := n
> >  UBSAN_SANITIZE       := n
> >  KCOV_INSTRUMENT      := n
> >
> > +hostprogs    := relacheck
> > +
> > +quiet_cmd_piobjcopy = $(quiet_cmd_objcopy)
> > +      cmd_piobjcopy = $(cmd_objcopy) && $(obj)/relacheck $(@) $(<)
> > +
> >  $(obj)/%.pi.o: OBJCOPYFLAGS := --prefix-symbols=__pi_ \
> >                              --remove-section=.note.gnu.property \
> >                              --prefix-alloc-sections=.init
> > -$(obj)/%.pi.o: $(obj)/%.o FORCE
> > -     $(call if_changed,objcopy)
> > +$(obj)/%.pi.o: $(obj)/%.o $(obj)/relacheck FORCE
> > +     $(call if_changed,piobjcopy)
> >
> >  $(obj)/lib-%.o: $(srctree)/lib/%.c FORCE
> >       $(call if_changed_rule,cc_o_c)
> > diff --git a/arch/arm64/kernel/pi/pi.h b/arch/arm64/kernel/pi/pi.h
> > new file mode 100644
> > index 000000000000..7c2d9bbf0ff9
> > --- /dev/null
> > +++ b/arch/arm64/kernel/pi/pi.h
> > @@ -0,0 +1,18 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +// Copyright 2023 Google LLC
> > +// Author: Ard Biesheuvel <ardb@google.com>
> > +
> > +#define __prel64_initconst   __section(".init.rodata.prel64")
> > +
> > +#define PREL64(type, name)   union { type *name; prel64_t name ## _prel; }
> > +
> > +#define prel64_pointer(__d)  (typeof(__d))prel64_to_pointer(&__d##_prel)
> > +
> > +typedef volatile signed long prel64_t;
> > +
> > +static inline void *prel64_to_pointer(const prel64_t *offset)
> > +{
> > +     if (!*offset)
> > +             return NULL;
> > +     return (void *)offset + *offset;
> > +}
>
> Is there any use for these definitions before the override code gets
> moved to pi/ in patch 21? It is otherwise a bit odd to see two sets of
> definitions between patches 15 and 20.
>

No they are unused at this point, but they are tied closely to the
host tool that this patch introduces.
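
As an aside, a minimal sketch of how these helpers are meant to be used
(the struct and symbol names below are made up for illustration, and this
assumes the pi.h definitions quoted above are in scope):

	struct override_desc {
		PREL64(struct arm64_ftr_override, override);
	};

	extern struct arm64_ftr_override some_override;	/* made-up symbol */

	/*
	 * The initializer below emits an R_AARCH64_ABS64 relocation, but
	 * because the object lives in a ".rodata.prel64" section, relacheck
	 * rewrites it into a PREL64 (place-relative) one that needs no
	 * runtime fixup.
	 */
	static const struct override_desc desc __prel64_initconst = {
		.override = &some_override,
	};

	static struct arm64_ftr_override *get_override(void)
	{
		/* turn the stored self-relative offset back into a pointer */
		return prel64_pointer(desc.override);
	}

The net effect is that the image carries a place-relative offset instead of
an absolute address, so this code can be used before relocations have been
processed.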

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v6 01/41] arm64: kernel: Disable latent_entropy GCC plugin in early C runtime
  2023-11-29 11:15 ` [PATCH v6 01/41] arm64: kernel: Disable latent_entropy GCC plugin in early C runtime Ard Biesheuvel
@ 2023-11-30  4:44   ` Anshuman Khandual
  0 siblings, 0 replies; 63+ messages in thread
From: Anshuman Khandual @ 2023-11-30  4:44 UTC (permalink / raw)
  To: Ard Biesheuvel, linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Kees Cook



On 11/29/23 16:45, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
> 
> In subsequent patches, mark portions of the early C code will be marked

Small nit - 		 ^^^ redundant usage for mark		   ^^^^^^

> as __init.  Unfortunarely, __init implies __latent_entropy, and this

Typo here - s/Unfortunarely/Unfortunately/

> would result in the early C code being instrumented in an unsafe manner.
> 
> Disable the latent entropy plugin for the early C code.

I am just wondering if this '__latent_entropy' could be problematic for all
early __init code on each supported platform, although this plugin does not
get disabled in that many places inside the kernel tree.

$git grep DISABLE_LATENT_ENTROPY_PLUGIN
arch/arm64/kernel/pi/Makefile:             $(DISABLE_LATENT_ENTROPY_PLUGIN) \
arch/powerpc/kernel/Makefile:CFLAGS_early_32.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
arch/powerpc/kernel/Makefile:CFLAGS_cputable.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
arch/powerpc/kernel/Makefile:CFLAGS_prom_init.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
arch/powerpc/kernel/Makefile:CFLAGS_btext.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
arch/powerpc/kernel/Makefile:CFLAGS_prom.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
arch/powerpc/kernel/vdso/Makefile:  CFLAGS_vgettimeofday-32.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
arch/powerpc/kernel/vdso/Makefile:  CFLAGS_vgettimeofday-64.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
arch/powerpc/lib/Makefile:CFLAGS_code-patching.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
arch/powerpc/lib/Makefile:CFLAGS_feature-fixups.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
scripts/Makefile.gcc-plugins:    DISABLE_LATENT_ENTROPY_PLUGIN += -fplugin-arg-latent_entropy_plugin-disable -ULATENT_ENTROPY_PLUGIN
scripts/Makefile.gcc-plugins:export DISABLE_LATENT_ENTROPY_PLUGIN

> 
> Acked-by: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/arm64/kernel/pi/Makefile | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/arm64/kernel/pi/Makefile b/arch/arm64/kernel/pi/Makefile
> index 4c0ea3cd4ea4..c844a0546d7f 100644
> --- a/arch/arm64/kernel/pi/Makefile
> +++ b/arch/arm64/kernel/pi/Makefile
> @@ -3,6 +3,7 @@
>  
>  KBUILD_CFLAGS	:= $(subst $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) -fpie \
>  		   -Os -DDISABLE_BRANCH_PROFILING $(DISABLE_STACKLEAK_PLUGIN) \
> +		   $(DISABLE_LATENT_ENTROPY_PLUGIN) \
>  		   $(call cc-option,-mbranch-protection=none) \
>  		   -I$(srctree)/scripts/dtc/libfdt -fno-stack-protector \
>  		   -include $(srctree)/include/linux/hidden.h \

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v6 02/41] arm64: mm: Take potential load offset into account when KASLR is off
  2023-11-29 11:15 ` [PATCH v6 02/41] arm64: mm: Take potential load offset into account when KASLR is off Ard Biesheuvel
@ 2023-11-30  5:23   ` Anshuman Khandual
  2023-12-04 14:12   ` Mark Rutland
  1 sibling, 0 replies; 63+ messages in thread
From: Anshuman Khandual @ 2023-11-30  5:23 UTC (permalink / raw)
  To: Ard Biesheuvel, linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Kees Cook



On 11/29/23 16:45, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
> 
> We enable CONFIG_RELOCATABLE even when CONFIG_RANDOMIZE_BASE is
> disabled, and this permits the loader (i.e., EFI) to place the kernel

Indeed, I could validate this via a defconfig-based build and boot.

> anywhere in physical memory as long as the base address is 64k aligned.
> 
> This means that the 'KASLR' case described in the header that defines
> the size of the statically allocated page tables could take effect even
> when CONFIG_RANDMIZE_BASE=n. So check for CONFIG_RELOCATABLE instead.

Makes sense.

> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/arm64/include/asm/kernel-pgtable.h | 27 +++++---------------
>  1 file changed, 6 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
> index 85d26143faa5..83ddb14b95a5 100644
> --- a/arch/arm64/include/asm/kernel-pgtable.h
> +++ b/arch/arm64/include/asm/kernel-pgtable.h
> @@ -37,27 +37,12 @@
>  
>  
>  /*
> - * If KASLR is enabled, then an offset K is added to the kernel address
> - * space. The bottom 21 bits of this offset are zero to guarantee 2MB
> - * alignment for PA and VA.
> - *
> - * For each pagetable level of the swapper, we know that the shift will
> - * be larger than 21 (for the 4KB granule case we use section maps thus
> - * the smallest shift is actually 30) thus there is the possibility that
> - * KASLR can increase the number of pagetable entries by 1, so we make
> - * room for this extra entry.
> - *
> - * Note KASLR cannot increase the number of required entries for a level
> - * by more than one because it increments both the virtual start and end
> - * addresses equally (the extra entry comes from the case where the end
> - * address is just pushed over a boundary and the start address isn't).
> + * A relocatable kernel may execute from an address that differs from the one at
> + * which it was linked. In the worst case, its runtime placement may intersect
> + * with two adjacent PGDIR entries, which means that an additional page table
> + * may be needed at each subordinate level.
>   */

This is a better explanation.

> -
> -#ifdef CONFIG_RANDOMIZE_BASE
> -#define EARLY_KASLR	(1)
> -#else
> -#define EARLY_KASLR	(0)
> -#endif
> +#define EXTRA_PAGE	__is_defined(CONFIG_RELOCATABLE)

EXTRA_INIT_DIR_PAGE instead ? Just to have some more context.

>  
>  #define SPAN_NR_ENTRIES(vstart, vend, shift) \
>  	((((vend) - 1) >> (shift)) - ((vstart) >> (shift)) + 1)
> @@ -83,7 +68,7 @@
>  			+ EARLY_PGDS((vstart), (vend), add) 	/* each PGDIR needs a next level page table */	\
>  			+ EARLY_PUDS((vstart), (vend), add)	/* each PUD needs a next level page table */	\
>  			+ EARLY_PMDS((vstart), (vend), add))	/* each PMD needs a next level page table */
> -#define INIT_DIR_SIZE (PAGE_SIZE * EARLY_PAGES(KIMAGE_VADDR, _end, EARLY_KASLR))
> +#define INIT_DIR_SIZE (PAGE_SIZE * EARLY_PAGES(KIMAGE_VADDR, _end, EXTRA_PAGE))
>  
>  /* the initial ID map may need two extra pages if it needs to be extended */
>  #if VA_BITS < 48

Regardless

Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v6 03/41] arm64: mm: get rid of kimage_vaddr global variable
  2023-11-29 11:15 ` [PATCH v6 03/41] arm64: mm: get rid of kimage_vaddr global variable Ard Biesheuvel
@ 2023-11-30  5:38   ` Anshuman Khandual
  2023-12-04 14:37     ` Mark Rutland
  0 siblings, 1 reply; 63+ messages in thread
From: Anshuman Khandual @ 2023-11-30  5:38 UTC (permalink / raw)
  To: Ard Biesheuvel, linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Kees Cook



On 11/29/23 16:45, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
> 
> We store the address of _text in kimage_vaddr, but since commit
> 09e3c22a86f6889d ("arm64: Use a variable to store non-global mappings
> decision"), we no longer reference this variable from modules so we no
> longer need to export it.
> 
> In fact, we don't need it at all so let's just get rid of it.
> 
> Acked-by: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

There is a checkpatch.pl error for this patch.

--------
ERROR: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' -
ie: 'commit 09e3c22a86f6 ("arm64: Use a variable to store non-global mappings decision")'
#6: 
We store the address of _text in kimage_vaddr, but since commit
--------

Otherwise LGTM.


Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>

> ---
>  arch/arm64/include/asm/memory.h | 6 ++----
>  arch/arm64/kernel/head.S        | 2 +-
>  arch/arm64/mm/mmu.c             | 3 ---
>  3 files changed, 3 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index fde4186cc387..b8d726f951ae 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -182,6 +182,7 @@
>  #include <linux/types.h>
>  #include <asm/boot.h>
>  #include <asm/bug.h>
> +#include <asm/sections.h>
>  
>  #if VA_BITS > 48
>  extern u64			vabits_actual;
> @@ -193,15 +194,12 @@ extern s64			memstart_addr;
>  /* PHYS_OFFSET - the physical address of the start of memory. */
>  #define PHYS_OFFSET		({ VM_BUG_ON(memstart_addr & 1); memstart_addr; })
>  
> -/* the virtual base of the kernel image */
> -extern u64			kimage_vaddr;
> -
>  /* the offset between the kernel virtual and physical mappings */
>  extern u64			kimage_voffset;
>  
>  static inline unsigned long kaslr_offset(void)
>  {
> -	return kimage_vaddr - KIMAGE_VADDR;
> +	return (u64)&_text - KIMAGE_VADDR;
>  }
>  
>  #ifdef CONFIG_RANDOMIZE_BASE
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 7b236994f0e1..cab7f91949d8 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -482,7 +482,7 @@ SYM_FUNC_START_LOCAL(__primary_switched)
>  
>  	str_l	x21, __fdt_pointer, x5		// Save FDT pointer
>  
> -	ldr_l	x4, kimage_vaddr		// Save the offset between
> +	adrp	x4, _text			// Save the offset between
>  	sub	x4, x4, x0			// the kernel virtual and
>  	str_l	x4, kimage_voffset, x5		// physical mappings
>  
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 15f6347d23b6..03c73e9197ac 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -52,9 +52,6 @@ u64 vabits_actual __ro_after_init = VA_BITS_MIN;
>  EXPORT_SYMBOL(vabits_actual);
>  #endif
>  
> -u64 kimage_vaddr __ro_after_init = (u64)&_text;
> -EXPORT_SYMBOL(kimage_vaddr);
> -
>  u64 kimage_voffset __ro_after_init;
>  EXPORT_SYMBOL(kimage_voffset);
>  

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v6 04/41] arm64: mm: Move PCI I/O emulation region above the vmemmap region
  2023-11-29 11:16 ` [PATCH v6 04/41] arm64: mm: Move PCI I/O emulation region above the vmemmap region Ard Biesheuvel
@ 2023-11-30  7:59   ` Anshuman Khandual
  2023-11-30  8:02     ` Ard Biesheuvel
  2023-12-11 13:57   ` Mark Rutland
  1 sibling, 1 reply; 63+ messages in thread
From: Anshuman Khandual @ 2023-11-30  7:59 UTC (permalink / raw)
  To: Ard Biesheuvel, linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Kees Cook



On 11/29/23 16:46, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
> 
> Move the PCI I/O region above the vmemmap region in the kernel's VA
> space. This will permit us to reclaim the lower part of the vmemmap
> region for vmalloc/vmap allocations when running a 52-bit VA capable
> build on a 48-bit VA capable system.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/arm64/include/asm/memory.h | 4 ++--
>  arch/arm64/mm/ptdump.c          | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index b8d726f951ae..99caeff78e1a 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -49,8 +49,8 @@
>  #define MODULES_VSIZE		(SZ_2G)
>  #define VMEMMAP_START		(-(UL(1) << (VA_BITS - VMEMMAP_SHIFT)))
>  #define VMEMMAP_END		(VMEMMAP_START + VMEMMAP_SIZE)
> -#define PCI_IO_END		(VMEMMAP_START - SZ_8M)
> -#define PCI_IO_START		(PCI_IO_END - PCI_IO_SIZE)
> +#define PCI_IO_START		(VMEMMAP_END + SZ_8M)
> +#define PCI_IO_END		(PCI_IO_START + PCI_IO_SIZE)
>  #define FIXADDR_TOP		(VMEMMAP_START - SZ_32M)
>  
>  #if VA_BITS > 48
> diff --git a/arch/arm64/mm/ptdump.c b/arch/arm64/mm/ptdump.c
> index e305b6593c4e..d1df56d44f8a 100644
> --- a/arch/arm64/mm/ptdump.c
> +++ b/arch/arm64/mm/ptdump.c
> @@ -47,10 +47,10 @@ static struct addr_marker address_markers[] = {
>  	{ VMALLOC_END,			"vmalloc() end" },
>  	{ FIXADDR_TOT_START,		"Fixmap start" },
>  	{ FIXADDR_TOP,			"Fixmap end" },
> -	{ PCI_IO_START,			"PCI I/O start" },
> -	{ PCI_IO_END,			"PCI I/O end" },
>  	{ VMEMMAP_START,		"vmemmap start" },
>  	{ VMEMMAP_START + VMEMMAP_SIZE,	"vmemmap end" },
> +	{ PCI_IO_START,			"PCI I/O start" },
> +	{ PCI_IO_END,			"PCI I/O end" },
>  	{ -1,				NULL },
>  };
>  

I have not debugged this any further, but the PCI region displays no entries
via ptdump just after this patch being applied.

With the patch:

---[ Fixmap end ]---
0xfffffbfffe000000-0xfffffc0000000000          32M PMD
---[ vmemmap start ]---
0xfffffc0000000000-0xfffffc0002000000          32M PMD       RW NX SHD AF NG     BLK UXN    MEM/NORMAL
0xfffffc0002000000-0xfffffc0020000000         480M PMD
0xfffffc0020000000-0xfffffc0022000000          32M PMD       RW NX SHD AF NG     BLK UXN    MEM/NORMAL
0xfffffc0022000000-0xfffffc0040000000         480M PMD
0xfffffc0040000000-0xfffffc8000000000         511G PUD
0xfffffc8000000000-0xfffffe0000000000        1536G PGD
---[ vmemmap end ]---
0xfffffe0000000000-0xfffffe8000000000         512G PGD
---[ PCI I/O start ]---				
---[ PCI I/O end ]---					-----> No entries
0xfffffe8000000000-0x0000000000000000        1536G PGD

Without the patch:

--[ Fixmap start ]---
0xfffffbfffdc31000-0xfffffbfffddf6000        1812K PTE
0xfffffbfffddf6000-0xfffffbfffddf9000          12K PTE       ro x  SHD AF            UXN    MEM/NORMAL
0xfffffbfffddf9000-0xfffffbfffddfa000           4K PTE       ro NX SHD AF NG         UXN    MEM/NORMAL
0xfffffbfffddfa000-0xfffffbfffddfd000          12K PTE
0xfffffbfffddfd000-0xfffffbfffddfe000           4K PTE       RW NX SHD AF NG         UXN    DEVICE/nGnRE
0xfffffbfffddfe000-0xfffffbfffde00000           8K PTE       ro NX SHD AF NG         UXN    MEM/NORMAL
0xfffffbfffde00000-0xfffffbfffe000000           2M PTE
---[ Fixmap end ]---
0xfffffbfffe000000-0xfffffbfffe800000           8M PMD
---[ PCI I/O start ]---
0xfffffbfffe800000-0xfffffbffff800000          16M PMD
---[ PCI I/O end ]---
0xfffffbffff800000-0xfffffc0000000000           8M PMD
---[ vmemmap start ]---
0xfffffc0000000000-0xfffffc0002000000          32M PMD       RW NX SHD AF NG     BLK UXN    MEM/NORMAL
0xfffffc0002000000-0xfffffc0020000000         480M PMD
0xfffffc0020000000-0xfffffc0022000000          32M PMD       RW NX SHD AF NG     BLK UXN    MEM/NORMAL
0xfffffc0022000000-0xfffffc0040000000         480M PMD
0xfffffc0040000000-0xfffffc8000000000         511G PUD
0xfffffc8000000000-0xfffffe0000000000        1536G PGD
---[ vmemmap end ]---

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v6 04/41] arm64: mm: Move PCI I/O emulation region above the vmemmap region
  2023-11-30  7:59   ` Anshuman Khandual
@ 2023-11-30  8:02     ` Ard Biesheuvel
  2023-11-30  8:52       ` Anshuman Khandual
  0 siblings, 1 reply; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-30  8:02 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: Ard Biesheuvel, linux-arm-kernel, Catalin Marinas, Will Deacon,
	Marc Zyngier, Mark Rutland, Ryan Roberts, Kees Cook

On Thu, 30 Nov 2023 at 08:59, Anshuman Khandual
<anshuman.khandual@arm.com> wrote:
>
>
>
> On 11/29/23 16:46, Ard Biesheuvel wrote:
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > Move the PCI I/O region above the vmemmap region in the kernel's VA
> > space. This will permit us to reclaim the lower part of the vmemmap
> > region for vmalloc/vmap allocations when running a 52-bit VA capable
> > build on a 48-bit VA capable system.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> >  arch/arm64/include/asm/memory.h | 4 ++--
> >  arch/arm64/mm/ptdump.c          | 4 ++--
> >  2 files changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> > index b8d726f951ae..99caeff78e1a 100644
> > --- a/arch/arm64/include/asm/memory.h
> > +++ b/arch/arm64/include/asm/memory.h
> > @@ -49,8 +49,8 @@
> >  #define MODULES_VSIZE                (SZ_2G)
> >  #define VMEMMAP_START                (-(UL(1) << (VA_BITS - VMEMMAP_SHIFT)))
> >  #define VMEMMAP_END          (VMEMMAP_START + VMEMMAP_SIZE)
> > -#define PCI_IO_END           (VMEMMAP_START - SZ_8M)
> > -#define PCI_IO_START         (PCI_IO_END - PCI_IO_SIZE)
> > +#define PCI_IO_START         (VMEMMAP_END + SZ_8M)
> > +#define PCI_IO_END           (PCI_IO_START + PCI_IO_SIZE)
> >  #define FIXADDR_TOP          (VMEMMAP_START - SZ_32M)
> >
> >  #if VA_BITS > 48
> > diff --git a/arch/arm64/mm/ptdump.c b/arch/arm64/mm/ptdump.c
> > index e305b6593c4e..d1df56d44f8a 100644
> > --- a/arch/arm64/mm/ptdump.c
> > +++ b/arch/arm64/mm/ptdump.c
> > @@ -47,10 +47,10 @@ static struct addr_marker address_markers[] = {
> >       { VMALLOC_END,                  "vmalloc() end" },
> >       { FIXADDR_TOT_START,            "Fixmap start" },
> >       { FIXADDR_TOP,                  "Fixmap end" },
> > -     { PCI_IO_START,                 "PCI I/O start" },
> > -     { PCI_IO_END,                   "PCI I/O end" },
> >       { VMEMMAP_START,                "vmemmap start" },
> >       { VMEMMAP_START + VMEMMAP_SIZE, "vmemmap end" },
> > +     { PCI_IO_START,                 "PCI I/O start" },
> > +     { PCI_IO_END,                   "PCI I/O end" },
> >       { -1,                           NULL },
> >  };
> >

Hi Anshuman,

>
> I have not debugged this any further, but the PCI region displays no entries
> via ptdump just after this patch being applied.
>

It also does not display any mappings without the patch. What exactly
were you expecting to see here?

> With the patch:
>
> ---[ Fixmap end ]---
> 0xfffffbfffe000000-0xfffffc0000000000          32M PMD
> ---[ vmemmap start ]---
> 0xfffffc0000000000-0xfffffc0002000000          32M PMD       RW NX SHD AF NG     BLK UXN    MEM/NORMAL
> 0xfffffc0002000000-0xfffffc0020000000         480M PMD
> 0xfffffc0020000000-0xfffffc0022000000          32M PMD       RW NX SHD AF NG     BLK UXN    MEM/NORMAL
> 0xfffffc0022000000-0xfffffc0040000000         480M PMD
> 0xfffffc0040000000-0xfffffc8000000000         511G PUD
> 0xfffffc8000000000-0xfffffe0000000000        1536G PGD
> ---[ vmemmap end ]---
> 0xfffffe0000000000-0xfffffe8000000000         512G PGD
> ---[ PCI I/O start ]---
> ---[ PCI I/O end ]---                                   -----> No entries
> 0xfffffe8000000000-0x0000000000000000        1536G PGD
>
> Without the patch:
>
> --[ Fixmap start ]---
> 0xfffffbfffdc31000-0xfffffbfffddf6000        1812K PTE
> 0xfffffbfffddf6000-0xfffffbfffddf9000          12K PTE       ro x  SHD AF            UXN    MEM/NORMAL
> 0xfffffbfffddf9000-0xfffffbfffddfa000           4K PTE       ro NX SHD AF NG         UXN    MEM/NORMAL
> 0xfffffbfffddfa000-0xfffffbfffddfd000          12K PTE
> 0xfffffbfffddfd000-0xfffffbfffddfe000           4K PTE       RW NX SHD AF NG         UXN    DEVICE/nGnRE
> 0xfffffbfffddfe000-0xfffffbfffde00000           8K PTE       ro NX SHD AF NG         UXN    MEM/NORMAL
> 0xfffffbfffde00000-0xfffffbfffe000000           2M PTE
> ---[ Fixmap end ]---
> 0xfffffbfffe000000-0xfffffbfffe800000           8M PMD
> ---[ PCI I/O start ]---
> 0xfffffbfffe800000-0xfffffbffff800000          16M PMD
> ---[ PCI I/O end ]---
> 0xfffffbffff800000-0xfffffc0000000000           8M PMD
> ---[ vmemmap start ]---
> 0xfffffc0000000000-0xfffffc0002000000          32M PMD       RW NX SHD AF NG     BLK UXN    MEM/NORMAL
> 0xfffffc0002000000-0xfffffc0020000000         480M PMD
> 0xfffffc0020000000-0xfffffc0022000000          32M PMD       RW NX SHD AF NG     BLK UXN    MEM/NORMAL
> 0xfffffc0022000000-0xfffffc0040000000         480M PMD
> 0xfffffc0040000000-0xfffffc8000000000         511G PUD
> 0xfffffc8000000000-0xfffffe0000000000        1536G PGD
> ---[ vmemmap end ]---

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v6 04/41] arm64: mm: Move PCI I/O emulation region above the vmemmap region
  2023-11-30  8:02     ` Ard Biesheuvel
@ 2023-11-30  8:52       ` Anshuman Khandual
  2023-11-30  8:56         ` Ard Biesheuvel
  0 siblings, 1 reply; 63+ messages in thread
From: Anshuman Khandual @ 2023-11-30  8:52 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Ard Biesheuvel, linux-arm-kernel, Catalin Marinas, Will Deacon,
	Marc Zyngier, Mark Rutland, Ryan Roberts, Kees Cook



On 11/30/23 13:32, Ard Biesheuvel wrote:
> On Thu, 30 Nov 2023 at 08:59, Anshuman Khandual
> <anshuman.khandual@arm.com> wrote:
>>
>>
>>
>> On 11/29/23 16:46, Ard Biesheuvel wrote:
>>> From: Ard Biesheuvel <ardb@kernel.org>
>>>
>>> Move the PCI I/O region above the vmemmap region in the kernel's VA
>>> space. This will permit us to reclaim the lower part of the vmemmap
>>> region for vmalloc/vmap allocations when running a 52-bit VA capable
>>> build on a 48-bit VA capable system.
>>>
>>> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
>>> ---
>>>  arch/arm64/include/asm/memory.h | 4 ++--
>>>  arch/arm64/mm/ptdump.c          | 4 ++--
>>>  2 files changed, 4 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
>>> index b8d726f951ae..99caeff78e1a 100644
>>> --- a/arch/arm64/include/asm/memory.h
>>> +++ b/arch/arm64/include/asm/memory.h
>>> @@ -49,8 +49,8 @@
>>>  #define MODULES_VSIZE                (SZ_2G)
>>>  #define VMEMMAP_START                (-(UL(1) << (VA_BITS - VMEMMAP_SHIFT)))
>>>  #define VMEMMAP_END          (VMEMMAP_START + VMEMMAP_SIZE)
>>> -#define PCI_IO_END           (VMEMMAP_START - SZ_8M)
>>> -#define PCI_IO_START         (PCI_IO_END - PCI_IO_SIZE)
>>> +#define PCI_IO_START         (VMEMMAP_END + SZ_8M)
>>> +#define PCI_IO_END           (PCI_IO_START + PCI_IO_SIZE)
>>>  #define FIXADDR_TOP          (VMEMMAP_START - SZ_32M)
>>>
>>>  #if VA_BITS > 48
>>> diff --git a/arch/arm64/mm/ptdump.c b/arch/arm64/mm/ptdump.c
>>> index e305b6593c4e..d1df56d44f8a 100644
>>> --- a/arch/arm64/mm/ptdump.c
>>> +++ b/arch/arm64/mm/ptdump.c
>>> @@ -47,10 +47,10 @@ static struct addr_marker address_markers[] = {
>>>       { VMALLOC_END,                  "vmalloc() end" },
>>>       { FIXADDR_TOT_START,            "Fixmap start" },
>>>       { FIXADDR_TOP,                  "Fixmap end" },
>>> -     { PCI_IO_START,                 "PCI I/O start" },
>>> -     { PCI_IO_END,                   "PCI I/O end" },
>>>       { VMEMMAP_START,                "vmemmap start" },
>>>       { VMEMMAP_START + VMEMMAP_SIZE, "vmemmap end" },
>>> +     { PCI_IO_START,                 "PCI I/O start" },
>>> +     { PCI_IO_END,                   "PCI I/O end" },
>>>       { -1,                           NULL },
>>>  };
>>>
> 
> Hi Anshuman,
> 
>>
>> I have not debugged this any further, but the PCI region displays no entries
>> via ptdump just after this patch being applied.
>>
> 
> It also does not display any mappings without the patch. What exactly
> were you expecting to see here?

Is there not a single entry inside, as in the 'without this patch' section? Or
does it just signify the width of the gap reserved for the PCI mapping area?

---[ PCI I/O start ]---
0xfffffbfffe800000-0xfffffbffff800000          16M PMD
---[ PCI I/O end ]---

> 
>> With the patch:
>>
>> ---[ Fixmap end ]---
>> 0xfffffbfffe000000-0xfffffc0000000000          32M PMD
>> ---[ vmemmap start ]---
>> 0xfffffc0000000000-0xfffffc0002000000          32M PMD       RW NX SHD AF NG     BLK UXN    MEM/NORMAL
>> 0xfffffc0002000000-0xfffffc0020000000         480M PMD
>> 0xfffffc0020000000-0xfffffc0022000000          32M PMD       RW NX SHD AF NG     BLK UXN    MEM/NORMAL
>> 0xfffffc0022000000-0xfffffc0040000000         480M PMD
>> 0xfffffc0040000000-0xfffffc8000000000         511G PUD
>> 0xfffffc8000000000-0xfffffe0000000000        1536G PGD
>> ---[ vmemmap end ]---
>> 0xfffffe0000000000-0xfffffe8000000000         512G PGD
>> ---[ PCI I/O start ]---
>> ---[ PCI I/O end ]---                                   -----> No entries
>> 0xfffffe8000000000-0x0000000000000000        1536G PGD
>>
>> Without the patch:
>>
>> --[ Fixmap start ]---
>> 0xfffffbfffdc31000-0xfffffbfffddf6000        1812K PTE
>> 0xfffffbfffddf6000-0xfffffbfffddf9000          12K PTE       ro x  SHD AF            UXN    MEM/NORMAL
>> 0xfffffbfffddf9000-0xfffffbfffddfa000           4K PTE       ro NX SHD AF NG         UXN    MEM/NORMAL
>> 0xfffffbfffddfa000-0xfffffbfffddfd000          12K PTE
>> 0xfffffbfffddfd000-0xfffffbfffddfe000           4K PTE       RW NX SHD AF NG         UXN    DEVICE/nGnRE
>> 0xfffffbfffddfe000-0xfffffbfffde00000           8K PTE       ro NX SHD AF NG         UXN    MEM/NORMAL
>> 0xfffffbfffde00000-0xfffffbfffe000000           2M PTE
>> ---[ Fixmap end ]---
>> 0xfffffbfffe000000-0xfffffbfffe800000           8M PMD
>> ---[ PCI I/O start ]---
>> 0xfffffbfffe800000-0xfffffbffff800000          16M PMD
>> ---[ PCI I/O end ]---
>> 0xfffffbffff800000-0xfffffc0000000000           8M PMD
>> ---[ vmemmap start ]---
>> 0xfffffc0000000000-0xfffffc0002000000          32M PMD       RW NX SHD AF NG     BLK UXN    MEM/NORMAL
>> 0xfffffc0002000000-0xfffffc0020000000         480M PMD
>> 0xfffffc0020000000-0xfffffc0022000000          32M PMD       RW NX SHD AF NG     BLK UXN    MEM/NORMAL
>> 0xfffffc0022000000-0xfffffc0040000000         480M PMD
>> 0xfffffc0040000000-0xfffffc8000000000         511G PUD
>> 0xfffffc8000000000-0xfffffe0000000000        1536G PGD
>> ---[ vmemmap end ]---

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v6 04/41] arm64: mm: Move PCI I/O emulation region above the vmemmap region
  2023-11-30  8:52       ` Anshuman Khandual
@ 2023-11-30  8:56         ` Ard Biesheuvel
  0 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-11-30  8:56 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: Ard Biesheuvel, linux-arm-kernel, Catalin Marinas, Will Deacon,
	Marc Zyngier, Mark Rutland, Ryan Roberts, Kees Cook

On Thu, 30 Nov 2023 at 09:52, Anshuman Khandual
<anshuman.khandual@arm.com> wrote:
>
>
>
> On 11/30/23 13:32, Ard Biesheuvel wrote:
> > On Thu, 30 Nov 2023 at 08:59, Anshuman Khandual
> > <anshuman.khandual@arm.com> wrote:
> >>
> >>
> >>
> >> On 11/29/23 16:46, Ard Biesheuvel wrote:
> >>> From: Ard Biesheuvel <ardb@kernel.org>
> >>>
> >>> Move the PCI I/O region above the vmemmap region in the kernel's VA
> >>> space. This will permit us to reclaim the lower part of the vmemmap
> >>> region for vmalloc/vmap allocations when running a 52-bit VA capable
> >>> build on a 48-bit VA capable system.
> >>>
> >>> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> >>> ---
> >>>  arch/arm64/include/asm/memory.h | 4 ++--
> >>>  arch/arm64/mm/ptdump.c          | 4 ++--
> >>>  2 files changed, 4 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> >>> index b8d726f951ae..99caeff78e1a 100644
> >>> --- a/arch/arm64/include/asm/memory.h
> >>> +++ b/arch/arm64/include/asm/memory.h
> >>> @@ -49,8 +49,8 @@
> >>>  #define MODULES_VSIZE                (SZ_2G)
> >>>  #define VMEMMAP_START                (-(UL(1) << (VA_BITS - VMEMMAP_SHIFT)))
> >>>  #define VMEMMAP_END          (VMEMMAP_START + VMEMMAP_SIZE)
> >>> -#define PCI_IO_END           (VMEMMAP_START - SZ_8M)
> >>> -#define PCI_IO_START         (PCI_IO_END - PCI_IO_SIZE)
> >>> +#define PCI_IO_START         (VMEMMAP_END + SZ_8M)
> >>> +#define PCI_IO_END           (PCI_IO_START + PCI_IO_SIZE)
> >>>  #define FIXADDR_TOP          (VMEMMAP_START - SZ_32M)
> >>>
> >>>  #if VA_BITS > 48
> >>> diff --git a/arch/arm64/mm/ptdump.c b/arch/arm64/mm/ptdump.c
> >>> index e305b6593c4e..d1df56d44f8a 100644
> >>> --- a/arch/arm64/mm/ptdump.c
> >>> +++ b/arch/arm64/mm/ptdump.c
> >>> @@ -47,10 +47,10 @@ static struct addr_marker address_markers[] = {
> >>>       { VMALLOC_END,                  "vmalloc() end" },
> >>>       { FIXADDR_TOT_START,            "Fixmap start" },
> >>>       { FIXADDR_TOP,                  "Fixmap end" },
> >>> -     { PCI_IO_START,                 "PCI I/O start" },
> >>> -     { PCI_IO_END,                   "PCI I/O end" },
> >>>       { VMEMMAP_START,                "vmemmap start" },
> >>>       { VMEMMAP_START + VMEMMAP_SIZE, "vmemmap end" },
> >>> +     { PCI_IO_START,                 "PCI I/O start" },
> >>> +     { PCI_IO_END,                   "PCI I/O end" },
> >>>       { -1,                           NULL },
> >>>  };
> >>>
> >
> > Hi Anshuman,
> >
> >>
> >> I have not debugged this any further, but the PCI region displays no entries
> >> via ptdump just after this patch being applied.
> >>
> >
> > It also does not display any mappings without the patch. What exactly
> > were you expecting to see here?
>
> Is there not a single entry inside, as in the 'without this patch' section? Or
> does it just signify the width of the gap reserved for the PCI mapping area?
>
> ---[ PCI I/O start ]---
> 0xfffffbfffe800000-0xfffffbffff800000          16M PMD
> ---[ PCI I/O end ]---
>

This means that the PCI I/O space is covered by 8 invalid level 2 entries.


> >> 0xfffffe0000000000-0xfffffe8000000000         512G PGD
> >> ---[ PCI I/O start ]---
> >> ---[ PCI I/O end ]---                                   -----> No entries
> >> 0xfffffe8000000000-0x0000000000000000        1536G PGD

This means that the PCI I/O space is covered by 1 invalid level 0 entry.
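
(For reference, with the default 4K granule a level 2 (PMD) entry maps 2M
and a level 0 (PGD) entry maps 512G, so the 16M window in the old layout
spans 16M / 2M = 8 PMD entries, while in the new layout it falls entirely
inside a single PGD entry and ptdump has no entry boundary to print between
the two markers.)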

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v6 02/41] arm64: mm: Take potential load offset into account when KASLR is off
  2023-11-29 11:15 ` [PATCH v6 02/41] arm64: mm: Take potential load offset into account when KASLR is off Ard Biesheuvel
  2023-11-30  5:23   ` Anshuman Khandual
@ 2023-12-04 14:12   ` Mark Rutland
  2023-12-04 15:40     ` Ard Biesheuvel
  1 sibling, 1 reply; 63+ messages in thread
From: Mark Rutland @ 2023-12-04 14:12 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, Ard Biesheuvel, Catalin Marinas, Will Deacon,
	Marc Zyngier, Ryan Roberts, Anshuman Khandual, Kees Cook

On Wed, Nov 29, 2023 at 12:15:58PM +0100, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
> 
> We enable CONFIG_RELOCATABLE even when CONFIG_RANDOMIZE_BASE is
> disabled, and this permits the loader (i.e., EFI) to place the kernel
> anywhere in physical memory as long as the base address is 64k aligned.

I don't think that case is something we actually intend to permit today:

(a) When CONFIG_RANDOMIZE_BASE=n, the EFI stub will load the kernel at SZ_2M
    alignment. We initialize efi_nokaslr to !IS_ENABLED(CONFIG_RANDOMIZE_BASE),
    and so arm64's efi_get_kimg_min_align() will return SZ_2M.

    ... unless I'm missing something there?

(b) We don't expose anything in the Image header such that an external
    bootloader (i.e. not the EFI stub) can decide that 64K alignment is
    sufficient. It would be unsound for a bootloader to load the kernel at less
    than 2M alignment.

(c) We never documented 64K alignment as being permitted. In booting.txt we say
    "The Image must be placed text_offset bytes from a 2MB aligned base address
    anywhere in usable system RAM and called there.", with no mention of a
    relaxation down to 64K.

... so I don't think this patch is necessary, unless it's going to make
something else simpler later in the series?

Mark.

> This means that the 'KASLR' case described in the header that defines
> the size of the statically allocated page tables could take effect even
> when CONFIG_RANDMIZE_BASE=n. So check for CONFIG_RELOCATABLE instead.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/arm64/include/asm/kernel-pgtable.h | 27 +++++---------------
>  1 file changed, 6 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
> index 85d26143faa5..83ddb14b95a5 100644
> --- a/arch/arm64/include/asm/kernel-pgtable.h
> +++ b/arch/arm64/include/asm/kernel-pgtable.h
> @@ -37,27 +37,12 @@
>  
>  
>  /*
> - * If KASLR is enabled, then an offset K is added to the kernel address
> - * space. The bottom 21 bits of this offset are zero to guarantee 2MB
> - * alignment for PA and VA.
> - *
> - * For each pagetable level of the swapper, we know that the shift will
> - * be larger than 21 (for the 4KB granule case we use section maps thus
> - * the smallest shift is actually 30) thus there is the possibility that
> - * KASLR can increase the number of pagetable entries by 1, so we make
> - * room for this extra entry.
> - *
> - * Note KASLR cannot increase the number of required entries for a level
> - * by more than one because it increments both the virtual start and end
> - * addresses equally (the extra entry comes from the case where the end
> - * address is just pushed over a boundary and the start address isn't).
> + * A relocatable kernel may execute from an address that differs from the one at
> + * which it was linked. In the worst case, its runtime placement may intersect
> + * with two adjacent PGDIR entries, which means that an additional page table
> + * may be needed at each subordinate level.
>   */
> -
> -#ifdef CONFIG_RANDOMIZE_BASE
> -#define EARLY_KASLR	(1)
> -#else
> -#define EARLY_KASLR	(0)
> -#endif
> +#define EXTRA_PAGE	__is_defined(CONFIG_RELOCATABLE)
>  
>  #define SPAN_NR_ENTRIES(vstart, vend, shift) \
>  	((((vend) - 1) >> (shift)) - ((vstart) >> (shift)) + 1)
> @@ -83,7 +68,7 @@
>  			+ EARLY_PGDS((vstart), (vend), add) 	/* each PGDIR needs a next level page table */	\
>  			+ EARLY_PUDS((vstart), (vend), add)	/* each PUD needs a next level page table */	\
>  			+ EARLY_PMDS((vstart), (vend), add))	/* each PMD needs a next level page table */
> -#define INIT_DIR_SIZE (PAGE_SIZE * EARLY_PAGES(KIMAGE_VADDR, _end, EARLY_KASLR))
> +#define INIT_DIR_SIZE (PAGE_SIZE * EARLY_PAGES(KIMAGE_VADDR, _end, EXTRA_PAGE))
>  
>  /* the initial ID map may need two extra pages if it needs to be extended */
>  #if VA_BITS < 48
> -- 
> 2.43.0.rc1.413.gea7ed67945-goog
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v6 03/41] arm64: mm: get rid of kimage_vaddr global variable
  2023-11-30  5:38   ` Anshuman Khandual
@ 2023-12-04 14:37     ` Mark Rutland
  2023-12-05  2:26       ` Anshuman Khandual
  0 siblings, 1 reply; 63+ messages in thread
From: Mark Rutland @ 2023-12-04 14:37 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: Ard Biesheuvel, linux-arm-kernel, Ard Biesheuvel, Catalin Marinas,
	Will Deacon, Marc Zyngier, Ryan Roberts, Kees Cook

On Thu, Nov 30, 2023 at 11:08:59AM +0530, Anshuman Khandual wrote:
> 
> 
> On 11/29/23 16:45, Ard Biesheuvel wrote:
> > From: Ard Biesheuvel <ardb@kernel.org>
> > 
> > We store the address of _text in kimage_vaddr, but since commit
> > 09e3c22a86f6889d ("arm64: Use a variable to store non-global mappings
> > decision"), we no longer reference this variable from modules so we no
> > longer need to export it.
> > 
> > In fact, we don't need it at all so let's just get rid of it.
> > 
> > Acked-by: Mark Rutland <mark.rutland@arm.com>
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> 
> There is a checkpatch.pl error for this patch.
> 
> --------
> ERROR: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' -
> ie: 'commit 09e3c22a86f6 ("arm64: Use a variable to store non-global mappings decision")'
> #6: 
> We store the address of _text in kimage_vaddr, but since commit
> --------

That looks like a spurious warning. Ard wrote:

  [...] commit 09e3c22a86f6889d ("arm64: Use a variable to store non-global
  mappings decision"), [...]

... which is 'commit <16 chars of sha1> ("<title line>")', just as the
checkpatch message says.

Clearly, this checkpatch warning isn't helpful.

Regardless of whether the warning is spurious, please use judgement for
checkpatch warnings. They're not always a hard rule to be followed.

Mark.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v6 02/41] arm64: mm: Take potential load offset into account when KASLR is off
  2023-12-04 14:12   ` Mark Rutland
@ 2023-12-04 15:40     ` Ard Biesheuvel
  0 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-12-04 15:40 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Ard Biesheuvel, linux-arm-kernel, Catalin Marinas, Will Deacon,
	Marc Zyngier, Ryan Roberts, Anshuman Khandual, Kees Cook

On Mon, 4 Dec 2023 at 15:12, Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Wed, Nov 29, 2023 at 12:15:58PM +0100, Ard Biesheuvel wrote:
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > We enable CONFIG_RELOCATABLE even when CONFIG_RANDOMIZE_BASE is
> > disabled, and this permits the loader (i.e., EFI) to place the kernel
> > anywhere in physical memory as long as the base address is 64k aligned.
>
> I don't think that case is something we actually intend to permit today:
>
> (a) When CONFIG_RANDOMIZE_BASE=n, the EFI stub will load the kernel at SZ_2M
>     alignment. We initialize efi_nokaslr to !IS_ENABLED(CONFIG_RANDOMIZE_BASE),
>     and so arm64's efi_get_kimg_min_align() will return SZ_2M.
>
>     ... unless I'm missing something there?
>
> (b) We don't expose anything in the Image header such that an external
>     bootloader (i.e. not the EFI stub) can decide that 64K alignment is
>     sufficient. It would be unsound for a bootloader to load the kernel at less
>     than 2M alignment.
>
> (c) We never documented 64K alignment as being permitted. In booting.txt we say
>     "The Image must be placed text_offset bytes from a 2MB aligned base address
>     anywhere in usable system RAM and called there.", with no mention of a
>     relaxation down to 64K.
>
> ... so I don't think this patch is necessary, unless it's going to make
> something else simpler later in the series?
>

Your analysis is correct afaict, and I got things confused as to
whether/how EFI will relocate the image.

However, we do have a quirk that permits loading with a hard coded
TEXT_OFFSET of 512k even if the TEXT_OFFSET in the header is 0x0. This
is the reason why we enabled CONFIG_RELOCATABLE without
CONFIG_RANDOMIZE_BASE by default.

I think that ultimately, the fact that CONFIG_RELOCATABLE can be
enabled without CONFIG_RANDOMIZE_BASE is sufficient grounds to add
this extra space.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v6 03/41] arm64: mm: get rid of kimage_vaddr global variable
  2023-12-04 14:37     ` Mark Rutland
@ 2023-12-05  2:26       ` Anshuman Khandual
  0 siblings, 0 replies; 63+ messages in thread
From: Anshuman Khandual @ 2023-12-05  2:26 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Ard Biesheuvel, linux-arm-kernel, Ard Biesheuvel, Catalin Marinas,
	Will Deacon, Marc Zyngier, Ryan Roberts, Kees Cook



On 12/4/23 20:07, Mark Rutland wrote:
> On Thu, Nov 30, 2023 at 11:08:59AM +0530, Anshuman Khandual wrote:
>>
>>
>> On 11/29/23 16:45, Ard Biesheuvel wrote:
>>> From: Ard Biesheuvel <ardb@kernel.org>
>>>
>>> We store the address of _text in kimage_vaddr, but since commit
>>> 09e3c22a86f6889d ("arm64: Use a variable to store non-global mappings
>>> decision"), we no longer reference this variable from modules so we no
>>> longer need to export it.
>>>
>>> In fact, we don't need it at all so let's just get rid of it.
>>>
>>> Acked-by: Mark Rutland <mark.rutland@arm.com>
>>> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
>>
>> There is a checkpatch.pl error for this patch.
>>
>> --------
>> ERROR: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' -
>> ie: 'commit 09e3c22a86f6 ("arm64: Use a variable to store non-global mappings decision")'
>> #6: 
>> We store the address of _text in kimage_vaddr, but since commit
>> --------
> 
> That looks like a spurious warning. Ard wrote:
> 
>   [...] commit 09e3c22a86f6889d ("arm64: Use a variable to store non-global
>   mappings decision"), [...]
> 
> ... which is 'commit <16 chars of sha1> ("<title line>")', just as the
> checkpatch message says.

Although it does not make any difference TBH, sometimes moving 'commit'
onto the same line as the SHA string just makes the warning go away. That
line break might be the trigger.

> 
> Clearly, this checkpatch warning isn't helpful.
> 
> Regardless of whether the warning is spurious, please use judgement for
> checkpatch warnings. They're not always a hard rule to be followed.
> 
> Mark.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v6 04/41] arm64: mm: Move PCI I/O emulation region above the vmemmap region
  2023-11-29 11:16 ` [PATCH v6 04/41] arm64: mm: Move PCI I/O emulation region above the vmemmap region Ard Biesheuvel
  2023-11-30  7:59   ` Anshuman Khandual
@ 2023-12-11 13:57   ` Mark Rutland
  2023-12-11 14:10     ` Ard Biesheuvel
  1 sibling, 1 reply; 63+ messages in thread
From: Mark Rutland @ 2023-12-11 13:57 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, Ard Biesheuvel, Catalin Marinas, Will Deacon,
	Marc Zyngier, Ryan Roberts, Anshuman Khandual, Kees Cook

Hi Ard,

On Wed, Nov 29, 2023 at 12:16:00PM +0100, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
> 
> Move the PCI I/O region above the vmemmap region in the kernel's VA
> space. This will permit us to reclaim the lower part of the vmemmap
> region for vmalloc/vmap allocations when running a 52-bit VA capable
> build on a 48-bit VA capable system.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/arm64/include/asm/memory.h | 4 ++--
>  arch/arm64/mm/ptdump.c          | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index b8d726f951ae..99caeff78e1a 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -49,8 +49,8 @@
>  #define MODULES_VSIZE		(SZ_2G)
>  #define VMEMMAP_START		(-(UL(1) << (VA_BITS - VMEMMAP_SHIFT)))
>  #define VMEMMAP_END		(VMEMMAP_START + VMEMMAP_SIZE)
> -#define PCI_IO_END		(VMEMMAP_START - SZ_8M)
> -#define PCI_IO_START		(PCI_IO_END - PCI_IO_SIZE)
> +#define PCI_IO_START		(VMEMMAP_END + SZ_8M)
> +#define PCI_IO_END		(PCI_IO_START + PCI_IO_SIZE)
>  #define FIXADDR_TOP		(VMEMMAP_START - SZ_32M)

This looks simple, but it does have a subtle dependency on VMEMMAP_SIZE being
sufficiently smaller than the negative offset we choose for VMEMMAP_START, and
it took me a while to convince myself that we always ensure that today. I could
well imagine that we try to reduce that in future (given that for most
configurations we're reserving around double the space we actually need), and
it would be nice if we could avoid that going wrong.

We need to make sure that PCI_IO_START and PCI_IO_END fall between VMEMMAP_END
and the last page of the VA space (due to ERR_PTR). Could we add some comment
and/or assertions to that effect?

Mark.

>  
>  #if VA_BITS > 48
> diff --git a/arch/arm64/mm/ptdump.c b/arch/arm64/mm/ptdump.c
> index e305b6593c4e..d1df56d44f8a 100644
> --- a/arch/arm64/mm/ptdump.c
> +++ b/arch/arm64/mm/ptdump.c
> @@ -47,10 +47,10 @@ static struct addr_marker address_markers[] = {
>  	{ VMALLOC_END,			"vmalloc() end" },
>  	{ FIXADDR_TOT_START,		"Fixmap start" },
>  	{ FIXADDR_TOP,			"Fixmap end" },
> -	{ PCI_IO_START,			"PCI I/O start" },
> -	{ PCI_IO_END,			"PCI I/O end" },
>  	{ VMEMMAP_START,		"vmemmap start" },
>  	{ VMEMMAP_START + VMEMMAP_SIZE,	"vmemmap end" },
> +	{ PCI_IO_START,			"PCI I/O start" },
> +	{ PCI_IO_END,			"PCI I/O end" },
>  	{ -1,				NULL },
>  };
>  
> -- 
> 2.43.0.rc1.413.gea7ed67945-goog
> 


* Re: [PATCH v6 04/41] arm64: mm: Move PCI I/O emulation region above the vmemmap region
  2023-12-11 13:57   ` Mark Rutland
@ 2023-12-11 14:10     ` Ard Biesheuvel
  2023-12-11 14:21       ` Mark Rutland
  0 siblings, 1 reply; 63+ messages in thread
From: Ard Biesheuvel @ 2023-12-11 14:10 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Ard Biesheuvel, linux-arm-kernel, Catalin Marinas, Will Deacon,
	Marc Zyngier, Ryan Roberts, Anshuman Khandual, Kees Cook

On Mon, 11 Dec 2023 at 14:57, Mark Rutland <mark.rutland@arm.com> wrote:
>
> Hi Ard,
>
> On Wed, Nov 29, 2023 at 12:16:00PM +0100, Ard Biesheuvel wrote:
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > Move the PCI I/O region above the vmemmap region in the kernel's VA
> > space. This will permit us to reclaim the lower part of the vmemmap
> > region for vmalloc/vmap allocations when running a 52-bit VA capable
> > build on a 48-bit VA capable system.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> >  arch/arm64/include/asm/memory.h | 4 ++--
> >  arch/arm64/mm/ptdump.c          | 4 ++--
> >  2 files changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> > index b8d726f951ae..99caeff78e1a 100644
> > --- a/arch/arm64/include/asm/memory.h
> > +++ b/arch/arm64/include/asm/memory.h
> > @@ -49,8 +49,8 @@
> >  #define MODULES_VSIZE                (SZ_2G)
> >  #define VMEMMAP_START                (-(UL(1) << (VA_BITS - VMEMMAP_SHIFT)))
> >  #define VMEMMAP_END          (VMEMMAP_START + VMEMMAP_SIZE)
> > -#define PCI_IO_END           (VMEMMAP_START - SZ_8M)
> > -#define PCI_IO_START         (PCI_IO_END - PCI_IO_SIZE)
> > +#define PCI_IO_START         (VMEMMAP_END + SZ_8M)
> > +#define PCI_IO_END           (PCI_IO_START + PCI_IO_SIZE)
> >  #define FIXADDR_TOP          (VMEMMAP_START - SZ_32M)
>
> This looks simple, but it does have a subtle dependency on VMEMMAP_SIZE being
> sufficiently smaller than the negative offset we choose for VMEMMAP_START, and
> it took me a while to convince myself that we always ensure that today. I could
> well imagine that we try to reduce that in future (given for most
> configurations we're reserving around double the space we actually need), and
> it would be nice if we could avoid that going wrong.
>

Yes, we are relying on the fact that the upper region is not covered
by struct pages, so there is guaranteed to be an unused region at the
very top.

One of the constants in the equation that describes the size of this
unused region is the size of a struct page, and so I agree this needs
a couple of static asserts to ensure we don't get inadvertent
overlaps.
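
Roughly along these lines, I imagine (just a sketch of the idea - the
final asserts may well use different bounds or live elsewhere):

  /* the regions stacked above vmemmap must stay clear of the ERR_PTR
   * page at the very top of the VA space
   */
  static_assert(PCI_IO_END <= (unsigned long)-PAGE_SIZE);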


> We need to make sure that PCI_IO_START and PCI_IO_END fall between VMEMMAP_END
> and the last page of the VA space (due to ERR_PTR). Could we add some comment
> and/or assertions to that effect?
>
> Mark.
>
> >
> >  #if VA_BITS > 48
> > diff --git a/arch/arm64/mm/ptdump.c b/arch/arm64/mm/ptdump.c
> > index e305b6593c4e..d1df56d44f8a 100644
> > --- a/arch/arm64/mm/ptdump.c
> > +++ b/arch/arm64/mm/ptdump.c
> > @@ -47,10 +47,10 @@ static struct addr_marker address_markers[] = {
> >       { VMALLOC_END,                  "vmalloc() end" },
> >       { FIXADDR_TOT_START,            "Fixmap start" },
> >       { FIXADDR_TOP,                  "Fixmap end" },
> > -     { PCI_IO_START,                 "PCI I/O start" },
> > -     { PCI_IO_END,                   "PCI I/O end" },
> >       { VMEMMAP_START,                "vmemmap start" },
> >       { VMEMMAP_START + VMEMMAP_SIZE, "vmemmap end" },
> > +     { PCI_IO_START,                 "PCI I/O start" },
> > +     { PCI_IO_END,                   "PCI I/O end" },
> >       { -1,                           NULL },
> >  };
> >
> > --
> > 2.43.0.rc1.413.gea7ed67945-goog
> >


* Re: [PATCH v6 06/41] arm64: ptdump: Allow all region boundaries to be defined at boot time
  2023-11-29 11:16 ` [PATCH v6 06/41] arm64: ptdump: Allow all region boundaries to be defined at boot time Ard Biesheuvel
@ 2023-12-11 14:15   ` Mark Rutland
  0 siblings, 0 replies; 63+ messages in thread
From: Mark Rutland @ 2023-12-11 14:15 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, Ard Biesheuvel, Catalin Marinas, Will Deacon,
	Marc Zyngier, Ryan Roberts, Anshuman Khandual, Kees Cook

On Wed, Nov 29, 2023 at 12:16:02PM +0100, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
> 
> Rework the way the address_markers array is populated so that we can
> tolerate values that are not compile time constants generally, rather
> than manually keeping track of the array indexes in question and poking
> new values into them. This will be needed for VMALLOC_END,
> which will cease to be a compile time constant after a subsequent patch.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/arm64/mm/ptdump.c | 54 ++++++++------------
>  1 file changed, 22 insertions(+), 32 deletions(-)
> 
> diff --git a/arch/arm64/mm/ptdump.c b/arch/arm64/mm/ptdump.c
> index 3958b008f908..bfc307890344 100644
> --- a/arch/arm64/mm/ptdump.c
> +++ b/arch/arm64/mm/ptdump.c
> @@ -26,34 +26,6 @@
>  #include <asm/ptdump.h>
>  
>  
> -enum address_markers_idx {
> -	PAGE_OFFSET_NR = 0,
> -	PAGE_END_NR,
> -#if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
> -	KASAN_START_NR,
> -#endif
> -};
> -
> -static struct addr_marker address_markers[] = {
> -	{ PAGE_OFFSET,			"Linear Mapping start" },
> -	{ 0 /* PAGE_END */,		"Linear Mapping end" },
> -#if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
> -	{ 0 /* KASAN_SHADOW_START */,	"Kasan shadow start" },
> -	{ KASAN_SHADOW_END,		"Kasan shadow end" },
> -#endif
> -	{ MODULES_VADDR,		"Modules start" },
> -	{ MODULES_END,			"Modules end" },
> -	{ VMALLOC_START,		"vmalloc() area" },
> -	{ VMALLOC_END,			"vmalloc() end" },
> -	{ VMEMMAP_START,		"vmemmap start" },
> -	{ VMEMMAP_START + VMEMMAP_SIZE,	"vmemmap end" },
> -	{ PCI_IO_START,			"PCI I/O start" },
> -	{ PCI_IO_END,			"PCI I/O end" },
> -	{ FIXADDR_TOT_START,		"Fixmap start" },
> -	{ FIXADDR_TOP,			"Fixmap end" },
> -	{ -1,				NULL },
> -};
> -
>  #define pt_dump_seq_printf(m, fmt, args...)	\
>  ({						\
>  	if (m)					\
> @@ -339,9 +311,8 @@ static void __init ptdump_initialize(void)
>  				pg_level[i].mask |= pg_level[i].bits[j].mask;
>  }
>  
> -static struct ptdump_info kernel_ptdump_info = {
> +static struct ptdump_info kernel_ptdump_info __ro_after_init = {
>  	.mm		= &init_mm,
> -	.markers	= address_markers,
>  	.base_addr	= PAGE_OFFSET,
>  };
>  
> @@ -375,10 +346,29 @@ void ptdump_check_wx(void)
>  
>  static int __init ptdump_init(void)
>  {
> -	address_markers[PAGE_END_NR].start_address = PAGE_END;
> +	struct addr_marker m[] = {
> +		{ PAGE_OFFSET,			"Linear Mapping start" },
> +		{ PAGE_END,			"Linear Mapping end" },
>  #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
> -	address_markers[KASAN_START_NR].start_address = KASAN_SHADOW_START;
> +		{ KASAN_SHADOW_START,		"Kasan shadow start" },
> +		{ KASAN_SHADOW_END,		"Kasan shadow end" },
>  #endif
> +		{ MODULES_VADDR,		"Modules start" },
> +		{ MODULES_END,			"Modules end" },
> +		{ VMALLOC_START,		"vmalloc() area" },
> +		{ VMALLOC_END,			"vmalloc() end" },
> +		{ VMEMMAP_START,		"vmemmap start" },
> +		{ VMEMMAP_START + VMEMMAP_SIZE,	"vmemmap end" },
> +		{ PCI_IO_START,			"PCI I/O start" },
> +		{ PCI_IO_END,			"PCI I/O end" },
> +		{ FIXADDR_TOT_START,		"Fixmap start" },
> +		{ FIXADDR_TOP,			"Fixmap end" },
> +		{ -1,				NULL },
> +	};
> +	static struct addr_marker address_markers[ARRAY_SIZE(m)] __ro_after_init;
> +
> +	kernel_ptdump_info.markers = memcpy(address_markers, m, sizeof(m));

That is a neat trick.
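
For anyone reading along, a standalone illustration of the idiom (my own
toy example, not kernel code): the local initializer is free to use values
that are only known at runtime, while the static copy persists after the
function returns:

  #include <string.h>

  struct marker {
          unsigned long addr;
          const char *name;
  };

  static struct marker *init_markers(unsigned long runtime_end)
  {
          struct marker m[] = {
                  { 0x1000UL,    "start" },
                  { runtime_end, "end" },   /* not a compile-time constant */
                  { -1UL,        NULL },
          };
          /* sizeof(m) is a constant expression, so a plain static array works */
          static struct marker copy[sizeof(m) / sizeof(m[0])];

          return memcpy(copy, m, sizeof(m));
  }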

Acked-by: Mark Rutland <mark.rutland@arm.com>

Mark.

> +
>  	ptdump_initialize();
>  	ptdump_debugfs_register(&kernel_ptdump_info, "kernel_page_tables");
>  	return 0;
> -- 
> 2.43.0.rc1.413.gea7ed67945-goog
> 


* Re: [PATCH v6 04/41] arm64: mm: Move PCI I/O emulation region above the vmemmap region
  2023-12-11 14:10     ` Ard Biesheuvel
@ 2023-12-11 14:21       ` Mark Rutland
  0 siblings, 0 replies; 63+ messages in thread
From: Mark Rutland @ 2023-12-11 14:21 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Ard Biesheuvel, linux-arm-kernel, Catalin Marinas, Will Deacon,
	Marc Zyngier, Ryan Roberts, Anshuman Khandual, Kees Cook

On Mon, Dec 11, 2023 at 03:10:00PM +0100, Ard Biesheuvel wrote:
> On Mon, 11 Dec 2023 at 14:57, Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > Hi Ard,
> >
> > On Wed, Nov 29, 2023 at 12:16:00PM +0100, Ard Biesheuvel wrote:
> > > From: Ard Biesheuvel <ardb@kernel.org>
> > >
> > > Move the PCI I/O region above the vmemmap region in the kernel's VA
> > > space. This will permit us to reclaim the lower part of the vmemmap
> > > region for vmalloc/vmap allocations when running a 52-bit VA capable
> > > build on a 48-bit VA capable system.
> > >
> > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > ---
> > >  arch/arm64/include/asm/memory.h | 4 ++--
> > >  arch/arm64/mm/ptdump.c          | 4 ++--
> > >  2 files changed, 4 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> > > index b8d726f951ae..99caeff78e1a 100644
> > > --- a/arch/arm64/include/asm/memory.h
> > > +++ b/arch/arm64/include/asm/memory.h
> > > @@ -49,8 +49,8 @@
> > >  #define MODULES_VSIZE                (SZ_2G)
> > >  #define VMEMMAP_START                (-(UL(1) << (VA_BITS - VMEMMAP_SHIFT)))
> > >  #define VMEMMAP_END          (VMEMMAP_START + VMEMMAP_SIZE)
> > > -#define PCI_IO_END           (VMEMMAP_START - SZ_8M)
> > > -#define PCI_IO_START         (PCI_IO_END - PCI_IO_SIZE)
> > > +#define PCI_IO_START         (VMEMMAP_END + SZ_8M)
> > > +#define PCI_IO_END           (PCI_IO_START + PCI_IO_SIZE)
> > >  #define FIXADDR_TOP          (VMEMMAP_START - SZ_32M)
> >
> > This looks simple, but it does have a subtle dependency on VMEMMAP_SIZE being
> > sufficiently smaller than the negative offset we choose for VMEMMAP_START, and
> > it took me a while to convince myself that we always ensure that today. I could
> > well imagine that we try to reduce that in future (given for most
> > configurations we're reserving around double the space we actually need), and
> > it would be nice if we could avoid that going wrong.
> >
> 
> Yes, we are relying on the fact that the upper region is not covered
> by struct pages, so there is guaranteed to be an unused region at the
> very top.

Yep; exactly. My fear is that in future someone realises that:

	(-(UL(1) << (VA_BITS - VMEMMAP_SHIFT)))

... is actually reserving ~2x the space it needs for *most* configurations (since for 
most configurations we only need the VMEMMAP to account for (VA_BITS-1) bits of
linear map). If they try to shrink the VMEMMAP region for those cases, it's
liable to overlap subsequent regions and/or to wrap those past the end of the
VA space.

> One of the constants in the equation that describes the size of this
> unused region is the size of a struct page, and so I agree this needs
> a couple of static asserts to ensure we don't get inadvertent
> overlaps.

That'd be great, thanks!

Mark.


* Re: [PATCH v6 05/41] arm64: mm: Move fixmap region above vmemmap region
  2023-11-29 11:16 ` [PATCH v6 05/41] arm64: mm: Move fixmap region above " Ard Biesheuvel
@ 2023-12-11 14:23   ` Mark Rutland
  0 siblings, 0 replies; 63+ messages in thread
From: Mark Rutland @ 2023-12-11 14:23 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, Ard Biesheuvel, Catalin Marinas, Will Deacon,
	Marc Zyngier, Ryan Roberts, Anshuman Khandual, Kees Cook

On Wed, Nov 29, 2023 at 12:16:01PM +0100, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
> 
> Move the fixmap region above the vmemmap region, so that the start of
> the vmemmap delineates the end of the region available for vmalloc and
> vmap allocations and the randomized placement of the kernel and modules.
> 
> In a subsequent patch, we will take advantage of this to reclaim most of
> the vmemmap area when running a 52-bit VA capable build with 52-bit
> virtual addressing disabled at runtime.
> 
> Note that the existing guard region of 256 MiB covers the fixmap and PCI
> I/O regions as well, so we can reduce it to 8 MiB, which is what we use in
> other places too.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

As with the prior patch, this looks simple enough, but it'd be good if we could
do something to assert that we don't overlap this region with others.

Mark.

> ---
>  arch/arm64/include/asm/memory.h  | 2 +-
>  arch/arm64/include/asm/pgtable.h | 2 +-
>  arch/arm64/mm/ptdump.c           | 4 ++--
>  3 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index 99caeff78e1a..2745bed8ae5b 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -51,7 +51,7 @@
>  #define VMEMMAP_END		(VMEMMAP_START + VMEMMAP_SIZE)
>  #define PCI_IO_START		(VMEMMAP_END + SZ_8M)
>  #define PCI_IO_END		(PCI_IO_START + PCI_IO_SIZE)
> -#define FIXADDR_TOP		(VMEMMAP_START - SZ_32M)
> +#define FIXADDR_TOP		(-UL(SZ_8M))
>  
>  #if VA_BITS > 48
>  #define VA_BITS_MIN		(48)
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index b19a8aee684c..8d30e2787b1f 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -22,7 +22,7 @@
>   *	and fixed mappings
>   */
>  #define VMALLOC_START		(MODULES_END)
> -#define VMALLOC_END		(VMEMMAP_START - SZ_256M)
> +#define VMALLOC_END		(VMEMMAP_START - SZ_8M)
>  
>  #define vmemmap			((struct page *)VMEMMAP_START - (memstart_addr >> PAGE_SHIFT))
>  
> diff --git a/arch/arm64/mm/ptdump.c b/arch/arm64/mm/ptdump.c
> index d1df56d44f8a..3958b008f908 100644
> --- a/arch/arm64/mm/ptdump.c
> +++ b/arch/arm64/mm/ptdump.c
> @@ -45,12 +45,12 @@ static struct addr_marker address_markers[] = {
>  	{ MODULES_END,			"Modules end" },
>  	{ VMALLOC_START,		"vmalloc() area" },
>  	{ VMALLOC_END,			"vmalloc() end" },
> -	{ FIXADDR_TOT_START,		"Fixmap start" },
> -	{ FIXADDR_TOP,			"Fixmap end" },
>  	{ VMEMMAP_START,		"vmemmap start" },
>  	{ VMEMMAP_START + VMEMMAP_SIZE,	"vmemmap end" },
>  	{ PCI_IO_START,			"PCI I/O start" },
>  	{ PCI_IO_END,			"PCI I/O end" },
> +	{ FIXADDR_TOT_START,		"Fixmap start" },
> +	{ FIXADDR_TOP,			"Fixmap end" },
>  	{ -1,				NULL },
>  };
>  
> -- 
> 2.43.0.rc1.413.gea7ed67945-goog
> 


* Re: [PATCH v6 08/41] arm64: vmemmap: Avoid base2 order of struct page size to dimension region
  2023-11-29 11:16 ` [PATCH v6 08/41] arm64: vmemmap: Avoid base2 order of struct page size to dimension region Ard Biesheuvel
@ 2023-12-11 14:35   ` Mark Rutland
  2023-12-12 21:34     ` Ard Biesheuvel
  0 siblings, 1 reply; 63+ messages in thread
From: Mark Rutland @ 2023-12-11 14:35 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, Ard Biesheuvel, Catalin Marinas, Will Deacon,
	Marc Zyngier, Ryan Roberts, Anshuman Khandual, Kees Cook

On Wed, Nov 29, 2023 at 12:16:04PM +0100, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
> 
> The placement and size of the vmemmap region in the kernel virtual
> address space is currently derived from the base2 order of the size of a
> struct page. This makes for nicely aligned constants with lots of
> leading 0xf and trailing 0x0 digits, but given that the actual struct
> pages are indexed as an ordinary array, this resulting region is
> severely overdimensioned when the size of a struct page is just over a
> power of 2.
> 
> This doesn't matter today, but once we enable 52-bit virtual addressing
> for 4k pages configurations, the vmemmap region may take up almost half
> of the upper VA region with the current struct page upper bound at 64
> bytes. And once we enable KMSAN or other features that push the size of
> a struct page over 64 bytes, we will run out of VMALLOC space entirely.
> 
> So instead, let's derive the region size from the actual size of a
> struct page, and place the entire region 1 GB from the top of the VA
> space, where it still doesn't share any lower level translation table
> entries with the fixmap.
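
To put rough numbers on this (my own back-of-the-envelope, for a 48-bit VA
configuration with 4k pages and a 64-byte struct page):

  VMEMMAP_RANGE = 2^47                   (the size of the linear map)
  pages covered = 2^47 / 4K   = 2^35
  VMEMMAP_SIZE  = 2^35 * 64 B = 2 TiB

With a 72-byte struct page, the old base2-order scheme would charge 128
bytes per page (2^47 >> 5 = 4 TiB), whereas the new formula only needs
2^35 * 72 B = 2.25 TiB.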
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

This is nice, and largely addresses the fear I mentioned on the earlier patches
that move the PCI_IO_* and FIXMAP_* regions. I'd still like to ensure that we have
assertions that those don't overlap with the VMEMMAP region, but assuming those
get added to the earlier patches, this looks good as-is:

Acked-by: Mark Rutland <mark.rutland@arm.com>

Mark.

> ---
>  arch/arm64/include/asm/memory.h | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index 2745bed8ae5b..b49575a92afc 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -30,8 +30,8 @@
>   * keep a constant PAGE_OFFSET and "fallback" to using the higher end
>   * of the VMEMMAP where 52-bit support is not available in hardware.
>   */
> -#define VMEMMAP_SHIFT	(PAGE_SHIFT - STRUCT_PAGE_MAX_SHIFT)
> -#define VMEMMAP_SIZE	((_PAGE_END(VA_BITS_MIN) - PAGE_OFFSET) >> VMEMMAP_SHIFT)
> +#define VMEMMAP_RANGE	(_PAGE_END(VA_BITS_MIN) - PAGE_OFFSET)
> +#define VMEMMAP_SIZE	((VMEMMAP_RANGE >> PAGE_SHIFT) * sizeof(struct page))
>  
>  /*
>   * PAGE_OFFSET - the virtual address of the start of the linear map, at the
> @@ -47,8 +47,8 @@
>  #define MODULES_END		(MODULES_VADDR + MODULES_VSIZE)
>  #define MODULES_VADDR		(_PAGE_END(VA_BITS_MIN))
>  #define MODULES_VSIZE		(SZ_2G)
> -#define VMEMMAP_START		(-(UL(1) << (VA_BITS - VMEMMAP_SHIFT)))
> -#define VMEMMAP_END		(VMEMMAP_START + VMEMMAP_SIZE)
> +#define VMEMMAP_START		(VMEMMAP_END - VMEMMAP_SIZE)
> +#define VMEMMAP_END		(-UL(SZ_1G))
>  #define PCI_IO_START		(VMEMMAP_END + SZ_8M)
>  #define PCI_IO_END		(PCI_IO_START + PCI_IO_SIZE)
>  #define FIXADDR_TOP		(-UL(SZ_8M))
> -- 
> 2.43.0.rc1.413.gea7ed67945-goog
> 


* Re: [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2
  2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
                   ` (40 preceding siblings ...)
  2023-11-29 11:16 ` [PATCH v6 41/41] arm64: Revert "mm: provide idmap pointer to cpu_replace_ttbr1()" Ard Biesheuvel
@ 2023-12-12 17:20 ` Will Deacon
  41 siblings, 0 replies; 63+ messages in thread
From: Will Deacon @ 2023-12-12 17:20 UTC (permalink / raw)
  To: Ard Biesheuvel, linux-arm-kernel
  Cc: catalin.marinas, kernel-team, Will Deacon, Kees Cook,
	Ryan Roberts, Marc Zyngier, Anshuman Khandual, Mark Rutland,
	Ard Biesheuvel

Hi Ard,

I applied some of this:

On Wed, 29 Nov 2023 12:15:56 +0100, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
> 
> At the request of Catalin, this series was split off from my LPA2 series
> [0] in order to make the changes a bit more manageable.
> 
> This series reorganizes the kernel VA space, and refactors/replaces the
> early mapping code so that:
> - everything is done only once, in the appropriate order;
> - everything is done with the MMU and caches enabled (*)
> - everything is done from C code (notably, 100s of lines of
>   incomprehensible asm code are removed from head.S).
> 
> [...]

Applied the first three to arm64 (for-next/lpa2-prep):

[01/41] arm64: kernel: Disable latent_entropy GCC plugin in early C runtime
        https://git.kernel.org/arm64/c/3dfdc2750c6c
[02/41] arm64: mm: Take potential load offset into account when KASLR is off
        https://git.kernel.org/arm64/c/a22fc8e102dc
[03/41] arm64: mm: get rid of kimage_vaddr global variable
        https://git.kernel.org/arm64/c/376f5a3bd7e2

Applied the idreg-override part to arm64 (for-next/early-idreg-overrides):

[14/41] arm64: idreg-override: Omit non-NULL checks for override pointer
        https://git.kernel.org/arm64/c/cbc59c9a4e57
[15/41] arm64: idreg-override: Prepare for place relative reloc patching
        https://git.kernel.org/arm64/c/01fd29092a35
[16/41] arm64: idreg-override: Avoid parameq() and parameqn()
        https://git.kernel.org/arm64/c/dc3f5aae0638
[17/41] arm64: idreg-override: avoid strlen() to check for empty strings
        https://git.kernel.org/arm64/c/bcf1eed3f8a0
[18/41] arm64: idreg-override: Avoid sprintf() for simple string concatenation
        https://git.kernel.org/arm64/c/060260a6be47
[19/41] arm64: idreg-override: Avoid kstrtou64() to parse a single hex digit
        https://git.kernel.org/arm64/c/ea48626f8f0e
[20/41] arm64/kernel: Move 'nokaslr' parsing out of early idreg code
        https://git.kernel.org/arm64/c/50f176175e96

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev


* Re: [PATCH v6 08/41] arm64: vmemmap: Avoid base2 order of struct page size to dimension region
  2023-12-11 14:35   ` Mark Rutland
@ 2023-12-12 21:34     ` Ard Biesheuvel
  0 siblings, 0 replies; 63+ messages in thread
From: Ard Biesheuvel @ 2023-12-12 21:34 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Ard Biesheuvel, linux-arm-kernel, Catalin Marinas, Will Deacon,
	Marc Zyngier, Ryan Roberts, Anshuman Khandual, Kees Cook

On Mon, 11 Dec 2023 at 15:35, Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Wed, Nov 29, 2023 at 12:16:04PM +0100, Ard Biesheuvel wrote:
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > The placement and size of the vmemmap region in the kernel virtual
> > address space is currently derived from the base2 order of the size of a
> > struct page. This makes for nicely aligned constants with lots of
> > leading 0xf and trailing 0x0 digits, but given that the actual struct
> > pages are indexed as an ordinary array, this resulting region is
> > severely overdimensioned when the size of a struct page is just over a
> > power of 2.
> >
> > This doesn't matter today, but once we enable 52-bit virtual addressing
> > for 4k pages configurations, the vmemmap region may take up almost half
> > of the upper VA region with the current struct page upper bound at 64
> > bytes. And once we enable KMSAN or other features that push the size of
> > a struct page over 64 bytes, we will run out of VMALLOC space entirely.
> >
> > So instead, let's derive the region size from the actual size of a
> > struct page, and place the entire region 1 GB from the top of the VA
> > space, where it still doesn't share any lower level translation table
> > entries with the fixmap.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
>
> This is nice, and largely addresses the fear I mentioned on the earlier patches
> that move the PCI_IO_* and FIXMAP_* regions. I'd still like to ensure that we have
> assertions that those don't overlap with the VMEMMAP region, but assuming those
> get added to the earlier patches, this looks good as-is:
>
> Acked-by: Mark Rutland <mark.rutland@arm.com>
>

Thanks.

But actually, the only meaningful assert here is that the fixmap does
not grow down into the PCI I/O region. Everything else is hard coded
to unambiguous numeric constants, i.e.

#define VMEMMAP_END             (-UL(SZ_1G))
#define PCI_IO_START            (VMEMMAP_END + SZ_8M)
#define PCI_IO_END              (PCI_IO_START + PCI_IO_SIZE)
#define FIXADDR_TOP             (-UL(SZ_8M))

So I will add the hunk below to

arm64: mm: Move fixmap region above vmemmap region

but adding a static assert that a region that starts at VMEMMAP_END+8M
does not overlap with the VMEMMAP region seems rather futile to me.


--- a/arch/arm64/mm/fixmap.c
+++ b/arch/arm64/mm/fixmap.c
@@ -16,6 +16,9 @@
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>

+/* ensure that the fixmap region does not grow down into the PCI I/O region */
+static_assert(FIXADDR_TOT_START > PCI_IO_END);
+
 #define NR_BM_PTE_TABLES \
        SPAN_NR_ENTRIES(FIXADDR_TOT_START, FIXADDR_TOP, PMD_SHIFT)
 #define NR_BM_PMD_TABLES \


end of thread, other threads:[~2023-12-12 21:35 UTC | newest]

Thread overview: 63+ messages
2023-11-29 11:15 [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Ard Biesheuvel
2023-11-29 11:15 ` [PATCH v6 01/41] arm64: kernel: Disable latent_entropy GCC plugin in early C runtime Ard Biesheuvel
2023-11-30  4:44   ` Anshuman Khandual
2023-11-29 11:15 ` [PATCH v6 02/41] arm64: mm: Take potential load offset into account when KASLR is off Ard Biesheuvel
2023-11-30  5:23   ` Anshuman Khandual
2023-12-04 14:12   ` Mark Rutland
2023-12-04 15:40     ` Ard Biesheuvel
2023-11-29 11:15 ` [PATCH v6 03/41] arm64: mm: get rid of kimage_vaddr global variable Ard Biesheuvel
2023-11-30  5:38   ` Anshuman Khandual
2023-12-04 14:37     ` Mark Rutland
2023-12-05  2:26       ` Anshuman Khandual
2023-11-29 11:16 ` [PATCH v6 04/41] arm64: mm: Move PCI I/O emulation region above the vmemmap region Ard Biesheuvel
2023-11-30  7:59   ` Anshuman Khandual
2023-11-30  8:02     ` Ard Biesheuvel
2023-11-30  8:52       ` Anshuman Khandual
2023-11-30  8:56         ` Ard Biesheuvel
2023-12-11 13:57   ` Mark Rutland
2023-12-11 14:10     ` Ard Biesheuvel
2023-12-11 14:21       ` Mark Rutland
2023-11-29 11:16 ` [PATCH v6 05/41] arm64: mm: Move fixmap region above " Ard Biesheuvel
2023-12-11 14:23   ` Mark Rutland
2023-11-29 11:16 ` [PATCH v6 06/41] arm64: ptdump: Allow all region boundaries to be defined at boot time Ard Biesheuvel
2023-12-11 14:15   ` Mark Rutland
2023-11-29 11:16 ` [PATCH v6 07/41] arm64: ptdump: Discover start of vmemmap region at runtime Ard Biesheuvel
2023-11-29 11:16 ` [PATCH v6 08/41] arm64: vmemmap: Avoid base2 order of struct page size to dimension region Ard Biesheuvel
2023-12-11 14:35   ` Mark Rutland
2023-12-12 21:34     ` Ard Biesheuvel
2023-11-29 11:16 ` [PATCH v6 09/41] arm64: mm: Reclaim unused vmemmap region for vmalloc use Ard Biesheuvel
2023-11-29 11:16 ` [PATCH v6 10/41] arm64: kaslr: Adjust randomization range dynamically Ard Biesheuvel
2023-11-29 11:16 ` [PATCH v6 11/41] arm64: kernel: Manage absolute relocations in code built under pi/ Ard Biesheuvel
2023-11-29 12:27   ` Marc Zyngier
2023-11-29 12:46     ` Ard Biesheuvel
2023-11-29 11:16 ` [PATCH v6 12/41] arm64: kernel: Don't rely on objcopy to make code under pi/ __init Ard Biesheuvel
2023-11-29 11:16 ` [PATCH v6 13/41] arm64: head: move relocation handling to C code Ard Biesheuvel
2023-11-29 11:16 ` [PATCH v6 14/41] arm64: idreg-override: Omit non-NULL checks for override pointer Ard Biesheuvel
2023-11-29 11:16 ` [PATCH v6 15/41] arm64: idreg-override: Prepare for place relative reloc patching Ard Biesheuvel
2023-11-29 11:16 ` [PATCH v6 16/41] arm64: idreg-override: Avoid parameq() and parameqn() Ard Biesheuvel
2023-11-29 11:16 ` [PATCH v6 17/41] arm64: idreg-override: avoid strlen() to check for empty strings Ard Biesheuvel
2023-11-29 11:16 ` [PATCH v6 18/41] arm64: idreg-override: Avoid sprintf() for simple string concatenation Ard Biesheuvel
2023-11-29 11:16 ` [PATCH v6 19/41] arm64: idreg-override: Avoid kstrtou64() to parse a single hex digit Ard Biesheuvel
2023-11-29 11:16 ` [PATCH v6 20/41] arm64/kernel: Move 'nokaslr' parsing out of early idreg code Ard Biesheuvel
2023-11-29 11:16 ` [PATCH v6 21/41] arm64: idreg-override: Move to early mini C runtime Ard Biesheuvel
2023-11-29 11:16 ` [PATCH v6 22/41] arm64: kernel: Remove early fdt remap code Ard Biesheuvel
2023-11-29 11:16 ` [PATCH v6 23/41] arm64: head: Clear BSS and the kernel page tables in one go Ard Biesheuvel
2023-11-29 11:16 ` [PATCH v6 24/41] arm64: Move feature overrides into the BSS section Ard Biesheuvel
2023-11-29 11:16 ` [PATCH v6 25/41] arm64: head: Run feature override detection before mapping the kernel Ard Biesheuvel
2023-11-29 11:16 ` [PATCH v6 26/41] arm64: head: move dynamic shadow call stack patching into early C runtime Ard Biesheuvel
2023-11-29 11:16 ` [PATCH v6 27/41] arm64: cpufeature: Add helper to test for CPU feature overrides Ard Biesheuvel
2023-11-29 11:16 ` [PATCH v6 28/41] arm64: kaslr: Use feature override instead of parsing the cmdline again Ard Biesheuvel
2023-11-29 11:16 ` [PATCH v6 29/41] arm64: idreg-override: Create a pseudo feature for rodata=off Ard Biesheuvel
2023-11-29 11:16 ` [PATCH v6 30/41] arm64: Add helpers to probe local CPU for PAC and BTI support Ard Biesheuvel
2023-11-29 11:16 ` [PATCH v6 31/41] arm64: head: allocate more pages for the kernel mapping Ard Biesheuvel
2023-11-29 11:16 ` [PATCH v6 32/41] arm64: head: move memstart_offset_seed handling to C code Ard Biesheuvel
2023-11-29 11:16 ` [PATCH v6 33/41] arm64: mm: Make kaslr_requires_kpti() a static inline Ard Biesheuvel
2023-11-29 11:16 ` [PATCH v6 34/41] arm64: mmu: Make __cpu_replace_ttbr1() out of line Ard Biesheuvel
2023-11-29 11:16 ` [PATCH v6 35/41] arm64: head: Move early kernel mapping routines into C code Ard Biesheuvel
2023-11-29 11:16 ` [PATCH v6 36/41] arm64: mm: Use 48-bit virtual addressing for the permanent ID map Ard Biesheuvel
2023-11-29 11:16 ` [PATCH v6 37/41] arm64: pgtable: Decouple PGDIR size macros from PGD/PUD/PMD levels Ard Biesheuvel
2023-11-29 11:16 ` [PATCH v6 38/41] arm64: kernel: Create initial ID map from C code Ard Biesheuvel
2023-11-29 11:16 ` [PATCH v6 39/41] arm64: mm: avoid fixmap for early swapper_pg_dir updates Ard Biesheuvel
2023-11-29 11:16 ` [PATCH v6 40/41] arm64: mm: omit redundant remap of kernel image Ard Biesheuvel
2023-11-29 11:16 ` [PATCH v6 41/41] arm64: Revert "mm: provide idmap pointer to cpu_replace_ttbr1()" Ard Biesheuvel
2023-12-12 17:20 ` [PATCH v6 00/41] arm64: Reorganize kernel VA space for LPA2 Will Deacon
