linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH v3 00/19] arm64 kernel text replication
@ 2024-01-17  8:53 Hao Jia
  2024-01-17  8:53 ` [PATCH v3 01/19] arm64: provide cpu_replace_ttbr1_phys() Hao Jia
                   ` (19 more replies)
  0 siblings, 20 replies; 23+ messages in thread
From: Hao Jia @ 2024-01-17  8:53 UTC (permalink / raw)
  To: mark.rutland, rmk+kernel, catalin.marinas, corbet, will, willy
  Cc: linux-arm-kernel, linux-doc, root

From: root <root@n144-101-220.byted.org>

Many thanks to Russell King for his previous work on
arm64 kernel text replication.
https://lore.kernel.org/all/ZMKNYEkM7YnrDtOt@shell.armlinux.org.uk

With these patches applied, our tests showed that business workload
performance improved by more than 5% and that memory bandwidth was more
evenly balanced across NUMA nodes.
I've recently been trying to make it work with different numbers of
page table levels and page sizes, so I have updated this patch set to v3.

Patch overview:

Patches 1-16 are based on Russell King's previous arm64 kernel text
replication patch set, rebased on commit 052d534373b7.

The following three patches are new in v3:
patch 17 fixes a compilation warning.

patch 18 adapts arm64 kernel text replication to support more
page sizes and page table levels, in addition to 16K page size and
4-level page tables.

patch 19 fixes the abnormal startup problem caused by module_alloc()
which may allocate an address larger than KIMAGE_VADDR when kernel text
replication is enabled.

[v2] https://lore.kernel.org/all/ZMKNYEkM7YnrDtOt@shell.armlinux.org.uk
[RFC] https://lore.kernel.org/all/ZHYCUVa8fzmB4XZV@shell.armlinux.org.uk

Please correct me if I've made a mistake, thank you very much!

Original message below.

Problem
-------

NUMA systems have greater latency when accessing data and instructions
across nodes, which can lead to a reduction in performance on CPU cores
that mainly perform accesses beyond their local node.

Normally when an ARM64 system boots, the kernel will end up placed in
memory, and each CPU core will have to fetch instructions and data from
whichever NUMA node the kernel has been placed in. This means that while
executing kernel code, CPUs local to that node will run faster than
CPUs in remote nodes.

The higher the latency to access remote NUMA node memory, the more the
kernel performance suffers on those nodes.

If there is a local copy of the kernel text in each node's RAM, and
each node runs the kernel using its local copy of the kernel text,
then it stands to reason that the kernel will run faster due to fewer
stalls while instructions are fetched from remote memory.

The question then arises how to achieve this.

Background
----------

An important issue to contend with is what happens when a thread
migrates between nodes. Essentially, the thread's state (including
instruction pointer) is saved to memory, and the scheduler on that CPU
loads some other thread's state and that CPU resumes executing that
new thread.

The CPU gaining the migrating thread loads the saved state, again
including the instruction pointer, and the gaining CPU resumes fetching
instructions at the virtual address where the original CPU left off.

The key point is that the virtual address is what matters here, and
this gives us a way to implement kernel text replication fairly easily.
At a practical level, all we need to do is to ensure that the virtual
addresses which contain the kernel text point to a local copy of
that text.

This is exactly how this proposal of kernel text replication achieves
the replication. We can go a little bit further and include most of
the read-only data in this replication, as that will never be written
to by the kernel (and thus remains constant.)

Solution
--------

So, what we need to achieve is:

1. multiple identical copies of the kernel text (and read-only data)
2. point the virtual mappings to the appropriate copy of kernel text
   for the NUMA node.

(1) is fairly easy to achieve - we just need to allocate some memory
in the appropriate node and copy the parts of the kernel we want to
replicate. However, we also need to deal with ARM64's kernel patching.
There are two functions that patch the kernel text,
__apply_alternatives() and aarch64_insn_patch_text_nosync(). Both of
these need to be modified to update all copies of the kernel text.
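
As a rough illustration of (1) and the patching requirement, the
following userspace C sketch mirrors a single-instruction patch into
every replica. It is not the kernel implementation: patch_all_copies(),
its parameters and the plain arrays standing in for kernel text are
hypothetical, and the cache maintenance that real code needs is left
out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: write one 32-bit instruction at word offset
 * 'index' into the primary text and into every allocated replica.
 * copies[nid] == NULL means node 'nid' has no replica (for example
 * node 0, which runs from the primary image).
 * Returns 0 on success, -1 if 'index' is out of range.
 * Real code would also need cache maintenance for each copy.
 */
static int patch_all_copies(uint32_t *primary, size_t nwords,
			    uint32_t **copies, int ncopies,
			    size_t index, uint32_t insn)
{
	int nid;

	assert(primary != NULL && copies != NULL);

	if (index >= nwords)
		return -1;

	/* Patch the primary image first. */
	primary[index] = insn;

	/* Then apply the same write at the same offset in each replica. */
	for (nid = 0; nid < ncopies; nid++) {
		if (!copies[nid])
			continue;
		copies[nid][index] = insn;
	}

	return 0;
}
```

The point of the sketch is only that the offset into the text is the
same for every copy, so one patch request can be fanned out to all
nodes.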

(2) is slightly harder.

Firstly, the aarch64 architecture has a very useful feature here - the
kernel page tables are entirely separate from the user page tables.
The hardware contains two page table pointers, one is used for user
mappings, the other is used for kernel mappings.

Therefore, we only have one page table to be concerned with: the table
which maps kernel space. We do not need to be concerned with each
user process's page table.

The approach taken here is to ensure that the kernel is located in an
area of kernel virtual address space covered by a level-0 page table
entry which is not shared with any other user. We can then maintain
separate per-node level-0 page tables for kernel space where the only
difference between them is this level-0 page table entry.

This gives a couple of benefits. Firstly, when updates to the level-0
page table happen (e.g. when establishing new mappings) these updates
can simply be copied to the other level-0 page tables provided it isn't
for the kernel image. Secondly, we don't need complexity at lower
levels of the page table code to figure out whether a level-1 or lower
update needs to be propagated to other nodes.

The level-0 page table entry for the kernel can then be used to point
at a node-unique set of level 1..N page tables to map the appropriate
copy of the kernel text (and read-only data) into kernel space, while
keeping the kernel read-write data shared between nodes.
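
The propagation rule described above can be sketched in userspace C as
follows. This is only a model under stated assumptions: set_l0_entry(),
the l0_entry_t type and the table size of 512 entries (4K pages with
4 levels) are hypothetical and are not taken from the patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTRS_PER_L0	512	/* assumed: 4K pages, 4-level tables */

typedef uint64_t l0_entry_t;	/* hypothetical stand-in for pgd_t */

/*
 * Hypothetical helper: store 'val' in slot 'idx' of node 'nid's
 * level-0 table. Any slot other than the kernel image slot is
 * mirrored into every node's table; the kernel image slot stays
 * private to the node being updated.
 */
static void set_l0_entry(l0_entry_t (*tables)[PTRS_PER_L0], int nr_nodes,
			 size_t kernel_idx, int nid, size_t idx,
			 l0_entry_t val)
{
	int other;

	assert(nid >= 0 && nid < nr_nodes && idx < PTRS_PER_L0);

	tables[nid][idx] = val;

	/* The kernel image entry is the one per-node difference. */
	if (idx == kernel_idx)
		return;

	/* Everything else is kept identical across nodes. */
	for (other = 0; other < nr_nodes; other++)
		tables[other][idx] = val;
}
```

Because only one level-0 slot differs between nodes, nothing below
level 0 needs to know about replication in this model.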

Performance Analysis
--------------------

Needless to say, the performance results from kernel text replication
are workload specific, but appear to show a gain of between 6% and
17% for database-centric workloads. When combined with userspace
awareness of NUMA, this can result in a gain of over 50%.

Problems
--------

There are a few areas that are a problem for kernel text replication:
1) As this series changes the kernel space virtual address space
   layout, it breaks KASAN - and I've zero knowledge of KASAN so I
   have no idea how to fix it. I would be grateful for input from
   KASAN folk for suggestions how to fix this.

2) KASLR can not be used with kernel text replication, since we need
   to place the kernel in its own L0 page table entry, not in vmalloc
   space. KASLR is disabled when support for kernel text replication
   is enabled.

3) Changing the kernel virtual address space layout also means that
   kaslr_offset() and kaslr_enabled() need to become macros rather
   than inline functions due to the use of PGDIR_SIZE in the
   calculation of KIMAGE_VADDR. Since asm/pgtable.h defines this
   constant, but asm/memory.h is included by asm/pgtable.h, having
   this symbol available would produce a circular include
   dependency, so I don't think there is any choice here.

4) read-only protection for replicated kernel images is not yet
   implemented.

Hao Jia (3):
  arm64: text replication: fix compilation warning
  arm64: text replication: support more page sizes and levels
  arm64: text replication: keep modules inside module region when
    REPLICATE_KTEXT is enabled

Russell King (Oracle) (16):
  arm64: provide cpu_replace_ttbr1_phys()
  arm64: make clean_dcache_range_nopatch() visible
  arm64: place kernel in its own L0 page table entry
  arm64: text replication: add init function
  arm64: text replication: add sanity checks
  arm64: text replication: copy initial kernel text
  arm64: text replication: add node text patching
  arm64: text replication: add node 0 page table definitions
  arm64: text replication: add swapper page directory helpers
  arm64: text replication: create per-node kernel page tables
  arm64: text replication: boot secondary CPUs with appropriate TTBR1
  arm64: text replication: update cnp support
  arm64: text replication: setup page tables for copied kernel
  arm64: text replication: include most of read-only data as well
  arm64: text replication: early kernel option to enable replication
  arm64: text replication: add Kconfig

 .../admin-guide/kernel-parameters.txt         |   5 +
 arch/arm64/Kconfig                            |  10 +-
 arch/arm64/include/asm/cacheflush.h           |   2 +
 arch/arm64/include/asm/ktext.h                |  45 ++++
 arch/arm64/include/asm/memory.h               |  36 ++-
 arch/arm64/include/asm/mmu_context.h          |  11 +-
 arch/arm64/include/asm/pgtable.h              |  31 ++-
 arch/arm64/include/asm/smp.h                  |   1 +
 arch/arm64/kernel/alternative.c               |   4 +-
 arch/arm64/kernel/asm-offsets.c               |   1 +
 arch/arm64/kernel/head.S                      |   3 +-
 arch/arm64/kernel/hibernate.c                 |   2 +-
 arch/arm64/kernel/kaslr.c                     |   1 +
 arch/arm64/kernel/module.c                    |  20 +-
 arch/arm64/kernel/patching.c                  |   7 +-
 arch/arm64/kernel/smp.c                       |   3 +
 arch/arm64/mm/Makefile                        |   2 +
 arch/arm64/mm/init.c                          |   3 +
 arch/arm64/mm/ktext.c                         | 213 ++++++++++++++++++
 arch/arm64/mm/mmu.c                           |  73 +++++-
 20 files changed, 446 insertions(+), 27 deletions(-)
 create mode 100644 arch/arm64/include/asm/ktext.h
 create mode 100644 arch/arm64/mm/ktext.c

-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* [PATCH v3 01/19] arm64: provide cpu_replace_ttbr1_phys()
  2024-01-17  8:53 [PATCH v3 00/19] arm64 kernel text replication Hao Jia
@ 2024-01-17  8:53 ` Hao Jia
  2024-01-17  8:53 ` [PATCH v3 02/19] arm64: make clean_dcache_range_nopatch() visible Hao Jia
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Hao Jia @ 2024-01-17  8:53 UTC (permalink / raw)
  To: mark.rutland, rmk+kernel, catalin.marinas, corbet, will, willy
  Cc: linux-arm-kernel, linux-doc

From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>

Provide a version of cpu_replace_ttbr1_phys() which operates using a
physical address rather than the virtual address of the page tables.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
 arch/arm64/include/asm/mmu_context.h | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 9ce4200508b1..466797dcb5fc 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -152,7 +152,7 @@ static inline void cpu_install_ttbr0(phys_addr_t ttbr0, unsigned long t0sz)
  * Atomically replaces the active TTBR1_EL1 PGD with a new VA-compatible PGD,
  * avoiding the possibility of conflicting TLB entries being allocated.
  */
-static inline void __cpu_replace_ttbr1(pgd_t *pgdp, pgd_t *idmap, bool cnp)
+static inline void __cpu_replace_ttbr1_phys(phys_addr_t pgd_phys, pgd_t *idmap, bool cnp)
 {
 	typedef void (ttbr_replace_func)(phys_addr_t);
 	extern ttbr_replace_func idmap_cpu_replace_ttbr1;
@@ -160,7 +160,7 @@ static inline void __cpu_replace_ttbr1(pgd_t *pgdp, pgd_t *idmap, bool cnp)
 	unsigned long daif;
 
 	/* phys_to_ttbr() zeros lower 2 bits of ttbr with 52-bit PA */
-	phys_addr_t ttbr1 = phys_to_ttbr(virt_to_phys(pgdp));
+	phys_addr_t ttbr1 = phys_to_ttbr(pgd_phys);
 
 	if (cnp)
 		ttbr1 |= TTBR_CNP_BIT;
@@ -180,6 +180,11 @@ static inline void __cpu_replace_ttbr1(pgd_t *pgdp, pgd_t *idmap, bool cnp)
 	cpu_uninstall_idmap();
 }
 
+static inline void __nocfi __cpu_replace_ttbr1(pgd_t *pgdp, pgd_t *idmap, bool cnp)
+{
+	__cpu_replace_ttbr1_phys(virt_to_phys(pgdp), idmap, cnp);
+}
+
 static inline void cpu_enable_swapper_cnp(void)
 {
 	__cpu_replace_ttbr1(lm_alias(swapper_pg_dir), idmap_pg_dir, true);
-- 
2.20.1



* [PATCH v3 02/19] arm64: make clean_dcache_range_nopatch() visible
  2024-01-17  8:53 [PATCH v3 00/19] arm64 kernel text replication Hao Jia
  2024-01-17  8:53 ` [PATCH v3 01/19] arm64: provide cpu_replace_ttbr1_phys() Hao Jia
@ 2024-01-17  8:53 ` Hao Jia
  2024-01-17  8:53 ` [PATCH v3 03/19] arm64: place kernel in its own L0 page table entry Hao Jia
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Hao Jia @ 2024-01-17  8:53 UTC (permalink / raw)
  To: mark.rutland, rmk+kernel, catalin.marinas, corbet, will, willy
  Cc: linux-arm-kernel, linux-doc

From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>

When we hook into the kernel text patching code, we will need to call
clean_dcache_range_nopatch() to ensure that the patching of the
replicated kernel text is properly visible to other CPUs. Make this
function available to the replication code.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
 arch/arm64/include/asm/cacheflush.h | 2 ++
 arch/arm64/kernel/alternative.c     | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
index fefac75fa009..f94255db1c75 100644
--- a/arch/arm64/include/asm/cacheflush.h
+++ b/arch/arm64/include/asm/cacheflush.h
@@ -104,6 +104,8 @@ static inline void flush_icache_range(unsigned long start, unsigned long end)
 }
 #define flush_icache_range flush_icache_range
 
+void clean_dcache_range_nopatch(u64 start, u64 end);
+
 /*
  * Copy user data from/to a page which is mapped into a different
  * processes address space.  Really, we want to allow our "user
diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index 8ff6610af496..ea3f4104771d 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -121,7 +121,7 @@ static noinstr void patch_alternative(struct alt_instr *alt,
  * accidentally call into the cache.S code, which is patched by us at
  * runtime.
  */
-static noinstr void clean_dcache_range_nopatch(u64 start, u64 end)
+noinstr void clean_dcache_range_nopatch(u64 start, u64 end)
 {
 	u64 cur, d_size, ctr_el0;
 
-- 
2.20.1



* [PATCH v3 03/19] arm64: place kernel in its own L0 page table entry
  2024-01-17  8:53 [PATCH v3 00/19] arm64 kernel text replication Hao Jia
  2024-01-17  8:53 ` [PATCH v3 01/19] arm64: provide cpu_replace_ttbr1_phys() Hao Jia
  2024-01-17  8:53 ` [PATCH v3 02/19] arm64: make clean_dcache_range_nopatch() visible Hao Jia
@ 2024-01-17  8:53 ` Hao Jia
  2024-01-17  8:53 ` [PATCH v3 04/19] arm64: text replication: add init function Hao Jia
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Hao Jia @ 2024-01-17  8:53 UTC (permalink / raw)
  To: mark.rutland, rmk+kernel, catalin.marinas, corbet, will, willy
  Cc: linux-arm-kernel, linux-doc

From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>

Kernel text replication needs to maintain separate per-node page
tables for the kernel text. In order to do this without affecting
other kernel memory mappings, placing the kernel such that it does
not share a L0 page table entry with any other mapping is desirable.

Prior to this commit, the layout without KASLR was:

+----------+
|  vmalloc |
+----------+
|  Kernel  |
+----------+ MODULES_END, VMALLOC_START, KIMAGE_VADDR =
|  Modules |                 MODULES_VADDR + MODULES_VSIZE
+----------+ MODULES_VADDR = _PAGE_END(VA_BITS_MIN)
| VA space |
+----------+ 0

This becomes:

+----------+
|  vmalloc |
+----------+ VMALLOC_START = MODULES_END + PGDIR_SIZE
|  Kernel  |
+----------+ MODULES_END, KIMAGE_VADDR = _PAGE_END(VA_BITS_MIN) +
|  Modules |    max(PGDIR_SIZE, MODULES_VSIZE)
+----------+ MODULES_VADDR = MODULES_END - MODULES_VSIZE
| VA space |
+----------+ 0

This assumes MODULES_VSIZE (128M) <= PGDIR_SIZE.

One side effect of this change is that KIMAGE_VADDR's definition now
includes PGDIR_SIZE (to leave room for the modules) but this is not
defined when asm/memory.h is included. This means KIMAGE_VADDR can
not be used in inline functions within this file, so we convert
kaslr_offset() and kaslr_enabled() to be macros instead.
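
As a worked example of the new layout, the sketch below evaluates the
addresses for one assumed configuration: 4K pages, 4-level page tables
and VA_BITS_MIN of 48, which gives a PGDIR_SIZE of 512G. The function
names are hypothetical, and page_end() is assumed to match the
upstream _PAGE_END() definition, which is not shown in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed configuration: 4K pages, 4-level page tables, 48-bit VA. */
#define VA_BITS_MIN	48
#define PGDIR_SHIFT	39
#define PTRS_PER_PGD	512
#define PGDIR_SIZE	(UINT64_C(1) << PGDIR_SHIFT)	/* 512G */
#define MODULES_VSIZE	(UINT64_C(2) << 30)		/* SZ_2G, as in the diff */

/* Assumed to match upstream _PAGE_END(va): -(1 << (va - 1)). */
static uint64_t page_end(unsigned int va_bits)
{
	assert(va_bits >= 1 && va_bits <= 64);
	return UINT64_C(0) - (UINT64_C(1) << (va_bits - 1));
}

/* KIMAGE_VADDR = _PAGE_END(VA_BITS_MIN) + max(PGDIR_SIZE, MODULES_VSIZE) */
static uint64_t kimage_vaddr(void)
{
	uint64_t off = PGDIR_SIZE > MODULES_VSIZE ? PGDIR_SIZE : MODULES_VSIZE;

	return page_end(VA_BITS_MIN) + off;
}

/* MODULES_VADDR = MODULES_END - MODULES_VSIZE, MODULES_END = KIMAGE_VADDR */
static uint64_t modules_vaddr(void)
{
	return kimage_vaddr() - MODULES_VSIZE;
}

/* VMALLOC_START = MODULES_END + PGDIR_SIZE */
static uint64_t vmalloc_start(void)
{
	return kimage_vaddr() + PGDIR_SIZE;
}

/* Level-0 (PGD) index of a kernel virtual address. */
static unsigned int pgd_index(uint64_t addr)
{
	return (unsigned int)((addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1));
}
```

Under these assumptions the modules, the kernel image and the start of
vmalloc land in level-0 slots 256, 257 and 258 respectively, so the
kernel image has a slot of its own.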

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
 arch/arm64/include/asm/memory.h  | 28 +++++++++++++++++++++-------
 arch/arm64/include/asm/pgtable.h |  2 +-
 arch/arm64/kernel/kaslr.c        |  1 +
 3 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index d82305ab420f..c73820fb36a3 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -43,9 +43,26 @@
 #define VA_BITS			(CONFIG_ARM64_VA_BITS)
 #define _PAGE_OFFSET(va)	(-(UL(1) << (va)))
 #define PAGE_OFFSET		(_PAGE_OFFSET(VA_BITS))
-#define KIMAGE_VADDR		(MODULES_END)
-#define MODULES_END		(MODULES_VADDR + MODULES_VSIZE)
-#define MODULES_VADDR		(_PAGE_END(VA_BITS_MIN))
+
+/*
+ * Setting KIMAGE_VADDR has got a lot harder: ideally we'd like to use
+ * max(PGDIR_SIZE, MODULES_VSIZE), but this can't work because this is
+ * used in both assembly and C, where it causes problems. max_t() solves
+ * the C problems but can't be used in assembly.
+ * CONFIG_ARM64_4K_PAGES, PGDIR_SIZE is 2M, 1G, 512G
+ * CONFIG_ARM64_16K_PAGES, PGDIR_SIZE is 32M, 64G or 128T
+ * CONFIG_ARM64_64K_PAGES, PGDIR_SIZE is 512M or 4T
+ */
+#if (CONFIG_ARM64_4K_PAGES && CONFIG_PGTABLE_LEVELS < 4) || \
+    (CONFIG_ARM64_16K_PAGES && CONFIG_PGTABLE_LEVELS < 3) || \
+    (CONFIG_ARM64_64K_PAGES && CONFIG_PGTABLE_LEVELS < 2)
+#define KIMAGE_OFFSET		MODULES_VSIZE
+#else
+#define KIMAGE_OFFSET		PGDIR_SIZE
+#endif
+#define KIMAGE_VADDR		(_PAGE_END(VA_BITS_MIN) + KIMAGE_OFFSET)
+#define MODULES_END		(KIMAGE_VADDR)
+#define MODULES_VADDR		(MODULES_END - MODULES_VSIZE)
 #define MODULES_VSIZE		(SZ_2G)
 #define VMEMMAP_START		(-(UL(1) << (VA_BITS - VMEMMAP_SHIFT)))
 #define VMEMMAP_END		(VMEMMAP_START + VMEMMAP_SIZE)
@@ -223,10 +240,7 @@ extern s64			memstart_addr;
 /* the offset between the kernel virtual and physical mappings */
 extern u64			kimage_voffset;
 
-static inline unsigned long kaslr_offset(void)
-{
-	return (u64)&_text - KIMAGE_VADDR;
-}
+#define kaslr_offset()	((unsigned long)((u64)&_text - KIMAGE_VADDR))
 
 #ifdef CONFIG_RANDOMIZE_BASE
 void kaslr_init(void);
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 79ce70fbb751..97d2127d64eb 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -21,7 +21,7 @@
  * VMALLOC_END: extends to the available space below vmemmap, PCI I/O space
  *	and fixed mappings
  */
-#define VMALLOC_START		(MODULES_END)
+#define VMALLOC_START		(MODULES_END + PGDIR_SIZE)
 #define VMALLOC_END		(VMEMMAP_START - SZ_256M)
 
 #define vmemmap			((struct page *)VMEMMAP_START - (memstart_addr >> PAGE_SHIFT))
diff --git a/arch/arm64/kernel/kaslr.c b/arch/arm64/kernel/kaslr.c
index 12c7f3c8ba76..1af065280d86 100644
--- a/arch/arm64/kernel/kaslr.c
+++ b/arch/arm64/kernel/kaslr.c
@@ -9,6 +9,7 @@
 
 #include <asm/cpufeature.h>
 #include <asm/memory.h>
+#include <asm/pgtable.h>
 
 u16 __initdata memstart_offset_seed;
 
-- 
2.20.1



* [PATCH v3 04/19] arm64: text replication: add init function
  2024-01-17  8:53 [PATCH v3 00/19] arm64 kernel text replication Hao Jia
                   ` (2 preceding siblings ...)
  2024-01-17  8:53 ` [PATCH v3 03/19] arm64: place kernel in its own L0 page table entry Hao Jia
@ 2024-01-17  8:53 ` Hao Jia
  2024-01-17  8:53 ` [PATCH v3 05/19] arm64: text replication: add sanity checks Hao Jia
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Hao Jia @ 2024-01-17  8:53 UTC (permalink / raw)
  To: mark.rutland, rmk+kernel, catalin.marinas, corbet, will, willy
  Cc: linux-arm-kernel, linux-doc

From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>

A simple patch that adds an empty function for kernel text replication
initialisation and hooks it into the initialisation path.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
 arch/arm64/include/asm/ktext.h | 20 ++++++++++++++++++++
 arch/arm64/mm/Makefile         |  2 ++
 arch/arm64/mm/init.c           |  3 +++
 arch/arm64/mm/ktext.c          |  8 ++++++++
 4 files changed, 33 insertions(+)
 create mode 100644 arch/arm64/include/asm/ktext.h
 create mode 100644 arch/arm64/mm/ktext.c

diff --git a/arch/arm64/include/asm/ktext.h b/arch/arm64/include/asm/ktext.h
new file mode 100644
index 000000000000..1a5f7452a3bf
--- /dev/null
+++ b/arch/arm64/include/asm/ktext.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2022, Oracle and/or its affiliates.
+ */
+#ifndef ASM_KTEXT_H
+#define ASM_KTEXT_H
+
+#ifdef CONFIG_REPLICATE_KTEXT
+
+void ktext_replication_init(void);
+
+#else
+
+static inline void ktext_replication_init(void)
+{
+}
+
+#endif
+
+#endif
diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
index dbd1bc95967d..41e705027c57 100644
--- a/arch/arm64/mm/Makefile
+++ b/arch/arm64/mm/Makefile
@@ -14,3 +14,5 @@ KASAN_SANITIZE_physaddr.o	+= n
 
 obj-$(CONFIG_KASAN)		+= kasan_init.o
 KASAN_SANITIZE_kasan_init.o	:= n
+
+obj-$(CONFIG_REPLICATE_KTEXT)	+= ktext.o
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 74c1db8ce271..e336a26e1072 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -37,6 +37,7 @@
 #include <asm/fixmap.h>
 #include <asm/kasan.h>
 #include <asm/kernel-pgtable.h>
+#include <asm/ktext.h>
 #include <asm/kvm_host.h>
 #include <asm/memory.h>
 #include <asm/numa.h>
@@ -329,6 +330,8 @@ void __init bootmem_init(void)
 
 	arch_numa_init();
 
+	ktext_replication_init();
+
 	/*
 	 * must be done after arch_numa_init() which calls numa_init() to
 	 * initialize node_online_map that gets used in hugetlb_cma_reserve()
diff --git a/arch/arm64/mm/ktext.c b/arch/arm64/mm/ktext.c
new file mode 100644
index 000000000000..3a8d37c9abc4
--- /dev/null
+++ b/arch/arm64/mm/ktext.c
@@ -0,0 +1,8 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2022, Oracle and/or its affiliates.
+ */
+
+void __init ktext_replication_init(void)
+{
+}
-- 
2.20.1



* [PATCH v3 05/19] arm64: text replication: add sanity checks
  2024-01-17  8:53 [PATCH v3 00/19] arm64 kernel text replication Hao Jia
                   ` (3 preceding siblings ...)
  2024-01-17  8:53 ` [PATCH v3 04/19] arm64: text replication: add init function Hao Jia
@ 2024-01-17  8:53 ` Hao Jia
  2024-01-17  8:53 ` [PATCH v3 06/19] arm64: text replication: copy initial kernel text Hao Jia
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Hao Jia @ 2024-01-17  8:53 UTC (permalink / raw)
  To: mark.rutland, rmk+kernel, catalin.marinas, corbet, will, willy
  Cc: linux-arm-kernel, linux-doc

From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>

The kernel text and modules must be in separate L0 page table entries.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
 arch/arm64/mm/ktext.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/arch/arm64/mm/ktext.c b/arch/arm64/mm/ktext.c
index 3a8d37c9abc4..901f159c65e6 100644
--- a/arch/arm64/mm/ktext.c
+++ b/arch/arm64/mm/ktext.c
@@ -3,6 +3,27 @@
  * Copyright (C) 2022, Oracle and/or its affiliates.
  */
 
+#include <linux/kernel.h>
+#include <linux/pgtable.h>
+
+#include <asm/ktext.h>
+#include <asm/memory.h>
+
 void __init ktext_replication_init(void)
 {
+	int kidx = pgd_index((phys_addr_t)KERNEL_START);
+
+	/*
+	 * If we've messed up and the kernel shares a L0 entry with the
+	 * module or vmalloc area, then don't even attempt to use text
+	 * replication.
+	 */
+	if (pgd_index(MODULES_VADDR) == kidx) {
+		pr_warn("Kernel is located in the same L0 index as modules - text replication disabled\n");
+		return;
+	}
+	if (pgd_index(VMALLOC_START) == kidx) {
+		pr_warn("Kernel is located in the same L0 index as vmalloc - text replication disabled\n");
+		return;
+	}
 }
-- 
2.20.1



* [PATCH v3 06/19] arm64: text replication: copy initial kernel text
  2024-01-17  8:53 [PATCH v3 00/19] arm64 kernel text replication Hao Jia
                   ` (4 preceding siblings ...)
  2024-01-17  8:53 ` [PATCH v3 05/19] arm64: text replication: add sanity checks Hao Jia
@ 2024-01-17  8:53 ` Hao Jia
  2024-01-17  8:53 ` [PATCH v3 07/19] arm64: text replication: add node text patching Hao Jia
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Hao Jia @ 2024-01-17  8:53 UTC (permalink / raw)
  To: mark.rutland, rmk+kernel, catalin.marinas, corbet, will, willy
  Cc: linux-arm-kernel, linux-doc

From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>

Allocate memory on the appropriate node for the per-node copies of the
kernel text, and copy the kernel text to that memory. Clean and
invalidate the caches to the point of unification so that the copied
text is correctly visible to the target node.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
 arch/arm64/mm/ktext.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/arch/arm64/mm/ktext.c b/arch/arm64/mm/ktext.c
index 901f159c65e6..4c803b89fcfe 100644
--- a/arch/arm64/mm/ktext.c
+++ b/arch/arm64/mm/ktext.c
@@ -4,14 +4,23 @@
  */
 
 #include <linux/kernel.h>
+#include <linux/memblock.h>
+#include <linux/numa.h>
 #include <linux/pgtable.h>
+#include <linux/string.h>
 
+#include <asm/cacheflush.h>
 #include <asm/ktext.h>
 #include <asm/memory.h>
 
+static void *kernel_texts[MAX_NUMNODES];
+
+/* Allocate memory for the replicated kernel texts. */
 void __init ktext_replication_init(void)
 {
+	size_t size = _etext - _stext;
 	int kidx = pgd_index((phys_addr_t)KERNEL_START);
+	int nid;
 
 	/*
 	 * If we've messed up and the kernel shares a L0 entry with the
@@ -26,4 +35,16 @@ void __init ktext_replication_init(void)
 		pr_warn("Kernel is located in the same L0 index as vmalloc - text replication disabled\n");
 		return;
 	}
+
+	for_each_node(nid) {
+		/* Nothing to do for node 0 */
+		if (!nid)
+			continue;
+
+		/* Allocate and copy initial kernel text for this node */
+		kernel_texts[nid] = memblock_alloc_node(size, PAGE_SIZE, nid);
+		memcpy(kernel_texts[nid], _stext, size);
+		caches_clean_inval_pou((u64)kernel_texts[nid],
+				       (u64)kernel_texts[nid] + size);
+	}
 }
-- 
2.20.1



* [PATCH v3 07/19] arm64: text replication: add node text patching
  2024-01-17  8:53 [PATCH v3 00/19] arm64 kernel text replication Hao Jia
                   ` (5 preceding siblings ...)
  2024-01-17  8:53 ` [PATCH v3 06/19] arm64: text replication: copy initial kernel text Hao Jia
@ 2024-01-17  8:53 ` Hao Jia
  2024-01-17  8:53 ` [PATCH v3 08/19] arm64: text replication: add node 0 page table definitions Hao Jia
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Hao Jia @ 2024-01-17  8:53 UTC (permalink / raw)
  To: mark.rutland, rmk+kernel, catalin.marinas, corbet, will, willy
  Cc: linux-arm-kernel, linux-doc

From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>

Add support for text patching on our replicated texts.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
 arch/arm64/include/asm/ktext.h  | 12 +++++++
 arch/arm64/kernel/alternative.c |  2 ++
 arch/arm64/kernel/patching.c    |  7 +++-
 arch/arm64/mm/ktext.c           | 58 +++++++++++++++++++++++++++++++++
 4 files changed, 78 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/ktext.h b/arch/arm64/include/asm/ktext.h
index 1a5f7452a3bf..289e11289c06 100644
--- a/arch/arm64/include/asm/ktext.h
+++ b/arch/arm64/include/asm/ktext.h
@@ -5,9 +5,13 @@
 #ifndef ASM_KTEXT_H
 #define ASM_KTEXT_H
 
+#include <linux/kprobes.h>
+
 #ifdef CONFIG_REPLICATE_KTEXT
 
 void ktext_replication_init(void);
+void __kprobes ktext_replication_patch(u32 *tp,  __le32 insn);
+void ktext_replication_patch_alternative(__le32 *src, int nr_inst);
 
 #else
 
@@ -15,6 +19,14 @@ static inline void ktext_replication_init(void)
 {
 }
 
+static inline void __kprobes ktext_replication_patch(u32 *tp,  __le32 insn)
+{
+}
+
+static inline void ktext_replication_patch_alternative(__le32 *src, int nr_inst)
+{
+}
+
 #endif
 
 #endif
diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index ea3f4104771d..6f17e2b4e1c3 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -15,6 +15,7 @@
 #include <asm/alternative.h>
 #include <asm/cpufeature.h>
 #include <asm/insn.h>
+#include <asm/ktext.h>
 #include <asm/module.h>
 #include <asm/sections.h>
 #include <asm/vdso.h>
@@ -174,6 +175,7 @@ static void __apply_alternatives(const struct alt_region *region,
 		alt_cb(alt, origptr, updptr, nr_inst);
 
 		if (!is_module) {
+			ktext_replication_patch_alternative(updptr, nr_inst);
 			clean_dcache_range_nopatch((u64)origptr,
 						   (u64)(origptr + nr_inst));
 		}
diff --git a/arch/arm64/kernel/patching.c b/arch/arm64/kernel/patching.c
index b4835f6d594b..627fff6ddda2 100644
--- a/arch/arm64/kernel/patching.c
+++ b/arch/arm64/kernel/patching.c
@@ -10,6 +10,7 @@
 #include <asm/fixmap.h>
 #include <asm/insn.h>
 #include <asm/kprobes.h>
+#include <asm/ktext.h>
 #include <asm/patching.h>
 #include <asm/sections.h>
 
@@ -115,9 +116,13 @@ int __kprobes aarch64_insn_patch_text_nosync(void *addr, u32 insn)
 		return -EINVAL;
 
 	ret = aarch64_insn_write(tp, insn);
-	if (ret == 0)
+	if (ret == 0) {
+		/* Also patch the other nodes */
+		ktext_replication_patch(tp, cpu_to_le32(insn));
+
 		caches_clean_inval_pou((uintptr_t)tp,
 				     (uintptr_t)tp + AARCH64_INSN_SIZE);
+	}
 
 	return ret;
 }
diff --git a/arch/arm64/mm/ktext.c b/arch/arm64/mm/ktext.c
index 4c803b89fcfe..04b5ceddae4e 100644
--- a/arch/arm64/mm/ktext.c
+++ b/arch/arm64/mm/ktext.c
@@ -3,8 +3,10 @@
  * Copyright (C) 2022, Oracle and/or its affiliates.
  */
 
+#include <linux/kallsyms.h>
 #include <linux/kernel.h>
 #include <linux/memblock.h>
+#include <linux/mm.h>
 #include <linux/numa.h>
 #include <linux/pgtable.h>
 #include <linux/string.h>
@@ -15,6 +17,62 @@
 
 static void *kernel_texts[MAX_NUMNODES];
 
+void __kprobes ktext_replication_patch(u32 *tp, __le32 insn)
+{
+	unsigned long offset;
+	int nid, this_nid;
+	__le32 *p;
+
+	if (!is_kernel_text((unsigned long)tp))
+		return;
+
+	offset = (unsigned long)tp - (unsigned long)_stext;
+
+	this_nid = numa_node_id();
+	if (this_nid) {
+		/* The cache maintenance by aarch64_insn_patch_text_nosync()
+		 * will occur on this node. We need it to occur on node 0.
+		 */
+		p = (void *)lm_alias(_stext) + offset;
+		caches_clean_inval_pou((u64)p, (u64)p + AARCH64_INSN_SIZE);
+	}
+
+	for_each_node(nid) {
+		if (!kernel_texts[nid])
+			continue;
+
+		p = kernel_texts[nid] + offset;
+		WRITE_ONCE(*p, insn);
+		caches_clean_inval_pou((u64)p, (u64)p + AARCH64_INSN_SIZE);
+	}
+}
+
+/* Copy the patched alternative from the node0 image to the other
+ * nodes. src is the node 0 linear-mapping address.
+ */
+void ktext_replication_patch_alternative(__le32 *src, int nr_inst)
+{
+	unsigned long offset;
+	size_t size;
+	int nid;
+	__le32 *p;
+
+	offset = (unsigned long)src - (unsigned long)lm_alias(_stext);
+	if (offset >= _etext - _stext)
+		return;
+
+	size = AARCH64_INSN_SIZE * nr_inst;
+
+	for_each_node(nid) {
+		if (!kernel_texts[nid])
+			continue;
+
+		p = kernel_texts[nid] + offset;
+		memcpy(p, src, size);
+		clean_dcache_range_nopatch((u64)p, (u64)p + size);
+	}
+}
+
 /* Allocate memory for the replicated kernel texts. */
 void __init ktext_replication_init(void)
 {
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* [PATCH v3 08/19] arm64: text replication: add node 0 page table definitions
  2024-01-17  8:53 [PATCH v3 00/19] arm64 kernel text replication Hao Jia
                   ` (6 preceding siblings ...)
  2024-01-17  8:53 ` [PATCH v3 07/19] arm64: text replication: add node text patching Hao Jia
@ 2024-01-17  8:53 ` Hao Jia
  2024-01-17  8:53 ` [PATCH v3 09/19] arm64: text replication: add swapper page directory helpers Hao Jia
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Hao Jia @ 2024-01-17  8:53 UTC (permalink / raw)
  To: mark.rutland, rmk+kernel, catalin.marinas, corbet, will, willy
  Cc: linux-arm-kernel, linux-doc

From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>

Add a struct definition for the level zero page table group (the
optional trampoline page tables, reserved page tables, and swapper page
tables).

Add a symbol and extern declaration for the node 0 page table group.

Add an array of pointers to per-node page tables, which will default to
using the node 0 page table group.
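
As a rough userspace illustration of the "default to node 0" idea (not the
kernel code itself), the sketch below groups two directories in one struct and
points every node's slot at a single shared instance. All demo_* names and the
DEMO_* sizes are hypothetical stand-ins; the range designator is a GNU C
extension, so GCC or Clang is assumed.

```c
#include <assert.h>

#define DEMO_MAX_NODES 4	/* hypothetical stand-in for MAX_NUMNODES */
#define DEMO_PTRS      8	/* hypothetical stand-in for PTRS_PER_PGD */

typedef unsigned long demo_pgd_t;

/* Keeps related level 0 directories together, like struct pgtables. */
struct demo_pgtables {
	demo_pgd_t reserved_pg_dir[DEMO_PTRS];
	demo_pgd_t swapper_pg_dir[DEMO_PTRS];
};

static struct demo_pgtables demo_node0;

/* Every node initially shares the node 0 group (GNU range designator). */
static struct demo_pgtables *demo_tables[DEMO_MAX_NODES] = {
	[0 ... DEMO_MAX_NODES - 1] = &demo_node0,
};

/* Returns nonzero while node nid still uses the node 0 tables. */
static int demo_uses_node0(int nid)
{
	assert(nid >= 0 && nid < DEMO_MAX_NODES);
	return demo_tables[nid] == &demo_node0;
}
```

With this layout, code that looks up a node's tables works unchanged before
any per-node copy exists, which is presumably why the array is pre-populated
rather than left NULL.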

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
 arch/arm64/include/asm/pgtable.h | 14 ++++++++++++++
 arch/arm64/kernel/vmlinux.lds.S  |  3 +++
 arch/arm64/mm/ktext.c            |  4 ++++
 3 files changed, 21 insertions(+)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 97d2127d64eb..0eb71b2b1bd2 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -623,6 +623,20 @@ extern pgd_t idmap_pg_dir[PTRS_PER_PGD];
 extern pgd_t tramp_pg_dir[PTRS_PER_PGD];
 extern pgd_t reserved_pg_dir[PTRS_PER_PGD];
 
+struct pgtables {
+#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
+	pgd_t tramp_pg_dir[PTRS_PER_PGD];
+#endif
+	pgd_t reserved_pg_dir[PTRS_PER_PGD];
+	pgd_t swapper_pg_dir[PTRS_PER_PGD];
+};
+
+extern struct pgtables pgtable_node0;
+
+#ifdef CONFIG_REPLICATE_KTEXT
+extern struct pgtables *pgtables[MAX_NUMNODES];
+#endif
+
 extern void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd);
 
 static inline bool in_swapper_pgdir(void *addr)
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 3cd7e76cc562..d3c7ed76adbf 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -212,6 +212,9 @@ SECTIONS
 	idmap_pg_dir = .;
 	. += PAGE_SIZE;
 
+	/* pgtable struct - covers the tramp, reserved and swapper pgdirs */
+	pgtable_node0 = .;
+
 #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
 	tramp_pg_dir = .;
 	. += PAGE_SIZE;
diff --git a/arch/arm64/mm/ktext.c b/arch/arm64/mm/ktext.c
index 04b5ceddae4e..48d7943d6907 100644
--- a/arch/arm64/mm/ktext.c
+++ b/arch/arm64/mm/ktext.c
@@ -15,6 +15,10 @@
 #include <asm/ktext.h>
 #include <asm/memory.h>
 
+struct pgtables *pgtables[MAX_NUMNODES] = {
+	[0 ... MAX_NUMNODES - 1] = &pgtable_node0,
+};
+
 static void *kernel_texts[MAX_NUMNODES];
 
 void __kprobes ktext_replication_patch(u32 *tp, __le32 insn)
-- 
2.20.1



* [PATCH v3 09/19] arm64: text replication: add swapper page directory helpers
  2024-01-17  8:53 [PATCH v3 00/19] arm64 kernel text replication Hao Jia
                   ` (7 preceding siblings ...)
  2024-01-17  8:53 ` [PATCH v3 08/19] arm64: text replication: add node 0 page table definitions Hao Jia
@ 2024-01-17  8:53 ` Hao Jia
  2024-01-17  8:53 ` [PATCH v3 10/19] arm64: text replication: create per-node kernel page tables Hao Jia
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Hao Jia @ 2024-01-17  8:53 UTC (permalink / raw)
  To: mark.rutland, rmk+kernel, catalin.marinas, corbet, will, willy
  Cc: linux-arm-kernel, linux-doc

From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>

Add a series of helpers for the swapper page directories: one set
returns the directory for the calling CPU's NUMA node, and the other
takes the NUMA node number.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
 arch/arm64/include/asm/pgtable.h | 19 +++++++++++++++++++
 arch/arm64/kernel/hibernate.c    |  2 +-
 arch/arm64/mm/ktext.c            | 20 ++++++++++++++++++++
 3 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 0eb71b2b1bd2..62a9d3e11fe1 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -635,6 +635,25 @@ extern struct pgtables pgtable_node0;
 
 #ifdef CONFIG_REPLICATE_KTEXT
 extern struct pgtables *pgtables[MAX_NUMNODES];
+
+pgd_t *swapper_pg_dir_node(void);
+phys_addr_t __swapper_pg_dir_node_phys(int nid);
+phys_addr_t swapper_pg_dir_node_phys(void);
+#else
+static inline pgd_t *swapper_pg_dir_node(void)
+{
+	return swapper_pg_dir;
+}
+
+static inline phys_addr_t __swapper_pg_dir_node_phys(int nid)
+{
+	return __pa_symbol(swapper_pg_dir);
+}
+
+static inline phys_addr_t swapper_pg_dir_node_phys(void)
+{
+	return __pa_symbol(swapper_pg_dir);
+}
 #endif
 
 extern void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd);
diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
index 02870beb271e..be69515da802 100644
--- a/arch/arm64/kernel/hibernate.c
+++ b/arch/arm64/kernel/hibernate.c
@@ -113,7 +113,7 @@ int arch_hibernation_header_save(void *addr, unsigned int max_size)
 		return -EOVERFLOW;
 
 	arch_hdr_invariants(&hdr->invariants);
-	hdr->ttbr1_el1		= __pa_symbol(swapper_pg_dir);
+	hdr->ttbr1_el1		= swapper_pg_dir_node_phys();
 	hdr->reenter_kernel	= _cpu_resume;
 
 	/* We can't use __hyp_get_vectors() because kvm may still be loaded */
diff --git a/arch/arm64/mm/ktext.c b/arch/arm64/mm/ktext.c
index 48d7943d6907..7b9a1f1b12a1 100644
--- a/arch/arm64/mm/ktext.c
+++ b/arch/arm64/mm/ktext.c
@@ -21,6 +21,26 @@ struct pgtables *pgtables[MAX_NUMNODES] = {
 
 static void *kernel_texts[MAX_NUMNODES];
 
+static pgd_t *__swapper_pg_dir_node(int nid)
+{
+	return pgtables[nid]->swapper_pg_dir;
+}
+
+pgd_t *swapper_pg_dir_node(void)
+{
+	return __swapper_pg_dir_node(numa_node_id());
+}
+
+phys_addr_t __swapper_pg_dir_node_phys(int nid)
+{
+	return __pa(__swapper_pg_dir_node(nid));
+}
+
+phys_addr_t swapper_pg_dir_node_phys(void)
+{
+	return __swapper_pg_dir_node_phys(numa_node_id());
+}
+
 void __kprobes ktext_replication_patch(u32 *tp, __le32 insn)
 {
 	unsigned long offset;
-- 
2.20.1



* [PATCH v3 10/19] arm64: text replication: create per-node kernel page tables
  2024-01-17  8:53 [PATCH v3 00/19] arm64 kernel text replication Hao Jia
                   ` (8 preceding siblings ...)
  2024-01-17  8:53 ` [PATCH v3 09/19] arm64: text replication: add swapper page directory helpers Hao Jia
@ 2024-01-17  8:53 ` Hao Jia
  2024-01-17  8:53 ` [PATCH v3 11/19] arm64: text replication: boot secondary CPUs with appropriate TTBR1 Hao Jia
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Hao Jia @ 2024-01-17  8:53 UTC (permalink / raw)
  To: mark.rutland, rmk+kernel, catalin.marinas, corbet, will, willy
  Cc: linux-arm-kernel, linux-doc

From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>

Allocate the level 0 page tables for the per-node kernel text
replication, but copy all level 0 table entries from the NUMA node 0
table. Therefore, for the time being, each node's level 0 page tables
will contain identical entries, and thus other nodes will continue
to use the node 0 kernel text.

Since the level 0 page tables can be updated at runtime to add entries
for vmalloc and module space, propagate these updates to the other
swapper page tables. The exception is if we see an update for the
level 0 entry which points to the kernel mapping.
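
The propagation rule can be modelled in plain userspace C. This is only a
sketch under assumed sizes: demo_swapper, DEMO_KIDX and demo_set_pgd() are
hypothetical names, and the real code additionally warns when the kernel
entry is touched.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NODES 3
#define DEMO_PTRS  8
#define DEMO_KIDX  5	/* hypothetical index of the entry mapping the kernel */

typedef unsigned long demo_pgd_t;

/* One level 0 table per node; row 0 plays the role of node 0's table. */
static demo_pgd_t demo_swapper[DEMO_NODES][DEMO_PTRS];

/*
 * Update a level 0 entry in node 0's table and mirror it to the other
 * nodes, except for the entry covering the kernel image, which each
 * replica keeps for itself.
 */
static void demo_set_pgd(size_t idx, demo_pgd_t val)
{
	assert(idx < DEMO_PTRS);

	demo_swapper[0][idx] = val;
	if (idx == DEMO_KIDX)
		return;

	for (int nid = 1; nid < DEMO_NODES; nid++)
		demo_swapper[nid][idx] = val;
}
```

Calling demo_set_pgd(2, 42) leaves entry 2 identical in all three tables,
while a write to DEMO_KIDX changes only row 0.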

We also need to set up a copy of the trampoline page tables, as
the assembly code relies on the two page tables being a fixed offset
apart.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
 arch/arm64/include/asm/ktext.h | 12 ++++++++++
 arch/arm64/mm/ktext.c          | 42 +++++++++++++++++++++++++++++++++-
 arch/arm64/mm/mmu.c            |  5 ++++
 3 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/ktext.h b/arch/arm64/include/asm/ktext.h
index 289e11289c06..386f9812d3c1 100644
--- a/arch/arm64/include/asm/ktext.h
+++ b/arch/arm64/include/asm/ktext.h
@@ -7,11 +7,15 @@
 
 #include <linux/kprobes.h>
 
+#include <asm/pgtable-types.h>
+
 #ifdef CONFIG_REPLICATE_KTEXT
 
 void ktext_replication_init(void);
 void __kprobes ktext_replication_patch(u32 *tp,  __le32 insn);
 void ktext_replication_patch_alternative(__le32 *src, int nr_inst);
+void ktext_replication_set_swapper_pgd(pgd_t *pgdp, pgd_t pgd);
+void ktext_replication_init_tramp(void);
 
 #else
 
@@ -27,6 +31,14 @@ static inline void ktext_replication_patch_alternative(__le32 *src, int nr_inst)
 {
 }
 
+static inline void ktext_replication_set_swapper_pgd(pgd_t *pgdp, pgd_t pgd)
+{
+}
+
+static inline void ktext_replication_init_tramp(void)
+{
+}
+
 #endif
 
 #endif
diff --git a/arch/arm64/mm/ktext.c b/arch/arm64/mm/ktext.c
index 7b9a1f1b12a1..9efd21eb3308 100644
--- a/arch/arm64/mm/ktext.c
+++ b/arch/arm64/mm/ktext.c
@@ -14,6 +14,7 @@
 #include <asm/cacheflush.h>
 #include <asm/ktext.h>
 #include <asm/memory.h>
+#include <asm/pgalloc.h>
 
 struct pgtables *pgtables[MAX_NUMNODES] = {
 	[0 ... MAX_NUMNODES - 1] = &pgtable_node0,
@@ -97,7 +98,7 @@ void ktext_replication_patch_alternative(__le32 *src, int nr_inst)
 	}
 }
 
-/* Allocate memory for the replicated kernel texts. */
+/* Allocate page tables and memory for the replicated kernel texts. */
 void __init ktext_replication_init(void)
 {
 	size_t size = _etext - _stext;
@@ -128,5 +129,44 @@ void __init ktext_replication_init(void)
 		memcpy(kernel_texts[nid], _stext, size);
 		caches_clean_inval_pou((u64)kernel_texts[nid],
 				       (u64)kernel_texts[nid] + size);
+
+		/* Allocate the pagetables for this node */
+		pgtables[nid] = memblock_alloc_node(sizeof(*pgtables[0]),
+						    PGD_SIZE, nid);
+
+		/* Copy initial swapper page directory */
+		memcpy(pgtables[nid]->swapper_pg_dir, swapper_pg_dir, PGD_SIZE);
+	}
+}
+
+void ktext_replication_set_swapper_pgd(pgd_t *pgdp, pgd_t pgd)
+{
+	unsigned long idx = pgdp - swapper_pg_dir;
+	int nid;
+
+	if (WARN_ON_ONCE(idx >= PTRS_PER_PGD) ||
+	    WARN_ON_ONCE(idx == pgd_index((phys_addr_t)KERNEL_START)))
+		return;
+
+	for_each_node(nid) {
+		if (pgtables[nid]->swapper_pg_dir == swapper_pg_dir)
+			continue;
+
+		WRITE_ONCE(pgtables[nid]->swapper_pg_dir[idx], pgd);
+	}
+}
+
+#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
+void __init ktext_replication_init_tramp(void)
+{
+	int nid;
+
+	for_each_node(nid) {
+		/* Nothing to do for node 0 */
+		if (pgtables[nid]->tramp_pg_dir == tramp_pg_dir)
+			continue;
+
+		memcpy(pgtables[nid]->tramp_pg_dir, tramp_pg_dir, PGD_SIZE);
 	}
 }
+#endif
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 1ac7467d34c9..f3ec38d9e232 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -31,6 +31,7 @@
 #include <asm/fixmap.h>
 #include <asm/kasan.h>
 #include <asm/kernel-pgtable.h>
+#include <asm/ktext.h>
 #include <asm/sections.h>
 #include <asm/setup.h>
 #include <linux/sizes.h>
@@ -78,6 +79,7 @@ void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd)
 	pgd_t *fixmap_pgdp;
 
 	spin_lock(&swapper_pgdir_lock);
+	ktext_replication_set_swapper_pgd(pgdp, pgd);
 	fixmap_pgdp = pgd_set_fixmap(__pa_symbol(pgdp));
 	WRITE_ONCE(*fixmap_pgdp, pgd);
 	/*
@@ -695,6 +697,9 @@ static int __init map_entry_trampoline(void)
 		__set_fixmap(FIX_ENTRY_TRAMP_TEXT1 - i,
 			     pa_start + i * PAGE_SIZE, PAGE_KERNEL_RO);
 
+	/* Copy trampoline page tables to other numa nodes */
+	ktext_replication_init_tramp();
+
 	return 0;
 }
 core_initcall(map_entry_trampoline);
-- 
2.20.1



* [PATCH v3 11/19] arm64: text replication: boot secondary CPUs with appropriate TTBR1
  2024-01-17  8:53 [PATCH v3 00/19] arm64 kernel text replication Hao Jia
                   ` (9 preceding siblings ...)
  2024-01-17  8:53 ` [PATCH v3 10/19] arm64: text replication: create per-node kernel page tables Hao Jia
@ 2024-01-17  8:53 ` Hao Jia
  2024-01-17  8:53 ` [PATCH v3 12/19] arm64: text replication: update cnp support Hao Jia
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Hao Jia @ 2024-01-17  8:53 UTC (permalink / raw)
  To: mark.rutland, rmk+kernel, catalin.marinas, corbet, will, willy
  Cc: linux-arm-kernel, linux-doc

From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>

Arrange for secondary CPUs to boot with TTBR1 pointing at the
appropriate per-node copy of the kernel page tables for the CPU's NUMA
node.
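
A simplified userspace model of that hand-off is sketched below. The CPU to
node map, the table addresses and every demo_* name are invented for
illustration; the real code also cleans the data cache so that a CPU starting
with its MMU off can read the value.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NR_CPUS  4
#define DEMO_NR_NODES 2

/* Hypothetical topology: CPUs 0-1 on node 0, CPUs 2-3 on node 1. */
static const int demo_cpu_node[DEMO_NR_CPUS] = { 0, 0, 1, 1 };

/* Hypothetical physical addresses of each node's level 0 table. */
static const uint64_t demo_node_ttbr1[DEMO_NR_NODES] = {
	0x0000000080000000ull,
	0x0000000880000000ull,
};

/* Mailbox read by the incoming CPU, loosely modelled on secondary_data. */
struct demo_secondary_data {
	uint64_t ttbr1;
};

static struct demo_secondary_data demo_secondary;

/* Record which table base the given CPU should load when it starts. */
static void demo_prepare_cpu(unsigned int cpu)
{
	assert(cpu < DEMO_NR_CPUS);
	demo_secondary.ttbr1 = demo_node_ttbr1[demo_cpu_node[cpu]];
}
```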

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
 arch/arm64/include/asm/smp.h    | 1 +
 arch/arm64/kernel/asm-offsets.c | 1 +
 arch/arm64/kernel/head.S        | 3 ++-
 arch/arm64/kernel/smp.c         | 3 +++
 4 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index efb13112b408..a095616999a9 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -79,6 +79,7 @@ asmlinkage void secondary_start_kernel(void);
 struct secondary_data {
 	struct task_struct *task;
 	long status;
+	phys_addr_t ttbr1;
 };
 
 extern struct secondary_data secondary_data;
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 5ff1942b04fc..ce9d265bc099 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -121,6 +121,7 @@ int main(void)
   DEFINE(IRQ_CPUSTAT_SOFTIRQ_PENDING, offsetof(irq_cpustat_t, __softirq_pending));
   BLANK();
   DEFINE(CPU_BOOT_TASK,		offsetof(struct secondary_data, task));
+  DEFINE(CPU_BOOT_TTBR1,	offsetof(struct secondary_data, ttbr1));
   BLANK();
   DEFINE(FTR_OVR_VAL_OFFSET,	offsetof(struct arm64_ftr_override, val));
   DEFINE(FTR_OVR_MASK_OFFSET,	offsetof(struct arm64_ftr_override, mask));
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index cab7f91949d8..c21746685cdd 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -648,7 +648,8 @@ SYM_FUNC_START_LOCAL(secondary_startup)
 	ldr_l	x0, vabits_actual
 #endif
 	bl	__cpu_setup			// initialise processor
-	adrp	x1, swapper_pg_dir
+	adr_l	x1, secondary_data
+	ldr	x1, [x1, #CPU_BOOT_TTBR1]
 	adrp	x2, idmap_pg_dir
 	bl	__enable_mmu
 	ldr	x8, =__secondary_switched
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 4ced34f62dab..80a8e55e79b2 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -127,6 +127,9 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
 	 * page tables.
 	 */
 	secondary_data.task = idle;
+	secondary_data.ttbr1 = __swapper_pg_dir_node_phys(cpu_to_node(cpu));
+	dcache_clean_poc((uintptr_t)&secondary_data,
+			 (uintptr_t)&secondary_data + sizeof(secondary_data));
 	update_cpu_boot_status(CPU_MMU_OFF);
 
 	/* Now bring the CPU into our world */
-- 
2.20.1



* [PATCH v3 12/19] arm64: text replication: update cnp support
  2024-01-17  8:53 [PATCH v3 00/19] arm64 kernel text replication Hao Jia
                   ` (10 preceding siblings ...)
  2024-01-17  8:53 ` [PATCH v3 11/19] arm64: text replication: boot secondary CPUs with appropriate TTBR1 Hao Jia
@ 2024-01-17  8:53 ` Hao Jia
  2024-01-17  8:53 ` [PATCH v3 13/19] arm64: text replication: setup page tables for copied kernel Hao Jia
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Hao Jia @ 2024-01-17  8:53 UTC (permalink / raw)
  To: mark.rutland, rmk+kernel, catalin.marinas, corbet, will, willy
  Cc: linux-arm-kernel, linux-doc

From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>

Add changes for CNP (Common Not Private) support of kernel text
replication. Although text replication has only been tested on
dual-socket Ampere A1 systems, provided the different NUMA nodes
are not part of the same inner shareable domain, CNP should not
be a problem.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
 arch/arm64/include/asm/mmu_context.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 466797dcb5fc..4f78f4db5df4 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -187,7 +187,7 @@ static inline void __nocfi __cpu_replace_ttbr1(pgd_t *pgdp, pgd_t *idmap, bool c
 
 static inline void cpu_enable_swapper_cnp(void)
 {
-	__cpu_replace_ttbr1(lm_alias(swapper_pg_dir), idmap_pg_dir, true);
+	__cpu_replace_ttbr1_phys(swapper_pg_dir_node_phys(), idmap_pg_dir, true);
 }
 
 static inline void cpu_replace_ttbr1(pgd_t *pgdp, pgd_t *idmap)
-- 
2.20.1



* [PATCH v3 13/19] arm64: text replication: setup page tables for copied kernel
  2024-01-17  8:53 [PATCH v3 00/19] arm64 kernel text replication Hao Jia
                   ` (11 preceding siblings ...)
  2024-01-17  8:53 ` [PATCH v3 12/19] arm64: text replication: update cnp support Hao Jia
@ 2024-01-17  8:53 ` Hao Jia
  2024-01-17  8:53 ` [PATCH v3 14/19] arm64: text replication: include most of read-only data as well Hao Jia
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Hao Jia @ 2024-01-17  8:53 UTC (permalink / raw)
  To: mark.rutland, rmk+kernel, catalin.marinas, corbet, will, willy
  Cc: linux-arm-kernel, linux-doc

From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>

Set up page table entries in each non-boot NUMA node page table to
point at each node's own copy of the kernel text. This switches
each node to use its own unique copy of the kernel text.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
 arch/arm64/include/asm/ktext.h |  1 +
 arch/arm64/mm/ktext.c          |  8 +++++
 arch/arm64/mm/mmu.c            | 53 ++++++++++++++++++++++++++++------
 3 files changed, 53 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/ktext.h b/arch/arm64/include/asm/ktext.h
index 386f9812d3c1..6ece59ca90a2 100644
--- a/arch/arm64/include/asm/ktext.h
+++ b/arch/arm64/include/asm/ktext.h
@@ -16,6 +16,7 @@ void __kprobes ktext_replication_patch(u32 *tp,  __le32 insn);
 void ktext_replication_patch_alternative(__le32 *src, int nr_inst);
 void ktext_replication_set_swapper_pgd(pgd_t *pgdp, pgd_t pgd);
 void ktext_replication_init_tramp(void);
+void create_kernel_nid_map(pgd_t *pgdp, void *ktext);
 
 #else
 
diff --git a/arch/arm64/mm/ktext.c b/arch/arm64/mm/ktext.c
index 9efd21eb3308..6692759e78a8 100644
--- a/arch/arm64/mm/ktext.c
+++ b/arch/arm64/mm/ktext.c
@@ -136,6 +136,14 @@ void __init ktext_replication_init(void)
 
 		/* Copy initial swapper page directory */
 		memcpy(pgtables[nid]->swapper_pg_dir, swapper_pg_dir, PGD_SIZE);
+
+		/* Clear the kernel mapping */
+		memset(&pgtables[nid]->swapper_pg_dir[kidx], 0,
+		       sizeof(pgtables[nid]->swapper_pg_dir[kidx]));
+
+		/* Create kernel mapping pointing at our local copy */
+		create_kernel_nid_map(pgtables[nid]->swapper_pg_dir,
+				      kernel_texts[nid]);
 	}
 }
 
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index f3ec38d9e232..181d5339dd05 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -638,6 +638,16 @@ void mark_rodata_ro(void)
 	debug_checkwx();
 }
 
+static void __init create_kernel_mapping(pgd_t *pgdp, phys_addr_t pa_start,
+					 void *va_start, void *va_end,
+					 pgprot_t prot, int flags)
+{
+	size_t size = va_end - va_start;
+
+	__create_pgd_mapping(pgdp, pa_start, (unsigned long)va_start, size,
+			     prot, early_pgtable_alloc, flags);
+}
+
 static void __init map_kernel_segment(pgd_t *pgdp, void *va_start, void *va_end,
 				      pgprot_t prot, struct vm_struct *vma,
 				      int flags, unsigned long vm_flags)
@@ -648,8 +658,7 @@ static void __init map_kernel_segment(pgd_t *pgdp, void *va_start, void *va_end,
 	BUG_ON(!PAGE_ALIGNED(pa_start));
 	BUG_ON(!PAGE_ALIGNED(size));
 
-	__create_pgd_mapping(pgdp, pa_start, (unsigned long)va_start, size, prot,
-			     early_pgtable_alloc, flags);
+	create_kernel_mapping(pgdp, pa_start, va_start, va_end, prot, flags);
 
 	if (!(vm_flags & VM_NO_GUARD))
 		size += PAGE_SIZE;
@@ -721,14 +730,8 @@ static bool arm64_early_this_cpu_has_bti(void)
 						    ID_AA64PFR1_EL1_BT_SHIFT);
 }
 
-/*
- * Create fine-grained mappings for the kernel.
- */
-static void __init map_kernel(pgd_t *pgdp)
+static pgprot_t __init kernel_text_pgprot(void)
 {
-	static struct vm_struct vmlinux_text, vmlinux_rodata, vmlinux_inittext,
-				vmlinux_initdata, vmlinux_data;
-
 	/*
 	 * External debuggers may need to write directly to the text
 	 * mapping to install SW breakpoints. Allow this (only) when
@@ -744,6 +747,38 @@ static void __init map_kernel(pgd_t *pgdp)
 	if (arm64_early_this_cpu_has_bti())
 		text_prot = __pgprot_modify(text_prot, PTE_GP, PTE_GP);
 
+	return text_prot;
+}
+
+#ifdef CONFIG_REPLICATE_KTEXT
+void __init create_kernel_nid_map(pgd_t *pgdp, void *ktext)
+{
+	pgprot_t text_prot = kernel_text_pgprot();
+
+	create_kernel_mapping(pgdp, __pa(ktext), _stext, _etext, text_prot, 0);
+	create_kernel_mapping(pgdp, __pa_symbol(__start_rodata),
+			      __start_rodata, __inittext_begin,
+			      PAGE_KERNEL, NO_CONT_MAPPINGS);
+	create_kernel_mapping(pgdp, __pa_symbol(__inittext_begin),
+			      __inittext_begin, __inittext_end,
+			      text_prot, 0);
+	create_kernel_mapping(pgdp, __pa_symbol(__initdata_begin),
+			      __initdata_begin, __initdata_end,
+			      PAGE_KERNEL, 0);
+	create_kernel_mapping(pgdp, __pa_symbol(_data), _data, _end,
+			      PAGE_KERNEL, 0);
+}
+#endif
+
+/*
+ * Create fine-grained mappings for the kernel.
+ */
+static void __init map_kernel(pgd_t *pgdp)
+{
+	static struct vm_struct vmlinux_text, vmlinux_rodata, vmlinux_inittext,
+				vmlinux_initdata, vmlinux_data;
+	pgprot_t text_prot = kernel_text_pgprot();
+
 	/*
 	 * Only rodata will be remapped with different permissions later on,
 	 * all other segments are allowed to use contiguous mappings.
-- 
2.20.1



* [PATCH v3 14/19] arm64: text replication: include most of read-only data as well
  2024-01-17  8:53 [PATCH v3 00/19] arm64 kernel text replication Hao Jia
                   ` (12 preceding siblings ...)
  2024-01-17  8:53 ` [PATCH v3 13/19] arm64: text replication: setup page tables for copied kernel Hao Jia
@ 2024-01-17  8:53 ` Hao Jia
  2024-01-17  8:53 ` [PATCH v3 15/19] arm64: text replication: early kernel option to enable replication Hao Jia
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Hao Jia @ 2024-01-17  8:53 UTC (permalink / raw)
  To: mark.rutland, rmk+kernel, catalin.marinas, corbet, will, willy
  Cc: linux-arm-kernel, linux-doc

From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>

Include as much of the read-only data in the replication as we can
without needing to move away from the generic RO_DATA() macro in
the linker script.

Unfortunately, the read-only data section is immediately followed
by the read-only after init data with no page alignment, which
means we can't have separate mappings for the read-only data
section and everything else. Changing that would mean replacing
the generic RO_DATA() macro which increases the maintenance burden.

However, this is likely not worth the effort as the majority of
read-only data will be covered.
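
The boundary arithmetic can be illustrated in userspace. This sketch assumes
a 4 KiB page size and uses made-up addresses; demo_align_down() and
demo_replicable_rodata() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u	/* assumed page size for the example */

/* Round addr down to a multiple of align, which must be a power of two. */
static uintptr_t demo_align_down(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return addr & ~(align - 1);
}

/*
 * Number of read-only data bytes that can be given a per-node mapping
 * when the write-once data starts at ro_after_init, which need not be
 * page aligned. Only whole pages before that point qualify.
 */
static uintptr_t demo_replicable_rodata(uintptr_t rodata_start,
					uintptr_t ro_after_init)
{
	uintptr_t ro_end = demo_align_down(ro_after_init, DEMO_PAGE_SIZE);

	return ro_end > rodata_start ? ro_end - rodata_start : 0;
}
```

For example, with read-only data starting at 0x10000 and the after-init data
at 0x12345, two full pages (0x2000 bytes) can be replicated and the partial
page from 0x12000 stays shared.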

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
 arch/arm64/mm/ktext.c |  2 +-
 arch/arm64/mm/mmu.c   | 21 ++++++++++++++++++---
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/mm/ktext.c b/arch/arm64/mm/ktext.c
index 6692759e78a8..6265a2db449b 100644
--- a/arch/arm64/mm/ktext.c
+++ b/arch/arm64/mm/ktext.c
@@ -101,7 +101,7 @@ void ktext_replication_patch_alternative(__le32 *src, int nr_inst)
 /* Allocate page tables and memory for the replicated kernel texts. */
 void __init ktext_replication_init(void)
 {
-	size_t size = _etext - _stext;
+	size_t size = __end_rodata - _stext;
 	int kidx = pgd_index((phys_addr_t)KERNEL_START);
 	int nid;
 
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 181d5339dd05..a4efc5015bee 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -753,11 +753,26 @@ static pgprot_t __init kernel_text_pgprot(void)
 #ifdef CONFIG_REPLICATE_KTEXT
 void __init create_kernel_nid_map(pgd_t *pgdp, void *ktext)
 {
+	phys_addr_t pa_ktext;
+	size_t ro_offset;
+	void *ro_end;
 	pgprot_t text_prot = kernel_text_pgprot();
 
-	create_kernel_mapping(pgdp, __pa(ktext), _stext, _etext, text_prot, 0);
-	create_kernel_mapping(pgdp, __pa_symbol(__start_rodata),
-			      __start_rodata, __inittext_begin,
+	pa_ktext = __pa(ktext);
+	ro_offset = __pa_symbol(__start_rodata) - __pa_symbol(_stext);
+	/*
+	 * We must not cover the read-only data after init, since this
+	 * is written to during boot, and thus must be shared between
+	 * the NUMA nodes.
+	 */
+	ro_end = PTR_ALIGN_DOWN((void *)__start_ro_after_init, PAGE_SIZE);
+
+	create_kernel_mapping(pgdp, pa_ktext, _stext, _etext, text_prot, 0);
+	create_kernel_mapping(pgdp, pa_ktext + ro_offset,
+			      __start_rodata, ro_end,
+			      PAGE_KERNEL, NO_CONT_MAPPINGS);
+	create_kernel_mapping(pgdp, __pa_symbol(ro_end),
+			      ro_end, __inittext_begin,
 			      PAGE_KERNEL, NO_CONT_MAPPINGS);
 	create_kernel_mapping(pgdp, __pa_symbol(__inittext_begin),
 			      __inittext_begin, __inittext_end,
-- 
2.20.1



* [PATCH v3 15/19] arm64: text replication: early kernel option to enable replication
  2024-01-17  8:53 [PATCH v3 00/19] arm64 kernel text replication Hao Jia
                   ` (13 preceding siblings ...)
  2024-01-17  8:53 ` [PATCH v3 14/19] arm64: text replication: include most of read-only data as well Hao Jia
@ 2024-01-17  8:53 ` Hao Jia
  2024-01-17  8:53 ` [PATCH v3 16/19] arm64: text replication: add Kconfig Hao Jia
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Hao Jia @ 2024-01-17  8:53 UTC (permalink / raw)
  To: mark.rutland, rmk+kernel, catalin.marinas, corbet, will, willy
  Cc: linux-arm-kernel, linux-doc

From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>

Provide an early kernel option "ktext=" which allows the kernel text
replication to be enabled. This takes a boolean argument.

The way this has been implemented means that we take all the same paths
through the kernel at runtime whether kernel text replication has been
enabled or not; this allows the performance effects of the code changes
to be evaluated separately from the effect of actually running with
replicated kernel text.
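
For readers unfamiliar with boolean early parameters, here is a userspace
sketch of the pattern. demo_parse_ktext() is a simplified, hypothetical
stand-in that accepts only a handful of spellings; it is not the kernel's
kstrtobool(), which recognises more forms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Off unless the option is given, matching the documented default. */
static bool demo_ktext_enabled;

/*
 * Parse the option value. Returns 0 and updates the flag on success,
 * or -1 (leaving the flag untouched) for unrecognised input.
 */
static int demo_parse_ktext(const char *str)
{
	if (str == NULL)
		return -1;

	if (!strcmp(str, "1") || !strcmp(str, "y") || !strcmp(str, "on")) {
		demo_ktext_enabled = true;
		return 0;
	}
	if (!strcmp(str, "0") || !strcmp(str, "n") || !strcmp(str, "off")) {
		demo_ktext_enabled = false;
		return 0;
	}
	return -1;
}
```

Keeping the flag unchanged on a parse error means a malformed value on the
command line falls back to whatever was set before, which here is the
disabled default.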

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
 .../admin-guide/kernel-parameters.txt          |  5 +++++
 arch/arm64/mm/ktext.c                          | 18 ++++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 6ee0f9a5da70..bace7bd404d3 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2544,6 +2544,11 @@
 			0: force disabled
 			1: force enabled
 
+	ktext=		[ARM64] Control kernel text replication on NUMA
+			machines. Default: disabled.
+			0: disable kernel text replication
+			1: enable kernel text replication
+
 	kunit.enable=	[KUNIT] Enable executing KUnit tests. Requires
 			CONFIG_KUNIT to be set to be fully enabled. The
 			default value can be overridden via
diff --git a/arch/arm64/mm/ktext.c b/arch/arm64/mm/ktext.c
index 6265a2db449b..3dde6e1d99d7 100644
--- a/arch/arm64/mm/ktext.c
+++ b/arch/arm64/mm/ktext.c
@@ -98,6 +98,21 @@ void ktext_replication_patch_alternative(__le32 *src, int nr_inst)
 	}
 }
 
+static bool ktext_enabled;
+
+static int __init parse_ktext(char *str)
+{
+	bool enabled;
+	int ret = kstrtobool(str, &enabled);
+
+	if (ret)
+		return ret;
+
+	ktext_enabled = enabled;
+	return 0;
+}
+early_param("ktext", parse_ktext);
+
 /* Allocate page tables and memory for the replicated kernel texts. */
 void __init ktext_replication_init(void)
 {
@@ -119,6 +134,9 @@ void __init ktext_replication_init(void)
 		return;
 	}
 
+	if (!ktext_enabled)
+		return;
+
 	for_each_node(nid) {
 		/* Nothing to do for node 0 */
 		if (!nid)
-- 
2.20.1



* [PATCH v3 16/19] arm64: text replication: add Kconfig
  2024-01-17  8:53 [PATCH v3 00/19] arm64 kernel text replication Hao Jia
                   ` (14 preceding siblings ...)
  2024-01-17  8:53 ` [PATCH v3 15/19] arm64: text replication: early kernel option to enable replication Hao Jia
@ 2024-01-17  8:53 ` Hao Jia
  2024-01-17  8:53 ` [PATCH v3 17/19] arm64: text replication: fix compilation warning Hao Jia
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Hao Jia @ 2024-01-17  8:53 UTC (permalink / raw)
  To: mark.rutland, rmk+kernel, catalin.marinas, corbet, will, willy
  Cc: linux-arm-kernel, linux-doc

From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>

Add the Kconfig symbol for kernel text replication. This unfortunately
requires KASAN and kernel text randomisation options to be disabled at
the moment.

Currently, we do not support the combination of CONFIG_ARM64_16K_PAGES
and CONFIG_PGTABLE_LEVELS=4, because PGDIR_SIZE is then 128T, which is
too large to allow the kernel text to exclusively occupy an L0 page
table entry.
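
For reference, the PGDIR_SIZE figures follow from simple arithmetic. The
sketch below is a user-space illustration only: the helper names
pgdir_shift(), pgdir_size() and ptrs_per_pgd() are made up for this note
and are not kernel APIs. It assumes 8-byte table entries, so each level
resolves (PAGE_SHIFT - 3) address bits. With 16K pages, 4 levels and
48-bit VAs it yields a PGDIR_SIZE of 128T and only two top-level
entries, which is presumably why one entry cannot be dedicated to the
kernel text in that configuration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Address bits covered by one top-level (L0/PGD) entry: the page offset
 * plus (page_shift - 3) bits for each of the (levels - 1) lower levels.
 */
static unsigned int pgdir_shift(unsigned int page_shift, unsigned int levels)
{
	assert(page_shift == 12 || page_shift == 14 || page_shift == 16);
	assert(levels >= 2 && levels <= 4);
	return (page_shift - 3) * (levels - 1) + page_shift;
}

/* Bytes of virtual address space mapped by one top-level entry. */
static uint64_t pgdir_size(unsigned int page_shift, unsigned int levels)
{
	return UINT64_C(1) << pgdir_shift(page_shift, levels);
}

/* Number of top-level entries for a given virtual address width. */
static uint64_t ptrs_per_pgd(unsigned int va_bits, unsigned int page_shift,
			     unsigned int levels)
{
	unsigned int shift = pgdir_shift(page_shift, levels);

	assert(va_bits > shift);
	return UINT64_C(1) << (va_bits - shift);
}
```

For example, pgdir_size(14, 4) evaluates to 128T and
ptrs_per_pgd(48, 14, 4) to 2, whereas 4K pages with 4 levels give 512G
per entry and 512 entries.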

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
 arch/arm64/Kconfig | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 8f6cf1221b6a..a9dfe6e0006a 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -163,7 +163,7 @@ config ARM64
 	select HAVE_ARCH_HUGE_VMAP
 	select HAVE_ARCH_JUMP_LABEL
 	select HAVE_ARCH_JUMP_LABEL_RELATIVE
-	select HAVE_ARCH_KASAN if !(ARM64_16K_PAGES && ARM64_VA_BITS_48)
+	select HAVE_ARCH_KASAN if !(ARM64_16K_PAGES && ARM64_VA_BITS_48 && !REPLICATE_KTEXT)
 	select HAVE_ARCH_KASAN_VMALLOC if HAVE_ARCH_KASAN
 	select HAVE_ARCH_KASAN_SW_TAGS if HAVE_ARCH_KASAN
 	select HAVE_ARCH_KASAN_HW_TAGS if (HAVE_ARCH_KASAN && ARM64_MTE)
@@ -1443,6 +1443,13 @@ config NODES_SHIFT
 	  Specify the maximum number of NUMA Nodes available on the target
 	  system.  Increases memory reserved to accommodate various tables.
 
+config REPLICATE_KTEXT
+	bool "Replicate kernel text across numa nodes"
+	depends on NUMA && !(ARM64_16K_PAGES && ARM64_VA_BITS_48)
+	help
+	  Say Y here to enable replicating the kernel text across multiple
+	  nodes in a NUMA cluster.  This trades memory for speed.
+
 source "kernel/Kconfig.hz"
 
 config ARCH_SPARSEMEM_ENABLE
@@ -2161,6 +2168,7 @@ config RELOCATABLE
 
 config RANDOMIZE_BASE
 	bool "Randomize the address of the kernel image"
+	depends on !REPLICATE_KTEXT
 	select RELOCATABLE
 	help
 	  Randomizes the virtual address at which the kernel image is
-- 
2.20.1



* [PATCH v3 17/19] arm64: text replication: fix compilation warning
  2024-01-17  8:53 [PATCH v3 00/19] arm64 kernel text replication Hao Jia
                   ` (15 preceding siblings ...)
  2024-01-17  8:53 ` [PATCH v3 16/19] arm64: text replication: add Kconfig Hao Jia
@ 2024-01-17  8:53 ` Hao Jia
  2024-01-17  8:53 ` [PATCH v3 18/19] arm64: text replication: support more page sizes and levels Hao Jia
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Hao Jia @ 2024-01-17  8:53 UTC (permalink / raw)
  To: mark.rutland, rmk+kernel, catalin.marinas, corbet, will, willy
  Cc: linux-arm-kernel, linux-doc, Hao Jia

Fix the following compilation warnings, which appear when
CONFIG_ARM64_64K_PAGES is configured:

./arch/arm64/include/asm/memory.h:56:6: warning: "CONFIG_ARM64_4K_PAGES" is not defined, evaluates to 0 [-Wundef]
 #if (CONFIG_ARM64_4K_PAGES && CONFIG_PGTABLE_LEVELS < 4) || \
      ^~~~~~~~~~~~~~~~~~~~~
./arch/arm64/include/asm/memory.h:57:6: warning: "CONFIG_ARM64_16K_PAGES" is not defined, evaluates to 0 [-Wundef]
     (CONFIG_ARM64_16K_PAGES && CONFIG_PGTABLE_LEVELS < 3) || \
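
The warning arises because an undefined identifier in an #if expression
silently evaluates to 0, which -Wundef reports. Testing with defined()
avoids evaluating the symbol at all. The stand-alone sketch below
simulates a 64K-page build by defining the symbols by hand, and shows a
compact single-conditional form. This is an illustrative alternative
under the assumption that exactly one page-size symbol is defined, not
the form used in the diff that follows; KIMAGE_OFFSET_KIND and the enum
are hypothetical names.

```c
/* Simulate what Kconfig would provide for a 64K-page, 3-level build. */
#define CONFIG_ARM64_64K_PAGES 1
#define CONFIG_PGTABLE_LEVELS 3

enum kimage_offset_kind {
	KIMAGE_OFFSET_IS_MODULES_VSIZE,
	KIMAGE_OFFSET_IS_PGDIR_SIZE,
};

/*
 * "#if CONFIG_ARM64_4K_PAGES && ..." would trigger -Wundef here, since
 * CONFIG_ARM64_4K_PAGES is not defined. defined() only asks whether the
 * symbol exists, so no undefined identifier is ever evaluated.
 */
#if (defined(CONFIG_ARM64_4K_PAGES) && CONFIG_PGTABLE_LEVELS < 4) || \
    (defined(CONFIG_ARM64_16K_PAGES) && CONFIG_PGTABLE_LEVELS < 3) || \
    (defined(CONFIG_ARM64_64K_PAGES) && CONFIG_PGTABLE_LEVELS < 2)
#define KIMAGE_OFFSET_KIND KIMAGE_OFFSET_IS_MODULES_VSIZE
#else
#define KIMAGE_OFFSET_KIND KIMAGE_OFFSET_IS_PGDIR_SIZE
#endif
```

With the simulated configuration above, the #else branch is taken and
the PGDIR_SIZE variant is selected.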

Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
---
 arch/arm64/include/asm/memory.h | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index c73820fb36a3..2652ce170550 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -53,11 +53,19 @@
  * CONFIG_ARM64_16K_PAGES, PGDIR_SIZE is 32M, 64G or 128T
  * CONFIG_ARM64_64K_PAGES, PGDIR_SIZE is 512M or 4T
  */
-#if (CONFIG_ARM64_4K_PAGES && CONFIG_PGTABLE_LEVELS < 4) || \
-    (CONFIG_ARM64_16K_PAGES && CONFIG_PGTABLE_LEVELS < 3) || \
-    (CONFIG_ARM64_64K_PAGES && CONFIG_PGTABLE_LEVELS < 2)
+#if defined(CONFIG_ARM64_4K_PAGES) && CONFIG_PGTABLE_LEVELS < 4
 #define KIMAGE_OFFSET		MODULES_VSIZE
-#else
+#elif defined(CONFIG_ARM64_4K_PAGES)
+#define KIMAGE_OFFSET		PGDIR_SIZE
+#endif
+#if defined(CONFIG_ARM64_16K_PAGES) && CONFIG_PGTABLE_LEVELS < 3
+#define KIMAGE_OFFSET		MODULES_VSIZE
+#elif defined(CONFIG_ARM64_16K_PAGES)
+#define KIMAGE_OFFSET		PGDIR_SIZE
+#endif
+#if defined(CONFIG_ARM64_64K_PAGES) && CONFIG_PGTABLE_LEVELS < 2
+#define KIMAGE_OFFSET		MODULES_VSIZE
+#elif defined(CONFIG_ARM64_64K_PAGES)
 #define KIMAGE_OFFSET		PGDIR_SIZE
 #endif
 #define KIMAGE_VADDR		(_PAGE_END(VA_BITS_MIN) + KIMAGE_OFFSET)
-- 
2.20.1



* [PATCH v3 18/19] arm64: text replication: support more page sizes and levels
  2024-01-17  8:53 [PATCH v3 00/19] arm64 kernel text replication Hao Jia
                   ` (16 preceding siblings ...)
  2024-01-17  8:53 ` [PATCH v3 17/19] arm64: text replication: fix compilation warning Hao Jia
@ 2024-01-17  8:53 ` Hao Jia
  2024-01-17  8:53 ` [PATCH v3 19/19] arm64: text replication: keep modules inside module region when REPLICATE_KTEXT is enabled Hao Jia
  2024-01-17  9:41 ` [PATCH v3 00/19] arm64 kernel text replication Russell King (Oracle)
  19 siblings, 0 replies; 23+ messages in thread
From: Hao Jia @ 2024-01-17  8:53 UTC (permalink / raw)
  To: mark.rutland, rmk+kernel, catalin.marinas, corbet, will, willy
  Cc: linux-arm-kernel, linux-doc, Hao Jia

Previously, the page table group variable (pgtables) of each node
pointed to pgtable_node0 by default. This only worked properly with
the 4K page size and 4-level page table configuration, because in that
configuration the offset between the members of struct pgtables is
exactly equal to the offset between the *_pg_dir symbols defined in
vmlinux.lds.S. It does not work for other page size configurations.

Therefore, change the members of struct pgtables into pointers that
point to the global *_pg_dir symbols defined in vmlinux.lds.S by
default, so that the code no longer relies on the offsets being equal.
Memory for each member is allocated separately and the pointers are
reassigned in ktext_replication_init(). This allows us to support more
page size and page table level configurations.
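
The idea can be sketched in user space as follows. This is a simplified
illustration, not the kernel code: MAX_NODES, the pgd_t typedef, the
PTRS_PER_PGD value and both helper functions are stand-ins invented for
this note, and calloc() takes the place of memblock_alloc_node().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NODES	4	/* hypothetical stand-in for MAX_NUMNODES */
#define PTRS_PER_PGD	512	/* hypothetical; depends on the real config */

typedef unsigned long pgd_t;	/* simplified stand-in for the kernel type */

/* Stand-ins for the linker-script symbols; their placement no longer matters. */
static pgd_t reserved_pg_dir[PTRS_PER_PGD];
static pgd_t swapper_pg_dir[PTRS_PER_PGD];

struct pgtables {
	pgd_t *reserved_pg_dir;
	pgd_t *swapper_pg_dir;
};

static struct pgtables pgtables[MAX_NODES];

/* By default every node shares the boot-time tables. */
static void pgtables_use_boot_tables(void)
{
	int nid;

	for (nid = 0; nid < MAX_NODES; nid++)
		pgtables[nid] = (struct pgtables){ reserved_pg_dir, swapper_pg_dir };
}

/* Give a non-zero node its own tables, seeded from the boot swapper table. */
static int pgtables_replicate(int nid)
{
	pgd_t *swapper, *reserved;

	assert(nid > 0 && nid < MAX_NODES);
	swapper = calloc(PTRS_PER_PGD, sizeof(*swapper));
	reserved = calloc(PTRS_PER_PGD, sizeof(*reserved));
	if (!swapper || !reserved) {
		free(swapper);
		free(reserved);
		return -1;
	}
	memcpy(swapper, swapper_pg_dir, sizeof(swapper_pg_dir));
	pgtables[nid].swapper_pg_dir = swapper;
	pgtables[nid].reserved_pg_dir = reserved;
	return 0;
}
```

Because each member is an independent pointer, nothing depends on how
far apart the boot tables sit in the image or on how large each table
is.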

In addition, the kernel text size is not always smaller than PGDIR_SIZE
(for example, PGDIR_SIZE is 32M when 16K page size and 2-level page
tables are configured), so the kernel text may occupy more than one L0
page table entry. ktext_replication_init() therefore has to clear every
pgdir entry covering the kernel mapping in a loop.
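
A minimal user-space sketch of that loop, assuming 16K pages with 2
levels and a 36-bit VA (PGDIR_SHIFT of 25, 2048 top-level entries).
The uint64_t table and clear_kernel_pgd_range() are illustrative
stand-ins, not kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PGDIR_SHIFT	25	/* 16K pages, 2 levels: PGDIR_SIZE is 32M */
#define PTRS_PER_PGD	2048	/* 1 << (36 - PGDIR_SHIFT) for a 36-bit VA */

/* Index of the top-level entry that translates addr. */
static size_t pgd_index(uint64_t addr)
{
	return (addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
}

/*
 * Clear every top-level entry from the one covering start up to and
 * including the one covering end. Returns the number of entries cleared.
 */
static size_t clear_kernel_pgd_range(uint64_t *pgd, uint64_t start, uint64_t end)
{
	size_t first = pgd_index(start);
	size_t last = pgd_index(end);
	size_t i;

	assert(first <= last);
	for (i = first; i <= last; i++)
		pgd[i] = 0;
	return last - first + 1;
}
```

With these parameters an image of roughly 40M cannot fit in a single
32M entry, so at least two entries have to be cleared.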

However, we still cannot support the configuration of 16K page size and
4-level page tables. In this configuration PGDIR_SIZE is 128T, which is
too large to allow the kernel text to exclusively occupy at least one L0
page table entry.

Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
---
 arch/arm64/include/asm/pgtable.h | 12 +++-----
 arch/arm64/kernel/vmlinux.lds.S  |  3 --
 arch/arm64/mm/ktext.c            | 53 ++++++++++++++++++++------------
 3 files changed, 38 insertions(+), 30 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 62a9d3e11fe1..e0b428e780c7 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -21,7 +21,7 @@
  * VMALLOC_END: extends to the available space below vmemmap, PCI I/O space
  *	and fixed mappings
  */
-#define VMALLOC_START		(MODULES_END + PGDIR_SIZE)
+#define VMALLOC_START		(MODULES_END + KIMAGE_OFFSET)
 #define VMALLOC_END		(VMEMMAP_START - SZ_256M)
 
 #define vmemmap			((struct page *)VMEMMAP_START - (memstart_addr >> PAGE_SHIFT))
@@ -625,17 +625,13 @@ extern pgd_t reserved_pg_dir[PTRS_PER_PGD];
 
 struct pgtables {
 #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
-	pgd_t tramp_pg_dir[PTRS_PER_PGD];
+	pgd_t *tramp_pg_dir;
 #endif
-	pgd_t reserved_pg_dir[PTRS_PER_PGD];
-	pgd_t swapper_pg_dir[PTRS_PER_PGD];
+	pgd_t *reserved_pg_dir;
+	pgd_t *swapper_pg_dir;
 };
 
-extern struct pgtables pgtable_node0;
-
 #ifdef CONFIG_REPLICATE_KTEXT
-extern struct pgtables *pgtables[MAX_NUMNODES];
-
 pgd_t *swapper_pg_dir_node(void);
 phys_addr_t __swapper_pg_dir_node_phys(int nid);
 phys_addr_t swapper_pg_dir_node_phys(void);
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index d3c7ed76adbf..3cd7e76cc562 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -212,9 +212,6 @@ SECTIONS
 	idmap_pg_dir = .;
 	. += PAGE_SIZE;
 
-	/* pgtable struct - covers the tramp, reserved and swapper pgdirs */
-	pgtable_node0 = .;
-
 #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
 	tramp_pg_dir = .;
 	. += PAGE_SIZE;
diff --git a/arch/arm64/mm/ktext.c b/arch/arm64/mm/ktext.c
index 3dde6e1d99d7..e50828189824 100644
--- a/arch/arm64/mm/ktext.c
+++ b/arch/arm64/mm/ktext.c
@@ -16,15 +16,21 @@
 #include <asm/memory.h>
 #include <asm/pgalloc.h>
 
-struct pgtables *pgtables[MAX_NUMNODES] = {
-	[0 ... MAX_NUMNODES - 1] = &pgtable_node0,
+static struct pgtables pgtables[MAX_NUMNODES] = {
+	[0 ... MAX_NUMNODES - 1] = {
+#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
+		tramp_pg_dir,
+#endif
+		reserved_pg_dir,
+		swapper_pg_dir
+	},
 };
 
 static void *kernel_texts[MAX_NUMNODES];
 
 static pgd_t *__swapper_pg_dir_node(int nid)
 {
-	return pgtables[nid]->swapper_pg_dir;
+	return pgtables[nid].swapper_pg_dir;
 }
 
 pgd_t *swapper_pg_dir_node(void)
@@ -116,20 +122,21 @@ early_param("ktext", parse_ktext);
 /* Allocate page tables and memory for the replicated kernel texts. */
 void __init ktext_replication_init(void)
 {
+	int kidx_base = pgd_index((phys_addr_t)KERNEL_START);
+	int kidx_end = pgd_index((phys_addr_t)KERNEL_END);
 	size_t size = __end_rodata - _stext;
-	int kidx = pgd_index((phys_addr_t)KERNEL_START);
-	int nid;
+	int nid, i;
 
 	/*
 	 * If we've messed up and the kernel shares a L0 entry with the
 	 * module or vmalloc area, then don't even attempt to use text
 	 * replication.
 	 */
-	if (pgd_index(MODULES_VADDR) == kidx) {
+	if (pgd_index(MODULES_VADDR) == kidx_base) {
 		pr_warn("Kernel is located in the same L0 index as modules - text replication disabled\n");
 		return;
 	}
-	if (pgd_index(VMALLOC_START) == kidx) {
+	if (pgd_index(VMALLOC_START) == kidx_end) {
 		pr_warn("Kernel is located in the same L0 index as vmalloc - text replication disabled\n");
 		return;
 	}
@@ -149,36 +156,44 @@ void __init ktext_replication_init(void)
 				       (u64)kernel_texts[nid] + size);
 
 		/* Allocate the pagetables for this node */
-		pgtables[nid] = memblock_alloc_node(sizeof(*pgtables[0]),
-						    PGD_SIZE, nid);
-
+		pgtables[nid].swapper_pg_dir = memblock_alloc_node(sizeof(swapper_pg_dir),
+									PGD_SIZE, nid);
+		pgtables[nid].reserved_pg_dir = memblock_alloc_node(sizeof(reserved_pg_dir),
+									PGD_SIZE, nid);
+#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
+		pgtables[nid].tramp_pg_dir = memblock_alloc_node(sizeof(tramp_pg_dir),
+									PGD_SIZE, nid);
+#endif
 		/* Copy initial swapper page directory */
-		memcpy(pgtables[nid]->swapper_pg_dir, swapper_pg_dir, PGD_SIZE);
+		memcpy(pgtables[nid].swapper_pg_dir, swapper_pg_dir, PGD_SIZE);
 
 		/* Clear the kernel mapping */
-		memset(&pgtables[nid]->swapper_pg_dir[kidx], 0,
-		       sizeof(pgtables[nid]->swapper_pg_dir[kidx]));
+		for (i = kidx_base; i <= kidx_end; i++)
+			memset(&pgtables[nid].swapper_pg_dir[i], 0,
+			       sizeof(pgtables[nid].swapper_pg_dir[i]));
 
 		/* Create kernel mapping pointing at our local copy */
-		create_kernel_nid_map(pgtables[nid]->swapper_pg_dir,
+		create_kernel_nid_map(pgtables[nid].swapper_pg_dir,
 				      kernel_texts[nid]);
 	}
 }
 
 void ktext_replication_set_swapper_pgd(pgd_t *pgdp, pgd_t pgd)
 {
+	int kidx_base = pgd_index((phys_addr_t)KERNEL_START);
+	int kidx_end = pgd_index((phys_addr_t)KERNEL_END);
 	unsigned long idx = pgdp - swapper_pg_dir;
 	int nid;
 
 	if (WARN_ON_ONCE(idx >= PTRS_PER_PGD) ||
-	    WARN_ON_ONCE(idx == pgd_index((phys_addr_t)KERNEL_START)))
+	    WARN_ON_ONCE(idx >= kidx_base && idx <= kidx_end))
 		return;
 
 	for_each_node(nid) {
-		if (pgtables[nid]->swapper_pg_dir == swapper_pg_dir)
+		if (pgtables[nid].swapper_pg_dir == swapper_pg_dir)
 			continue;
 
-		WRITE_ONCE(pgtables[nid]->swapper_pg_dir[idx], pgd);
+		WRITE_ONCE(pgtables[nid].swapper_pg_dir[idx], pgd);
 	}
 }
 
@@ -189,10 +204,10 @@ void __init ktext_replication_init_tramp(void)
 
 	for_each_node(nid) {
 		/* Nothing to do for node 0 */
-		if (pgtables[nid]->tramp_pg_dir == tramp_pg_dir)
+		if (!nid)
 			continue;
 
-		memcpy(pgtables[nid]->tramp_pg_dir, tramp_pg_dir, PGD_SIZE);
+		memcpy(pgtables[nid].tramp_pg_dir, tramp_pg_dir, PGD_SIZE);
 	}
 }
 #endif
-- 
2.20.1



* [PATCH v3 19/19] arm64: text replication: keep modules inside module region when REPLICATE_KTEXT is enabled
  2024-01-17  8:53 [PATCH v3 00/19] arm64 kernel text replication Hao Jia
                   ` (17 preceding siblings ...)
  2024-01-17  8:53 ` [PATCH v3 18/19] arm64: text replication: support more page sizes and levels Hao Jia
@ 2024-01-17  8:53 ` Hao Jia
  2024-01-17  9:41 ` [PATCH v3 00/19] arm64 kernel text replication Russell King (Oracle)
  19 siblings, 0 replies; 23+ messages in thread
From: Hao Jia @ 2024-01-17  8:53 UTC (permalink / raw)
  To: mark.rutland, rmk+kernel, catalin.marinas, corbet, will, willy
  Cc: linux-arm-kernel, linux-doc, Hao Jia

Kernel text replication requires maintaining a separate per-node page table
for kernel text. To accomplish this without affecting other kernel memory maps,
it is best to place the kernel in a location that does not share L0 page
table entries with any other mappings.

So, limit the module_alloc() address range to end at MODULES_END, so
that module mappings do not overlap the L0 page table entries used by
the kernel text.
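
The effect of the limit can be sketched as below. All addresses used
with it are hypothetical and only illustrate the capping;
module_range_end() and range_contains() are helper names invented for
this note.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_2G	(UINT64_C(2) << 30)

/*
 * Upper bound of the range searched for module memory. Without
 * replication the range is simply base + span; with replication it is
 * capped at the end of the module region.
 */
static uint64_t module_range_end(uint64_t base, uint64_t span,
				 uint64_t modules_end, bool replicate_ktext)
{
	if (replicate_ktext) {
		assert(base <= modules_end);
		return modules_end;
	}
	return base + span;
}

/* True if addr lies inside the half-open range [start, end). */
static bool range_contains(uint64_t start, uint64_t end, uint64_t addr)
{
	return addr >= start && addr < end;
}
```

For instance, with a base 1G into a 2G module region, the uncapped
range extends 1G past MODULES_END and could include a kernel image
placed shortly after the region, while the capped range cannot.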

Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
---
 arch/arm64/kernel/module.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index dd851297596e..53e1c5e50907 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -112,6 +112,7 @@ subsys_initcall(module_init_limits);
 
 void *module_alloc(unsigned long size)
 {
+	u64 module_direct_end, module_plt_end;
 	void *p = NULL;
 
 	/*
@@ -119,18 +120,33 @@ void *module_alloc(unsigned long size)
 	 * kernel such that no PLTs are necessary.
 	 */
 	if (module_direct_base) {
+#ifdef CONFIG_REPLICATE_KTEXT
+		/*
+		 * Kernel text replication requires an L0 page table entry to
+		 * be exclusive to kernel text, so no other mappings should be
+		 * shared with it.
+		 */
+		module_direct_end = MODULES_END;
+#else
+		module_direct_end = module_direct_base + SZ_128M;
+#endif
 		p = __vmalloc_node_range(size, MODULE_ALIGN,
 					 module_direct_base,
-					 module_direct_base + SZ_128M,
+					 module_direct_end,
 					 GFP_KERNEL | __GFP_NOWARN,
 					 PAGE_KERNEL, 0, NUMA_NO_NODE,
 					 __builtin_return_address(0));
 	}
 
 	if (!p && module_plt_base) {
+#ifdef CONFIG_REPLICATE_KTEXT
+		module_plt_end = MODULES_END;
+#else
+		module_plt_end = module_plt_base + SZ_2G;
+#endif
 		p = __vmalloc_node_range(size, MODULE_ALIGN,
 					 module_plt_base,
-					 module_plt_base + SZ_2G,
+					 module_plt_end,
 					 GFP_KERNEL | __GFP_NOWARN,
 					 PAGE_KERNEL, 0, NUMA_NO_NODE,
 					 __builtin_return_address(0));
-- 
2.20.1



* Re: [PATCH v3 00/19] arm64 kernel text replication
  2024-01-17  8:53 [PATCH v3 00/19] arm64 kernel text replication Hao Jia
                   ` (18 preceding siblings ...)
  2024-01-17  8:53 ` [PATCH v3 19/19] arm64: text replication: keep modules inside module region when REPLICATE_KTEXT is enabled Hao Jia
@ 2024-01-17  9:41 ` Russell King (Oracle)
  2024-01-18  7:00   ` [External] " Hao Jia
  19 siblings, 1 reply; 23+ messages in thread
From: Russell King (Oracle) @ 2024-01-17  9:41 UTC (permalink / raw)
  To: Hao Jia
  Cc: mark.rutland, catalin.marinas, corbet, will, willy,
	linux-arm-kernel, linux-doc, root

On Wed, Jan 17, 2024 at 04:53:38PM +0800, Hao Jia wrote:
> From: root <root@n144-101-220.byted.org>
> 
> Many thanks to Russell King for his previous work on
> arm64 kernel text replication.
> https://lore.kernel.org/all/ZMKNYEkM7YnrDtOt@shell.armlinux.org.uk
> 
> After applying these patches, we tested that our business performance
> increased by more than 5% and the NUMA node memory bandwidth was more
> balanced.
> I've recently been trying to make it work with different numbers of
> page tables/page sizes, so updated this patch set to V3.
> 
> Patch overview:
> 
> Patch 1-16 is a patch set based on Russell King's previous arm64
> kernel text replication, rebased on commit 052d534373b7.
> 
> The following three patches are new in v3:
> patch 17 fixes compilation warning
> 
> patch 18 adapts arm64 kernel text replication to support more
> page tables/page sizes, in addition to 16K page size and
> 4-level page tables.
> 
> patch 19 fixes the abnormal startup problem caused by module_alloc()
> which may allocate an address larger than KIMAGE_VADDR when kernel text
> replication is enabled.
> 
> [v2] https://lore.kernel.org/all/ZMKNYEkM7YnrDtOt@shell.armlinux.org.uk
> [RFC] https://lore.kernel.org/all/ZHYCUVa8fzmB4XZV@shell.armlinux.org.uk
> 
> Please correct me if I've made a mistake, thank you very much!

Note that, even though I haven't posted an update (I see it as mostly
pointless because *no one* commented on the previous posting) I do
maintain these patches:

  git://git.armlinux.org.uk/~rmk/linux-arm.git aarch64/ktext/head

currently has them against v6.7

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!


* Re: [External] Re: [PATCH v3 00/19] arm64 kernel text replication
  2024-01-17  9:41 ` [PATCH v3 00/19] arm64 kernel text replication Russell King (Oracle)
@ 2024-01-18  7:00   ` Hao Jia
  0 siblings, 0 replies; 23+ messages in thread
From: Hao Jia @ 2024-01-18  7:00 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: mark.rutland, catalin.marinas, corbet, will, willy,
	linux-arm-kernel, linux-doc, root



On 2024/1/17 Russell King (Oracle) wrote:
> On Wed, Jan 17, 2024 at 04:53:38PM +0800, Hao Jia wrote:
>> Many thanks to Russell King for his previous work on
>> arm64 kernel text replication.
>> https://lore.kernel.org/all/ZMKNYEkM7YnrDtOt@shell.armlinux.org.uk
>>
>> After applying these patches, we tested that our business performance
>> increased by more than 5% and the NUMA node memory bandwidth was more
>> balanced.
>> I've recently been trying to make it work with different numbers of
>> page tables/page sizes, so updated this patch set to V3.
>>
>> Patch overview:
>>
>> Patch 1-16 is a patch set based on Russell King's previous arm64
>> kernel text replication, rebased on commit 052d534373b7.
>>
>> The following three patches are new in v3:
>> patch 17 fixes compilation warning
>>
>> patch 18 adapts arm64 kernel text replication to support more
>> page tables/page sizes, in addition to 16K page size and
>> 4-level page tables.
>>
>> patch 19 fixes the abnormal startup problem caused by module_alloc()
>> which may allocate an address larger than KIMAGE_VADDR when kernel text
>> replication is enabled.
>>
>> [v2] https://lore.kernel.org/all/ZMKNYEkM7YnrDtOt@shell.armlinux.org.uk
>> [RFC] https://lore.kernel.org/all/ZHYCUVa8fzmB4XZV@shell.armlinux.org.uk
>>
>> Please correct me if I've made a mistake, thank you very much!
> 
> Note that, even though I haven't posted an update (I see it as mostly
> pointless because *no one* commented on the previous posting) I do
> maintain these patches:
> 
>    git://git.armlinux.org.uk/~rmk/linux-arm.git aarch64/ktext/head
> 
> currently has them against v6.7
> 

Thanks for sharing the information.
Would you mind reviewing patch 18 and patch 19?

patch 18 attempts to adapt arm64 kernel text replication to support more 
page tables/page sizes
patch 19 fixes the problem of abnormal startup when kernel text 
replication is enabled

Maybe you have a better idea for supporting more page tables/page sizes;
any suggestions are welcome.


Thanks,
Hao


* Re: [PATCH v3 00/19] arm64 kernel text replication
       [not found] <20240123103509.696983-1-wangyuquan1236@phytium.com.cn>
@ 2024-01-23 17:25 ` Russell King (Oracle)
  0 siblings, 0 replies; 23+ messages in thread
From: Russell King (Oracle) @ 2024-01-23 17:25 UTC (permalink / raw)
  To: Yuquan Wang; +Cc: jiahao.os, linux-arm-kernel, linux-doc

On Tue, Jan 23, 2024 at 06:35:09PM +0800, Yuquan Wang wrote:
> > 
> > After applying these patches, we tested that our business performance
> > increased by more than 5% and the NUMA node memory bandwidth was more
> > balanced.
> > 
> 
> I have successfully applied your patches on my arm64 linux. And I could 
> start it with a qemu machine(virt). However, I don't know the way to test
> the performance it brings to the kernel. Do you have some suggestions?

Please can I make one thing utterly clear... kernel text replication
in a virtual machine generally doesn't make sense unless one can
set up the virtual machine to be truly NUMA. In other words, groups
of CPUs with their local memory and remote-node memory having higher
latency.

Kernel text replication is something which solves the problem on
bare metal NUMA machines where running kernel text that is located
in a foreign node results in the CPU running slower than it would
do if the kernel text were in its local RAM.

Unless the VM is set up in exactly that way, kernel text
replication has no place in a VM, and probably would result in
poorer performance.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!


