From: "Russell King (Oracle)" <linux@armlinux.org.uk>
To: Catalin Marinas <catalin.marinas@arm.com>,
	Jonathan Corbet <corbet@lwn.net>, Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org
Subject: Re: [PATCH RFC 00/17] arm64 kernel text replication
Date: Mon, 5 Jun 2023 10:05:22 +0100
Message-ID: <ZH2lUj0zDWFppdJI@shell.armlinux.org.uk>
In-Reply-To: <ZHYCUVa8fzmB4XZV@shell.armlinux.org.uk>

Hi,

Are there any comments on this?

Thanks.

On Tue, May 30, 2023 at 03:04:01PM +0100, Russell King (Oracle) wrote:
> Problem
> -------
> 
> NUMA systems have greater latency when accessing data and instructions
> across nodes, which can lead to a reduction in performance on CPU cores
> that mainly perform accesses beyond their local node.
> 
> Normally when an ARM64 system boots, the kernel will end up placed in
> memory, and each CPU core will have to fetch instructions and data from
> whichever NUMA node the kernel has been placed in. This means that while
> executing kernel code, CPUs local to that node will run faster than
> CPUs in remote nodes.
> 
> The higher the latency to access remote NUMA node memory, the more the
> kernel performance suffers on those nodes.
> 
> If there is a local copy of the kernel text in each node's RAM, and
> each node runs the kernel using its local copy of the kernel text,
> then it stands to reason that the kernel will run faster due to fewer
> stalls while instructions are fetched from remote memory.
> 
> The question then arises how to achieve this.
> 
> Background
> ----------
> 
> An important issue to contend with is what happens when a thread
> migrates between nodes. Essentially, the thread's state (including
> instruction pointer) is saved to memory, and the scheduler on that CPU
> loads some other thread's state and that CPU resumes executing that
> new thread.
> 
> The CPU gaining the migrating thread loads the saved state, again
> including the instruction pointer, and the gaining CPU resumes fetching
> instructions at the virtual address where the original CPU left off.
> 
> The key point is that the virtual address is what matters here, and
> this gives us a way to implement kernel text replication fairly easily.
> At a practical level, all we need to do is to ensure that the virtual
> addresses which contain the kernel text point to a local copy of
> that text.
> 
> This is exactly how this proposal of kernel text replication achieves
> the replication. We can go a little bit further and include most of
> the read-only data in this replication, as that will never be written
> to by the kernel (and thus remains constant.)
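
To illustrate the point about virtual addresses, here is a minimal
user-space sketch. It is not code from the series: the array stands in
for per-node RAM, demo_translate() stands in for the MMU, and every
name and constant (DEMO_NODES, DEMO_TEXT_BASE, and so on) is
hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_NODES      2
#define DEMO_TEXT_WORDS 4
#define DEMO_TEXT_BASE  0x1000u /* hypothetical virtual base of the text */

/* One physical copy of the "kernel text" per NUMA node. */
static uint32_t demo_text_copy[DEMO_NODES][DEMO_TEXT_WORDS];

/* Fill every node's copy with identical contents. */
static void demo_replicate(const uint32_t *src)
{
	for (int node = 0; node < DEMO_NODES; node++)
		memcpy(demo_text_copy[node], src, sizeof(demo_text_copy[node]));
}

/*
 * Stand-in for address translation: the same virtual address resolves
 * to a different backing copy depending on the node doing the fetch.
 */
static const uint32_t *demo_translate(int node, uint32_t vaddr)
{
	uint32_t index = (vaddr - DEMO_TEXT_BASE) / sizeof(uint32_t);

	assert(node >= 0 && node < DEMO_NODES && index < DEMO_TEXT_WORDS);
	return &demo_text_copy[node][index];
}
```

A thread that migrates from node 0 to node 1 keeps the same saved
instruction pointer. In this model, demo_translate(0, pc) and
demo_translate(1, pc) return different pointers that hold the same
instruction word.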
> 
> Solution
> --------
> 
> So, what we need to achieve is:
> 
> 1. multiple identical copies of the kernel text (and read-only data)
> 2. point the virtual mappings to the appropriate copy of kernel text
>    for the NUMA node.
> 
> (1) is fairly easy to achieve - we just need to allocate some memory
> in the appropriate node and copy the parts of the kernel we want to
> replicate. However, we also need to deal with ARM64's kernel patching.
> There are two functions that patch the kernel text,
> __apply_alternatives() and aarch64_insn_patch_text_nosync(). Both of
> these need to be modified to update all copies of the kernel text.
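
The patching requirement can be sketched in user space as below. This
is only a model under stated assumptions, not the code in the series:
demo_patch_text_all_nodes() and the demo_* arrays are hypothetical
names, and the cache maintenance that a real kernel would need for
each copy is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NODES      2
#define DEMO_TEXT_WORDS 4

/* One copy of the "kernel text" per NUMA node (hypothetical model). */
static uint32_t demo_text_copy[DEMO_NODES][DEMO_TEXT_WORDS];

/*
 * Write one instruction word into every node's copy so the copies
 * never diverge. Returns 0 on success, -1 if the index is out of range.
 * A real implementation would also have to clean and invalidate the
 * caches for each copy; that step is omitted here.
 */
static int demo_patch_text_all_nodes(size_t index, uint32_t insn)
{
	if (index >= DEMO_TEXT_WORDS)
		return -1;

	for (int node = 0; node < DEMO_NODES; node++)
		demo_text_copy[node][index] = insn;

	return 0;
}
```

For example, demo_patch_text_all_nodes(2, 0xd503201f) stores the
AArch64 NOP encoding at word 2 of both copies, whereas patching only
node 0's copy would leave CPUs on node 1 executing the old instruction.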
> 
> (2) is slightly harder.
> 
> Firstly, the aarch64 architecture has a very useful feature here - the
> kernel page tables are entirely separate from the user page tables.
> The hardware contains two page table pointers, one is used for user
> mappings, the other is used for kernel mappings.
> 
> Therefore, we only have one page table to be concerned with: the table
> which maps kernel space. We do not need to be concerned with each
> user process's page table.
> 
> The approach taken here is to ensure that the kernel is located in an
> area of kernel virtual address space covered by a level-0 page table
> entry which is not shared with any other user. We can then maintain
> separate per-node level-0 page tables for kernel space where the only
> difference between them is this level-0 page table entry.
> 
> This gives a couple of benefits. Firstly, when updates to the level-0
> page table happen (e.g. when establishing new mappings) these updates
> can simply be copied to the other level-0 page tables provided it isn't
> for the kernel image. Secondly, we don't need complexity at lower
> levels of the page table code to figure out whether a level-1 or lower
> update needs to be propagated to other nodes.
> 
> The level-0 page table entry for the kernel can then be used to point
> at a node-unique set of level 1..N page tables to map the appropriate
> copy of the kernel text (and read-only data) into kernel space, while
> keeping the kernel read-write data shared between nodes.
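
The per-node level-0 table scheme described above can be modelled
roughly as follows. This is a sketch under assumptions, not the code
from the series: the table size, the slot index DEMO_KERNEL_SLOT and
the demo_* helper names are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NODES       2
#define DEMO_PGD_ENTRIES 8
#define DEMO_KERNEL_SLOT 5 /* hypothetical level-0 index holding the kernel */

/* One level-0 table per node; all zero (empty) initially. */
static uint64_t demo_pgd[DEMO_NODES][DEMO_PGD_ENTRIES];

/* Point one node's kernel slot at that node's own lower-level tables. */
static void demo_set_kernel_entry(int node, uint64_t entry)
{
	assert(node >= 0 && node < DEMO_NODES);
	demo_pgd[node][DEMO_KERNEL_SLOT] = entry;
}

/*
 * Any other level-0 update is simply copied to every node's table.
 * The kernel slot is refused here because it is the one entry that
 * is allowed to differ between nodes.
 */
static int demo_set_shared_entry(unsigned int index, uint64_t entry)
{
	if (index >= DEMO_PGD_ENTRIES || index == DEMO_KERNEL_SLOT)
		return -1;

	for (int node = 0; node < DEMO_NODES; node++)
		demo_pgd[node][index] = entry;

	return 0;
}
```

Because the only per-node difference lives in a single level-0 slot,
nothing below level 0 needs to know about replication in this model:
shared updates are a plain copy loop, and the kernel slot is set once
per node.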
> 
> Performance Analysis
> --------------------
> 
> Needless to say, the performance results from kernel text replication
> are workload specific, but appear to show a gain of between 6% and
> 17% for database-centric workloads. When combined with userspace
> awareness of NUMA, this can result in a gain of over 50%.
> 
> Problems
> --------
> 
> There are a few areas that are a problem for kernel text replication:
> 1) As this series changes the kernel space virtual address space
>    layout, it breaks KASAN - and I've zero knowledge of KASAN so I
>    have no idea how to fix it. I would be grateful for input from
>    KASAN folk for suggestions how to fix this.
> 
> 2) KASLR cannot be used with kernel text replication, since we need
>    to place the kernel in its own L0 page table entry, not in vmalloc
>    space. KASLR is disabled when support for kernel text replication
>    is enabled.
> 
> 3) Changing the kernel virtual address space layout also means that
>    kaslr_offset() and kaslr_enabled() need to become macros rather
>    than inline functions due to the use of PGDIR_SIZE in the
>    calculation of KIMAGE_VADDR. Since asm/pgtable.h defines this
>    constant, but asm/memory.h is included by asm/pgtable.h, having
>    this symbol available would produce a circular include
>    dependency, so I don't think there is any choice here.
> 
> 4) Read-only protection for replicated kernel images is not yet
>    implemented.
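
On point (3), the reason a macro works where an inline function does
not can be shown with a small stand-alone sketch. The names and values
below are hypothetical stand-ins (DEMO_MODULES_END, DEMO_PGDIR_SIZE and
the formula for DEMO_KIMAGE_VADDR are assumptions, not the kernel's
definitions); the two halves of the file play the roles of the earlier
and later headers.

```c
#include <assert.h>
#include <stdint.h>

/* --- Stand-in for the earlier header (asm/memory.h in the text) --- */
extern uint64_t demo_kimage_vaddr; /* runtime address of the image */

/*
 * These macros mention DEMO_PGDIR_SIZE before it is defined. That is
 * fine: a macro body is only expanded where the macro is used. A
 * static inline function placed here would fail to compile, because
 * its body is compiled at this point, where DEMO_PGDIR_SIZE is still
 * undefined.
 */
#define DEMO_KIMAGE_VADDR    (DEMO_MODULES_END + DEMO_PGDIR_SIZE)
#define demo_kaslr_offset()  (demo_kimage_vaddr - DEMO_KIMAGE_VADDR)
#define demo_kaslr_enabled() (demo_kaslr_offset() != 0)

/* --- Stand-in for the later header (asm/pgtable.h in the text) --- */
#define DEMO_MODULES_END UINT64_C(0xffff800000000000) /* illustrative */
#define DEMO_PGDIR_SIZE  (UINT64_C(1) << 39)          /* illustrative */

/* By now both constants exist, so the macros can be expanded. */
uint64_t demo_kimage_vaddr = DEMO_KIMAGE_VADDR;
```

With the image at its link address, demo_kaslr_offset() evaluates to 0
and demo_kaslr_enabled() is false; adding a displacement to
demo_kimage_vaddr makes both reflect it.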
> 
> Patch overview:
> 
> Patch 1 cleans up the rox page protection logic.
> Patch 2 reorganises the kernel virtual address space layout (causing
>   problems 1 and 3).
> Patch 3 provides a version of cpu_replace_ttbr1 that takes physical
>   addresses.
> Patch 4 makes a needed cache flushing function visible.
> Patches 5 through 16 are the guts of kernel text replication.
> Patch 17 adds the Kconfig entry for it.
> 
> Further patches not included in this set add a Kconfig for the default
> state, a test module, and add code to verify the replicated kernel
> text matches the node 0 text after the kernel has completed most of
> its boot.
> 
>  Documentation/admin-guide/kernel-parameters.txt |   5 +
>  arch/arm64/Kconfig                              |  10 +-
>  arch/arm64/include/asm/cacheflush.h             |   2 +
>  arch/arm64/include/asm/ktext.h                  |  45 ++++++
>  arch/arm64/include/asm/memory.h                 |  26 ++--
>  arch/arm64/include/asm/mmu_context.h            |  12 +-
>  arch/arm64/include/asm/pgtable.h                |  35 ++++-
>  arch/arm64/include/asm/smp.h                    |   1 +
>  arch/arm64/kernel/alternative.c                 |   4 +-
>  arch/arm64/kernel/asm-offsets.c                 |   1 +
>  arch/arm64/kernel/cpufeature.c                  |   2 +-
>  arch/arm64/kernel/head.S                        |   3 +-
>  arch/arm64/kernel/hibernate.c                   |   2 +-
>  arch/arm64/kernel/patching.c                    |   7 +-
>  arch/arm64/kernel/smp.c                         |   3 +
>  arch/arm64/kernel/suspend.c                     |   3 +-
>  arch/arm64/kernel/vmlinux.lds.S                 |   3 +
>  arch/arm64/mm/Makefile                          |   2 +
>  arch/arm64/mm/init.c                            |   3 +
>  arch/arm64/mm/ktext.c                           | 198 ++++++++++++++++++++++++
>  arch/arm64/mm/mmu.c                             |  85 ++++++++--
>  21 files changed, 413 insertions(+), 39 deletions(-)
>  create mode 100644 arch/arm64/include/asm/ktext.h
>  create mode 100644 arch/arm64/mm/ktext.c
> 
> 
> -- 
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
> 

Thread overview:
2023-05-30 14:04 [PATCH RFC 00/17] arm64 kernel text replication Russell King (Oracle)
2023-05-30 14:04 ` [PATCH RFC 01/17] arm64: consolidate rox page protection logic Russell King (Oracle)
2023-05-30 14:04 ` [PATCH RFC 02/17] arm64: place kernel in its own L0 page table entry Russell King (Oracle)
     [not found]   ` <ZIb+Lg9F9b4ay90p@FVFF77S0Q05N>
2023-06-12 15:04     ` Russell King (Oracle)
2023-05-30 14:04 ` [PATCH RFC 03/17] arm64: provide cpu_replace_ttbr1_phys() Russell King (Oracle)
2023-05-30 14:04 ` [PATCH RFC 04/17] arm64: make clean_dcache_range_nopatch() visible Russell King (Oracle)
2023-05-30 14:04 ` [PATCH RFC 05/17] arm64: text replication: add init function Russell King (Oracle)
2023-05-30 14:05 ` [PATCH RFC 06/17] arm64: text replication: add sanity checks Russell King (Oracle)
2023-05-30 14:05 ` [PATCH RFC 07/17] arm64: text replication: copy initial kernel text Russell King (Oracle)
2023-05-30 14:05 ` [PATCH RFC 08/17] arm64: text replication: add node text patching Russell King (Oracle)
2023-05-30 14:05 ` [PATCH RFC 09/17] arm64: text replication: add node 0 page table definitions Russell King (Oracle)
2023-05-30 14:05 ` [PATCH RFC 10/17] arm64: text replication: add swapper page directory helpers Russell King (Oracle)
2023-05-30 14:05 ` [PATCH RFC 11/17] arm64: text replication: create per-node kernel page tables Russell King (Oracle)
2023-05-30 14:05 ` [PATCH RFC 12/17] arm64: text replication: boot secondary CPUs with appropriate TTBR1 Russell King (Oracle)
2023-05-30 14:05 ` [PATCH RFC 13/17] arm64: text replication: update cnp support Russell King (Oracle)
2023-05-30 14:05 ` [PATCH RFC 14/17] arm64: text replication: setup page tables for copied kernel Russell King (Oracle)
2023-05-30 14:05 ` [PATCH RFC 15/17] arm64: text replication: include most of read-only data as well Russell King (Oracle)
2023-05-30 14:05 ` [PATCH RFC 16/17] arm64: text replication: early kernel option to enable replication Russell King (Oracle)
2023-05-30 14:05 ` [PATCH RFC 17/17] arm64: text replication: add Kconfig Russell King (Oracle)
2023-06-05  9:05 ` Russell King (Oracle) [this message]
2023-06-05 13:46   ` [PATCH RFC 00/17] arm64 kernel text replication Mark Rutland
2023-06-23 15:24   ` Ard Biesheuvel
2023-06-23 15:34     ` Russell King (Oracle)
2023-06-23 15:54       ` Marc Zyngier
2023-06-26 23:42         ` Lameter, Christopher
2023-06-27  8:02           ` Marc Zyngier
2023-06-23 16:37     ` Marc Zyngier
