public inbox for linux-doc@vger.kernel.org
From: Mark Rutland <mark.rutland@arm.com>
To: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
	Jonathan Corbet <corbet@lwn.net>, Will Deacon <will@kernel.org>,
	linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
	Ard Biesheuvel <ardb@kernel.org>
Subject: Re: [PATCH RFC 02/17] arm64: place kernel in its own L0 page table entry
Date: Mon, 12 Jun 2023 12:14:54 +0100	[thread overview]
Message-ID: <ZIb+Lg9F9b4ay90p@FVFF77S0Q05N> (raw)
In-Reply-To: <E1q3zxo-00A5Js-9E@rmk-PC.armlinux.org.uk>

Hi Russell,

On Tue, May 30, 2023 at 03:04:40PM +0100, Russell King (Oracle) wrote:
> Kernel text replication needs to maintain separate per-node page
> tables for the kernel text. In order to do this without affecting
> other kernel memory mappings, placing the kernel such that it does
> not share a L0 page table entry with any other mapping is desirable.
> 
> Prior to this commit, the layout without KASLR was:
> 
> +----------+
> |  vmalloc |
> +----------+
> |  Kernel  |
> +----------+ MODULES_END, VMALLOC_START, KIMAGE_VADDR =
> |  Modules |                 MODULES_VADDR + MODULES_VSIZE
> +----------+ MODULES_VADDR = _PAGE_END(VA_BITS_MIN)
> | VA space |
> +----------+ 0
> 
> This becomes:
> 
> +----------+
> |  vmalloc |
> +----------+ VMALLOC_START = MODULES_END + PGDIR_SIZE
> |  Kernel  |
> +----------+ MODULES_END, KIMAGE_VADDR = _PAGE_END(VA_BITS_MIN) + PGDIR_SIZE
> |  Modules |
> +----------+ MODULES_VADDR = MODULES_END - MODULES_VSIZE
> | VA space |
> +----------+ 0

With KASLR we may randomize the kernel and module space over a substantial
portion of the vmalloc range. Are you expecting that text replication is going
to restrict that range, or that we'd make it mutually exclusive with KASLR?

I also note that the L0 table could have as few as two entries (with 16K pages
and 4 levels). So either we'd need to also mess with an L1 table, or make text
replication mutually exclusive with such configurations.

> This assumes MODULES_VSIZE (128M) <= PGDIR_SIZE.

As a heads-up, we've just changed MODULES_VSIZE to be 2G in

  https://lore.kernel.org/linux-arm-kernel/20230530110328.2213762-1-mark.rutland@arm.com/

... which is queued in the arm64 for-next/module-alloc branch:

  https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/module-alloc

> One side effect of this change is that KIMAGE_VADDR's definition now
> includes PGDIR_SIZE (to leave room for the modules) but this is not
> defined when asm/memory.h is included. This means KIMAGE_VADDR can
> not be used in inline functions within this file, so we convert
> kaslr_offset() and kaslr_enabled() to be macros instead.

That series above also decoupled kaslr_enabled() from kaslr_offset(), 
so we'd only need to change kaslr_offset().

> 
> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> ---
>  arch/arm64/include/asm/memory.h  | 26 ++++++++++----------------
>  arch/arm64/include/asm/pgtable.h |  2 +-
>  arch/arm64/mm/mmu.c              |  3 ++-
>  3 files changed, 13 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index c735afdf639b..089f556b7387 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -43,9 +43,9 @@
>  #define VA_BITS			(CONFIG_ARM64_VA_BITS)
>  #define _PAGE_OFFSET(va)	(-(UL(1) << (va)))
>  #define PAGE_OFFSET		(_PAGE_OFFSET(VA_BITS))
> -#define KIMAGE_VADDR		(MODULES_END)
> -#define MODULES_END		(MODULES_VADDR + MODULES_VSIZE)
> -#define MODULES_VADDR		(_PAGE_END(VA_BITS_MIN))
> +#define KIMAGE_VADDR		(_PAGE_END(VA_BITS_MIN) + PGDIR_SIZE)
> +#define MODULES_END		(KIMAGE_VADDR)
> +#define MODULES_VADDR		(MODULES_END - MODULES_VSIZE)
>  #define MODULES_VSIZE		(SZ_128M)
>  #define VMEMMAP_START		(-(UL(1) << (VA_BITS - VMEMMAP_SHIFT)))
>  #define VMEMMAP_END		(VMEMMAP_START + VMEMMAP_SIZE)
> @@ -199,20 +199,14 @@ extern u64			kimage_vaddr;
>  /* the offset between the kernel virtual and physical mappings */
>  extern u64			kimage_voffset;
>  
> -static inline unsigned long kaslr_offset(void)
> -{
> -	return kimage_vaddr - KIMAGE_VADDR;
> -}
> +#define kaslr_offset()	((unsigned long)(kimage_vaddr - KIMAGE_VADDR))
>  
> -static inline bool kaslr_enabled(void)
> -{
> -	/*
> -	 * The KASLR offset modulo MIN_KIMG_ALIGN is taken from the physical
> -	 * placement of the image rather than from the seed, so a displacement
> -	 * of less than MIN_KIMG_ALIGN means that no seed was provided.
> -	 */
> -	return kaslr_offset() >= MIN_KIMG_ALIGN;
> -}
> +/*
> + * The KASLR offset modulo MIN_KIMG_ALIGN is taken from the physical
> + * placement of the image rather than from the seed, so a displacement
> + * of less than MIN_KIMG_ALIGN means that no seed was provided.
> + */
> +#define kaslr_enabled()	(kaslr_offset() >= MIN_KIMG_ALIGN)
>  
>  /*
>   * Allow all memory at the discovery stage. We will clip it later.
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 0bd18de9fd97..cb526e69299d 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -21,7 +21,7 @@
>   * VMALLOC_END: extends to the available space below vmemmap, PCI I/O space
>   *	and fixed mappings
>   */
> -#define VMALLOC_START		(MODULES_END)
> +#define VMALLOC_START		(MODULES_END + PGDIR_SIZE)
>  #define VMALLOC_END		(VMEMMAP_START - SZ_256M)
>  
>  #define vmemmap			((struct page *)VMEMMAP_START - (memstart_addr >> PAGE_SHIFT))
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 4829abe017e9..baf74d0c43c9 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -478,7 +478,8 @@ void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
>  static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
>  				phys_addr_t size, pgprot_t prot)
>  {
> -	if ((virt >= PAGE_END) && (virt < VMALLOC_START)) {
> +	if ((virt >= PAGE_END) && (virt < VMALLOC_START) &&
> +	    !is_kernel(virt)) {
>  		pr_warn("BUG: not updating mapping for %pa at 0x%016lx - outside kernel range\n",
>  			&phys, virt);
>  		return;

I think the existing conditions here aren't quite right, and have become bogus
over time, and I don't think that the is_kernel() check is necessary here.

Originally, back in commit:

  c1cc1552616d0f35 ("arm64: MMU initialisation")

We had:

	if (virt < VMALLOC_START) {
		pr_warning("BUG: not creating mapping for 0x%016llx at 0x%016lx - outside kernel range\n",
			   phys, virt);
		return;
	}

... which checked that the VA range we were manipulating was in the TTBR1 VA
range, as at the time, VMALLOC_START happened to be the lowest TTBR1 address.

That didn't substantially change until commit:

  14c127c957c1c607 ("arm64: mm: Flip kernel VA space")

... when the test was changed to:

	if ((virt >= VA_START) && (virt < VMALLOC_START)) {
		pr_warn("BUG: not creating mapping for %pa at 0x%016lx - outside kernel range\n",
			&phys, virt);
		return;
	}

Note: in that commit, VA_START was actually the end of the linear map (which
was itself at the start of the TTBR1 address space), so this is just checking if
we're poking a small portion of the TTBR1 address space, rather than if we're
poking *outside* of the TTBR1 address space.

That doesn't make much sense, and I'm pretty sure that was a thinko rather than
an intentional change of semantic.

I "fixed" that without thinking in commit:

  77ad4ce69321abbe ("arm64: memory: rename VA_START to PAGE_END")

... making that:

	if ((virt >= PAGE_END) && (virt < VMALLOC_START)) {
		pr_warn("BUG: not creating mapping for %pa at 0x%016lx - outside kernel range\n",
			&phys, virt);
		return;
	}

... but clearly it has lost the original semantic and doesn't make much sense.

I think the test should actually be something like:

	/* Must be a TTBR1 address */
	if (virt < PAGE_OFFSET) {
		...
	}

... and then we won't randomly trip for kernel mappings if those fall between
the linear map and vmalloc range.

Thanks,
Mark.
