From: "Russell King (Oracle)" <linux@armlinux.org.uk>
To: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
Jonathan Corbet <corbet@lwn.net>, Will Deacon <will@kernel.org>,
linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
Ard Biesheuvel <ardb@kernel.org>
Subject: Re: [PATCH RFC 02/17] arm64: place kernel in its own L0 page table entry
Date: Mon, 12 Jun 2023 16:04:58 +0100 [thread overview]
Message-ID: <ZIc0Glqg2HalujXJ@shell.armlinux.org.uk> (raw)
In-Reply-To: <ZIb+Lg9F9b4ay90p@FVFF77S0Q05N>
On Mon, Jun 12, 2023 at 12:14:54PM +0100, Mark Rutland wrote:
> Hi Russell,
>
> On Tue, May 30, 2023 at 03:04:40PM +0100, Russell King (Oracle) wrote:
> > Kernel text replication needs to maintain separate per-node page
> > tables for the kernel text. In order to do this without affecting
> > other kernel memory mappings, placing the kernel such that it does
> > not share a L0 page table entry with any other mapping is desirable.
> >
> > Prior to this commit, the layout without KASLR was:
> >
> > +----------+
> > | vmalloc |
> > +----------+
> > | Kernel |
> > +----------+ MODULES_END, VMALLOC_START, KIMAGE_VADDR =
> > | Modules | MODULES_VADDR + MODULES_VSIZE
> > +----------+ MODULES_VADDR = _PAGE_END(VA_BITS_MIN)
> > | VA space |
> > +----------+ 0
> >
> > This becomes:
> >
> > +----------+
> > | vmalloc |
> > +----------+ VMALLOC_START = MODULES_END + PGDIR_SIZE
> > | Kernel |
> > +----------+ MODULES_END, KIMAGE_VADDR = _PAGE_END(VA_BITS_MIN) + PGDIR_SIZE
> > | Modules |
> > +----------+ MODULES_VADDR = MODULES_END - MODULES_VSIZE
> > | VA space |
> > +----------+ 0
>
> With KSASLR we may randomize the kernel and module space over a substantial
> portion of the vmalloc range. Are you expecting that text replication is going
> to restruct that range, or that we'd make it mutually exclusive with KASLR?
In the patch that adds the REPLICATE_KTEXT config option, I've made it
exclusive with RANDOMIZE_BASE, but this change in layout isn't dependent
on REPLICATE_KTEXT.
I've tested it with RANDOMIZE_BASE=y, and nothing seems to get upset,
so I believe that this patch doesn't cause any negative issues.
> I also note that the L0 table could have as few as two entries (with 16K pages
> and 4 levels). So either we'd need to also mess with an L1 table, or make text
> replication mutually exclusive with such configurations.
Ah, thanks for pointing that out - I was hoping to avoid needing
to touch anything but L0 tables.
However, it brings up a question: are there any NUMA systems that would
have just two entries in the L0 table? I suspect NUMA systems have lots
of RAM, and so would want a page table layout that results in multiple
L0 entries.
> > This assumes MODULES_VSIZE (128M) <= PGDIR_SIZE.
>
> As a heads-up, we've just changed MODULES_VSIZE to be 2G in
>
> https://lore.kernel.org/linux-arm-kernel/20230530110328.2213762-1-mark.rutland@arm.com/
>
> .. which is queued in the arm64 for-next/module-alloc branch:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/module-alloc
Ok - so I need to get a bit more clever about calculating MODULES_END
and KIMAGE_VADDR
> > One side effect of this change is that KIMAGE_VADDR's definition now
> > includes PGDIR_SIZE (to leave room for the modules) but this is not
> > defined when asm/memory.h is included. This means KIMAGE_VADDR can
> > not be used in inline functions within this file, so we convert
> > kaslr_offset() and kaslr_enabled() to be macros instead.
>
> That series above also decoupled kaslr_enabled() from kaslr_offset(),
> so we'd only need to change kaslr_offset().
Ok, I'll take a look to see how my changes are impacted.
> > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > index 4829abe017e9..baf74d0c43c9 100644
> > --- a/arch/arm64/mm/mmu.c
> > +++ b/arch/arm64/mm/mmu.c
> > @@ -478,7 +478,8 @@ void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
> > static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
> > phys_addr_t size, pgprot_t prot)
> > {
> > - if ((virt >= PAGE_END) && (virt < VMALLOC_START)) {
> > + if ((virt >= PAGE_END) && (virt < VMALLOC_START) &&
> > + !is_kernel(virt)) {
> > pr_warn("BUG: not updating mapping for %pa at 0x%016lx - outside kernel range\n",
> > &phys, virt);
> > return;
>
> I think the existing conditions here aren't quite right, and have become bogus
> over time, and I don't think that the is_kernel() check is necessary here.
>
> Originally, back in commit:
>
> c1cc1552616d0f35 ("arm64: MMU initialisation")
>
> We had:
>
> if (virt < VMALLOC_START) {
> pr_warning("BUG: not creating mapping for 0x%016llx at 0x%016lx - outside kernel range\n",
> phys, virt);
> return;
> }
>
> ... which checked that the VA range we were manipulating was in the TTBR1 VA
> range, as at the time, VMALLOC_START happened to be the lowest TTBR1 address.
>
> That didn't substantially change until commit:
>
> 14c127c957c1c607 ("arm64: mm: Flip kernel VA space")
>
> ... when the test was changed to:
>
> if ((virt >= VA_START) && (virt < VMALLOC_START)) {
> pr_warn("BUG: not creating mapping for %pa at 0x%016lx - outside kernel range\n",
> &phys, virt);
> return;
> }
>
> Note: in that commit, VA_START was actually the end of the linear map (which
> was itself a the start of the TTBR1 address space), so this is just checking if
> we're poking a small portion of the TTBR1 address space, rather than if we're
> poking *outside* of the TTBR1 address space.
>
> That doesn't make much sense, and I'm pretty sure that was a thinko rather than
> an intentional change of semantic.
>
> I "fixed" that without thinking in commit:
>
> 77ad4ce69321abbe ("arm64: memory: rename VA_START to PAGE_END")
>
> ... making that:
>
> if ((virt >= PAGE_END) && (virt < VMALLOC_START)) {
> pr_warn("BUG: not creating mapping for %pa at 0x%016lx - outside kernel range\n",
> &phys, virt);
> return;
> }
>
> ... but clearly it has lost the original semantic and doesn't make much sense.
>
> I think the test should actually be something like:
>
> /* Must be a TTBR1 address */
> if (virt < PAGE_OFFSET ) {
> ...
> }
>
> ... and then we won't randomly trip for kernel mappings if those fall between
> the linear map and vmalloc range.
Okay, so that sounds like if this is fixed, then I won't need to patch
it! Yay!
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
WARNING: multiple messages have this Message-ID (diff)
From: "Russell King (Oracle)" <linux@armlinux.org.uk>
To: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
Jonathan Corbet <corbet@lwn.net>, Will Deacon <will@kernel.org>,
linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
Ard Biesheuvel <ardb@kernel.org>
Subject: Re: [PATCH RFC 02/17] arm64: place kernel in its own L0 page table entry
Date: Mon, 12 Jun 2023 16:04:58 +0100 [thread overview]
Message-ID: <ZIc0Glqg2HalujXJ@shell.armlinux.org.uk> (raw)
In-Reply-To: <ZIb+Lg9F9b4ay90p@FVFF77S0Q05N>
On Mon, Jun 12, 2023 at 12:14:54PM +0100, Mark Rutland wrote:
> Hi Russell,
>
> On Tue, May 30, 2023 at 03:04:40PM +0100, Russell King (Oracle) wrote:
> > Kernel text replication needs to maintain separate per-node page
> > tables for the kernel text. In order to do this without affecting
> > other kernel memory mappings, placing the kernel such that it does
> > not share a L0 page table entry with any other mapping is desirable.
> >
> > Prior to this commit, the layout without KASLR was:
> >
> > +----------+
> > | vmalloc |
> > +----------+
> > | Kernel |
> > +----------+ MODULES_END, VMALLOC_START, KIMAGE_VADDR =
> > | Modules | MODULES_VADDR + MODULES_VSIZE
> > +----------+ MODULES_VADDR = _PAGE_END(VA_BITS_MIN)
> > | VA space |
> > +----------+ 0
> >
> > This becomes:
> >
> > +----------+
> > | vmalloc |
> > +----------+ VMALLOC_START = MODULES_END + PGDIR_SIZE
> > | Kernel |
> > +----------+ MODULES_END, KIMAGE_VADDR = _PAGE_END(VA_BITS_MIN) + PGDIR_SIZE
> > | Modules |
> > +----------+ MODULES_VADDR = MODULES_END - MODULES_VSIZE
> > | VA space |
> > +----------+ 0
>
> With KSASLR we may randomize the kernel and module space over a substantial
> portion of the vmalloc range. Are you expecting that text replication is going
> to restruct that range, or that we'd make it mutually exclusive with KASLR?
In the patch that adds the REPLICATE_KTEXT config option, I've made it
exclusive with RANDOMIZE_BASE, but this change in layout isn't dependent
on REPLICATE_KTEXT.
I've tested it with RANDOMIZE_BASE=y, and nothing seems to get upset,
so I believe that this patch doesn't cause any negative issues.
> I also note that the L0 table could have as few as two entries (with 16K pages
> and 4 levels). So either we'd need to also mess with an L1 table, or make text
> replication mutually exclusive with such configurations.
Ah, thanks for pointing that out - I was hoping to avoid needing
to touch anything but L0 tables.
However, it brings up a question: are there any NUMA systems that would
have just two entries in the L0 table? I suspect NUMA systems have lots
of RAM, and so would want a page table layout that results in multiple
L0 entries.
> > This assumes MODULES_VSIZE (128M) <= PGDIR_SIZE.
>
> As a heads-up, we've just changed MODULES_VSIZE to be 2G in
>
> https://lore.kernel.org/linux-arm-kernel/20230530110328.2213762-1-mark.rutland@arm.com/
>
> .. which is queued in the arm64 for-next/module-alloc branch:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/module-alloc
Ok - so I need to get a bit more clever about calculating MODULES_END
and KIMAGE_VADDR
> > One side effect of this change is that KIMAGE_VADDR's definition now
> > includes PGDIR_SIZE (to leave room for the modules) but this is not
> > defined when asm/memory.h is included. This means KIMAGE_VADDR can
> > not be used in inline functions within this file, so we convert
> > kaslr_offset() and kaslr_enabled() to be macros instead.
>
> That series above also decoupled kaslr_enabled() from kaslr_offset(),
> so we'd only need to change kaslr_offset().
Ok, I'll take a look to see how my changes are impacted.
> > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > index 4829abe017e9..baf74d0c43c9 100644
> > --- a/arch/arm64/mm/mmu.c
> > +++ b/arch/arm64/mm/mmu.c
> > @@ -478,7 +478,8 @@ void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
> > static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
> > phys_addr_t size, pgprot_t prot)
> > {
> > - if ((virt >= PAGE_END) && (virt < VMALLOC_START)) {
> > + if ((virt >= PAGE_END) && (virt < VMALLOC_START) &&
> > + !is_kernel(virt)) {
> > pr_warn("BUG: not updating mapping for %pa at 0x%016lx - outside kernel range\n",
> > &phys, virt);
> > return;
>
> I think the existing conditions here aren't quite right, and have become bogus
> over time, and I don't think that the is_kernel() check is necessary here.
>
> Originally, back in commit:
>
> c1cc1552616d0f35 ("arm64: MMU initialisation")
>
> We had:
>
> if (virt < VMALLOC_START) {
> pr_warning("BUG: not creating mapping for 0x%016llx at 0x%016lx - outside kernel range\n",
> phys, virt);
> return;
> }
>
> ... which checked that the VA range we were manipulating was in the TTBR1 VA
> range, as at the time, VMALLOC_START happened to be the lowest TTBR1 address.
>
> That didn't substantially change until commit:
>
> 14c127c957c1c607 ("arm64: mm: Flip kernel VA space")
>
> ... when the test was changed to:
>
> if ((virt >= VA_START) && (virt < VMALLOC_START)) {
> pr_warn("BUG: not creating mapping for %pa at 0x%016lx - outside kernel range\n",
> &phys, virt);
> return;
> }
>
> Note: in that commit, VA_START was actually the end of the linear map (which
> was itself a the start of the TTBR1 address space), so this is just checking if
> we're poking a small portion of the TTBR1 address space, rather than if we're
> poking *outside* of the TTBR1 address space.
>
> That doesn't make much sense, and I'm pretty sure that was a thinko rather than
> an intentional change of semantic.
>
> I "fixed" that without thinking in commit:
>
> 77ad4ce69321abbe ("arm64: memory: rename VA_START to PAGE_END")
>
> ... making that:
>
> if ((virt >= PAGE_END) && (virt < VMALLOC_START)) {
> pr_warn("BUG: not creating mapping for %pa at 0x%016lx - outside kernel range\n",
> &phys, virt);
> return;
> }
>
> ... but clearly it has lost the original semantic and doesn't make much sense.
>
> I think the test should actually be something like:
>
> /* Must be a TTBR1 address */
> if (virt < PAGE_OFFSET ) {
> ...
> }
>
> ... and then we won't randomly trip for kernel mappings if those fall between
> the linear map and vmalloc range.
Okay, so that sounds like if this is fixed, then I won't need to patch
it! Yay!
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2023-06-12 15:05 UTC|newest]
Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-30 14:04 [PATCH RFC 00/17] arm64 kernel text replication Russell King (Oracle)
2023-05-30 14:04 ` Russell King (Oracle)
2023-05-30 14:04 ` [PATCH RFC 01/17] arm64: consolidate rox page protection logic Russell King (Oracle)
2023-05-30 14:04 ` Russell King (Oracle)
2023-06-12 10:37 ` Mark Rutland
2023-05-30 14:04 ` [PATCH RFC 02/17] arm64: place kernel in its own L0 page table entry Russell King (Oracle)
2023-05-30 14:04 ` Russell King (Oracle)
2023-06-12 11:14 ` Mark Rutland
2023-06-12 15:04 ` Russell King (Oracle) [this message]
2023-06-12 15:04 ` Russell King (Oracle)
2023-05-30 14:04 ` [PATCH RFC 03/17] arm64: provide cpu_replace_ttbr1_phys() Russell King (Oracle)
2023-05-30 14:04 ` Russell King (Oracle)
2023-05-30 14:04 ` [PATCH RFC 04/17] arm64: make clean_dcache_range_nopatch() visible Russell King (Oracle)
2023-05-30 14:04 ` Russell King (Oracle)
2023-05-30 14:04 ` [PATCH RFC 05/17] arm64: text replication: add init function Russell King (Oracle)
2023-05-30 14:04 ` Russell King (Oracle)
2023-05-30 14:05 ` [PATCH RFC 06/17] arm64: text replication: add sanity checks Russell King (Oracle)
2023-05-30 14:05 ` Russell King (Oracle)
2023-05-30 14:05 ` [PATCH RFC 07/17] arm64: text replication: copy initial kernel text Russell King (Oracle)
2023-05-30 14:05 ` Russell King (Oracle)
2023-05-30 14:05 ` [PATCH RFC 08/17] arm64: text replication: add node text patching Russell King (Oracle)
2023-05-30 14:05 ` Russell King (Oracle)
2023-05-30 14:05 ` [PATCH RFC 09/17] arm64: text replication: add node 0 page table definitions Russell King (Oracle)
2023-05-30 14:05 ` Russell King (Oracle)
2023-05-30 14:05 ` [PATCH RFC 10/17] arm64: text replication: add swapper page directory helpers Russell King (Oracle)
2023-05-30 14:05 ` Russell King (Oracle)
2023-05-30 14:05 ` [PATCH RFC 11/17] arm64: text replication: create per-node kernel page tables Russell King (Oracle)
2023-05-30 14:05 ` Russell King (Oracle)
2023-05-30 14:05 ` [PATCH RFC 12/17] arm64: text replication: boot secondary CPUs with appropriate TTBR1 Russell King (Oracle)
2023-05-30 14:05 ` Russell King (Oracle)
2023-05-30 14:05 ` [PATCH RFC 13/17] arm64: text replication: update cnp support Russell King (Oracle)
2023-05-30 14:05 ` Russell King (Oracle)
2023-05-30 14:05 ` [PATCH RFC 14/17] arm64: text replication: setup page tables for copied kernel Russell King (Oracle)
2023-05-30 14:05 ` Russell King (Oracle)
2023-05-30 14:05 ` [PATCH RFC 15/17] arm64: text replication: include most of read-only data as well Russell King (Oracle)
2023-05-30 14:05 ` Russell King (Oracle)
2023-05-30 14:05 ` [PATCH RFC 16/17] arm64: text replication: early kernel option to enable replication Russell King (Oracle)
2023-05-30 14:05 ` Russell King (Oracle)
2023-05-30 14:05 ` [PATCH RFC 17/17] arm64: text replication: add Kconfig Russell King (Oracle)
2023-05-30 14:05 ` Russell King (Oracle)
2023-06-05 9:05 ` [PATCH RFC 00/17] arm64 kernel text replication Russell King (Oracle)
2023-06-05 9:05 ` Russell King (Oracle)
2023-06-05 13:46 ` Mark Rutland
2023-06-05 13:46 ` Mark Rutland
2023-06-23 15:24 ` Ard Biesheuvel
2023-06-23 15:24 ` Ard Biesheuvel
2023-06-23 15:34 ` Russell King (Oracle)
2023-06-23 15:34 ` Russell King (Oracle)
2023-06-23 15:54 ` Marc Zyngier
2023-06-23 15:54 ` Marc Zyngier
2023-06-26 23:42 ` Lameter, Christopher
2023-06-26 23:42 ` Lameter, Christopher
2023-06-27 8:02 ` Marc Zyngier
2023-06-27 8:02 ` Marc Zyngier
2023-06-23 16:37 ` Marc Zyngier
2023-06-23 16:37 ` Marc Zyngier
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZIc0Glqg2HalujXJ@shell.armlinux.org.uk \
--to=linux@armlinux.org.uk \
--cc=ardb@kernel.org \
--cc=catalin.marinas@arm.com \
--cc=corbet@lwn.net \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-doc@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.