public inbox for linux-hardening@vger.kernel.org
From: "Ard Biesheuvel" <ardb@kernel.org>
To: "Borislav Petkov" <bp@alien8.de>
Cc: linux-kernel@vger.kernel.org, x86@kernel.org,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"H . Peter Anvin" <hpa@zytor.com>,
	"Josh Poimboeuf" <jpoimboe@kernel.org>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Kees Cook" <kees@kernel.org>, "Uros Bizjak" <ubizjak@gmail.com>,
	"Brian Gerst" <brgerst@gmail.com>,
	linux-hardening@vger.kernel.org
Subject: Re: [RFC/RFT PATCH 03/19] x86: Combine .data with .bss in kernel mapping
Date: Mon, 09 Mar 2026 15:11:19 +0100
Message-ID: <a29fa2af-a894-40bd-923d-51115dca7940@app.fastmail.com>
In-Reply-To: <20260306190729.GMaasl8VFJh31kS3mi@fat_crate.local>



On Fri, 6 Mar 2026, at 20:07, Borislav Petkov wrote:
> On Thu, Jan 08, 2026 at 09:25:30AM +0000, Ard Biesheuvel wrote:
>> The primary mapping of the kernel image is made using huge pages where
>> possible, mostly to minimize TLB pressure (Only the entry text section
>> requires alignment to 2 MiB). This involves some rounding and padding of
>> the .text and .rodata sections, resulting in gaps.  These gaps are
>> smaller than a huge page, and are remapped using different permissions,
>> resulting in fragmentation of the huge page mappings at the edges of
>> those regions.
>> 
>> Similarly, there is a gap between .data and .bss, where the init text
>> and data regions reside. This means that the end of the .data region and
>> the start of the .bss region are not covered by huge page mappings
>> either, even though both regions use the same permissions (RW+NX).
>> 
>> Improve the situation, by placing .data and .bss adjacently in the
>> linker map, and putting the init text and data regions after .rodata,
>> taking the place of the rodata/data gap. This results in one fewer gap,
>> and a more efficient mapping of the .data and .bss regions.
>> 
>> To preserve the x86_64 ELF layout with PT_LOAD regions aligned to 2 MiB,
>> start the second ELF segment at .init.data and align it to 2 MiB.  The
>> resulting padding will be covered by the init region and will be freed
>> along with it after boot.
>> 
>> defconfig + Clang 19:
>> 
>> Before:
>> 
>>   0xffffffff81000000-0xffffffff82200000    18M  ro  PSE  GLB x  pmd
>>   0xffffffff82200000-0xffffffff8231c000  1136K  ro       GLB x  pte
>>   0xffffffff8231c000-0xffffffff82400000   912K  RW       GLB NX pte
>>   0xffffffff82400000-0xffffffff82a00000     6M  ro  PSE  GLB NX pmd
>>   0xffffffff82a00000-0xffffffff82b40000  1280K  ro       GLB NX pte
>>   0xffffffff82b40000-0xffffffff82c00000   768K  RW       GLB NX pte
>>   0xffffffff82c00000-0xffffffff83400000     8M  RW  PSE  GLB NX pmd
>>   0xffffffff83400000-0xffffffff83800000     4M  RW       GLB NX pte
>> 
>> After:
>> 
>>   0xffffffff81000000-0xffffffff82200000    18M  ro  PSE  GLB x  pmd
>>   0xffffffff82200000-0xffffffff8231c000  1136K  ro       GLB x  pte
>>   0xffffffff8231c000-0xffffffff82400000   912K  RW       GLB NX pte
>>   0xffffffff82400000-0xffffffff82a00000     6M  ro  PSE  GLB NX pmd
>>   0xffffffff82a00000-0xffffffff82b40000  1280K  ro       GLB NX pte
>>   0xffffffff82b40000-0xffffffff82c00000   768K  RW       GLB NX pte
>>   0xffffffff82c00000-0xffffffff82e00000     2M  RW  PSE  GLB NX pmd
>>   0xffffffff82e00000-0xffffffff83000000     2M  RW       GLB NX pte
>>   0xffffffff83000000-0xffffffff83800000     8M  RW  PSE  GLB NX pmd
>> 
>> With the gaps removed/unmapped (pti=on)
>> 
>> Before:
>> 
>>   0xffffffff81000000-0xffffffff81200000     2M  ro  PSE  GLB x  pmd
>>   0xffffffff81200000-0xffffffff82200000    16M  ro  PSE      x  pmd
>>   0xffffffff82200000-0xffffffff8231c000  1136K  ro           x  pte
>>   0xffffffff8231c000-0xffffffff82400000   912K                  pte
>>   0xffffffff82400000-0xffffffff82a00000     6M  ro  PSE      NX pmd
>>   0xffffffff82a00000-0xffffffff82b40000  1280K  ro           NX pte
>>   0xffffffff82b40000-0xffffffff82c00000   768K                  pte
>>   0xffffffff82c00000-0xffffffff83400000     8M  RW  PSE      NX pmd
>>   0xffffffff83400000-0xffffffff8342a000   168K  RW           NX pte
>>   0xffffffff8342a000-0xffffffff836f3000  2852K                  pte
>>   0xffffffff836f3000-0xffffffff83800000  1076K  RW           NX pte
>> 
>> After:
>> 
>>   0xffffffff81000000-0xffffffff81200000     2M  ro  PSE  GLB x  pmd
>>   0xffffffff81200000-0xffffffff82200000    16M  ro  PSE      x  pmd
>>   0xffffffff82200000-0xffffffff8231c000  1136K  ro           x  pte
>>   0xffffffff8231c000-0xffffffff82400000   912K                  pte
>>   0xffffffff82400000-0xffffffff82a00000     6M  ro  PSE      NX pmd
>>   0xffffffff82a00000-0xffffffff82b40000  1280K  ro           NX pte
>>   0xffffffff82b40000-0xffffffff82e3d000  3060K                  pte
>>   0xffffffff82e3d000-0xffffffff83000000  1804K  RW           NX pte
>>   0xffffffff83000000-0xffffffff83800000     8M  RW  PSE      NX pmd
>> 
>> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
>> ---
>>  arch/x86/kernel/vmlinux.lds.S | 91 +++++++++++---------
>>  arch/x86/mm/init_64.c         |  5 +-
>>  arch/x86/mm/pat/set_memory.c  |  2 +-
>>  3 files changed, 52 insertions(+), 46 deletions(-)
>
> I guess we could do this - I don't see why not... we'll have to take it for
> a longer spin tho.
>
>> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
>> index 3a24a3fc55f5..1dee2987c42b 100644
>> --- a/arch/x86/kernel/vmlinux.lds.S
>> +++ b/arch/x86/kernel/vmlinux.lds.S
>> @@ -61,12 +61,15 @@ const_cpu_current_top_of_stack = cpu_current_top_of_stack;
>>  #define X86_ALIGN_RODATA_BEGIN	. = ALIGN(HPAGE_SIZE);
>>  
>>  #define X86_ALIGN_RODATA_END					\
>> -		. = ALIGN(HPAGE_SIZE);				\
>> -		__end_rodata_hpage_align = .;			\
>
> $ git grep __end_rodata_hpage_align
> arch/x86/include/asm/sections.h:13:extern char __end_rodata_hpage_align[];
> arch/x86/tools/relocs.c:93:     "__end_rodata_hpage_align|"
>
> I guess you wanna remove those too and say that that marker is unused. Better
> yet do that in a pre-patch.
>

Indeed. When __end_rodata_hpage_align exists, it is always equal to __end_rodata_aligned, so it can just be dropped entirely.
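
For readers of the archive, here is a rough sketch (not part of the patch) of the arithmetic behind the page table dumps quoted above: only the 2 MiB-aligned interior of a region can use pmd mappings, and whatever is left at the edges falls back to ptes. The helper names align_down(), align_up() and pmd_bytes() are made up for this illustration and are not kernel APIs; the addresses in the note below are the RW ranges from the pti=on dumps.

```c
#include <assert.h>
#include <stdint.h>

/* 2 MiB huge page size on x86_64 (local definition for this sketch) */
#define HPAGE_SIZE	(UINT64_C(2) << 20)

/* Hypothetical helper: round addr down to a 2 MiB boundary */
static uint64_t align_down(uint64_t addr)
{
	return addr & ~(HPAGE_SIZE - 1);
}

/* Hypothetical helper: round addr up to a 2 MiB boundary */
static uint64_t align_up(uint64_t addr)
{
	return align_down(addr + HPAGE_SIZE - 1);
}

/*
 * Number of bytes in [start, end) that can be covered by 2 MiB
 * mappings. The unaligned head and tail are what show up as "pte"
 * rows in the dumps.
 */
static uint64_t pmd_bytes(uint64_t start, uint64_t end)
{
	uint64_t lo, hi;

	assert(start <= end);

	lo = align_up(start);
	hi = align_down(end);

	return hi > lo ? hi - lo : 0;
}
```

Fed the "before" RW ranges, pmd_bytes(0xffffffff82c00000, 0xffffffff8342a000) gives 8 MiB and pmd_bytes(0xffffffff836f3000, 0xffffffff83800000) gives 0, matching the 8M pmd row and the 1076K pte row. For the combined "after" range, pmd_bytes(0xffffffff82e3d000, 0xffffffff83800000) gives 8 MiB, with the 1804K below 0xffffffff83000000 left to ptes.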

