public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC/RFT PATCH 00/19] Link the relocatable x86 kernel as PIE
@ 2026-01-08  9:25 Ard Biesheuvel
  2026-01-08  9:25 ` [RFC/RFT PATCH 01/19] x86/idt: Move idt_table to __ro_after_init section Ard Biesheuvel
                   ` (20 more replies)
  0 siblings, 21 replies; 54+ messages in thread
From: Ard Biesheuvel @ 2026-01-08  9:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Ard Biesheuvel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Josh Poimboeuf,
	Peter Zijlstra, Kees Cook, Uros Bizjak, Brian Gerst,
	linux-hardening

This series is a follow-up to a series I sent a bit more than a year
ago, to switch to PIE linking of x86_64 vmlinux, which is a prerequisite
for further hardening measures, such as fg-kaslr [1], as well as further
harmonization of the boot protocols between architectures [2].

The main sticking point is that PIE linking on x86_64 requires PIE
codegen, and that was shot down before on the basis that
a) GOTs in fully linked binaries are stupid,
b) the code size increase would be prohibitive, and
c) the performance would suffer.

This series implements PIE codegen without permitting the use of GOT
slots. The code size increase is between 0.2% (clang) and 0.5% (gcc),
and I could not identify any performance regressions (using hackbench)
on the various micro-architectures I tried it on.
(Suggestions for other benchmarks/test cases are welcome.)

So now that we have some actual numbers, I would like to try and revisit
this discussion, and get a conclusion on whether this is really a
non-starter. Note that only the KASLR kernel would rely on this, and
disabling CONFIG_RANDOMIZE_BASE will revert to the current situation
(provided that patch #4 is applied)

Some minor asm tweaks are needed too (patches #9 - #17), but those all
seem uncontroversial to me. 

The first 5 patches are general cleanup, and could be taken into
consideration independently of the discussion around PIC codegen.

[1] There have been a few attempts at landing fine-grained KASLR for
x86, but the main problem was that it was tied to the x86 relocation
format, which deviates from how fully linked relocatable ELF binaries
are generally constructed (using PIE). Implementing fgkaslr in the ELF
domain would make it suitable for other architectures too, as well as
for other use cases (bare metal or hosted) where no dynamic linking is
performed (firmware, hypervisors). To implement this properly, i.e.,
with debugging support etc., it needs support from the tooling side.
(Fine-grained KASLR in combination with execute-only code mappings
makes it extremely difficult for an attacker to subvert the control
flow in the kernel in a way that can be meaningfully exploited.)

[2] EFI zboot is already used by various architectures that have no
decompressor stage at all (arm64, RISC-V, LoongArch), and this format
can be combined with an ELF payload too. EFI zboot accommodates non-EFI
boot chains by describing the size, offset, payload type and compression
type in its header, so that it can be extracted and booted by other
means.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: linux-hardening@vger.kernel.org

Ard Biesheuvel (19):
  x86/idt: Move idt_table to __ro_after_init section
  x86/sev: Don't emit BSS_DECRYPT section unless it is in use
  x86: Combine .data with .bss in kernel mapping
  x86: Make the 64-bit bzImage always physically relocatable
  x86/efistub: Simplify early remapping of kernel text
  alloc_tag: Use __ prefixed ELF section names
  tools/objtool: Treat indirect ftrace calls as direct calls
  x86: Use PIE codegen for the relocatable 64-bit kernel
  x86/pm-trace: Use RIP-relative accesses for .tracedata
  x86/kvm: Use RIP-relative addressing
  x86/rethook: Use RIP-relative reference for fake return address
  x86/sync_core: Use RIP-relative addressing
  x86/entry_64: Use RIP-relative addressing
  x86/hibernate: Prefer RIP-relative accesses
  x86/acpi: Use PIC-compatible references in wakeup_64.S
  x86/kexec: Use 64-bit wide absolute reference from relocated code
  x86/head64: Avoid absolute references in startup asm
  x86/boot: Implement support for RELA/RELR/REL runtime relocations
  x86/kernel: Switch to PIE linking for the relocatable kernel

 arch/x86/Kconfig                        |  45 ++++---
 arch/x86/Makefile                       |  24 +++-
 arch/x86/boot/Makefile                  |   1 +
 arch/x86/boot/compressed/Makefile       |   2 +-
 arch/x86/boot/compressed/head_64.S      |   4 -
 arch/x86/boot/compressed/misc.c         |  85 +++++++++++--
 arch/x86/boot/header.S                  |   8 +-
 arch/x86/entry/calling.h                |   9 +-
 arch/x86/entry/entry_64.S               |  14 +--
 arch/x86/entry/vdso/Makefile            |   1 +
 arch/x86/include/asm/boot.h             |   2 -
 arch/x86/include/asm/pm-trace.h         |   4 +-
 arch/x86/include/asm/sync_core.h        |   3 +-
 arch/x86/kernel/acpi/wakeup_64.S        |  11 +-
 arch/x86/kernel/head_64.S               |  17 +--
 arch/x86/kernel/idt.c                   |   5 +-
 arch/x86/kernel/kvm.c                   |   5 +-
 arch/x86/kernel/relocate_kernel_64.S    |   2 +-
 arch/x86/kernel/rethook.c               |   6 +-
 arch/x86/kernel/vmlinux.lds.S           | 132 ++++++++++++--------
 arch/x86/mm/init_64.c                   |   5 +-
 arch/x86/mm/pat/set_memory.c            |   2 +-
 arch/x86/power/hibernate_asm_64.S       |   4 +-
 arch/x86/realmode/rm/Makefile           |   1 +
 drivers/base/power/trace.c              |   6 +-
 drivers/firmware/efi/libstub/x86-stub.c |   4 +-
 include/asm-generic/codetag.lds.h       |  14 ++-
 include/asm-generic/vmlinux.lds.h       |   1 +
 include/linux/alloc_tag.h               |  11 +-
 include/linux/hidden.h                  |   2 +
 include/uapi/linux/elf.h                |   3 +
 lib/alloc_tag.c                         |   6 +-
 tools/objtool/check.c                   |  32 ++++-
 33 files changed, 315 insertions(+), 156 deletions(-)


base-commit: 8f0b4cce4481fb22653697cced8d0d04027cb1e8
-- 
2.47.3


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [RFC/RFT PATCH 01/19] x86/idt: Move idt_table to __ro_after_init section
  2026-01-08  9:25 [RFC/RFT PATCH 00/19] Link the relocatable x86 kernel as PIE Ard Biesheuvel
@ 2026-01-08  9:25 ` Ard Biesheuvel
  2026-01-22 13:08   ` Borislav Petkov
  2026-01-08  9:25 ` [RFC/RFT PATCH 02/19] x86/sev: Don't emit BSS_DECRYPT section unless it is in use Ard Biesheuvel
                   ` (19 subsequent siblings)
  20 siblings, 1 reply; 54+ messages in thread
From: Ard Biesheuvel @ 2026-01-08  9:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Ard Biesheuvel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Josh Poimboeuf,
	Peter Zijlstra, Kees Cook, Uros Bizjak, Brian Gerst,
	linux-hardening

Currently, idt_table is allocated as page-aligned .bss, and remapped
read-only after init. This breaks a 2 MiB large page into 4k page
mappings, which defeats some of the effort done at boot to map the
kernel image using large pages, for improved TLB efficiency.

Mark this allocation as __ro_after_init instead, so it will be made
read-only automatically after boot, without breaking up large page
mappings.

This also fixes a latent bug on i386, where the size of idt_table is
less than a page, and so remapping it read-only could potentially affect
other read-write variables too, if those are not page-aligned as well.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/kernel/idt.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index f445bec516a0..d6da25d7964f 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -170,7 +170,7 @@ static const __initconst struct idt_data apic_idts[] = {
 };
 
 /* Must be page-aligned because the real IDT is used in the cpu entry area */
-static gate_desc idt_table[IDT_ENTRIES] __page_aligned_bss;
+static gate_desc idt_table[IDT_ENTRIES] __aligned(PAGE_SIZE) __ro_after_init;
 
 static struct desc_ptr idt_descr __ro_after_init = {
 	.size		= IDT_TABLE_SIZE - 1,
@@ -308,9 +308,6 @@ void __init idt_setup_apic_and_irq_gates(void)
 	idt_map_in_cea();
 	load_idt(&idt_descr);
 
-	/* Make the IDT table read only */
-	set_memory_ro((unsigned long)&idt_table, 1);
-
 	idt_setup_done = true;
 }
 
-- 
2.47.3



* [RFC/RFT PATCH 02/19] x86/sev: Don't emit BSS_DECRYPT section unless it is in use
  2026-01-08  9:25 [RFC/RFT PATCH 00/19] Link the relocatable x86 kernel as PIE Ard Biesheuvel
  2026-01-08  9:25 ` [RFC/RFT PATCH 01/19] x86/idt: Move idt_table to __ro_after_init section Ard Biesheuvel
@ 2026-01-08  9:25 ` Ard Biesheuvel
  2026-01-31 14:09   ` [tip: x86/sev] x86/sev: Don't emit BSS_DECRYPTED " tip-bot2 for Ard Biesheuvel
  2026-01-08  9:25 ` [RFC/RFT PATCH 03/19] x86: Combine .data with .bss in kernel mapping Ard Biesheuvel
                   ` (18 subsequent siblings)
  20 siblings, 1 reply; 54+ messages in thread
From: Ard Biesheuvel @ 2026-01-08  9:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Ard Biesheuvel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Josh Poimboeuf,
	Peter Zijlstra, Kees Cook, Uros Bizjak, Brian Gerst,
	linux-hardening

The BSS_DECRYPTED section that gets emitted into .bss will be empty if
CONFIG_AMD_MEM_ENCRYPT is not defined. However, because it is injected
into .bss rather than emitted as a separate section, the 2 MiB
alignment that it specifies is still taken into account
unconditionally, pushing .bss out to the next 2 MiB boundary and
leaving a gap that is never freed.

So only emit a non-empty BSS_DECRYPTED section if it is going to be
used. In that case, it would still be nice to free the padding, but
that is left for later.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/kernel/vmlinux.lds.S | 21 +++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index d7af4a64c211..3a24a3fc55f5 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -67,7 +67,18 @@ const_cpu_current_top_of_stack = cpu_current_top_of_stack;
 
 #define ALIGN_ENTRY_TEXT_BEGIN	. = ALIGN(PMD_SIZE);
 #define ALIGN_ENTRY_TEXT_END	. = ALIGN(PMD_SIZE);
+#else
+
+#define X86_ALIGN_RODATA_BEGIN
+#define X86_ALIGN_RODATA_END					\
+		. = ALIGN(PAGE_SIZE);				\
+		__end_rodata_aligned = .;
 
+#define ALIGN_ENTRY_TEXT_BEGIN
+#define ALIGN_ENTRY_TEXT_END
+#endif
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
 /*
  * This section contains data which will be mapped as decrypted. Memory
  * encryption operates on a page basis. Make this section PMD-aligned
@@ -88,17 +99,9 @@ const_cpu_current_top_of_stack = cpu_current_top_of_stack;
 	__pi___end_bss_decrypted = .;				\
 
 #else
-
-#define X86_ALIGN_RODATA_BEGIN
-#define X86_ALIGN_RODATA_END					\
-		. = ALIGN(PAGE_SIZE);				\
-		__end_rodata_aligned = .;
-
-#define ALIGN_ENTRY_TEXT_BEGIN
-#define ALIGN_ENTRY_TEXT_END
 #define BSS_DECRYPTED
-
 #endif
+
 #if defined(CONFIG_X86_64) && defined(CONFIG_KEXEC_CORE)
 #define KEXEC_RELOCATE_KERNEL					\
 	. = ALIGN(0x100);					\
-- 
2.47.3



* [RFC/RFT PATCH 03/19] x86: Combine .data with .bss in kernel mapping
  2026-01-08  9:25 [RFC/RFT PATCH 00/19] Link the relocatable x86 kernel as PIE Ard Biesheuvel
  2026-01-08  9:25 ` [RFC/RFT PATCH 01/19] x86/idt: Move idt_table to __ro_after_init section Ard Biesheuvel
  2026-01-08  9:25 ` [RFC/RFT PATCH 02/19] x86/sev: Don't emit BSS_DECRYPT section unless it is in use Ard Biesheuvel
@ 2026-01-08  9:25 ` Ard Biesheuvel
  2026-03-06 19:07   ` Borislav Petkov
  2026-01-08  9:25 ` [RFC/RFT PATCH 04/19] x86: Make the 64-bit bzImage always physically relocatable Ard Biesheuvel
                   ` (17 subsequent siblings)
  20 siblings, 1 reply; 54+ messages in thread
From: Ard Biesheuvel @ 2026-01-08  9:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Ard Biesheuvel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Josh Poimboeuf,
	Peter Zijlstra, Kees Cook, Uros Bizjak, Brian Gerst,
	linux-hardening

The primary mapping of the kernel image is made using huge pages where
possible, mostly to minimize TLB pressure (Only the entry text section
requires alignment to 2 MiB). This involves some rounding and padding of
the .text and .rodata sections, resulting in gaps.  These gaps are
smaller than a huge page, and are remapped using different permissions,
resulting in fragmentation of the huge page mappings at the edges of
those regions.

Similarly, there is a gap between .data and .bss, where the init text
and data regions reside. This means that the end of the .data region and
the start of the .bss region are not covered by huge page mappings
either, even though both regions use the same permissions (RW+NX).

Improve the situation by placing .data and .bss adjacently in the
linker map, and putting the init text and data regions after .rodata,
taking the place of the rodata/data gap. This results in one fewer
gap, and a more efficient mapping of the .data and .bss regions.

To preserve the x86_64 ELF layout with PT_LOAD regions aligned to 2 MiB,
start the second ELF segment at .init.data and align it to 2 MiB.  The
resulting padding will be covered by the init region and will be freed
along with it after boot.

defconfig + Clang 19:

Before:

  0xffffffff81000000-0xffffffff82200000    18M  ro  PSE  GLB x  pmd
  0xffffffff82200000-0xffffffff8231c000  1136K  ro       GLB x  pte
  0xffffffff8231c000-0xffffffff82400000   912K  RW       GLB NX pte
  0xffffffff82400000-0xffffffff82a00000     6M  ro  PSE  GLB NX pmd
  0xffffffff82a00000-0xffffffff82b40000  1280K  ro       GLB NX pte
  0xffffffff82b40000-0xffffffff82c00000   768K  RW       GLB NX pte
  0xffffffff82c00000-0xffffffff83400000     8M  RW  PSE  GLB NX pmd
  0xffffffff83400000-0xffffffff83800000     4M  RW       GLB NX pte

After:

  0xffffffff81000000-0xffffffff82200000    18M  ro  PSE  GLB x  pmd
  0xffffffff82200000-0xffffffff8231c000  1136K  ro       GLB x  pte
  0xffffffff8231c000-0xffffffff82400000   912K  RW       GLB NX pte
  0xffffffff82400000-0xffffffff82a00000     6M  ro  PSE  GLB NX pmd
  0xffffffff82a00000-0xffffffff82b40000  1280K  ro       GLB NX pte
  0xffffffff82b40000-0xffffffff82c00000   768K  RW       GLB NX pte
  0xffffffff82c00000-0xffffffff82e00000     2M  RW  PSE  GLB NX pmd
  0xffffffff82e00000-0xffffffff83000000     2M  RW       GLB NX pte
  0xffffffff83000000-0xffffffff83800000     8M  RW  PSE  GLB NX pmd

With the gaps removed/unmapped (pti=on)

Before:

  0xffffffff81000000-0xffffffff81200000     2M  ro  PSE  GLB x  pmd
  0xffffffff81200000-0xffffffff82200000    16M  ro  PSE      x  pmd
  0xffffffff82200000-0xffffffff8231c000  1136K  ro           x  pte
  0xffffffff8231c000-0xffffffff82400000   912K                  pte
  0xffffffff82400000-0xffffffff82a00000     6M  ro  PSE      NX pmd
  0xffffffff82a00000-0xffffffff82b40000  1280K  ro           NX pte
  0xffffffff82b40000-0xffffffff82c00000   768K                  pte
  0xffffffff82c00000-0xffffffff83400000     8M  RW  PSE      NX pmd
  0xffffffff83400000-0xffffffff8342a000   168K  RW           NX pte
  0xffffffff8342a000-0xffffffff836f3000  2852K                  pte
  0xffffffff836f3000-0xffffffff83800000  1076K  RW           NX pte

After:

  0xffffffff81000000-0xffffffff81200000     2M  ro  PSE  GLB x  pmd
  0xffffffff81200000-0xffffffff82200000    16M  ro  PSE      x  pmd
  0xffffffff82200000-0xffffffff8231c000  1136K  ro           x  pte
  0xffffffff8231c000-0xffffffff82400000   912K                  pte
  0xffffffff82400000-0xffffffff82a00000     6M  ro  PSE      NX pmd
  0xffffffff82a00000-0xffffffff82b40000  1280K  ro           NX pte
  0xffffffff82b40000-0xffffffff82e3d000  3060K                  pte
  0xffffffff82e3d000-0xffffffff83000000  1804K  RW           NX pte
  0xffffffff83000000-0xffffffff83800000     8M  RW  PSE      NX pmd

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/kernel/vmlinux.lds.S | 91 +++++++++++---------
 arch/x86/mm/init_64.c         |  5 +-
 arch/x86/mm/pat/set_memory.c  |  2 +-
 3 files changed, 52 insertions(+), 46 deletions(-)

diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 3a24a3fc55f5..1dee2987c42b 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -61,12 +61,15 @@ const_cpu_current_top_of_stack = cpu_current_top_of_stack;
 #define X86_ALIGN_RODATA_BEGIN	. = ALIGN(HPAGE_SIZE);
 
 #define X86_ALIGN_RODATA_END					\
-		. = ALIGN(HPAGE_SIZE);				\
-		__end_rodata_hpage_align = .;			\
-		__end_rodata_aligned = .;
+		. = ALIGN(PAGE_SIZE);				\
+		__end_rodata_aligned = ALIGN(HPAGE_SIZE);
 
 #define ALIGN_ENTRY_TEXT_BEGIN	. = ALIGN(PMD_SIZE);
 #define ALIGN_ENTRY_TEXT_END	. = ALIGN(PMD_SIZE);
+
+#define DATA_SEGMENT_START					\
+	. = ALIGN(HPAGE_SIZE);					\
+	__data_segment_start = .;
 #else
 
 #define X86_ALIGN_RODATA_BEGIN
@@ -76,9 +79,14 @@ const_cpu_current_top_of_stack = cpu_current_top_of_stack;
 
 #define ALIGN_ENTRY_TEXT_BEGIN
 #define ALIGN_ENTRY_TEXT_END
+
+#define DATA_SEGMENT_START					\
+	. = ALIGN(PAGE_SIZE);					\
+	__data_segment_start = .;
 #endif
 
 #ifdef CONFIG_AMD_MEM_ENCRYPT
+
 /*
  * This section contains data which will be mapped as decrypted. Memory
  * encryption operates on a page basis. Make this section PMD-aligned
@@ -171,43 +179,6 @@ SECTIONS
 	RO_DATA(PAGE_SIZE)
 	X86_ALIGN_RODATA_END
 
-	/* Data */
-	.data : AT(ADDR(.data) - LOAD_OFFSET) {
-		/* Start of data section */
-		_sdata = .;
-
-		/* init_task */
-		INIT_TASK_DATA(THREAD_SIZE)
-
-		/* equivalent to task_pt_regs(&init_task) */
-		__top_init_kernel_stack = __end_init_stack - TOP_OF_KERNEL_STACK_PADDING - PTREGS_SIZE;
-
-#ifdef CONFIG_X86_32
-		/* 32 bit has nosave before _edata */
-		NOSAVE_DATA
-#endif
-
-		PAGE_ALIGNED_DATA(PAGE_SIZE)
-
-		CACHE_HOT_DATA(L1_CACHE_BYTES)
-
-		CACHELINE_ALIGNED_DATA(L1_CACHE_BYTES)
-
-		DATA_DATA
-		CONSTRUCTORS
-		KEXEC_RELOCATE_KERNEL
-
-		/* rarely changed data like cpu maps */
-		READ_MOSTLY_DATA(INTERNODE_CACHE_BYTES)
-
-		/* End of data section */
-		_edata = .;
-	} :data
-
-	BUG_TABLE
-
-	ORC_UNWIND_TABLE
-
 	/* Init code and data - will be freed after init */
 	. = ALIGN(PAGE_SIZE);
 	.init.begin : AT(ADDR(.init.begin) - LOAD_OFFSET) {
@@ -229,7 +200,8 @@ SECTIONS
 		__inittext_end = .;
 	}
 
-	INIT_DATA_SECTION(16)
+	DATA_SEGMENT_START
+	INIT_DATA_SECTION(16) :data
 
 	.x86_cpu_dev.init : AT(ADDR(.x86_cpu_dev.init) - LOAD_OFFSET) {
 		__x86_cpu_dev_start = .;
@@ -358,6 +330,43 @@ SECTIONS
 		__smp_locks_end = .;
 	}
 
+	/* Data */
+	.data : AT(ADDR(.data) - LOAD_OFFSET) {
+		/* Start of data section */
+		_sdata = .;
+
+		/* init_task */
+		INIT_TASK_DATA(THREAD_SIZE)
+
+		/* equivalent to task_pt_regs(&init_task) */
+		__top_init_kernel_stack = __end_init_stack - TOP_OF_KERNEL_STACK_PADDING - PTREGS_SIZE;
+
+#ifdef CONFIG_X86_32
+		/* 32 bit has nosave before _edata */
+		NOSAVE_DATA
+#endif
+
+		PAGE_ALIGNED_DATA(PAGE_SIZE)
+
+		CACHE_HOT_DATA(L1_CACHE_BYTES)
+
+		CACHELINE_ALIGNED_DATA(L1_CACHE_BYTES)
+
+		DATA_DATA
+		CONSTRUCTORS
+		KEXEC_RELOCATE_KERNEL
+
+		/* rarely changed data like cpu maps */
+		READ_MOSTLY_DATA(INTERNODE_CACHE_BYTES)
+
+		/* End of data section */
+		_edata = .;
+	}
+
+	BUG_TABLE
+
+	ORC_UNWIND_TABLE
+
 #ifdef CONFIG_X86_64
 	.data_nosave : AT(ADDR(.data_nosave) - LOAD_OFFSET) {
 		NOSAVE_DATA
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 9983017ecbe0..6c2120dd5607 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1397,9 +1397,8 @@ void mark_rodata_ro(void)
 {
 	unsigned long start = PFN_ALIGN(_text);
 	unsigned long rodata_start = PFN_ALIGN(__start_rodata);
-	unsigned long end = (unsigned long)__end_rodata_hpage_align;
+	unsigned long end = (unsigned long)__end_rodata;
 	unsigned long text_end = PFN_ALIGN(_etext);
-	unsigned long rodata_end = PFN_ALIGN(__end_rodata);
 	unsigned long all_end;
 
 	printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
@@ -1435,8 +1434,6 @@ void mark_rodata_ro(void)
 
 	free_kernel_image_pages("unused kernel image (text/rodata gap)",
 				(void *)text_end, (void *)rodata_start);
-	free_kernel_image_pages("unused kernel image (rodata/data gap)",
-				(void *)rodata_end, (void *)_sdata);
 }
 
 /*
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 6c6eb486f7a6..ad4d55f2413b 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -554,7 +554,7 @@ static pgprotval_t protect_kernel_text(unsigned long start, unsigned long end)
 static pgprotval_t protect_kernel_text_ro(unsigned long start,
 					  unsigned long end)
 {
-	unsigned long t_end = (unsigned long)__end_rodata_hpage_align - 1;
+	unsigned long t_end = (unsigned long)__end_rodata - 1;
 	unsigned long t_start = (unsigned long)_text;
 	unsigned int level;
 
-- 
2.47.3



* [RFC/RFT PATCH 04/19] x86: Make the 64-bit bzImage always physically relocatable
  2026-01-08  9:25 [RFC/RFT PATCH 00/19] Link the relocatable x86 kernel as PIE Ard Biesheuvel
                   ` (2 preceding siblings ...)
  2026-01-08  9:25 ` [RFC/RFT PATCH 03/19] x86: Combine .data with .bss in kernel mapping Ard Biesheuvel
@ 2026-01-08  9:25 ` Ard Biesheuvel
  2026-01-12  4:01   ` H. Peter Anvin
  2026-01-08  9:25 ` [RFC/RFT PATCH 05/19] x86/efistub: Simplify early remapping of kernel text Ard Biesheuvel
                   ` (16 subsequent siblings)
  20 siblings, 1 reply; 54+ messages in thread
From: Ard Biesheuvel @ 2026-01-08  9:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Ard Biesheuvel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Josh Poimboeuf,
	Peter Zijlstra, Kees Cook, Uros Bizjak, Brian Gerst,
	linux-hardening

On x86_64, the physical placement of the kernel is independent of its
mapping in the 'High Kernel Mapping' range. This means that even a
position-dependent kernel built without boot-time relocation support
can run from any suitably aligned physical address, and there is no
need to make this behavior dependent on whether or not the kernel is
virtually relocatable.

On i386, the situation is different, given that the physical and virtual
load offsets must be equal, and so only a relocatable kernel can be
loaded at a physical address that deviates from its build-time default.

Clarify this in Kconfig and in the code, and advertise the 64-bit
bzImage as loadable at any physical offset regardless of whether
CONFIG_RELOCATABLE is set. In practice, this makes little difference,
given that it defaults to 'y' and is a prerequisite for EFI_STUB and
RANDOMIZE_BASE, but it will help with some future refactoring of the
relocation code.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/Kconfig                   | 40 ++++++++++++--------
 arch/x86/boot/compressed/head_64.S |  4 --
 arch/x86/boot/compressed/misc.c    |  8 ++--
 arch/x86/boot/header.S             |  8 +---
 4 files changed, 29 insertions(+), 31 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 80527299f859..bf51e17d5813 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1931,7 +1931,7 @@ config EFI
 config EFI_STUB
 	bool "EFI stub support"
 	depends on EFI
-	select RELOCATABLE
+	select RELOCATABLE if X86_32
 	help
 	  This kernel feature allows a bzImage to be loaded directly
 	  by EFI firmware without the use of a bootloader.
@@ -2028,8 +2028,9 @@ config PHYSICAL_START
 	help
 	  This gives the physical address where the kernel is loaded.
 
-	  If the kernel is not relocatable (CONFIG_RELOCATABLE=n) then bzImage
-	  will decompress itself to above physical address and run from there.
+	  If the kernel is not relocatable (CONFIG_RELOCATABLE=n) and built for
+	  i386, then the bzImage will decompress itself to the above physical
+	  address and run from there.
 	  Otherwise, bzImage will run from the address where it has been loaded
 	  by the boot loader. The only exception is if it is loaded below the
 	  above physical address, in which case it will relocate itself there.
@@ -2064,16 +2065,22 @@ config PHYSICAL_START
 	  Don't change this unless you know what you are doing.
 
 config RELOCATABLE
-	bool "Build a relocatable kernel"
-	default y
+	bool "Build a relocatable kernel" if X86_32
+	default X86_32
 	help
-	  This builds a kernel image that retains relocation information
-	  so it can be loaded someplace besides the default 1MB.
+	  This builds a kernel image that retains relocation information so it
+	  can be placed someplace besides the default PAGE_OFFSET + 1MB. This
+	  is a prerequisite for KASLR.
 	  The relocations tend to make the kernel binary about 10% larger,
 	  but are discarded at runtime.
 
-	  One use is for the kexec on panic case where the recovery kernel
-	  must live at a different physical address than the primary
+	  On i386, where the virtual and physical load offset of the kernel
+	  must be equal, this also allows the kernel image to be placed at a
+	  physical load address that differs from the compile time default. On
+	  x86_64, this is always permitted.
+
+	  One use is for the kexec on panic case on i386, where the recovery
+	  kernel must live at a different physical address than the primary
 	  kernel.
 
 	  Note: If CONFIG_RELOCATABLE=y, then the kernel runs from the address
@@ -2082,7 +2089,7 @@ config RELOCATABLE
 
 config RANDOMIZE_BASE
 	bool "Randomize the address of the kernel image (KASLR)"
-	depends on RELOCATABLE
+	select RELOCATABLE
 	default y
 	help
 	  In support of Kernel Address Space Layout Randomization (KASLR),
@@ -2118,7 +2125,7 @@ config RANDOMIZE_BASE
 # Relocation on x86 needs some additional build support
 config X86_NEED_RELOCS
 	def_bool y
-	depends on RANDOMIZE_BASE || (X86_32 && RELOCATABLE)
+	depends on RELOCATABLE
 	select ARCH_VMLINUX_NEEDS_RELOCS
 
 config PHYSICAL_ALIGN
@@ -2131,12 +2138,13 @@ config PHYSICAL_ALIGN
 	  where kernel is loaded and run from. Kernel is compiled for an
 	  address which meets above alignment restriction.
 
-	  If bootloader loads the kernel at a non-aligned address and
-	  CONFIG_RELOCATABLE is set, kernel will move itself to nearest
-	  address aligned to above value and run from there.
+	  If the bootloader loads the kernel at a non-aligned address and it
+	  is built for x86_64 or CONFIG_RELOCATABLE is set, the kernel will
+	  move itself to the nearest address aligned to above value and run
+	  from there.
 
-	  If bootloader loads the kernel at a non-aligned address and
-	  CONFIG_RELOCATABLE is not set, kernel will ignore the run time
+	  If the bootloader loads the i386 kernel at a non-aligned address and
+	  CONFIG_RELOCATABLE is not set, the kernel will ignore the run time
 	  load address and decompress itself to the address it has been
 	  compiled for and run from there. The address for which kernel is
 	  compiled already meets above alignment restrictions. Hence the
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index d9dab940ff62..8a964a4d45c2 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -143,7 +143,6 @@ SYM_FUNC_START(startup_32)
  * for safe in-place decompression.
  */
 
-#ifdef CONFIG_RELOCATABLE
 	movl	%ebp, %ebx
 	movl	BP_kernel_alignment(%esi), %eax
 	decl	%eax
@@ -152,7 +151,6 @@ SYM_FUNC_START(startup_32)
 	andl	%eax, %ebx
 	cmpl	$LOAD_PHYSICAL_ADDR, %ebx
 	jae	1f
-#endif
 	movl	$LOAD_PHYSICAL_ADDR, %ebx
 1:
 
@@ -312,7 +310,6 @@ SYM_CODE_START(startup_64)
 	 */
 
 	/* Start with the delta to where the kernel will run at. */
-#ifdef CONFIG_RELOCATABLE
 	leaq	startup_32(%rip) /* - $startup_32 */, %rbp
 	movl	BP_kernel_alignment(%rsi), %eax
 	decl	%eax
@@ -321,7 +318,6 @@ SYM_CODE_START(startup_64)
 	andq	%rax, %rbp
 	cmpq	$LOAD_PHYSICAL_ADDR, %rbp
 	jae	1f
-#endif
 	movq	$LOAD_PHYSICAL_ADDR, %rbp
 1:
 
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 0f41ca0e52c0..d37569e7ee10 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -314,12 +314,10 @@ static size_t parse_elf(void *output)
 			if ((phdr->p_align % 0x200000) != 0)
 				error("Alignment of LOAD segment isn't multiple of 2MB");
 #endif
-#ifdef CONFIG_RELOCATABLE
-			dest = output;
-			dest += (phdr->p_paddr - LOAD_PHYSICAL_ADDR);
-#else
 			dest = (void *)(phdr->p_paddr);
-#endif
+			if (IS_ENABLED(CONFIG_X86_64) ||
+			    IS_ENABLED(CONFIG_RELOCATABLE))
+				dest += (unsigned long)output - LOAD_PHYSICAL_ADDR;
 			memmove(dest, output + phdr->p_offset, phdr->p_filesz);
 			break;
 		default: /* Ignore other PT_* */ break;
diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index 9bea5a1e2c52..b72e6055e103 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -332,7 +332,7 @@ initrd_addr_max: .long 0x7fffffff
 kernel_alignment:  .long CONFIG_PHYSICAL_ALIGN	#physical addr alignment
 						#required for protected mode
 						#kernel
-#ifdef CONFIG_RELOCATABLE
+#if defined(CONFIG_RELOCATABLE) || defined(CONFIG_X86_64)
 relocatable_kernel:    .byte 1
 #else
 relocatable_kernel:    .byte 0
@@ -342,14 +342,10 @@ min_alignment:		.byte MIN_KERNEL_ALIGN_LG2	# minimum alignment
 xloadflags:
 #ifdef CONFIG_X86_64
 # define XLF0 XLF_KERNEL_64			/* 64-bit kernel */
-#else
-# define XLF0 0
-#endif
-
-#if defined(CONFIG_RELOCATABLE) && defined(CONFIG_X86_64)
    /* kernel/boot_param/ramdisk could be loaded above 4g */
 # define XLF1 XLF_CAN_BE_LOADED_ABOVE_4G
 #else
+# define XLF0 0
 # define XLF1 0
 #endif
 
-- 
2.47.3



* [RFC/RFT PATCH 05/19] x86/efistub: Simplify early remapping of kernel text
  2026-01-08  9:25 [RFC/RFT PATCH 00/19] Link the relocatable x86 kernel as PIE Ard Biesheuvel
                   ` (3 preceding siblings ...)
  2026-01-08  9:25 ` [RFC/RFT PATCH 04/19] x86: Make the 64-bit bzImage always physically relocatable Ard Biesheuvel
@ 2026-01-08  9:25 ` Ard Biesheuvel
  2026-01-08  9:25 ` [RFC/RFT PATCH 06/19] alloc_tag: Use __ prefixed ELF section names Ard Biesheuvel
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 54+ messages in thread
From: Ard Biesheuvel @ 2026-01-08  9:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Ard Biesheuvel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Josh Poimboeuf,
	Peter Zijlstra, Kees Cook, Uros Bizjak, Brian Gerst,
	linux-hardening

Now that the kernel's .text, .rodata and .inittext are all covered by a
single ELF segment, there is no need to remap .inittext separately.
Instead, remap the entire region in a single call.

This remapping is needed because the EFI stub hands over to the core
kernel while running in long mode, using the page tables provided by
the firmware. Recent so-called 'MS secured core' (tm) PCs are stricter
about separating writable from executable mappings, and so for
compatibility with such systems, any code that may be called during
early boot (i.e., before the kernel switches to its own page tables)
must be explicitly remapped as executable.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/boot/compressed/Makefile       | 2 +-
 arch/x86/boot/compressed/misc.c         | 4 +---
 arch/x86/include/asm/boot.h             | 2 --
 arch/x86/kernel/vmlinux.lds.S           | 2 --
 drivers/firmware/efi/libstub/x86-stub.c | 4 +---
 5 files changed, 3 insertions(+), 11 deletions(-)

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 68f9d7a1683b..bc071bdcd11e 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -76,7 +76,7 @@ LDFLAGS_vmlinux += -T
 hostprogs	:= mkpiggy
 HOST_EXTRACFLAGS += -I$(srctree)/tools/include
 
-sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABbCDGRSTtVW] \(_text\|__start_rodata\|_sinittext\|__inittext_end\|__bss_start\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
+sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABbCDGRSTtVW] \(_text\|__data_segment_start\|__bss_start\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
 
 quiet_cmd_voffset = VOFFSET $@
       cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index d37569e7ee10..1ea419cf88fe 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -329,9 +329,7 @@ static size_t parse_elf(void *output)
 	return ehdr.e_entry - LOAD_PHYSICAL_ADDR;
 }
 
-const unsigned long kernel_text_size = VO___start_rodata - VO__text;
-const unsigned long kernel_inittext_offset = VO__sinittext - VO__text;
-const unsigned long kernel_inittext_size = VO___inittext_end - VO__sinittext;
+const unsigned long kernel_text_size = VO___data_segment_start - VO__text;
 const unsigned long kernel_total_size = VO__end - VO__text;
 
 static u8 boot_heap[BOOT_HEAP_SIZE] __aligned(4);
diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
index f7b67cb73915..02b23aa78955 100644
--- a/arch/x86/include/asm/boot.h
+++ b/arch/x86/include/asm/boot.h
@@ -82,8 +82,6 @@
 #ifndef __ASSEMBLER__
 extern unsigned int output_len;
 extern const unsigned long kernel_text_size;
-extern const unsigned long kernel_inittext_offset;
-extern const unsigned long kernel_inittext_size;
 extern const unsigned long kernel_total_size;
 
 unsigned long decompress_kernel(unsigned char *outbuf, unsigned long virt_addr,
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 1dee2987c42b..6772fe9a9957 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -196,8 +196,6 @@ SECTIONS
 	 */
 	.altinstr_aux : AT(ADDR(.altinstr_aux) - LOAD_OFFSET) {
 		*(.altinstr_aux)
-		. = ALIGN(PAGE_SIZE);
-		__inittext_end = .;
 	}
 
 	DATA_SEGMENT_START
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index cef32e2c82d8..ffe30ef73fda 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -890,9 +890,7 @@ static efi_status_t efi_decompress_kernel(unsigned long *kernel_entry,
 
 	*kernel_entry = addr + entry;
 
-	return efi_adjust_memory_range_protection(addr, kernel_text_size) ?:
-	       efi_adjust_memory_range_protection(addr + kernel_inittext_offset,
-						  kernel_inittext_size);
+	return efi_adjust_memory_range_protection(addr, kernel_text_size);
 }
 
 static void __noreturn enter_kernel(unsigned long kernel_addr,
-- 
2.47.3


* [RFC/RFT PATCH 06/19] alloc_tag: Use __ prefixed ELF section names
  2026-01-08  9:25 [RFC/RFT PATCH 00/19] Link the relocatable x86 kernel as PIE Ard Biesheuvel
                   ` (4 preceding siblings ...)
  2026-01-08  9:25 ` [RFC/RFT PATCH 05/19] x86/efistub: Simplify early remapping of kernel text Ard Biesheuvel
@ 2026-01-08  9:25 ` Ard Biesheuvel
  2026-01-08  9:25 ` [RFC/RFT PATCH 07/19] tools/objtool: Treat indirect ftrace calls as direct calls Ard Biesheuvel
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 54+ messages in thread
From: Ard Biesheuvel @ 2026-01-08  9:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Ard Biesheuvel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Josh Poimboeuf,
	Peter Zijlstra, Kees Cook, Uros Bizjak, Brian Gerst,
	linux-hardening

The compiler emits static relocations related to a section .foo into
a separate section called .rela.foo, i.e., it simply appends the
section name to the string ".rela".

When section names start with . or __, the resulting .rela section
names are correctly matched by the pattern rules in the various linker
scripts across the tree.

Without any such leading delimiter, this may lead to spurious warnings
such as

  >> ld: warning: orphan section `.relaalloc_tags' from `init/main.o' being placed in section `.rela.dyn'
     ld: warning: dot moved backwards before `.rela.dyn'
     ld: .tmp_vmlinux1: section `.rela.dyn' can't be allocated in segment 1

Fix this by renaming the section to __alloc_tags. While at it, tweak the
headers so that the definition appears only a single time.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 include/asm-generic/codetag.lds.h | 14 +++++++++-----
 include/linux/alloc_tag.h         | 11 ++++++-----
 lib/alloc_tag.c                   |  6 +++---
 3 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/include/asm-generic/codetag.lds.h b/include/asm-generic/codetag.lds.h
index a14f4bdafdda..d7ff181862da 100644
--- a/include/asm-generic/codetag.lds.h
+++ b/include/asm-generic/codetag.lds.h
@@ -2,6 +2,10 @@
 #ifndef __ASM_GENERIC_CODETAG_LDS_H
 #define __ASM_GENERIC_CODETAG_LDS_H
 
+#include <linux/compiler_types.h>
+
+#define ALLOC_TAG_SECTION_NAME	__alloc_tags
+
 #ifdef CONFIG_MEM_ALLOC_PROFILING
 #define IF_MEM_ALLOC_PROFILING(...) __VA_ARGS__
 #else
@@ -10,15 +14,15 @@
 
 #define SECTION_WITH_BOUNDARIES(_name)	\
 	. = ALIGN(8);			\
-	__start_##_name = .;		\
+	__PASTE(__start_, _name) = .;	\
 	KEEP(*(_name))			\
-	__stop_##_name = .;
+	__PASTE(__stop_, _name) = .;
 
 #define CODETAG_SECTIONS()		\
-	IF_MEM_ALLOC_PROFILING(SECTION_WITH_BOUNDARIES(alloc_tags))
+	IF_MEM_ALLOC_PROFILING(SECTION_WITH_BOUNDARIES(ALLOC_TAG_SECTION_NAME))
 
 #define MOD_SEPARATE_CODETAG_SECTION(_name)	\
-	.codetag.##_name : {			\
+	.codetag._name : {			\
 		SECTION_WITH_BOUNDARIES(_name)	\
 	}
 
@@ -28,6 +32,6 @@
  * unload them individually once unused.
  */
 #define MOD_SEPARATE_CODETAG_SECTIONS()		\
-	IF_MEM_ALLOC_PROFILING(MOD_SEPARATE_CODETAG_SECTION(alloc_tags))
+	IF_MEM_ALLOC_PROFILING(MOD_SEPARATE_CODETAG_SECTION(ALLOC_TAG_SECTION_NAME))
 
 #endif /* __ASM_GENERIC_CODETAG_LDS_H */
diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index d40ac39bfbe8..f39d85b05b8a 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -15,6 +15,9 @@
 #include <linux/static_key.h>
 #include <linux/irqflags.h>
 
+/* for ALLOC_TAG_SECTION_NAME */
+#include <asm-generic/codetag.lds.h>
+
 struct alloc_tag_counters {
 	u64 bytes;
 	u64 calls;
@@ -74,8 +77,6 @@ static inline void set_codetag_empty(union codetag_ref *ref)
 
 #ifdef CONFIG_MEM_ALLOC_PROFILING
 
-#define ALLOC_TAG_SECTION_NAME	"alloc_tags"
-
 struct codetag_bytes {
 	struct codetag *ct;
 	s64 bytes;
@@ -98,7 +99,7 @@ DECLARE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
 
 #define DEFINE_ALLOC_TAG(_alloc_tag)						\
 	static struct alloc_tag _alloc_tag __used __aligned(8)			\
-	__section(ALLOC_TAG_SECTION_NAME) = {					\
+	__section(__stringify(ALLOC_TAG_SECTION_NAME)) = {			\
 		.ct = CODE_TAG_INIT,						\
 		.counters = &_shared_alloc_tag };
 
@@ -108,7 +109,7 @@ DECLARE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
 
 #define DEFINE_ALLOC_TAG(_alloc_tag)						\
 	static struct alloc_tag _alloc_tag __used __aligned(8)			\
-	__section(ALLOC_TAG_SECTION_NAME) = {					\
+	__section(__stringify(ALLOC_TAG_SECTION_NAME)) = {			\
 		.ct = CODE_TAG_INIT,						\
 		.counters = NULL };
 
@@ -117,7 +118,7 @@ DECLARE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
 #define DEFINE_ALLOC_TAG(_alloc_tag)						\
 	static DEFINE_PER_CPU(struct alloc_tag_counters, _alloc_tag_cntr);	\
 	static struct alloc_tag _alloc_tag __used __aligned(8)			\
-	__section(ALLOC_TAG_SECTION_NAME) = {					\
+	__section(__stringify(ALLOC_TAG_SECTION_NAME)) = {			\
 		.ct = CODE_TAG_INIT,						\
 		.counters = &_alloc_tag_cntr };
 
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 27fee57a5c91..3eff7e912521 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -15,8 +15,8 @@
 
 #define ALLOCINFO_FILE_NAME		"allocinfo"
 #define MODULE_ALLOC_TAG_VMAP_SIZE	(100000UL * sizeof(struct alloc_tag))
-#define SECTION_START(NAME)		(CODETAG_SECTION_START_PREFIX NAME)
-#define SECTION_STOP(NAME)		(CODETAG_SECTION_STOP_PREFIX NAME)
+#define SECTION_START(NAME)		(CODETAG_SECTION_START_PREFIX #NAME)
+#define SECTION_STOP(NAME)		(CODETAG_SECTION_STOP_PREFIX #NAME)
 
 #ifdef CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
 static bool mem_profiling_support = true;
@@ -810,7 +810,7 @@ static inline void sysctl_init(void) {}
 static int __init alloc_tag_init(void)
 {
 	const struct codetag_type_desc desc = {
-		.section		= ALLOC_TAG_SECTION_NAME,
+		.section		= __stringify(ALLOC_TAG_SECTION_NAME),
 		.tag_size		= sizeof(struct alloc_tag),
 #ifdef CONFIG_MODULES
 		.needs_section_mem	= needs_section_mem,
-- 
2.47.3


* [RFC/RFT PATCH 07/19] tools/objtool: Treat indirect ftrace calls as direct calls
  2026-01-08  9:25 [RFC/RFT PATCH 00/19] Link the relocatable x86 kernel as PIE Ard Biesheuvel
                   ` (5 preceding siblings ...)
  2026-01-08  9:25 ` [RFC/RFT PATCH 06/19] alloc_tag: Use __ prefixed ELF section names Ard Biesheuvel
@ 2026-01-08  9:25 ` Ard Biesheuvel
  2026-01-08  9:25 ` [RFC/RFT PATCH 08/19] x86: Use PIE codegen for the relocatable 64-bit kernel Ard Biesheuvel
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 54+ messages in thread
From: Ard Biesheuvel @ 2026-01-08  9:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Ard Biesheuvel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Josh Poimboeuf,
	Peter Zijlstra, Kees Cook, Uros Bizjak, Brian Gerst,
	linux-hardening

In some cases, the compiler may emit function calls as indirect calls
through GOT slots used as memory operands. This leaves it up to the
linker to relax such a call into a direct call if possible, i.e., if
the destination address is known at link time and in range, which may
not be the case when building shared libraries for user space.

On x86, this may happen when building in PIC mode with ftrace enabled,
and given that vmlinux is a fully linked binary, this relaxation is
always possible, and therefore mandatory per the x86_64 psABI.

This means that the indirect calls to __fentry__ that are observable in
vmlinux.o will have been converted to direct calls in vmlinux, and can
be treated as such by objtool.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 tools/objtool/check.c | 32 ++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 3f7999317f4d..765f818af839 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1660,11 +1660,39 @@ static int add_call_destinations(struct objtool_file *file)
 
 	for_each_insn(file, insn) {
 		struct symbol *func = insn_func(insn);
-		if (insn->type != INSN_CALL)
+		if (insn->type != INSN_CALL &&
+		    insn->type != INSN_CALL_DYNAMIC)
 			continue;
 
 		reloc = insn_reloc(file, insn);
-		if (!reloc) {
+		if (insn->type == INSN_CALL_DYNAMIC) {
+			if (!reloc)
+				continue;
+
+			/*
+			 * GCC 13 and older on x86 will always emit the call to
+			 * __fentry__ using a relaxable GOT-based symbol
+			 * reference when operating in PIC mode, i.e.,
+			 *
+			 *   call   *0x0(%rip)
+			 *             R_X86_64_GOTPCRELX  __fentry__-0x4
+			 *
+			 * where it is left up to the linker to relax this into
+			 *
+			 *   call   __fentry__
+			 *   nop
+			 *
+			 * if __fentry__ turns out to be DSO local, which is
+			 * always the case for vmlinux. Given that this
+			 * relaxation is mandatory per the x86_64 psABI, these
+			 * calls can simply be treated as direct calls.
+			 */
+			if (arch_ftrace_match(reloc->sym->name)) {
+				insn->type = INSN_CALL;
+				add_call_dest(file, insn, reloc->sym, false);
+			}
+
+		} else if (!reloc) {
 			dest_off = arch_jump_destination(insn);
 			dest = find_call_destination(insn->sec, dest_off);
 
-- 
2.47.3


* [RFC/RFT PATCH 08/19] x86: Use PIE codegen for the relocatable 64-bit kernel
  2026-01-08  9:25 [RFC/RFT PATCH 00/19] Link the relocatable x86 kernel as PIE Ard Biesheuvel
                   ` (6 preceding siblings ...)
  2026-01-08  9:25 ` [RFC/RFT PATCH 07/19] tools/objtool: Treat indirect ftrace calls as direct calls Ard Biesheuvel
@ 2026-01-08  9:25 ` Ard Biesheuvel
  2026-01-09 21:34   ` Jan Engelhardt
  2026-01-08  9:25 ` [RFC/RFT PATCH 09/19] x86/pm-trace: Use RIP-relative accesses for .tracedata Ard Biesheuvel
                   ` (12 subsequent siblings)
  20 siblings, 1 reply; 54+ messages in thread
From: Ard Biesheuvel @ 2026-01-08  9:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Ard Biesheuvel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Josh Poimboeuf,
	Peter Zijlstra, Kees Cook, Uros Bizjak, Brian Gerst,
	linux-hardening

As an intermediate step towards enabling PIE linking for the x86_64
KASLR kernel, enable PIE codegen for all C and Rust objects that are
linked into the kernel proper. Add a Kconfig option RELOCATABLE_PIE for
this, depending on RELR support in the linker, as the relocation tables
will blow up the kernel image otherwise.

This results in a code size increase of between 0.2% (clang) and 0.5%
(gcc). Performance (hackbench) appears to be unaffected across several
different uarchs.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/Kconfig                  |  4 ++++
 arch/x86/Makefile                 | 19 ++++++++++++++++++-
 arch/x86/boot/Makefile            |  1 +
 arch/x86/boot/compressed/Makefile |  2 +-
 arch/x86/entry/vdso/Makefile      |  1 +
 arch/x86/realmode/rm/Makefile     |  1 +
 include/asm-generic/vmlinux.lds.h |  1 +
 include/linux/hidden.h            |  2 ++
 8 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index bf51e17d5813..b3a64cfe04cf 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2067,6 +2067,7 @@ config PHYSICAL_START
 config RELOCATABLE
 	bool "Build a relocatable kernel" if X86_32
 	default X86_32
+	select RELOCATABLE_PIE if TOOLS_SUPPORT_RELR
 	help
 	  This builds a kernel image that retains relocation information so it
 	  can be placed someplace besides the default PAGE_OFFSET + 1MB. This
@@ -2087,6 +2088,9 @@ config RELOCATABLE
 	  it has been loaded at and the compile time physical address
 	  (CONFIG_PHYSICAL_START) is used as the minimum location.
 
+config RELOCATABLE_PIE
+	bool
+
 config RANDOMIZE_BASE
 	bool "Randomize the address of the kernel image (KASLR)"
 	select RELOCATABLE
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 1d403a3612ea..b211d6c950aa 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -89,6 +89,8 @@ ifdef CONFIG_CC_IS_GCC
 CC_FLAGS_FPU += -mhard-float
 endif
 
+rustflags-nojumptables := $(if $(call rustc-min-version,109300),-Cjump-tables=n,-Zno-jump-tables)
+
 ifeq ($(CONFIG_X86_KERNEL_IBT),y)
 #
 # Kernel IBT has S_CET.NOTRACK_EN=0, as such the compilers must not generate
@@ -100,7 +102,7 @@ ifeq ($(CONFIG_X86_KERNEL_IBT),y)
 #   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104816
 #
 KBUILD_CFLAGS += $(call cc-option,-fcf-protection=branch -fno-jump-tables)
-KBUILD_RUSTFLAGS += -Zcf-protection=branch $(if $(call rustc-min-version,109300),-Cjump-tables=n,-Zno-jump-tables)
+KBUILD_RUSTFLAGS += -Zcf-protection=branch $(rustflags-nojumptables)
 else
 KBUILD_CFLAGS += $(call cc-option,-fcf-protection=none)
 endif
@@ -178,6 +180,21 @@ endif
         KBUILD_RUSTFLAGS += -Ccode-model=kernel
 
         percpu_seg := gs
+
+        pie-ccflags-$(CONFIG_CC_IS_GCC) += $(call cc-option,-mdirect-extern-access)
+        pie-ccflags-$(CONFIG_CC_IS_CLANG) += -fdirect-access-external-data
+
+        # objtool gets confused by unannotated PIC flavor jump tables
+        pie-ccflags-y += $(call cc-option,-fannotate-jump-tables,-fno-jump-tables)
+
+        pie-cflags-$(CONFIG_RELOCATABLE_PIE) := $(pie-ccflags-y) -fpie -mcmodel=small \
+                                -include $(srctree)/include/linux/hidden.h
+        pie-rustflags-$(CONFIG_RELOCATABLE_PIE) := -Crelocation-model=pie \
+                                -Ccode-model=small -Zdirect-access-external-data=yes \
+                                $(rustflags-nojumptables)
+
+        KBUILD_CFLAGS_KERNEL    += $(pie-cflags-y)
+        KBUILD_RUSTFLAGS_KERNEL += $(pie-rustflags-y)
 endif
 
 ifeq ($(CONFIG_STACKPROTECTOR),y)
diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index 3f9fb3698d66..491b3b2a9a02 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -55,6 +55,7 @@ KBUILD_CFLAGS	:= $(REALMODE_CFLAGS) -D_SETUP
 KBUILD_AFLAGS	:= $(KBUILD_CFLAGS) -D__ASSEMBLY__
 KBUILD_CFLAGS	+= -fno-asynchronous-unwind-tables
 KBUILD_CFLAGS	+= $(CONFIG_CC_IMPLICIT_FALLTHROUGH)
+KBUILD_CFLAGS_KERNEL :=
 
 $(obj)/bzImage: asflags-y  := $(SVGA_MODE)
 
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index bc071bdcd11e..96099b5d1064 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -76,7 +76,7 @@ LDFLAGS_vmlinux += -T
 hostprogs	:= mkpiggy
 HOST_EXTRACFLAGS += -I$(srctree)/tools/include
 
-sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABbCDGRSTtVW] \(_text\|__data_segment_start\|__bss_start\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
+sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABbCDdGRSTtVW] \(_text\|__data_segment_start\|__bss_start\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
 
 quiet_cmd_voffset = VOFFSET $@
       cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index f247f5f5cb44..bf4221a0fc08 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -143,6 +143,7 @@ endif
 endif
 
 $(obj)/vdso32.so.dbg: KBUILD_CFLAGS = $(KBUILD_CFLAGS_32)
+$(obj)/vdso32.so.dbg: KBUILD_CFLAGS_KERNEL :=
 
 $(obj)/vdso32.so.dbg: $(obj)/vdso32/vdso32.lds $(vobjs32) FORCE
 	$(call if_changed,vdso_and_check)
diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
index a0fb39abc5c8..70bf0a26da91 100644
--- a/arch/x86/realmode/rm/Makefile
+++ b/arch/x86/realmode/rm/Makefile
@@ -67,3 +67,4 @@ KBUILD_CFLAGS	:= $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
 		   -I$(srctree)/arch/x86/boot
 KBUILD_AFLAGS	:= $(KBUILD_CFLAGS) -D__ASSEMBLY__
 KBUILD_CFLAGS	+= -fno-asynchronous-unwind-tables
+KBUILD_CFLAGS_KERNEL :=
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 8ca130af301f..1782b6b87b2d 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -373,6 +373,7 @@
 	*(DATA_MAIN)							\
 	*(.data..decrypted)						\
 	*(.ref.data)							\
+	*(.data.rel*)							\
 	*(.data..shared_aligned) /* percpu related */			\
 	*(.data..unlikely)						\
 	__start_once = .;						\
diff --git a/include/linux/hidden.h b/include/linux/hidden.h
index 49a17b6b5962..2ad764c0ca18 100644
--- a/include/linux/hidden.h
+++ b/include/linux/hidden.h
@@ -16,4 +16,6 @@
  * giving them 'hidden' visibility.
  */
 
+#ifndef __BINDGEN__
 #pragma GCC visibility push(hidden)
+#endif
-- 
2.47.3


* [RFC/RFT PATCH 09/19] x86/pm-trace: Use RIP-relative accesses for .tracedata
  2026-01-08  9:25 [RFC/RFT PATCH 00/19] Link the relocatable x86 kernel as PIE Ard Biesheuvel
                   ` (7 preceding siblings ...)
  2026-01-08  9:25 ` [RFC/RFT PATCH 08/19] x86: Use PIE codegen for the relocatable 64-bit kernel Ard Biesheuvel
@ 2026-01-08  9:25 ` Ard Biesheuvel
  2026-01-08  9:25 ` [RFC/RFT PATCH 10/19] x86/kvm: Use RIP-relative addressing Ard Biesheuvel
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 54+ messages in thread
From: Ard Biesheuvel @ 2026-01-08  9:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Ard Biesheuvel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Josh Poimboeuf,
	Peter Zijlstra, Kees Cook, Uros Bizjak, Brian Gerst,
	linux-hardening

Use RIP-relative accesses and 32-bit offsets for .tracedata, to avoid
the need for relocation fixups at boot time. This is a prerequisite for
PIE linking, which only permits 64-bit wide loader-visible absolute
references.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/include/asm/pm-trace.h | 4 ++--
 drivers/base/power/trace.c      | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/pm-trace.h b/arch/x86/include/asm/pm-trace.h
index bfa32aa428e5..123faf978473 100644
--- a/arch/x86/include/asm/pm-trace.h
+++ b/arch/x86/include/asm/pm-trace.h
@@ -8,10 +8,10 @@
 do {								\
 	if (pm_trace_enabled) {					\
 		const void *tracedata;				\
-		asm volatile(_ASM_MOV " $1f,%0\n"		\
+		asm volatile("lea " _ASM_RIP(1f) ", %0\n"	\
 			     ".section .tracedata,\"a\"\n"	\
 			     "1:\t.word %c1\n\t"		\
-			     _ASM_PTR " %c2\n"			\
+			     ".long %c2 - .\n"			\
 			     ".previous"			\
 			     :"=r" (tracedata)			\
 			     : "i" (__LINE__), "i" (__FILE__));	\
diff --git a/drivers/base/power/trace.c b/drivers/base/power/trace.c
index d8da7195bb00..111be5825529 100644
--- a/drivers/base/power/trace.c
+++ b/drivers/base/power/trace.c
@@ -167,7 +167,7 @@ EXPORT_SYMBOL(set_trace_device);
 void generate_pm_trace(const void *tracedata, unsigned int user)
 {
 	unsigned short lineno = *(unsigned short *)tracedata;
-	const char *file = *(const char **)(tracedata + 2);
+	const char *file = offset_to_ptr((int *)(tracedata + 2));
 	unsigned int user_hash_value, file_hash_value;
 
 	if (!x86_platform.legacy.rtc)
@@ -187,9 +187,9 @@ static int show_file_hash(unsigned int value)
 
 	match = 0;
 	for (tracedata = __tracedata_start ; tracedata < __tracedata_end ;
-			tracedata += 2 + sizeof(unsigned long)) {
+			tracedata += 2 + sizeof(int)) {
 		unsigned short lineno = *(unsigned short *)tracedata;
-		const char *file = *(const char **)(tracedata + 2);
+		const char *file = offset_to_ptr((int *)(tracedata + 2));
 		unsigned int hash = hash_string(lineno, file, FILEHASH);
 		if (hash != value)
 			continue;
-- 
2.47.3


* [RFC/RFT PATCH 10/19] x86/kvm: Use RIP-relative addressing
  2026-01-08  9:25 [RFC/RFT PATCH 00/19] Link the relocatable x86 kernel as PIE Ard Biesheuvel
                   ` (8 preceding siblings ...)
  2026-01-08  9:25 ` [RFC/RFT PATCH 09/19] x86/pm-trace: Use RIP-relative accesses for .tracedata Ard Biesheuvel
@ 2026-01-08  9:25 ` Ard Biesheuvel
  2026-01-20 17:04   ` Sean Christopherson
  2026-01-08  9:25 ` [RFC/RFT PATCH 11/19] x86/rethook: Use RIP-relative reference for fake return address Ard Biesheuvel
                   ` (10 subsequent siblings)
  20 siblings, 1 reply; 54+ messages in thread
From: Ard Biesheuvel @ 2026-01-08  9:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Ard Biesheuvel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Josh Poimboeuf,
	Peter Zijlstra, Kees Cook, Uros Bizjak, Brian Gerst,
	linux-hardening

Replace absolute references in inline asm with RIP-relative ones, to
avoid the need for relocation fixups at boot time. This is a
prerequisite for PIE linking, which only permits 64-bit wide
loader-visible absolute references.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/kernel/kvm.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index df78ddee0abb..1a0335f328e1 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -807,8 +807,9 @@ extern bool __raw_callee_save___kvm_vcpu_is_preempted(long);
  * restoring to/from the stack.
  */
 #define PV_VCPU_PREEMPTED_ASM						     \
- "movq   __per_cpu_offset(,%rdi,8), %rax\n\t"				     \
- "cmpb   $0, " __stringify(KVM_STEAL_TIME_preempted) "+steal_time(%rax)\n\t" \
+ "0:leaq 0b(%rip), %rax\n\t"						     \
+ "addq   __per_cpu_offset - 0b(%rax,%rdi,8), %rax\n\t"			     \
+ "cmpb   $0, " __stringify(KVM_STEAL_TIME_preempted) "+steal_time-0b(%rax)\n\t" \
  "setne  %al\n\t"
 
 DEFINE_ASM_FUNC(__raw_callee_save___kvm_vcpu_is_preempted,
-- 
2.47.3


* [RFC/RFT PATCH 11/19] x86/rethook: Use RIP-relative reference for fake return address
  2026-01-08  9:25 [RFC/RFT PATCH 00/19] Link the relocatable x86 kernel as PIE Ard Biesheuvel
                   ` (9 preceding siblings ...)
  2026-01-08  9:25 ` [RFC/RFT PATCH 10/19] x86/kvm: Use RIP-relative addressing Ard Biesheuvel
@ 2026-01-08  9:25 ` Ard Biesheuvel
  2026-01-08 12:08   ` David Laight
  2026-01-08  9:25 ` [RFC/RFT PATCH 12/19] x86/sync_core: Use RIP-relative addressing Ard Biesheuvel
                   ` (9 subsequent siblings)
  20 siblings, 1 reply; 54+ messages in thread
From: Ard Biesheuvel @ 2026-01-08  9:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Ard Biesheuvel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Josh Poimboeuf,
	Peter Zijlstra, Kees Cook, Uros Bizjak, Brian Gerst,
	linux-hardening

Pushing an immediate absolute address to the stack is not permitted when
linking x86_64 code in PIE mode. Usually, the address can be taken using
a RIP-relative LEA instruction, but this is not possible here as there
are no available registers.

So instead, store the address in a static global variable, and push it
onto the stack using a RIP-relative memory operand.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/kernel/rethook.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/rethook.c b/arch/x86/kernel/rethook.c
index 85e2f2d16a90..50812ac718b0 100644
--- a/arch/x86/kernel/rethook.c
+++ b/arch/x86/kernel/rethook.c
@@ -11,6 +11,10 @@
 
 __visible void arch_rethook_trampoline_callback(struct pt_regs *regs);
 
+#ifdef CONFIG_X86_64
+static __used void * const __arch_rethook_trampoline = &arch_rethook_trampoline;
+#endif
+
 #ifndef ANNOTATE_NOENDBR
 #define ANNOTATE_NOENDBR
 #endif
@@ -27,7 +31,7 @@ asm(
 #ifdef CONFIG_X86_64
 	ANNOTATE_NOENDBR "\n"	/* This is only jumped from ret instruction */
 	/* Push a fake return address to tell the unwinder it's a rethook. */
-	"	pushq $arch_rethook_trampoline\n"
+	"	pushq __arch_rethook_trampoline(%rip)\n"
 	UNWIND_HINT_FUNC
 	"       pushq $" __stringify(__KERNEL_DS) "\n"
 	/* Save the 'sp - 16', this will be fixed later. */
-- 
2.47.3


* [RFC/RFT PATCH 12/19] x86/sync_core: Use RIP-relative addressing
  2026-01-08  9:25 [RFC/RFT PATCH 00/19] Link the relocatable x86 kernel as PIE Ard Biesheuvel
                   ` (10 preceding siblings ...)
  2026-01-08  9:25 ` [RFC/RFT PATCH 11/19] x86/rethook: Use RIP-relative reference for fake return address Ard Biesheuvel
@ 2026-01-08  9:25 ` Ard Biesheuvel
  2026-01-08  9:25 ` [RFC/RFT PATCH 13/19] x86/entry_64: " Ard Biesheuvel
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 54+ messages in thread
From: Ard Biesheuvel @ 2026-01-08  9:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Ard Biesheuvel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Josh Poimboeuf,
	Peter Zijlstra, Kees Cook, Uros Bizjak, Brian Gerst,
	linux-hardening

Use RIP-relative accesses for sync_core(). This removes a 32-bit
absolute reference that requires fixing up at runtime when KASLR is
enabled. This is a prerequisite for PIE linking, which only permits
64-bit wide loader-visible absolute references.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/include/asm/sync_core.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/sync_core.h b/arch/x86/include/asm/sync_core.h
index 96bda43538ee..547fdc690ecc 100644
--- a/arch/x86/include/asm/sync_core.h
+++ b/arch/x86/include/asm/sync_core.h
@@ -31,7 +31,8 @@ static __always_inline void iret_to_self(void)
 		"pushfq\n\t"
 		"mov %%cs, %0\n\t"
 		"pushq %q0\n\t"
-		"pushq $1f\n\t"
+		"leaq 1f(%%rip), %q0\n\t"
+		"pushq %q0\n\t"
 		"iretq\n\t"
 		"1:"
 		: "=&r" (tmp), ASM_CALL_CONSTRAINT : : "cc", "memory");
-- 
2.47.3


* [RFC/RFT PATCH 13/19] x86/entry_64: Use RIP-relative addressing
  2026-01-08  9:25 [RFC/RFT PATCH 00/19] Link the relocatable x86 kernel as PIE Ard Biesheuvel
                   ` (11 preceding siblings ...)
  2026-01-08  9:25 ` [RFC/RFT PATCH 12/19] x86/sync_core: Use RIP-relative addressing Ard Biesheuvel
@ 2026-01-08  9:25 ` Ard Biesheuvel
  2026-01-08  9:25 ` [RFC/RFT PATCH 14/19] x86/hibernate: Prefer RIP-relative accesses Ard Biesheuvel
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 54+ messages in thread
From: Ard Biesheuvel @ 2026-01-08  9:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Ard Biesheuvel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Josh Poimboeuf,
	Peter Zijlstra, Kees Cook, Uros Bizjak, Brian Gerst,
	linux-hardening

Replace a couple of instances in the x86_64 entry code where the
absolute address of a symbol is taken in a manner that is not supported
when linking in PIE mode. Use RIP-relative references instead, which
don't require boot-time fixups at all.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/entry/calling.h  |  9 +++++----
 arch/x86/entry/entry_64.S | 14 +++++++-------
 2 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 77e2d920a640..a37b402432a3 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -376,8 +376,8 @@ For 32-bit we have the following conventions - kernel is built with
 .endm
 
 .macro SAVE_AND_SET_GSBASE scratch_reg:req save_reg:req
+	GET_PERCPU_BASE \scratch_reg \save_reg
 	rdgsbase \save_reg
-	GET_PERCPU_BASE \scratch_reg
 	wrgsbase \scratch_reg
 .endm
 
@@ -413,15 +413,16 @@ For 32-bit we have the following conventions - kernel is built with
  * Thus the kernel would consume a guest's TSC_AUX if an NMI arrives
  * while running KVM's run loop.
  */
-.macro GET_PERCPU_BASE reg:req
+.macro GET_PERCPU_BASE reg:req scratch:req
 	LOAD_CPU_AND_NODE_SEG_LIMIT \reg
 	andq	$VDSO_CPUNODE_MASK, \reg
-	movq	__per_cpu_offset(, \reg, 8), \reg
+	leaq	__per_cpu_offset(%rip), \scratch
+	movq	(\scratch, \reg, 8), \reg
 .endm
 
 #else
 
-.macro GET_PERCPU_BASE reg:req
+.macro GET_PERCPU_BASE reg:req scratch:req
 	movq	pcpu_unit_offsets(%rip), \reg
 .endm
 
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index f9983a1907bf..77584f5ebb4b 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1040,7 +1040,8 @@ SYM_CODE_START(error_entry)
 	movl	%ecx, %eax			/* zero extend */
 	cmpq	%rax, RIP+8(%rsp)
 	je	.Lbstep_iret
-	cmpq	$.Lgs_change, RIP+8(%rsp)
+	leaq	.Lgs_change(%rip), %rcx
+	cmpq	%rcx, RIP+8(%rsp)
 	jne	.Lerror_entry_done_lfence
 
 	/*
@@ -1252,10 +1253,10 @@ SYM_CODE_START(asm_exc_nmi)
 	 * the outer NMI.
 	 */
 
-	movq	$repeat_nmi, %rdx
+	leaq	repeat_nmi(%rip), %rdx
 	cmpq	8(%rsp), %rdx
 	ja	1f
-	movq	$end_repeat_nmi, %rdx
+	leaq	end_repeat_nmi(%rip), %rdx
 	cmpq	8(%rsp), %rdx
 	ja	nested_nmi_out
 1:
@@ -1309,7 +1310,8 @@ nested_nmi:
 	pushq	%rdx
 	pushfq
 	pushq	$__KERNEL_CS
-	pushq	$repeat_nmi
+	leaq	repeat_nmi(%rip), %rdx
+	pushq	%rdx
 
 	/* Put stack back */
 	addq	$(6*8), %rsp
@@ -1348,10 +1350,8 @@ first_nmi:
 	addq	$8, (%rsp)	/* Fix up RSP */
 	pushfq			/* RFLAGS */
 	pushq	$__KERNEL_CS	/* CS */
-	pushq	$1f		/* RIP */
-	iretq			/* continues at repeat_nmi below */
+	call	native_irq_return_iret
 	UNWIND_HINT_IRET_REGS
-1:
 #endif
 
 repeat_nmi:
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC/RFT PATCH 14/19] x86/hibernate: Prefer RIP-relative accesses
  2026-01-08  9:25 [RFC/RFT PATCH 00/19] Link the relocatable x86 kernel as PIE Ard Biesheuvel
                   ` (12 preceding siblings ...)
  2026-01-08  9:25 ` [RFC/RFT PATCH 13/19] x86/entry_64: " Ard Biesheuvel
@ 2026-01-08  9:25 ` Ard Biesheuvel
  2026-01-08  9:25 ` [RFC/RFT PATCH 15/19] x64/acpi: Use PIC-compatible references in wakeup_64.S Ard Biesheuvel
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 54+ messages in thread
From: Ard Biesheuvel @ 2026-01-08  9:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Ard Biesheuvel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Josh Poimboeuf,
	Peter Zijlstra, Kees Cook, Uros Bizjak, Brian Gerst,
	linux-hardening

Replace some absolute symbol references with RIP-relative ones, to avoid
fixups at boot time. This is a prerequisite for PIE linking, which only
permits 64-bit wide loader-visible absolute references.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/power/hibernate_asm_64.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/power/hibernate_asm_64.S b/arch/x86/power/hibernate_asm_64.S
index c73be0a02a6c..173df717275a 100644
--- a/arch/x86/power/hibernate_asm_64.S
+++ b/arch/x86/power/hibernate_asm_64.S
@@ -40,7 +40,7 @@ SYM_FUNC_START(restore_registers)
 	movq	%rax, %cr4;  # turn PGE back on
 
 	/* We don't restore %rax, it must be 0 anyway */
-	movq	$saved_context, %rax
+	leaq	saved_context(%rip), %rax
 	movq	pt_regs_sp(%rax), %rsp
 	movq	pt_regs_bp(%rax), %rbp
 	movq	pt_regs_si(%rax), %rsi
@@ -71,7 +71,7 @@ SYM_FUNC_START(restore_registers)
 SYM_FUNC_END(restore_registers)
 
 SYM_FUNC_START(swsusp_arch_suspend)
-	movq	$saved_context, %rax
+	leaq	saved_context(%rip), %rax
 	movq	%rsp, pt_regs_sp(%rax)
 	movq	%rbp, pt_regs_bp(%rax)
 	movq	%rsi, pt_regs_si(%rax)
-- 
2.47.3



* [RFC/RFT PATCH 15/19] x64/acpi: Use PIC-compatible references in wakeup_64.S
  2026-01-08  9:25 [RFC/RFT PATCH 00/19] Link the relocatable x86 kernel as PIE Ard Biesheuvel
                   ` (13 preceding siblings ...)
  2026-01-08  9:25 ` [RFC/RFT PATCH 14/19] x86/hibernate: Prefer RIP-relative accesses Ard Biesheuvel
@ 2026-01-08  9:25 ` Ard Biesheuvel
  2026-01-09  5:01   ` Brian Gerst
  2026-01-08  9:25 ` [RFC/RFT PATCH 16/19] x86/kexec: Use 64-bit wide absolute reference from relocated code Ard Biesheuvel
                   ` (5 subsequent siblings)
  20 siblings, 1 reply; 54+ messages in thread
From: Ard Biesheuvel @ 2026-01-08  9:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Ard Biesheuvel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Josh Poimboeuf,
	Peter Zijlstra, Kees Cook, Uros Bizjak, Brian Gerst,
	linux-hardening

Use ordinary RIP-relative references to make the code compatible with
running the linker in PIE mode.

Note that wakeup_long64() runs in the kernel's ordinary virtual mapping
so there is no need to record the address of .Lresume_point in a global
variable. And fix the comment while at it.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/kernel/acpi/wakeup_64.S | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/acpi/wakeup_64.S b/arch/x86/kernel/acpi/wakeup_64.S
index 04f561f75e99..15233a4e1c95 100644
--- a/arch/x86/kernel/acpi/wakeup_64.S
+++ b/arch/x86/kernel/acpi/wakeup_64.S
@@ -14,7 +14,7 @@
 
 .code64
 	/*
-	 * Hooray, we are in Long 64-bit mode (but still running in low memory)
+	 * Hooray, we are in Long 64-bit mode
 	 */
 SYM_FUNC_START(wakeup_long64)
 	ANNOTATE_NOENDBR
@@ -41,7 +41,7 @@ SYM_FUNC_START(wakeup_long64)
 	movq	saved_rsi(%rip), %rsi
 	movq	saved_rbp(%rip), %rbp
 
-	movq	saved_rip(%rip), %rax
+	leaq	.Lresume_point(%rip), %rax
 	ANNOTATE_RETPOLINE_SAFE
 	jmp	*%rax
 SYM_FUNC_END(wakeup_long64)
@@ -52,7 +52,7 @@ SYM_FUNC_START(do_suspend_lowlevel)
 	xorl	%eax, %eax
 	call	save_processor_state
 
-	movq	$saved_context, %rax
+	leaq	saved_context(%rip), %rax
 	movq	%rsp, pt_regs_sp(%rax)
 	movq	%rbp, pt_regs_bp(%rax)
 	movq	%rsi, pt_regs_si(%rax)
@@ -71,8 +71,6 @@ SYM_FUNC_START(do_suspend_lowlevel)
 	pushfq
 	popq	pt_regs_flags(%rax)
 
-	movq	$.Lresume_point, saved_rip(%rip)
-
 	movq	%rsp, saved_rsp(%rip)
 	movq	%rbp, saved_rbp(%rip)
 	movq	%rbx, saved_rbx(%rip)
@@ -90,7 +88,7 @@ SYM_FUNC_START(do_suspend_lowlevel)
 .Lresume_point:
 	ANNOTATE_NOENDBR
 	/* We don't restore %rax, it must be 0 anyway */
-	movq	$saved_context, %rax
+	leaq	saved_context(%rip), %rax
 	movq	saved_context_cr4(%rax), %rbx
 	movq	%rbx, %cr4
 	movq	saved_context_cr3(%rax), %rbx
@@ -139,7 +137,6 @@ saved_rsi:		.quad	0
 saved_rdi:		.quad	0
 saved_rbx:		.quad	0
 
-saved_rip:		.quad	0
 saved_rsp:		.quad	0
 
 SYM_DATA(saved_magic,	.quad	0)
-- 
2.47.3



* [RFC/RFT PATCH 16/19] x86/kexec: Use 64-bit wide absolute reference from relocated code
  2026-01-08  9:25 [RFC/RFT PATCH 00/19] Link the relocatable x86 kernel as PIE Ard Biesheuvel
                   ` (14 preceding siblings ...)
  2026-01-08  9:25 ` [RFC/RFT PATCH 15/19] x64/acpi: Use PIC-compatible references in wakeup_64.S Ard Biesheuvel
@ 2026-01-08  9:25 ` Ard Biesheuvel
  2026-01-08  9:25 ` [RFC/RFT PATCH 17/19] x86/head64: Avoid absolute references in startup asm Ard Biesheuvel
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 54+ messages in thread
From: Ard Biesheuvel @ 2026-01-08  9:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Ard Biesheuvel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Josh Poimboeuf,
	Peter Zijlstra, Kees Cook, Uros Bizjak, Brian Gerst,
	linux-hardening

The virtual_mapped() kexec routine runs from a different virtual address
than it was linked at, and so it needs to use an absolute reference to
load the address of 'saved_context'. Change this reference to a 64-bit
wide one, to make the code compatible with linking in PIE mode.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/kernel/relocate_kernel_64.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
index 4ffba68dc57b..3fc1a3002e32 100644
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -311,7 +311,7 @@ SYM_CODE_START_LOCAL_NOALIGN(virtual_mapped)
 
 #ifdef CONFIG_KEXEC_JUMP
 	/* Saved in save_processor_state. */
-	movq    $saved_context, %rax
+	movabsq $saved_context, %rax
 	lgdt    saved_context_gdt_desc(%rax)
 #endif
 
-- 
2.47.3



* [RFC/RFT PATCH 17/19] x86/head64: Avoid absolute references in startup asm
  2026-01-08  9:25 [RFC/RFT PATCH 00/19] Link the relocatable x86 kernel as PIE Ard Biesheuvel
                   ` (15 preceding siblings ...)
  2026-01-08  9:25 ` [RFC/RFT PATCH 16/19] x86/kexec: Use 64-bit wide absolute reference from relocated code Ard Biesheuvel
@ 2026-01-08  9:25 ` Ard Biesheuvel
  2026-01-08  9:25 ` [RFC/RFT PATCH 18/19] x86/boot: Implement support for RELA/RELR/REL runtime relocations Ard Biesheuvel
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 54+ messages in thread
From: Ard Biesheuvel @ 2026-01-08  9:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Ard Biesheuvel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Josh Poimboeuf,
	Peter Zijlstra, Kees Cook, Uros Bizjak, Brian Gerst,
	linux-hardening

Replace a couple of occurrences of absolute references with RIP-relative
ones. This removes the need for boot-time fixups. This is a prerequisite
for PIE linking, which only permits 64-bit wide loader-visible absolute
references.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/kernel/head_64.S | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 21816b48537c..2c666c8c4519 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -179,8 +179,9 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
 	xorl	%r15d, %r15d
 
 	/* Derive the runtime physical address of init_top_pgt[] */
-	movq	phys_base(%rip), %rax
-	addq	$(init_top_pgt - __START_KERNEL_map), %rax
+	leaq	init_top_pgt(%rip), %rax
+	subq	$__START_KERNEL_map, %rax
+	addq	phys_base(%rip), %rax
 
 	/*
 	 * Retrieve the modifier (SME encryption mask if SME is active) to be
@@ -232,6 +233,9 @@ SYM_INNER_LABEL(common_startup_64, SYM_L_LOCAL)
 	btsl	$X86_CR4_PGE_BIT, %ecx
 	movq	%rcx, %cr4
 
+	/* Use .text as an anchor to emit PC-relative symbol references */
+	leaq	.text(%rip), %rbx
+
 #ifdef CONFIG_SMP
 	/*
 	 * For parallel boot, the APIC ID is read from the APIC, and then
@@ -288,10 +292,9 @@ SYM_INNER_LABEL(common_startup_64, SYM_L_LOCAL)
 .Llookup_AP:
 	/* EAX contains the APIC ID of the current CPU */
 	xorl	%ecx, %ecx
-	leaq	cpuid_to_apicid(%rip), %rbx
 
 .Lfind_cpunr:
-	cmpl	(%rbx,%rcx,4), %eax
+	cmpl	cpuid_to_apicid - .text(%rbx,%rcx,4), %eax
 	jz	.Lsetup_cpu
 	inc	%ecx
 #ifdef CONFIG_FORCE_NR_CPUS
@@ -311,7 +314,7 @@ SYM_INNER_LABEL(common_startup_64, SYM_L_LOCAL)
 
 .Lsetup_cpu:
 	/* Get the per cpu offset for the given CPU# which is in ECX */
-	movq	__per_cpu_offset(,%rcx,8), %rdx
+	movq	__per_cpu_offset - .text(%rbx,%rcx,8), %rdx
 #else
 	xorl	%edx, %edx /* zero-extended to clear all of RDX */
 #endif /* CONFIG_SMP */
@@ -322,7 +325,7 @@ SYM_INNER_LABEL(common_startup_64, SYM_L_LOCAL)
 	 *
 	 * RDX contains the per-cpu offset
 	 */
-	movq	current_task(%rdx), %rax
+	movq	current_task - .text(%rbx,%rdx), %rax
 	movq	TASK_threadsp(%rax), %rsp
 
 	/*
@@ -343,7 +346,7 @@ SYM_INNER_LABEL(common_startup_64, SYM_L_LOCAL)
 	 */
 	subq	$16, %rsp
 	movw	$(GDT_SIZE-1), (%rsp)
-	leaq	gdt_page(%rdx), %rax
+	leaq	gdt_page - .text(%rbx,%rdx), %rax
 	movq	%rax, 2(%rsp)
 	lgdt	(%rsp)
 	addq	$16, %rsp
-- 
2.47.3



* [RFC/RFT PATCH 18/19] x86/boot: Implement support for RELA/RELR/REL runtime relocations
  2026-01-08  9:25 [RFC/RFT PATCH 00/19] Link the relocatable x86 kernel as PIE Ard Biesheuvel
                   ` (16 preceding siblings ...)
  2026-01-08  9:25 ` [RFC/RFT PATCH 17/19] x86/head64: Avoid absolute references in startup asm Ard Biesheuvel
@ 2026-01-08  9:25 ` Ard Biesheuvel
  2026-01-08  9:25 ` [RFC/RFT PATCH 19/19] x86/kernel: Switch to PIE linking for the relocatable kernel Ard Biesheuvel
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 54+ messages in thread
From: Ard Biesheuvel @ 2026-01-08  9:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Ard Biesheuvel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Josh Poimboeuf,
	Peter Zijlstra, Kees Cook, Uros Bizjak, Brian Gerst,
	linux-hardening

Given that the decompressor already incorporates an ELF loader that
parses the program headers of the decompressed ELF image, support for
the PT_DYNAMIC program header, which describes the location of the RELA
and RELR relocation tables in the image, can be added quite easily.

This is a more efficient and more idiomatic format, which allows the
handling of boot-time randomization (KASLR) in a generic manner, rather
than one based on a bespoke x86-specific relocation format. This is a
prerequisite for enabling further hardening measures that are
implemented in the ELF domain, i.e., fgkaslr.
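For reference, the RELR encoding handled here works as follows: an even
entry carries the address of the next place to relocate, and an odd
entry is a bitmap whose bits mark which of the following 63 word-sized
slots need relocating as well. A standalone C sketch of the same
decoding logic (hypothetical helper name, operating on a plain
in-memory array rather than the loaded image):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Apply a RELR-encoded relocation list to an image at 'base'.
 * Even entries hold the byte offset of the next slot to relocate;
 * odd entries are bitmaps covering the 63 slots that follow the
 * last slot visited.
 */
static void apply_relr(uint64_t *base, const uint64_t *relr, size_t count,
		       uint64_t va_shift)
{
	uint64_t *place = NULL;

	for (size_t i = 0; i < count; i++) {
		if ((relr[i] & 1) == 0) {
			/* address entry: relocate this slot directly */
			place = base + relr[i] / sizeof(*place);
			*place++ += va_shift;
			continue;
		}
		/* bitmap entry: bit n (after the tag bit) covers place[n] */
		for (uint64_t *p = place, r = relr[i] >> 1; r; p++, r >>= 1)
			if (r & 1)
				*p += va_shift;
		place += 8 * sizeof(*relr) - 1;	/* 63 slots per bitmap */
	}
}
```

This mirrors the loop in handle_dynamic() below, minus the
physical-to-virtual offset handling.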

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/boot/compressed/misc.c | 73 +++++++++++++++++++-
 include/uapi/linux/elf.h        |  3 +
 2 files changed, 74 insertions(+), 2 deletions(-)

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 1ea419cf88fe..bc5677e697ca 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -278,7 +278,68 @@ static inline void handle_relocations(void *output, unsigned long output_len,
 { }
 #endif
 
-static size_t parse_elf(void *output)
+#define ELF(type) __PASTE(__PASTE(Elf, __LONG_WIDTH__), __PASTE(_, type))
+
+static void handle_dynamic(const ELF(Dyn) *dyn, unsigned long p2v_offset,
+			   unsigned long va_shift)
+{
+	const ELF(Rela) *rela = NULL;
+	const ELF(Rel) *rel = NULL;
+	unsigned long *relr = NULL;
+	unsigned long *place;
+	int relasize = 0;
+	int relrsize = 0;
+	int relsize = 0;
+
+	for (auto d = dyn; d->d_tag != DT_NULL; d++) {
+		switch (d->d_tag) {
+		case DT_RELA:
+			rela = (void *)(d->d_un.d_ptr + p2v_offset);
+			break;
+		case DT_RELASZ:
+			relasize = d->d_un.d_val;
+			break;
+		case DT_RELR:
+			relr = (void *)(d->d_un.d_ptr + p2v_offset);
+			break;
+		case DT_RELRSZ:
+			relrsize = d->d_un.d_val;
+			break;
+		case DT_REL:
+			rel = (void *)(d->d_un.d_ptr + p2v_offset);
+			break;
+		case DT_RELSZ:
+			relsize = d->d_un.d_val;
+			break;
+		}
+	}
+
+	for (int i = 0; i < relasize / sizeof(*rela); i++) {
+		place = (unsigned long *)(rela[i].r_offset + p2v_offset);
+		*place += va_shift;
+	}
+
+	for (int i = 0; i < relrsize / sizeof(*relr); i++) {
+		if ((relr[i] & 1) == 0) {
+			place = (unsigned long *)(relr[i] + p2v_offset);
+			*place++ += va_shift;
+			continue;
+		}
+
+		for (unsigned long *p = place, r = relr[i] >> 1; r; p++, r >>= 1)
+			if (r & 1)
+				*p += va_shift;
+		place += 8 * sizeof(*relr) - 1;
+	}
+
+	for (int i = 0; i < relsize / sizeof(*rel); i++) {
+		place = (unsigned long *)(rel[i].r_offset + p2v_offset);
+		*place += va_shift;
+	}
+
+}
+
+static size_t parse_elf(void *output, u64 va_shift)
 {
 #ifdef CONFIG_X86_64
 	Elf64_Ehdr ehdr;
@@ -320,6 +381,12 @@ static size_t parse_elf(void *output)
 				dest += (unsigned long)output - LOAD_PHYSICAL_ADDR;
 			memmove(dest, output + phdr->p_offset, phdr->p_filesz);
 			break;
+		case PT_DYNAMIC:
+			if (!va_shift)
+				break;
+			dest = (void *)(output + phdr->p_paddr - LOAD_PHYSICAL_ADDR);
+			handle_dynamic(dest, (unsigned long)dest - phdr->p_vaddr, va_shift);
+			break;
 		default: /* Ignore other PT_* */ break;
 		}
 	}
@@ -351,7 +418,9 @@ unsigned long decompress_kernel(unsigned char *outbuf, unsigned long virt_addr,
 			 NULL, error) < 0)
 		return ULONG_MAX;
 
-	entry = parse_elf(outbuf);
+	if (IS_ENABLED(CONFIG_X86_32))
+		virt_addr = (unsigned long)outbuf;
+	entry = parse_elf(outbuf, virt_addr - LOAD_PHYSICAL_ADDR);
 	handle_relocations(outbuf, output_len, virt_addr);
 
 	return entry;
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index 819ded2d39de..868cd67f0ea7 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -103,6 +103,9 @@ typedef __u16	Elf64_Versym;
 #define DT_TEXTREL	22
 #define DT_JMPREL	23
 #define DT_ENCODING	32
+#define DT_RELRSZ	35
+#define DT_RELR		36
+#define DT_RELRENT	37
 #define OLD_DT_LOOS	0x60000000
 #define DT_LOOS		0x6000000d
 #define DT_HIOS		0x6ffff000
-- 
2.47.3



* [RFC/RFT PATCH 19/19] x86/kernel: Switch to PIE linking for the relocatable kernel
  2026-01-08  9:25 [RFC/RFT PATCH 00/19] Link the relocatable x86 kernel as PIE Ard Biesheuvel
                   ` (17 preceding siblings ...)
  2026-01-08  9:25 ` [RFC/RFT PATCH 18/19] x86/boot: Implement support for RELA/RELR/REL runtime relocations Ard Biesheuvel
@ 2026-01-08  9:25 ` Ard Biesheuvel
  2026-01-08 16:35 ` [RFC/RFT PATCH 00/19] Link the relocatable x86 kernel as PIE Alexander Lobakin
  2026-01-09  0:36 ` H. Peter Anvin
  20 siblings, 0 replies; 54+ messages in thread
From: Ard Biesheuvel @ 2026-01-08  9:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Ard Biesheuvel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Josh Poimboeuf,
	Peter Zijlstra, Kees Cook, Uros Bizjak, Brian Gerst,
	linux-hardening

If the toolchain supports RELR relocation packing, build the virtually
relocatable kernels as Position Independent Executables (PIE). This
results in more efficient relocation processing for the virtual
displacement of the kernel applied at boot, using RELR relocations that
take up only a fraction of the space occupied by ordinary RELA
relocations.

More importantly, it instructs the linker to generate a binary that is
really meant to be relocated at boot, using data structures that are
intended for this purpose. Doing so is important for several
reasons:

- Relying on --emit-relocs is problematic, because it produces the
  static relocations that are consumed by the linker as input, and these
  are not meant for describing a runtime relocatable image. For example,
  the linker may apply relaxations that result in the code and the
  static relocation going out of sync (and ld.bfd and ld.lld already
  handle this in a different way).

- The 'relocs' tool relies on manually kept allow/deny lists of symbol
  names. These are needed because ELF absolute/relative symbol
  designations are often inaccurate.

- x86 deviates from other architectures in the kernel when it comes to
  its implementation of boot-time relocation, making it difficult to
  implement further enhancements (e.g., fgkaslr, EFI zboot) in a
  portable manner.

Note that this means that all x86_64 codegen must be position
independent in order to be compatible with PIE linking, but only when
KASLR is enabled. On i386, no changes to the codegen are needed, as the
ordinary position dependent relocation model is supported by the linker
when operating in PIE mode.
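As a rough illustration of the space difference (a back-of-the-envelope
sketch with hypothetical helper names, not measured vmlinux numbers):
each Elf64_Rela entry occupies 24 bytes, while RELR encodes a dense run
of 64 relocated slots in just two 8-byte words:

```c
#include <stddef.h>

/* One Elf64_Rela entry: r_offset + r_info + r_addend, 8 bytes each */
#define RELA_ENTRY_SIZE	24

/* Size in bytes of a RELA table describing n relocations */
static size_t rela_size(size_t n)
{
	return n * RELA_ENTRY_SIZE;
}

/*
 * Best-case RELR size for n densely packed relocations: each run of
 * 64 consecutive slots costs one 8-byte address entry plus one 8-byte
 * bitmap covering the 63 slots that follow it.
 */
static size_t relr_best_case(size_t n)
{
	return (n + 63) / 64 * 16;
}
```

For densely packed relocations this works out to roughly a 96x
reduction; real images land somewhere between this best case and one
address entry per isolated relocation.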

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/Kconfig              |  3 ++-
 arch/x86/Makefile             |  5 +++++
 arch/x86/kernel/vmlinux.lds.S | 18 ++++++++++++++++++
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b3a64cfe04cf..2aa50aa8dc68 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -103,6 +103,7 @@ config X86
 	select ARCH_HAS_NONLEAF_PMD_YOUNG	if PGTABLE_LEVELS > 2
 	select ARCH_HAS_UACCESS_FLUSHCACHE	if X86_64
 	select ARCH_HAS_COPY_MC			if X86_64
+	select ARCH_HAS_RELR
 	select ARCH_HAS_SET_MEMORY
 	select ARCH_HAS_SET_DIRECT_MAP
 	select ARCH_HAS_STRICT_KERNEL_RWX
@@ -2129,7 +2130,7 @@ config RANDOMIZE_BASE
 # Relocation on x86 needs some additional build support
 config X86_NEED_RELOCS
 	def_bool y
-	depends on RELOCATABLE
+	depends on RELOCATABLE && !TOOLS_SUPPORT_RELR
 	select ARCH_VMLINUX_NEEDS_RELOCS
 
 config PHYSICAL_ALIGN
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index b211d6c950aa..7eac705c4ff4 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -258,6 +258,11 @@ endif
 
 KBUILD_LDFLAGS += -m elf_$(UTS_MACHINE)
 
+ldflags-pie-$(CONFIG_LD_IS_LLD)		:= --apply-dynamic-relocs
+ldflags-pie-$(CONFIG_LD_IS_BFD)		:= -z call-nop=suffix-nop
+ldflags-$(CONFIG_RELOCATABLE_PIE)	:= --pie -z notext $(ldflags-pie-y)
+LDFLAGS_vmlinux				+= $(ldflags-y)
+
 #
 # The 64-bit kernel must be aligned to 2MB.  Pass -z max-page-size=0x200000 to
 # the linker to force 2MB page size regardless of the default page size used
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 6772fe9a9957..cfaf6ab80684 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -127,6 +127,9 @@ PHDRS {
 	text PT_LOAD FLAGS(5);          /* R_E */
 	data PT_LOAD FLAGS(6);          /* RW_ */
 	note PT_NOTE FLAGS(0);          /* ___ */
+#ifdef CONFIG_RELOCATABLE_PIE
+	dynamic PT_DYNAMIC;
+#endif
 }
 
 SECTIONS
@@ -201,6 +204,21 @@ SECTIONS
 	DATA_SEGMENT_START
 	INIT_DATA_SECTION(16) :data
 
+#ifdef CONFIG_RELOCATABLE_PIE
+	/DISCARD/ : {
+		*(.interp .dynbss .eh_frame .sframe .relr.auth.dyn)
+	}
+
+	.dynamic	: { *(.dynamic) } :dynamic :data
+	.dynstr		: { *(.dynstr) } :data
+	.dynsym		: { *(.dynsym) }
+	.gnu.hash	: { *(.gnu.hash) }
+	.hash		: { *(.hash) }
+	.init.rela	: { *(.rela.*) *(.rela_*) }
+	.init.rel	: { *(.rel.*) *(.rel_*) }
+	.init.relr	: { *(.relr.*) }
+#endif
+
 	.x86_cpu_dev.init : AT(ADDR(.x86_cpu_dev.init) - LOAD_OFFSET) {
 		__x86_cpu_dev_start = .;
 		*(.x86_cpu_dev.init)
-- 
2.47.3



* Re: [RFC/RFT PATCH 11/19] x86/rethook: Use RIP-relative reference for fake return address
  2026-01-08  9:25 ` [RFC/RFT PATCH 11/19] x86/rethook: Use RIP-relative reference for fake return address Ard Biesheuvel
@ 2026-01-08 12:08   ` David Laight
  2026-01-08 12:10     ` Ard Biesheuvel
  0 siblings, 1 reply; 54+ messages in thread
From: David Laight @ 2026-01-08 12:08 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-kernel, x86, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Josh Poimboeuf, Peter Zijlstra,
	Kees Cook, Uros Bizjak, Brian Gerst, linux-hardening

On Thu,  8 Jan 2026 09:25:38 +0000
Ard Biesheuvel <ardb@kernel.org> wrote:

> Pushing an immediate absolute address to the stack is not permitted when
> linking x86_64 code in PIE mode. Usually, the address can be taken using
> a RIP-relative LEA instruction, but this is not possible here as there
> are no available registers.
> 
> So instead, take the address into a static global, and push it onto the
> stack using a RIP-relative memory operand.

The comment implies the address is 'fake'.
Does that mean it could just be a constant?
Clearly the unwinder would need the same change.

	David

> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/x86/kernel/rethook.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/rethook.c b/arch/x86/kernel/rethook.c
> index 85e2f2d16a90..50812ac718b0 100644
> --- a/arch/x86/kernel/rethook.c
> +++ b/arch/x86/kernel/rethook.c
> @@ -11,6 +11,10 @@
>  
>  __visible void arch_rethook_trampoline_callback(struct pt_regs *regs);
>  
> +#ifdef CONFIG_X86_64
> +static __used void * const __arch_rethook_trampoline = &arch_rethook_trampoline;
> +#endif
> +
>  #ifndef ANNOTATE_NOENDBR
>  #define ANNOTATE_NOENDBR
>  #endif
> @@ -27,7 +31,7 @@ asm(
>  #ifdef CONFIG_X86_64
>  	ANNOTATE_NOENDBR "\n"	/* This is only jumped from ret instruction */
>  	/* Push a fake return address to tell the unwinder it's a rethook. */
> -	"	pushq $arch_rethook_trampoline\n"
> +	"	pushq __arch_rethook_trampoline(%rip)\n"
>  	UNWIND_HINT_FUNC
>  	"       pushq $" __stringify(__KERNEL_DS) "\n"
>  	/* Save the 'sp - 16', this will be fixed later. */



* Re: [RFC/RFT PATCH 11/19] x86/rethook: Use RIP-relative reference for fake return address
  2026-01-08 12:08   ` David Laight
@ 2026-01-08 12:10     ` Ard Biesheuvel
  2026-01-08 12:19       ` Ard Biesheuvel
  0 siblings, 1 reply; 54+ messages in thread
From: Ard Biesheuvel @ 2026-01-08 12:10 UTC (permalink / raw)
  To: David Laight
  Cc: linux-kernel, x86, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Josh Poimboeuf, Peter Zijlstra,
	Kees Cook, Uros Bizjak, Brian Gerst, linux-hardening

On Thu, 8 Jan 2026 at 13:08, David Laight <david.laight.linux@gmail.com> wrote:
>
> On Thu,  8 Jan 2026 09:25:38 +0000
> Ard Biesheuvel <ardb@kernel.org> wrote:
>
> > Pushing an immediate absolute address to the stack is not permitted when
> > linking x86_64 code in PIE mode. Usually, the address can be taken using
> > a RIP-relative LEA instruction, but this is not possible here as there
> > are no available registers.
> >
> > So instead, take the address into a static global, and push it onto the
> > stack using a RIP-relative memory operand.
>
> The comment implies the address is 'fake'.
> Does that mean it could just be a constant?

It could be, but it isn't, across all architectures.

> Clearly the unwinder would need the same change.
>

Why? The value being pushed is the same.


* Re: [RFC/RFT PATCH 11/19] x86/rethook: Use RIP-relative reference for fake return address
  2026-01-08 12:10     ` Ard Biesheuvel
@ 2026-01-08 12:19       ` Ard Biesheuvel
  0 siblings, 0 replies; 54+ messages in thread
From: Ard Biesheuvel @ 2026-01-08 12:19 UTC (permalink / raw)
  To: David Laight
  Cc: linux-kernel, x86, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Josh Poimboeuf, Peter Zijlstra,
	Kees Cook, Uros Bizjak, Brian Gerst, linux-hardening

On Thu, 8 Jan 2026 at 13:10, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Thu, 8 Jan 2026 at 13:08, David Laight <david.laight.linux@gmail.com> wrote:
> >
> > On Thu,  8 Jan 2026 09:25:38 +0000
> > Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > > Pushing an immediate absolute address to the stack is not permitted when
> > > linking x86_64 code in PIE mode. Usually, the address can be taken using
> > > a RIP-relative LEA instruction, but this is not possible here as there
> > > are no available registers.
> > >
> > > So instead, take the address into a static global, and push it onto the
> > > stack using a RIP-relative memory operand.
> >
> > The comment implies the address is 'fake'.
> > Does that mean it could just be a constant?
>
> It could be, but it isn't, across all architectures.
>
> > Clearly the unwinder would need the same change.
> >
>
> Why? The value being pushed is the same.

Never mind - I guess you meant 'the unwinder would need to use the constant too'


* Re: [RFC/RFT PATCH 00/19] Link the relocatable x86 kernel as PIE
  2026-01-08  9:25 [RFC/RFT PATCH 00/19] Link the relocatable x86 kernel as PIE Ard Biesheuvel
                   ` (18 preceding siblings ...)
  2026-01-08  9:25 ` [RFC/RFT PATCH 19/19] x86/kernel: Switch to PIE linking for the relocatable kernel Ard Biesheuvel
@ 2026-01-08 16:35 ` Alexander Lobakin
  2026-01-09  0:36 ` H. Peter Anvin
  20 siblings, 0 replies; 54+ messages in thread
From: Alexander Lobakin @ 2026-01-08 16:35 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-kernel, x86, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Josh Poimboeuf, Peter Zijlstra,
	Kees Cook, Uros Bizjak, Brian Gerst, linux-hardening

From: Ard Biesheuvel <ardb@kernel.org>
Date: Thu,  8 Jan 2026 09:25:27 +0000

> This series is a follow-up to a series I sent a bit more than a year
> ago, to switch to PIE linking of x86_64 vmlinux, which is a prerequisite
> for further hardening measures, such as fg-kaslr [1], as well as further
> harmonization of the boot protocols between architectures [2].
> 
> The main sticking point is the fact that PIE linking on x86_64 requires
> PIE codegen, and that was shot down before on the basis that
> a) GOTs in fully linked binaries are stupid
> b) the code size increase would be prohibitive
> c) the performance would suffer.
> 
> This series implements PIE codegen without permitting the use of GOT
> slots. The code size increase is between 0.2% (clang) and 0.5% (gcc),
> and I could not identify any performance regressions (using hackbench)
> on various different micro-architectures that I tried it on.
> (Suggestions for other benchmarks/test cases are welcome)
> 
> So now that we have some actual numbers, I would like to try and revisit
> this discussion, and get a conclusion on whether this is really a
> non-starter. Note that only the KASLR kernel would rely on this, and
> disabling CONFIG_RANDOMIZE_BASE will revert to the current situation
> (provided that patch #4 is applied)
> 
> Some minor asm tweaks are needed too (patches #9 - #17), but those all
> seem uncontroversial to me. 
> 
> The first 5 patches are general cleanup, and could be taken into
> consideration independently of the discussion around PIC codegen.
> 
> [1] There have been a few attempts at landing fine grained KASLR for
> x86, but the main problem is that it was tied to the x86 relocation
> format, which deviates from how fully linked relocatable ELF binaries
> are generally constructed (using PIE). Implementing fgkaslr in the ELF
> domain would make it suitable for other architectures too, as well as
> other use cases (bare metal or hosted) where no dynamic linking is
> performed (firmware, hypervisors). In order to implement this properly,
> i.e., with debugging support etc, it needs support from the tooling
> side. (Fine grained KASLR in combination with execute-only code mappings
> makes it extremely difficult for an attacker to subvert the control flow
> in the kernel in a way that can be meaningfully exploited).

In case anybody is interested...
The latest (to my knowledge) experiments with FG-KALSR was my side
project reviving Kristen's old series (and then rewriting it
completely): [0]

I haven't worked on it since then, as I work in an
XDP/netmem/whatever team, i.e. networking, not x86, and free time for
side projects has shrunk severely since 2022.

Maybe someone would pick it up again some day, just like I picked up
Kristen's series back then...

[0] https://github.com/alobakin/linux/commits/fgkaslr

Thanks,
Olek


* Re: [RFC/RFT PATCH 00/19] Link the relocatable x86 kernel as PIE
  2026-01-08  9:25 [RFC/RFT PATCH 00/19] Link the relocatable x86 kernel as PIE Ard Biesheuvel
                   ` (19 preceding siblings ...)
  2026-01-08 16:35 ` [RFC/RFT PATCH 00/19] Link the relocatable x86 kernel as PIE Alexander Lobakin
@ 2026-01-09  0:36 ` H. Peter Anvin
  2026-01-09  9:21   ` Ard Biesheuvel
  20 siblings, 1 reply; 54+ messages in thread
From: H. Peter Anvin @ 2026-01-09  0:36 UTC (permalink / raw)
  To: Ard Biesheuvel, linux-kernel
  Cc: x86, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	Josh Poimboeuf, Peter Zijlstra, Kees Cook, Uros Bizjak,
	Brian Gerst, linux-hardening

On 2026-01-08 01:25, Ard Biesheuvel wrote:
> This series is a follow-up to a series I sent a bit more than a year
> ago, to switch to PIE linking of x86_64 vmlinux, which is a prerequisite
> for further hardening measures, such as fg-kaslr [1], as well as further
> harmonization of the boot protocols between architectures [2].

Kristen Accardi had fg-kaslr running without that, didn't she?

From your footnotes, it looks like what you are *really* asking for is to
pessimize x86 code to benefit other architectures. That isn't inherently
wrong, but stating it as you have above is dishonest.

> The main sticking point is the fact that PIE linking on x86_64 requires
> PIE codegen, and that was shot down before on the basis that
> a) GOTs in fully linked binaries are stupid
> b) the code size increase would be prohibitive
> c) the performance would suffer.
> 
> This series implements PIE codegen without permitting the use of GOT
> slots. The code size increase is between 0.2% (clang) and 0.5% (gcc),
> and I could not identify any performance regressions (using hackbench)
> on various different micro-architectures that I tried it on.
> (Suggestions for other benchmarks/test cases are welcome)

Could you show some examples of how the code changes?

	-hpa
> 
> [1] There have been a few attempts at landing fine grained KASLR for
> x86, but the main problem is that it was tied to the x86 relocation
> format, which deviates from how fully linked relocatable ELF binaries
> are generally constructed (using PIE). Implementing fgkaslr in the ELF
> domain would make it suitable for other architectures too, as well as
> other use cases (bare metal or hosted) where no dynamic linking is
> performed (firmware, hypervisors). In order to implement this properly,
> i.e., with debugging support etc, it needs support from the tooling
> side. (Fine grained KASLR in combination with execute-only code mappings
> makes it extremely difficult for an attacker to subvert the control flow
> in the kernel in a way that can be meaningfully exploited).
> 
> [2] EFI zboot is already used by various architectures that have no
> decompressor stage at all (arm64, RISC-V, LoongArch), and this format
> can be combined with an ELF payload too. EFI zboot accommodates non-EFI
> boot chains by describing the size, offset, payload type and compression
> type in its header, so that it can be extracted and booted by other
> means.

The bzImage format already has that for all practical purposes. We *really*
don't want to introduce a new binary format for the x86 kernel. A bunch of
such attempts have been done in the past, and it is nothing but a mess that
breaks things, because now you are encouraging different bootloaders to
support a non-overlapping set of binary formats.

STRONG NAK on that one.

	-hpa



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC/RFT PATCH 15/19] x64/acpi: Use PIC-compatible references in wakeup_64.S
  2026-01-08  9:25 ` [RFC/RFT PATCH 15/19] x64/acpi: Use PIC-compatible references in wakeup_64.S Ard Biesheuvel
@ 2026-01-09  5:01   ` Brian Gerst
  2026-01-09  7:59     ` Ard Biesheuvel
  0 siblings, 1 reply; 54+ messages in thread
From: Brian Gerst @ 2026-01-09  5:01 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-kernel, x86, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Josh Poimboeuf, Peter Zijlstra,
	Kees Cook, Uros Bizjak, linux-hardening

On Thu, Jan 8, 2026 at 4:28 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> Use ordinary RIP-relative references to make the code compatible with
> running the linker in PIE mode.
>
> Note that wakeup_long64() runs in the kernel's ordinary virtual mapping
> so there is no need to record the address of .Lresume_point in a global
> variable. And fix the comment while at it.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/x86/kernel/acpi/wakeup_64.S | 11 ++++-------
>  1 file changed, 4 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kernel/acpi/wakeup_64.S b/arch/x86/kernel/acpi/wakeup_64.S
> index 04f561f75e99..15233a4e1c95 100644
> --- a/arch/x86/kernel/acpi/wakeup_64.S
> +++ b/arch/x86/kernel/acpi/wakeup_64.S
> @@ -14,7 +14,7 @@
>
>  .code64
>         /*
> -        * Hooray, we are in Long 64-bit mode (but still running in low memory)
> +        * Hooray, we are in Long 64-bit mode
>          */
>  SYM_FUNC_START(wakeup_long64)
>         ANNOTATE_NOENDBR
> @@ -41,7 +41,7 @@ SYM_FUNC_START(wakeup_long64)
>         movq    saved_rsi(%rip), %rsi
>         movq    saved_rbp(%rip), %rbp
>
> -       movq    saved_rip(%rip), %rax
> +       leaq    .Lresume_point(%rip), %rax
>         ANNOTATE_RETPOLINE_SAFE
>         jmp     *%rax

If this is already running on the virtual mapping, this can simply be
changed to a direct jump.


Brian Gerst

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC/RFT PATCH 15/19] x64/acpi: Use PIC-compatible references in wakeup_64.S
  2026-01-09  5:01   ` Brian Gerst
@ 2026-01-09  7:59     ` Ard Biesheuvel
  2026-01-09 11:46       ` Brian Gerst
  0 siblings, 1 reply; 54+ messages in thread
From: Ard Biesheuvel @ 2026-01-09  7:59 UTC (permalink / raw)
  To: Brian Gerst
  Cc: linux-kernel, x86, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Josh Poimboeuf, Peter Zijlstra,
	Kees Cook, Uros Bizjak, linux-hardening

On Fri, 9 Jan 2026 at 06:01, Brian Gerst <brgerst@gmail.com> wrote:
>
> On Thu, Jan 8, 2026 at 4:28 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > Use ordinary RIP-relative references to make the code compatible with
> > running the linker in PIE mode.
> >
> > Note that wakeup_long64() runs in the kernel's ordinary virtual mapping
> > so there is no need to record the address of .Lresume_point in a global
> > variable. And fix the comment while at it.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> >  arch/x86/kernel/acpi/wakeup_64.S | 11 ++++-------
> >  1 file changed, 4 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/x86/kernel/acpi/wakeup_64.S b/arch/x86/kernel/acpi/wakeup_64.S
> > index 04f561f75e99..15233a4e1c95 100644
> > --- a/arch/x86/kernel/acpi/wakeup_64.S
> > +++ b/arch/x86/kernel/acpi/wakeup_64.S
> > @@ -14,7 +14,7 @@
> >
> >  .code64
> >         /*
> > -        * Hooray, we are in Long 64-bit mode (but still running in low memory)
> > +        * Hooray, we are in Long 64-bit mode
> >          */
> >  SYM_FUNC_START(wakeup_long64)
> >         ANNOTATE_NOENDBR
> > @@ -41,7 +41,7 @@ SYM_FUNC_START(wakeup_long64)
> >         movq    saved_rsi(%rip), %rsi
> >         movq    saved_rbp(%rip), %rbp
> >
> > -       movq    saved_rip(%rip), %rax
> > +       leaq    .Lresume_point(%rip), %rax
> >         ANNOTATE_RETPOLINE_SAFE
> >         jmp     *%rax
>
> If this is already running on the virtual mapping, this can simply be
> changed to a direct jump.
>

Indeed, but I couldn't figure out how to do so without making objtool unhappy.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC/RFT PATCH 00/19] Link the relocatable x86 kernel as PIE
  2026-01-09  0:36 ` H. Peter Anvin
@ 2026-01-09  9:21   ` Ard Biesheuvel
  2026-01-14 18:16     ` Kees Cook
  0 siblings, 1 reply; 54+ messages in thread
From: Ard Biesheuvel @ 2026-01-09  9:21 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: linux-kernel, x86, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Josh Poimboeuf, Peter Zijlstra, Kees Cook,
	Uros Bizjak, Brian Gerst, linux-hardening

On Fri, 9 Jan 2026 at 01:37, H. Peter Anvin <hpa@zytor.com> wrote:
>
> On 2026-01-08 01:25, Ard Biesheuvel wrote:
> > This series is a follow-up to a series I sent a bit more than a year
> > ago, to switch to PIE linking of x86_64 vmlinux, which is a prerequisite
> > for further hardening measures, such as fg-kaslr [1], as well as further
> > harmonization of the boot protocols between architectures [2].
>
> Kristen Accardi had fg-kaslr running without that, didn't she?
>

Yes, as a proof of concept. But it is tied to the x86 approach of
performing runtime relocations based on build time relocation data,
which is problematic now that linkers have started to perform
relaxations, as these cannot always be translated 1:1. For instance,
we already have a latent bug in the x86 relocs tool, which ignores
GOTPCREL relocations on the basis that the relocation is relative.
However, this is only true for Clang/lld, which does not update the
static relocation tables after performing relaxations. ld.bfd does
attempt to keep those tables in sync, and so a GOTPCREL relocation
should be flagged as a bug when encountered, because it means there is
a GOT slot somewhere with no relocation associated with it.
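
For reference, the relaxation in question looks roughly like this (a
sketch, not output from the series):

```
	# Taking the address of foo via the GOT, as emitted by the compiler:
	movq	foo@GOTPCREL(%rip), %rax	# R_X86_64_REX_GOTPCRELX

	# What the linker may relax it into when foo binds locally:
	leaq	foo(%rip), %rax			# GOT slot no longer referenced
```

ld.bfd rewrites the corresponding entry in the --emit-relocs output to
match the relaxed instruction, whereas lld leaves the original GOTPCREL
relocation in place, which is what trips up the relocs tool.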

One could argue that this example is just a Clang bug, but it is very
difficult to make that case with the toolchain developers, given that
--emit-relocs (which is what tells the linker to emit the relocations
that it received as input) has no specification, and some linker
relaxations are not representable as static relocations to begin with
(to be fair, that currently mostly affects other architectures, but
there is no reason it could not happen on x86 as well)

Doing fgkaslr properly (IMHO) means supporting things like live patch
and debug seamlessly, and in a portable manner. Toolchain support is
critical, and securing that for a one-off x86 implementation rather
than one that can be used across architectures and other bare-metal
projects is going to be difficult.

> From your footnotes, it looks like what you are *really* asking for is to
> pessimize x86 code to benefit other architectures. That isn't inherently
> wrong, but stating it as you have above is dishonest.
>

I was hoping to save the ad-hominems for later in the thread, when
things *really* heat up.

The point is not to benefit other architectures. The point is to
implement something once, and deploy it on all architectures in the
same way. ELF is the greatest common denominator across the entire
ecosystem, and so using idiomatic ELF to describe how to load the
image and how to move it around in the virtual address space is an
obvious choice.

> > The main sticking point is the fact that PIE linking on x86_64 requires
> > PIE codegen, and that was shot down before on the basis that
> > a) GOTs in fully linked binaries are stupid
> > b) the code size increase would be prohibitive
> > c) the performance would suffer.
> >
> > This series implements PIE codegen without permitting the use of GOT
> > slots. The code size increase is between 0.2% (clang) and 0.5% (gcc),
> > and I could not identify any performance regressions (using hackbench)
> > on various different micro-architectures that I tried it on.
> > (Suggestions for other benchmarks/test cases are welcome)
>
> Could you show some examples of how the code changes?
>

Taking the address of a symbol (same code size)

   0: 48 c7 c0 00 00 00 00 mov    $0x0,%rax
3: R_X86_64_32S sym


   7: 48 8d 05 00 00 00 00 lea    0x0(%rip),%rax        # 0xe
a: R_X86_64_PC32 sym-0x4


Loading a global variable from memory (one byte shorter in PIC)

   e: 48 8b 04 25 00 00 00 mov    0x0,%rax
  15: 00
12: R_X86_64_32S sym


  16: 48 8b 05 00 00 00 00 mov    0x0(%rip),%rax        # 0x1d
19: R_X86_64_PC32 sym-0x4


Indexing a global array (3 bytes longer in PIC, needs an additional
GPR if source and destination are the same)

  1d: 48 8b 04 c5 00 00 00 mov    0x0(,%rax,8),%rax
  24: 00
21: R_X86_64_32S array


  25: 48 8d 15 00 00 00 00 lea    0x0(%rip),%rdx        # 0x2c
28: R_X86_64_PC32 array-0x4
  2c: 48 8b 04 c2          mov    (%rdx,%rax,8),%rax


Pushing the address of a symbol to the stack (3 bytes longer in PIC,
needs an additional GPR)

  30: 68 00 00 00 00        push   $0x0
31: R_X86_64_32S sym


  35: 48 8d 05 00 00 00 00 lea    0x0(%rip),%rax        # 0x3c
38: R_X86_64_PC32 sym-0x4
  3c: 50                    push   %rax


Jump tables look completely different, but the table itself is only
half the size. Even for non-PIC, jump tables are problematic for
objtool, and so these need to be annotated by the compiler. I have
some unfinished Clang patches that implement this, which I hope to get
back to soon.
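
To reproduce the patterns above, a minimal C file (illustrative only,
not taken from the series) can be compiled once without PIC and once
with the flags the series uses, and the generated asm compared:

```c
/* Build with "gcc -O2 -S" versus "gcc -O2 -S -fpie -mdirect-extern-access"
 * (the latter flag is GCC-specific; it keeps PIE codegen from going
 * through the GOT) and diff the output to see the absolute vs
 * RIP-relative sequences quoted above. */
long sym = 42;
long array[8] = { 10, 11, 12, 13, 14, 15, 16, 17 };

long *addr_of_sym(void)  { return &sym; }      /* mov $sym  vs  lea sym(%rip)      */
long load_sym(void)      { return sym; }       /* mov abs   vs  mov off(%rip)      */
long index_array(long i) { return array[i]; }  /* scaled absolute  vs  lea + index */
```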

The asm patches in the series should give a good impression of how the
code changes.


> >
> > [1] There have been a few attempts at landing fine grained KASLR for
> > x86, but the main problem is that it was tied to the x86 relocation
> > format, which deviates from how fully linked relocatable ELF binaries
> > are generally constructed (using PIE). Implementing fgkaslr in the ELF
> > domain would make it suitable for other architectures too, as well as
> > other use cases (bare metal or hosted) where no dynamic linking is
> > performed (firmware, hypervisors). In order to implement this properly,
> > i.e., with debugging support etc, it needs support from the tooling
> > side. (Fine grained KASLR in combination with execute-only code mappings
> > makes it extremely difficult for an attacker to subvert the control flow
> > in the kernel in a way that can be meaningfully exploited).
> >
> > [2] EFI zboot is already used by various architectures that have no
> > decompressor stage at all (arm64, RISC-V, LoongArch), and this format
> > can be combined with an ELF payload too. EFI zboot accommodates non-EFI
> > boot chains by describing the size, offset, payload type and compression
> > type in its header, so that it can be extracted and booted by other
> > means.
>
> The bzImage format already has that for all practical purposes. We *really*
> don't want to introduce a new binary format for the x86 kernel. A bunch of
> such attempts have been done in the past, and it is nothing but a mess that
> breaks things, because now you are encouraging different bootloaders to
> support a non-overlapping set of binary formats.
>
> STRONG NAK on that one.
>

I think it should be feasible to implement a hybrid bzImage/EFI zboot
format. There is already prior art in loaders that decompress the ELF
payload directly (Xen).

Given that an x86_64 bootloader running in long mode needs to do very
little beyond loading the ELF at some arbitrary 2M aligned offset and
calling the entry point with a struct boot_params in %RDI, most of the
logic in the decompressor is really only needed when booting in 32-bit
mode.

So I think there is value in having a generic boot format that can be
consumed by EFI directly, or by a generic ELF vmlinux loader (library)
that understands the EFI zboot format and knows how to extract the ELF
payload. I'd strongly prefer only a single idiom for describing the
relocations in the image.

On other architectures (i.e., without decompressor), EFI zboot would
be a prerequisite for fgkaslr, but it is up to the platform to decide
whether to boot via EFI or load the ELF and apply the relocations. On
x86_64, the same tooling would work seamlessly, but the decompressor
could apply the relocations itself as well.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC/RFT PATCH 15/19] x64/acpi: Use PIC-compatible references in wakeup_64.S
  2026-01-09  7:59     ` Ard Biesheuvel
@ 2026-01-09 11:46       ` Brian Gerst
  2026-01-09 12:09         ` Ard Biesheuvel
  0 siblings, 1 reply; 54+ messages in thread
From: Brian Gerst @ 2026-01-09 11:46 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-kernel, x86, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Josh Poimboeuf, Peter Zijlstra,
	Kees Cook, Uros Bizjak, linux-hardening

On Fri, Jan 9, 2026 at 2:59 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Fri, 9 Jan 2026 at 06:01, Brian Gerst <brgerst@gmail.com> wrote:
> >
> > On Thu, Jan 8, 2026 at 4:28 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > >
> > > Use ordinary RIP-relative references to make the code compatible with
> > > running the linker in PIE mode.
> > >
> > > Note that wakeup_long64() runs in the kernel's ordinary virtual mapping
> > > so there is no need to record the address of .Lresume_point in a global
> > > variable. And fix the comment while at it.
> > >
> > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > ---
> > >  arch/x86/kernel/acpi/wakeup_64.S | 11 ++++-------
> > >  1 file changed, 4 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/arch/x86/kernel/acpi/wakeup_64.S b/arch/x86/kernel/acpi/wakeup_64.S
> > > index 04f561f75e99..15233a4e1c95 100644
> > > --- a/arch/x86/kernel/acpi/wakeup_64.S
> > > +++ b/arch/x86/kernel/acpi/wakeup_64.S
> > > @@ -14,7 +14,7 @@
> > >
> > >  .code64
> > >         /*
> > > -        * Hooray, we are in Long 64-bit mode (but still running in low memory)
> > > +        * Hooray, we are in Long 64-bit mode
> > >          */
> > >  SYM_FUNC_START(wakeup_long64)
> > >         ANNOTATE_NOENDBR
> > > @@ -41,7 +41,7 @@ SYM_FUNC_START(wakeup_long64)
> > >         movq    saved_rsi(%rip), %rsi
> > >         movq    saved_rbp(%rip), %rbp
> > >
> > > -       movq    saved_rip(%rip), %rax
> > > +       leaq    .Lresume_point(%rip), %rax
> > >         ANNOTATE_RETPOLINE_SAFE
> > >         jmp     *%rax
> >
> > If this is already running on the virtual mapping, this can simply be
> > changed to a direct jump.
> >
>
> Indeed, but I couldn't figure out how to do so without making objtool unhappy.

I replaced it with a simple "jmp .Lresume_point" and objtool seemed
fine with it on a defconfig build.  What error did you see?

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC/RFT PATCH 15/19] x64/acpi: Use PIC-compatible references in wakeup_64.S
  2026-01-09 11:46       ` Brian Gerst
@ 2026-01-09 12:09         ` Ard Biesheuvel
  2026-01-09 12:10           ` Ard Biesheuvel
  0 siblings, 1 reply; 54+ messages in thread
From: Ard Biesheuvel @ 2026-01-09 12:09 UTC (permalink / raw)
  To: Brian Gerst
  Cc: linux-kernel, x86, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Josh Poimboeuf, Peter Zijlstra,
	Kees Cook, Uros Bizjak, linux-hardening

On Fri, 9 Jan 2026 at 12:46, Brian Gerst <brgerst@gmail.com> wrote:
>
> On Fri, Jan 9, 2026 at 2:59 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > On Fri, 9 Jan 2026 at 06:01, Brian Gerst <brgerst@gmail.com> wrote:
> > >
> > > On Thu, Jan 8, 2026 at 4:28 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > > >
> > > > Use ordinary RIP-relative references to make the code compatible with
> > > > running the linker in PIE mode.
> > > >
> > > > Note that wakeup_long64() runs in the kernel's ordinary virtual mapping
> > > > so there is no need to record the address of .Lresume_point in a global
> > > > variable. And fix the comment while at it.
> > > >
> > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > > ---
> > > >  arch/x86/kernel/acpi/wakeup_64.S | 11 ++++-------
> > > >  1 file changed, 4 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/arch/x86/kernel/acpi/wakeup_64.S b/arch/x86/kernel/acpi/wakeup_64.S
> > > > index 04f561f75e99..15233a4e1c95 100644
> > > > --- a/arch/x86/kernel/acpi/wakeup_64.S
> > > > +++ b/arch/x86/kernel/acpi/wakeup_64.S
> > > > @@ -14,7 +14,7 @@
> > > >
> > > >  .code64
> > > >         /*
> > > > -        * Hooray, we are in Long 64-bit mode (but still running in low memory)
> > > > +        * Hooray, we are in Long 64-bit mode
> > > >          */
> > > >  SYM_FUNC_START(wakeup_long64)
> > > >         ANNOTATE_NOENDBR
> > > > @@ -41,7 +41,7 @@ SYM_FUNC_START(wakeup_long64)
> > > >         movq    saved_rsi(%rip), %rsi
> > > >         movq    saved_rbp(%rip), %rbp
> > > >
> > > > -       movq    saved_rip(%rip), %rax
> > > > +       leaq    .Lresume_point(%rip), %rax
> > > >         ANNOTATE_RETPOLINE_SAFE
> > > >         jmp     *%rax
> > >
> > > If this is already running on the virtual mapping, this can simply be
> > > changed to a direct jump.
> > >
> >
> > Indeed, but I couldn't figure out how to do so without making objtool unhappy.
>
> I replaced it with a simple "jmp .Lresume_point" and objtool seemed
> fine with it on a defconfig build.  What error did you see?

arch/x86/kernel/acpi/wakeup_64.o: warning: objtool: wakeup_long64()
falls through to next function do_suspend_lowlevel()

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC/RFT PATCH 15/19] x64/acpi: Use PIC-compatible references in wakeup_64.S
  2026-01-09 12:09         ` Ard Biesheuvel
@ 2026-01-09 12:10           ` Ard Biesheuvel
  2026-01-09 12:51             ` Brian Gerst
  0 siblings, 1 reply; 54+ messages in thread
From: Ard Biesheuvel @ 2026-01-09 12:10 UTC (permalink / raw)
  To: Brian Gerst
  Cc: linux-kernel, x86, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Josh Poimboeuf, Peter Zijlstra,
	Kees Cook, Uros Bizjak, linux-hardening

On Fri, 9 Jan 2026 at 13:09, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Fri, 9 Jan 2026 at 12:46, Brian Gerst <brgerst@gmail.com> wrote:
> >
> > On Fri, Jan 9, 2026 at 2:59 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > >
> > > On Fri, 9 Jan 2026 at 06:01, Brian Gerst <brgerst@gmail.com> wrote:
> > > >
> > > > On Thu, Jan 8, 2026 at 4:28 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > > > >
> > > > > Use ordinary RIP-relative references to make the code compatible with
> > > > > running the linker in PIE mode.
> > > > >
> > > > > Note that wakeup_long64() runs in the kernel's ordinary virtual mapping
> > > > > so there is no need to record the address of .Lresume_point in a global
> > > > > variable. And fix the comment while at it.
> > > > >
> > > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > > > ---
> > > > >  arch/x86/kernel/acpi/wakeup_64.S | 11 ++++-------
> > > > >  1 file changed, 4 insertions(+), 7 deletions(-)
> > > > >
> > > > > diff --git a/arch/x86/kernel/acpi/wakeup_64.S b/arch/x86/kernel/acpi/wakeup_64.S
> > > > > index 04f561f75e99..15233a4e1c95 100644
> > > > > --- a/arch/x86/kernel/acpi/wakeup_64.S
> > > > > +++ b/arch/x86/kernel/acpi/wakeup_64.S
> > > > > @@ -14,7 +14,7 @@
> > > > >
> > > > >  .code64
> > > > >         /*
> > > > > -        * Hooray, we are in Long 64-bit mode (but still running in low memory)
> > > > > +        * Hooray, we are in Long 64-bit mode
> > > > >          */
> > > > >  SYM_FUNC_START(wakeup_long64)
> > > > >         ANNOTATE_NOENDBR
> > > > > @@ -41,7 +41,7 @@ SYM_FUNC_START(wakeup_long64)
> > > > >         movq    saved_rsi(%rip), %rsi
> > > > >         movq    saved_rbp(%rip), %rbp
> > > > >
> > > > > -       movq    saved_rip(%rip), %rax
> > > > > +       leaq    .Lresume_point(%rip), %rax
> > > > >         ANNOTATE_RETPOLINE_SAFE
> > > > >         jmp     *%rax
> > > >
> > > > If this is already running on the virtual mapping, this can simply be
> > > > changed to a direct jump.
> > > >
> > >
> > > Indeed, but I couldn't figure out how to do so without making objtool unhappy.
> >
> > I replaced it with a simple "jmp .Lresume_point" and objtool seemed
> > fine with it on a defconfig build.  What error did you see?
>
> arch/x86/kernel/acpi/wakeup_64.o: warning: objtool: wakeup_long64()
> falls through to next function do_suspend_lowlevel()

Note that this is x86_64_defconfig with CONFIG_X86_KERNEL_IBT disabled.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC/RFT PATCH 15/19] x64/acpi: Use PIC-compatible references in wakeup_64.S
  2026-01-09 12:10           ` Ard Biesheuvel
@ 2026-01-09 12:51             ` Brian Gerst
  0 siblings, 0 replies; 54+ messages in thread
From: Brian Gerst @ 2026-01-09 12:51 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-kernel, x86, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Josh Poimboeuf, Peter Zijlstra,
	Kees Cook, Uros Bizjak, linux-hardening

On Fri, Jan 9, 2026 at 7:10 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Fri, 9 Jan 2026 at 13:09, Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > On Fri, 9 Jan 2026 at 12:46, Brian Gerst <brgerst@gmail.com> wrote:
> > >
> > > On Fri, Jan 9, 2026 at 2:59 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > > >
> > > > On Fri, 9 Jan 2026 at 06:01, Brian Gerst <brgerst@gmail.com> wrote:
> > > > >
> > > > > On Thu, Jan 8, 2026 at 4:28 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > > > > >
> > > > > > Use ordinary RIP-relative references to make the code compatible with
> > > > > > running the linker in PIE mode.
> > > > > >
> > > > > > Note that wakeup_long64() runs in the kernel's ordinary virtual mapping
> > > > > > so there is no need to record the address of .Lresume_point in a global
> > > > > > variable. And fix the comment while at it.
> > > > > >
> > > > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > > > > ---
> > > > > >  arch/x86/kernel/acpi/wakeup_64.S | 11 ++++-------
> > > > > >  1 file changed, 4 insertions(+), 7 deletions(-)
> > > > > >
> > > > > > diff --git a/arch/x86/kernel/acpi/wakeup_64.S b/arch/x86/kernel/acpi/wakeup_64.S
> > > > > > index 04f561f75e99..15233a4e1c95 100644
> > > > > > --- a/arch/x86/kernel/acpi/wakeup_64.S
> > > > > > +++ b/arch/x86/kernel/acpi/wakeup_64.S
> > > > > > @@ -14,7 +14,7 @@
> > > > > >
> > > > > >  .code64
> > > > > >         /*
> > > > > > -        * Hooray, we are in Long 64-bit mode (but still running in low memory)
> > > > > > +        * Hooray, we are in Long 64-bit mode
> > > > > >          */
> > > > > >  SYM_FUNC_START(wakeup_long64)
> > > > > >         ANNOTATE_NOENDBR
> > > > > > @@ -41,7 +41,7 @@ SYM_FUNC_START(wakeup_long64)
> > > > > >         movq    saved_rsi(%rip), %rsi
> > > > > >         movq    saved_rbp(%rip), %rbp
> > > > > >
> > > > > > -       movq    saved_rip(%rip), %rax
> > > > > > +       leaq    .Lresume_point(%rip), %rax
> > > > > >         ANNOTATE_RETPOLINE_SAFE
> > > > > >         jmp     *%rax
> > > > >
> > > > > If this is already running on the virtual mapping, this can simply be
> > > > > changed to a direct jump.
> > > > >
> > > >
> > > > Indeed, but I couldn't figure out how to do so without making objtool unhappy.
> > >
> > > I replaced it with a simple "jmp .Lresume_point" and objtool seemed
> > > fine with it on a defconfig build.  What error did you see?
> >
> > arch/x86/kernel/acpi/wakeup_64.o: warning: objtool: wakeup_long64()
> > falls through to next function do_suspend_lowlevel()
>
> Note that this is x86_64_defconfig with CONFIG_X86_KERNEL_IBT disabled.

I do see the error now.  I had missed it when building the .o file and
thought it would show when linking vmlinux.

I was able to make objtool happy by embedding wakeup_long64() into
do_suspend_lowlevel() and using SYM_INNER_LABEL_ALIGN().
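
A rough sketch of that arrangement (untested, details elided; the
entry point becomes an inner label so objtool sees a single function
and the fall-through warning goes away):

```
SYM_FUNC_START(do_suspend_lowlevel)
	/* ... save processor state ... */
	call	x86_acpi_enter_sleep_state
	/* resume re-enters at wakeup_long64 below */
.Lresume_point:
	/* ... restore processor state ... */
	RET
SYM_INNER_LABEL_ALIGN(wakeup_long64, SYM_L_GLOBAL)
	ANNOTATE_NOENDBR
	/* ... reload segments and saved registers ... */
	jmp	.Lresume_point
SYM_FUNC_END(do_suspend_lowlevel)
```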

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC/RFT PATCH 08/19] x86: Use PIE codegen for the relocatable 64-bit kernel
  2026-01-08  9:25 ` [RFC/RFT PATCH 08/19] x86: Use PIE codegen for the relocatable 64-bit kernel Ard Biesheuvel
@ 2026-01-09 21:34   ` Jan Engelhardt
  2026-01-09 22:07     ` Ard Biesheuvel
  0 siblings, 1 reply; 54+ messages in thread
From: Jan Engelhardt @ 2026-01-09 21:34 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: linux-kernel, x86, linux-hardening


On Thursday 2026-01-08 10:25, Ard Biesheuvel wrote:
>
>As an intermediate step towards enabling PIE linking for the x86_64
>KASLR kernel, enable PIE codegen for all C and Rust objects that are
>linked into the kernel proper. Add a Kconfig option RELOCATABLE_PIE for
>this, depending on RELR support in the linker, as the relocation tables
>will blow up the kernel image otherwise.


> KBUILD_CFLAGS += $(call cc-option,-fcf-protection=branch -fno-jump-tables)
>...
>+
>+        pie-ccflags-$(CONFIG_CC_IS_GCC) += $(call cc-option.-mdirect-extern-access)

I think that dot there next to -mdirect-extern-access should have been
a comma.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC/RFT PATCH 08/19] x86: Use PIE codegen for the relocatable 64-bit kernel
  2026-01-09 21:34   ` Jan Engelhardt
@ 2026-01-09 22:07     ` Ard Biesheuvel
  0 siblings, 0 replies; 54+ messages in thread
From: Ard Biesheuvel @ 2026-01-09 22:07 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: linux-kernel, x86, linux-hardening

On Fri, 9 Jan 2026 at 22:34, Jan Engelhardt <ej@inai.de> wrote:
>
>
> On Thursday 2026-01-08 10:25, Ard Biesheuvel wrote:
> >
> >As an intermediate step towards enabling PIE linking for the x86_64
> >KASLR kernel, enable PIE codegen for all C and Rust objects that are
> >linked into the kernel proper. Add a Kconfig option RELOCATABLE_PIE for
> >this, depending on RELR support in the linker, as the relocation tables
> >will blow up the kernel image otherwise.
>
>
> > KBUILD_CFLAGS += $(call cc-option,-fcf-protection=branch -fno-jump-tables)
> >...
> >+
> >+        pie-ccflags-$(CONFIG_CC_IS_GCC) += $(call cc-option.-mdirect-extern-access)
>
> I think that dot there next to -mdirect-extern-accesss should have been
> a comma.
>

Indeed, thanks for spotting that!

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC/RFT PATCH 04/19] x86: Make the 64-bit bzImage always physically relocatable
  2026-01-08  9:25 ` [RFC/RFT PATCH 04/19] x86: Make the 64-bit bzImage always physically relocatable Ard Biesheuvel
@ 2026-01-12  4:01   ` H. Peter Anvin
  2026-01-12 10:47     ` David Laight
  0 siblings, 1 reply; 54+ messages in thread
From: H. Peter Anvin @ 2026-01-12  4:01 UTC (permalink / raw)
  To: Ard Biesheuvel, linux-kernel
  Cc: x86, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	Josh Poimboeuf, Peter Zijlstra, Kees Cook, Uros Bizjak,
	Brian Gerst, linux-hardening

On 2026-01-08 01:25, Ard Biesheuvel wrote:
> On x86_64, the physical placement of the kernel is independent from its
> mapping in the 'High Kernel Mapping' range. This means that even a
> position dependent kernel built without boot-time relocation support can
> run from any suitably aligned physical address, and there is no need to
> make this behavior dependent on whether or not the kernel is virtually
> relocatable.
> 
> On i386, the situation is different, given that the physical and virtual
> load offsets must be equal, and so only a relocatable kernel can be
> loaded at a physical address that deviates from its build-time default.
> 
> Clarify this in Kconfig and in the code, and advertise the 64-bit
> bzImage as loadable at any physical offset regardless of whether
> CONFIG_RELOCATABLE is set. In practice, this makes little difference,
> given that it defaults to 'y' and is a prerequisite for EFI_STUB and
> RANDOMIZE_BASE, but it will help with some future refactoring of the
> relocation code.
> 

I don't see any reason to support non-relocatable kernels anymore. In fact, in
a patchset I am working on I have already removed it.

	-hpa


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC/RFT PATCH 04/19] x86: Make the 64-bit bzImage always physically relocatable
  2026-01-12  4:01   ` H. Peter Anvin
@ 2026-01-12 10:47     ` David Laight
  2026-01-12 12:06       ` H. Peter Anvin
  0 siblings, 1 reply; 54+ messages in thread
From: David Laight @ 2026-01-12 10:47 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ard Biesheuvel, linux-kernel, x86, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, Josh Poimboeuf, Peter Zijlstra,
	Kees Cook, Uros Bizjak, Brian Gerst, linux-hardening

On Sun, 11 Jan 2026 20:01:02 -0800
"H. Peter Anvin" <hpa@zytor.com> wrote:

> On 2026-01-08 01:25, Ard Biesheuvel wrote:
> > On x86_64, the physical placement of the kernel is independent from its
> > mapping in the 'High Kernel Mapping' range. This means that even a
> > position dependent kernel built without boot-time relocation support can
> > run from any suitably aligned physical address, and there is no need to
> > make this behavior dependent on whether or not the kernel is virtually
> > relocatable.
> > 
> > On i386, the situation is different, given that the physical and virtual
> > load offsets must be equal, and so only a relocatable kernel can be
> > loaded at a physical address that deviates from its build-time default.
> > 
> > Clarify this in Kconfig and in the code, and advertise the 64-bit
> > bzImage as loadable at any physical offset regardless of whether
> > CONFIG_RELOCATABLE is set. In practice, this makes little difference,
> > given that it defaults to 'y' and is a prerequisite for EFI_STUB and
> > RANDOMIZE_BASE, but it will help with some future refactoring of the
> > relocation code.
> >   
> 
> I don't see any reason to support non-relocatable kernels anymore. In fact, in
> a patchset I am working on I have already removed it.

For just 64bit, or 32bit as well?
The 'bloat' for 32bit will be higher due to the lack of pc-relative
addressing.

	David

> 
> 	-hpa
> 
> 


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC/RFT PATCH 04/19] x86: Make the 64-bit bzImage always physically relocatable
  2026-01-12 10:47     ` David Laight
@ 2026-01-12 12:06       ` H. Peter Anvin
  0 siblings, 0 replies; 54+ messages in thread
From: H. Peter Anvin @ 2026-01-12 12:06 UTC (permalink / raw)
  To: David Laight
  Cc: Ard Biesheuvel, linux-kernel, x86, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, Josh Poimboeuf, Peter Zijlstra,
	Kees Cook, Uros Bizjak, Brian Gerst, linux-hardening

On January 12, 2026 2:47:46 AM PST, David Laight <david.laight.linux@gmail.com> wrote:
>On Sun, 11 Jan 2026 20:01:02 -0800
>"H. Peter Anvin" <hpa@zytor.com> wrote:
>
>> On 2026-01-08 01:25, Ard Biesheuvel wrote:
>> > On x86_64, the physical placement of the kernel is independent from its
>> > mapping in the 'High Kernel Mapping' range. This means that even a
>> > position dependent kernel built without boot-time relocation support can
>> > run from any suitably aligned physical address, and there is no need to
>> > make this behavior dependent on whether or not the kernel is virtually
>> > relocatable.
>> > 
>> > On i386, the situation is different, given that the physical and virtual
>> > load offsets must be equal, and so only a relocatable kernel can be
>> > loaded at a physical address that deviates from its build-time default.
>> > 
>> > Clarify this in Kconfig and in the code, and advertise the 64-bit
>> > bzImage as loadable at any physical offset regardless of whether
>> > CONFIG_RELOCATABLE is set. In practice, this makes little difference,
>> > given that it defaults to 'y' and is a prerequisite for EFI_STUB and
>> > RANDOMIZE_BASE, but it will help with some future refactoring of the
>> > relocation code.
>> >   
>> 
>> I don't see any reason to support non-relocatable kernels anymore. In fact, in
>> a patchset I am working on I have already removed it.
>
>For just 64bit, or 32bit as well?
>The 'bloat' for 32bit will be higher due to the lack of pc-relative
>addressing.
>
>	David
>
>> 
>> 	-hpa
>> 
>> 
>

Either. The bloat is strictly boot time.


* Re: [RFC/RFT PATCH 00/19] Link the relocatable x86 kernel as PIE
  2026-01-09  9:21   ` Ard Biesheuvel
@ 2026-01-14 18:16     ` Kees Cook
  2026-01-20 20:45       ` H. Peter Anvin
  0 siblings, 1 reply; 54+ messages in thread
From: Kees Cook @ 2026-01-14 18:16 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: H. Peter Anvin, linux-kernel, x86, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, Josh Poimboeuf, Peter Zijlstra,
	Uros Bizjak, Brian Gerst, linux-hardening

On Fri, Jan 09, 2026 at 10:21:49AM +0100, Ard Biesheuvel wrote:
> On Fri, 9 Jan 2026 at 01:37, H. Peter Anvin <hpa@zytor.com> wrote:
> >
> > On 2026-01-08 01:25, Ard Biesheuvel wrote:
> > > This series is a follow-up to a series I sent a bit more than a year
> > > ago, to switch to PIE linking of x86_64 vmlinux, which is a prerequisite
> > > for further hardening measures, such as fg-kaslr [1], as well as further
> > > harmonization of the boot protocols between architectures [2].
> >
> > Kristen Accardi had fg-kaslr running without that, didn't she?

I understand "such as fg-kaslr" to have been just a terse way of saying
"such as a complete multi-architectural fg-kaslr"

> Yes, as a proof of concept. But it is tied to the x86 approach of
> performing runtime relocations based on build time relocation data,
> which is problematic now that linkers have started to perform
> relaxations, as these cannot always be translated 1:1. For instance,
> we already have a latent bug in the x86 relocs tool, which ignores
> GOTPCREL relocations on the basis that the relocation is relative.
> However, this is only true for Clang/lld, which does not update the
> static relocation tables after performing relaxations. ld.bfd does
> attempt to keep those tables in sync, and so a GOTPCREL relocation
> should be flagged as a bug when encountered, because it means there is
> a GOT slot somewhere with no relocation associated with it.

Another historical bit of context is that one of the main reasons
Kristen's fg-kaslr got stuck was the linker support needed for (the 65k
worth of) section pass-through. That never got resolved, and the solutions
either required huge linker files (that tickled performance flaws in the
linkers) that resulted in 10 minute linking times, or to disable all the
orphan section handling, which was a regression in our sanity checking
and bug-finding.

So, getting a well-behaved fg-kaslr still needs toolchain support,
and getting there is going to need further design work. As far as PIE,
this just makes the fg-kaslr toolchain work easier (fewer special cases),
along with all the other benefits of moving to PIE.

-Kees

-- 
Kees Cook


* Re: [RFC/RFT PATCH 10/19] x86/kvm: Use RIP-relative addressing
  2026-01-08  9:25 ` [RFC/RFT PATCH 10/19] x86/kvm: Use RIP-relative addressing Ard Biesheuvel
@ 2026-01-20 17:04   ` Sean Christopherson
  2026-01-20 19:43     ` David Laight
  0 siblings, 1 reply; 54+ messages in thread
From: Sean Christopherson @ 2026-01-20 17:04 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-kernel, x86, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Josh Poimboeuf, Peter Zijlstra,
	Kees Cook, Uros Bizjak, Brian Gerst, linux-hardening

On Thu, Jan 08, 2026, Ard Biesheuvel wrote:
> Replace absolute references in inline asm with RIP-relative ones, to
> avoid the need for relocation fixups at boot time. This is a
> prerequisite for PIE linking, which only permits 64-bit wide
> loader-visible absolute references.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/x86/kernel/kvm.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index df78ddee0abb..1a0335f328e1 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -807,8 +807,9 @@ extern bool __raw_callee_save___kvm_vcpu_is_preempted(long);
>   * restoring to/from the stack.
>   */
>  #define PV_VCPU_PREEMPTED_ASM						     \
> - "movq   __per_cpu_offset(,%rdi,8), %rax\n\t"				     \
> - "cmpb   $0, " __stringify(KVM_STEAL_TIME_preempted) "+steal_time(%rax)\n\t" \
> + "0:leaq 0b(%rip), %rax\n\t"						     \

Please use something other than '0' for the label, it took me forever (and looking
at disassembly) to realize "0b" was just a backwards label and not some fancy
syntax I didn't know.

It might also be worth calling out in the changelog that this function is called
across CPUs, e.g. from kvm_smp_send_call_func_ipi(), and thus can't use gs:
or any other "normal" method for accessing per-CPU data.

> + "addq   __per_cpu_offset - 0b(%rax,%rdi,8), %rax\n\t"			     \
> + "cmpb   $0, " __stringify(KVM_STEAL_TIME_preempted) "+steal_time-0b(%rax)\n\t" \
>   "setne  %al\n\t"
>  
>  DEFINE_ASM_FUNC(__raw_callee_save___kvm_vcpu_is_preempted,
> -- 
> 2.47.3
> 


* Re: [RFC/RFT PATCH 10/19] x86/kvm: Use RIP-relative addressing
  2026-01-20 17:04   ` Sean Christopherson
@ 2026-01-20 19:43     ` David Laight
  2026-01-20 20:54       ` Ard Biesheuvel
  0 siblings, 1 reply; 54+ messages in thread
From: David Laight @ 2026-01-20 19:43 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Ard Biesheuvel, linux-kernel, x86, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Josh Poimboeuf,
	Peter Zijlstra, Kees Cook, Uros Bizjak, Brian Gerst,
	linux-hardening

On Tue, 20 Jan 2026 09:04:26 -0800
Sean Christopherson <seanjc@google.com> wrote:

> On Thu, Jan 08, 2026, Ard Biesheuvel wrote:
> > Replace absolute references in inline asm with RIP-relative ones, to
> > avoid the need for relocation fixups at boot time. This is a
> > prerequisite for PIE linking, which only permits 64-bit wide
> > loader-visible absolute references.
> > 
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> >  arch/x86/kernel/kvm.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > index df78ddee0abb..1a0335f328e1 100644
> > --- a/arch/x86/kernel/kvm.c
> > +++ b/arch/x86/kernel/kvm.c
> > @@ -807,8 +807,9 @@ extern bool __raw_callee_save___kvm_vcpu_is_preempted(long);
> >   * restoring to/from the stack.
> >   */
> >  #define PV_VCPU_PREEMPTED_ASM						     \
> > - "movq   __per_cpu_offset(,%rdi,8), %rax\n\t"				     \
> > - "cmpb   $0, " __stringify(KVM_STEAL_TIME_preempted) "+steal_time(%rax)\n\t" \
> > + "0:leaq 0b(%rip), %rax\n\t"						     \  
> 
> Please use something other than '0' for the label, it took me forever (and looking
> at disassembly) to realize "0b" was just a backwards label and not some fancy
> syntax I didn't know.

I remember taking a while to grok that as well.

Can't you just use . as in:
	leaq	.(%rip), %rax

shame the assembler doesn't understand:
	movq	%rip, %rax
(maybe it does...)

	David


* Re: [RFC/RFT PATCH 00/19] Link the relocatable x86 kernel as PIE
  2026-01-14 18:16     ` Kees Cook
@ 2026-01-20 20:45       ` H. Peter Anvin
  2026-01-21  8:56         ` Ard Biesheuvel
  0 siblings, 1 reply; 54+ messages in thread
From: H. Peter Anvin @ 2026-01-20 20:45 UTC (permalink / raw)
  To: Kees Cook, Ard Biesheuvel
  Cc: linux-kernel, x86, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Josh Poimboeuf, Peter Zijlstra, Uros Bizjak,
	Brian Gerst, linux-hardening

On 2026-01-14 10:16, Kees Cook wrote:
> On Fri, Jan 09, 2026 at 10:21:49AM +0100, Ard Biesheuvel wrote:
>> On Fri, 9 Jan 2026 at 01:37, H. Peter Anvin <hpa@zytor.com> wrote:
>>>
>>> On 2026-01-08 01:25, Ard Biesheuvel wrote:
>>>> This series is a follow-up to a series I sent a bit more than a year
>>>> ago, to switch to PIE linking of x86_64 vmlinux, which is a prerequisite
>>>> for further hardening measures, such as fg-kaslr [1], as well as further
>>>> harmonization of the boot protocols between architectures [2].
>>>
>>> Kristen Accardi had fg-kaslr running without that, didn't she?
> 
> I understand "such as fg-kaslr" to have been just a terse way of saying
> "such as a complete multi-architectural fg-kaslr"
> 
>> Yes, as a proof of concept. But it is tied to the x86 approach of
>> performing runtime relocations based on build time relocation data,
>> which is problematic now that linkers have started to perform
>> relaxations, as these cannot always be translated 1:1. For instance,
>> we already have a latent bug in the x86 relocs tool, which ignores
>> GOTPCREL relocations on the basis that the relocation is relative.
>> However, this is only true for Clang/lld, which does not update the
>> static relocation tables after performing relaxations. ld.bfd does
>> attempt to keep those tables in sync, and so a GOTPCREL relocation
>> should be flagged as a bug when encountered, because it means there is
>> a GOT slot somewhere with no relocation associated with it.
> 
> Another historical bit of context is that one of the main reasons
> Kristen's fg-kaslr got stuck was the linker support needed for (the 65k
> worth of) section pass-through. That never got resolved, and the solutions
> either required huge linker files (that tickled performance flaws in the
> linkers) that resulted in 10 minute linking times, or to disable all the
> orphan section handling, which was a regression in our sanity checking
> and bug-finding.
> 
> So, getting a well-behaved fg-kaslr still needs toolchain support,
> and getting there is going to need further design work. As far as PIE,
> this just makes the fg-kaslr toolchain work easier (fewer special cases),
> along with all the other benefits of moving to PIE.
> 

As I *explicitly* stated earlier, there isn't anything inherently wrong with
putting a small onus on x86 in order to make the general Linux code better --
but please, be honest about it *so we know what the actual tradeoffs are*.

For x86, we really do want to maintain the kernel memory model, which allows
us to directly reference symbols in complex address expressions and to
directly jump across modules. This means the "PIE" will need to be different
from the way PIE works in user space, which is in part designed to avoid
needing to dirty readonly pages, which would inhibit sharing -- which is
explicitly NOT a concern for the kernel.

So that is one thing that the toolchain needs to be able to do.

I fully expect that we will continue to need to have some kinds of overrides
for specific symbols, too, because there aren't any really sane ways to
express them to the toolchain; this especially applies to linker-script and
some assembly symbols. For example, the real-mode code (which uses the reloc
tool as well) has to support segment and segbase-relative relocations, which
are something that ELF simply has no concept of.

I have a lot more of an issue with trying to change the x86 boot protocol,
simply because the way booting works in x86 has been incredibly successful;
yes, the bzImage file format is ugly as ****, but that is a direct result of
34 years of continuous backwards compatibility. One of the reasons we have
been able to do that is that we have *explicitly* rejected other boot models,
such as Grub's self-declared Multiboot "standard" (which they have had to
revise multiple times by now) and the early Xen boot model of booting vmlinux
directly. We have added *many* capabilities to bzImage as needed, and it has
turned out to be quite flexible in the end.

That, in turn, has been possible exactly *because* the Linux kernel provides a
"prekernel". I don't even really like calling it the "decompressor" anymore;
it really has developed far beyond that.

Every time you introduce a new boot model you take a serious risk that a boot
loader author will say "hey, I'll just support this new model", and you *also*
take the serious risk that the boot model isn't adequate. When the Grub
authors say "we started using the 'modern' Linux entry point" -- meaning they
would bypass the entry stub and call the kernel entry point directly -- they
*reduced* the overall functionality, because history has shown that:

a. The kernel image is far easier for the end user to update than the
potentially many bootloaders, because the bootloader depends on the deployment
model (consider network booting, for example, or when the bootloader is in
firmware.)

b. The kernel image provides ONE CENTRAL PLACE to add features, fix bugs, and
develop workarounds for strange systems.

Thus, I will do anything I can to continue to veto changes to the x86 boot
model, unless they come with a VERY VERY good motivation.

We originally made the mistake with EFI to leave too much to the bootloader;
because of the downsides above we really needed to backpedal and take over
control much earlier in the flow, just as with BIOS -- apparently to the Grub
developers' utterly inexplicable chagrin.

	-hpa



* Re: [RFC/RFT PATCH 10/19] x86/kvm: Use RIP-relative addressing
  2026-01-20 19:43     ` David Laight
@ 2026-01-20 20:54       ` Ard Biesheuvel
  2026-01-20 22:00         ` David Laight
  0 siblings, 1 reply; 54+ messages in thread
From: Ard Biesheuvel @ 2026-01-20 20:54 UTC (permalink / raw)
  To: David Laight
  Cc: Sean Christopherson, linux-kernel, x86, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
	Josh Poimboeuf, Peter Zijlstra, Kees Cook, Uros Bizjak,
	Brian Gerst, linux-hardening

On Tue, 20 Jan 2026 at 20:44, David Laight <david.laight.linux@gmail.com> wrote:
>
> On Tue, 20 Jan 2026 09:04:26 -0800
> Sean Christopherson <seanjc@google.com> wrote:
>
> > On Thu, Jan 08, 2026, Ard Biesheuvel wrote:
> > > Replace absolute references in inline asm with RIP-relative ones, to
> > > avoid the need for relocation fixups at boot time. This is a
> > > prerequisite for PIE linking, which only permits 64-bit wide
> > > loader-visible absolute references.
> > >
> > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > ---
> > >  arch/x86/kernel/kvm.c | 5 +++--
> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > > index df78ddee0abb..1a0335f328e1 100644
> > > --- a/arch/x86/kernel/kvm.c
> > > +++ b/arch/x86/kernel/kvm.c
> > > @@ -807,8 +807,9 @@ extern bool __raw_callee_save___kvm_vcpu_is_preempted(long);
> > >   * restoring to/from the stack.
> > >   */
> > >  #define PV_VCPU_PREEMPTED_ASM                                                   \
> > > - "movq   __per_cpu_offset(,%rdi,8), %rax\n\t"                                   \
> > > - "cmpb   $0, " __stringify(KVM_STEAL_TIME_preempted) "+steal_time(%rax)\n\t" \
> > > + "0:leaq 0b(%rip), %rax\n\t"                                                    \
> >
> > Please use something other than '0' for the label, it took me forever (and looking
> > at disassembly) to realize "0b" was just a backwards label and not some fancy
> > syntax I didn't know.
>
> I remember taking a while to grok that as well.
>
> Can't you just use . as in:
>         leaq    .(%rip), %rax
>

How would the other two instructions referring to '0b' in their
immediates refer to '.' in that case?


* Re: [RFC/RFT PATCH 10/19] x86/kvm: Use RIP-relative addressing
  2026-01-20 20:54       ` Ard Biesheuvel
@ 2026-01-20 22:00         ` David Laight
  0 siblings, 0 replies; 54+ messages in thread
From: David Laight @ 2026-01-20 22:00 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Sean Christopherson, linux-kernel, x86, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
	Josh Poimboeuf, Peter Zijlstra, Kees Cook, Uros Bizjak,
	Brian Gerst, linux-hardening

On Tue, 20 Jan 2026 21:54:30 +0100
Ard Biesheuvel <ardb@kernel.org> wrote:

> On Tue, 20 Jan 2026 at 20:44, David Laight <david.laight.linux@gmail.com> wrote:
> >
> > On Tue, 20 Jan 2026 09:04:26 -0800
> > Sean Christopherson <seanjc@google.com> wrote:
> >  
> > > On Thu, Jan 08, 2026, Ard Biesheuvel wrote:  
> > > > Replace absolute references in inline asm with RIP-relative ones, to
> > > > avoid the need for relocation fixups at boot time. This is a
> > > > prerequisite for PIE linking, which only permits 64-bit wide
> > > > loader-visible absolute references.
> > > >
> > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > > ---
> > > >  arch/x86/kernel/kvm.c | 5 +++--
> > > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > > > index df78ddee0abb..1a0335f328e1 100644
> > > > --- a/arch/x86/kernel/kvm.c
> > > > +++ b/arch/x86/kernel/kvm.c
> > > > @@ -807,8 +807,9 @@ extern bool __raw_callee_save___kvm_vcpu_is_preempted(long);
> > > >   * restoring to/from the stack.
> > > >   */
> > > >  #define PV_VCPU_PREEMPTED_ASM                                                   \
> > > > - "movq   __per_cpu_offset(,%rdi,8), %rax\n\t"                                   \
> > > > - "cmpb   $0, " __stringify(KVM_STEAL_TIME_preempted) "+steal_time(%rax)\n\t" \
> > > > + "0:leaq 0b(%rip), %rax\n\t"                                                    \  
> > >
> > > Please use something other than '0' for the label, it took me forever (and looking
> > > at disassembly) to realize "0b" was just a backwards label and not some fancy
> > > syntax I didn't know.  
> >
> > I remember taking a while to grok that as well.
> >
> > Can't you just use . as in:
> >         leaq    .(%rip), %rax
> >  
> 
> How would the other two instructions referring to '0b' in their
> immediates refer to '.' in that case?

I'd forgotten about those; they're not in the quoted bit of the patch :-(

Traditionally (going back to MACRO-11) numeric labels would start at 10
and go up in 10s, TECO had a nice macro to renumber them.
(Showing my age again)

	David


* Re: [RFC/RFT PATCH 00/19] Link the relocatable x86 kernel as PIE
  2026-01-20 20:45       ` H. Peter Anvin
@ 2026-01-21  8:56         ` Ard Biesheuvel
  0 siblings, 0 replies; 54+ messages in thread
From: Ard Biesheuvel @ 2026-01-21  8:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Kees Cook, linux-kernel, x86, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, Josh Poimboeuf, Peter Zijlstra,
	Uros Bizjak, Brian Gerst, linux-hardening

On Tue, 20 Jan 2026 at 21:46, H. Peter Anvin <hpa@zytor.com> wrote:
>
> On 2026-01-14 10:16, Kees Cook wrote:
> > On Fri, Jan 09, 2026 at 10:21:49AM +0100, Ard Biesheuvel wrote:
> >> On Fri, 9 Jan 2026 at 01:37, H. Peter Anvin <hpa@zytor.com> wrote:
> >>>
> >>> On 2026-01-08 01:25, Ard Biesheuvel wrote:
> >>>> This series is a follow-up to a series I sent a bit more than a year
> >>>> ago, to switch to PIE linking of x86_64 vmlinux, which is a prerequisite
> >>>> for further hardening measures, such as fg-kaslr [1], as well as further
> >>>> harmonization of the boot protocols between architectures [2].
> >>>
> >>> Kristen Accardi had fg-kaslr running without that, didn't she?
> >
> > I understand "such as fg-kaslr" to have been just a terse way of saying
> > "such as a complete multi-architectural fg-kaslr"
> >
> >> Yes, as a proof of concept. But it is tied to the x86 approach of
> >> performing runtime relocations based on build time relocation data,
> >> which is problematic now that linkers have started to perform
> >> relaxations, as these cannot always be translated 1:1. For instance,
> >> we already have a latent bug in the x86 relocs tool, which ignores
> >> GOTPCREL relocations on the basis that the relocation is relative.
> >> However, this is only true for Clang/lld, which does not update the
> >> static relocation tables after performing relaxations. ld.bfd does
> >> attempt to keep those tables in sync, and so a GOTPCREL relocation
> >> should be flagged as a bug when encountered, because it means there is
> >> a GOT slot somewhere with no relocation associated with it.
> >
> > Another historical bit of context is that one of the main reasons
> > Kristen's fg-kaslr got stuck was the linker support needed for (the 65k
> > worth of) section pass-through. That never got resolved, and the solutions
> > either required huge linker files (that tickled performance flaws in the
> > linkers) that resulted in 10 minute linking times, or to disable all the
> > orphan section handling, which was a regression in our sanity checking
> > and bug-finding.
> >
> > So, getting a well-behaved fg-kaslr still needs toolchain support,
> > and getting there is going to need further design work. As far as PIE,
> > this just makes the fg-kaslr toolchain work easier (fewer special cases),
> > along with all the other benefits of moving to PIE.
> >
>
> As I *explicitly* stated earlier, there isn't anything inherently wrong with
> putting a small onus on x86 in order to make the general Linux code better --
> but please, be honest about it *so we know what the actual tradeoffs are*.
>

It is not just about the general Linux code. The x86 fgkaslr
implementation was never merged because the toolchain side needs
changes. And convincing the toolchain maintainers to take our changes
is difficult if we keep using relocation tables that are not fit for
purpose to perform runtime fixups on code that was built using the
'kernel' code model, which is explicitly position dependent.

> For x86, we really do want to maintain the kernel memory model, which allows
> us to directly reference symbols in complex address expressions and to
> directly jump across modules.

AIUI, those complex address expressions are mostly indexed loads from
global arrays, which do get slightly less efficient, but not in a way
that was noticeable in any benchmarking I did (or LKP for that matter,
which generally sniffs out any performance regressions). This includes
jump tables, but as I already explained, RIP-relative jump tables have
an upside too, given that the table itself is only half the size.

The ability to directly jump across modules is not affected at all by
these changes.

> This means the "PIE" will need to be different
> from the way PIE works in user space, which is in part designed to avoid
> needing to dirty readonly pages, which would inhibit sharing -- which is
> explicitly NOT a concern for the kernel.
>

This is already implemented in this series: no GOT entries are
permitted, and text relocations are allowed.

> So that is one thing that the toolchain needs to be able to do.
>

It already can, and this series makes use of it.

Note that the size of the relocation table taken from an allmodconfig
bzImage drops from 7.3 M to 2.4 M (defconfig goes from 800k to 45k),
so there is a minor intrinsic benefit to these changes as well. But it
is mostly about moving away from bespoke tooling and formats that are
becoming more of a maintenance burden as the number of supported
toolchains and languages increases.

> I fully expect that we will continue to need to have some kinds of overrides
> for specific symbols, too, because there aren't any really sane ways to
> express them to the toolchain; this especially applies to linker-script and
> some assembly symbols. For example, the real-mode code (which uses the reloc
> tool as well) has to support segment and segbase-relative relocations, which
> are something that ELF simply has no concept of.
>

The real mode trampoline is not affected at all by these changes,
given that it is built as a separate executable. Using a bespoke
relocation format there is fine, because it is internal ABI.

> I have a lot more of an issue with trying to change the x86 boot protocol,
> simply because the way booting works in x86 has been incredibly successful;
> yes, the bzImage file format is ugly as ****, but that is a direct result of
> 34 years of continuous backwards compatibility. One of the reasons we have
> been able to do that is that we have *explicitly* rejected other boot models,
> such as Grub's self-declared Multiboot "standard" (which they have had to
> revise multiple times by now) and the early Xen boot model of booting vmlinux
> directly. We have added *many* capabilities to bzImage as needed, and it has
> turned out to be quite flexible in the end.
>
> That, in turn, has been possible exactly *because* the Linux kernel provides a
> "prekernel". I don't even really like calling it the "decompressor" anymore;
> it really has developed far beyond that.
>

The decompressor is needed when booting the 64-bit kernel from a boot
loader that calls it in 32-bit mode.

When entering in long mode, with all memory mapped 1:1 (or at least,
the kernel image itself, and all assets in memory that the bootloader
exposes to the kernel), the decompressor does nothing useful, and all
the problems it solves (by doing demand paging etc) only exist because
it created them in the first place.

SEV-SNP confidential compute made an even bigger mess of this, because
it can trigger #VC exceptions too, which also need to be handled.

Note that the EFI stub does not bother with the decompressor anymore,
and unpacks and boots vmlinux directly. This was needed because the
decompressor fundamentally relies on memory that is both writable and
executable (as it moves its own executable image around in memory),
which is difficult to reconcile with recent PC firmware
implementations that are pedantic about mapping memory RWX.

But actually, I am not proposing to get rid of bzImage. I am proposing
to make it more transparent so generic bootloader components can be
constructed that consume the ELF directly.


* Re: [RFC/RFT PATCH 01/19] x86/idt: Move idt_table to __ro_after_init section
  2026-01-08  9:25 ` [RFC/RFT PATCH 01/19] x86/idt: Move idt_table to __ro_after_init section Ard Biesheuvel
@ 2026-01-22 13:08   ` Borislav Petkov
  2026-01-22 13:48     ` Ard Biesheuvel
  0 siblings, 1 reply; 54+ messages in thread
From: Borislav Petkov @ 2026-01-22 13:08 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-kernel, x86, Thomas Gleixner, Ingo Molnar, Dave Hansen,
	H. Peter Anvin, Josh Poimboeuf, Peter Zijlstra, Kees Cook,
	Uros Bizjak, Brian Gerst, linux-hardening

On Thu, Jan 08, 2026 at 09:25:28AM +0000, Ard Biesheuvel wrote:
> Currently, idt_table is allocated as page-aligned .bss, and remapped
> read-only after init. This breaks a 2 MiB large page into 4k page
> mappings, which defeats some of the effort done at boot to map the
> kernel image using large pages, for improved TLB efficiency.
> 
> Mark this allocation as __ro_after_init instead, so it will be made
> read-only automatically after boot, without breaking up large page
> mappings.
> 
> This also fixes a latent bug on i386, where the size of idt_table is
> less than a page, and so remapping it read-only could potentially affect
> other read-write variables too, if those are not page-aligned as well.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/x86/kernel/idt.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
> index f445bec516a0..d6da25d7964f 100644
> --- a/arch/x86/kernel/idt.c
> +++ b/arch/x86/kernel/idt.c
> @@ -170,7 +170,7 @@ static const __initconst struct idt_data apic_idts[] = {
>  };
>  
>  /* Must be page-aligned because the real IDT is used in the cpu entry area */
> -static gate_desc idt_table[IDT_ENTRIES] __page_aligned_bss;
> +static gate_desc idt_table[IDT_ENTRIES] __aligned(PAGE_SIZE) __ro_after_init;
>  
>  static struct desc_ptr idt_descr __ro_after_init = {
>  	.size		= IDT_TABLE_SIZE - 1,
> @@ -308,9 +308,6 @@ void __init idt_setup_apic_and_irq_gates(void)
>  	idt_map_in_cea();
>  	load_idt(&idt_descr);
>  
> -	/* Make the IDT table read only */
> -	set_memory_ro((unsigned long)&idt_table, 1);
> -
>  	idt_setup_done = true;
>  }

Good idea, except my guest shows me something else:

before:

[    0.186281] IDT table: 0xffffffff89c7f000

0xffffffff89c00000-0xffffffff89c7f000         508K     RW                 GLB NX pte
0xffffffff89c7f000-0xffffffff89c80000           4K     ro                 GLB NX pte
0xffffffff89c80000-0xffffffff89e00000        1536K     RW                 GLB NX pte
0xffffffff89e00000-0xffffffff8be00000          32M     RW         PSE     GLB NX pmd

This is clearly a single, 4K RO pageframe right in the middle of a splintered
2M page.

after:

[    0.180635] IDT table: 0xffffffff822cf000

0xffffffff81e00000-0xffffffff82200000           4M     ro         PSE     GLB NX pmd
0xffffffff82200000-0xffffffff8236f000        1468K     ro                 GLB NX pte
0xffffffff8236f000-0xffffffff82400000         580K     RW                 GLB NX pte
0xffffffff82400000-0xffffffff89800000         116M     RW         PSE     GLB NX pmd

but after applying your patch it looks like it still broke the 2M mapping as
the remaining piece is RW.

If I do this:

static gate_desc idt_table[IDT_ENTRIES] __aligned(PMD_SIZE) __ro_after_init;

it still doesn't help:

[    0.197808] IDT table: 0xffffffff82800000

0xffffffff81e00000-0xffffffff82800000          10M     ro         PSE     GLB NX pmd
0xffffffff82800000-0xffffffff828a0000         640K     ro                 GLB NX pte
0xffffffff828a0000-0xffffffff82a00000        1408K     RW                 GLB NX pte
0xffffffff82a00000-0xffffffff89e00000         116M     RW         PSE     GLB NX pmd

because that trailing piece of the 2M page is still RW.

And who knows what else am I breaking when doing this:

[    2.368601] ------------[ cut here ]------------
[    2.389816] [CRTC:35:crtc-0] vblank wait timed out
[    2.396676] WARNING: drivers/gpu/drm/drm_atomic_helper.c:1920 at drm_atomic_helper_wait_for_vblanks.part.0+0x1ba/0x1e0, CPU#1: kworker/1:0/57
[    2.406715] Modules linked in:
[    2.408462] CPU: 1 UID: 0 PID: 57 Comm: kworker/1:0 Not tainted 6.19.0-rc6+ #4 PREEMPT(full)
...

I don't know, sacrificing a 2M page just for the idt_table and so that it
doesn't get splintered, not sure it is worth it.

Hmmm.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [RFC/RFT PATCH 01/19] x86/idt: Move idt_table to __ro_after_init section
  2026-01-22 13:08   ` Borislav Petkov
@ 2026-01-22 13:48     ` Ard Biesheuvel
  2026-01-22 13:58       ` Borislav Petkov
  0 siblings, 1 reply; 54+ messages in thread
From: Ard Biesheuvel @ 2026-01-22 13:48 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, Thomas Gleixner, Ingo Molnar, Dave Hansen,
	H. Peter Anvin, Josh Poimboeuf, Peter Zijlstra, Kees Cook,
	Uros Bizjak, Brian Gerst, linux-hardening

On Thu, 22 Jan 2026 at 14:09, Borislav Petkov <bp@alien8.de> wrote:
>
> On Thu, Jan 08, 2026 at 09:25:28AM +0000, Ard Biesheuvel wrote:
> > Currently, idt_table is allocated as page-aligned .bss, and remapped
> > read-only after init. This breaks a 2 MiB large page into 4k page
> > mappings, which defeats some of the effort done at boot to map the
> > kernel image using large pages, for improved TLB efficiency.
> >
> > Mark this allocation as __ro_after_init instead, so it will be made
> > read-only automatically after boot, without breaking up large page
> > mappings.
> >
> > This also fixes a latent bug on i386, where the size of idt_table is
> > less than a page, and so remapping it read-only could potentially affect
> > other read-write variables too, if those are not page-aligned as well.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> >  arch/x86/kernel/idt.c | 5 +----
> >  1 file changed, 1 insertion(+), 4 deletions(-)
> >
> > diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
> > index f445bec516a0..d6da25d7964f 100644
> > --- a/arch/x86/kernel/idt.c
> > +++ b/arch/x86/kernel/idt.c
> > @@ -170,7 +170,7 @@ static const __initconst struct idt_data apic_idts[] = {
> >  };
> >
> >  /* Must be page-aligned because the real IDT is used in the cpu entry area */
> > -static gate_desc idt_table[IDT_ENTRIES] __page_aligned_bss;
> > +static gate_desc idt_table[IDT_ENTRIES] __aligned(PAGE_SIZE) __ro_after_init;
> >
> >  static struct desc_ptr idt_descr __ro_after_init = {
> >       .size           = IDT_TABLE_SIZE - 1,
> > @@ -308,9 +308,6 @@ void __init idt_setup_apic_and_irq_gates(void)
> >       idt_map_in_cea();
> >       load_idt(&idt_descr);
> >
> > -     /* Make the IDT table read only */
> > -     set_memory_ro((unsigned long)&idt_table, 1);
> > -
> >       idt_setup_done = true;
> >  }
>
> Good idea, except my guest shows me something else:
>
> before:
>
> [    0.186281] IDT table: 0xffffffff89c7f000
>
> 0xffffffff89c00000-0xffffffff89c7f000         508K     RW                 GLB NX pte
> 0xffffffff89c7f000-0xffffffff89c80000           4K     ro                 GLB NX pte
> 0xffffffff89c80000-0xffffffff89e00000        1536K     RW                 GLB NX pte
> 0xffffffff89e00000-0xffffffff8be00000          32M     RW         PSE     GLB NX pmd
>
> This is clearly a single, 4K RO pageframe right in the middle of a splintered
> 2M page.
>
> after:
>
> [    0.180635] IDT table: 0xffffffff822cf000
>
> 0xffffffff81e00000-0xffffffff82200000           4M     ro         PSE     GLB NX pmd
> 0xffffffff82200000-0xffffffff8236f000        1468K     ro                 GLB NX pte
> 0xffffffff8236f000-0xffffffff82400000         580K     RW                 GLB NX pte
> 0xffffffff82400000-0xffffffff89800000         116M     RW         PSE     GLB NX pmd
>
> but after applying your patch it looks like it still broke the 2M mapping as
> the remaining piece is RW.
>

That is because the init region is between .data and .bss, which are
not 2M aligned, and so freeing/remapping the init region breaks the
hugepage mapping of the region between them. This is addressed in
patch #3.

* Re: [RFC/RFT PATCH 01/19] x86/idt: Move idt_table to __ro_after_init section
  2026-01-22 13:48     ` Ard Biesheuvel
@ 2026-01-22 13:58       ` Borislav Petkov
  2026-01-22 14:09         ` Ard Biesheuvel
  0 siblings, 1 reply; 54+ messages in thread
From: Borislav Petkov @ 2026-01-22 13:58 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-kernel, x86, Thomas Gleixner, Ingo Molnar, Dave Hansen,
	H. Peter Anvin, Josh Poimboeuf, Peter Zijlstra, Kees Cook,
	Uros Bizjak, Brian Gerst, linux-hardening

On Thu, Jan 22, 2026 at 02:48:14PM +0100, Ard Biesheuvel wrote:
> That is because the init region is between .data and .bss, which are
> not 2M aligned, and so freeing/remapping the init region breaks the
> hugepage mapping of the region between them. This is addressed in
> patch #3.

Ok, I'll continue looking, but this commit message should not mislead by
stating that it prevents the "breaking up large page mappings". I'll fix it up
if I end up picking this up, or please fix it up in your next revision.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: [RFC/RFT PATCH 01/19] x86/idt: Move idt_table to __ro_after_init section
  2026-01-22 13:58       ` Borislav Petkov
@ 2026-01-22 14:09         ` Ard Biesheuvel
  2026-01-22 14:16           ` Borislav Petkov
  0 siblings, 1 reply; 54+ messages in thread
From: Ard Biesheuvel @ 2026-01-22 14:09 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, Thomas Gleixner, Ingo Molnar, Dave Hansen,
	H. Peter Anvin, Josh Poimboeuf, Peter Zijlstra, Kees Cook,
	Uros Bizjak, Brian Gerst, linux-hardening

On Thu, 22 Jan 2026 at 14:58, Borislav Petkov <bp@alien8.de> wrote:
>
> On Thu, Jan 22, 2026 at 02:48:14PM +0100, Ard Biesheuvel wrote:
> > That is because the init region is between .data and .bss, which are
> > not 2M aligned, and so freeing/remapping the init region breaks the
> > hugepage mapping of the region between them. This is addressed in
> > patch #3.
>
> Ok, I'll continue looking, but this commit message should not mislead by
> stating that it prevents the "breaking up large page mappings".

It does. It just doesn't prevent it from happening for other reasons.

> I'll fix it up
> if I end up picking this up, or please fix it up in your next revision.
>

Ack.

* Re: [RFC/RFT PATCH 01/19] x86/idt: Move idt_table to __ro_after_init section
  2026-01-22 14:09         ` Ard Biesheuvel
@ 2026-01-22 14:16           ` Borislav Petkov
  2026-01-22 14:20             ` Ard Biesheuvel
  0 siblings, 1 reply; 54+ messages in thread
From: Borislav Petkov @ 2026-01-22 14:16 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-kernel, x86, Thomas Gleixner, Ingo Molnar, Dave Hansen,
	H. Peter Anvin, Josh Poimboeuf, Peter Zijlstra, Kees Cook,
	Uros Bizjak, Brian Gerst, linux-hardening

On Thu, Jan 22, 2026 at 03:09:19PM +0100, Ard Biesheuvel wrote:
> > It does. It just doesn't prevent it from happening for other reasons.

So this reads to me like "it does but it doesn't". Huh?!?

C'mon Ard, let's be more precise pls.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: [RFC/RFT PATCH 01/19] x86/idt: Move idt_table to __ro_after_init section
  2026-01-22 14:16           ` Borislav Petkov
@ 2026-01-22 14:20             ` Ard Biesheuvel
  2026-01-22 14:25               ` Borislav Petkov
  0 siblings, 1 reply; 54+ messages in thread
From: Ard Biesheuvel @ 2026-01-22 14:20 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, Thomas Gleixner, Ingo Molnar, Dave Hansen,
	H. Peter Anvin, Josh Poimboeuf, Peter Zijlstra, Kees Cook,
	Uros Bizjak, Brian Gerst, linux-hardening

On Thu, 22 Jan 2026 at 15:16, Borislav Petkov <bp@alien8.de> wrote:
>
> On Thu, Jan 22, 2026 at 03:09:19PM +0100, Ard Biesheuvel wrote:
> > It does. It just doesn't prevent it from happening for other reasons.
>
> So this reads to me like "it does but it doesn't". Huh?!?
>
> C'mon Ard, let's be more precise pls.
>

Fair enough. What I meant to say is that if your BSS is placed on a 2M
boundary or is sufficiently larger than that, some of it will be
mapped using PMDs, and this change will prevent those from being
broken up into page mappings. If your BSS is too small to be covered
by huge page mappings in the first place, this change obviously does
nothing (except fixing a bug on i386).

* Re: [RFC/RFT PATCH 01/19] x86/idt: Move idt_table to __ro_after_init section
  2026-01-22 14:20             ` Ard Biesheuvel
@ 2026-01-22 14:25               ` Borislav Petkov
  0 siblings, 0 replies; 54+ messages in thread
From: Borislav Petkov @ 2026-01-22 14:25 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-kernel, x86, Thomas Gleixner, Ingo Molnar, Dave Hansen,
	H. Peter Anvin, Josh Poimboeuf, Peter Zijlstra, Kees Cook,
	Uros Bizjak, Brian Gerst, linux-hardening

On Thu, Jan 22, 2026 at 03:20:39PM +0100, Ard Biesheuvel wrote:
> Fair enough. What I meant to say is that if your BSS is placed on a 2M
> boundary or is sufficiently larger than that, some of it will be
> mapped using PMDs, and this change will prevent those from being
> broken up into page mappings. If your BSS is too small to be covered
> by huge page mappings in the first place, this change obviously does
> nothing (except fixing a bug on i386).

That makes more sense. :)

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* [tip: x86/sev] x86/sev: Don't emit BSS_DECRYPTED section unless it is in use
  2026-01-08  9:25 ` [RFC/RFT PATCH 02/19] x86/sev: Don't emit BSS_DECRYPT section unless it is in use Ard Biesheuvel
@ 2026-01-31 14:09   ` tip-bot2 for Ard Biesheuvel
  0 siblings, 0 replies; 54+ messages in thread
From: tip-bot2 for Ard Biesheuvel @ 2026-01-31 14:09 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ard Biesheuvel, Borislav Petkov (AMD), x86, linux-kernel

The following commit has been merged into the x86/sev branch of tip:

Commit-ID:     8c89d3ad3095808ac130c535ad7ed3d1344d5986
Gitweb:        https://git.kernel.org/tip/8c89d3ad3095808ac130c535ad7ed3d1344d5986
Author:        Ard Biesheuvel <ardb@kernel.org>
AuthorDate:    Thu, 08 Jan 2026 09:25:29 
Committer:     Borislav Petkov (AMD) <bp@alien8.de>
CommitterDate: Sat, 31 Jan 2026 14:42:53 +01:00

x86/sev: Don't emit BSS_DECRYPTED section unless it is in use

The BSS_DECRYPTED section that gets emitted into .bss will be empty if
CONFIG_AMD_MEM_ENCRYPT is not defined. However, due to the fact that it
is injected into .bss rather than emitted as a separate section, the
2 MiB alignment that it specifies is still taken into account
unconditionally, pushing .bss out to the next 2 MiB boundary, leaving
a gap that is never freed.

So only emit a non-empty BSS_DECRYPTED section if it is going to be
used.  In that case, it would still be nice to free the padding, but
that is left for later.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20260108092526.28586-23-ardb@kernel.org
---
 arch/x86/kernel/vmlinux.lds.S | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index d7af4a6..3a24a3f 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -67,7 +67,18 @@ const_cpu_current_top_of_stack = cpu_current_top_of_stack;
 
 #define ALIGN_ENTRY_TEXT_BEGIN	. = ALIGN(PMD_SIZE);
 #define ALIGN_ENTRY_TEXT_END	. = ALIGN(PMD_SIZE);
+#else
+
+#define X86_ALIGN_RODATA_BEGIN
+#define X86_ALIGN_RODATA_END					\
+		. = ALIGN(PAGE_SIZE);				\
+		__end_rodata_aligned = .;
 
+#define ALIGN_ENTRY_TEXT_BEGIN
+#define ALIGN_ENTRY_TEXT_END
+#endif
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
 /*
  * This section contains data which will be mapped as decrypted. Memory
  * encryption operates on a page basis. Make this section PMD-aligned
@@ -88,17 +99,9 @@ const_cpu_current_top_of_stack = cpu_current_top_of_stack;
 	__pi___end_bss_decrypted = .;				\
 
 #else
-
-#define X86_ALIGN_RODATA_BEGIN
-#define X86_ALIGN_RODATA_END					\
-		. = ALIGN(PAGE_SIZE);				\
-		__end_rodata_aligned = .;
-
-#define ALIGN_ENTRY_TEXT_BEGIN
-#define ALIGN_ENTRY_TEXT_END
 #define BSS_DECRYPTED
-
 #endif
+
 #if defined(CONFIG_X86_64) && defined(CONFIG_KEXEC_CORE)
 #define KEXEC_RELOCATE_KERNEL					\
 	. = ALIGN(0x100);					\

* Re: [RFC/RFT PATCH 03/19] x86: Combine .data with .bss in kernel mapping
  2026-01-08  9:25 ` [RFC/RFT PATCH 03/19] x86: Combine .data with .bss in kernel mapping Ard Biesheuvel
@ 2026-03-06 19:07   ` Borislav Petkov
  2026-03-09 14:11     ` Ard Biesheuvel
  0 siblings, 1 reply; 54+ messages in thread
From: Borislav Petkov @ 2026-03-06 19:07 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-kernel, x86, Thomas Gleixner, Ingo Molnar, Dave Hansen,
	H. Peter Anvin, Josh Poimboeuf, Peter Zijlstra, Kees Cook,
	Uros Bizjak, Brian Gerst, linux-hardening

On Thu, Jan 08, 2026 at 09:25:30AM +0000, Ard Biesheuvel wrote:
> The primary mapping of the kernel image is made using huge pages where
> possible, mostly to minimize TLB pressure (Only the entry text section
> requires alignment to 2 MiB). This involves some rounding and padding of
> the .text and .rodata sections, resulting in gaps.  These gaps are
> smaller than a huge page, and are remapped using different permissions,
> resulting in fragmentation of the huge page mappings at the edges of
> those regions.
> 
> Similarly, there is a gap between .data and .bss, where the init text
> and data regions reside. This means that the end of the .data region and
> the start of the .bss region are not covered by huge page mappings
> either, even though both regions use the same permissions (RW+NX).
> 
> Improve the situation, by placing .data and .bss adjacently in the
> linker map, and putting the init text and data regions after .rodata,
> taking the place of the rodata/data gap. This results in one fewer gap,
> and a more efficient mapping of the .data and .bss regions.
> 
> To preserve the x86_64 ELF layout with PT_LOAD regions aligned to 2 MiB,
> start the second ELF segment at .init.data and align it to 2 MiB.  The
> resulting padding will be covered by the init region and will be freed
> along with it after boot.
> 
> defconfig + Clang 19:
> 
> Before:
> 
>   0xffffffff81000000-0xffffffff82200000    18M  ro  PSE  GLB x  pmd
>   0xffffffff82200000-0xffffffff8231c000  1136K  ro       GLB x  pte
>   0xffffffff8231c000-0xffffffff82400000   912K  RW       GLB NX pte
>   0xffffffff82400000-0xffffffff82a00000     6M  ro  PSE  GLB NX pmd
>   0xffffffff82a00000-0xffffffff82b40000  1280K  ro       GLB NX pte
>   0xffffffff82b40000-0xffffffff82c00000   768K  RW       GLB NX pte
>   0xffffffff82c00000-0xffffffff83400000     8M  RW  PSE  GLB NX pmd
>   0xffffffff83400000-0xffffffff83800000     4M  RW       GLB NX pte
> 
> After:
> 
>   0xffffffff81000000-0xffffffff82200000    18M  ro  PSE  GLB x  pmd
>   0xffffffff82200000-0xffffffff8231c000  1136K  ro       GLB x  pte
>   0xffffffff8231c000-0xffffffff82400000   912K  RW       GLB NX pte
>   0xffffffff82400000-0xffffffff82a00000     6M  ro  PSE  GLB NX pmd
>   0xffffffff82a00000-0xffffffff82b40000  1280K  ro       GLB NX pte
>   0xffffffff82b40000-0xffffffff82c00000   768K  RW       GLB NX pte
>   0xffffffff82c00000-0xffffffff82e00000     2M  RW  PSE  GLB NX pmd
>   0xffffffff82e00000-0xffffffff83000000     2M  RW       GLB NX pte
>   0xffffffff83000000-0xffffffff83800000     8M  RW  PSE  GLB NX pmd
> 
> With the gaps removed/unmapped (pti=on)
> 
> Before:
> 
>   0xffffffff81000000-0xffffffff81200000     2M  ro  PSE  GLB x  pmd
>   0xffffffff81200000-0xffffffff82200000    16M  ro  PSE      x  pmd
>   0xffffffff82200000-0xffffffff8231c000  1136K  ro           x  pte
>   0xffffffff8231c000-0xffffffff82400000   912K                  pte
>   0xffffffff82400000-0xffffffff82a00000     6M  ro  PSE      NX pmd
>   0xffffffff82a00000-0xffffffff82b40000  1280K  ro           NX pte
>   0xffffffff82b40000-0xffffffff82c00000   768K                  pte
>   0xffffffff82c00000-0xffffffff83400000     8M  RW  PSE      NX pmd
>   0xffffffff83400000-0xffffffff8342a000   168K  RW           NX pte
>   0xffffffff8342a000-0xffffffff836f3000  2852K                  pte
>   0xffffffff836f3000-0xffffffff83800000  1076K  RW           NX pte
> 
> After:
> 
>   0xffffffff81000000-0xffffffff81200000     2M  ro  PSE  GLB x  pmd
>   0xffffffff81200000-0xffffffff82200000    16M  ro  PSE      x  pmd
>   0xffffffff82200000-0xffffffff8231c000  1136K  ro           x  pte
>   0xffffffff8231c000-0xffffffff82400000   912K                  pte
>   0xffffffff82400000-0xffffffff82a00000     6M  ro  PSE      NX pmd
>   0xffffffff82a00000-0xffffffff82b40000  1280K  ro           NX pte
>   0xffffffff82b40000-0xffffffff82e3d000  3060K                  pte
>   0xffffffff82e3d000-0xffffffff83000000  1804K  RW           NX pte
>   0xffffffff83000000-0xffffffff83800000     8M  RW  PSE      NX pmd
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/x86/kernel/vmlinux.lds.S | 91 +++++++++++---------
>  arch/x86/mm/init_64.c         |  5 +-
>  arch/x86/mm/pat/set_memory.c  |  2 +-
>  3 files changed, 52 insertions(+), 46 deletions(-)

I guess we could do this - I don't see why not... we'll have to take it for
a longer spin tho.

> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index 3a24a3fc55f5..1dee2987c42b 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -61,12 +61,15 @@ const_cpu_current_top_of_stack = cpu_current_top_of_stack;
>  #define X86_ALIGN_RODATA_BEGIN	. = ALIGN(HPAGE_SIZE);
>  
>  #define X86_ALIGN_RODATA_END					\
> -		. = ALIGN(HPAGE_SIZE);				\
> -		__end_rodata_hpage_align = .;			\

$ git grep __end_rodata_hpage_align
arch/x86/include/asm/sections.h:13:extern char __end_rodata_hpage_align[];
arch/x86/tools/relocs.c:93:     "__end_rodata_hpage_align|"

I guess you wanna remove those too and say that that marker is unused. Better
yet do that in a pre-patch.

> -		__end_rodata_aligned = .;
> +		. = ALIGN(PAGE_SIZE);				\
> +		__end_rodata_aligned = ALIGN(HPAGE_SIZE);
>  
>  #define ALIGN_ENTRY_TEXT_BEGIN	. = ALIGN(PMD_SIZE);
>  #define ALIGN_ENTRY_TEXT_END	. = ALIGN(PMD_SIZE);
> +
> +#define DATA_SEGMENT_START					\
> +	. = ALIGN(HPAGE_SIZE);					\
> +	__data_segment_start = .;
>  #else
>  
>  #define X86_ALIGN_RODATA_BEGIN

...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: [RFC/RFT PATCH 03/19] x86: Combine .data with .bss in kernel mapping
  2026-03-06 19:07   ` Borislav Petkov
@ 2026-03-09 14:11     ` Ard Biesheuvel
  0 siblings, 0 replies; 54+ messages in thread
From: Ard Biesheuvel @ 2026-03-09 14:11 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, Thomas Gleixner, Ingo Molnar, Dave Hansen,
	H . Peter Anvin, Josh Poimboeuf, Peter Zijlstra, Kees Cook,
	Uros Bizjak, Brian Gerst, linux-hardening



On Fri, 6 Mar 2026, at 20:07, Borislav Petkov wrote:
> On Thu, Jan 08, 2026 at 09:25:30AM +0000, Ard Biesheuvel wrote:
>> The primary mapping of the kernel image is made using huge pages where
>> possible, mostly to minimize TLB pressure (Only the entry text section
>> requires alignment to 2 MiB). This involves some rounding and padding of
>> the .text and .rodata sections, resulting in gaps.  These gaps are
>> smaller than a huge page, and are remapped using different permissions,
>> resulting in fragmentation of the huge page mappings at the edges of
>> those regions.
>> 
>> Similarly, there is a gap between .data and .bss, where the init text
>> and data regions reside. This means that the end of the .data region and
>> the start of the .bss region are not covered by huge page mappings
>> either, even though both regions use the same permissions (RW+NX).
>> 
>> Improve the situation, by placing .data and .bss adjacently in the
>> linker map, and putting the init text and data regions after .rodata,
>> taking the place of the rodata/data gap. This results in one fewer gap,
>> and a more efficient mapping of the .data and .bss regions.
>> 
>> To preserve the x86_64 ELF layout with PT_LOAD regions aligned to 2 MiB,
>> start the second ELF segment at .init.data and align it to 2 MiB.  The
>> resulting padding will be covered by the init region and will be freed
>> along with it after boot.
>> 
>> defconfig + Clang 19:
>> 
>> Before:
>> 
>>   0xffffffff81000000-0xffffffff82200000    18M  ro  PSE  GLB x  pmd
>>   0xffffffff82200000-0xffffffff8231c000  1136K  ro       GLB x  pte
>>   0xffffffff8231c000-0xffffffff82400000   912K  RW       GLB NX pte
>>   0xffffffff82400000-0xffffffff82a00000     6M  ro  PSE  GLB NX pmd
>>   0xffffffff82a00000-0xffffffff82b40000  1280K  ro       GLB NX pte
>>   0xffffffff82b40000-0xffffffff82c00000   768K  RW       GLB NX pte
>>   0xffffffff82c00000-0xffffffff83400000     8M  RW  PSE  GLB NX pmd
>>   0xffffffff83400000-0xffffffff83800000     4M  RW       GLB NX pte
>> 
>> After:
>> 
>>   0xffffffff81000000-0xffffffff82200000    18M  ro  PSE  GLB x  pmd
>>   0xffffffff82200000-0xffffffff8231c000  1136K  ro       GLB x  pte
>>   0xffffffff8231c000-0xffffffff82400000   912K  RW       GLB NX pte
>>   0xffffffff82400000-0xffffffff82a00000     6M  ro  PSE  GLB NX pmd
>>   0xffffffff82a00000-0xffffffff82b40000  1280K  ro       GLB NX pte
>>   0xffffffff82b40000-0xffffffff82c00000   768K  RW       GLB NX pte
>>   0xffffffff82c00000-0xffffffff82e00000     2M  RW  PSE  GLB NX pmd
>>   0xffffffff82e00000-0xffffffff83000000     2M  RW       GLB NX pte
>>   0xffffffff83000000-0xffffffff83800000     8M  RW  PSE  GLB NX pmd
>> 
>> With the gaps removed/unmapped (pti=on)
>> 
>> Before:
>> 
>>   0xffffffff81000000-0xffffffff81200000     2M  ro  PSE  GLB x  pmd
>>   0xffffffff81200000-0xffffffff82200000    16M  ro  PSE      x  pmd
>>   0xffffffff82200000-0xffffffff8231c000  1136K  ro           x  pte
>>   0xffffffff8231c000-0xffffffff82400000   912K                  pte
>>   0xffffffff82400000-0xffffffff82a00000     6M  ro  PSE      NX pmd
>>   0xffffffff82a00000-0xffffffff82b40000  1280K  ro           NX pte
>>   0xffffffff82b40000-0xffffffff82c00000   768K                  pte
>>   0xffffffff82c00000-0xffffffff83400000     8M  RW  PSE      NX pmd
>>   0xffffffff83400000-0xffffffff8342a000   168K  RW           NX pte
>>   0xffffffff8342a000-0xffffffff836f3000  2852K                  pte
>>   0xffffffff836f3000-0xffffffff83800000  1076K  RW           NX pte
>> 
>> After:
>> 
>>   0xffffffff81000000-0xffffffff81200000     2M  ro  PSE  GLB x  pmd
>>   0xffffffff81200000-0xffffffff82200000    16M  ro  PSE      x  pmd
>>   0xffffffff82200000-0xffffffff8231c000  1136K  ro           x  pte
>>   0xffffffff8231c000-0xffffffff82400000   912K                  pte
>>   0xffffffff82400000-0xffffffff82a00000     6M  ro  PSE      NX pmd
>>   0xffffffff82a00000-0xffffffff82b40000  1280K  ro           NX pte
>>   0xffffffff82b40000-0xffffffff82e3d000  3060K                  pte
>>   0xffffffff82e3d000-0xffffffff83000000  1804K  RW           NX pte
>>   0xffffffff83000000-0xffffffff83800000     8M  RW  PSE      NX pmd
>> 
>> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
>> ---
>>  arch/x86/kernel/vmlinux.lds.S | 91 +++++++++++---------
>>  arch/x86/mm/init_64.c         |  5 +-
>>  arch/x86/mm/pat/set_memory.c  |  2 +-
>>  3 files changed, 52 insertions(+), 46 deletions(-)
>
> I guess we could do this - I don't see why not... we'll have to take it for
> a longer spin tho.
>
>> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
>> index 3a24a3fc55f5..1dee2987c42b 100644
>> --- a/arch/x86/kernel/vmlinux.lds.S
>> +++ b/arch/x86/kernel/vmlinux.lds.S
>> @@ -61,12 +61,15 @@ const_cpu_current_top_of_stack = cpu_current_top_of_stack;
>>  #define X86_ALIGN_RODATA_BEGIN	. = ALIGN(HPAGE_SIZE);
>>  
>>  #define X86_ALIGN_RODATA_END					\
>> -		. = ALIGN(HPAGE_SIZE);				\
>> -		__end_rodata_hpage_align = .;			\
>
> $ git grep __end_rodata_hpage_align
> arch/x86/include/asm/sections.h:13:extern char __end_rodata_hpage_align[];
> arch/x86/tools/relocs.c:93:     "__end_rodata_hpage_align|"
>
> I guess you wanna remove those too and say that that marker is unused. Better
> yet do that in a pre-patch.
>

Indeed. When __end_rodata_hpage_align exists, it is always equal to
__end_rodata_aligned, so it can just be dropped entirely.


end of thread, other threads:[~2026-03-09 14:11 UTC | newest]

Thread overview: 54+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-01-08  9:25 [RFC/RFT PATCH 00/19] Link the relocatable x86 kernel as PIE Ard Biesheuvel
2026-01-08  9:25 ` [RFC/RFT PATCH 01/19] x86/idt: Move idt_table to __ro_after_init section Ard Biesheuvel
2026-01-22 13:08   ` Borislav Petkov
2026-01-22 13:48     ` Ard Biesheuvel
2026-01-22 13:58       ` Borislav Petkov
2026-01-22 14:09         ` Ard Biesheuvel
2026-01-22 14:16           ` Borislav Petkov
2026-01-22 14:20             ` Ard Biesheuvel
2026-01-22 14:25               ` Borislav Petkov
2026-01-08  9:25 ` [RFC/RFT PATCH 02/19] x86/sev: Don't emit BSS_DECRYPT section unless it is in use Ard Biesheuvel
2026-01-31 14:09   ` [tip: x86/sev] x86/sev: Don't emit BSS_DECRYPTED " tip-bot2 for Ard Biesheuvel
2026-01-08  9:25 ` [RFC/RFT PATCH 03/19] x86: Combine .data with .bss in kernel mapping Ard Biesheuvel
2026-03-06 19:07   ` Borislav Petkov
2026-03-09 14:11     ` Ard Biesheuvel
2026-01-08  9:25 ` [RFC/RFT PATCH 04/19] x86: Make the 64-bit bzImage always physically relocatable Ard Biesheuvel
2026-01-12  4:01   ` H. Peter Anvin
2026-01-12 10:47     ` David Laight
2026-01-12 12:06       ` H. Peter Anvin
2026-01-08  9:25 ` [RFC/RFT PATCH 05/19] x86/efistub: Simplify early remapping of kernel text Ard Biesheuvel
2026-01-08  9:25 ` [RFC/RFT PATCH 06/19] alloc_tag: Use __ prefixed ELF section names Ard Biesheuvel
2026-01-08  9:25 ` [RFC/RFT PATCH 07/19] tools/objtool: Treat indirect ftrace calls as direct calls Ard Biesheuvel
2026-01-08  9:25 ` [RFC/RFT PATCH 08/19] x86: Use PIE codegen for the relocatable 64-bit kernel Ard Biesheuvel
2026-01-09 21:34   ` Jan Engelhardt
2026-01-09 22:07     ` Ard Biesheuvel
2026-01-08  9:25 ` [RFC/RFT PATCH 09/19] x86/pm-trace: Use RIP-relative accesses for .tracedata Ard Biesheuvel
2026-01-08  9:25 ` [RFC/RFT PATCH 10/19] x86/kvm: Use RIP-relative addressing Ard Biesheuvel
2026-01-20 17:04   ` Sean Christopherson
2026-01-20 19:43     ` David Laight
2026-01-20 20:54       ` Ard Biesheuvel
2026-01-20 22:00         ` David Laight
2026-01-08  9:25 ` [RFC/RFT PATCH 11/19] x86/rethook: Use RIP-relative reference for fake return address Ard Biesheuvel
2026-01-08 12:08   ` David Laight
2026-01-08 12:10     ` Ard Biesheuvel
2026-01-08 12:19       ` Ard Biesheuvel
2026-01-08  9:25 ` [RFC/RFT PATCH 12/19] x86/sync_core: Use RIP-relative addressing Ard Biesheuvel
2026-01-08  9:25 ` [RFC/RFT PATCH 13/19] x86/entry_64: " Ard Biesheuvel
2026-01-08  9:25 ` [RFC/RFT PATCH 14/19] x86/hibernate: Prefer RIP-relative accesses Ard Biesheuvel
2026-01-08  9:25 ` [RFC/RFT PATCH 15/19] x64/acpi: Use PIC-compatible references in wakeup_64.S Ard Biesheuvel
2026-01-09  5:01   ` Brian Gerst
2026-01-09  7:59     ` Ard Biesheuvel
2026-01-09 11:46       ` Brian Gerst
2026-01-09 12:09         ` Ard Biesheuvel
2026-01-09 12:10           ` Ard Biesheuvel
2026-01-09 12:51             ` Brian Gerst
2026-01-08  9:25 ` [RFC/RFT PATCH 16/19] x86/kexec: Use 64-bit wide absolute reference from relocated code Ard Biesheuvel
2026-01-08  9:25 ` [RFC/RFT PATCH 17/19] x86/head64: Avoid absolute references in startup asm Ard Biesheuvel
2026-01-08  9:25 ` [RFC/RFT PATCH 18/19] x86/boot: Implement support for RELA/RELR/REL runtime relocations Ard Biesheuvel
2026-01-08  9:25 ` [RFC/RFT PATCH 19/19] x86/kernel: Switch to PIE linking for the relocatable kernel Ard Biesheuvel
2026-01-08 16:35 ` [RFC/RFT PATCH 00/19] Link the relocatable x86 kernel as PIE Alexander Lobakin
2026-01-09  0:36 ` H. Peter Anvin
2026-01-09  9:21   ` Ard Biesheuvel
2026-01-14 18:16     ` Kees Cook
2026-01-20 20:45       ` H. Peter Anvin
2026-01-21  8:56         ` Ard Biesheuvel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox