[RFC PATCH 0/2] Fix more head.text bugs and improve diagnostic

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [RFC PATCH 0/2] Fix more head.text bugs and improve diagnostic
@ 2025-01-27 11:43 Ard Biesheuvel
  2025-01-27 11:43 ` [RFC PATCH 1/2] x86/relocs: Improve diagnostic for rejected absolute references Ard Biesheuvel
  2025-01-27 11:43 ` [RFC PATCH 2/2] x86/sev: Disable jump tables in SEV startup code Ard Biesheuvel
  0 siblings, 2 replies; 15+ messages in thread
From: Ard Biesheuvel @ 2025-01-27 11:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Ard Biesheuvel, Ingo Molnar, Linus Torvalds, Tom Lendacky,
	Nathan Chancellor

From: Ard Biesheuvel <ardb@kernel.org>

Linus reports [0] a build issue when using clang, using a config that
presumably has IBT and retpolines disabled.

In this case, the following diagnostic is emitted.

  Absolute reference to symbol '.rodata' not permitted in .head.text

As Linus explains, this message is not very helpful to the unsuspecting
developer, as it does not really pin down the culprit. So improve this
by adding the addend and the offset to the error message, and demote it
to a warning so the build can complete. This also works around the
unhelpful deleting of vmlinux by make when this condition triggers.

Then, fix the actual issue being reported here, which is a true positive
and indicates that the resulting build is fatally broken on SEV-SNP.

[0] https://lore.kernel.org/all/CAHk-=wj7k9nvJn6cpa3-5Ciwn2RGyE605BMkjWE4MqnvC9E92A@mail.gmail.com/

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Nathan Chancellor <nathan@kernel.org>

Ard Biesheuvel (2):
  x86/relocs: Improve diagnostic for rejected absolute references
  x86/sev: Disable jump tables in SEV startup code

 arch/x86/coco/sev/Makefile | 4 ++++
 arch/x86/tools/relocs.c    | 7 ++++---
 2 files changed, 8 insertions(+), 3 deletions(-)

-- 
2.48.1.262.g85cc9f2d1e-goog

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [RFC PATCH 1/2] x86/relocs: Improve diagnostic for rejected absolute references
  2025-01-27 11:43 [RFC PATCH 0/2] Fix more head.text bugs and improve diagnostic Ard Biesheuvel
@ 2025-01-27 11:43 ` Ard Biesheuvel
  2025-01-27 16:57   ` Linus Torvalds
  2025-01-27 11:43 ` [RFC PATCH 2/2] x86/sev: Disable jump tables in SEV startup code Ard Biesheuvel
  1 sibling, 1 reply; 15+ messages in thread
From: Ard Biesheuvel @ 2025-01-27 11:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Ard Biesheuvel, Ingo Molnar, Linus Torvalds, Tom Lendacky,
	Nathan Chancellor

From: Ard Biesheuvel <ardb@kernel.org>

Compiler emitted absolute references are often section-relative, as the
objects in question sometimes don't even exist in the C code (e.g., jump
tables) or have static linkage.

Enhance the diagnostic that is printed when detecting absolute
references in .head.text, but printing the addend of the symbol
reference, and the location in vmlinux where the reference can be found.

So instead of printing

  Absolute reference to symbol '.rodata' detected not permitted in .head.text

and failing the build, print the below but only as a warning.

  Absolute reference to symbol '.rodata+0x180' detected in .head.text (0xffffffff820cb4ba).
  This kernel may might not boot.

Not failing the build also works around the issue that the file vmlinux
will be deleted by make when an error occurs, which is not very helpful
in trying to narrow down the problem.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/tools/relocs.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index e937be979ec8..134cf5cfe7bd 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -901,9 +901,10 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 		}

 		if (headtext) {
-			die("Absolute reference to symbol '%s' not permitted in .head.text\n",
-			    symname);
-			break;
+			fprintf(stderr,
+				"Absolute reference to symbol '%s+0x%lx' detected in .head.text (0x%lx).\n"
+				"This kernel might not boot.\n",
+				symname, rel->r_addend, offset);
 		}

 		/*
-- 
2.48.1.262.g85cc9f2d1e-goog

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH 2/2] x86/sev: Disable jump tables in SEV startup code
  2025-01-27 11:43 [RFC PATCH 0/2] Fix more head.text bugs and improve diagnostic Ard Biesheuvel
  2025-01-27 11:43 ` [RFC PATCH 1/2] x86/relocs: Improve diagnostic for rejected absolute references Ard Biesheuvel
@ 2025-01-27 11:43 ` Ard Biesheuvel
  2025-01-27 17:09   ` Linus Torvalds
  2025-01-28 22:22   ` [tip: x86/urgent] " tip-bot2 for Ard Biesheuvel
  1 sibling, 2 replies; 15+ messages in thread
From: Ard Biesheuvel @ 2025-01-27 11:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Ard Biesheuvel, Ingo Molnar, Linus Torvalds, Tom Lendacky,
	Nathan Chancellor

From: Ard Biesheuvel <ardb@kernel.org>

When retpolines and IBT are both disabled, the compiler is free to use
jump tables to optimize switch instructions. However, these are emitted
by Clang as absolute references into .rodata, e.g.,

        jmp    *-0x7dfffe90(,%r9,8)
                        R_X86_64_32S    .rodata+0x170

Given that this code will execute before that address in .rodata has even
been mapped, it is guaranteed to crash a SEV-SNP guest in a way that is
difficult to diagnose.

So disable jump tables when building this code. It would be better if we
could attach this annotation to the __head macro but this appears to be
impossible.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/coco/sev/Makefile | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/coco/sev/Makefile b/arch/x86/coco/sev/Makefile
index 08de37559307..dcb06dc8b5ae 100644
--- a/arch/x86/coco/sev/Makefile
+++ b/arch/x86/coco/sev/Makefile
@@ -2,6 +2,10 @@
 
 obj-y += core.o
 
+# jump tables are emitted using absolute references in non-PIC code
+# so they cannot be used in the early SEV startup code
+CFLAGS_core.o += -fno-jump-tables
+
 ifdef CONFIG_FUNCTION_TRACER
 CFLAGS_REMOVE_core.o = -pg
 endif
-- 
2.48.1.262.g85cc9f2d1e-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 1/2] x86/relocs: Improve diagnostic for rejected absolute references
  2025-01-27 11:43 ` [RFC PATCH 1/2] x86/relocs: Improve diagnostic for rejected absolute references Ard Biesheuvel
@ 2025-01-27 16:57   ` Linus Torvalds
  2025-01-27 22:01     ` Ard Biesheuvel
  0 siblings, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2025-01-27 16:57 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-kernel, x86, Ard Biesheuvel, Ingo Molnar, Tom Lendacky,
	Nathan Chancellor

On Mon, 27 Jan 2025 at 03:43, Ard Biesheuvel <ardb+git@google.com> wrote:
>
>   Absolute reference to symbol '.rodata+0x180' detected in .head.text (0xffffffff820cb4ba).

Do we have any symbol name lookup logic anywhere?

Because it would be much more helpful if it also printed out the
source symbol and target symbol name, not just the offsets in the
section..

Yes, with the offsets - and the lack of fatal error, so that the
vmlinux file stays around - you can now much more easily do some gdb
or objdump magic to then pinpoint what's up, but it still ends up
being very much a "you _really_ need to understand what's up" kind of
thing.

                  Linus

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 2/2] x86/sev: Disable jump tables in SEV startup code
  2025-01-27 11:43 ` [RFC PATCH 2/2] x86/sev: Disable jump tables in SEV startup code Ard Biesheuvel
@ 2025-01-27 17:09   ` Linus Torvalds
  2025-01-27 22:15     ` Ard Biesheuvel
  2025-01-28 22:22   ` [tip: x86/urgent] " tip-bot2 for Ard Biesheuvel
  1 sibling, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2025-01-27 17:09 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-kernel, x86, Ard Biesheuvel, Ingo Molnar, Tom Lendacky,
	Nathan Chancellor

On Mon, 27 Jan 2025 at 03:43, Ard Biesheuvel <ardb+git@google.com> wrote:
>
> +# jump tables are emitted using absolute references in non-PIC code
> +# so they cannot be used in the early SEV startup code
> +CFLAGS_core.o += -fno-jump-tables

I can confirm that this looks like it fixes the problem for my
particular config.

Is the SEV code the only thing that needs this? And on the other hand,
isn't it just a _part_ of core.c that needs this? Maybe the "_head"
parts should be in a separate file?

I'm looking at (for example) arch/x86/mm/mem_encrypt_identity.c and
arch/x86/kernel/head64.c, and it looks like it really should have that
-fno-jump-tables thing too.

It just randomly may not have any switch tables or other things that
makes the compiler generate that code pattern.

(I did this just by a grep, maybe there's something else protecting
those files that I just didn't see)

           Linus

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 1/2] x86/relocs: Improve diagnostic for rejected absolute references
  2025-01-27 16:57   ` Linus Torvalds
@ 2025-01-27 22:01     ` Ard Biesheuvel
  2025-02-03  9:40       ` Ingo Molnar
  0 siblings, 1 reply; 15+ messages in thread
From: Ard Biesheuvel @ 2025-01-27 22:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ard Biesheuvel, linux-kernel, x86, Ingo Molnar, Tom Lendacky,
	Nathan Chancellor

On Mon, 27 Jan 2025 at 17:57, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Mon, 27 Jan 2025 at 03:43, Ard Biesheuvel <ardb+git@google.com> wrote:
> >
> >   Absolute reference to symbol '.rodata+0x180' detected in .head.text (0xffffffff820cb4ba).
>
> Do we have any symbol name lookup logic anywhere?
>

I can look into that. In this particular case, though, there is no
symbol to look up as it is a anonymous jump table generated by the
compiler. And the function name would be inaccurate too, as
snp_cpuid_postprocess() got inlined into its caller. But I guess with
the right DWARF data, at least the call site could be narrowed down a
bit better.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 2/2] x86/sev: Disable jump tables in SEV startup code
  2025-01-27 17:09   ` Linus Torvalds
@ 2025-01-27 22:15     ` Ard Biesheuvel
  2025-01-27 22:28       ` Ard Biesheuvel
  0 siblings, 1 reply; 15+ messages in thread
From: Ard Biesheuvel @ 2025-01-27 22:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ard Biesheuvel, linux-kernel, x86, Ingo Molnar, Tom Lendacky,
	Nathan Chancellor

On Mon, 27 Jan 2025 at 18:10, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Mon, 27 Jan 2025 at 03:43, Ard Biesheuvel <ardb+git@google.com> wrote:
> >
> > +# jump tables are emitted using absolute references in non-PIC code
> > +# so they cannot be used in the early SEV startup code
> > +CFLAGS_core.o += -fno-jump-tables
>
> I can confirm that this looks like it fixes the problem for my
> particular config.
>

Great.

> Is the SEV code the only thing that needs this? And on the other hand,
> isn't it just a _part_ of core.c that needs this? Maybe the "_head"
> parts should be in a separate file?
>
> I'm looking at (for example) arch/x86/mm/mem_encrypt_identity.c and
> arch/x86/kernel/head64.c, and it looks like it really should have that
> -fno-jump-tables thing too.
>
> It just randomly may not have any switch tables or other things that
> makes the compiler generate that code pattern.
>

Yes. This is essentially whack-a-mole. And note that placing functions
in .head.text is a manual process, and so there may be other code that
is reachable via early code paths that remain unchecked.

The right way to go about this would be to duplicate what I did for
arm64 in arch/arm64/kernel/pi/, i.e., to create a special set of
source files that can be called into, but cannot call out to other
code unless it is part of the same pi/ corpus. These files are built
with an entirely different set of compiler options, -fPIC being one of
the notable ones. The resulting objects are checked for absolute
relocations in a similar fashion, and are therefore guaranteed to be
truly position independent (hence the 'pi') without any need for
runtime fixups. (Note that these objects are still linked into vmlinux
as usual, but due to the symbol name prefixes added using objcopy,
they are part of an isolated namespace)

The use case is also quite similar to the x86 one, as a matter of
fact: the initial mapping of the arm64 kernel is also 1:1, and some
non-trivial work is needed before the kernel's virtual mapping can be
created, and doing all of that in C was becoming intractable.

If this seems like a sensible approach for x86 as well, I can do some
exploration into the feasibility (keeping in mind that a lot of this
code is already being shared with the decompressor and the EFI stub
too, so it might take some effort to untangle)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 2/2] x86/sev: Disable jump tables in SEV startup code
  2025-01-27 22:15     ` Ard Biesheuvel
@ 2025-01-27 22:28       ` Ard Biesheuvel
  0 siblings, 0 replies; 15+ messages in thread
From: Ard Biesheuvel @ 2025-01-27 22:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ard Biesheuvel, linux-kernel, x86, Ingo Molnar, Tom Lendacky,
	Nathan Chancellor

On Mon, 27 Jan 2025 at 23:15, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Mon, 27 Jan 2025 at 18:10, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > On Mon, 27 Jan 2025 at 03:43, Ard Biesheuvel <ardb+git@google.com> wrote:
> > >
> > > +# jump tables are emitted using absolute references in non-PIC code
> > > +# so they cannot be used in the early SEV startup code
> > > +CFLAGS_core.o += -fno-jump-tables
> >
> > I can confirm that this looks like it fixes the problem for my
> > particular config.
> >
>
> Great.
>
> > Is the SEV code the only thing that needs this? And on the other hand,
> > isn't it just a _part_ of core.c that needs this? Maybe the "_head"
> > parts should be in a separate file?
> >
> > I'm looking at (for example) arch/x86/mm/mem_encrypt_identity.c and
> > arch/x86/kernel/head64.c, and it looks like it really should have that
> > -fno-jump-tables thing too.
> >
> > It just randomly may not have any switch tables or other things that
> > makes the compiler generate that code pattern.
> >
>
...
> The use case is also quite similar to the x86 one, as a matter of
> fact: the initial mapping of the arm64 kernel is also 1:1, and some
> non-trivial work is needed before the kernel's virtual mapping can be
> created, and doing all of that in C was becoming intractable.
>

Doing it in *assembler* was becoming intractable, in case that wasn't clear.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [tip: x86/urgent] x86/sev: Disable jump tables in SEV startup code
  2025-01-27 11:43 ` [RFC PATCH 2/2] x86/sev: Disable jump tables in SEV startup code Ard Biesheuvel
  2025-01-27 17:09   ` Linus Torvalds
@ 2025-01-28 22:22   ` tip-bot2 for Ard Biesheuvel
  1 sibling, 0 replies; 15+ messages in thread
From: tip-bot2 for Ard Biesheuvel @ 2025-01-28 22:22 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Linus Torvalds, Ard Biesheuvel, Ingo Molnar, x86, linux-kernel

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     1105ab42a84bc11c62597005f78ccad2434fbd66
Gitweb:        https://git.kernel.org/tip/1105ab42a84bc11c62597005f78ccad2434fbd66
Author:        Ard Biesheuvel <ardb@kernel.org>
AuthorDate:    Mon, 27 Jan 2025 12:43:37 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 28 Jan 2025 23:10:29 +01:00

x86/sev: Disable jump tables in SEV startup code

When retpolines and IBT are both disabled, the compiler is free to use
jump tables to optimize switch instructions. However, these are emitted
by Clang as absolute references into .rodata:

        jmp    *-0x7dfffe90(,%r9,8)
                        R_X86_64_32S    .rodata+0x170

Given that this code will execute before that address in .rodata has even
been mapped, it is guaranteed to crash a SEV-SNP guest in a way that is
difficult to diagnose.

So disable jump tables when building this code. It would be better if we
could attach this annotation to the __head macro but this appears to be
impossible.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250127114334.1045857-6-ardb+git@google.com
---
 arch/x86/coco/sev/Makefile | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/coco/sev/Makefile b/arch/x86/coco/sev/Makefile
index 08de375..dcb06dc 100644
--- a/arch/x86/coco/sev/Makefile
+++ b/arch/x86/coco/sev/Makefile
@@ -2,6 +2,10 @@
 
 obj-y += core.o
 
+# jump tables are emitted using absolute references in non-PIC code
+# so they cannot be used in the early SEV startup code
+CFLAGS_core.o += -fno-jump-tables
+
 ifdef CONFIG_FUNCTION_TRACER
 CFLAGS_REMOVE_core.o = -pg
 endif

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 1/2] x86/relocs: Improve diagnostic for rejected absolute references
  2025-01-27 22:01     ` Ard Biesheuvel
@ 2025-02-03  9:40       ` Ingo Molnar
  2025-02-03 10:00         ` Ard Biesheuvel
  0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2025-02-03  9:40 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Linus Torvalds, Ard Biesheuvel, linux-kernel, x86, Tom Lendacky,
	Nathan Chancellor


* Ard Biesheuvel <ardb@kernel.org> wrote:

> On Mon, 27 Jan 2025 at 17:57, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > On Mon, 27 Jan 2025 at 03:43, Ard Biesheuvel <ardb+git@google.com> wrote:
> > >
> > >   Absolute reference to symbol '.rodata+0x180' detected in .head.text (0xffffffff820cb4ba).
> >
> > Do we have any symbol name lookup logic anywhere?
> >
> 
> I can look into that. In this particular case, though, there is no
> symbol to look up as it is a anonymous jump table generated by the
> compiler. And the function name would be inaccurate too, as
> snp_cpuid_postprocess() got inlined into its caller. But I guess with
> the right DWARF data, at least the call site could be narrowed down a
> bit better.

So patch #2 is now upstream, but should I apply this diagnostic patch 
as-is, or will there be a -v2?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 1/2] x86/relocs: Improve diagnostic for rejected absolute references
  2025-02-03  9:40       ` Ingo Molnar
@ 2025-02-03 10:00         ` Ard Biesheuvel
  2025-02-22 12:01           ` Ingo Molnar
  0 siblings, 1 reply; 15+ messages in thread
From: Ard Biesheuvel @ 2025-02-03 10:00 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Ard Biesheuvel, linux-kernel, x86, Tom Lendacky,
	Nathan Chancellor

On Mon, 3 Feb 2025 at 10:40, Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Ard Biesheuvel <ardb@kernel.org> wrote:
>
> > On Mon, 27 Jan 2025 at 17:57, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > On Mon, 27 Jan 2025 at 03:43, Ard Biesheuvel <ardb+git@google.com> wrote:
> > > >
> > > >   Absolute reference to symbol '.rodata+0x180' detected in .head.text (0xffffffff820cb4ba).
> > >
> > > Do we have any symbol name lookup logic anywhere?
> > >
> >
> > I can look into that. In this particular case, though, there is no
> > symbol to look up as it is a anonymous jump table generated by the
> > compiler. And the function name would be inaccurate too, as
> > snp_cpuid_postprocess() got inlined into its caller. But I guess with
> > the right DWARF data, at least the call site could be narrowed down a
> > bit better.
>
> So patch #2 is now upstream, but should I apply this diagnostic patch
> as-is, or will there be a -v2?
>

I'm looking into this. But give the points above, I'm reaching the
conclusion that producing a better diagnostic based solely on vmlinux
(which may be built without debug info) is intractable, and not even
the DWARF metadata will describe a compiler generated jump table using
a named ELF symbol.

So I am also looking into isolating the startup code like I did for
arm64 (and which has been adopted by RISC-V as well), but this is
rather hairy on x86 so it will take some time. But once that lands,
this diagnostic can be removed.

So I will leave it up to you to decide whether to merge this
improvement for now, or revert the diagnostic as you suggested before.
This code has already identified some issues that were subsequently
fixed, so it has already served its purpose.

As for the related changes: .head.text can be removed in its entirety
once this is all done, so no need to revert the linker script changes
at this point. (The startup code that only executes at boot on the BSP
can move into .init.text where it belongs)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 1/2] x86/relocs: Improve diagnostic for rejected absolute references
  2025-02-03 10:00         ` Ard Biesheuvel
@ 2025-02-22 12:01           ` Ingo Molnar
  2025-02-22 12:03             ` Ingo Molnar
  0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2025-02-22 12:01 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Linus Torvalds, Ard Biesheuvel, linux-kernel, x86, Tom Lendacky,
	Nathan Chancellor


* Ard Biesheuvel <ardb@kernel.org> wrote:

> On Mon, 3 Feb 2025 at 10:40, Ingo Molnar <mingo@kernel.org> wrote:
> >
> >
> > * Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > > On Mon, 27 Jan 2025 at 17:57, Linus Torvalds
> > > <torvalds@linux-foundation.org> wrote:
> > > >
> > > > On Mon, 27 Jan 2025 at 03:43, Ard Biesheuvel <ardb+git@google.com> wrote:
> > > > >
> > > > >   Absolute reference to symbol '.rodata+0x180' detected in .head.text (0xffffffff820cb4ba).
> > > >
> > > > Do we have any symbol name lookup logic anywhere?
> > > >
> > >
> > > I can look into that. In this particular case, though, there is no
> > > symbol to look up as it is a anonymous jump table generated by the
> > > compiler. And the function name would be inaccurate too, as
> > > snp_cpuid_postprocess() got inlined into its caller. But I guess with
> > > the right DWARF data, at least the call site could be narrowed down a
> > > bit better.
> >
> > So patch #2 is now upstream, but should I apply this diagnostic patch
> > as-is, or will there be a -v2?
> >
> 
> I'm looking into this. But give the points above, I'm reaching the
> conclusion that producing a better diagnostic based solely on vmlinux
> (which may be built without debug info) is intractable, and not even
> the DWARF metadata will describe a compiler generated jump table using
> a named ELF symbol.
> 
> So I am also looking into isolating the startup code like I did for
> arm64 (and which has been adopted by RISC-V as well), but this is
> rather hairy on x86 so it will take some time. But once that lands,
> this diagnostic can be removed.
> 
> So I will leave it up to you to decide whether to merge this
> improvement for now, or revert the diagnostic as you suggested before.
> This code has already identified some issues that were subsequently
> fixed, so it has already served its purpose.

So after another 2 weeks there's been no new upstream regressions I'm 
aware of, so - knock on wood - it seems we can leave the die() in 
place?

But could we perhaps make it more debuggable, should it trigger - such 
as not removing the relevant object file and improving the message? 
I.e. make the build failure experience Linus had somewhat more 
palatable...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 1/2] x86/relocs: Improve diagnostic for rejected absolute references
  2025-02-22 12:01           ` Ingo Molnar
@ 2025-02-22 12:03             ` Ingo Molnar
  2025-02-22 13:47               ` Ard Biesheuvel
  0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2025-02-22 12:03 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Linus Torvalds, Ard Biesheuvel, linux-kernel, x86, Tom Lendacky,
	Nathan Chancellor


* Ingo Molnar <mingo@kernel.org> wrote:

> So after another 2 weeks there's been no new upstream regressions I'm 
> aware of, so - knock on wood - it seems we can leave the die() in 
> place?
> 
> But could we perhaps make it more debuggable, should it trigger - 
> such as not removing the relevant object file and improving the 
> message? I.e. make the build failure experience Linus had somewhat 
> more palatable...

For example, the new message is far better, even when combined with a 
die() build failure:

-                       die("Absolute reference to symbol '%s' not permitted in .head.text\n",
-                           symname);
-                       break;
+                       fprintf(stderr,
+                               "Absolute reference to symbol '%s+0x%lx' detected in .head.text (0x%lx).\n"
+                               "This kernel might not boot.\n",
+                               symname, rel->r_addend, offset);

as it points out that the underlying bug might result in an unbootable 
kernel image. So the user at least knows what the pain is about ...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 1/2] x86/relocs: Improve diagnostic for rejected absolute references
  2025-02-22 12:03             ` Ingo Molnar
@ 2025-02-22 13:47               ` Ard Biesheuvel
  2025-02-22 13:49                 ` Ingo Molnar
  0 siblings, 1 reply; 15+ messages in thread
From: Ard Biesheuvel @ 2025-02-22 13:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Ard Biesheuvel, linux-kernel, x86, Tom Lendacky,
	Nathan Chancellor

On Sat, 22 Feb 2025 at 13:03, Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Ingo Molnar <mingo@kernel.org> wrote:
>
> > So after another 2 weeks there's been no new upstream regressions I'm
> > aware of, so - knock on wood - it seems we can leave the die() in
> > place?
> >
> > But could we perhaps make it more debuggable, should it trigger -
> > such as not removing the relevant object file and improving the
> > message? I.e. make the build failure experience Linus had somewhat
> > more palatable...
>
> For example, the new message is far better, even when combined with a
> die() build failure:
>
> -                       die("Absolute reference to symbol '%s' not permitted in .head.text\n",
> -                           symname);
> -                       break;
> +                       fprintf(stderr,
> +                               "Absolute reference to symbol '%s+0x%lx' detected in .head.text (0x%lx).\n"
> +                               "This kernel might not boot.\n",
> +                               symname, rel->r_addend, offset);
>
> as it points out that the underlying bug might result in an unbootable
> kernel image. So the user at least knows what the pain is about ...
>

Ultimately, it is the die() that results in vmlinux to be deleted. And
this is actually a result of the slightly dubious way the
Makefile.postlink logic works: usually, artifacts are created once by
the Makefile rule that defines how they are built, and if that rule
fails, no output is produced but the input is preserved. In the
vmlinux case, the file is modified by a separate rule that executes
Makefile.postlink in an entirely separate make invocation, which
splits off the static ELF relocations, using vmlinux both as input and
output.

I can have a stab at fixing that instead. That way, we can use the
improved diagnostic message, and leave the die() in place without it
resulting in vmlinux to be deleted.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 1/2] x86/relocs: Improve diagnostic for rejected absolute references
  2025-02-22 13:47               ` Ard Biesheuvel
@ 2025-02-22 13:49                 ` Ingo Molnar
  0 siblings, 0 replies; 15+ messages in thread
From: Ingo Molnar @ 2025-02-22 13:49 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Linus Torvalds, Ard Biesheuvel, linux-kernel, x86, Tom Lendacky,
	Nathan Chancellor


* Ard Biesheuvel <ardb@kernel.org> wrote:

> On Sat, 22 Feb 2025 at 13:03, Ingo Molnar <mingo@kernel.org> wrote:
> >
> >
> > * Ingo Molnar <mingo@kernel.org> wrote:
> >
> > > So after another 2 weeks there's been no new upstream regressions I'm
> > > aware of, so - knock on wood - it seems we can leave the die() in
> > > place?
> > >
> > > But could we perhaps make it more debuggable, should it trigger -
> > > such as not removing the relevant object file and improving the
> > > message? I.e. make the build failure experience Linus had somewhat
> > > more palatable...
> >
> > For example, the new message is far better, even when combined with a
> > die() build failure:
> >
> > -                       die("Absolute reference to symbol '%s' not permitted in .head.text\n",
> > -                           symname);
> > -                       break;
> > +                       fprintf(stderr,
> > +                               "Absolute reference to symbol '%s+0x%lx' detected in .head.text (0x%lx).\n"
> > +                               "This kernel might not boot.\n",
> > +                               symname, rel->r_addend, offset);
> >
> > as it points out that the underlying bug might result in an unbootable
> > kernel image. So the user at least knows what the pain is about ...
> >
> 
> Ultimately, it is the die() that results in vmlinux to be deleted. And
> this is actually a result of the slightly dubious way the
> Makefile.postlink logic works: usually, artifacts are created once by
> the Makefile rule that defines how they are built, and if that rule
> fails, no output is produced but the input is preserved. In the
> vmlinux case, the file is modified by a separate rule that executes
> Makefile.postlink in an entirely separate make invocation, which
> splits off the static ELF relocations, using vmlinux both as input and
> output.
> 
> I can have a stab at fixing that instead. That way, we can use the
> improved diagnostic message, and leave the die() in place without it
> resulting in vmlinux to be deleted.

This sounds like the right approach to me too!

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2025-02-22 13:49 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-27 11:43 [RFC PATCH 0/2] Fix more head.text bugs and improve diagnostic Ard Biesheuvel
2025-01-27 11:43 ` [RFC PATCH 1/2] x86/relocs: Improve diagnostic for rejected absolute references Ard Biesheuvel
2025-01-27 16:57   ` Linus Torvalds
2025-01-27 22:01     ` Ard Biesheuvel
2025-02-03  9:40       ` Ingo Molnar
2025-02-03 10:00         ` Ard Biesheuvel
2025-02-22 12:01           ` Ingo Molnar
2025-02-22 12:03             ` Ingo Molnar
2025-02-22 13:47               ` Ard Biesheuvel
2025-02-22 13:49                 ` Ingo Molnar
2025-01-27 11:43 ` [RFC PATCH 2/2] x86/sev: Disable jump tables in SEV startup code Ard Biesheuvel
2025-01-27 17:09   ` Linus Torvalds
2025-01-27 22:15     ` Ard Biesheuvel
2025-01-27 22:28       ` Ard Biesheuvel
2025-01-28 22:22   ` [tip: x86/urgent] " tip-bot2 for Ard Biesheuvel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox