All of lore.kernel.org

* Re: [PATCH] objtool: ignore .L prefixed local symbols
From: Arvind Sankar @ 2020-02-14 20:42 UTC (permalink / raw)
  To: Fangrui Song
  Cc: Arvind Sankar, Nick Desaulniers, jpoimboe, peterz,
	clang-built-linux, Nathan Chancellor, linux-kernel
In-Reply-To: <20200214180527.z44b4bmzn336mff2@google.com>

On Fri, Feb 14, 2020 at 10:05:27AM -0800, Fangrui Song wrote:
> I know little about objtool, but if it may be used by other
> architectures, hope the following explanations don't appear to be too
> off-topic:)
> 
> On 2020-02-14, Arvind Sankar wrote:
> >Can you describe what case the clang change is supposed to optimize?
> >AFAICT, it kicks in when the symbol is known by the compiler to be local
> >to the DSO and defined in the same translation unit.
> >
> >But then there are two cases:
> >(a) we have call foo, where foo is defined in the same section as the
> >call instruction. In this case the assembler should be able to fully
> >resolve foo and not generate any relocation, regardless of whether foo
> >is global or local.
> 
> If foo is STB_GLOBAL or STB_WEAK, the assembler cannot fully resolve a
> reference to foo in the same section, unless the assembler can assume
> (the codegen tells it) the call to foo cannot be interposed by another
> foo definition at runtime.

I was testing with hidden/protected visibility, I see you want this for
the no-semantic-interposition case. Actually a bit more testing shows
some peculiarities even with hidden visibility. With the below, the call
and lea create relocations in the object file, but the jmp doesn't. ld
does avoid creating a plt for this though.

	.text
	.globl foo, bar
	.hidden foo
	bar:
		call	foo
		leaq	foo(%rip), %rax
		jmp	foo

	foo:	ret

> 
> >(b) we have call foo, where foo is defined in a different section from
> >the call instruction. In this case the assembler must generate a
> >relocation regardless of whether foo is global or local, and the linker
> >should eliminate it.
> >In what case does does replacing call foo with call .Lfoo$local help?
> 
> For -fPIC -fno-semantic-interposition, the assembly emitter can perform
> the following optimization:
> 
>    void foo() {}
>    void bar() { foo(); }
> 
>    .globl foo, bar
>    foo:
>    .Lfoo$local:
>      ret
>    bar:
>      call foo  --> call .Lfoo$local
>      ret
> 
> call foo generates an R_X86_64_PLT32. In a -shared link, it creates an
> unneeded PLT entry for foo.
> 
> call .Lfoo$local generates an R_X86_64_PLT32. In a -shared link, .Lfoo$local is
> non-preemptible => no PLT entry is created.
> 
> For -fno-PIC and -fPIE, the final link is expected to be -no-pie or
> -pie. This optimization does not save anything, because PLT entries will
> not be generated. With clang's integrated assembler, it may increase the
> number of STT_SECTION symbols (because .Lfoo$local will be turned to a
> STT_SECTION relative relocation), but the size increase is very small.
> 
> 
> I want to teach clang -fPIC to use -fno-semantic-interposition by
> default. (It is currently an LLVM optimization, not realized in clang.)
> clang traditionally makes various -fno-semantic-interposition
> assumptions and can perform interprocedural optimizations even if the
> strict ELF rule disallows them.

FWIW, gcc with no-semantic-interposition also uses local aliases, but
rather than using .L labels, it creates a local alias by
	.set foo.localalias, foo
This makes the type of foo.localalias the same as foo, which I gather
should placate objtool as it'll still see an STT_FUNC no matter whether
it picks up foo.localalias or foo.

^ permalink raw reply