CO-RE builtins purity and other compiler optimizations

public inbox for bpf@vger.kernel.org

* CO-RE builtins purity and other compiler optimizations
@ 2023-07-05 18:07 Jose E. Marchesi
  2023-07-06  0:02 ` Andrii Nakryiko
  0 siblings, 1 reply; 4+ messages in thread
From: Jose E. Marchesi @ 2023-07-05 18:07 UTC
  To: bpf; +Cc: cupertino.miranda, david.faust

Hello BPF people!

We are still working on supporting the pending CO-RE built-ins in GCC.
The trick of hooking in the parser to avoid constant folding, as
discussed during LSFMMBPF, seems to work well.  Almost there!

So, most of the CO-RE associated C built-ins have the side effect of
emitting a CO-RE relocation in the .BTF.ext section.  This is, for
example, the case of __builtin_preserve_enum_value.

Like calls to regular functions, calls to C built-ins are also
candidates for certain optimizations.  For example, given this code:

: int a = __builtin_preserve_enum_value(*(typeof(enum E) *)eB, BPF_ENUMVAL_VALUE);
: int b = __builtin_preserve_enum_value(*(typeof(enum E) *)eB, BPF_ENUMVAL_VALUE);

The compiler may very well decide to optimize out the second call to the
built-in if the built-in is considered "pure", i.e. given exactly the
same arguments it produces the same result.

We observed that clang indeed seems to optimize that way.  See
https://godbolt.org/z/zqe9Kfrrj.

That kind of optimization has an impact on the number of CO-RE
relocations emitted.
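
To make the effect concrete, here is a minimal C sketch.  It is a model,
not compiler output: fake_enum_value() and struct reloc_log are
hypothetical names that stand in for the built-in and for the list of
relocation records.  The only point is that both variants compute the
same value while logging a different number of records.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: each evaluated call appends one relocation record. */
struct reloc_log {
    size_t count;
};

/* Stand-in for the built-in: returns a value and, as a side effect,
 * logs one relocation record. */
static int fake_enum_value(struct reloc_log *log, int value)
{
    assert(log != NULL);
    log->count++;
    return value;
}

/* No CSE: both calls are kept, so two records are logged. */
static int sum_without_cse(struct reloc_log *log, int value)
{
    int a = fake_enum_value(log, value);
    int b = fake_enum_value(log, value);
    return a + b;
}

/* CSE: the call is treated as pure, so the first result is reused
 * and only one record is logged. */
static int sum_with_cse(struct reloc_log *log, int value)
{
    int a = fake_enum_value(log, value);
    int b = a;
    return a + b;
}
```

Calling both functions with the same value gives the same sum, but the
log ends up with two records in the first case and one in the second.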

Question:

Is the BPF loader, the BPF verifier or any other core component
sensitive in any way to the number (and ordering) of CO-RE relocations
for some given BPF C program?  I.e. if the same BPF C program above is
compiled with and without that optimization, will it work in both cases?

If no, then perfect!  Different compilers can optimize slightly
differently (or not optimize at all) and we can mark these built-ins as
pure in GCC as well, benefiting from optimizations without having to
worry about emitting exactly what clang emits.

If yes, wouldn't it be better to disable that kind of optimization in
all C BPF compilers, i.e. to make the compilers aware of the side effect
so they will not optimize built-in calls out (or replicate them), and to
make this mandatory in the CO-RE spec?  Making one compiler optimize
exactly like another compiler is difficult and sometimes not even
feasible.

Thanks in advance for the clarification/info!

* Re: CO-RE builtins purity and other compiler optimizations
  2023-07-05 18:07 CO-RE builtins purity and other compiler optimizations Jose E. Marchesi
@ 2023-07-06  0:02 ` Andrii Nakryiko
  2023-07-06  5:57   ` Yonghong Song
  2023-07-06  9:03   ` Jose E. Marchesi
  0 siblings, 2 replies; 4+ messages in thread
From: Andrii Nakryiko @ 2023-07-06  0:02 UTC
  To: Jose E. Marchesi; +Cc: bpf, cupertino.miranda, david.faust

On Wed, Jul 5, 2023 at 11:07 AM Jose E. Marchesi
<jose.marchesi@oracle.com> wrote:
>
>
> Hello BPF people!
>
> We are still working in supporting the pending CO-RE built-ins in GCC.
> The trick of hooking in the parser to avoid constant folding, as
> discussed during LSFMMBPF, seems to work well.  Almost there!
>
> So, most of the CO-RE associated C built-ins have the side effect of
> emiting a CO-RE relocation in the .BTF.ext section.  This is for example
> the case of __builtin_preserve_enum_value.
>
> Like calls to regular functions, calls to C built-ins are also
> candidates to certain optimizations.  For example, given this code:
>
> : int a = __builtin_preserve_enum_value(*(typeof(enum E) *)eB, BPF_ENUMVAL_VALUE);
> : int b = __builtin_preserve_enum_value(*(typeof(enum E) *)eB, BPF_ENUMVAL_VALUE);
>
> The compiler may very well decide to optimize out the second call to the
> built-in if it is to be considered "pure", i.e. given exactly the same
> arguments it produces the same results.
>
> We observed that clang indeed seems to optimize that way.  See
> https://godbolt.org/z/zqe9Kfrrj.
>
> That kind of optimizations have an impact on the number of CO-RE
> relocations emitted.
>
> Question:
>
> Is the BPF loader, the BPF verifier or any other core component sensible
> in any way to the number (and ordering) of CO-RE relocations for some
> given BPF C program?  i.e. compiling the same BPF C program above with
> and without that optimization, will it work in both cases?

Yes, it should.

>
> If no, then perfect!  Different compilers can optimize slightly

Did you mean "if yes, then perfect"? Because otherwise it makes no sense :)

> differently (or not optimize at all) and we can mark these built-ins as
> pure in GCC as well, benefiting from optimizations without worrying to
> have to emit exactly what clang emits.

Yes, it should be fine, as long as the compiler doesn't assume any
specific value returned by CO-RE relocation (and doesn't perform any
optimizations based on that assumed value). From the BPF verifier
side, it's just a constant, so the BPF verifier itself doesn't care.
From the libbpf/BPF loader standpoint, all that matters is that there
is CO-RE relocation information that specifies how some BPF
instruction needs to be adjusted to match the host kernel properly.
Whether CO-RE relocation is repeated many times, or performed just
once and that constant value is just reused in the code many times,
shouldn't matter at all.
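
As a rough illustration of that last point, here is a simplified C
model of a loader patching instructions.  It is a sketch under stated
assumptions, not libbpf code: struct insn, struct reloc and
apply_relocs() are hypothetical, and a real relocation record carries
much more than an instruction index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified instruction: only the immediate matters here. */
struct insn {
    int32_t imm;
};

/* Hypothetical relocation record: names the instruction to adjust. */
struct reloc {
    size_t insn_idx;
};

/* Patch every instruction named by a record with the value resolved
 * against the host kernel.  Each record is handled on its own, so the
 * number and order of records do not affect the outcome. */
static void apply_relocs(struct insn *prog, size_t prog_len,
                         const struct reloc *relocs, size_t nrelocs,
                         int32_t host_value)
{
    for (size_t i = 0; i < nrelocs; i++) {
        assert(relocs[i].insn_idx < prog_len);
        prog[relocs[i].insn_idx].imm = host_value;
    }
}
```

A program with two patched loads and two records, and a program with
one patched load whose result is reused, both end up with the same
host value in every place it is needed.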

>
> If yes, wouldn't it be better to disable that kind of optimization in
> all C BPF compilers, i.e. to make the compilers aware of the side-effect
> so they will not optimize built-in calls out (or replicate them.) and to
> make this mandatory in the CO-RE spec?  Making a compiler to optimize
> exactly like another compiler is difficult and sometimes even not
> feasible.
>
> Thanks in advance for the clarification/info!
>

* Re: CO-RE builtins purity and other compiler optimizations
  2023-07-06  0:02 ` Andrii Nakryiko
@ 2023-07-06  5:57   ` Yonghong Song
  2023-07-06  9:03   ` Jose E. Marchesi
  1 sibling, 0 replies; 4+ messages in thread
From: Yonghong Song @ 2023-07-06  5:57 UTC
  To: Andrii Nakryiko, Jose E. Marchesi; +Cc: bpf, cupertino.miranda, david.faust



On 7/5/23 5:02 PM, Andrii Nakryiko wrote:
> On Wed, Jul 5, 2023 at 11:07 AM Jose E. Marchesi
> <jose.marchesi@oracle.com> wrote:
>>
>>
>> Hello BPF people!
>>
>> We are still working in supporting the pending CO-RE built-ins in GCC.
>> The trick of hooking in the parser to avoid constant folding, as
>> discussed during LSFMMBPF, seems to work well.  Almost there!
>>
>> So, most of the CO-RE associated C built-ins have the side effect of
>> emiting a CO-RE relocation in the .BTF.ext section.  This is for example
>> the case of __builtin_preserve_enum_value.
>>
>> Like calls to regular functions, calls to C built-ins are also
>> candidates to certain optimizations.  For example, given this code:
>>
>> : int a = __builtin_preserve_enum_value(*(typeof(enum E) *)eB, BPF_ENUMVAL_VALUE);
>> : int b = __builtin_preserve_enum_value(*(typeof(enum E) *)eB, BPF_ENUMVAL_VALUE);
>>
>> The compiler may very well decide to optimize out the second call to the
>> built-in if it is to be considered "pure", i.e. given exactly the same
>> arguments it produces the same results.
>>
>> We observed that clang indeed seems to optimize that way.  See
>> https://godbolt.org/z/zqe9Kfrrj .
>>
>> That kind of optimizations have an impact on the number of CO-RE
>> relocations emitted.
>>
>> Question:
>>
>> Is the BPF loader, the BPF verifier or any other core component sensible
>> in any way to the number (and ordering) of CO-RE relocations for some
>> given BPF C program?  i.e. compiling the same BPF C program above with
>> and without that optimization, will it work in both cases?
> 
> Yes, it should.
> 
>>
>> If no, then perfect!  Different compilers can optimize slightly
> 
> Did you mean "if yes, then perfect"? Because otherwise it makes no sense :)
> 
>> differently (or not optimize at all) and we can mark these built-ins as
>> pure in GCC as well, benefiting from optimizations without worrying to
>> have to emit exactly what clang emits.
> 
> Yes, it should be fine, as long as the compiler doesn't assume any
> specific value returned by CO-RE relocation (and doesn't perform any
> optimizations based on that assumed value). From the BPF verifier
> side, it's just a constant, so the BPF verifier itself doesn't care.
>  From the libbpf/BPF loader standpoint, all that matters is that there
> is CO-RE relocation information that specifies how some BPF
> instruction needs to be adjusted to match the host kernel properly.
> Whether CO-RE relocation is repeated many times, or performed just
> once and that constant value is just reused in the code many times,
> shouldn't matter at all.


For cases like this:

 >> : int a = __builtin_preserve_enum_value(*(typeof(enum E) *)eB, BPF_ENUMVAL_VALUE);
 >> : int b = __builtin_preserve_enum_value(*(typeof(enum E) *)eB, BPF_ENUMVAL_VALUE);

Internally, LLVM (in one of the BPF backend passes) converts
   __builtin_preserve_enum_value(*(typeof(enum E) *)eB, BPF_ENUMVAL_VALUE)
into a global variable based on the captured info: built-in,
type, value, etc.

Since 'int a = ...' and 'int b = ...' have the same value,
that BPF backend pass creates only one global variable,
effectively doing CSE.

GCC might implement this differently, but for the same
built-in kind and the same source representation, CSE
should be okay.
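
The deduplication idea can be sketched in C as a get-or-create lookup.
This is only an illustration: struct global_table,
get_or_create_global() and the key strings are hypothetical and do not
reflect the names or data structures LLVM actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_GLOBALS 16

/* Hypothetical table of relocation globals, keyed by a string built
 * from the captured info (built-in kind, type, value, ...). */
struct global_table {
    const char *names[MAX_GLOBALS];
    size_t count;
};

/* Return the index of the global for `name`, creating it only if it
 * does not exist yet.  Returns -1 when the table is full. */
static int get_or_create_global(struct global_table *t, const char *name)
{
    assert(t != NULL && name != NULL);
    for (size_t i = 0; i < t->count; i++) {
        if (strcmp(t->names[i], name) == 0)
            return (int)i;
    }
    if (t->count == MAX_GLOBALS)
        return -1;
    t->names[t->count] = name;
    return (int)t->count++;
}
```

Two lookups with the same key return the same index and create one
entry, which is what makes the two source-level calls share a single
global in this model.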

> 
>>
>> If yes, wouldn't it be better to disable that kind of optimization in
>> all C BPF compilers, i.e. to make the compilers aware of the side-effect
>> so they will not optimize built-in calls out (or replicate them.) and to
>> make this mandatory in the CO-RE spec?  Making a compiler to optimize
>> exactly like another compiler is difficult and sometimes even not
>> feasible.
>>
>> Thanks in advance for the clarification/info!
>>
> 

* Re: CO-RE builtins purity and other compiler optimizations
  2023-07-06  0:02 ` Andrii Nakryiko
  2023-07-06  5:57   ` Yonghong Song
@ 2023-07-06  9:03   ` Jose E. Marchesi
  1 sibling, 0 replies; 4+ messages in thread
From: Jose E. Marchesi @ 2023-07-06  9:03 UTC
  To: Andrii Nakryiko; +Cc: bpf, cupertino.miranda, david.faust


> On Wed, Jul 5, 2023 at 11:07 AM Jose E. Marchesi
> <jose.marchesi@oracle.com> wrote:
>>
>>
>> Hello BPF people!
>>
>> We are still working in supporting the pending CO-RE built-ins in GCC.
>> The trick of hooking in the parser to avoid constant folding, as
>> discussed during LSFMMBPF, seems to work well.  Almost there!
>>
>> So, most of the CO-RE associated C built-ins have the side effect of
>> emiting a CO-RE relocation in the .BTF.ext section.  This is for example
>> the case of __builtin_preserve_enum_value.
>>
>> Like calls to regular functions, calls to C built-ins are also
>> candidates to certain optimizations.  For example, given this code:
>>
>> : int a = __builtin_preserve_enum_value(*(typeof(enum E) *)eB, BPF_ENUMVAL_VALUE);
>> : int b = __builtin_preserve_enum_value(*(typeof(enum E) *)eB, BPF_ENUMVAL_VALUE);
>>
>> The compiler may very well decide to optimize out the second call to the
>> built-in if it is to be considered "pure", i.e. given exactly the same
>> arguments it produces the same results.
>>
>> We observed that clang indeed seems to optimize that way.  See
>> https://godbolt.org/z/zqe9Kfrrj.
>>
>> That kind of optimizations have an impact on the number of CO-RE
>> relocations emitted.
>>
>> Question:
>>
>> Is the BPF loader, the BPF verifier or any other core component sensible
>> in any way to the number (and ordering) of CO-RE relocations for some
>> given BPF C program?  i.e. compiling the same BPF C program above with
>> and without that optimization, will it work in both cases?
>
> Yes, it should.
>
>>
>> If no, then perfect!  Different compilers can optimize slightly
>
> Did you mean "if yes, then perfect"? Because otherwise it makes no sense :)

Yeah, I was referring to the first question, not the second :)

>> differently (or not optimize at all) and we can mark these built-ins as
>> pure in GCC as well, benefiting from optimizations without worrying to
>> have to emit exactly what clang emits.
>
> Yes, it should be fine, as long as the compiler doesn't assume any
> specific value returned by CO-RE relocation (and doesn't perform any
> optimizations based on that assumed value). From the BPF verifier
> side, it's just a constant, so the BPF verifier itself doesn't care.
> From the libbpf/BPF loader standpoint, all that matters is that there
> is CO-RE relocation information that specifies how some BPF
> instruction needs to be adjusted to match the host kernel properly.
> Whether CO-RE relocation is repeated many times, or performed just
> once and that constant value is just reused in the code many times,
> shouldn't matter at all.

Ok, this is good.  Thanks for confirming!

>>
>> If yes, wouldn't it be better to disable that kind of optimization in
>> all C BPF compilers, i.e. to make the compilers aware of the side-effect
>> so they will not optimize built-in calls out (or replicate them.) and to
>> make this mandatory in the CO-RE spec?  Making a compiler to optimize
>> exactly like another compiler is difficult and sometimes even not
>> feasible.
>>
>> Thanks in advance for the clarification/info!
>>
