* Re: CO-RE builtins purity and other compiler optimizations
2023-07-06 0:02 ` Andrii Nakryiko
@ 2023-07-06 5:57 ` Yonghong Song
2023-07-06 9:03 ` Jose E. Marchesi
1 sibling, 0 replies; 4+ messages in thread
From: Yonghong Song @ 2023-07-06 5:57 UTC (permalink / raw)
To: Andrii Nakryiko, Jose E. Marchesi; +Cc: bpf, cupertino.miranda, david.faust
On 7/5/23 5:02 PM, Andrii Nakryiko wrote:
> On Wed, Jul 5, 2023 at 11:07 AM Jose E. Marchesi
> <jose.marchesi@oracle.com> wrote:
>>
>>
>> Hello BPF people!
>>
>> We are still working on supporting the pending CO-RE built-ins in GCC.
>> The trick of hooking in the parser to avoid constant folding, as
>> discussed during LSFMMBPF, seems to work well. Almost there!
>>
>> So, most of the CO-RE associated C built-ins have the side effect of
>> emitting a CO-RE relocation in the .BTF.ext section. This is for example
>> the case of __builtin_preserve_enum_value.
>>
>> Like calls to regular functions, calls to C built-ins are also
>> candidates to certain optimizations. For example, given this code:
>>
>> : int a = __builtin_preserve_enum_value(*(typeof(enum E) *)eB, BPF_ENUMVAL_VALUE);
>> : int b = __builtin_preserve_enum_value(*(typeof(enum E) *)eB, BPF_ENUMVAL_VALUE);
>>
>> The compiler may very well decide to optimize out the second call to the
>> built-in if it is to be considered "pure", i.e. given exactly the same
>> arguments it produces the same results.
>>
>> We observed that clang indeed seems to optimize that way. See
>> https://godbolt.org/z/zqe9Kfrrj .
>>
>> That kind of optimization has an impact on the number of CO-RE
>> relocations emitted.
>>
>> Question:
>>
>> Is the BPF loader, the BPF verifier or any other core component sensitive
>> in any way to the number (and ordering) of CO-RE relocations for some
>> given BPF C program? i.e. compiling the same BPF C program above with
>> and without that optimization, will it work in both cases?
>
> Yes, it should.
>
>>
>> If no, then perfect! Different compilers can optimize slightly
>
> Did you mean "if yes, then perfect"? Because otherwise it makes no sense :)
>
>> differently (or not optimize at all) and we can mark these built-ins as
>> pure in GCC as well, benefiting from optimizations without worrying
>> about having to emit exactly what clang emits.
>
> Yes, it should be fine, as long as the compiler doesn't assume any
> specific value returned by CO-RE relocation (and doesn't perform any
> optimizations based on that assumed value). From the BPF verifier
> side, it's just a constant, so the BPF verifier itself doesn't care.
> From the libbpf/BPF loader standpoint, all that matters is that there
> is CO-RE relocation information that specifies how some BPF
> instruction needs to be adjusted to match the host kernel properly.
> Whether CO-RE relocation is repeated many times, or performed just
> once and that constant value is just reused in the code many times,
> shouldn't matter at all.
For cases like this:

>> : int a = __builtin_preserve_enum_value(*(typeof(enum E) *)eB, BPF_ENUMVAL_VALUE);
>> : int b = __builtin_preserve_enum_value(*(typeof(enum E) *)eB, BPF_ENUMVAL_VALUE);

Internally, LLVM (in one BPF backend pass) converts
  __builtin_preserve_enum_value(*(typeof(enum E) *)eB, BPF_ENUMVAL_VALUE)
to a global variable based on the captured info: builtin,
type, value, etc.

Since 'int a = ...' and 'int b = ...' have the same value,
the same BPF backend pass creates only one global variable,
hence effectively doing CSE.

GCC might implement this differently, but for the same
built-in type and the same source representation, CSE
should be okay.
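
To illustrate why the relocation count should not matter, here is a
minimal sketch in plain C. It is a toy model only, not libbpf or
kernel code: toy_insn, toy_relo and toy_apply_relos() are made-up
names for illustration and do not mirror the real struct bpf_insn or
struct bpf_core_relo layouts. It assumes that a relocation simply
rewrites the immediate of one instruction with the value resolved on
the host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; NOT the real BPF structures. */
struct toy_insn {
	int32_t imm;		/* immediate that a relocation may rewrite */
};

struct toy_relo {
	size_t insn_idx;	/* index of the instruction to patch */
};

/* Toy loader: patch every relocated instruction with the value that
 * was resolved against the "host kernel". */
void toy_apply_relos(struct toy_insn *insns, size_t nr_insns,
		     const struct toy_relo *relos, size_t nr_relos,
		     int32_t host_value)
{
	for (size_t i = 0; i < nr_relos; i++) {
		assert(relos[i].insn_idx < nr_insns);
		insns[relos[i].insn_idx].imm = host_value;
	}
}

/* Variant 1: no CSE. Two loads, each with its own relocation. */
int32_t toy_sum_without_cse(int32_t host_value)
{
	struct toy_insn insns[2] = { { .imm = 0 }, { .imm = 0 } };
	const struct toy_relo relos[2] = { { .insn_idx = 0 }, { .insn_idx = 1 } };

	toy_apply_relos(insns, 2, relos, 2, host_value);
	int32_t a = insns[0].imm;
	int32_t b = insns[1].imm;
	return a + b;
}

/* Variant 2: CSE. One load, one relocation, value reused for a and b. */
int32_t toy_sum_with_cse(int32_t host_value)
{
	struct toy_insn insns[1] = { { .imm = 0 } };
	const struct toy_relo relos[1] = { { .insn_idx = 0 } };

	toy_apply_relos(insns, 1, relos, 1, host_value);
	int32_t a = insns[0].imm;
	int32_t b = a;	/* reuse instead of a second relocated load */
	return a + b;
}

/* Returns 1 when both variants compute the same result. */
int toy_variants_agree(int32_t host_value)
{
	return toy_sum_without_cse(host_value) == toy_sum_with_cse(host_value);
}
```

In this model both variants see the same values for a and b for any
host_value, which is the property the compiler relies on when it
merges the two built-in calls. The model breaks down only if the
compiler folds in a value it assumed at compile time, which is the
caveat Andrii mentions above.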
>
>>
>> If yes, wouldn't it be better to disable that kind of optimization in
>> all C BPF compilers, i.e. to make the compilers aware of the side-effect
>> so they will not optimize built-in calls out (or replicate them) and to
>> make this mandatory in the CO-RE spec? Making a compiler optimize
>> exactly like another compiler is difficult and sometimes even not
>> feasible.
>>
>> Thanks in advance for the clarification/info!
>>
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: CO-RE builtins purity and other compiler optimizations
2023-07-06 0:02 ` Andrii Nakryiko
2023-07-06 5:57 ` Yonghong Song
@ 2023-07-06 9:03 ` Jose E. Marchesi
1 sibling, 0 replies; 4+ messages in thread
From: Jose E. Marchesi @ 2023-07-06 9:03 UTC (permalink / raw)
To: Andrii Nakryiko; +Cc: bpf, cupertino.miranda, david.faust
> On Wed, Jul 5, 2023 at 11:07 AM Jose E. Marchesi
> <jose.marchesi@oracle.com> wrote:
>>
>>
>> Hello BPF people!
>>
>> We are still working on supporting the pending CO-RE built-ins in GCC.
>> The trick of hooking in the parser to avoid constant folding, as
>> discussed during LSFMMBPF, seems to work well. Almost there!
>>
>> So, most of the CO-RE associated C built-ins have the side effect of
>> emitting a CO-RE relocation in the .BTF.ext section. This is for example
>> the case of __builtin_preserve_enum_value.
>>
>> Like calls to regular functions, calls to C built-ins are also
>> candidates to certain optimizations. For example, given this code:
>>
>> : int a = __builtin_preserve_enum_value(*(typeof(enum E) *)eB, BPF_ENUMVAL_VALUE);
>> : int b = __builtin_preserve_enum_value(*(typeof(enum E) *)eB, BPF_ENUMVAL_VALUE);
>>
>> The compiler may very well decide to optimize out the second call to the
>> built-in if it is to be considered "pure", i.e. given exactly the same
>> arguments it produces the same results.
>>
>> We observed that clang indeed seems to optimize that way. See
>> https://godbolt.org/z/zqe9Kfrrj.
>>
>> That kind of optimization has an impact on the number of CO-RE
>> relocations emitted.
>>
>> Question:
>>
>> Is the BPF loader, the BPF verifier or any other core component sensitive
>> in any way to the number (and ordering) of CO-RE relocations for some
>> given BPF C program? i.e. compiling the same BPF C program above with
>> and without that optimization, will it work in both cases?
>
> Yes, it should.
>
>>
>> If no, then perfect! Different compilers can optimize slightly
>
> Did you mean "if yes, then perfect"? Because otherwise it makes no sense :)
Yeah, I was referring to the first question, not the second :)
>> differently (or not optimize at all) and we can mark these built-ins as
>> pure in GCC as well, benefiting from optimizations without worrying
>> about having to emit exactly what clang emits.
>
> Yes, it should be fine, as long as the compiler doesn't assume any
> specific value returned by CO-RE relocation (and doesn't perform any
> optimizations based on that assumed value). From the BPF verifier
> side, it's just a constant, so the BPF verifier itself doesn't care.
> From the libbpf/BPF loader standpoint, all that matters is that there
> is CO-RE relocation information that specifies how some BPF
> instruction needs to be adjusted to match the host kernel properly.
> Whether CO-RE relocation is repeated many times, or performed just
> once and that constant value is just reused in the code many times,
> shouldn't matter at all.
Ok, this is good. Thanks for confirming!
>>
>> If yes, wouldn't it be better to disable that kind of optimization in
>> all C BPF compilers, i.e. to make the compilers aware of the side-effect
>> so they will not optimize built-in calls out (or replicate them) and to
>> make this mandatory in the CO-RE spec? Making a compiler optimize
>> exactly like another compiler is difficult and sometimes even not
>> feasible.
>>
>> Thanks in advance for the clarification/info!
>>