BPF List
 help / color / mirror / Atom feed
* BPF GCC status - Nov 2023
@ 2023-11-28 16:23 Jose E. Marchesi
  2023-11-29  5:50 ` Yonghong Song
  2023-11-30 18:27 ` Andrii Nakryiko
  0 siblings, 2 replies; 13+ messages in thread
From: Jose E. Marchesi @ 2023-11-28 16:23 UTC (permalink / raw)
  To: bpf


[During LPC 2023 we talked about improving communication between the GCC
 BPF toolchain port and the kernel side.  This is the first periodical
 report that we plan to publish in the GCC wiki and send to interested
 parties.  Hopefully this will help.]

GCC wiki page for the port: https://gcc.gnu.org/wiki/BPFBackEnd
IRC channel: #gccbpf at irc.oftc.net.
Help on using the port: gcc@gcc.gnu.org
Patches and/or development discussions: gcc-patches@gnu.org

Assembler
=========

- The BPF assembler was sometimes generating spurious symbols. The
  problem was that supporting the pseudo-C assembly syntax for BPF makes
  it impossible to use the traditional technique of hashing on
  mnemonics.  Instead, we are forced to attempt parsing entries in our
  opcodes table until some instruction template matches.  In some cases
  this was causing the parser to incorrectly parse part of an
  instruction opcode as an expression, which led to the creation of a
  new undefined symbol.

  David Faust installed a fix for this upstream:
  https://sourceware.org/pipermail/binutils/2023-November/130668.html

- gas: change meaning of ; in the BPF assembler.

  The clang assembler interprets semicolons as a statement/directive
  separator.  In the GNU BPF assembler that character was being
  interpreted as the beginning of a line comment, as it is usual in
  assembly languages.  We detected this discrepancy with snippets like:

	asm volatile ("					\
	if r1 >= 0 goto l0_%=;				\
	r0 = 1;						\
	r0 += 2;					\
l0_%=:	exit;						\
"	::: __clobber_all);

  We installed a patch upstream that makes GAS to behave like the clang
  assembler when interpreting semicolons in the assembly programs:
  Jose E. Marchesi
  https://sourceware.org/pipermail/binutils/2023-November/130867.html

  The simulator tests have been updated accordingly:
  Jose E. Marchesi
  https://sourceware.org/pipermail/gdb-patches/2023-November/204581.html

- In the Pseudo-C syntax register names are not preceded by % characters
  nor any other prefix.  A consequence of that is that in contexts like
  instruction operands, where both register names and expressions
  involving symbols are expected, there is no way to disambiguate
  between them.  GAS was allowing symbols like `w3' or `r5' in syntactic
  contexts where no registers were expected, such as in:

    r0 = w3 ll  ; GAS interpreted w3 as symbol, clang emits error

  The clang assembler wasn't allowing that.  During LPC we agreed that
  the simplest approach is to not allow any symbol to have the same name
  than a register, in any context.  So we changed GAS so it now doesn't
  allow to use register names as symbols in any expression, such as:

    r0 = w3 + 1 ll  ; This now fails for both GAS and llvm.
    r0 = 1 + w3 ll  ; NOTE this does not fail with llvm, but it should.

  We installed a patch in GAS for this.
  Jose E. Marchesi
  https://sourceware.org/pipermail/binutils/2023-November/130684.html

- Cupertino Miranda fixed a GAS bug in the parsing of registers in
  pseudo-c syntax mode:
  https://sourceware.org/pipermail/binutils/2023-November/130732.html

Compiler
========

 - Remove bpf-helpers.h.

   Now that we are finally able to use the kernel provided bpf_helpers.h
   file and associated machinery, there is no longer need to distribute
   our own version.

   Jose E. Marchesi
   https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638226.html

 - Restore BPF build, always_inline in libgcc
   Jose E. Marchesi
   https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637948.html

 - Fix expected regexp in gcc.target/bpf/ldxdw.c test
   Jose E. Marchesi
   https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635892.html

 - Fix pseudoc-c asm emitted for *mulsidi3_zeroextend
   Jose E. Marchesi
   https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635896.html

 - Corrected condition in core_mark_as_access index.
   Cupertino Miranda
   https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636389.html

 - Delayed the removal of the parser enum plugin handler.
   Cupertino Miranda
   https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636388.html

 - Force inlining __builtin_memcmp upto data sizes of 1024 bytes.
   Cupertino Miranda
   https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636390.html

 - Emit errors for libcalls and builtin-generated libcalls, like clang
   does.
   Cupertino Miranda
   https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638117.html

 - GCC was emitting funcall external declarations corresponding to
   attempted but eventually discarded code.  This happened for example
   when GCC tried some particular code that got discarded because there
   was another more performance alternative.  This was a problem with
   the BPF instruction set <= v3, because of lack of signed division.
   This is now fixed upstream.
   Jose E. Marchesi
   BZ 109253
   https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638143.html

 - Indu Bhagat is investigating a BTF generation problem that is
   resulting in non-anonymous FUNC_PROTO entries, which are not allowed
   in BTF and rejected by the BPF loader.  This apparently happens when
   functions get inlined.

Pending Patches for bpf-next
============================

These are the current patches we still have to submit to bpf@vger for
bpf-next.  We are in the process of testing them:

- bpf: add more options for gcc-bpf to selftests/bpf/Makefile

  This patch passes the following extra options to BPF_GCC in
  GCC_BPF_BUILD_RULE:

  -masm=pseudoc
  -mco-re
  -Wno-unknown-pragmas
  -Wno-unused-variable
  -Wno-error=attributes
  -Wno-error=address-of-packed-member
  -Wno-compare-distinct-pointer-types
  -fno-strict-aliasing

  Most of them disable interpreting certain warnings as errors.  Code
  like:

    #define __imm_insn(name, expr) [name]"i"(*(long *)&(expr))

  where `expr' is something like a pointer to a bpf_insn, requires
  disabling strict aliasing, which is activated by default with -O2 in
  GCC.

- bpf: use r constraint instead of p constraint

  This was discussed in bpf@vger and it was decided that we would stop
  using the "p" constraint in the BPF kernel selftests.  That constraint
  is not really meant to be used externally to the compiler.

  https://lore.kernel.org/bpf/87edkbnq14.fsf@oracle.com/

- bpf_core_read.h: GCC specific macro for preserve_enum_value

  This patch adds a version of the bpf_core_enum_value macro to be used
  by GCC.  The implementations of CO-RE built-ins in clang and GCC
  require different "magical expressions" to be passed to the built-ins.
  These macros hide the complexity from the user.

- bpf: avoid VLAs in progs/test_xdp_dynptr.c

  In the progs/test_xdp_dynptr.c there are a bunch of VLAs in the
  handle_ipv4 and handle_ipv6 functions:

    const size_t tcphdr_sz = sizeof(struct tcphdr);
    const size_t udphdr_sz = sizeof(struct udphdr);
    const size_t ethhdr_sz = sizeof(struct ethhdr);
    const size_t iphdr_sz = sizeof(struct iphdr);
    const size_t ipv6hdr_sz = sizeof(struct ipv6hdr);
    
    [...]
    
    static __always_inline int handle_ipv6(struct xdp_md *xdp, struct bpf_dynptr *xdp_ptr)
    {
	__u8 eth_buffer[ethhdr_sz + ipv6hdr_sz + ethhdr_sz];
	__u8 ip6h_buffer_tcp[ipv6hdr_sz + tcphdr_sz];
	__u8 ip6h_buffer_udp[ipv6hdr_sz + udphdr_sz];
  	[...]
    }
    
    static __always_inline int handle_ipv6(struct xdp_md *xdp, struct bpf_dynptr *xdp_ptr)
    {
  	__u8 eth_buffer[ethhdr_sz + ipv6hdr_sz + ethhdr_sz];
	__u8 ip6h_buffer_tcp[ipv6hdr_sz + tcphdr_sz];
	__u8 ip6h_buffer_udp[ipv6hdr_sz + udphdr_sz];
	[...]
    }

  In both GCC and clang we are not allowing dynamic stack allocation (we
  used to support it in GCC using one register as an auxiliary stack
  pointer, but not any longer).

  The above code builds with clang but not with GCC:

    progs/test_xdp_dynptr.c:79:14: error: BPF does not support dynamic stack allocation
       79 |         __u8 eth_buffer[ethhdr_sz + iphdr_sz + ethhdr_sz];
          |              ^~~~~~~~~~

  We are guessing that clang turns these arrays from VLAs into normal
  statically sized arrays because ethhdr_sz and friends are constant and
  set to sizeof, which is always known at compile time.  This patch
  changes the selftest to use preprocessor constants instead of
  variables:

    #define tcphdr_sz sizeof(struct tcphdr)
    #define udphdr_sz sizeof(struct udphdr)
    #define ethhdr_sz sizeof(struct ethhdr)
    #define iphdr_sz sizeof(struct iphdr)
    #define ipv6hdr_sz sizeof(struct ipv6hdr)

- bpf_helpers.h: define bpf_tail_call_static when building with GCC

- bpf: fix constraint in test_tcpbpf_kern.c

  GCC emits a warning:

    progs/test_tcpbpf_kern.c:60:9: error: ‘op’ is used uninitialized [-Werror=uninitialized]

  when the uninitialized automatic `op' is used with a "+r" constraint
  in:

	asm volatile (
		"%[op] = *(u32 *)(%[skops] +96)"
		: [op] "+r"(op)
		: [skops] "r"(skops)
		:);

  The constraint shall be "=r" instead.


Open Questions
==============

- BPF programs including libc headers.

  BPF programs run on their own without an operating system or a C
  library.  Implementing C implies providing certain definitions and
  headers, such as stdint.h and stdarg.h.  For such targets, known as
  "bare metal targets", the compiler has to provide these definitions
  and headers in order to implement the language.

  GCC provides the following C headers for BPF targets:

    float.h
    gcov.h
    iso646.h
    limits.h
    stdalign.h
    stdarg.h
    stdatomic.h
    stdbool.h
    stdckdint.h
    stddef.h
    stdfix.h
    stdint.h
    stdnoreturn.h
    syslimits.h
    tgmath.h
    unwind.h
    varargs.h

  However, we have found that there is at least one BPF kernel self test
  that include glibc headers that, indirectly, include glibc's own
  definitions of stdint.h and friends.  This leads to compile-time
  errors due to conflicting types.  We think that including headers from
  a glibc built for some host target is very questionable.  For example,
  in BPF a C `char' is defined to be signed.  But if a BPF program
  includes glibc headers in an android system, that code will assume an
  unsigned char instead.

Other Updates
=============

- Brian Witte has adapted the Waldo 80211 debug/test/trace wireless
  analyzer tool to be built with GCC BPF.  This includes CI that uses
  the latest GCC git version, which is quite useful for us.

  https://git.sr.ht/~brianwitte/waldo-80211

- Brian has also published a tested and documented very simple bpf
  program example, with the goal of providing an accessible and
  instructive example for those interested in BPF development with the
  GNU toolchain.

  https://git.sr.ht/~brianwitte/gcc-bpf-example

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BPF GCC status - Nov 2023
  2023-11-28 16:23 BPF GCC status - Nov 2023 Jose E. Marchesi
@ 2023-11-29  5:50 ` Yonghong Song
  2023-11-29  7:08   ` Jose E. Marchesi
  2023-11-30 18:27 ` Andrii Nakryiko
  1 sibling, 1 reply; 13+ messages in thread
From: Yonghong Song @ 2023-11-29  5:50 UTC (permalink / raw)
  To: Jose E. Marchesi, bpf


On 11/28/23 11:23 AM, Jose E. Marchesi wrote:
> [During LPC 2023 we talked about improving communication between the GCC
>   BPF toolchain port and the kernel side.  This is the first periodical
>   report that we plan to publish in the GCC wiki and send to interested
>   parties.  Hopefully this will help.]
>
> GCC wiki page for the port: https://gcc.gnu.org/wiki/BPFBackEnd
> IRC channel: #gccbpf at irc.oftc.net.
> Help on using the port: gcc@gcc.gnu.org
> Patches and/or development discussions: gcc-patches@gnu.org

Thanks a lot for detailed report. Really helpful to nail down
issues facing one or both compilers. See comments below for
some mentioned issues.

>
> Assembler
> =========

[...]

> - In the Pseudo-C syntax register names are not preceded by % characters
>    nor any other prefix.  A consequence of that is that in contexts like
>    instruction operands, where both register names and expressions
>    involving symbols are expected, there is no way to disambiguate
>    between them.  GAS was allowing symbols like `w3' or `r5' in syntactic
>    contexts where no registers were expected, such as in:
>
>      r0 = w3 ll  ; GAS interpreted w3 as symbol, clang emits error
>
>    The clang assembler wasn't allowing that.  During LPC we agreed that
>    the simplest approach is to not allow any symbol to have the same name
>    than a register, in any context.  So we changed GAS so it now doesn't
>    allow to use register names as symbols in any expression, such as:
>
>      r0 = w3 + 1 ll  ; This now fails for both GAS and llvm.
>      r0 = 1 + w3 ll  ; NOTE this does not fail with llvm, but it should.

Could you provide a reproducible case above for llvm? llvm does not
support syntax like 'r0 = 1 + w3 ll'. For add, it only supports
'r1 += r2' or 'r1 += 100' syntax.

>
>    We installed a patch in GAS for this.
>    Jose E. Marchesi
>    https://sourceware.org/pipermail/binutils/2023-November/130684.html
>
>
> Pending Patches for bpf-next
> ============================
>
>
> - bpf: avoid VLAs in progs/test_xdp_dynptr.c
>
>    In the progs/test_xdp_dynptr.c there are a bunch of VLAs in the
>    handle_ipv4 and handle_ipv6 functions:
>
>      const size_t tcphdr_sz = sizeof(struct tcphdr);
>      const size_t udphdr_sz = sizeof(struct udphdr);
>      const size_t ethhdr_sz = sizeof(struct ethhdr);
>      const size_t iphdr_sz = sizeof(struct iphdr);
>      const size_t ipv6hdr_sz = sizeof(struct ipv6hdr);
>      
>      [...]
>      
>      static __always_inline int handle_ipv6(struct xdp_md *xdp, struct bpf_dynptr *xdp_ptr)
>      {
> 	__u8 eth_buffer[ethhdr_sz + ipv6hdr_sz + ethhdr_sz];
> 	__u8 ip6h_buffer_tcp[ipv6hdr_sz + tcphdr_sz];
> 	__u8 ip6h_buffer_udp[ipv6hdr_sz + udphdr_sz];
>    	[...]
>      }
>      
>      static __always_inline int handle_ipv6(struct xdp_md *xdp, struct bpf_dynptr *xdp_ptr)
>      {
>    	__u8 eth_buffer[ethhdr_sz + ipv6hdr_sz + ethhdr_sz];
> 	__u8 ip6h_buffer_tcp[ipv6hdr_sz + tcphdr_sz];
> 	__u8 ip6h_buffer_udp[ipv6hdr_sz + udphdr_sz];
> 	[...]
>      }
>
>    In both GCC and clang we are not allowing dynamic stack allocation (we
>    used to support it in GCC using one register as an auxiliary stack
>    pointer, but not any longer).
>
>    The above code builds with clang but not with GCC:
>
>      progs/test_xdp_dynptr.c:79:14: error: BPF does not support dynamic stack allocation
>         79 |         __u8 eth_buffer[ethhdr_sz + iphdr_sz + ethhdr_sz];
>            |              ^~~~~~~~~~
>
>    We are guessing that clang turns these arrays from VLAs into normal
>    statically sized arrays because ethhdr_sz and friends are constant and
>    set to sizeof, which is always known at compile time.  This patch
>    changes the selftest to use preprocessor constants instead of
>    variables:
>
>      #define tcphdr_sz sizeof(struct tcphdr)
>      #define udphdr_sz sizeof(struct udphdr)
>      #define ethhdr_sz sizeof(struct ethhdr)
>      #define iphdr_sz sizeof(struct iphdr)
>      #define ipv6hdr_sz sizeof(struct ipv6hdr)

Indeed, clang frontend (before generating IR) did some optimization
and calculates the real array size and that is why dynamic stack
allocation didn't happen. Since this is an optimizaiton, there is
no guarantee that frontend is able to calculate the precise
array size in all cases. See llvm patch https://reviews.llvm.org/D111897.

So your above change looks good to me.

>
> - bpf_helpers.h: define bpf_tail_call_static when building with GCC
>
> - bpf: fix constraint in test_tcpbpf_kern.c
>
>    GCC emits a warning:
>
>      progs/test_tcpbpf_kern.c:60:9: error: ‘op’ is used uninitialized [-Werror=uninitialized]
>
>    when the uninitialized automatic `op' is used with a "+r" constraint
>    in:
>
> 	asm volatile (
> 		"%[op] = *(u32 *)(%[skops] +96)"
> 		: [op] "+r"(op)
> 		: [skops] "r"(skops)
> 		:);
>
>    The constraint shall be "=r" instead.

We may miss an error case like above in llvm. Will double check.

>
>
> Open Questions
> ==============
>
> - BPF programs including libc headers.
>
>    BPF programs run on their own without an operating system or a C
>    library.  Implementing C implies providing certain definitions and
>    headers, such as stdint.h and stdarg.h.  For such targets, known as
>    "bare metal targets", the compiler has to provide these definitions
>    and headers in order to implement the language.
>
>    GCC provides the following C headers for BPF targets:
>
>      float.h
>      gcov.h
>      iso646.h
>      limits.h
>      stdalign.h
>      stdarg.h
>      stdatomic.h
>      stdbool.h
>      stdckdint.h
>      stddef.h
>      stdfix.h
>      stdint.h
>      stdnoreturn.h
>      syslimits.h
>      tgmath.h
>      unwind.h
>      varargs.h
>
>    However, we have found that there is at least one BPF kernel self test
>    that include glibc headers that, indirectly, include glibc's own
>    definitions of stdint.h and friends.  This leads to compile-time
>    errors due to conflicting types.  We think that including headers from
>    a glibc built for some host target is very questionable.  For example,
>    in BPF a C `char' is defined to be signed.  But if a BPF program
>    includes glibc headers in an android system, that code will assume an
>    unsigned char instead.

Currently clang side does not have compiler side bpf specific header so
we do not have this issues. We do encourage users to use vmlinux.h, e.g.,
for tracing programs, or for all kinds of programs using kfunc's
where parameters likely being kernel structures. In the future, kfunc
definitions are likely to be included in vmlinux.h itself.
For selftests, we also slowly move to vmlinux.h.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BPF GCC status - Nov 2023
  2023-11-29  5:50 ` Yonghong Song
@ 2023-11-29  7:08   ` Jose E. Marchesi
  2023-11-29 16:44     ` Yonghong Song
  0 siblings, 1 reply; 13+ messages in thread
From: Jose E. Marchesi @ 2023-11-29  7:08 UTC (permalink / raw)
  To: Yonghong Song; +Cc: bpf


> On 11/28/23 11:23 AM, Jose E. Marchesi wrote:
>> [During LPC 2023 we talked about improving communication between the GCC
>>   BPF toolchain port and the kernel side.  This is the first periodical
>>   report that we plan to publish in the GCC wiki and send to interested
>>   parties.  Hopefully this will help.]
>>
>> GCC wiki page for the port: https://gcc.gnu.org/wiki/BPFBackEnd
>> IRC channel: #gccbpf at irc.oftc.net.
>> Help on using the port: gcc@gcc.gnu.org
>> Patches and/or development discussions: gcc-patches@gnu.org
>
> Thanks a lot for detailed report. Really helpful to nail down
> issues facing one or both compilers. See comments below for
> some mentioned issues.
>
>>
>> Assembler
>> =========
>
> [...]
>
>> - In the Pseudo-C syntax register names are not preceded by % characters
>>    nor any other prefix.  A consequence of that is that in contexts like
>>    instruction operands, where both register names and expressions
>>    involving symbols are expected, there is no way to disambiguate
>>    between them.  GAS was allowing symbols like `w3' or `r5' in syntactic
>>    contexts where no registers were expected, such as in:
>>
>>      r0 = w3 ll  ; GAS interpreted w3 as symbol, clang emits error
>>
>>    The clang assembler wasn't allowing that.  During LPC we agreed that
>>    the simplest approach is to not allow any symbol to have the same name
>>    than a register, in any context.  So we changed GAS so it now doesn't
>>    allow to use register names as symbols in any expression, such as:
>>
>>      r0 = w3 + 1 ll  ; This now fails for both GAS and llvm.
>>      r0 = 1 + w3 ll  ; NOTE this does not fail with llvm, but it should.
>
> Could you provide a reproducible case above for llvm? llvm does not
> support syntax like 'r0 = 1 + w3 ll'. For add, it only supports
> 'r1 += r2' or 'r1 += 100' syntax.

It is a 128-bit load with an expression.  In compiler explorer, clang:

  int
  foo ()
  {
    asm volatile ("r1 = 10 + w3 ll");
    return 0;
  }

I get:

  foo:                                    # @foo
          r1 = 10+w3 ll
          r0 = 0
          exit

i.e. `10 + w3' is interpreted as an expression with two operands: the
literal number 10 and a symbol (not a register) `w3'.

If the expression is `w3+10' instead, your parser recognizes the w3 as a
register name and errors out, as expected.

I suppose llvm allows to hook on the expression parser to handle
individual operands.  That's how we handled this in GAS.

>>
>>    We installed a patch in GAS for this.
>>    Jose E. Marchesi
>>    https://sourceware.org/pipermail/binutils/2023-November/130684.html
>>
>>
>> Pending Patches for bpf-next
>> ============================
>>
>>
>> - bpf: avoid VLAs in progs/test_xdp_dynptr.c
>>
>>    In the progs/test_xdp_dynptr.c there are a bunch of VLAs in the
>>    handle_ipv4 and handle_ipv6 functions:
>>
>>      const size_t tcphdr_sz = sizeof(struct tcphdr);
>>      const size_t udphdr_sz = sizeof(struct udphdr);
>>      const size_t ethhdr_sz = sizeof(struct ethhdr);
>>      const size_t iphdr_sz = sizeof(struct iphdr);
>>      const size_t ipv6hdr_sz = sizeof(struct ipv6hdr);
>>           [...]
>>           static __always_inline int handle_ipv6(struct xdp_md *xdp,
>> struct bpf_dynptr *xdp_ptr)
>>      {
>> 	__u8 eth_buffer[ethhdr_sz + ipv6hdr_sz + ethhdr_sz];
>> 	__u8 ip6h_buffer_tcp[ipv6hdr_sz + tcphdr_sz];
>> 	__u8 ip6h_buffer_udp[ipv6hdr_sz + udphdr_sz];
>>    	[...]
>>      }
>>           static __always_inline int handle_ipv6(struct xdp_md *xdp,
>> struct bpf_dynptr *xdp_ptr)
>>      {
>>    	__u8 eth_buffer[ethhdr_sz + ipv6hdr_sz + ethhdr_sz];
>> 	__u8 ip6h_buffer_tcp[ipv6hdr_sz + tcphdr_sz];
>> 	__u8 ip6h_buffer_udp[ipv6hdr_sz + udphdr_sz];
>> 	[...]
>>      }
>>
>>    In both GCC and clang we are not allowing dynamic stack allocation (we
>>    used to support it in GCC using one register as an auxiliary stack
>>    pointer, but not any longer).
>>
>>    The above code builds with clang but not with GCC:
>>
>>      progs/test_xdp_dynptr.c:79:14: error: BPF does not support dynamic stack allocation
>>         79 |         __u8 eth_buffer[ethhdr_sz + iphdr_sz + ethhdr_sz];
>>            |              ^~~~~~~~~~
>>
>>    We are guessing that clang turns these arrays from VLAs into normal
>>    statically sized arrays because ethhdr_sz and friends are constant and
>>    set to sizeof, which is always known at compile time.  This patch
>>    changes the selftest to use preprocessor constants instead of
>>    variables:
>>
>>      #define tcphdr_sz sizeof(struct tcphdr)
>>      #define udphdr_sz sizeof(struct udphdr)
>>      #define ethhdr_sz sizeof(struct ethhdr)
>>      #define iphdr_sz sizeof(struct iphdr)
>>      #define ipv6hdr_sz sizeof(struct ipv6hdr)
>
> Indeed, clang frontend (before generating IR) did some optimization
> and calculates the real array size and that is why dynamic stack
> allocation didn't happen. Since this is an optimizaiton, there is
> no guarantee that frontend is able to calculate the precise
> array size in all cases. See llvm patch https://reviews.llvm.org/D111897.
>
> So your above change looks good to me.

Thanks for confirming.

>
>>
>> - bpf_helpers.h: define bpf_tail_call_static when building with GCC
>>
>> - bpf: fix constraint in test_tcpbpf_kern.c
>>
>>    GCC emits a warning:
>>
>>      progs/test_tcpbpf_kern.c:60:9: error: ‘op’ is used uninitialized [-Werror=uninitialized]
>>
>>    when the uninitialized automatic `op' is used with a "+r" constraint
>>    in:
>>
>> 	asm volatile (
>> 		"%[op] = *(u32 *)(%[skops] +96)"
>> 		: [op] "+r"(op)
>> 		: [skops] "r"(skops)
>> 		:);
>>
>>    The constraint shall be "=r" instead.
>
> We may miss an error case like above in llvm. Will double check.

Ok, thanks.

>>
>>
>> Open Questions
>> ==============
>>
>> - BPF programs including libc headers.
>>
>>    BPF programs run on their own without an operating system or a C
>>    library.  Implementing C implies providing certain definitions and
>>    headers, such as stdint.h and stdarg.h.  For such targets, known as
>>    "bare metal targets", the compiler has to provide these definitions
>>    and headers in order to implement the language.
>>
>>    GCC provides the following C headers for BPF targets:
>>
>>      float.h
>>      gcov.h
>>      iso646.h
>>      limits.h
>>      stdalign.h
>>      stdarg.h
>>      stdatomic.h
>>      stdbool.h
>>      stdckdint.h
>>      stddef.h
>>      stdfix.h
>>      stdint.h
>>      stdnoreturn.h
>>      syslimits.h
>>      tgmath.h
>>      unwind.h
>>      varargs.h
>>
>>    However, we have found that there is at least one BPF kernel self test
>>    that include glibc headers that, indirectly, include glibc's own
>>    definitions of stdint.h and friends.  This leads to compile-time
>>    errors due to conflicting types.  We think that including headers from
>>    a glibc built for some host target is very questionable.  For example,
>>    in BPF a C `char' is defined to be signed.  But if a BPF program
>>    includes glibc headers in an android system, that code will assume an
>>    unsigned char instead.
>
> Currently clang side does not have compiler side bpf specific header so
> we do not have this issues. We do encourage users to use vmlinux.h, e.g.,
> for tracing programs, or for all kinds of programs using kfunc's
> where parameters likely being kernel structures. In the future, kfunc
> definitions are likely to be included in vmlinux.h itself.
> For selftests, we also slowly move to vmlinux.h.

We could change GCC in order to not install these headers, but then BPF
programs will need to get the missing standard C definitions from
somewhere else, be it vmlinux.h or some other source.  It will also make
testing the compiler much more difficult, as lots of the GCC testsuite
assume the avaiability of these standard headers.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BPF GCC status - Nov 2023
  2023-11-29  7:08   ` Jose E. Marchesi
@ 2023-11-29 16:44     ` Yonghong Song
  2023-11-29 17:01       ` Alexei Starovoitov
  2023-11-30 12:13       ` Jose E. Marchesi
  0 siblings, 2 replies; 13+ messages in thread
From: Yonghong Song @ 2023-11-29 16:44 UTC (permalink / raw)
  To: Jose E. Marchesi; +Cc: bpf


On 11/29/23 2:08 AM, Jose E. Marchesi wrote:
>> On 11/28/23 11:23 AM, Jose E. Marchesi wrote:
>>> [During LPC 2023 we talked about improving communication between the GCC
>>>    BPF toolchain port and the kernel side.  This is the first periodical
>>>    report that we plan to publish in the GCC wiki and send to interested
>>>    parties.  Hopefully this will help.]
>>>
>>> GCC wiki page for the port: https://gcc.gnu.org/wiki/BPFBackEnd
>>> IRC channel: #gccbpf at irc.oftc.net.
>>> Help on using the port: gcc@gcc.gnu.org
>>> Patches and/or development discussions: gcc-patches@gnu.org
>> Thanks a lot for detailed report. Really helpful to nail down
>> issues facing one or both compilers. See comments below for
>> some mentioned issues.
>>
>>> Assembler
>>> =========
>> [...]
>>
>>> - In the Pseudo-C syntax register names are not preceded by % characters
>>>     nor any other prefix.  A consequence of that is that in contexts like
>>>     instruction operands, where both register names and expressions
>>>     involving symbols are expected, there is no way to disambiguate
>>>     between them.  GAS was allowing symbols like `w3' or `r5' in syntactic
>>>     contexts where no registers were expected, such as in:
>>>
>>>       r0 = w3 ll  ; GAS interpreted w3 as symbol, clang emits error
>>>
>>>     The clang assembler wasn't allowing that.  During LPC we agreed that
>>>     the simplest approach is to not allow any symbol to have the same name
>>>     than a register, in any context.  So we changed GAS so it now doesn't
>>>     allow to use register names as symbols in any expression, such as:
>>>
>>>       r0 = w3 + 1 ll  ; This now fails for both GAS and llvm.
>>>       r0 = 1 + w3 ll  ; NOTE this does not fail with llvm, but it should.
>> Could you provide a reproducible case above for llvm? llvm does not
>> support syntax like 'r0 = 1 + w3 ll'. For add, it only supports
>> 'r1 += r2' or 'r1 += 100' syntax.
> It is a 128-bit load with an expression.  In compiler explorer, clang:
>
>    int
>    foo ()
>    {
>      asm volatile ("r1 = 10 + w3 ll");
>      return 0;
>    }
>
> I get:
>
>    foo:                                    # @foo
>            r1 = 10+w3 ll
>            r0 = 0
>            exit
>
> i.e. `10 + w3' is interpreted as an expression with two operands: the
> literal number 10 and a symbol (not a register) `w3'.
>
> If the expression is `w3+10' instead, your parser recognizes the w3 as a
> register name and errors out, as expected.
>
> I suppose llvm allows to hook on the expression parser to handle
> individual operands.  That's how we handled this in GAS.

Thanks for the code. I can reproduce the result with compiler explorer.
The following is the link https://godbolt.org/z/GEGexf1Pj
where I added -grecord-gcc-switches to dump compilation flags
into .s file.

The following is the compiler explorer compilation command line:
/opt/compiler-explorer/clang-trunk-20231129/bin/clang-18 -g -o /app/output.s \
   -S --target=bpf -fcolor-diagnostics -gen-reproducer=off -O2 \
   -g -grecord-command-line /app/example.c

I then compile the above C code with
   clang -g -S --target=bpf -fcolor-diagnostics -gen-reproducer=off -O2 -g -grecord-command-line t.c
with identical flags.

I tried locally with llvm16/17/18. They all failed compilation since
'r1 = 10+w3 ll' cannot be recognized by the llvm.
We will investigate why llvm18 in compiler explorer compiles
differently from my local build.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BPF GCC status - Nov 2023
  2023-11-29 16:44     ` Yonghong Song
@ 2023-11-29 17:01       ` Alexei Starovoitov
  2023-11-29 17:44         ` Yonghong Song
  2023-11-30 12:13       ` Jose E. Marchesi
  1 sibling, 1 reply; 13+ messages in thread
From: Alexei Starovoitov @ 2023-11-29 17:01 UTC (permalink / raw)
  To: Yonghong Song; +Cc: Jose E. Marchesi, bpf

On Wed, Nov 29, 2023 at 8:44 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>
>
> On 11/29/23 2:08 AM, Jose E. Marchesi wrote:
> >> On 11/28/23 11:23 AM, Jose E. Marchesi wrote:
> >>> [During LPC 2023 we talked about improving communication between the GCC
> >>>    BPF toolchain port and the kernel side.  This is the first periodical
> >>>    report that we plan to publish in the GCC wiki and send to interested
> >>>    parties.  Hopefully this will help.]
> >>>
> >>> GCC wiki page for the port: https://gcc.gnu.org/wiki/BPFBackEnd
> >>> IRC channel: #gccbpf at irc.oftc.net.
> >>> Help on using the port: gcc@gcc.gnu.org
> >>> Patches and/or development discussions: gcc-patches@gnu.org
> >> Thanks a lot for detailed report. Really helpful to nail down
> >> issues facing one or both compilers. See comments below for
> >> some mentioned issues.
> >>
> >>> Assembler
> >>> =========
> >> [...]
> >>
> >>> - In the Pseudo-C syntax register names are not preceded by % characters
> >>>     nor any other prefix.  A consequence of that is that in contexts like
> >>>     instruction operands, where both register names and expressions
> >>>     involving symbols are expected, there is no way to disambiguate
> >>>     between them.  GAS was allowing symbols like `w3' or `r5' in syntactic
> >>>     contexts where no registers were expected, such as in:
> >>>
> >>>       r0 = w3 ll  ; GAS interpreted w3 as symbol, clang emits error
> >>>
> >>>     The clang assembler wasn't allowing that.  During LPC we agreed that
> >>>     the simplest approach is to not allow any symbol to have the same name
> >>>     than a register, in any context.  So we changed GAS so it now doesn't
> >>>     allow to use register names as symbols in any expression, such as:
> >>>
> >>>       r0 = w3 + 1 ll  ; This now fails for both GAS and llvm.
> >>>       r0 = 1 + w3 ll  ; NOTE this does not fail with llvm, but it should.
> >> Could you provide a reproducible case above for llvm? llvm does not
> >> support syntax like 'r0 = 1 + w3 ll'. For add, it only supports
> >> 'r1 += r2' or 'r1 += 100' syntax.
> > It is a 128-bit load with an expression.  In compiler explorer, clang:
> >
> >    int
> >    foo ()
> >    {
> >      asm volatile ("r1 = 10 + w3 ll");
> >      return 0;
> >    }
> >
> > I get:
> >
> >    foo:                                    # @foo
> >            r1 = 10+w3 ll
> >            r0 = 0
> >            exit
> >
> > i.e. `10 + w3' is interpreted as an expression with two operands: the
> > literal number 10 and a symbol (not a register) `w3'.
> >
> > If the expression is `w3+10' instead, your parser recognizes the w3 as a
> > register name and errors out, as expected.
> >
> > I suppose llvm allows to hook on the expression parser to handle
> > individual operands.  That's how we handled this in GAS.
>
> Thanks for the code. I can reproduce the result with compiler explorer.
> The following is the link https://godbolt.org/z/GEGexf1Pj
> where I added -grecord-gcc-switches to dump compilation flags
> into .s file.
>
> The following is the compiler explorer compilation command line:
> /opt/compiler-explorer/clang-trunk-20231129/bin/clang-18 -g -o /app/output.s \
>    -S --target=bpf -fcolor-diagnostics -gen-reproducer=off -O2 \
>    -g -grecord-command-line /app/example.c
>
> I then compile the above C code with
>    clang -g -S --target=bpf -fcolor-diagnostics -gen-reproducer=off -O2 -g -grecord-command-line t.c
> with identical flags.
>
> I tried locally with llvm16/17/18. They all failed compilation since
> 'r1 = 10+w3 ll' cannot be recognized by the llvm.
> We will investigate why llvm18 in compiler explorer compiles
> differently from my local build.

Is that a different issue from:
https://github.com/compiler-explorer/compiler-explorer/issues/5701
?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BPF GCC status - Nov 2023
  2023-11-29 17:01       ` Alexei Starovoitov
@ 2023-11-29 17:44         ` Yonghong Song
  0 siblings, 0 replies; 13+ messages in thread
From: Yonghong Song @ 2023-11-29 17:44 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Jose E. Marchesi, bpf


On 11/29/23 12:01 PM, Alexei Starovoitov wrote:
> On Wed, Nov 29, 2023 at 8:44 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>>
>> On 11/29/23 2:08 AM, Jose E. Marchesi wrote:
>>>> On 11/28/23 11:23 AM, Jose E. Marchesi wrote:
>>>>> [During LPC 2023 we talked about improving communication between the GCC
>>>>>     BPF toolchain port and the kernel side.  This is the first periodical
>>>>>     report that we plan to publish in the GCC wiki and send to interested
>>>>>     parties.  Hopefully this will help.]
>>>>>
>>>>> GCC wiki page for the port: https://gcc.gnu.org/wiki/BPFBackEnd
>>>>> IRC channel: #gccbpf at irc.oftc.net.
>>>>> Help on using the port: gcc@gcc.gnu.org
>>>>> Patches and/or development discussions: gcc-patches@gnu.org
>>>> Thanks a lot for detailed report. Really helpful to nail down
>>>> issues facing one or both compilers. See comments below for
>>>> some mentioned issues.
>>>>
>>>>> Assembler
>>>>> =========
>>>> [...]
>>>>
>>>>> - In the Pseudo-C syntax register names are not preceded by % characters
>>>>>      nor any other prefix.  A consequence of that is that in contexts like
>>>>>      instruction operands, where both register names and expressions
>>>>>      involving symbols are expected, there is no way to disambiguate
>>>>>      between them.  GAS was allowing symbols like `w3' or `r5' in syntactic
>>>>>      contexts where no registers were expected, such as in:
>>>>>
>>>>>        r0 = w3 ll  ; GAS interpreted w3 as symbol, clang emits error
>>>>>
>>>>>      The clang assembler wasn't allowing that.  During LPC we agreed that
>>>>>      the simplest approach is to not allow any symbol to have the same name
>>>>>      than a register, in any context.  So we changed GAS so it now doesn't
>>>>>      allow to use register names as symbols in any expression, such as:
>>>>>
>>>>>        r0 = w3 + 1 ll  ; This now fails for both GAS and llvm.
>>>>>        r0 = 1 + w3 ll  ; NOTE this does not fail with llvm, but it should.
>>>> Could you provide a reproducible case above for llvm? llvm does not
>>>> support syntax like 'r0 = 1 + w3 ll'. For add, it only supports
>>>> 'r1 += r2' or 'r1 += 100' syntax.
>>> It is a 128-bit load with an expression.  In compiler explorer, clang:
>>>
>>>     int
>>>     foo ()
>>>     {
>>>       asm volatile ("r1 = 10 + w3 ll");
>>>       return 0;
>>>     }
>>>
>>> I get:
>>>
>>>     foo:                                    # @foo
>>>             r1 = 10+w3 ll
>>>             r0 = 0
>>>             exit
>>>
>>> i.e. `10 + w3' is interpreted as an expression with two operands: the
>>> literal number 10 and a symbol (not a register) `w3'.
>>>
>>> If the expression is `w3+10' instead, your parser recognizes the w3 as a
>>> register name and errors out, as expected.
>>>
>>> I suppose llvm allows to hook on the expression parser to handle
>>> individual operands.  That's how we handled this in GAS.
>> Thanks for the code. I can reproduce the result with compiler explorer.
>> The following is the link https://godbolt.org/z/GEGexf1Pj
>> where I added -grecord-gcc-switches to dump compilation flags
>> into .s file.
>>
>> The following is the compiler explorer compilation command line:
>> /opt/compiler-explorer/clang-trunk-20231129/bin/clang-18 -g -o /app/output.s \
>>     -S --target=bpf -fcolor-diagnostics -gen-reproducer=off -O2 \
>>     -g -grecord-command-line /app/example.c
>>
>> I then compile the above C code with
>>     clang -g -S --target=bpf -fcolor-diagnostics -gen-reproducer=off -O2 -g -grecord-command-line t.c
>> with identical flags.
>>
>> I tried locally with llvm16/17/18. They all failed compilation since
>> 'r1 = 10+w3 ll' cannot be recognized by the llvm.
>> We will investigate why llvm18 in compiler explorer compiles
>> differently from my local build.
> Is that a different issue from:
> https://github.com/compiler-explorer/compiler-explorer/issues/5701
> ?
Yes, it is a different one. I verified that the issue #5701 has been fixed.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BPF GCC status - Nov 2023
  2023-11-29 16:44     ` Yonghong Song
  2023-11-29 17:01       ` Alexei Starovoitov
@ 2023-11-30 12:13       ` Jose E. Marchesi
  2023-11-30 14:58         ` Yonghong Song
  1 sibling, 1 reply; 13+ messages in thread
From: Jose E. Marchesi @ 2023-11-30 12:13 UTC (permalink / raw)
  To: Yonghong Song; +Cc: Jose E. Marchesi, bpf


> On 11/29/23 2:08 AM, Jose E. Marchesi wrote:
>>> On 11/28/23 11:23 AM, Jose E. Marchesi wrote:
>>>> [During LPC 2023 we talked about improving communication between the GCC
>>>>    BPF toolchain port and the kernel side.  This is the first periodical
>>>>    report that we plan to publish in the GCC wiki and send to interested
>>>>    parties.  Hopefully this will help.]
>>>>
>>>> GCC wiki page for the port: https://gcc.gnu.org/wiki/BPFBackEnd
>>>> IRC channel: #gccbpf at irc.oftc.net.
>>>> Help on using the port: gcc@gcc.gnu.org
>>>> Patches and/or development discussions: gcc-patches@gnu.org
>>> Thanks a lot for detailed report. Really helpful to nail down
>>> issues facing one or both compilers. See comments below for
>>> some mentioned issues.
>>>
>>>> Assembler
>>>> =========
>>> [...]
>>>
>>>> - In the Pseudo-C syntax register names are not preceded by % characters
>>>>     nor any other prefix.  A consequence of that is that in contexts like
>>>>     instruction operands, where both register names and expressions
>>>>     involving symbols are expected, there is no way to disambiguate
>>>>     between them.  GAS was allowing symbols like `w3' or `r5' in syntactic
>>>>     contexts where no registers were expected, such as in:
>>>>
>>>>       r0 = w3 ll  ; GAS interpreted w3 as symbol, clang emits error
>>>>
>>>>     The clang assembler wasn't allowing that.  During LPC we agreed that
>>>>     the simplest approach is to not allow any symbol to have the same name
>>>>     than a register, in any context.  So we changed GAS so it now doesn't
>>>>     allow to use register names as symbols in any expression, such as:
>>>>
>>>>       r0 = w3 + 1 ll  ; This now fails for both GAS and llvm.
>>>>       r0 = 1 + w3 ll  ; NOTE this does not fail with llvm, but it should.
>>> Could you provide a reproducible case above for llvm? llvm does not
>>> support syntax like 'r0 = 1 + w3 ll'. For add, it only supports
>>> 'r1 += r2' or 'r1 += 100' syntax.
>> It is a 128-bit load with an expression.  In compiler explorer, clang:
>>
>>    int
>>    foo ()
>>    {
>>      asm volatile ("r1 = 10 + w3 ll");
>>      return 0;
>>    }
>>
>> I get:
>>
>>    foo:                                    # @foo
>>            r1 = 10+w3 ll
>>            r0 = 0
>>            exit
>>
>> i.e. `10 + w3' is interpreted as an expression with two operands: the
>> literal number 10 and a symbol (not a register) `w3'.
>>
>> If the expression is `w3+10' instead, your parser recognizes the w3 as a
>> register name and errors out, as expected.
>>
>> I suppose llvm allows to hook on the expression parser to handle
>> individual operands.  That's how we handled this in GAS.
>
> Thanks for the code. I can reproduce the result with compiler explorer.
> The following is the link https://godbolt.org/z/GEGexf1Pj
> where I added -grecord-gcc-switches to dump compilation flags
> into .s file.
>
> The following is the compiler explorer compilation command line:
> /opt/compiler-explorer/clang-trunk-20231129/bin/clang-18 -g -o /app/output.s \
>   -S --target=bpf -fcolor-diagnostics -gen-reproducer=off -O2 \
>   -g -grecord-command-line /app/example.c
>
> I then compile the above C code with
>   clang -g -S --target=bpf -fcolor-diagnostics -gen-reproducer=off -O2 -g -grecord-command-line t.c
> with identical flags.
>
> I tried locally with llvm16/17/18. They all failed compilation since
> 'r1 = 10+w3 ll' cannot be recognized by the llvm.
> We will investigate why llvm18 in compiler explorer compiles
> differently from my local build.

I updated git llvm master today and I managed to reproduce locally with:

jemarch@termi:~/gnu/src/llvm-project/llvm/build$ clang --version
clang version 18.0.0 (https://github.com/llvm/llvm-project.git 586986a063ee4b9a7490aac102e103bab121c764)
Target: unknown
Thread model: posix
InstalledDir: /usr/local/bin
$ cat foo.c
    int
    foo ()
    {
      asm volatile ("r1 = 10 + w3 ll");
      return 0;
    }
$ clang -target bpf -c foo.c
$ llvm-objdump -dr foo.o

foo.o:	file format elf64-bpf

Disassembly of section .text:

0000000000000000 <foo>:
       0:	18 01 00 00 0a 00 00 00 00 00 00 00 00 00 00 00	r1 = 0xa ll
		0000000000000000:  R_BPF_64_64	w3
       2:	b7 00 00 00 00 00 00 00	r0 = 0x0
       3:	95 00 00 00 00 00 00 00	exit

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BPF GCC status - Nov 2023
  2023-11-30 12:13       ` Jose E. Marchesi
@ 2023-11-30 14:58         ` Yonghong Song
  2023-11-30 15:06           ` Jose E. Marchesi
  0 siblings, 1 reply; 13+ messages in thread
From: Yonghong Song @ 2023-11-30 14:58 UTC (permalink / raw)
  To: Jose E. Marchesi; +Cc: Jose E. Marchesi, bpf


On 11/30/23 7:13 AM, Jose E. Marchesi wrote:
>> On 11/29/23 2:08 AM, Jose E. Marchesi wrote:
>>>> On 11/28/23 11:23 AM, Jose E. Marchesi wrote:
>>>>> [During LPC 2023 we talked about improving communication between the GCC
>>>>>     BPF toolchain port and the kernel side.  This is the first periodical
>>>>>     report that we plan to publish in the GCC wiki and send to interested
>>>>>     parties.  Hopefully this will help.]
>>>>>
>>>>> GCC wiki page for the port: https://gcc.gnu.org/wiki/BPFBackEnd
>>>>> IRC channel: #gccbpf at irc.oftc.net.
>>>>> Help on using the port: gcc@gcc.gnu.org
>>>>> Patches and/or development discussions: gcc-patches@gnu.org
>>>> Thanks a lot for detailed report. Really helpful to nail down
>>>> issues facing one or both compilers. See comments below for
>>>> some mentioned issues.
>>>>
>>>>> Assembler
>>>>> =========
>>>> [...]
>>>>
>>>>> - In the Pseudo-C syntax register names are not preceded by % characters
>>>>>      nor any other prefix.  A consequence of that is that in contexts like
>>>>>      instruction operands, where both register names and expressions
>>>>>      involving symbols are expected, there is no way to disambiguate
>>>>>      between them.  GAS was allowing symbols like `w3' or `r5' in syntactic
>>>>>      contexts where no registers were expected, such as in:
>>>>>
>>>>>        r0 = w3 ll  ; GAS interpreted w3 as symbol, clang emits error
>>>>>
>>>>>      The clang assembler wasn't allowing that.  During LPC we agreed that
>>>>>      the simplest approach is to not allow any symbol to have the same name
>>>>>      than a register, in any context.  So we changed GAS so it now doesn't
>>>>>      allow to use register names as symbols in any expression, such as:
>>>>>
>>>>>        r0 = w3 + 1 ll  ; This now fails for both GAS and llvm.
>>>>>        r0 = 1 + w3 ll  ; NOTE this does not fail with llvm, but it should.
>>>> Could you provide a reproducible case above for llvm? llvm does not
>>>> support syntax like 'r0 = 1 + w3 ll'. For add, it only supports
>>>> 'r1 += r2' or 'r1 += 100' syntax.
>>> It is a 128-bit load with an expression.  In compiler explorer, clang:
>>>
>>>     int
>>>     foo ()
>>>     {
>>>       asm volatile ("r1 = 10 + w3 ll");
>>>       return 0;
>>>     }
>>>
>>> I get:
>>>
>>>     foo:                                    # @foo
>>>             r1 = 10+w3 ll
>>>             r0 = 0
>>>             exit
>>>
>>> i.e. `10 + w3' is interpreted as an expression with two operands: the
>>> literal number 10 and a symbol (not a register) `w3'.
>>>
>>> If the expression is `w3+10' instead, your parser recognizes the w3 as a
>>> register name and errors out, as expected.
>>>
>>> I suppose llvm allows to hook on the expression parser to handle
>>> individual operands.  That's how we handled this in GAS.
>> Thanks for the code. I can reproduce the result with compiler explorer.
>> The following is the link https://godbolt.org/z/GEGexf1Pj
>> where I added -grecord-gcc-switches to dump compilation flags
>> into .s file.
>>
>> The following is the compiler explorer compilation command line:
>> /opt/compiler-explorer/clang-trunk-20231129/bin/clang-18 -g -o /app/output.s \
>>    -S --target=bpf -fcolor-diagnostics -gen-reproducer=off -O2 \
>>    -g -grecord-command-line /app/example.c
>>
>> I then compile the above C code with
>>    clang -g -S --target=bpf -fcolor-diagnostics -gen-reproducer=off -O2 -g -grecord-command-line t.c
>> with identical flags.
>>
>> I tried locally with llvm16/17/18. They all failed compilation since
>> 'r1 = 10+w3 ll' cannot be recognized by the llvm.
>> We will investigate why llvm18 in compiler explorer compiles
>> differently from my local build.
> I updated git llvm master today and I managed to reproduce locally with:
>
> jemarch@termi:~/gnu/src/llvm-project/llvm/build$ clang --version
> clang version 18.0.0 (https://github.com/llvm/llvm-project.git 586986a063ee4b9a7490aac102e103bab121c764)
> Target: unknown
> Thread model: posix
> InstalledDir: /usr/local/bin
> $ cat foo.c
>      int
>      foo ()
>      {
>        asm volatile ("r1 = 10 + w3 ll");
>        return 0;
>      }
> $ clang -target bpf -c foo.c
> $ llvm-objdump -dr foo.o
>
> foo.o:	file format elf64-bpf
>
> Disassembly of section .text:
>
> 0000000000000000 <foo>:
>         0:	18 01 00 00 0a 00 00 00 00 00 00 00 00 00 00 00	r1 = 0xa ll
> 		0000000000000000:  R_BPF_64_64	w3
>         2:	b7 00 00 00 00 00 00 00	r0 = 0x0
>         3:	95 00 00 00 00 00 00 00	exit

Could you share the cmake command line options when you build you clang?
My cmake command line looks like
cmake .. -DCMAKE_BUILD_TYPE=Release -G Ninja \
     -DLLVM_ENABLE_PROJECTS="clang;lld;compiler-rt" \
     -DLLVM_TARGETS_TO_BUILD="BPF;X86" \
     -DLLVM_ENABLE_ASSERTIONS=ON \
     -DLLVM_ENABLE_ZLIB=ON \
     -DCMAKE_INSTALL_PREFIX=$PWD/install

and cannot reproduce the issue.
Thanks!


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BPF GCC status - Nov 2023
  2023-11-30 14:58         ` Yonghong Song
@ 2023-11-30 15:06           ` Jose E. Marchesi
  2023-11-30 17:39             ` Yonghong Song
  0 siblings, 1 reply; 13+ messages in thread
From: Jose E. Marchesi @ 2023-11-30 15:06 UTC (permalink / raw)
  To: Yonghong Song; +Cc: Jose E. Marchesi, bpf


> On 11/30/23 7:13 AM, Jose E. Marchesi wrote:
>>> On 11/29/23 2:08 AM, Jose E. Marchesi wrote:
>>>>> On 11/28/23 11:23 AM, Jose E. Marchesi wrote:
>>>>>> [During LPC 2023 we talked about improving communication between the GCC
>>>>>>     BPF toolchain port and the kernel side.  This is the first periodical
>>>>>>     report that we plan to publish in the GCC wiki and send to interested
>>>>>>     parties.  Hopefully this will help.]
>>>>>>
>>>>>> GCC wiki page for the port: https://gcc.gnu.org/wiki/BPFBackEnd
>>>>>> IRC channel: #gccbpf at irc.oftc.net.
>>>>>> Help on using the port: gcc@gcc.gnu.org
>>>>>> Patches and/or development discussions: gcc-patches@gnu.org
>>>>> Thanks a lot for detailed report. Really helpful to nail down
>>>>> issues facing one or both compilers. See comments below for
>>>>> some mentioned issues.
>>>>>
>>>>>> Assembler
>>>>>> =========
>>>>> [...]
>>>>>
>>>>>> - In the Pseudo-C syntax register names are not preceded by % characters
>>>>>>      nor any other prefix.  A consequence of that is that in contexts like
>>>>>>      instruction operands, where both register names and expressions
>>>>>>      involving symbols are expected, there is no way to disambiguate
>>>>>>      between them.  GAS was allowing symbols like `w3' or `r5' in syntactic
>>>>>>      contexts where no registers were expected, such as in:
>>>>>>
>>>>>>        r0 = w3 ll  ; GAS interpreted w3 as symbol, clang emits error
>>>>>>
>>>>>>      The clang assembler wasn't allowing that.  During LPC we agreed that
>>>>>>      the simplest approach is to not allow any symbol to have the same name
>>>>>>      than a register, in any context.  So we changed GAS so it now doesn't
>>>>>>      allow to use register names as symbols in any expression, such as:
>>>>>>
>>>>>>        r0 = w3 + 1 ll  ; This now fails for both GAS and llvm.
>>>>>>        r0 = 1 + w3 ll  ; NOTE this does not fail with llvm, but it should.
>>>>> Could you provide a reproducible case above for llvm? llvm does not
>>>>> support syntax like 'r0 = 1 + w3 ll'. For add, it only supports
>>>>> 'r1 += r2' or 'r1 += 100' syntax.
>>>> It is a 128-bit load with an expression.  In compiler explorer, clang:
>>>>
>>>>     int
>>>>     foo ()
>>>>     {
>>>>       asm volatile ("r1 = 10 + w3 ll");
>>>>       return 0;
>>>>     }
>>>>
>>>> I get:
>>>>
>>>>     foo:                                    # @foo
>>>>             r1 = 10+w3 ll
>>>>             r0 = 0
>>>>             exit
>>>>
>>>> i.e. `10 + w3' is interpreted as an expression with two operands: the
>>>> literal number 10 and a symbol (not a register) `w3'.
>>>>
>>>> If the expression is `w3+10' instead, your parser recognizes the w3 as a
>>>> register name and errors out, as expected.
>>>>
>>>> I suppose llvm allows to hook on the expression parser to handle
>>>> individual operands.  That's how we handled this in GAS.
>>> Thanks for the code. I can reproduce the result with compiler explorer.
>>> The following is the link https://godbolt.org/z/GEGexf1Pj
>>> where I added -grecord-gcc-switches to dump compilation flags
>>> into .s file.
>>>
>>> The following is the compiler explorer compilation command line:
>>> /opt/compiler-explorer/clang-trunk-20231129/bin/clang-18 -g -o /app/output.s \
>>>    -S --target=bpf -fcolor-diagnostics -gen-reproducer=off -O2 \
>>>    -g -grecord-command-line /app/example.c
>>>
>>> I then compile the above C code with
>>>    clang -g -S --target=bpf -fcolor-diagnostics -gen-reproducer=off -O2 -g -grecord-command-line t.c
>>> with identical flags.
>>>
>>> I tried locally with llvm16/17/18. They all failed compilation since
>>> 'r1 = 10+w3 ll' cannot be recognized by the llvm.
>>> We will investigate why llvm18 in compiler explorer compiles
>>> differently from my local build.
>> I updated git llvm master today and I managed to reproduce locally with:
>>
>> jemarch@termi:~/gnu/src/llvm-project/llvm/build$ clang --version
>> clang version 18.0.0 (https://github.com/llvm/llvm-project.git 586986a063ee4b9a7490aac102e103bab121c764)
>> Target: unknown
>> Thread model: posix
>> InstalledDir: /usr/local/bin
>> $ cat foo.c
>>      int
>>      foo ()
>>      {
>>        asm volatile ("r1 = 10 + w3 ll");
>>        return 0;
>>      }
>> $ clang -target bpf -c foo.c
>> $ llvm-objdump -dr foo.o
>>
>> foo.o:	file format elf64-bpf
>>
>> Disassembly of section .text:
>>
>> 0000000000000000 <foo>:
>>         0:	18 01 00 00 0a 00 00 00 00 00 00 00 00 00 00 00	r1 = 0xa ll
>> 		0000000000000000:  R_BPF_64_64	w3
>>         2:	b7 00 00 00 00 00 00 00	r0 = 0x0
>>         3:	95 00 00 00 00 00 00 00	exit
>
> Could you share the cmake command line options when you build you clang?
> My cmake command line looks like
> cmake .. -DCMAKE_BUILD_TYPE=Release -G Ninja \
>     -DLLVM_ENABLE_PROJECTS="clang;lld;compiler-rt" \
>     -DLLVM_TARGETS_TO_BUILD="BPF;X86" \
>     -DLLVM_ENABLE_ASSERTIONS=ON \
>     -DLLVM_ENABLE_ZLIB=ON \
>     -DCMAKE_INSTALL_PREFIX=$PWD/install
>
> and cannot reproduce the issue.

I don't have the original cmake command, I executed it long ago
(rebuilding clang/llvm in my laptop takes three days or more so I do it
incrementally.)

I see this in my CMakeCache.txt:

  LLVM_ENABLE_PROJECTS:STRING=clang
  LLVM_TARGETS_TO_BUILD:STRING=BPF
  LLVM_ENABLE_ASSERTIONS:BOOL=OFF
  LLVM_ENABLE_ZLIB:STRING=ON
  CMAKE_INSTALL_PREFIX:PATH=/usr/local

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BPF GCC status - Nov 2023
  2023-11-30 15:06           ` Jose E. Marchesi
@ 2023-11-30 17:39             ` Yonghong Song
  0 siblings, 0 replies; 13+ messages in thread
From: Yonghong Song @ 2023-11-30 17:39 UTC (permalink / raw)
  To: Jose E. Marchesi; +Cc: Jose E. Marchesi, bpf


On 11/30/23 10:06 AM, Jose E. Marchesi wrote:
>> On 11/30/23 7:13 AM, Jose E. Marchesi wrote:
>>>> On 11/29/23 2:08 AM, Jose E. Marchesi wrote:
>>>>>> On 11/28/23 11:23 AM, Jose E. Marchesi wrote:
>>>>>>> [During LPC 2023 we talked about improving communication between the GCC
>>>>>>>      BPF toolchain port and the kernel side.  This is the first periodical
>>>>>>>      report that we plan to publish in the GCC wiki and send to interested
>>>>>>>      parties.  Hopefully this will help.]
>>>>>>>
>>>>>>> GCC wiki page for the port: https://gcc.gnu.org/wiki/BPFBackEnd
>>>>>>> IRC channel: #gccbpf at irc.oftc.net.
>>>>>>> Help on using the port: gcc@gcc.gnu.org
>>>>>>> Patches and/or development discussions: gcc-patches@gnu.org
>>>>>> Thanks a lot for detailed report. Really helpful to nail down
>>>>>> issues facing one or both compilers. See comments below for
>>>>>> some mentioned issues.
>>>>>>
>>>>>>> Assembler
>>>>>>> =========
>>>>>> [...]
>>>>>>
>>>>>>> - In the Pseudo-C syntax register names are not preceded by % characters
>>>>>>>       nor any other prefix.  A consequence of that is that in contexts like
>>>>>>>       instruction operands, where both register names and expressions
>>>>>>>       involving symbols are expected, there is no way to disambiguate
>>>>>>>       between them.  GAS was allowing symbols like `w3' or `r5' in syntactic
>>>>>>>       contexts where no registers were expected, such as in:
>>>>>>>
>>>>>>>         r0 = w3 ll  ; GAS interpreted w3 as symbol, clang emits error
>>>>>>>
>>>>>>>       The clang assembler wasn't allowing that.  During LPC we agreed that
>>>>>>>       the simplest approach is to not allow any symbol to have the same name
>>>>>>>       than a register, in any context.  So we changed GAS so it now doesn't
>>>>>>>       allow to use register names as symbols in any expression, such as:
>>>>>>>
>>>>>>>         r0 = w3 + 1 ll  ; This now fails for both GAS and llvm.
>>>>>>>         r0 = 1 + w3 ll  ; NOTE this does not fail with llvm, but it should.
>>>>>> Could you provide a reproducible case above for llvm? llvm does not
>>>>>> support syntax like 'r0 = 1 + w3 ll'. For add, it only supports
>>>>>> 'r1 += r2' or 'r1 += 100' syntax.
>>>>> It is a 128-bit load with an expression.  In compiler explorer, clang:
>>>>>
>>>>>      int
>>>>>      foo ()
>>>>>      {
>>>>>        asm volatile ("r1 = 10 + w3 ll");
>>>>>        return 0;
>>>>>      }
>>>>>
>>>>> I get:
>>>>>
>>>>>      foo:                                    # @foo
>>>>>              r1 = 10+w3 ll
>>>>>              r0 = 0
>>>>>              exit
>>>>>
>>>>> i.e. `10 + w3' is interpreted as an expression with two operands: the
>>>>> literal number 10 and a symbol (not a register) `w3'.
>>>>>
>>>>> If the expression is `w3+10' instead, your parser recognizes the w3 as a
>>>>> register name and errors out, as expected.
>>>>>
>>>>> I suppose llvm allows to hook on the expression parser to handle
>>>>> individual operands.  That's how we handled this in GAS.
>>>> Thanks for the code. I can reproduce the result with compiler explorer.
>>>> The following is the link https://godbolt.org/z/GEGexf1Pj
>>>> where I added -grecord-gcc-switches to dump compilation flags
>>>> into .s file.
>>>>
>>>> The following is the compiler explorer compilation command line:
>>>> /opt/compiler-explorer/clang-trunk-20231129/bin/clang-18 -g -o /app/output.s \
>>>>     -S --target=bpf -fcolor-diagnostics -gen-reproducer=off -O2 \
>>>>     -g -grecord-command-line /app/example.c
>>>>
>>>> I then compile the above C code with
>>>>     clang -g -S --target=bpf -fcolor-diagnostics -gen-reproducer=off -O2 -g -grecord-command-line t.c
>>>> with identical flags.
>>>>
>>>> I tried locally with llvm16/17/18. They all failed compilation since
>>>> 'r1 = 10+w3 ll' cannot be recognized by the llvm.
>>>> We will investigate why llvm18 in compiler explorer compiles
>>>> differently from my local build.
>>> I updated git llvm master today and I managed to reproduce locally with:
>>>
>>> jemarch@termi:~/gnu/src/llvm-project/llvm/build$ clang --version
>>> clang version 18.0.0 (https://github.com/llvm/llvm-project.git 586986a063ee4b9a7490aac102e103bab121c764)
>>> Target: unknown
>>> Thread model: posix
>>> InstalledDir: /usr/local/bin
>>> $ cat foo.c
>>>       int
>>>       foo ()
>>>       {
>>>         asm volatile ("r1 = 10 + w3 ll");
>>>         return 0;
>>>       }
>>> $ clang -target bpf -c foo.c
>>> $ llvm-objdump -dr foo.o
>>>
>>> foo.o:	file format elf64-bpf
>>>
>>> Disassembly of section .text:
>>>
>>> 0000000000000000 <foo>:
>>>          0:	18 01 00 00 0a 00 00 00 00 00 00 00 00 00 00 00	r1 = 0xa ll
>>> 		0000000000000000:  R_BPF_64_64	w3
>>>          2:	b7 00 00 00 00 00 00 00	r0 = 0x0
>>>          3:	95 00 00 00 00 00 00 00	exit
>> Could you share the cmake command line options when you build you clang?
>> My cmake command line looks like
>> cmake .. -DCMAKE_BUILD_TYPE=Release -G Ninja \
>>      -DLLVM_ENABLE_PROJECTS="clang;lld;compiler-rt" \
>>      -DLLVM_TARGETS_TO_BUILD="BPF;X86" \
>>      -DLLVM_ENABLE_ASSERTIONS=ON \
>>      -DLLVM_ENABLE_ZLIB=ON \
>>      -DCMAKE_INSTALL_PREFIX=$PWD/install
>>
>> and cannot reproduce the issue.
> I don't have the original cmake command, I executed it long ago
> (rebuilding clang/llvm in my laptop takes three days or more so I do it
> incrementally.)
>
> I see this in my CMakeCache.txt:
>
>    LLVM_ENABLE_PROJECTS:STRING=clang
>    LLVM_TARGETS_TO_BUILD:STRING=BPF
>    LLVM_ENABLE_ASSERTIONS:BOOL=OFF
>    LLVM_ENABLE_ZLIB:STRING=ON
>    CMAKE_INSTALL_PREFIX:PATH=/usr/local

Thanks for your cmake command line options. Looks like the reason is due to LLVM_ENABLE_ASSERTIONS=OFF while in my case
LLVM_ENABLE_ASSERTIONS=ON.

The related function in llvm is:

static void printExpr(const MCExpr *Expr, raw_ostream &O) {
#ifndef NDEBUG
   const MCSymbolRefExpr *SRE;

   if (const MCBinaryExpr *BE = dyn_cast<MCBinaryExpr>(Expr))
     SRE = dyn_cast<MCSymbolRefExpr>(BE->getLHS());
   else
     SRE = dyn_cast<MCSymbolRefExpr>(Expr);
   assert(SRE && "Unexpected MCExpr type.");

   MCSymbolRefExpr::VariantKind Kind = SRE->getKind();

   assert(Kind == MCSymbolRefExpr::VK_None);
#endif
   O << *Expr;
}

If LLVM_ENABLE_ASSERTIONS=ON, NDEBUG will not be defined
and 'assert' will actually do assertion.
If LLVM_ENABLE_ASSERTIONS=OFF, NDEBUG will be defined
and 'assert' will be a noop.

That is why ASSERTIONS OFF flag is okay while ASSERTIONS ON
will cause the following error:
    $ clang --target=bpf -g -S -O2 t.c
    clang: ../lib/Target/BPF/MCTargetDesc/BPFInstPrinter.cpp:46: void printExpr(const llvm::MCExpr*, llvm::raw_ostream&):
        Assertion `SRE && "Unexpected MCExpr type."' failed.
    .... stack trace etc. ....

I also tried with my local redhat built clang15 and it didn't produce error either.
   $ /bin/clang --target=bpf -g -S -O2 t.c
   $ rpm -qf /bin/clang
   clang-16.0.6-1.el9.x86_64
Looks like their cmake options does not have LLVM_ENABLE_ASSERTIONS at all
which I assume is OFF. See
   https://gitlab.com/redhat/centos-stream/rpms/clang/-/blob/4fcf8241b99430ba239c5461b962fea1f3107a22/clang.spec

clang really does not support this syntax:
    r1 = 10 + w3 ll

The following clang patch will emit error regardless of LLVM_ENABLE_ASSERTIONS value.

======

diff --git a/llvm/lib/Target/BPF/MCTargetDesc/BPFInstPrinter.cpp b/llvm/lib/Target/BPF/MCTargetDesc/BPFInstPrinter.cpp
index 15ab55f95e69..c266538bec73 100644
--- a/llvm/lib/Target/BPF/MCTargetDesc/BPFInstPrinter.cpp
+++ b/llvm/lib/Target/BPF/MCTargetDesc/BPFInstPrinter.cpp
@@ -36,15 +36,16 @@ void BPFInstPrinter::printInst(const MCInst *MI, uint64_t Address,
  }
  
  static void printExpr(const MCExpr *Expr, raw_ostream &O) {
-#ifndef NDEBUG
    const MCSymbolRefExpr *SRE;
  
    if (const MCBinaryExpr *BE = dyn_cast<MCBinaryExpr>(Expr))
      SRE = dyn_cast<MCSymbolRefExpr>(BE->getLHS());
    else
      SRE = dyn_cast<MCSymbolRefExpr>(Expr);
-  assert(SRE && "Unexpected MCExpr type.");
+  if (!SRE)
+    report_fatal_error("Unexpected MCExpr type.");
  
+#ifndef NDEBUG
    MCSymbolRefExpr::VariantKind Kind = SRE->getKind();
  
    assert(Kind == MCSymbolRefExpr::VK_None);

=======


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: BPF GCC status - Nov 2023
  2023-11-28 16:23 BPF GCC status - Nov 2023 Jose E. Marchesi
  2023-11-29  5:50 ` Yonghong Song
@ 2023-11-30 18:27 ` Andrii Nakryiko
  2023-11-30 19:49   ` Jose E. Marchesi
  1 sibling, 1 reply; 13+ messages in thread
From: Andrii Nakryiko @ 2023-11-30 18:27 UTC (permalink / raw)
  To: Jose E. Marchesi; +Cc: bpf

On Tue, Nov 28, 2023 at 8:23 AM Jose E. Marchesi
<jose.marchesi@oracle.com> wrote:
>
>
> [During LPC 2023 we talked about improving communication between the GCC
>  BPF toolchain port and the kernel side.  This is the first periodical
>  report that we plan to publish in the GCC wiki and send to interested
>  parties.  Hopefully this will help.]
>

[...]

> Open Questions
> ==============
>
> - BPF programs including libc headers.
>
>   BPF programs run on their own without an operating system or a C
>   library.  Implementing C implies providing certain definitions and
>   headers, such as stdint.h and stdarg.h.  For such targets, known as
>   "bare metal targets", the compiler has to provide these definitions
>   and headers in order to implement the language.
>
>   GCC provides the following C headers for BPF targets:
>
>     float.h
>     gcov.h
>     iso646.h
>     limits.h
>     stdalign.h
>     stdarg.h
>     stdatomic.h
>     stdbool.h
>     stdckdint.h
>     stddef.h
>     stdfix.h
>     stdint.h
>     stdnoreturn.h
>     syslimits.h
>     tgmath.h
>     unwind.h
>     varargs.h
>
>   However, we have found that there is at least one BPF kernel self test
>   that include glibc headers that, indirectly, include glibc's own
>   definitions of stdint.h and friends.  This leads to compile-time
>   errors due to conflicting types.  We think that including headers from
>   a glibc built for some host target is very questionable.  For example,
>   in BPF a C `char' is defined to be signed.  But if a BPF program
>   includes glibc headers in an android system, that code will assume an
>   unsigned char instead.
>

Do you have a list of those tests?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BPF GCC status - Nov 2023
  2023-11-30 18:27 ` Andrii Nakryiko
@ 2023-11-30 19:49   ` Jose E. Marchesi
  2023-12-01 21:38     ` Andrii Nakryiko
  0 siblings, 1 reply; 13+ messages in thread
From: Jose E. Marchesi @ 2023-11-30 19:49 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf


> On Tue, Nov 28, 2023 at 8:23 AM Jose E. Marchesi
> <jose.marchesi@oracle.com> wrote:
>>
>>
>> [During LPC 2023 we talked about improving communication between the GCC
>>  BPF toolchain port and the kernel side.  This is the first periodical
>>  report that we plan to publish in the GCC wiki and send to interested
>>  parties.  Hopefully this will help.]
>>
>
> [...]
>
>> Open Questions
>> ==============
>>
>> - BPF programs including libc headers.
>>
>>   BPF programs run on their own without an operating system or a C
>>   library.  Implementing C implies providing certain definitions and
>>   headers, such as stdint.h and stdarg.h.  For such targets, known as
>>   "bare metal targets", the compiler has to provide these definitions
>>   and headers in order to implement the language.
>>
>>   GCC provides the following C headers for BPF targets:
>>
>>     float.h
>>     gcov.h
>>     iso646.h
>>     limits.h
>>     stdalign.h
>>     stdarg.h
>>     stdatomic.h
>>     stdbool.h
>>     stdckdint.h
>>     stddef.h
>>     stdfix.h
>>     stdint.h
>>     stdnoreturn.h
>>     syslimits.h
>>     tgmath.h
>>     unwind.h
>>     varargs.h
>>
>>   However, we have found that there is at least one BPF kernel self test
>>   that include glibc headers that, indirectly, include glibc's own
>>   definitions of stdint.h and friends.  This leads to compile-time
>>   errors due to conflicting types.  We think that including headers from
>>   a glibc built for some host target is very questionable.  For example,
>>   in BPF a C `char' is defined to be signed.  But if a BPF program
>>   includes glibc headers in an android system, that code will assume an
>>   unsigned char instead.
>>
>
> Do you have a list of those tests?

For example:

  progs/test_cls_redirect.c
  progs/test_cls_redirect_dynptr.c
  progs/test_cls_redirect_subprogs.c

they include linux/icmp.h that, in turn:

 linux/icmp.h <- linux/if.h <- sys/socket.h <- bits/socket.h <- sys/types.h

If BPF programs are expected to be able to liberally include kernel
headers that, in turn, may include glibc headers, then it is gonna be
very difficult to consistently avoid these conflicts..

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BPF GCC status - Nov 2023
  2023-11-30 19:49   ` Jose E. Marchesi
@ 2023-12-01 21:38     ` Andrii Nakryiko
  0 siblings, 0 replies; 13+ messages in thread
From: Andrii Nakryiko @ 2023-12-01 21:38 UTC (permalink / raw)
  To: Jose E. Marchesi; +Cc: bpf

On Thu, Nov 30, 2023 at 11:50 AM Jose E. Marchesi
<jose.marchesi@oracle.com> wrote:
>
>
> > On Tue, Nov 28, 2023 at 8:23 AM Jose E. Marchesi
> > <jose.marchesi@oracle.com> wrote:
> >>
> >>
> >> [During LPC 2023 we talked about improving communication between the GCC
> >>  BPF toolchain port and the kernel side.  This is the first periodical
> >>  report that we plan to publish in the GCC wiki and send to interested
> >>  parties.  Hopefully this will help.]
> >>
> >
> > [...]
> >
> >> Open Questions
> >> ==============
> >>
> >> - BPF programs including libc headers.
> >>
> >>   BPF programs run on their own without an operating system or a C
> >>   library.  Implementing C implies providing certain definitions and
> >>   headers, such as stdint.h and stdarg.h.  For such targets, known as
> >>   "bare metal targets", the compiler has to provide these definitions
> >>   and headers in order to implement the language.
> >>
> >>   GCC provides the following C headers for BPF targets:
> >>
> >>     float.h
> >>     gcov.h
> >>     iso646.h
> >>     limits.h
> >>     stdalign.h
> >>     stdarg.h
> >>     stdatomic.h
> >>     stdbool.h
> >>     stdckdint.h
> >>     stddef.h
> >>     stdfix.h
> >>     stdint.h
> >>     stdnoreturn.h
> >>     syslimits.h
> >>     tgmath.h
> >>     unwind.h
> >>     varargs.h
> >>
> >>   However, we have found that there is at least one BPF kernel self test
> >>   that include glibc headers that, indirectly, include glibc's own
> >>   definitions of stdint.h and friends.  This leads to compile-time
> >>   errors due to conflicting types.  We think that including headers from
> >>   a glibc built for some host target is very questionable.  For example,
> >>   in BPF a C `char' is defined to be signed.  But if a BPF program
> >>   includes glibc headers in an android system, that code will assume an
> >>   unsigned char instead.
> >>
> >
> > Do you have a list of those tests?
>
> For example:
>
>   progs/test_cls_redirect.c
>   progs/test_cls_redirect_dynptr.c
>   progs/test_cls_redirect_subprogs.c
>
> they include linux/icmp.h that, in turn:
>
>  linux/icmp.h <- linux/if.h <- sys/socket.h <- bits/socket.h <- sys/types.h
>
> If BPF programs are expected to be able to liberally include kernel
> headers that, in turn, may include glibc headers, then it is gonna be
> very difficult to consistently avoid these conflicts..

I don't think anyone ever promised this. I think it should be fine to
convert all those tests to use vmlinux.h, but we'll need to add a
bunch of #defines from UAPI headers (like TC_ACT_SHOT). I see we have
tons of those in progs/xdp_synproxy_kern.c, so maybe just extracting
that into a separate header would be enough.

Ideally TC_ACT_SHOT and other stuff is an enum in UAPI and is just
added into vmlinux.h, but someone would need to propose this upstream
and do the changes.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2023-12-01 21:38 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-28 16:23 BPF GCC status - Nov 2023 Jose E. Marchesi
2023-11-29  5:50 ` Yonghong Song
2023-11-29  7:08   ` Jose E. Marchesi
2023-11-29 16:44     ` Yonghong Song
2023-11-29 17:01       ` Alexei Starovoitov
2023-11-29 17:44         ` Yonghong Song
2023-11-30 12:13       ` Jose E. Marchesi
2023-11-30 14:58         ` Yonghong Song
2023-11-30 15:06           ` Jose E. Marchesi
2023-11-30 17:39             ` Yonghong Song
2023-11-30 18:27 ` Andrii Nakryiko
2023-11-30 19:49   ` Jose E. Marchesi
2023-12-01 21:38     ` Andrii Nakryiko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox