From: "Jose E. Marchesi" <jose.marchesi@oracle.com>
To: bpf@vger.kernel.org
Cc: david.faust@oracle.com, elena.zannoni@oracle.com,
David Malcolm <dmalcolm@redhat.com>,
Nick Desaulniers <ndesaulniers@google.com>,
Julia Lawall <julia.lawall@inria.fr>
Subject: Follow up from the btf_type_tag discussion in the BPF office hours
Date: Thu, 15 Dec 2022 19:43:18 +0100 [thread overview]
Message-ID: <87o7s4ece1.fsf@oracle.com> (raw)
Of the two problems discussed:
1. DW_TAG_LLVM_annotation not being able to denote annotations to
non-pointed based types. clang currently ignores these instances.
We discussed two possible options to deal with this:
1.1 To continue ignoring these cases in the front-end, keep the dwarf
expressiveness limitation, and document it.
1.2 To change DW_TAG_LLVM_annotation so it behaves like a qualifier
DIE (like const, volatile, etc.) so it can apply to any type.
2. The ordering problem: sparse annotations order differently than
GNU/C99 compiler attributes. Therefore translating 1-to-1 from
sparse annotations to compiler attributes results in attributes with
different syntax than normal compiler attributes.
This was accepted in clang.
But found resistance in GCC when we sent the first patch series.
During the meeting we went thru several possible ways of dealing with
this problem, but we didn't reach any conclusion on what to do, since
the time ran out.
We agreed to continue the discussion at the BPF office hours next 5
January 2023.
In the meanwhile, below in this email is a slightly updated version of
the material used to go thru the topics during the discussion. If there
is any mistake or if you see that our understanding of the
problem/situation is not correct, please point it out. If you want to
add more information, please do so by replying to this thread.
Finally, it was agreed that we (GCC BPF hackers) would send Yonghong our
github accounts so he can subscribe us to notifications in the llvm
phabricator, so we can be aware of potentially ABI/breaking changes at
the time they are discussed, and not afterwards scanning bpf@vger. I
alredy sent him the information.
Thank you for your time today. It is appreciated.
----
* Background
** Attribute/annotation ordering in declarations
- By "ordering" in this context we understand the correspondence of
annotations (like compiler attributes) with language entities.
Given:
: char * __attribute ((btf_type_tag("foo"))) * buf;
What is the language entity (type) to which the attribute
applies? Is it char? Is it char*? Is it char**?
- This is decided by compilers at parse-time and reflected in the
structure of the AST.
- Therefore this is a source-level concept, and it is not amenable
by generating debug info with a different ordering.
** Compilers implement different orderings for different kinds of attributes
- The sparse compiler implements a particular ordering for sparse
annotations, which look like:
: __attribute__ ((address_space(user)))
The ordering is not documented, but can be determined by running
sparse and/or by looking at the sparse parser source code.
[The fact sparse annotations look like compiler GNU-like
attributes must not mislead us: they are not the same and they
don't order the same way.]
- The GCC and clang compilers implement another particular ordering
for GNU-like compiler attributes, which have the form:
: __attribute__ ((noinline))
The ordering is specified in the "Attribute Syntax" chapter in
the GCC manual.
- The GCC and clang compilers implement another particular ordering
for C2x compiler attributes, which have the form:
: [[noinline]]
The ordering is specified in the C2x specification.
All three entities above, sparse annotations, GNU-like compiler
attributes and C2x compiler attributes use different orderings,
i.e. the correspondence of the annotation/attribute with the
language entity it annotates (type, declaration, ...) varies.
* The problem
** Re-use of sparse annotations as compiler attributes
- At some point we got the request from the BPF community to support
a couple of new compiler attributes in GCC:
: btf_type_tag ("foobar")
: btf_decl_tag ("foobar")
These attributes translate into annotations in both BTF and DWARF
debugging information.
clang/llvm had already been modified to support these attributes,
the new BTF kind, and a new DW_TAG_LLVM_annotation DWARF DIE type.
- David Faust wrote an implementation, and used as a criteria the
several examples he saw in the kernel sources and what clang
generates. He added the same support in BTF and a compatible
DW_TAG_GNU_annotation for DWARF.
He obviously implemented the new compiler attributes as what they
are: GNU-like compiler attributes, and saw that they were being
ordered differently to what the kernel sources would expect.
- Then we investigated and learned that the intention at the BPF
side was to re-use the already existing address_space sparse
annotations in the kernel sources as btf_*_tag compiler
attributes.
- At present, it looks like support for several other sparse-like
attributes have been added to clang (which ones?) and as a result
now clang effectively has GNU-like compiler attributes that order
differently than all other attributes.
- At LPC 2022 we met David Malcolm, who has been working in
implementing sparse annotations and warnings natively in GCC. He
found similar problems with the ordering of the attributes.
** Scope of the problem
- This happens when sparse annotations are re-used as compiler
attributes.
- sparse annotations are NOT compiler attributes => there is NO bug
in sparse.
- These annotations are used very widely in the Linux kernel sources.
- Compilers involved: sparse, GCC and clang
- sparse annotations can appear in both BPF C programs and kernel
source files.
** Ordering discrepancies between sparse annotations and compiler attributes
Using these definitions:
: #ifdef __CHECKER__
: #define __tag1 __attribute__((noderef, address_space(__tag1)))
: #define __tag2 __attribute__((noderef, address_space(__tag2)))
: #else
: #define __tag1 __attribute__((btf_type_tag("tag1")))
: #define __tag2 __attribute__((btf_type_tag("tag2")))
: #endif
__CHECKER__ is defined by sparse, so the former definitions are
used * while compiling the examples with sparse.
The latter definitions with btf_type_tag are used when compiling
with GCC or clang.
Compilers compared:
- sparse
+ Ordering information is obtained via sparse debug printed as
part of a warning.
- GCC
+ Slightly outdated 'master' branch, with the patches to support
btf_type_tag and btf_decl_tag applied:
https://gcc.gnu.org/git/gcc.git:refs/users/dfaust/heads/btf-type-tag-new-rebase
+ Ordering information is obtained via the generated BTF and
DWARF. Note that we have verified that the BTF and DWARF
results are accurately representative of the GCC internal
compiler structures which produce it, and that therefore they
reflect how the compiler has parsed the declarations.
- clang
+ clang version 16.0.0. main branch of December 14 2022.
+ Ordering information is obtained via the generated BTF and
DWARF.
Probably the simplest case involving pointers, shows how both
compilers are mangling the generated BTF in order to reflect the
sparse expectations (presumably the same hack is in pahole when
it does the DWARF -> BTF translation):
: char __tag1 * buf;
- sparse
+ __tag1 applies to type char.
: got char [noderef] __tag1 *[addressable] [toplevel] buf
- clang
+ According to DWARF __tag1 applies to type char
+ According to BTF __tag1 applies to type char
: DWARF
: 0x00000023: DW_TAG_variable
: DW_AT_name ("buf")
: DW_AT_type (0x0000002e "char *")
:
: 0x0000002e: DW_TAG_pointer_type
: DW_AT_type (0x00000037 "char")
:
: 0x00000033: DW_TAG_LLVM_annotation
: DW_AT_name ("btf_type_tag")
: DW_AT_const_value ("tag1")
: BTF
: [1] TYPE_TAG 'tag1' type_id=3
: [2] PTR '(anon)' type_id=1
: [3] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=SIGNED
: [4] VAR 'buf' type_id=2, linkage=global
:
: buf -> ptr -> 'tag1' -> char
- GCC
+ According to DWARF __tag1 applies to type char
+ According to BTF __tag1 applies to type char
: DWARF
: 0x0000002e: DW_TAG_variable
: DW_AT_name ("buf")
: DW_AT_type (0x00000044 "char *")
:
: 0x00000044: DW_TAG_pointer_type
: DW_AT_type (0x00000057 "char")
:
: 0x0000004d: DW_TAG_LLVM_annotation
: DW_AT_name ("btf_type_tag")
: DW_AT_const_value ("tag1")
:
: BTF
: [1] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=SIGNED
: [2] PTR '(anon)' type_id=3
: [3] TYPE_TAG 'tag1' type_id=1
: [4] VAR 'buf' type_id=2, linkage=global
:
: buf -> ptr -> 'tag1' -> char
The next example is from from bpf-next/arch/x86/kernel/kgdb.c:184
and shows cases where both the DWARF and the BTF are different:
: struct perf_event * __percpu *pev;
- sparse
+ __percpu applies to struct perf_event*
: got struct perf_event *[noderef] __percpu *[addressable] [toplevel] pev
- clang
+ According to DWARF __percpu applies to struct perf_event*
+ According to BTF __percpu applies to struct perf_event*
: DWARF
: 0x00000023: DW_TAG_variable
: DW_AT_name ("pev")
: DW_AT_type (0x0000002e "perf_event **")
:
: 0x0000002e: DW_TAG_pointer_type
: DW_AT_type (0x00000037 "perf_event *")
:
: 0x00000033: DW_TAG_LLVM_annotation
: DW_AT_name ("btf_type_tag")
: DW_AT_const_value ("percpu")
:
: 0x00000037: DW_TAG_pointer_type
: DW_AT_type (0x0000003c "perf_event")
: BTF
: [2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
: [4] PTR '(anon)' type_id=5
: [5] STRUCT 'perf_event' size=4 vlen=1
: 'a' type_id=2 bits_offset=0
: [8] TYPE_TAG 'percpu' type_id=4
: [9] PTR '(anon)' type_id=8
: [13] VAR 'pev' type_id=9, linkage=global
:
: pev -> ptr -> 'percpu' -> ptr -> struct perf_event
- GCC
+ According to DWARF __percpu applies to struct perf_event
+ According to BTF __percpu applies to struct perf_event
: DWARF
: 0x00000023: DW_TAG_variable
: DW_AT_name ("pev")
: DW_AT_type (0x00000039 "perf_event **")
:
: 0x00000039: DW_TAG_pointer_type (perf_event **)
: DW_AT_type (0x0000003f "perf_event *")
:
: 0x0000003f: DW_TAG_pointer_type (perf_event *)
: DW_AT_type (0x0000001e "perf_event")
:
: 0x00000045: DW_TAG_LLVM_annotation
: DW_AT_name ("btf_type_tag")
: DW_AT_const_value ("percpu")
: BTF
: [1] STRUCT 'perf_event' size=4 vlen=1
: 'a' type_id=2 bits_offset=0
: [2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
: [3] PTR '(anon)' type_id=4
: [4] TYPE_TAG 'percpu' type_id=1
: [5] PTR '(anon)' type_id=3
: [14] VAR 'pev' type_id=5, linkage=global
:
: pev -> ptr -> ptr -> 'percpu' -> perf_event
The following case shows two different attributes applied to
pointers-to-pointers:
: int __tag1 * __tag2 * g;
- sparse
+ __tag1 applies to int, __tag2 applies to int*
: got int [noderef] __tag1 *[noderef] __tag2 *[addressable] [toplevel] g
- clang
+ According to DWARF __tag1 applies to int, __tag2 applies to int*
+ According to BTF __tag1 applies to int, __tag2 applies to int*
: DWARF
: 0x00000023: DW_TAG_variable
: DW_AT_name ("g")
: DW_AT_type (0x0000002e "int **")
:
: 0x0000002e: DW_TAG_pointer_type
: DW_AT_type (0x00000037 "int *")
:
: 0x00000033: DW_TAG_LLVM_annotation
: DW_AT_name ("btf_type_tag")
: DW_AT_const_value ("tag2")
:
: 0x00000037: DW_TAG_pointer_type
: DW_AT_type (0x00000040 "int")
:
: 0x0000003c: DW_TAG_LLVM_annotation
: DW_AT_name ("btf_type_tag")
: DW_AT_const_value ("tag1")
: BTF
: [1] TYPE_TAG 'tag2' type_id=4
: [2] PTR '(anon)' type_id=1
: [3] TYPE_TAG 'tag1' type_id=5
: [4] PTR '(anon)' type_id=3
: [5] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
: [6] VAR 'g' type_id=2, linkage=global
:
: 'g' -> ptr -> 'tag2' -> ptr -> 'tag1' -> int
- GCC
+ According to DWARF __tag1 applies to int*, __tag2 applies to int
+ According to BTF __tag1 applies to int*, __tag2 applies to int
: DWARF
: 0x0000002e: DW_TAG_variable
: DW_AT_name ("g")
: DW_AT_type (0x00000042 "int **")
:
: 0x00000042: DW_TAG_pointer_type
: DW_AT_type (0x00000055 "int *")
:
: 0x0000004b: DW_TAG_LLVM_annotation
: DW_AT_name ("btf_type_tag")
: DW_AT_const_value ("tag1")
:
: 0x00000055: DW_TAG_pointer_type
: DW_AT_type (0x00000068 "int")
:
: 0x0000005e: DW_TAG_LLVM_annotation
: DW_AT_name ("btf_type_tag")
: DW_AT_const_value ("tag2")
: BTF
: [1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
: [2] PTR '(anon)' type_id=3
: [3] TYPE_TAG 'tag2' type_id=1
: [4] PTR '(anon)' type_id=5
: [5] TYPE_TAG 'tag1' type_id=2
: [6] VAR 'g' type_id=4, linkage=global
:
: 'g' -> ptr -> 'tag1' -> ptr -> 'tag2' -> int
What follows is a typical kernel example. We get "right" results
in GCC just because both tags are the same attributes, but
effectively we hit the same problem than in the previous example:
: int __tag1 * __tag1 * val;
- sparse
+ __tag1 applies to both int and int*
: got int [noderef] __tag1 *[noderef] __tag1 *[addressable] [toplevel] val
- clang
+ According to DWARF __tag1 applies to both int and int*
+ According to BTF __tag1 applies to both int and int*
: DWARF
: 0x0000001e: DW_TAG_variable
: DW_AT_name ("val")
: DW_AT_type (0x00000029 "int **")
:
: 0x00000029: DW_TAG_pointer_type
: DW_AT_type (0x00000032 "int *")
:
: 0x0000002e: DW_TAG_LLVM_annotation
: DW_AT_name ("btf_type_tag")
: DW_AT_const_value ("tag1")
:
: 0x00000032: DW_TAG_pointer_type
: DW_AT_type (0x0000003b "int")
:
: 0x00000037: DW_TAG_LLVM_annotation
: DW_AT_name ("btf_type_tag")
: DW_AT_const_value ("tag1")
: BTF
: [1] TYPE_TAG 'tag1' type_id=4
: [2] PTR '(anon)' type_id=1
: [3] TYPE_TAG 'tag1' type_id=5
: [4] PTR '(anon)' type_id=3
: [5] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
: [6] VAR 'val' type_id=2, linkage=global
:
: 'val' -> ptr -> 'tag1' -> ptr -> 'tag1' -> int
- GCC
+ According to DWARF __tag1 applies to both int and int*
+ According to BTF __tag1 applies to both int and int*
: DWARF
: 0x0000001e: DW_TAG_variable
: DW_AT_name ("val")
: DW_AT_type (0x00000034 "int **")
:
: 0x00000034: DW_TAG_pointer_type
: DW_AT_type (0x00000047 "int *")
:
: 0x0000003d: DW_TAG_LLVM_annotation
: DW_AT_name ("btf_type_tag")
: DW_AT_const_value ("tag1")
:
: 0x00000047: DW_TAG_pointer_type
: DW_AT_type (0x0000005a "int")
:
: 0x00000050: DW_TAG_LLVM_annotation
: DW_AT_name ("btf_type_tag")
: DW_AT_const_value ("tag1")
: BTF
: [1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
: [2] PTR '(anon)' type_id=3
: [3] TYPE_TAG 'tag1' type_id=1
: [4] PTR '(anon)' type_id=5
: [5] TYPE_TAG 'tag1' type_id=2
: [6] VAR 'val' type_id=4, linkage=global
:
: 'val' -> ptr -> 'tag1' -> ptr -> 'tag1' -> int
This last example is another variation of pointer-to-pointer with
different tags and different positions:
: int * __tag1 * __tag2 h;
- sparse
+ __tag1 applies to int*, __tag2 applies to int**
: got int *[noderef] __tag1 *[addressable] [noderef] [toplevel] __tag2 h
- clang
+ According to DWARF __tag1 applies to int*, no __tag2 (??).
+ According to BTF __tag1 applies to int*, no __tag2 (??).
: DWARF
: 0x00000023: DW_TAG_variable
: DW_AT_name ("h")
: DW_AT_type (0x0000002e "int **")
:
: 0x0000002e: DW_TAG_pointer_type
: DW_AT_type (0x00000037 "int *")
:
: 0x00000033: DW_TAG_LLVM_annotation
: DW_AT_name ("btf_type_tag")
: DW_AT_const_value ("tag1")
: BTF
: [1] TYPE_TAG 'tag1' type_id=3
: [2] PTR '(anon)' type_id=1
: [3] PTR '(anon)' type_id=4
: [4] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
: [5] VAR 'h' type_id=2, linkage=global
:
: 'h' -> ptr -> 'tag1' -> ptr -> int
- GCC
+ According to DWARF __tag1 applies to int*, __tag2 applies to int**
+ According to BTF __tag1 applies to int, __tag2 applies to int*
: DWARF
: 0x0000002e: DW_TAG_variable
: DW_AT_name ("h")
: DW_AT_type (0x00000042 "int **")
:
: 0x00000042: DW_TAG_pointer_type
: DW_AT_type (0x00000055 "int *")
:
: 0x0000004b: DW_TAG_LLVM_annotation
: DW_AT_name ("btf_type_tag")
: DW_AT_const_value ("tag2")
:
: 0x00000055: DW_TAG_pointer_type
: DW_AT_type (0x00000068 "int")
:
: 0x0000005e: DW_TAG_LLVM_annotation
: DW_AT_name ("btf_type_tag")
: DW_AT_const_value ("tag1")
: BTF
: [1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
: [2] PTR '(anon)' type_id=3
: [3] TYPE_TAG 'tag1' type_id=1
: [4] PTR '(anon)' type_id=5
: [5] TYPE_TAG 'tag2' type_id=2
: [6] VAR 'h' type_id=4, linkage=global
:
: 'h' -> ptr -> 'tag2' -> ptr -> 'tag1' -> int
* Solutions
** Reorder sparse annotations in kernel source
[Julia Jawall kindly adapted coccinelle so it can handle this and
the following possible solution involving modifying kernel
sources.]
This would basically involve writing and applying a coccinelle
script to reorder something like this:
: int __tag1 * __tag2 * g;
into this
: int __tag2 * __tag1 * g;
where
: #define __tag1 __attribute__((address_space(__tag1)))
This totally breaks sparse so it is hardly a realistic solution.
** Add compiler attributes corresponding to existing sparse annotations to the kernel source
This would involve writing and applying a coccinelle script to add
compiler attributes corresponding to the existing sparse
annotations.
So from this:
: int __tag1 * __tag2 * g;
We would go to:
: int __tag1 __comp_tag2 * __tag2 __comp_tag1 * g;
where
: #if __CHECKER__
: # define __tag1 __attribute__((address_space(__tag1)))
: #else
: # define __tag1
: #endif
: #define __comp_tag1 __attribute__((btf_type_tag("tag1")))
This would not break sparse, but would lead to a lot of (non
trivial) redundancy in the kernel sources: both sparse annotations
and compiler attributes will denote the same thing.
** Adding native support for sparse annotations to GCC and clang
This would make both GCC and clang to handle sparse annotations
using the sparse ordering. This is what David Malcolm has been
working on.
There is some previous work From Tom Tromey to support the
following sparse annotations: address_space, noderef and force.
See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59850 (But AFAIK
Tom's patch also falls in the mistake of ordering the attributes
using GNU-like attribute ordering, not the sparse ordering?)
For example, in this C declaration:
: int * __attribute__((address_space(__tag1))) * p;
The compilers would recognize that as a sparse annotation (and not
a GNU-like compiler attribute) and parse it in a way it applies to
int* instead of int**.
At least two problems with this approach:
1. This could be confusing for users, and difficult to get applied
upstream, because sparse annotations look like GNU-like compiler
attributes, despite not being the same thing (ordering
differently.)
2. sparse lacks support for btf_type_tag and btf_decl_tag, and
these compiler attributes are more general than the existing
sparse annotations.
** Adding GNU-like compiler attributes to GCC and clang with sparse ordering
This is what clang does right now (or tries to do).
Solves problem 2. above, but not 1.
Maybe problem 1. could be alleviated by oddly-ordering compiler
attributes to have a distinguished name, such as using a 'sparse'
prefix:
: __attribute__((sparse_btf_type_tag ("tag1")))
** Other solutions?
next reply other threads:[~2022-12-15 18:39 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-12-15 18:43 Jose E. Marchesi [this message]
2022-12-15 22:14 ` Follow up from the btf_type_tag discussion in the BPF office hours Jose E. Marchesi
2022-12-17 1:38 ` Yonghong Song
2022-12-19 17:27 ` Jose E. Marchesi
2022-12-28 4:49 ` Yonghong Song
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87o7s4ece1.fsf@oracle.com \
--to=jose.marchesi@oracle.com \
--cc=bpf@vger.kernel.org \
--cc=david.faust@oracle.com \
--cc=dmalcolm@redhat.com \
--cc=elena.zannoni@oracle.com \
--cc=julia.lawall@inria.fr \
--cc=ndesaulniers@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox