From: Jiri Olsa <olsajiri@gmail.com>
To: Alan Maguire <alan.maguire@oracle.com>
Cc: acme@kernel.org, ast@kernel.org, yhs@fb.com, andrii@kernel.org,
daniel@iogearbox.net, laoar.shao@gmail.com, martin.lau@linux.dev,
song@kernel.org, john.fastabend@gmail.com, kpsingh@kernel.org,
sdf@google.com, haoluo@google.com, bpf@vger.kernel.org
Subject: Re: [RFC dwarves 5/6] btf_encoder: store ELF function representations sorted by name _and_ address
Date: Thu, 18 May 2023 10:39:19 +0200 [thread overview]
Message-ID: <ZGXkN2TeEJZHMSG8@krava> (raw)
In-Reply-To: <20230517161648.17582-6-alan.maguire@oracle.com>
On Wed, May 17, 2023 at 05:16:47PM +0100, Alan Maguire wrote:
> By making sorting function for our ELF function list match on
> both name and function, we ensure that the set of ELF functions
> includes multiple copies for functions which have multiple instances
> across CUs. For example, cpumask_weight has 22 instances in
> System.map/kallsyms:
>
> ffffffff8103b530 t cpumask_weight
> ffffffff8103e300 t cpumask_weight
> ffffffff81040d30 t cpumask_weight
> ffffffff8104fa00 t cpumask_weight
> ffffffff81064300 t cpumask_weight
> ffffffff81082ba0 t cpumask_weight
> ffffffff81084f50 t cpumask_weight
> ffffffff810a4ad0 t cpumask_weight
> ffffffff810bb740 t cpumask_weight
> ffffffff8110a6c0 t cpumask_weight
> ffffffff81118ab0 t cpumask_weight
> ffffffff81129b50 t cpumask_weight
> ffffffff81137dc0 t cpumask_weight
> ffffffff811aead0 t cpumask_weight
> ffffffff811d6800 t cpumask_weight
> ffffffff811e1370 t cpumask_weight
> ffffffff812fae80 t cpumask_weight
> ffffffff81375c50 t cpumask_weight
> ffffffff81634b60 t cpumask_weight
> ffffffff817ba540 t cpumask_weight
> ffffffff819abf30 t cpumask_weight
> ffffffff81a7cb60 t cpumask_weight
>
> With ELF representations for each address, and DWARF info about
> addresses (low_pc) we can match DWARF with ELF accurately.
> The result for the BTF representation is that we end up with
> a single de-duped function:
>
> [9287] FUNC 'cpumask_weight' type_id=9286 linkage=static
>
> ...and 22 DECL_TAGs for each address that point at it:
>
> 9288] DECL_TAG 'address=0xffffffff8103b530' type_id=9287 component_idx=-1
> [9623] DECL_TAG 'address=0xffffffff8103e300' type_id=9287 component_idx=-1
> [9829] DECL_TAG 'address=0xffffffff81040d30' type_id=9287 component_idx=-1
> [11609] DECL_TAG 'address=0xffffffff8104fa00' type_id=9287 component_idx=-1
> [13299] DECL_TAG 'address=0xffffffff81064300' type_id=9287 component_idx=-1
> [15704] DECL_TAG 'address=0xffffffff81082ba0' type_id=9287 component_idx=-1
> [15731] DECL_TAG 'address=0xffffffff81084f50' type_id=9287 component_idx=-1
> [18582] DECL_TAG 'address=0xffffffff810a4ad0' type_id=9287 component_idx=-1
> [20234] DECL_TAG 'address=0xffffffff810bb740' type_id=9287 component_idx=-1
> [25384] DECL_TAG 'address=0xffffffff8110a6c0' type_id=9287 component_idx=-1
> [25798] DECL_TAG 'address=0xffffffff81118ab0' type_id=9287 component_idx=-1
> [26285] DECL_TAG 'address=0xffffffff81129b50' type_id=9287 component_idx=-1
> [27040] DECL_TAG 'address=0xffffffff81137dc0' type_id=9287 component_idx=-1
> [32900] DECL_TAG 'address=0xffffffff811aead0' type_id=9287 component_idx=-1
> [35059] DECL_TAG 'address=0xffffffff811d6800' type_id=9287 component_idx=-1
> [35353] DECL_TAG 'address=0xffffffff811e1370' type_id=9287 component_idx=-1
> [48934] DECL_TAG 'address=0xffffffff812fae80' type_id=9287 component_idx=-1
> [54476] DECL_TAG 'address=0xffffffff81375c50' type_id=9287 component_idx=-1
> [87772] DECL_TAG 'address=0xffffffff81634b60' type_id=9287 component_idx=-1
> [108841] DECL_TAG 'address=0xffffffff817ba540' type_id=9287 component_idx=-1
> [132557] DECL_TAG 'address=0xffffffff819abf30' type_id=9287 component_idx=-1
> [143689] DECL_TAG 'address=0xffffffff81a7cb60' type_id=9287 component_idx=-1
right, Yonghong pointed this out in:
https://lore.kernel.org/bpf/49e4fee2-8be0-325f-3372-c79d96b686e9@meta.com/
it's problem, because we pass btf id as attach id during bpf program load,
and kernel does not have a way to figure out which address from the associated
DECL_TAGs to use
if we could change dedup algo to take the function address into account and
make it not de-duplicate equal functions with different addresses, then we
could:
- find btf id that properly and uniquely identifies the function we
want to trace
- store the vmlinux base address and treat stored function addresses as
offsets, so the verifier can get proper address even if the kernel
is relocated
or
- add support for kernel's kallsyms to return address for given btf id,
I plan to check on this one
jirka
>
> Consider another case where the same name - wakeup_show() - is
> used for two different function signatures:
>
> From kernel/irq/irqdesc.c
>
> static ssize_t wakeup_show(struct kobject *kobj,
> struct kobj_attribute *attr, char *buf)
>
> ...and from drivers/base/power/sysfs.c
>
> static ssize_t wakeup_show(struct device *dev, struct device_attribute *attr,
> char *buf);
>
> We see both defined in BTF, along with the addresses that
> tell us which is which:
>
> [28472] FUNC 'wakeup_show' type_id=11214 linkage=static
>
> specifies
>
> [11214] FUNC_PROTO '(anon)' ret_type_id=76 vlen=3
> 'kobj' type_id=877
> 'attr' type_id=11200
> 'buf' type_id=56
>
> ...and has declaration tag
>
> [28473] DECL_TAG 'address=0xffffffff8115eab0' type_id=28472 component_idx=-1
>
> which identifies
>
> ffffffff8115eab0 t wakeup_show
>
> ...as the function with the first signature.
>
> Similarly,
>
> [114375] FUNC 'wakeup_show' type_id=4750 linkage=static
>
> [4750] FUNC_PROTO '(anon)' ret_type_id=76 vlen=3
> 'dev' type_id=1488
> 'attr' type_id=3909
> 'buf' type_id=56
> ...and
>
> [114376] DECL_TAG 'address=0xffffffff8181eac0' type_id=114375 component_idx=-1
>
> ...tell us that
>
> ffffffff8181eac0 t wakeup_show
>
> ...has the second signature. So we can accommodate multiple
> functions with conflicting signatures in BTF, since we have
> added extra info to map from function description in BTF
> to address.
>
> In total for vmlinux 52006 DECL_TAGs are added, and these add
> 2MB to the BTF representation.
>
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
> btf_encoder.c | 26 ++++++++++++++++++++------
> 1 file changed, 20 insertions(+), 6 deletions(-)
>
> diff --git a/btf_encoder.c b/btf_encoder.c
> index 3bd0fe0..315053d 100644
> --- a/btf_encoder.c
> +++ b/btf_encoder.c
> @@ -988,13 +988,25 @@ static int functions_cmp(const void *_a, const void *_b)
> {
> const struct elf_function *a = _a;
> const struct elf_function *b = _b;
> + int ret;
>
> /* if search key allows prefix match, verify target has matching
> * prefix len and prefix matches.
> */
> if (a->prefixlen && a->prefixlen == b->prefixlen)
> - return strncmp(a->name, b->name, b->prefixlen);
> - return strcmp(a->name, b->name);
> + ret = strncmp(a->name, b->name, b->prefixlen);
> + else
> + ret = strcmp(a->name, b->name);
> +
> + if (ret || !b->addr)
> + return ret;
> +
> + /* secondarily sort/search by address. */
> + if (a->addr < b->addr)
> + return -1;
> + if (a->addr > b->addr)
> + return 1;
> + return 0;
> }
>
> #ifndef max
> @@ -1044,9 +1056,11 @@ static int btf_encoder__collect_function(struct btf_encoder *encoder, GElf_Sym *
> }
>
> static struct elf_function *btf_encoder__find_function(const struct btf_encoder *encoder,
> - const char *name, size_t prefixlen)
> + struct function *fn, size_t prefixlen)
> {
> - struct elf_function key = { .name = name, .prefixlen = prefixlen };
> + struct elf_function key = { .name = function__name(fn),
> + .addr = fn->low_pc,
> + .prefixlen = prefixlen };
>
> return bsearch(&key, encoder->functions.entries, encoder->functions.cnt, sizeof(key), functions_cmp);
> }
> @@ -1846,7 +1860,7 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co
> continue;
>
> /* prefer exact function name match... */
> - func = btf_encoder__find_function(encoder, name, 0);
> + func = btf_encoder__find_function(encoder, fn, 0);
> if (func) {
> if (func->generated)
> continue;
> @@ -1863,7 +1877,7 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co
> * it does not have optimized-out parameters
> * in any cu.
> */
> - func = btf_encoder__find_function(encoder, name,
> + func = btf_encoder__find_function(encoder, fn,
> strlen(name));
> if (func) {
> save = true;
> --
> 2.31.1
>
next prev parent reply other threads:[~2023-05-18 8:39 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-17 16:16 [RFC dwarves 0/6] Encoding function addresses using DECL_TAGs Alan Maguire
2023-05-17 16:16 ` [RFC dwarves 1/6] btf_encoder: record function address and if it is local Alan Maguire
2023-05-17 16:16 ` [RFC dwarves 2/6] dwarf_loader: store address in function low_pc if available Alan Maguire
2023-05-17 16:16 ` [RFC dwarves 3/6] dwarf_loader: transfer low_pc info from subtroutine to its abstract origin Alan Maguire
2023-05-17 16:16 ` [RFC dwarves 4/6] btf_encoder: add "addr=0x<addr>" function declaration tag if --btf_gen_func_addr specified Alan Maguire
2023-05-17 16:16 ` [RFC dwarves 5/6] btf_encoder: store ELF function representations sorted by name _and_ address Alan Maguire
2023-05-18 8:39 ` Jiri Olsa [this message]
2023-05-18 13:23 ` Alan Maguire
2023-05-18 15:20 ` Yonghong Song
2023-05-18 16:22 ` Jiri Olsa
2023-05-18 18:25 ` Yonghong Song
2023-05-19 0:23 ` Yonghong Song
2023-05-19 0:26 ` Alexei Starovoitov
2023-05-22 21:31 ` Andrii Nakryiko
2023-05-25 8:52 ` Jiri Olsa
2023-05-25 10:14 ` Alan Maguire
2023-05-25 17:29 ` Andrii Nakryiko
2023-05-17 16:16 ` [RFC dwarves 6/6] pahole: document --btf_gen_func_addr Alan Maguire
2023-05-19 9:44 ` [RFC dwarves 0/6] Encoding function addresses using DECL_TAGs Yafang Shao
2023-05-19 14:46 ` Yonghong Song
2023-05-19 15:08 ` Alan Maguire
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZGXkN2TeEJZHMSG8@krava \
--to=olsajiri@gmail.com \
--cc=acme@kernel.org \
--cc=alan.maguire@oracle.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=haoluo@google.com \
--cc=john.fastabend@gmail.com \
--cc=kpsingh@kernel.org \
--cc=laoar.shao@gmail.com \
--cc=martin.lau@linux.dev \
--cc=sdf@google.com \
--cc=song@kernel.org \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox