From: Ilya Leoshkevich <iii@linux.ibm.com>
To: Andrii Nakryiko <andrii@kernel.org>,
bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net
Cc: kernel-team@fb.com, Alan Maguire <alan.maguire@oracle.com>,
Dave Marchevsky <davemarchevsky@fb.com>,
Hengqi Chen <hengqi.chen@gmail.com>
Subject: Re: [PATCH v3 bpf-next 1/7] libbpf: add BPF-side of USDT support
Date: Thu, 07 Apr 2022 16:19:33 +0200
Message-ID: <28c378a6eb72b66b44cfac250807a2a01ee478af.camel@linux.ibm.com>
In-Reply-To: <20220404234202.331384-2-andrii@kernel.org>
On Mon, 2022-04-04 at 16:41 -0700, Andrii Nakryiko wrote:
> Add BPF-side implementation of libbpf-provided USDT support. This
> consists of a single header library, usdt.bpf.h, which is meant to be
> used from the user's BPF-side source code. This header is added to the
> list of installed libbpf headers, alongside bpf_helpers.h and others.
>
> BPF-side implementation consists of two BPF maps:
>   - spec map, which contains "a USDT spec" which encodes information
>     necessary to be able to fetch USDT arguments and other information
>     (argument count, user-provided cookie value, etc.) at runtime;
>   - IP-to-spec-ID map, which is only used on kernels that don't support
>     the BPF cookie feature. It allows looking up the spec ID based on
>     the place in the user application that triggers the USDT program.
>
> These maps have default sizes, 256 and 1024, which are chosen
> conservatively to not waste a lot of space while still handling a lot
> of common cases. But there could be cases when a user application
> needs to either trace a lot of different USDTs, or USDTs are heavily
> inlined and their arguments are located in a lot of differing
> locations. For such cases it might be necessary to size those maps up,
> which libbpf allows by overriding the BPF_USDT_MAX_SPEC_CNT and
> BPF_USDT_MAX_IP_CNT macros.
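A minimal sketch of how such an override would presumably work, assuming
the header guards its defaults with #ifndef (that guard is an assumption
here, since the relevant hunk is elided from this quote; the 1024/4096
values are arbitrary examples):

```c
/* In the user's BPF source, before usdt.bpf.h is included.
 * The values below are arbitrary, chosen only for illustration.
 */
#define BPF_USDT_MAX_SPEC_CNT 1024
#define BPF_USDT_MAX_IP_CNT 4096

/* Assumed shape of the defaults inside the header (not copied from
 * the patch): each default applies only if the user has not already
 * defined the macro.
 */
#ifndef BPF_USDT_MAX_SPEC_CNT
#define BPF_USDT_MAX_SPEC_CNT 256
#endif
#ifndef BPF_USDT_MAX_IP_CNT
#define BPF_USDT_MAX_IP_CNT 1024
#endif
```

With guards like these, the user's definitions win as long as they come
before the header is included.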
>
> There is an important aspect to keep in mind. A single USDT
> (user-space equivalent of a kernel tracepoint) can have multiple USDT
> "call sites". That is, a single logical USDT is triggered from
> multiple places in the user application. This can happen due to
> function inlining. Each such inlined instance of a USDT invocation can
> have its own unique USDT argument specification (instructions about
> the location of the value of each of the USDT arguments). So while a
> USDT looks very similar to a usual uprobe or kernel tracepoint, under
> the hood it's actually a collection of uprobes, each potentially
> needing a different spec to know how to fetch arguments.
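The call-site-to-spec relationship described above can be modelled in
plain userspace C roughly as follows. This is only an illustration of the
idea, not the BPF map the patch uses; struct ip_spec_entry and
lookup_spec_id() are hypothetical names made up for this sketch.

```c
#include <stddef.h>

/* Hypothetical model: one entry per USDT call site. Several call
 * sites of the same logical USDT may share a spec or each have
 * their own.
 */
struct ip_spec_entry {
	unsigned long ip;	/* absolute address of the call site */
	int spec_id;		/* index of the spec describing its args */
};

/* Return the spec ID registered for `ip`, or -1 if there is none. */
static int lookup_spec_id(const struct ip_spec_entry *tbl, size_t n,
			  unsigned long ip)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].ip == ip)
			return tbl[i].spec_id;
	}
	return -1;
}
```

Two inlined copies of the same USDT would then show up as two entries
with different ip values, possibly pointing at different spec IDs.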
>
> User-visible API consists of three helper functions:
>   - bpf_usdt_arg_cnt(), which returns the number of arguments of the
>     current USDT;
>   - bpf_usdt_arg(), which reads the value of the specified USDT
>     argument (by its zero-indexed position) and returns it as a 64-bit
>     value;
>   - bpf_usdt_cookie(), which functions like BPF cookie for USDT
>     programs; this is necessary as libbpf doesn't allow specifying the
>     actual BPF cookie and utilizes it internally for USDT support
>     implementation.
>
> Each bpf_usdt_xxx() API expects a struct pt_regs * context, passed
> into the BPF program. On kernels that don't support BPF cookie it is
> used to fetch the absolute IP address of the underlying uprobe.
>
> usdt.bpf.h also provides the BPF_USDT() macro, which functions like
> BPF_PROG() and BPF_KPROBE() and allows a much more user-friendly way
> to get access to USDT arguments, if the USDT definition is static and
> known to the user. It is expected that the majority of use cases won't
> have to use bpf_usdt_arg_cnt() and bpf_usdt_arg() directly and
> BPF_USDT() will cover all their needs.
>
> Last, usdt.bpf.h is utilizing BPF CO-RE for one single purpose: to
> detect kernel support for BPF cookie. If the BPF CO-RE dependency is
> undesirable, the user application can redefine BPF_USDT_HAS_BPF_COOKIE
> to either a boolean constant (or equivalently zero and non-zero), or
> even point it to its own .rodata variable that can be specified from
> the user application's user-space code. It is important that
> BPF_USDT_HAS_BPF_COOKIE is known to the BPF verifier as a static value
> (thus .rodata and not just .data), as otherwise BPF code will still
> contain the bpf_get_attach_cookie() BPF helper call and will fail
> validation at runtime, if not dead-code eliminated.
>
> Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  tools/lib/bpf/Makefile   |   2 +-
>  tools/lib/bpf/usdt.bpf.h | 256 +++++++++++++++++++++++++++++++++++++++
>  2 files changed, 257 insertions(+), 1 deletion(-)
> create mode 100644 tools/lib/bpf/usdt.bpf.h
[...]
> diff --git a/tools/lib/bpf/usdt.bpf.h b/tools/lib/bpf/usdt.bpf.h
> new file mode 100644
> index 000000000000..60237acf6b02
> --- /dev/null
> +++ b/tools/lib/bpf/usdt.bpf.h
> @@ -0,0 +1,256 @@
[...]
> +/* Fetch USDT argument #*arg_num* (zero-indexed) and put its value into *res.
> + * Returns 0 on success; negative error, otherwise.
> + * On error *res is guaranteed to be set to zero.
> + */
> +static inline __noinline
> +int bpf_usdt_arg(struct pt_regs *ctx, __u64 arg_num, long *res)
> +{
> +	struct __bpf_usdt_spec *spec;
> +	struct __bpf_usdt_arg_spec *arg_spec;
> +	unsigned long val;
> +	int err, spec_id;
> +
> +	*res = 0;
> +
> +	spec_id = __bpf_usdt_spec_id(ctx);
> +	if (spec_id < 0)
> +		return -ESRCH;
> +
> +	spec = bpf_map_lookup_elem(&__bpf_usdt_specs, &spec_id);
> +	if (!spec)
> +		return -ESRCH;
> +
> +	if (arg_num >= BPF_USDT_MAX_ARG_CNT || arg_num >= spec->arg_cnt)
> +		return -ENOENT;
> +
> +	arg_spec = &spec->args[arg_num];
> +	switch (arg_spec->arg_type) {
> +	case BPF_USDT_ARG_CONST:
> +		/* Arg is just a constant ("-4@$-9" in USDT arg spec).
> +		 * value is recorded in arg_spec->val_off directly.
> +		 */
> +		val = arg_spec->val_off;
> +		break;
> +	case BPF_USDT_ARG_REG:
> +		/* Arg is in a register (e.g., "8@%rax" in USDT arg spec),
> +		 * so we read the contents of that register directly from
> +		 * struct pt_regs. To keep things simple user-space parts
> +		 * record offsetof(struct pt_regs, <regname>) in arg_spec->reg_off.
> +		 */
> +		err = bpf_probe_read_kernel(&val, sizeof(val), (void *)ctx + arg_spec->reg_off);
> +		if (err)
> +			return err;
> +		break;
> +	case BPF_USDT_ARG_REG_DEREF:
> +		/* Arg is in memory addressed by register, plus some offset
> +		 * (e.g., "-4@-1204(%rbp)" in USDT arg spec). Register is
> +		 * identified like with BPF_USDT_ARG_REG case, and the offset
> +		 * is in arg_spec->val_off. We first fetch register contents
> +		 * from pt_regs, then do another user-space probe read to
> +		 * fetch argument value itself.
> +		 */
> +		err = bpf_probe_read_kernel(&val, sizeof(val), (void *)ctx + arg_spec->reg_off);
> +		if (err)
> +			return err;
> +		err = bpf_probe_read_user(&val, sizeof(val), (void *)val + arg_spec->val_off);
Is there a reason we always read 8 bytes here?
What if the user is interested in the last byte of a page?
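To make the concern concrete, here is a small userspace sketch (not
libbpf code). It only checks whether a read of a given length starting at
a given address spills into the following page. PAGE_SZ and
read_crosses_page() are names made up for this illustration, and a
4096-byte page is assumed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size, for illustration only. */
#define PAGE_SZ 4096u

/* Return true if reading `len` bytes starting at `addr` touches
 * more than one page. Address wrap-around is not handled.
 */
static bool read_crosses_page(uintptr_t addr, size_t len)
{
	if (len == 0)
		return false;
	return (addr / PAGE_SZ) != ((addr + len - 1) / PAGE_SZ);
}
```

For a 1-byte argument stored in the last byte of a page, a 1-byte read
stays within that page, while a sizeof(val) == 8 byte read from the same
address also touches the next page. If that next page happens to be
unmapped, the 8-byte bpf_probe_read_user() would presumably fail even
though the argument itself is readable.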
[...]