From: Yonghong Song <yhs@fb.com>
To: Juraj Vijtiuk <juraj.vijtiuk@sartura.hr>, bpf <bpf@vger.kernel.org>
Cc: "Dario Barać" <dario.barac@sartura.hr>,
"David Marcinkovic" <david.marcinkovic@sartura.hr>,
"Luka Perkov" <luka.perkov@sartura.hr>
Subject: Re: BPF_PROG_TYPE_CGROUP_SOCK bpf_sock ctx member direct access and BPF_CORE_READ return different values
Date: Fri, 10 Sep 2021 10:44:58 -0700 [thread overview]
Message-ID: <c66fe63f-ee85-981b-ef2e-349c70f0cd7a@fb.com> (raw)
In-Reply-To: <CAOjtDRUnjONzDgtov-ugXejL-TUGwLgyQ50Q1uJqFSH=1q0QUg@mail.gmail.com>
On 9/10/21 8:36 AM, Juraj Vijtiuk wrote:
> Hello,
>
> while developing some cgroup socket programs, we have noticed that
> some BPF_PROG_TYPE_CGROUP_SOCK programs return different values when
> the bpf_sock context is accessed, depending on how they are accessed.
> One example of the issue would be access to ctx->family in programs
> that attach to BPF_CGROUP_INET6_POST_BIND and
> BPF_CGROUP_INET4_POST_BIND. A direct ctx->family access returns the
> correct value, while BPF_CORE_READ(ctx, family) returns random values.
> The BPF C program and an example userspace C loader are attached
> below, with an example trace_pipe output.
>
> So far we have looked at the generated BPF byte code with llvm-objdump
> and everything looked fine there, the main difference being in the way
> the access is done, as expected. The BPF_CORE_READ macro expands into
> a bpf_probe_read_kernel() call with arguments wrapped in
> __builtin_preserve_access_index. bpf_probe_read_* helper calls are
> supported for BPF_PROG_TYPE_CGROUP_SOCKS so that shouldn't be an
> issue. Next, we looked at libbpf debug output, where everything looked
> ok too. The part of the output with relocations is attached below.
This is an incorrect usage of CORE. See below.
>
> We have tested this with various kernel versions, including 5.10, 5.11
> and 5.13 on x86_64 and 5.11 on 32 bit ARM. The issue appeared on all
> of those kernels and architectures.
>
> At this point we're not sure what to look at next so any ideas on what
> might cause the issues or suggestions on what to test next would be
> greatly appreciated.
>
> Regards,
> Juraj Vijtiuk
>
> example.bpf.c
> ----------------------------------------
>
> #include "vmlinux.h"
> #include <bpf/bpf_helpers.h>
> #include <bpf/bpf_core_read.h>
>
> SEC("cgroup/post_bind4")
> int cgroup_post_bind4_prog(struct bpf_sock *ctx)
> {
> u32 family1 = 0;
> u32 family2 = 0;
>
> family1 = ctx->family;
> family2 = BPF_CORE_READ(ctx, family);
> bpf_printk("family1 = %u, family2 = %u\n", family1, family2);
We have assembly code below:
0: b7 02 00 00 04 00 00 00 r2 = 4
1: bf 13 00 00 00 00 00 00 r3 = r1
2: 0f 23 00 00 00 00 00 00 r3 += r2
3: 61 16 04 00 00 00 00 00 r6 = *(u32 *)(r1 + 4)
4: bf a1 00 00 00 00 00 00 r1 = r10
5: 07 01 00 00 e0 ff ff ff r1 += -32
6: b4 02 00 00 04 00 00 00 w2 = 4
7: 85 00 00 00 71 00 00 00 call 113
8: 61 a4 e0 ff 00 00 00 00 r4 = *(u32 *)(r10 - 32)
Looks like they are the same one insn #3 read context + 4
and another insn #7 also read context + 4.
But for insn #3, the verifier will rewrite it proper kernel field,
case offsetof(struct bpf_sock, family):
*insn++ = BPF_LDX_MEM(
BPF_FIELD_SIZEOF(struct sock_common, skc_family),
si->dst_reg, si->src_reg,
bpf_target_off(struct sock_common,
skc_family,
sizeof_field(struct sock_common,
skc_family),
target_size));
break;
and this is not the same for insn #7.
That is why they have different results. The CORE is used for accessing
kernel data structures, the "ctx" is not really a kernel data structure
(rather a UAPI interface) in most cases.
>
> return 0;
> }
>
> char LICENSE[] SEC("license") = "GPL";
>
> example.c
> ----------------------------------------
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <unistd.h>
> #include <sys/resource.h>
> #include <argp.h>
>
> #include <bpf/libbpf.h>
> #include <bpf/bpf.h>
>
> #include "example.skel.h"
>
> void read_trace_pipe(void)
> {
> int trace_fd;
>
> trace_fd = open("/sys/kernel/debug/tracing/trace_pipe", O_RDONLY, 0);
> if (trace_fd < 0)
> return;
>
> while (1) {
> static char buf[4096];
> ssize_t sz;
>
> sz = read(trace_fd, buf, sizeof(buf) - 1);
> if (sz > 0) {
> buf[sz] = 0;
> puts(buf);
> }
> }
> }
>
> int libbpf_print_fn(enum libbpf_print_level level,
> const char *format, va_list args)
> {
> return vfprintf(stderr, format, args);
> }
>
> int main(int argc, char **argv) {
> struct example_bpf *obj;
> int err = 0;
> struct rlimit rlim = {
> .rlim_cur = 512UL << 20,
> .rlim_max = 512UL << 20,
> };
>
> err = setrlimit(RLIMIT_MEMLOCK, &rlim);
> if (err) {
> fprintf(stderr, "failed to change rlimit\n");
> return 1;
> }
>
> libbpf_set_print(libbpf_print_fn);
> obj = example_bpf__open();
> if (!obj) {
> fprintf(stderr, "failed to open and/or load BPF object\n");
> return 1;
> }
>
> err = example_bpf__load(obj);
> if (err) {
> fprintf(stderr, "failed to load BPF object %d\n", err);
> goto cleanup;
> }
>
> const char *cgroup_path = "/sys/fs/cgroup";
> int cgroup_fd = open(cgroup_path, O_DIRECTORY | O_RDONLY);
>
> struct bpf_program *prog = obj->progs.cgroup_post_bind4_prog;
> obj->links.cgroup_post_bind4_prog =
> bpf_program__attach_cgroup(prog, cgroup_fd);
> err = libbpf_get_error(obj->links.cgroup_post_bind4_prog);
> if (err) {
> fprintf(stderr, "failed to attach BPF program %d\n", err);
> goto cleanup;
> }
>
> read_trace_pipe();
>
> cleanup:
> example_bpf__destroy(obj);
> return err != 0;
> }
>
> trace_pipe output
> ----------------------------------------
> Chrome_IOThread-26477 [006] d..2 385580.114654: bpf_trace_printk:
> family1 = 2, family2 = 1747691712
> <...>-144100 [004] d..2 385594.936690: bpf_trace_printk: family1 = 2,
> family2 = 0
>
> libbpf relocation log
> ----------------------------------------
> libbpf: loading kernel BTF '/sys/kernel/btf/vmlinux': 0
> libbpf: sec 'cgroup/post_bind4': found 2 CO-RE relocations
> libbpf: prog 'cgroup_post_bind4_prog': relo #0: kind <byte_off> (0),
> spec is [2] struct bpf_sock.family (0:1 @ offset 4)
> libbpf: CO-RE relocating [2] struct bpf_sock: found target candidate
> [24518] struct bpf_sock
> libbpf: prog 'cgroup_post_bind4_prog': relo #0: matching candidate #0
> [24518] struct bpf_sock.family (0:1 @ offset 4)
> libbpf: prog 'cgroup_post_bind4_prog': relo #0: patched insn #9
> (ALU/ALU64) imm 4 -> 4
> libbpf: prog 'cgroup_post_bind4_prog': relo #1: kind <byte_off> (0),
> spec is [2] struct bpf_sock.family (0:1 @ offset 4)
> libbpf: prog 'cgroup_post_bind4_prog': relo #1: matching candidate #0
> [24518] struct bpf_sock.family (0:1 @ offset 4)
> libbpf: prog 'cgroup_post_bind4_prog': relo #1: patched insn #12
> (LDX/ST/STX) off 4 -> 4
>
next prev parent reply other threads:[~2021-09-10 17:45 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-09-10 15:36 BPF_PROG_TYPE_CGROUP_SOCK bpf_sock ctx member direct access and BPF_CORE_READ return different values Juraj Vijtiuk
2021-09-10 17:44 ` Yonghong Song [this message]
2021-09-13 10:02 ` Juraj Vijtiuk
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c66fe63f-ee85-981b-ef2e-349c70f0cd7a@fb.com \
--to=yhs@fb.com \
--cc=bpf@vger.kernel.org \
--cc=dario.barac@sartura.hr \
--cc=david.marcinkovic@sartura.hr \
--cc=juraj.vijtiuk@sartura.hr \
--cc=luka.perkov@sartura.hr \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox