netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alexei Starovoitov <ast@kernel.org>
To: <davem@davemloft.net>
Cc: <daniel@iogearbox.net>, <x86@kernel.org>,
	<netdev@vger.kernel.org>, <bpf@vger.kernel.org>,
	<kernel-team@fb.com>
Subject: [PATCH bpf-next 00/10] bpf: revolutionize bpf tracing
Date: Fri, 4 Oct 2019 22:03:04 -0700	[thread overview]
Message-ID: <20191005050314.1114330-1-ast@kernel.org> (raw)

Revolutionize bpf tracing and bpf C programming.
C language allows any pointer to be typecasted to any other pointer
or convert integer to a pointer.
Though bpf verifier is operating at assembly level it has strict type
checking for fixed number of types.
Known types are defined in 'enum bpf_reg_type'.
For example:
PTR_TO_FLOW_KEYS is a pointer to 'struct bpf_flow_keys'
PTR_TO_SOCKET is a pointer to 'struct bpf_sock',
and so on.

When it comes to bpf tracing there are no types to track.
bpf+kprobe receives 'struct pt_regs' as input.
bpf+raw_tracepoint receives raw kernel arguments as an array of u64 values.
It was up to bpf program to interpret these integers.
Typical tracing program looks like:
int bpf_prog(struct pt_regs *ctx)
{
    struct net_device *dev;
    struct sk_buff *skb;
    int ifindex;

    skb = (struct sk_buff *) ctx->di;
    bpf_probe_read(&dev, sizeof(dev), &skb->dev);
    bpf_probe_read(&ifindex, sizeof(ifindex), &dev->ifindex);
}
Addressing mistakes will not be caught by C compiler or by the verifier.
The program above could have typecasted ctx->si to skb and page faulted
on every bpf_probe_read().
bpf_probe_read() allows reading any address and suppresses page faults.
Typical program has hundreds of bpf_probe_read() calls to walk
kernel data structures.
Not only tracing program would be slow, but there was always a risk
that bpf_probe_read() would read mmio region of memory and cause
unpredictable hw behavior.

With introduction of Compile Once Run Everywhere technology in libbpf
and in LLVM and BPF Type Format (BTF) the verifier is finally ready
for the next step in program verification.
Now it can use in-kernel BTF to type check bpf assembly code.

Equivalent program will look like:
struct trace_kfree_skb {
    struct sk_buff *skb;
    void *location;
};
SEC("raw_tracepoint/kfree_skb")
int trace_kfree_skb(struct trace_kfree_skb* ctx)
{
    struct sk_buff *skb = ctx->skb;
    struct net_device *dev;
    int ifindex;

    __builtin_preserve_access_index(({
        dev = skb->dev;
        ifindex = dev->ifindex;
    }));
}

These patches teach bpf verifier to recognize kfree_skb's first argument
as 'struct sk_buff *' because this is what kernel C code is doing.
The bpf program cannot 'cheat' and say that the first argument
to kfree_skb raw_tracepoint is some other type.
The verifier will catch such type mismatch between bpf program
assumption of kernel code and the actual type in the kernel.

Furthermore skb->dev access is type tracked as well.
The verifier can see which field of skb is being read
in bpf assembly. It will match offset to type.
If bpf program has code:
struct net_device *dev = (void *)skb->len;
C compiler will not complain and generate bpf assembly code,
but the verifier will recognize that integer 'len' field
is being accessed at offsetof(struct sk_buff, len) and will reject
further dereference of 'dev' variable because it contains
integer value instead of a pointer.

Such sophisticated type tracking allows calling networking
bpf helpers from tracing programs.
This patchset allows calling bpf_skb_event_output() that dumps
skb data into perf ring buffer.
It greatly improves observability.
Now users can not only see packet lenth of the skb
about to be freed in kfree_skb() kernel function, but can
dump it to user space via perf ring buffer using bpf helper
that was previously available only to TC and socket filters.
See patch 10 for full example.

The end result is safer and faster bpf tracing.
Safer - because direct calls to bpf_probe_read() are disallowed and
arbitrary addresses cannot be read.
Faster - because normal loads are used to walk kernel data structures
instead of bpf_probe_read() calls.
Note that such loads can page fault and are supported by
hidden bpf_probe_read() in interpreter and via exception table
if program is JITed.

See patches for details.

Alexei Starovoitov (10):
  bpf: add typecast to raw_tracepoints to help BTF generation
  bpf: add typecast to bpf helpers to help BTF generation
  bpf: process in-kernel BTF
  libbpf: auto-detect btf_id of raw_tracepoint
  bpf: implement accurate raw_tp context access via BTF
  bpf: add support for BTF pointers to interpreter
  bpf: add support for BTF pointers to x86 JIT
  bpf: check types of arguments passed into helpers
  bpf: disallow bpf_probe_read[_str] helpers
  selftests/bpf: add kfree_skb raw_tp test

 arch/x86/net/bpf_jit_comp.c                   |  96 +++++-
 include/linux/bpf.h                           |  21 +-
 include/linux/bpf_verifier.h                  |   6 +-
 include/linux/btf.h                           |   1 +
 include/linux/extable.h                       |  10 +
 include/linux/filter.h                        |   6 +-
 include/trace/bpf_probe.h                     |   3 +-
 include/uapi/linux/bpf.h                      |   3 +-
 kernel/bpf/btf.c                              | 318 ++++++++++++++++++
 kernel/bpf/core.c                             |  39 ++-
 kernel/bpf/verifier.c                         | 125 ++++++-
 kernel/extable.c                              |   2 +
 kernel/trace/bpf_trace.c                      |  10 +-
 net/core/filter.c                             |  15 +-
 tools/include/uapi/linux/bpf.h                |   3 +-
 tools/lib/bpf/libbpf.c                        |  16 +
 tools/testing/selftests/bpf/bpf_helpers.h     |   4 +
 .../selftests/bpf/prog_tests/kfree_skb.c      |  90 +++++
 tools/testing/selftests/bpf/progs/kfree_skb.c |  76 +++++
 19 files changed, 828 insertions(+), 16 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/kfree_skb.c
 create mode 100644 tools/testing/selftests/bpf/progs/kfree_skb.c

-- 
2.20.0


             reply	other threads:[~2019-10-05  5:03 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-05  5:03 Alexei Starovoitov [this message]
2019-10-05  5:03 ` [PATCH bpf-next 01/10] bpf: add typecast to raw_tracepoints to help BTF generation Alexei Starovoitov
2019-10-05 18:40   ` Andrii Nakryiko
2019-10-06  3:58   ` John Fastabend
2019-10-05  5:03 ` [PATCH bpf-next 02/10] bpf: add typecast to bpf helpers " Alexei Starovoitov
2019-10-05 18:41   ` Andrii Nakryiko
2019-10-06  4:00   ` John Fastabend
2019-10-05  5:03 ` [PATCH bpf-next 03/10] bpf: process in-kernel BTF Alexei Starovoitov
2019-10-06  6:36   ` Andrii Nakryiko
2019-10-06 23:49     ` Alexei Starovoitov
2019-10-07  0:20       ` Andrii Nakryiko
2019-10-09 20:51   ` Martin Lau
2019-10-10  3:43     ` Alexei Starovoitov
2019-10-05  5:03 ` [PATCH bpf-next 04/10] libbpf: auto-detect btf_id of raw_tracepoint Alexei Starovoitov
2019-10-07 23:41   ` Andrii Nakryiko
2019-10-09  2:26     ` Alexei Starovoitov
2019-10-05  5:03 ` [PATCH bpf-next 05/10] bpf: implement accurate raw_tp context access via BTF Alexei Starovoitov
2019-10-07 16:32   ` Alan Maguire
2019-10-09  3:59     ` Alexei Starovoitov
2019-10-08  0:35   ` Andrii Nakryiko
2019-10-09  3:30     ` Alexei Starovoitov
2019-10-09  4:01       ` Andrii Nakryiko
2019-10-09  5:10         ` Andrii Nakryiko
2019-10-10  3:54           ` Alexei Starovoitov
2019-10-05  5:03 ` [PATCH bpf-next 06/10] bpf: add support for BTF pointers to interpreter Alexei Starovoitov
2019-10-08  3:08   ` Andrii Nakryiko
2019-10-05  5:03 ` [PATCH bpf-next 07/10] bpf: add support for BTF pointers to x86 JIT Alexei Starovoitov
2019-10-05  6:03   ` Eric Dumazet
2019-10-09 17:38   ` Andrii Nakryiko
2019-10-09 17:46     ` Alexei Starovoitov
2019-10-05  5:03 ` [PATCH bpf-next 08/10] bpf: check types of arguments passed into helpers Alexei Starovoitov
2019-10-09 18:01   ` Andrii Nakryiko
2019-10-09 19:58     ` Alexei Starovoitov
2019-10-05  5:03 ` [PATCH bpf-next 09/10] bpf: disallow bpf_probe_read[_str] helpers Alexei Starovoitov
2019-10-09  5:29   ` Andrii Nakryiko
2019-10-09 19:38     ` Alexei Starovoitov
2019-10-05  5:03 ` [PATCH bpf-next 10/10] selftests/bpf: add kfree_skb raw_tp test Alexei Starovoitov
2019-10-09  5:36   ` Andrii Nakryiko
2019-10-09 17:37     ` Alexei Starovoitov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191005050314.1114330-1-ast@kernel.org \
    --to=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=kernel-team@fb.com \
    --cc=netdev@vger.kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).