* Contextually speaking...
@ 2017-05-13 22:36 David Miller
  2017-05-14  7:19 ` Vetoshkin Nikita
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: David Miller @ 2017-05-13 22:36 UTC (permalink / raw)
  To: xdp-newbies


Every eBPF program has a type, and that type is important because it
determines the kind of "context" which will be passed into your
program so that it can do its work.

The context is the argument passed into the main entry point of your
eBPF program.

The eBPF program type is specified when the program is loaded via the
sys_bpf() system call.  For most of us this is usually achieved by
calling bpf_load_program() in libbpf.  "enum bpf_prog_type" currently
has the following values:

	BPF_PROG_TYPE_SOCKET_FILTER
	BPF_PROG_TYPE_KPROBE
	BPF_PROG_TYPE_SCHED_CLS
	BPF_PROG_TYPE_SCHED_ACT
	BPF_PROG_TYPE_TRACEPOINT
	BPF_PROG_TYPE_XDP
	BPF_PROG_TYPE_PERF_EVENT
	BPF_PROG_TYPE_CGROUP_SKB
	BPF_PROG_TYPE_CGROUP_SOCK
	BPF_PROG_TYPE_LWT_IN
	BPF_PROG_TYPE_LWT_OUT
	BPF_PROG_TYPE_LWT_XMIT

More can appear in the future.

For example, BPF_PROG_TYPE_SOCKET_FILTER takes a "struct __sk_buff *" as
its context argument.  Programs of type BPF_PROG_TYPE_SCHED_CLS and
BPF_PROG_TYPE_SCHED_ACT also take "struct __sk_buff *" as their
context argument.
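
As a rough illustration of "the type picks the context", here is a small
user-space sketch in plain C.  The enum and function names are made up
for this example (a real program would use enum bpf_prog_type from
<linux/bpf.h>), and only a handful of the types above are covered.

```c
#include <string.h>

/* Hypothetical mirror of a few program types from the list above. */
enum sketch_prog_type {
	SKETCH_SOCKET_FILTER,
	SKETCH_KPROBE,
	SKETCH_SCHED_CLS,
	SKETCH_SCHED_ACT,
	SKETCH_XDP,
};

/* Return the C type of the context argument for a program type. */
const char *sketch_ctx_type(enum sketch_prog_type type)
{
	switch (type) {
	case SKETCH_SOCKET_FILTER:
	case SKETCH_SCHED_CLS:
	case SKETCH_SCHED_ACT:
		return "struct __sk_buff *";
	case SKETCH_KPROBE:
		return "struct pt_regs *";
	case SKETCH_XDP:
		return "struct xdp_md *";
	}
	return "unknown";
}

/* Do two program types receive the same kind of context? */
int sketch_same_ctx(enum sketch_prog_type a, enum sketch_prog_type b)
{
	return strcmp(sketch_ctx_type(a), sketch_ctx_type(b)) == 0;
}
```

So sketch_same_ctx(SKETCH_SOCKET_FILTER, SKETCH_SCHED_CLS) is true, while
an XDP program gets a different structure altogether.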

These three program types have another thing in common: they are
allowed to use the LD_ABS and LD_IND instructions to access packet
data.  You cannot (currently) generate these from C code, only from
hand-written eBPF assembler.  But they are important to understand
in their historical context.

LD_ABS and LD_IND simply allow byte, half-word, and word sized loads
from the packet data.  The value returned is in CPU endianness.  These
two instructions come from classical BPF, and are thus older than some
of you reading this text right now.
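
To make "half-word load, result in CPU endianness" concrete, here is a
user-space model of what a half-word LD_ABS does.  This is only a sketch
of the semantics in plain C; the function name is invented, and the real
instruction terminates the program on an out-of-range load instead of
returning an error code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Model of a half-word LD_ABS: read 16 bits at a fixed offset from the
 * start of the packet.  The bytes on the wire are big-endian; the result
 * is an ordinary host-order integer.
 * Returns 0 on success, -1 if the load would run past the packet.
 */
int sketch_ld_abs_half(const uint8_t *pkt, size_t len, size_t off,
		       uint16_t *val)
{
	if (len < 2 || off > len - 2)
		return -1;
	*val = (uint16_t)((pkt[off] << 8) | pkt[off + 1]);
	return 0;
}
```

Loading the half-word at offset 12 of an Ethernet frame this way yields
the EtherType as a plain number, with no byte swapping left to do.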

Therefore, if you look at libpcap or any other piece of code that
generates classical BPF, you will see that it makes use of LD_ABS and
LD_IND.

But from C code, you can load members of "struct __sk_buff" and access
packet data directly using what you get from there.  We will refer to
this as "direct packet access".  And this brings us to an important
topic.

Any direct packet access must be properly validated before it is
performed.  We'll get into what that means exactly in just a second.
If proper validation is not performed, the eBPF verifier will reject
your program and refuse to load it.

Here is how you do it.  Let's write a very simple program that returns
"1" if we have an IPv4 Ethernet packet, and "0" otherwise.

SEC("my_program")
int my_main(struct __sk_buff *skb)
{
	void *data_end = (void *)(long)skb->data_end;
	void *data = (void *)(long)skb->data;

Here we load the extents of the packet data, basically the start and
end pointers.  The casts in the assignments are necessary, so please
just copy this pattern into your programs.

The packet starts with the Ethernet header, so let's get that going:

	struct ethhdr *eth = (struct ethhdr *)(data);

Now, we can't just read "eth->h_proto" right away; the verifier would
reject that.  We have to explicitly test that such an access is in
range and doesn't go beyond "data_end".

So let's make that test:

	if (eth + 1 > data_end)
		return 0;

The eBPF verifier will see that "eth" holds a packet pointer,
and also that you have made sure that from "eth" to "eth + 1"
is inside the valid access range for the packet.

Therefore, from this point forward you may validly access any part of
"struct ethhdr" via the variable "eth".  Let's do that.

	if (eth->h_proto == bpf_htons(ETH_P_IP))
		return 1;
	return 0;
}

And that's it.
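
The same "check the range first, read second" shape can be tried out in
user space.  The sketch below is not an eBPF program: it uses a stand-in
header struct and htons() from <arpa/inet.h> so that it builds without
kernel headers, and every sketch_ name is made up for the example.

```c
#include <stdint.h>
#include <arpa/inet.h>	/* htons() */

#define SKETCH_ETH_P_IP 0x0800	/* same value as ETH_P_IP */

/* Stand-in for struct ethhdr so the sketch builds without kernel headers. */
struct sketch_ethhdr {
	uint8_t  h_dest[6];
	uint8_t  h_source[6];
	uint16_t h_proto;	/* network byte order */
} __attribute__((packed));

/* Same shape as the eBPF program: check the range first, read second. */
int sketch_is_ipv4(const void *data, const void *data_end)
{
	const struct sketch_ethhdr *eth = data;

	if ((const void *)(eth + 1) > data_end)
		return 0;	/* header does not fit in the packet */
	if (eth->h_proto == htons(SKETCH_ETH_P_IP))
		return 1;
	return 0;
}
```

Calling it with data_end one byte short of a full Ethernet header makes
the first test fire, so h_proto is never touched.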

The program type has another influence on your program.  It determines
the meaning of your program's return value.

A program of type BPF_PROG_TYPE_SOCKET_FILTER returns the number of
bytes of the packet which should be accepted by the filter.  A return
value of zero means drop the packet.  A non-zero return value means
truncate the packet to that many bytes, and accept it.
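
Put as code, the rule looks roughly like this.  It is a user-space model
of how the verdict is interpreted, with an invented function name, and
it assumes that a verdict larger than the packet simply leaves the
packet untouched.

```c
#include <stdint.h>

/*
 * Model of how a socket filter verdict is applied: 0 drops the packet,
 * anything else keeps at most that many bytes.
 * Returns the number of bytes delivered to the socket.
 */
uint32_t sketch_apply_verdict(uint32_t pkt_len, uint32_t verdict)
{
	if (verdict == 0)
		return 0;	/* dropped */
	return verdict < pkt_len ? verdict : pkt_len;
}
```

A verdict of 64 on a 1500 byte packet delivers the first 64 bytes, and a
verdict equal to the packet length delivers the whole packet.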

So our example program above needs a little bit of an adjustment to
make it suitable for BPF_PROG_TYPE_SOCKET_FILTER:

SEC("my_program")
int my_main(struct __sk_buff *skb)
{
	void *data_end = (void *)(long)skb->data_end;
	void *data = (void *)(long)skb->data;
	struct ethhdr *eth = (struct ethhdr *)(data);
	int len = skb->len;

	if (eth + 1 > data_end)
		return 0;
	if (eth->h_proto == bpf_htons(ETH_P_IP))
		return len;
	return 0;
}

So what changed is that we load "len" from the context metadata and
return "len" when we want to accept the packet.  This says "accept
the packet and do not truncate it."


* Re: Contextually speaking...
  2017-05-13 22:36 Contextually speaking David Miller
@ 2017-05-14  7:19 ` Vetoshkin Nikita
  2017-05-14 12:48   ` Daniel Borkmann
       [not found] ` <CAFkxUYVNTPxaO02AsV-ur3cgXoK-diZLxF9LMOjZupw0j67h9Q@mail.gmail.com>
  2018-01-05 22:22 ` Charlemagne Lasse
  2 siblings, 1 reply; 5+ messages in thread
From: Vetoshkin Nikita @ 2017-05-14  7:19 UTC (permalink / raw)
  To: David Miller; +Cc: xdp-newbies

As I understand it, from the C compiler's point of view ->data and
->data_end are just arbitrary pointers embedded in a struct.  Where do
these semantics come from?  I.e., how does the eBPF verifier know that
the data ends where data_end points?


* Re: Contextually speaking...
  2017-05-14  7:19 ` Vetoshkin Nikita
@ 2017-05-14 12:48   ` Daniel Borkmann
  0 siblings, 0 replies; 5+ messages in thread
From: Daniel Borkmann @ 2017-05-14 12:48 UTC (permalink / raw)
  To: Vetoshkin Nikita; +Cc: David Miller, xdp-newbies

On 05/14/2017 09:19 AM, Vetoshkin Nikita wrote:
> As I understand it, from the C compiler's point of view ->data and
> ->data_end are just arbitrary pointers embedded in a struct.  Where do
> these semantics come from?  I.e., how does the eBPF verifier know that
> the data ends where data_end points?

The verifier only needs to match on data/data_end and make sure that
the program code using them stays within their bounds.  It doesn't
need to know the actual addresses at verification time.  We do this so
that read/write access to the packet can happen efficiently, without
calling a helper function to do the same and without performing a
check on every single access.  Adding data/data_end to the context
also allows all of this to be done without changing the BPF JIT
compilers.  The actual addresses for data/data_end are filled into the
xdp_buff context structure shortly before the BPF program gets
executed in the driver.
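
A toy model may help to picture this.  The sketch below is not the
kernel's verifier code; it is a simplified illustration with made-up
names, showing how a checker can remember "this many bytes are proven
to be inside the packet" without ever knowing a real address.

```c
#include <stdbool.h>
#include <stdint.h>

/* What the toy verifier knows about one packet pointer. */
struct sketch_pkt_reg {
	uint32_t range;	/* bytes proven to lie before data_end; 0 = none */
};

/* Model the fall-through branch of "if (ptr + n > data_end) return ...;". */
void sketch_mark_checked(struct sketch_pkt_reg *reg, uint32_t n)
{
	if (n > reg->range)
		reg->range = n;
}

/* An access of `size` bytes at `off` is allowed only inside the range. */
bool sketch_access_ok(const struct sketch_pkt_reg *reg, uint32_t off,
		      uint32_t size)
{
	return (uint64_t)off + size <= reg->range;
}
```

Before the comparison against data_end every load through the pointer
is refused; after a check covering 14 bytes, a 2 byte load at offset 12
is fine and a load at offset 14 is still refused.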

Best,
Daniel


* Re: Contextually speaking...
       [not found] ` <CAFkxUYVNTPxaO02AsV-ur3cgXoK-diZLxF9LMOjZupw0j67h9Q@mail.gmail.com>
@ 2017-05-14 15:52   ` David Miller
  0 siblings, 0 replies; 5+ messages in thread
From: David Miller @ 2017-05-14 15:52 UTC (permalink / raw)
  To: nikita.vetoshkin; +Cc: xdp-newbies

From: Vetoshkin Nikita <nikita.vetoshkin@gmail.com>
Date: Sun, 14 May 2017 07:16:45 +0000

> As I understand it, from the C compiler's point of view ->data and
> ->data_end are just arbitrary pointers embedded in a struct.  Where do
> these semantics come from?  I.e., how does the eBPF verifier know that
> the data ends where data_end points?

Please do not top-post.

When the program runs, the invocation point sets ->data to skb->data
and ->data_end to "skb->data + skb->len" or something similar.

The kernel is in full control of the values set there in the context,
and that's why the verifier may assume these properties.  The verifier
executes in the kernel where the semantics are guaranteed.
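
In user-space terms the arrangement looks something like the sketch
below.  The struct, typedef and function names are hypothetical and the
context is reduced to the two fields under discussion; the point is only
that the caller fills in the pointers before the program ever runs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical context: just the two fields discussed in this thread. */
struct sketch_ctx {
	const uint8_t *data;
	const uint8_t *data_end;
};

typedef int (*sketch_prog_fn)(const struct sketch_ctx *ctx);

/* The caller, not the program, decides where the packet starts and ends. */
int sketch_run_prog(sketch_prog_fn prog, const uint8_t *buf, size_t len)
{
	struct sketch_ctx ctx = {
		.data     = buf,
		.data_end = buf + len,
	};

	return prog(&ctx);
}

/* Example program: report how many bytes the context exposes. */
int sketch_count_bytes(const struct sketch_ctx *ctx)
{
	return (int)(ctx->data_end - ctx->data);
}
```

Whatever the program does with data and data_end, it can only ever see
the extents that sketch_run_prog() chose for it.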


* Re: Contextually speaking...
  2017-05-13 22:36 Contextually speaking David Miller
  2017-05-14  7:19 ` Vetoshkin Nikita
       [not found] ` <CAFkxUYVNTPxaO02AsV-ur3cgXoK-diZLxF9LMOjZupw0j67h9Q@mail.gmail.com>
@ 2018-01-05 22:22 ` Charlemagne Lasse
  2 siblings, 0 replies; 5+ messages in thread
From: Charlemagne Lasse @ 2018-01-05 22:22 UTC (permalink / raw)
  To: David Miller; +Cc: xdp-newbies

2017-05-13 23:36 GMT+02:00 David Miller <davem@davemloft.net>:
<snip>
> So our example program above needs a little bit of an adjustment to
> make it suitable for BPF_PROG_TYPE_SOCKET_FILTER:
>
> SEC("my_program")
> int my_main(struct __sk_buff *skb)
> {
>         void *data_end = (void *)(long)skb->data_end;
>         void *data = (void *)(long)skb->data;
>         struct ethhdr *eth = (struct ethhdr *)(data);
>         int len = skb->len;
>
>         if (eth + 1 > data_end)
>                 return 0;
>         if (eth->h_proto == bpf_htons(ETH_P_IP))
>                 return len;
>         return 0;
> }
>
> So what changed is that we load "len" from the context metadata and
> return "len" when we want to accept the packet.  This says "accept
> the packet and do not truncate it."

This cannot work with BPF_PROG_TYPE_SOCKET_FILTER, because programs of
that type are not allowed to access skb->data and skb->data_end.  The
verifier will reject it with:

0: (b7) r0 = 0
1: (61) r2 = *(u32 *)(r1 +80)
invalid bpf_context access off=80 size=4

The reason is the function sk_filter_is_valid_access(), which checks
every access made through the __sk_buff context pointer (r1):

static bool sk_filter_is_valid_access(int off, int size,
      enum bpf_access_type type,
      struct bpf_insn_access_aux *info)
{
    switch (off) {
    case bpf_ctx_range(struct __sk_buff, tc_classid):
    case bpf_ctx_range(struct __sk_buff, data):
    case bpf_ctx_range(struct __sk_buff, data_meta):
    case bpf_ctx_range(struct __sk_buff, data_end):
    case bpf_ctx_range_till(struct __sk_buff, family, local_port):
        return false;
    }
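
One possible way around this, sketched here under the assumption that
the bpf_skb_load_bytes() helper is available to the program type, is to
copy the bytes out of the packet instead of dereferencing data
directly.  The code below is a user-space model of that idea, not a
loadable program: sketch_skb and sketch_skb_load_bytes() are made-up
stand-ins for the context and the helper.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>	/* htons() */

/* Hypothetical user-space stand-in for the packet behind the context. */
struct sketch_skb {
	const uint8_t *bytes;
	uint32_t len;
};

/* Stand-in for bpf_skb_load_bytes(): copy out of the packet or fail. */
int sketch_skb_load_bytes(const struct sketch_skb *skb, uint32_t off,
			  void *to, uint32_t len)
{
	if (off > skb->len || len > skb->len - off)
		return -1;
	memcpy(to, skb->bytes + off, len);
	return 0;
}

/* Accept IPv4 frames whole, drop everything else. */
uint32_t sketch_socket_filter(const struct sketch_skb *skb)
{
	uint16_t proto;

	/* h_proto sits 12 bytes into the Ethernet header. */
	if (sketch_skb_load_bytes(skb, 12, &proto, sizeof(proto)) < 0)
		return 0;
	if (proto == htons(0x0800))	/* ETH_P_IP */
		return skb->len;
	return 0;
}
```

The range check moves into the helper, so the filter itself never needs
data or data_end, and it still returns the full length to accept a
packet without truncating it.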

