* Re: [PATCH v13 net-next 07/11] bpf: verifier (add ability to receive verification log)
@ 2014-09-19 21:04 Alexei Starovoitov
[not found] ` <CAMEtUux071cELLdoWs21WL0dqgdwj+O=P64aeXSfyUtFW9U69w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 14+ messages in thread
From: Alexei Starovoitov @ 2014-09-19 21:04 UTC (permalink / raw)
To: David S. Miller
Cc: Ingo Molnar, Linus Torvalds, Andy Lutomirski,
Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet,
Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton,
Kees Cook, Linux API, Network Development, LKML, Daniel Borkmann
On Thu, Sep 18, 2014 at 10:28 AM, Daniel Borkmann <dborkman@redhat.com> wrote:
> On 09/18/2014 05:24 PM, Alexei Starovoitov wrote:
> ...
>>
>> solve or not. If we decide to solve it, we need to have
>> a plan to solve it all the way. Partial fix for size of bpf_attr
>> is not a plan. It's something that is not addressing the problem
>> completely. Little bit of help is not useful for userspace. It
>> would need to deal with new types, verifier differences and
>> other things that I mentioned earlier.
>
>
> Hm, I don't think it would be a strict requirement to solve it
> all the way, and I think that perf_event_open() with perf_copy_attr()
> is not trying to do so either. It, however, is trying on a ``best
> effort basis'' to still load something if new features are unused
> by the binary (I guess you saw the comment in perf_copy_attr()).
>
> Iff, e.g. due to new types we fail at the verifier stage, sure,
> that's life since we only have backwards-compatible guarantee,
> but in case we tried to use features we support, we're still able
> to load the eBPF program while right now, we're rejecting it right
> up-front. That's just my $0.02 ...
David,
the 'changes requested' status means that you want me to
address this forward compatibility now instead of later?
Or something else?
I don't want to second guess, respin and spam people
unnecessary. In this case I don't think Daniel is insisting
on doing it in this patch set. The things discussed above
are not urgent. Unless I'm missing something.
Please clarify.
Thanks
^ permalink raw reply [flat|nested] 14+ messages in thread[parent not found: <CAMEtUux071cELLdoWs21WL0dqgdwj+O=P64aeXSfyUtFW9U69w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH v13 net-next 07/11] bpf: verifier (add ability to receive verification log) [not found] ` <CAMEtUux071cELLdoWs21WL0dqgdwj+O=P64aeXSfyUtFW9U69w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2014-09-19 21:38 ` David Miller 0 siblings, 0 replies; 14+ messages in thread From: David Miller @ 2014-09-19 21:38 UTC (permalink / raw) To: ast-uqk4Ao+rVK5Wk0Htik3J/w Cc: mingo-DgEjT+Ai2ygdnm+yROfE0A, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, luto-kltTT9wpgjJwATOyAt5JVQ, hannes-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r, chema-hpIqsD4AKlfQT0dZR+AlfA, edumazet-hpIqsD4AKlfQT0dZR+AlfA, a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw, pablo-Cap9r6Oaw4JrovVCs/uTlw, hpa-YMNOUZJC4hwAvxtiuMwx3w, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, keescook-F7+t8E8rja9g9hUCZPvPmw, linux-api-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, dborkman-H+wXaHxf7aLQT0dZR+AlfA From: Alexei Starovoitov <ast-uqk4Ao+rVK5Wk0Htik3J/w@public.gmane.org> Date: Fri, 19 Sep 2014 14:04:50 -0700 > the 'changes requested' status means that you want me to > address this forward compatibility now instead of later? I want you to address this now. ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v13 net-next 00/11] eBPF syscall, verifier, testsuite
@ 2014-09-17 0:39 Alexei Starovoitov
[not found] ` <1410914370-29883-1-git-send-email-ast-uqk4Ao+rVK5Wk0Htik3J/w@public.gmane.org>
0 siblings, 1 reply; 14+ messages in thread
From: Alexei Starovoitov @ 2014-09-17 0:39 UTC (permalink / raw)
To: David S. Miller
Cc: Ingo Molnar, Linus Torvalds, Andy Lutomirski, Daniel Borkmann,
Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet,
Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton,
Kees Cook, linux-api-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
Hi All,
v12 -> v13:
- replaced 'foo __user *' pointers with __aligned_u64 (suggested by David)
- added __attribute__((aligned(8)) to 'union bpf_attr' to keep
constant alignment between patches
- updated manpage and syscall wrappers due to __aligned_u64
- rebased, retested on x64 with 32-bit and 64-bit userspace and on i386,
build tested on arm32,sparc64
v11 -> v12:
- dropped patch 11 and copied few macros to libbpf.h (suggested by Daniel)
- replaced 'enum bpf_prog_type' with u32 to be safe in compat (.. Andy)
- implemented and tested compat support (not part of this set) (.. Daniel)
- changed 'void *log_buf' to 'char *' (.. Daniel)
- combined struct bpf_work_struct and bpf_prog_info (.. Daniel)
- added better return value explanation to manpage (.. Andy)
- added log_buf/log_size explanation to manpage (.. Andy & Daniel)
- added a lot more info about prog_type and map_type to manpage (.. Andy)
- rebased, tweaked test_stubs
Patches 1-4 establish BPF syscall shell for maps and programs.
Patches 5-10 add verifier step by step
Patch 11 adds test stubs for 'unspec' program type and verifier testsuite
from user space
Note that patches 1,3,4,7 add commands and attributes to the syscall
while being backwards compatible from each other, which should demonstrate
how other commands can be added in the future.
After this set the programs can be loaded for testing only. They cannot
be attached to any events. Though manpage talks about tracing and sockets,
it will be a subject of future patches.
Please take a look at manpage:
BPF(2) Linux Programmer's Manual BPF(2)
NAME
bpf - perform a command on eBPF map or program
SYNOPSIS
#include <linux/bpf.h>
int bpf(int cmd, union bpf_attr *attr, unsigned int size);
DESCRIPTION
bpf() syscall is a multiplexor for a range of different operations on
eBPF which can be characterized as "universal in-kernel virtual
machine". eBPF is similar to original Berkeley Packet Filter (or
"classic BPF") used to filter network packets. Both statically analyze
the programs before loading them into the kernel to ensure that
programs cannot harm the running system.
eBPF extends classic BPF in multiple ways including ability to call in-
kernel helper functions and access shared data structures like eBPF
maps. The programs can be written in a restricted C that is compiled
into eBPF bytecode and executed on the eBPF virtual machine or JITed
into native instruction set.
eBPF Design/Architecture
eBPF maps is a generic storage of different types. User process can
create multiple maps (with key/value being opaque bytes of data) and
access them via file descriptor. In parallel eBPF programs can access
maps from inside the kernel. It's up to user process and eBPF program
to decide what they store inside maps.
eBPF programs are similar to kernel modules. They are loaded by the
user process and automatically unloaded when process exits. Each eBPF
program is a safe run-to-completion set of instructions. eBPF verifier
statically determines that the program terminates and is safe to
execute. During verification the program takes a hold of maps that it
intends to use, so selected maps cannot be removed until the program is
unloaded. The program can be attached to different events. These events
can be packets, tracepoint events and other types in the future. A new
event triggers execution of the program which may store information
about the event in the maps. Beyond storing data the programs may call
into in-kernel helper functions which may, for example, dump stack, do
trace_printk or other forms of live kernel debugging. The same program
can be attached to multiple events. Different programs can access the
same map:
tracepoint tracepoint tracepoint sk_buff sk_buff
event A event B event C on eth0 on eth1
| | | | |
| | | | |
--> tracing <-- tracing socket socket
prog_1 prog_2 prog_3 prog_4
| | | |
|--- -----| |-------| map_3
map_1 map_2
Syscall Arguments
bpf() syscall operation is determined by cmd which can be one of the
following:
BPF_MAP_CREATE
Create a map with given type and attributes and return map FD
BPF_MAP_LOOKUP_ELEM
Lookup element by key in a given map and return its value
BPF_MAP_UPDATE_ELEM
Create or update element (key/value pair) in a given map
BPF_MAP_DELETE_ELEM
Lookup and delete element by key in a given map
BPF_MAP_GET_NEXT_KEY
Lookup element by key in a given map and return key of next
element
BPF_PROG_LOAD
Verify and load eBPF program
attr is a pointer to a union of type bpf_attr as defined below.
size is the size of the union.
union bpf_attr {
struct { /* anonymous struct used by BPF_MAP_CREATE command */
__u32 map_type;
__u32 key_size; /* size of key in bytes */
__u32 value_size; /* size of value in bytes */
__u32 max_entries; /* max number of entries in a map */
};
struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
__u32 map_fd;
__aligned_u64 key;
union {
__aligned_u64 value;
__aligned_u64 next_key;
};
};
struct { /* anonymous struct used by BPF_PROG_LOAD command */
__u32 prog_type;
__u32 insn_cnt;
__aligned_u64 insns; /* 'const struct bpf_insn *' */
__aligned_u64 license; /* 'const char *' */
__u32 log_level; /* verbosity level of eBPF verifier */
__u32 log_size; /* size of user buffer */
__aligned_u64 log_buf; /* user supplied 'char *' buffer */
};
} __attribute__((aligned(8)));
eBPF maps
maps is a generic storage of different types for sharing data between
kernel and userspace.
Any map type has the following attributes:
. type
. max number of elements
. key size in bytes
. value size in bytes
The following wrapper functions demonstrate how this syscall can be
used to access the maps. The functions use the cmd argument to invoke
different operations.
BPF_MAP_CREATE
int bpf_create_map(enum bpf_map_type map_type, int key_size,
int value_size, int max_entries)
{
union bpf_attr attr = {
.map_type = map_type,
.key_size = key_size,
.value_size = value_size,
.max_entries = max_entries
};
return bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
}
bpf() syscall creates a map of map_type type and given
attributes key_size, value_size, max_entries. On success it
returns process-local file descriptor. On error, -1 is returned
and errno is set to EINVAL or EPERM or ENOMEM.
The attributes key_size and value_size will be used by verifier
during program loading to check that program is calling
bpf_map_*_elem() helper functions with correctly initialized key
and that program doesn't access map element value beyond
specified value_size. For example, when map is created with
key_size = 8 and program does:
bpf_map_lookup_elem(map_fd, fp - 4)
such program will be rejected, since in-kernel helper function
bpf_map_lookup_elem(map_fd, void *key) expects to read 8 bytes
from 'key' pointer, but 'fp - 4' starting address will cause out
of bounds stack access.
Similarly, when map is created with value_size = 1 and program
does:
value = bpf_map_lookup_elem(...);
*(u32 *)value = 1;
such program will be rejected, since it accesses value pointer
beyond specified 1 byte value_size limit.
Currently only hash table map_type is supported:
enum bpf_map_type {
BPF_MAP_TYPE_UNSPEC,
BPF_MAP_TYPE_HASH,
};
map_type selects one of the available map implementations in
kernel. For all map_types eBPF programs access maps with the
same bpf_map_lookup_elem()/bpf_map_update_elem() helper
functions.
BPF_MAP_LOOKUP_ELEM
int bpf_lookup_elem(int fd, void *key, void *value)
{
union bpf_attr attr = {
.map_fd = fd,
.key = ptr_to_u64(key),
.value = ptr_to_u64(value),
};
return bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
}
bpf() syscall looks up an element with given key in a map fd.
If element is found it returns zero and stores element's value
into value. If element is not found it returns -1 and sets
errno to ENOENT.
BPF_MAP_UPDATE_ELEM
int bpf_update_elem(int fd, void *key, void *value)
{
union bpf_attr attr = {
.map_fd = fd,
.key = ptr_to_u64(key),
.value = ptr_to_u64(value),
};
return bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
}
The call creates or updates element with given key/value in a
map fd. On success it returns zero. On error, -1 is returned
and errno is set to EINVAL or EPERM or ENOMEM or E2BIG. E2BIG
indicates that number of elements in the map reached max_entries
limit specified at map creation time.
BPF_MAP_DELETE_ELEM
int bpf_delete_elem(int fd, void *key)
{
union bpf_attr attr = {
.map_fd = fd,
.key = ptr_to_u64(key),
};
return bpf(BPF_MAP_DELETE_ELEM, &attr, sizeof(attr));
}
The call deletes an element in a map fd with given key. Returns
zero on success. If element is not found it returns -1 and sets
errno to ENOENT.
BPF_MAP_GET_NEXT_KEY
int bpf_get_next_key(int fd, void *key, void *next_key)
{
union bpf_attr attr = {
.map_fd = fd,
.key = ptr_to_u64(key),
.next_key = ptr_to_u64(next_key),
};
return bpf(BPF_MAP_GET_NEXT_KEY, &attr, sizeof(attr));
}
The call looks up an element by key in a given map fd and
returns key of the next element into next_key pointer. If key is
not found, it return zero and returns key of the first element
into next_key. If key is the last element, it returns -1 and
sets errno to ENOENT. Other possible errno values are ENOMEM,
EFAULT, EPERM, EINVAL. This method can be used to iterate over
all elements of the map.
close(map_fd)
will delete the map map_fd. Exiting process will delete all
maps automatically.
eBPF programs
BPF_PROG_LOAD
This cmd is used to load eBPF program into the kernel.
char bpf_log_buf[LOG_BUF_SIZE];
int bpf_prog_load(enum bpf_prog_type prog_type,
const struct bpf_insn *insns, int insn_cnt,
const char *license)
{
union bpf_attr attr = {
.prog_type = prog_type,
.insns = ptr_to_u64(insns),
.insn_cnt = insn_cnt,
.license = ptr_to_u64(license),
.log_buf = ptr_to_u64(bpf_log_buf),
.log_size = LOG_BUF_SIZE,
.log_level = 1,
};
return bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
}
prog_type is one of the available program types:
enum bpf_prog_type {
BPF_PROG_TYPE_UNSPEC,
BPF_PROG_TYPE_SOCKET,
BPF_PROG_TYPE_TRACING,
};
By picking prog_type program author selects a set of helper
functions callable from eBPF program and corresponding format of
struct bpf_context (which is the data blob passed into the
program as the first argument). For example, the programs
loaded with prog_type = TYPE_TRACING may call bpf_printk()
helper, whereas TYPE_SOCKET programs may not. The set of
functions available to the programs under given type may
increase in the future.
Currently the set of functions for TYPE_TRACING is:
bpf_map_lookup_elem(map_fd, void *key) // lookup key in a map_fd
bpf_map_update_elem(map_fd, void *key, void *value) // update key/value
bpf_map_delete_elem(map_fd, void *key) // delete key in a map_fd
bpf_ktime_get_ns(void) // returns current ktime
bpf_printk(char *fmt, int fmt_size, ...) // prints into trace buffer
bpf_memcmp(void *ptr1, void *ptr2, int size) // non-faulting memcmp
bpf_fetch_ptr(void *ptr) // non-faulting load pointer from any address
bpf_fetch_u8(void *ptr) // non-faulting 1 byte load
bpf_fetch_u16(void *ptr) // other non-faulting loads
bpf_fetch_u32(void *ptr)
bpf_fetch_u64(void *ptr)
and bpf_context is defined as:
struct bpf_context {
/* argN fields match one to one to arguments passed to trace events */
u64 arg1, arg2, arg3, arg4, arg5, arg6;
/* return value from kretprobe event or from syscall_exit event */
u64 ret;
};
The set of helper functions for TYPE_SOCKET is TBD.
More program types may be added in the future. Like
BPF_PROG_TYPE_USER_TRACING for unprivileged programs.
BPF_PROG_TYPE_UNSPEC is used for testing only. Such programs
cannot be attached to events.
insns array of "struct bpf_insn" instructions
insn_cnt number of instructions in the program
license license string, which must be GPL compatible to call
helper functions marked gpl_only
log_buf user supplied buffer that in-kernel verifier is using to
store verification log. Log is a multi-line string that should
be used by program author to understand how verifier came to
conclusion that program is unsafe. The format of the output can
change at any time as verifier evolves.
log_size size of user buffer. If size of the buffer is not large
enough to store all verifier messages, -1 is returned and errno
is set to ENOSPC.
log_level verbosity level of eBPF verifier, where zero means no
logs provided
close(prog_fd)
will unload eBPF program
The maps are accesible from programs and generally tie the two
together. Programs process various events (like tracepoint, kprobe,
packets) and store the data into maps. User space fetches data from
maps. Either the same or a different map may be used by user space as
configuration space to alter program behavior on the fly.
Events
Once an eBPF program is loaded, it can be attached to an event. Various
kernel subsystems have different ways to do so. For example:
setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd, sizeof(prog_fd));
will attach the program prog_fd to socket sock which was received by
prior call to socket().
ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
will attach the program prog_fd to perf event event_fd which was
received by prior call to perf_event_open().
Another way to attach the program to a tracing event is:
event_fd = open("/sys/kernel/debug/tracing/events/skb/kfree_skb/filter");
write(event_fd, "bpf-123"); /* where 123 is eBPF program FD */
/* here program is attached and will be triggered by events */
close(event_fd); /* to detach from event */
EXAMPLES
/* eBPF+sockets example:
* 1. create map with maximum of 2 elements
* 2. set map[6] = 0 and map[17] = 0
* 3. load eBPF program that counts number of TCP and UDP packets received
* via map[skb->ip->proto]++
* 4. attach prog_fd to raw socket via setsockopt()
* 5. print number of received TCP/UDP packets every second
*/
int main(int ac, char **av)
{
int sock, map_fd, prog_fd, key;
long long value = 0, tcp_cnt, udp_cnt;
map_fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(key), sizeof(value), 2);
if (map_fd < 0) {
printf("failed to create map '%s'\n", strerror(errno));
/* likely not run as root */
return 1;
}
key = 6; /* ip->proto == tcp */
assert(bpf_update_elem(map_fd, &key, &value) == 0);
key = 17; /* ip->proto == udp */
assert(bpf_update_elem(map_fd, &key, &value) == 0);
struct bpf_insn prog[] = {
BPF_MOV64_REG(BPF_REG_6, BPF_REG_1), /* r6 = r1 */
BPF_LD_ABS(BPF_B, 14 + 9), /* r0 = ip->proto */
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4),/* *(u32 *)(fp - 4) = r0 */
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), /* r2 = fp */
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4), /* r2 = r2 - 4 */
BPF_LD_MAP_FD(BPF_REG_1, map_fd), /* r1 = map_fd */
BPF_CALL_FUNC(BPF_FUNC_map_lookup_elem), /* r0 = map_lookup(r1, r2) */
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2), /* if (r0 == 0) goto pc+2 */
BPF_MOV64_IMM(BPF_REG_1, 1), /* r1 = 1 */
BPF_XADD(BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0), /* lock *(u64 *)r0 += r1 */
BPF_MOV64_IMM(BPF_REG_0, 0), /* r0 = 0 */
BPF_EXIT_INSN(), /* return r0 */
};
prog_fd = bpf_prog_load(BPF_PROG_TYPE_SOCKET, prog, sizeof(prog), "GPL");
assert(prog_fd >= 0);
sock = open_raw_sock("lo");
assert(setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd,
sizeof(prog_fd)) == 0);
for (;;) {
key = 6;
assert(bpf_lookup_elem(map_fd, &key, &tcp_cnt) == 0);
key = 17;
assert(bpf_lookup_elem(map_fd, &key, &udp_cnt) == 0);
printf("TCP %lld UDP %lld packets0, tcp_cnt, udp_cnt);
sleep(1);
}
return 0;
}
RETURN VALUE
For a successful call, the return value depends on the operation:
BPF_MAP_CREATE
The new file descriptor associated with eBPF map.
BPF_PROG_LOAD
The new file descriptor associated with eBPF program.
All other commands
Zero.
On error, -1 is returned, and errno is set appropriately.
ERRORS
EPERM bpf() syscall was made without sufficient privilege (without the
CAP_SYS_ADMIN capability).
ENOMEM Cannot allocate sufficient memory.
EBADF fd is not an open file descriptor
EFAULT One of the pointers ( key or value or log_buf or insns ) is
outside accessible address space.
EINVAL The value specified in cmd is not recognized by this kernel.
EINVAL For BPF_MAP_CREATE, either map_type or attributes are invalid.
EINVAL For BPF_MAP_*_ELEM commands, some of the fields of "union
bpf_attr" unused by this command are not set to zero.
EINVAL For BPF_PROG_LOAD, attempt to load invalid program (unrecognized
instruction or uses reserved fields or jumps out of range or
loop detected or calls unknown function).
EACCES For BPF_PROG_LOAD, though program has valid instructions, it was
rejected, since it was deemed unsafe (may access disallowed
memory region or uninitialized stack/register or function
constraints don't match actual types or misaligned access). In
such case it is recommended to call bpf() again with log_level =
1 and examine log_buf for specific reason provided by verifier.
ENOENT For BPF_MAP_LOOKUP_ELEM or BPF_MAP_DELETE_ELEM, indicates that
element with given key was not found.
E2BIG program is too large or a map reached max_entries limit (max
number of elements).
NOTES
These commands may be used only by a privileged process (one having the
CAP_SYS_ADMIN capability).
SEE ALSO
eBPF architecture and instruction set is explained in
Documentation/networking/filter.txt
Linux 2014-09-16 BPF(2)
^ permalink raw reply [flat|nested] 14+ messages in thread[parent not found: <1410914370-29883-1-git-send-email-ast-uqk4Ao+rVK5Wk0Htik3J/w@public.gmane.org>]
* [PATCH v13 net-next 07/11] bpf: verifier (add ability to receive verification log) [not found] ` <1410914370-29883-1-git-send-email-ast-uqk4Ao+rVK5Wk0Htik3J/w@public.gmane.org> @ 2014-09-17 0:39 ` Alexei Starovoitov [not found] ` <1410914370-29883-8-git-send-email-ast-uqk4Ao+rVK5Wk0Htik3J/w@public.gmane.org> 0 siblings, 1 reply; 14+ messages in thread From: Alexei Starovoitov @ 2014-09-17 0:39 UTC (permalink / raw) To: David S. Miller Cc: Ingo Molnar, Linus Torvalds, Andy Lutomirski, Daniel Borkmann, Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet, Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton, Kees Cook, linux-api-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA add optional attributes for BPF_PROG_LOAD syscall: union bpf_attr { struct { ... __u32 log_level; /* verbosity level of eBPF verifier */ __u32 log_size; /* size of user buffer */ __aligned_u64 log_buf; /* user supplied 'char *buffer' */ }; }; when log_level > 0 the verifier will return its verification log in the user supplied buffer 'log_buf' which can be used by program author to analyze why verifier rejected given program. 'Understanding eBPF verifier messages' section of Documentation/networking/filter.txt provides several examples of these messages, like the program: BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), BPF_LD_MAP_FD(BPF_REG_1, 0), BPF_CALL_FUNC(BPF_FUNC_map_lookup_elem), BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1), BPF_ST_MEM(BPF_DW, BPF_REG_0, 4, 0), BPF_EXIT_INSN(), will be rejected with the following multi-line message in log_buf: 0: (7a) *(u64 *)(r10 -8) = 0 1: (bf) r2 = r10 2: (07) r2 += -8 3: (b7) r1 = 0 4: (85) call 1 5: (15) if r0 == 0x0 goto pc+1 R0=map_ptr R10=fp 6: (7a) *(u64 *)(r0 +4) = 0 misaligned access off 4 size 8 The format of the output can change at any time as verifier evolves. Signed-off-by: Alexei Starovoitov <ast-uqk4Ao+rVK5Wk0Htik3J/w@public.gmane.org> --- include/uapi/linux/bpf.h | 3 + kernel/bpf/syscall.c | 2 +- kernel/bpf/verifier.c | 235 ++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 239 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 424f442016e7..31b0ac208a52 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -138,6 +138,9 @@ union bpf_attr { __u32 insn_cnt; __aligned_u64 insns; __aligned_u64 license; + __u32 log_level; /* verbosity level of verifier */ + __u32 log_size; /* size of user buffer */ + __aligned_u64 log_buf; /* user supplied buffer */ }; } __attribute__((aligned(8))); diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 67b5e29f183e..c7be7163bd11 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -458,7 +458,7 @@ struct bpf_prog *bpf_prog_get(u32 ufd) } /* last field in 'union bpf_attr' used by this command */ -#define BPF_PROG_LOAD_LAST_FIELD license +#define BPF_PROG_LOAD_LAST_FIELD log_buf static int bpf_prog_load(union bpf_attr *attr) { diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index d6f9c3d6b4d7..871edc1f2e1f 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -125,9 +125,244 @@ * are set to NOT_INIT to indicate that they are no longer readable. */ +/* single container for all structs + * one verifier_env per bpf_check() call + */ +struct verifier_env { +}; + +/* verbose verifier prints what it's seeing + * bpf_check() is called under lock, so no race to access these global vars + */ +static u32 log_level, log_size, log_len; +static char *log_buf; + +static DEFINE_MUTEX(bpf_verifier_lock); + +/* log_level controls verbosity level of eBPF verifier. + * verbose() is used to dump the verification trace to the log, so the user + * can figure out what's wrong with the program + */ +static void verbose(const char *fmt, ...) +{ + va_list args; + + if (log_level == 0 || log_len >= log_size - 1) + return; + + va_start(args, fmt); + log_len += vscnprintf(log_buf + log_len, log_size - log_len, fmt, args); + va_end(args); +} + +static const char *const bpf_class_string[] = { + [BPF_LD] = "ld", + [BPF_LDX] = "ldx", + [BPF_ST] = "st", + [BPF_STX] = "stx", + [BPF_ALU] = "alu", + [BPF_JMP] = "jmp", + [BPF_RET] = "BUG", + [BPF_ALU64] = "alu64", +}; + +static const char *const bpf_alu_string[] = { + [BPF_ADD >> 4] = "+=", + [BPF_SUB >> 4] = "-=", + [BPF_MUL >> 4] = "*=", + [BPF_DIV >> 4] = "/=", + [BPF_OR >> 4] = "|=", + [BPF_AND >> 4] = "&=", + [BPF_LSH >> 4] = "<<=", + [BPF_RSH >> 4] = ">>=", + [BPF_NEG >> 4] = "neg", + [BPF_MOD >> 4] = "%=", + [BPF_XOR >> 4] = "^=", + [BPF_MOV >> 4] = "=", + [BPF_ARSH >> 4] = "s>>=", + [BPF_END >> 4] = "endian", +}; + +static const char *const bpf_ldst_string[] = { + [BPF_W >> 3] = "u32", + [BPF_H >> 3] = "u16", + [BPF_B >> 3] = "u8", + [BPF_DW >> 3] = "u64", +}; + +static const char *const bpf_jmp_string[] = { + [BPF_JA >> 4] = "jmp", + [BPF_JEQ >> 4] = "==", + [BPF_JGT >> 4] = ">", + [BPF_JGE >> 4] = ">=", + [BPF_JSET >> 4] = "&", + [BPF_JNE >> 4] = "!=", + [BPF_JSGT >> 4] = "s>", + [BPF_JSGE >> 4] = "s>=", + [BPF_CALL >> 4] = "call", + [BPF_EXIT >> 4] = "exit", +}; + +static void print_bpf_insn(struct bpf_insn *insn) +{ + u8 class = BPF_CLASS(insn->code); + + if (class == BPF_ALU || class == BPF_ALU64) { + if (BPF_SRC(insn->code) == BPF_X) + verbose("(%02x) %sr%d %s %sr%d\n", + insn->code, class == BPF_ALU ? "(u32) " : "", + insn->dst_reg, + bpf_alu_string[BPF_OP(insn->code) >> 4], + class == BPF_ALU ? "(u32) " : "", + insn->src_reg); + else + verbose("(%02x) %sr%d %s %s%d\n", + insn->code, class == BPF_ALU ? "(u32) " : "", + insn->dst_reg, + bpf_alu_string[BPF_OP(insn->code) >> 4], + class == BPF_ALU ? "(u32) " : "", + insn->imm); + } else if (class == BPF_STX) { + if (BPF_MODE(insn->code) == BPF_MEM) + verbose("(%02x) *(%s *)(r%d %+d) = r%d\n", + insn->code, + bpf_ldst_string[BPF_SIZE(insn->code) >> 3], + insn->dst_reg, + insn->off, insn->src_reg); + else if (BPF_MODE(insn->code) == BPF_XADD) + verbose("(%02x) lock *(%s *)(r%d %+d) += r%d\n", + insn->code, + bpf_ldst_string[BPF_SIZE(insn->code) >> 3], + insn->dst_reg, insn->off, + insn->src_reg); + else + verbose("BUG_%02x\n", insn->code); + } else if (class == BPF_ST) { + if (BPF_MODE(insn->code) != BPF_MEM) { + verbose("BUG_st_%02x\n", insn->code); + return; + } + verbose("(%02x) *(%s *)(r%d %+d) = %d\n", + insn->code, + bpf_ldst_string[BPF_SIZE(insn->code) >> 3], + insn->dst_reg, + insn->off, insn->imm); + } else if (class == BPF_LDX) { + if (BPF_MODE(insn->code) != BPF_MEM) { + verbose("BUG_ldx_%02x\n", insn->code); + return; + } + verbose("(%02x) r%d = *(%s *)(r%d %+d)\n", + insn->code, insn->dst_reg, + bpf_ldst_string[BPF_SIZE(insn->code) >> 3], + insn->src_reg, insn->off); + } else if (class == BPF_LD) { + if (BPF_MODE(insn->code) == BPF_ABS) { + verbose("(%02x) r0 = *(%s *)skb[%d]\n", + insn->code, + bpf_ldst_string[BPF_SIZE(insn->code) >> 3], + insn->imm); + } else if (BPF_MODE(insn->code) == BPF_IND) { + verbose("(%02x) r0 = *(%s *)skb[r%d + %d]\n", + insn->code, + bpf_ldst_string[BPF_SIZE(insn->code) >> 3], + insn->src_reg, insn->imm); + } else if (BPF_MODE(insn->code) == BPF_IMM) { + verbose("(%02x) r%d = 0x%x\n", + insn->code, insn->dst_reg, insn->imm); + } else { + verbose("BUG_ld_%02x\n", insn->code); + return; + } + } else if (class == BPF_JMP) { + u8 opcode = BPF_OP(insn->code); + + if (opcode == BPF_CALL) { + verbose("(%02x) call %d\n", insn->code, insn->imm); + } else if (insn->code == (BPF_JMP | BPF_JA)) { + verbose("(%02x) goto pc%+d\n", + insn->code, insn->off); + } else if (insn->code == (BPF_JMP | BPF_EXIT)) { + verbose("(%02x) exit\n", insn->code); + } else if (BPF_SRC(insn->code) == BPF_X) { + verbose("(%02x) if r%d %s r%d goto pc%+d\n", + insn->code, insn->dst_reg, + bpf_jmp_string[BPF_OP(insn->code) >> 4], + insn->src_reg, insn->off); + } else { + verbose("(%02x) if r%d %s 0x%x goto pc%+d\n", + insn->code, insn->dst_reg, + bpf_jmp_string[BPF_OP(insn->code) >> 4], + insn->imm, insn->off); + } + } else { + verbose("(%02x) %s\n", insn->code, bpf_class_string[class]); + } +} + int bpf_check(struct bpf_prog *prog, union bpf_attr *attr) { + char __user *log_ubuf = NULL; + struct verifier_env *env; int ret = -EINVAL; + if (prog->len <= 0 || prog->len > BPF_MAXINSNS) + return -E2BIG; + + /* 'struct verifier_env' can be global, but since it's not small, + * allocate/free it every time bpf_check() is called + */ + env = kzalloc(sizeof(struct verifier_env), GFP_KERNEL); + if (!env) + return -ENOMEM; + + /* grab the mutex to protect few globals used by verifier */ + mutex_lock(&bpf_verifier_lock); + + if (attr->log_level || attr->log_buf || attr->log_size) { + /* user requested verbose verifier output + * and supplied buffer to store the verification trace + */ + log_level = attr->log_level; + log_ubuf = (char __user *) (unsigned long) attr->log_buf; + log_size = attr->log_size; + log_len = 0; + + ret = -EINVAL; + /* log_* values have to be sane */ + if (log_size < 128 || log_size > UINT_MAX >> 8 || + log_level == 0 || log_ubuf == NULL) + goto free_env; + + ret = -ENOMEM; + log_buf = vmalloc(log_size); + if (!log_buf) + goto free_env; + } else { + log_level = 0; + } + + /* ret = do_check(env); */ + + if (log_level && log_len >= log_size - 1) { + BUG_ON(log_len >= log_size); + /* verifier log exceeded user supplied buffer */ + ret = -ENOSPC; + /* fall through to return what was recorded */ + } + + /* copy verifier log back to user space including trailing zero */ + if (log_level && copy_to_user(log_ubuf, log_buf, log_len + 1) != 0) { + ret = -EFAULT; + goto free_log_buf; + } + + +free_log_buf: + if (log_level) + vfree(log_buf); +free_env: + kfree(env); + mutex_unlock(&bpf_verifier_lock); return ret; } -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 14+ messages in thread
[parent not found: <1410914370-29883-8-git-send-email-ast-uqk4Ao+rVK5Wk0Htik3J/w@public.gmane.org>]
* Re: [PATCH v13 net-next 07/11] bpf: verifier (add ability to receive verification log) [not found] ` <1410914370-29883-8-git-send-email-ast-uqk4Ao+rVK5Wk0Htik3J/w@public.gmane.org> @ 2014-09-17 6:51 ` Daniel Borkmann 2014-09-17 16:08 ` Alexei Starovoitov 0 siblings, 1 reply; 14+ messages in thread From: Daniel Borkmann @ 2014-09-17 6:51 UTC (permalink / raw) To: Alexei Starovoitov Cc: David S. Miller, Ingo Molnar, Linus Torvalds, Andy Lutomirski, Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet, Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton, Kees Cook, linux-api-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA On 09/17/2014 02:39 AM, Alexei Starovoitov wrote: > add optional attributes for BPF_PROG_LOAD syscall: > union bpf_attr { > struct { > ... > __u32 log_level; /* verbosity level of eBPF verifier */ > __u32 log_size; /* size of user buffer */ > __aligned_u64 log_buf; /* user supplied 'char *buffer' */ > }; > }; > > when log_level > 0 the verifier will return its verification log in the user > supplied buffer 'log_buf' which can be used by program author to analyze why > verifier rejected given program. > > 'Understanding eBPF verifier messages' section of Documentation/networking/filter.txt > provides several examples of these messages, like the program: > > BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), > BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), > BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), > BPF_LD_MAP_FD(BPF_REG_1, 0), > BPF_CALL_FUNC(BPF_FUNC_map_lookup_elem), > BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1), > BPF_ST_MEM(BPF_DW, BPF_REG_0, 4, 0), > BPF_EXIT_INSN(), > > will be rejected with the following multi-line message in log_buf: > > 0: (7a) *(u64 *)(r10 -8) = 0 > 1: (bf) r2 = r10 > 2: (07) r2 += -8 > 3: (b7) r1 = 0 > 4: (85) call 1 > 5: (15) if r0 == 0x0 goto pc+1 > R0=map_ptr R10=fp > 6: (7a) *(u64 *)(r0 +4) = 0 > misaligned access off 4 size 8 > > The format of the output can change at any time as verifier evolves. > > Signed-off-by: Alexei Starovoitov <ast-uqk4Ao+rVK5Wk0Htik3J/w@public.gmane.org> > --- > include/uapi/linux/bpf.h | 3 + > kernel/bpf/syscall.c | 2 +- > kernel/bpf/verifier.c | 235 ++++++++++++++++++++++++++++++++++++++++++++++ > 3 files changed, 239 insertions(+), 1 deletion(-) > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index 424f442016e7..31b0ac208a52 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -138,6 +138,9 @@ union bpf_attr { > __u32 insn_cnt; > __aligned_u64 insns; > __aligned_u64 license; > + __u32 log_level; /* verbosity level of verifier */ > + __u32 log_size; /* size of user buffer */ > + __aligned_u64 log_buf; /* user supplied buffer */ > }; > } __attribute__((aligned(8))); > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > index 67b5e29f183e..c7be7163bd11 100644 > --- a/kernel/bpf/syscall.c > +++ b/kernel/bpf/syscall.c > @@ -458,7 +458,7 @@ struct bpf_prog *bpf_prog_get(u32 ufd) > } > > /* last field in 'union bpf_attr' used by this command */ > -#define BPF_PROG_LOAD_LAST_FIELD license > +#define BPF_PROG_LOAD_LAST_FIELD log_buf I was looking to find a use case for this item, but couldn't find anything, so this seems to be dead code? Was it, so that each time you extend an uapi structure like above that you would only access the structure up to BPF_PROG_LOAD_LAST_FIELD? That might not work for old binaries using this ABI running on newer kernels where there are different expectations of what BPF_PROG_LOAD_LAST_FIELD has been at the time of compilation. > static int bpf_prog_load(union bpf_attr *attr) > { > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > index d6f9c3d6b4d7..871edc1f2e1f 100644 > --- a/kernel/bpf/verifier.c > +++ b/kernel/bpf/verifier.c > @@ -125,9 +125,244 @@ > * are set to NOT_INIT to indicate that they are no longer readable. > */ > > +/* single container for all structs > + * one verifier_env per bpf_check() call > + */ > +struct verifier_env { > +}; > + > +/* verbose verifier prints what it's seeing > + * bpf_check() is called under lock, so no race to access these global vars > + */ > +static u32 log_level, log_size, log_len; > +static char *log_buf; > + > +static DEFINE_MUTEX(bpf_verifier_lock); > + > +/* log_level controls verbosity level of eBPF verifier. > + * verbose() is used to dump the verification trace to the log, so the user > + * can figure out what's wrong with the program > + */ > +static void verbose(const char *fmt, ...) > +{ > + va_list args; > + > + if (log_level == 0 || log_len >= log_size - 1) > + return; > + > + va_start(args, fmt); > + log_len += vscnprintf(log_buf + log_len, log_size - log_len, fmt, args); > + va_end(args); > +} > + > +static const char *const bpf_class_string[] = { > + [BPF_LD] = "ld", > + [BPF_LDX] = "ldx", > + [BPF_ST] = "st", > + [BPF_STX] = "stx", > + [BPF_ALU] = "alu", > + [BPF_JMP] = "jmp", > + [BPF_RET] = "BUG", > + [BPF_ALU64] = "alu64", > +}; > + > +static const char *const bpf_alu_string[] = { > + [BPF_ADD >> 4] = "+=", > + [BPF_SUB >> 4] = "-=", > + [BPF_MUL >> 4] = "*=", > + [BPF_DIV >> 4] = "/=", > + [BPF_OR >> 4] = "|=", > + [BPF_AND >> 4] = "&=", > + [BPF_LSH >> 4] = "<<=", > + [BPF_RSH >> 4] = ">>=", > + [BPF_NEG >> 4] = "neg", > + [BPF_MOD >> 4] = "%=", > + [BPF_XOR >> 4] = "^=", > + [BPF_MOV >> 4] = "=", > + [BPF_ARSH >> 4] = "s>>=", > + [BPF_END >> 4] = "endian", > +}; > + > +static const char *const bpf_ldst_string[] = { > + [BPF_W >> 3] = "u32", > + [BPF_H >> 3] = "u16", > + [BPF_B >> 3] = "u8", > + [BPF_DW >> 3] = "u64", > +}; > + > +static const char *const bpf_jmp_string[] = { > + [BPF_JA >> 4] = "jmp", > + [BPF_JEQ >> 4] = "==", > + [BPF_JGT >> 4] = ">", > + [BPF_JGE >> 4] = ">=", > + [BPF_JSET >> 4] = "&", > + [BPF_JNE >> 4] = "!=", > + [BPF_JSGT >> 4] = "s>", > + [BPF_JSGE >> 4] = "s>=", > + [BPF_CALL >> 4] = "call", > + [BPF_EXIT >> 4] = "exit", > +}; > + > +static void print_bpf_insn(struct bpf_insn *insn) > +{ > + u8 class = BPF_CLASS(insn->code); > + > + if (class == BPF_ALU || class == BPF_ALU64) { > + if (BPF_SRC(insn->code) == BPF_X) > + verbose("(%02x) %sr%d %s %sr%d\n", > + insn->code, class == BPF_ALU ? "(u32) " : "", > + insn->dst_reg, > + bpf_alu_string[BPF_OP(insn->code) >> 4], > + class == BPF_ALU ? "(u32) " : "", > + insn->src_reg); > + else > + verbose("(%02x) %sr%d %s %s%d\n", > + insn->code, class == BPF_ALU ? "(u32) " : "", > + insn->dst_reg, > + bpf_alu_string[BPF_OP(insn->code) >> 4], > + class == BPF_ALU ? "(u32) " : "", > + insn->imm); > + } else if (class == BPF_STX) { > + if (BPF_MODE(insn->code) == BPF_MEM) > + verbose("(%02x) *(%s *)(r%d %+d) = r%d\n", > + insn->code, > + bpf_ldst_string[BPF_SIZE(insn->code) >> 3], > + insn->dst_reg, > + insn->off, insn->src_reg); > + else if (BPF_MODE(insn->code) == BPF_XADD) > + verbose("(%02x) lock *(%s *)(r%d %+d) += r%d\n", > + insn->code, > + bpf_ldst_string[BPF_SIZE(insn->code) >> 3], > + insn->dst_reg, insn->off, > + insn->src_reg); > + else > + verbose("BUG_%02x\n", insn->code); > + } else if (class == BPF_ST) { > + if (BPF_MODE(insn->code) != BPF_MEM) { > + verbose("BUG_st_%02x\n", insn->code); > + return; > + } > + verbose("(%02x) *(%s *)(r%d %+d) = %d\n", > + insn->code, > + bpf_ldst_string[BPF_SIZE(insn->code) >> 3], > + insn->dst_reg, > + insn->off, insn->imm); > + } else if (class == BPF_LDX) { > + if (BPF_MODE(insn->code) != BPF_MEM) { > + verbose("BUG_ldx_%02x\n", insn->code); > + return; > + } > + verbose("(%02x) r%d = *(%s *)(r%d %+d)\n", > + insn->code, insn->dst_reg, > + bpf_ldst_string[BPF_SIZE(insn->code) >> 3], > + insn->src_reg, insn->off); > + } else if (class == BPF_LD) { > + if (BPF_MODE(insn->code) == BPF_ABS) { > + verbose("(%02x) r0 = *(%s *)skb[%d]\n", > + insn->code, > + bpf_ldst_string[BPF_SIZE(insn->code) >> 3], > + insn->imm); > + } else if (BPF_MODE(insn->code) == BPF_IND) { > + verbose("(%02x) r0 = *(%s *)skb[r%d + %d]\n", > + insn->code, > + bpf_ldst_string[BPF_SIZE(insn->code) >> 3], > + insn->src_reg, insn->imm); > + } else if (BPF_MODE(insn->code) == BPF_IMM) { > + verbose("(%02x) r%d = 0x%x\n", > + insn->code, insn->dst_reg, insn->imm); > + } else { > + verbose("BUG_ld_%02x\n", insn->code); > + return; > + } > + } else if (class == BPF_JMP) { > + u8 opcode = BPF_OP(insn->code); > + > + if (opcode == BPF_CALL) { > + verbose("(%02x) call %d\n", insn->code, insn->imm); > + } else if (insn->code == (BPF_JMP | BPF_JA)) { > + verbose("(%02x) goto pc%+d\n", > + insn->code, insn->off); > + } else if (insn->code == (BPF_JMP | BPF_EXIT)) { > + verbose("(%02x) exit\n", insn->code); > + } else if (BPF_SRC(insn->code) == BPF_X) { > + verbose("(%02x) if r%d %s r%d goto pc%+d\n", > + insn->code, insn->dst_reg, > + bpf_jmp_string[BPF_OP(insn->code) >> 4], > + insn->src_reg, insn->off); > + } else { > + verbose("(%02x) if r%d %s 0x%x goto pc%+d\n", > + insn->code, insn->dst_reg, > + bpf_jmp_string[BPF_OP(insn->code) >> 4], > + insn->imm, insn->off); > + } > + } else { > + verbose("(%02x) %s\n", insn->code, bpf_class_string[class]); > + } > +} > + > int bpf_check(struct bpf_prog *prog, union bpf_attr *attr) > { > + char __user *log_ubuf = NULL; > + struct verifier_env *env; > int ret = -EINVAL; > > + if (prog->len <= 0 || prog->len > BPF_MAXINSNS) > + return -E2BIG; > + > + /* 'struct verifier_env' can be global, but since it's not small, > + * allocate/free it every time bpf_check() is called > + */ > + env = kzalloc(sizeof(struct verifier_env), GFP_KERNEL); > + if (!env) > + return -ENOMEM; > + > + /* grab the mutex to protect few globals used by verifier */ > + mutex_lock(&bpf_verifier_lock); So only because of the verifier error log (which are global vars here) we now have to hold a eBPF-related mutex lock each time when attaching a program? Also, if you really have to do the verifier error log, can't we spare ourself most part of the textifying parts if you would encode the verifier log into a normal structure array with eBPF specific error codes and then do all this pretty printing in user space? Why is that impossible? I really think it's odd. > + if (attr->log_level || attr->log_buf || attr->log_size) { > + /* user requested verbose verifier output > + * and supplied buffer to store the verification trace > + */ > + log_level = attr->log_level; > + log_ubuf = (char __user *) (unsigned long) attr->log_buf; > + log_size = attr->log_size; > + log_len = 0; > + > + ret = -EINVAL; > + /* log_* values have to be sane */ > + if (log_size < 128 || log_size > UINT_MAX >> 8 || > + log_level == 0 || log_ubuf == NULL) > + goto free_env; > + > + ret = -ENOMEM; > + log_buf = vmalloc(log_size); > + if (!log_buf) > + goto free_env; > + } else { > + log_level = 0; > + } > + > + /* ret = do_check(env); */ > + > + if (log_level && log_len >= log_size - 1) { > + BUG_ON(log_len >= log_size); > + /* verifier log exceeded user supplied buffer */ > + ret = -ENOSPC; > + /* fall through to return what was recorded */ > + } > + > + /* copy verifier log back to user space including trailing zero */ > + if (log_level && copy_to_user(log_ubuf, log_buf, log_len + 1) != 0) { > + ret = -EFAULT; > + goto free_log_buf; > + } > + > + > +free_log_buf: > + if (log_level) > + vfree(log_buf); > +free_env: > + kfree(env); > + mutex_unlock(&bpf_verifier_lock); > return ret; > } > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v13 net-next 07/11] bpf: verifier (add ability to receive verification log) 2014-09-17 6:51 ` Daniel Borkmann @ 2014-09-17 16:08 ` Alexei Starovoitov 2014-09-17 17:03 ` Daniel Borkmann 0 siblings, 1 reply; 14+ messages in thread From: Alexei Starovoitov @ 2014-09-17 16:08 UTC (permalink / raw) To: Daniel Borkmann Cc: David S. Miller, Ingo Molnar, Linus Torvalds, Andy Lutomirski, Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet, Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton, Kees Cook, Linux API, Network Development, LKML On Tue, Sep 16, 2014 at 11:51 PM, Daniel Borkmann <dborkman@redhat.com> wrote: >> >> /* last field in 'union bpf_attr' used by this command */ >> -#define BPF_PROG_LOAD_LAST_FIELD license >> +#define BPF_PROG_LOAD_LAST_FIELD log_buf > > > I was looking to find a use case for this item, but couldn't find anything, > so > this seems to be dead code? See CHECK_ATTR() macro and comment next to it in patch #1 > Was it, so that each time you extend an uapi structure like above that you > would > only access the structure up to BPF_PROG_LOAD_LAST_FIELD? That might not > work for > old binaries using this ABI running on newer kernels where there are > different > expectations of what BPF_PROG_LOAD_LAST_FIELD has been at the time of > compilation. exactly the opposite. CHECK_ATTR() is checking that all fields beyond last for given command are zero, so we can extend bpf_attr with new fields added after last. Transition from patch 4 to patch 7 and the hunk you quoted are demonstrating exactly that. Say, userspace was compiled with bpf_attr as defined in patch 4 and it populated fields all the way till 'license', and kernel is compiled with patch 7. Kernel does: union bpf_attr attr = {}; /* copy attributes from user space, may be less than sizeof(bpf_attr) */ copy_from_user(&attr, uattr, size) so newer fields (all the way till log_buf) stay zero and kernel behavior is the same as it was in patch 4. So older user space works as-is with newer kernel. Another example: say, we want to add another flag to lookup() method added in patch 3, we just add another 'flags' field after 'value' field and adjust: -#define BPF_MAP_LOOKUP_ELEM_LAST_FIELD value +#define BPF_MAP_LOOKUP_ELEM_LAST_FIELD flags Older user apps stay binary compatible with newer kernel. Does this explain things? >> + >> + /* grab the mutex to protect few globals used by verifier */ >> + mutex_lock(&bpf_verifier_lock); > > > So only because of the verifier error log (which are global vars here) we > now have to hold a eBPF-related mutex lock each time when attaching a > program? correct. it's done on purpose to simplify verifier code. User app is blocked in bpf syscall until verifier checks the program. Not a big deal. I don't expect a lot of concurrent program loading. If it somehow becomes an issue, when can fix it, but for now I think less lines of verifier code is definitely a better trade off. > Also, if you really have to do the verifier error log, can't we spare > ourself > most part of the textifying parts if you would encode the verifier log into > a > normal structure array with eBPF specific error codes and then do all this > pretty printing in user space? Why is that impossible? I really think it's > odd. I thought I explained this already... verifier log is not at all "an array of specific error codes". verifier is printing the trace and state of what it's seeing while analyzing the program. Very branchy program may generate a trace log several times larger than program size. pretty text is the most convenient way to pass it to user. Theoretically we can come with some obscure log format for this internal verifier state, but what do we get ? only additional complexity. This obscure format will change just as text will change, because verifier will evolve. If you're looking for a way to fix this output into ABI, it's not possible. Verifier will become more advanced in the future and it's trace whether in text or in obscure encoded structs will change. Therefore text is much simpler option. Also consider the learning curve for somebody planning to add new features to verifier. This trace log is a perfect way to understand how verifier is working. Try simple program with multiple branches and see what kind of log is dumped. Another example: To make verifier easier to review, in this patch set I didn't include 'state pruning optimization' patch. That patch will change the trace log, because it changes the way verifier is working. If we had to introduce struct encoding of trace log, it would need to be changed already. So pretty text is the simplest and most convenient way. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v13 net-next 07/11] bpf: verifier (add ability to receive verification log) 2014-09-17 16:08 ` Alexei Starovoitov @ 2014-09-17 17:03 ` Daniel Borkmann 2014-09-17 19:17 ` Daniel Borkmann 0 siblings, 1 reply; 14+ messages in thread From: Daniel Borkmann @ 2014-09-17 17:03 UTC (permalink / raw) To: Alexei Starovoitov Cc: David S. Miller, Ingo Molnar, Linus Torvalds, Andy Lutomirski, Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet, Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton, Kees Cook, Linux API, Network Development, LKML On 09/17/2014 06:08 PM, Alexei Starovoitov wrote: > On Tue, Sep 16, 2014 at 11:51 PM, Daniel Borkmann <dborkman@redhat.com> wrote: >>> >>> /* last field in 'union bpf_attr' used by this command */ >>> -#define BPF_PROG_LOAD_LAST_FIELD license >>> +#define BPF_PROG_LOAD_LAST_FIELD log_buf >> >> I was looking to find a use case for this item, but couldn't find anything, >> so >> this seems to be dead code? > > See CHECK_ATTR() macro and comment next to it in patch #1 > >> Was it, so that each time you extend an uapi structure like above that you >> would >> only access the structure up to BPF_PROG_LOAD_LAST_FIELD? That might not >> work for >> old binaries using this ABI running on newer kernels where there are >> different >> expectations of what BPF_PROG_LOAD_LAST_FIELD has been at the time of >> compilation. > > exactly the opposite. > CHECK_ATTR() is checking that all fields beyond last for given > command are zero, so we can extend bpf_attr with new fields > added after last. > Transition from patch 4 to patch 7 and the hunk you quoted are > demonstrating exactly that. Say, userspace was compiled > with bpf_attr as defined in patch 4 and it populated fields all the way > till 'license', and kernel is compiled with patch 7. Kernel does: > union bpf_attr attr = {}; > /* copy attributes from user space, may be less than sizeof(bpf_attr) */ > copy_from_user(&attr, uattr, size) > so newer fields (all the way till log_buf) stay zero and kernel > behavior is the same as it was in patch 4. > So older user space works as-is with newer kernel. Ok, I see. Lets say, since the introduction of this syscall you have added a couple of features and thus extended union bpf_attr where it grew in size over time. You built your shiny new binary on that uapi setting, and later on decide to run it on an older kernel. What will happen is that in your bpf syscall handler you will return with -EINVAL on that kernel right away since the size of union bpf_attr is greater. That would mean over time when new features will get added, applications that want to make sure to run on _all_ kernels where the bpf syscall is available have to make sure to either use the _initial_ version of union bpf_attr in order to not get an -EINVAL, or worse they have to probe though a syscall different versions of union bpf_attr if they wish to make use of a particular feature until they do not get an -EINVAL anymore. I guess that might be problematic for an application developer that wants to ship its application across different distributions usually running different kernels. At least those people might then consider holding a private copy of the _initial_ version of union bpf_attr in their own source code, but it's not pleasant I guess. I know you seem to have the constraint to run on NET-less systems, but netlink could partially resolve that in the sense that it would allow to at least load an eBPF program with initial feature set. Couldn't there be some mechanism to make this interaction more pleasant? E.g. in BPF extensions we can ask the kernel up to what extension it supports and accordingly adapt to it up front. I know it's just a /trivial/ example but have you thought about something on that kind for the syscall? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v13 net-next 07/11] bpf: verifier (add ability to receive verification log) 2014-09-17 17:03 ` Daniel Borkmann @ 2014-09-17 19:17 ` Daniel Borkmann [not found] ` <5419DE51.1060904-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 14+ messages in thread From: Daniel Borkmann @ 2014-09-17 19:17 UTC (permalink / raw) To: Alexei Starovoitov Cc: David S. Miller, Ingo Molnar, Linus Torvalds, Andy Lutomirski, Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet, Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton, Kees Cook, Linux API, Network Development, LKML On 09/17/2014 07:03 PM, Daniel Borkmann wrote: > On 09/17/2014 06:08 PM, Alexei Starovoitov wrote: >> On Tue, Sep 16, 2014 at 11:51 PM, Daniel Borkmann <dborkman@redhat.com> wrote: >>>> >>>> /* last field in 'union bpf_attr' used by this command */ >>>> -#define BPF_PROG_LOAD_LAST_FIELD license >>>> +#define BPF_PROG_LOAD_LAST_FIELD log_buf >>> >>> I was looking to find a use case for this item, but couldn't find anything, >>> so >>> this seems to be dead code? >> >> See CHECK_ATTR() macro and comment next to it in patch #1 >> >>> Was it, so that each time you extend an uapi structure like above that you >>> would >>> only access the structure up to BPF_PROG_LOAD_LAST_FIELD? That might not >>> work for >>> old binaries using this ABI running on newer kernels where there are >>> different >>> expectations of what BPF_PROG_LOAD_LAST_FIELD has been at the time of >>> compilation. >> >> exactly the opposite. >> CHECK_ATTR() is checking that all fields beyond last for given >> command are zero, so we can extend bpf_attr with new fields >> added after last. >> Transition from patch 4 to patch 7 and the hunk you quoted are >> demonstrating exactly that. Say, userspace was compiled >> with bpf_attr as defined in patch 4 and it populated fields all the way >> till 'license', and kernel is compiled with patch 7. Kernel does: >> union bpf_attr attr = {}; >> /* copy attributes from user space, may be less than sizeof(bpf_attr) */ >> copy_from_user(&attr, uattr, size) >> so newer fields (all the way till log_buf) stay zero and kernel >> behavior is the same as it was in patch 4. >> So older user space works as-is with newer kernel. > > Ok, I see. Lets say, since the introduction of this syscall you have > added a couple of features and thus extended union bpf_attr where it > grew in size over time. > > You built your shiny new binary on that uapi setting, and later on > decide to run it on an older kernel. What will happen is that in your > bpf syscall handler you will return with -EINVAL on that kernel right > away since the size of union bpf_attr is greater. > > That would mean over time when new features will get added, applications > that want to make sure to run on _all_ kernels where the bpf syscall is > available have to make sure to either use the _initial_ version of > union bpf_attr in order to not get an -EINVAL, or worse they have > to probe though a syscall different versions of union bpf_attr if they > wish to make use of a particular feature until they do not get an -EINVAL > anymore. > > I guess that might be problematic for an application developer that > wants to ship its application across different distributions usually > running different kernels. At least those people might then consider > holding a private copy of the _initial_ version of union bpf_attr in > their own source code, but it's not pleasant I guess. > > I know you seem to have the constraint to run on NET-less systems, but > netlink could partially resolve that in the sense that it would allow > to at least load an eBPF program with initial feature set. Couldn't > there be some mechanism to make this interaction more pleasant? E.g. > in BPF extensions we can ask the kernel up to what extension it > supports and accordingly adapt to it up front. I know it's just a > /trivial/ example but have you thought about something on that kind for > the syscall? Hm, thinking out loudly ... perhaps this could be made a library problem. Such that the library which wraps the syscall needs to be aware of a marker where the initial version ends, and if the application doesn't make use of any of the new features, it would just pass in the length up to the marker as size attribute into the syscall. Similarly, if new features are always added to the end of a structure and the library truncates the overall-length after the last used member we might have a chance to load something on older kernels, haven't tried that though. ^ permalink raw reply [flat|nested] 14+ messages in thread
[parent not found: <5419DE51.1060904-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH v13 net-next 07/11] bpf: verifier (add ability to receive verification log) [not found] ` <5419DE51.1060904-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2014-09-17 19:37 ` Alexei Starovoitov 2014-09-17 23:45 ` Alexei Starovoitov 0 siblings, 1 reply; 14+ messages in thread From: Alexei Starovoitov @ 2014-09-17 19:37 UTC (permalink / raw) To: Daniel Borkmann Cc: David S. Miller, Ingo Molnar, Linus Torvalds, Andy Lutomirski, Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet, Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton, Kees Cook, Linux API, Network Development, LKML On Wed, Sep 17, 2014 at 12:17 PM, Daniel Borkmann <dborkman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote: > On 09/17/2014 07:03 PM, Daniel Borkmann wrote: >> >> You built your shiny new binary on that uapi setting, and later on >> decide to run it on an older kernel. What will happen is that in your >> bpf syscall handler you will return with -EINVAL on that kernel right >> away since the size of union bpf_attr is greater. that's a correct description of the code in this set. What I mentioned in cover letter during v7 was: "I've decided not to bother with forward compatiblity for now. We can address it later the way perf_event_open did." I was trying not open this can of worms :) >> That would mean over time when new features will get added, applications >> that want to make sure to run on _all_ kernels where the bpf syscall is >> available have to make sure to either use the _initial_ version of >> union bpf_attr in order to not get an -EINVAL, or worse they have >> to probe though a syscall different versions of union bpf_attr if they >> wish to make use of a particular feature until they do not get an -EINVAL >> anymore. Correct as well. I say that would be the least of user space concerns. If user app is built with newer bpf_attr and relies on it for some functionality, it would need a lot more than probing. Depending how big the new bpf_attr feature is, the app might not have a workaround for older kernels at all. So seeing EINVAL on older kernel might be a preferred way. >> in BPF extensions we can ask the kernel up to what extension it >> supports and accordingly adapt to it up front. I know it's just a >> /trivial/ example but have you thought about something on that kind for >> the syscall? A 2nd option would be to do what perf_copy_attr() does: when newer struct is coming from user space, kernel checks that space beyond known last field is all zeros. We can do the same, but I'm not convinced that it will help userspace much. Newer user space can only be _compiled_ with newer bpf_attr. It is still cannot use any new features and it needs to make sure that all new fields are zero. I'm not sure how it helps. perf_event_open() also returns size. In that case it's helpful, since it's one structure. In ebpf case we have multiple structures under one union, so returning size of the whole bpf_attr doesn't help one particular command. > Hm, thinking out loudly ... perhaps this could be made a library problem. > Such that the library which wraps the syscall needs to be aware of a > marker where the initial version ends, and if the application doesn't > make use of any of the new features, it would just pass in the length up > to the marker as size attribute into the syscall. Similarly, if new > features are always added to the end of a structure and the library > truncates the overall-length after the last used member we might have > a chance to load something on older kernels, haven't tried that though. that's a 3rd option. I think it's cleaner than 2nd, since it solves it completely from user space. It can even be smarter than that. If this syscall wrapper library sees that newer features are used and it can workaround them: it can chop size and pass older fields into the older kernel and when it returns, do a workaround based on newer fields. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v13 net-next 07/11] bpf: verifier (add ability to receive verification log) 2014-09-17 19:37 ` Alexei Starovoitov @ 2014-09-17 23:45 ` Alexei Starovoitov [not found] ` <CAMEtUuw0aDPpQhd13ssk2PDLffBJNeef9jicOnhQLH6KtJ8gsQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 14+ messages in thread From: Alexei Starovoitov @ 2014-09-17 23:45 UTC (permalink / raw) To: Daniel Borkmann Cc: David S. Miller, Ingo Molnar, Linus Torvalds, Andy Lutomirski, Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet, Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton, Kees Cook, Linux API, Network Development, LKML On Wed, Sep 17, 2014 at 12:37 PM, Alexei Starovoitov <ast@plumgrid.com> wrote: > >> Hm, thinking out loudly ... perhaps this could be made a library problem. >> Such that the library which wraps the syscall needs to be aware of a >> marker where the initial version ends, and if the application doesn't >> make use of any of the new features, it would just pass in the length up >> to the marker as size attribute into the syscall. Similarly, if new >> features are always added to the end of a structure and the library >> truncates the overall-length after the last used member we might have >> a chance to load something on older kernels, haven't tried that though. > > that's a 3rd option. I think it's cleaner than 2nd, since it solves it > completely from user space. > It can even be smarter than that. If this syscall wrapper library > sees that newer features are used and it can workaround them: > it can chop size and pass older fields into the older kernel > and when it returns, do a workaround based on newer fields. the more I think about 'new user space + old kernel' problem, the more certain I am that kernel should not try to help user space, since most of the time it's not going to be enough, but additional code in kernel would need to be maintained. syscall commands and size of bpf_attr is the least of problems. New map_type and prog_type will be added, new helper functions will be available to programs. One would think that md5 of uapi/linux/bpf.h would be enough to say that user app is compatible... In reality, it's not. The 'state pruning' verifier optimization I've talked about will not change a single bit in bpf.h, but it will be able to recognize more programs as safe. A program developed on a new kernel with more advanced verifier will load just fine on new kernel, but this valid program will not load on old kernel, only because verifier is not smart enough. Now we would need a version of verifier exposed all the way to user space?! imo that's too much. I think for eBPF infra kernel should only guarantee backwards compatibility (old user space must work with new kernel) and that's it. That's what this patch is trying to do. Thoughts? ^ permalink raw reply [flat|nested] 14+ messages in thread
[parent not found: <CAMEtUuw0aDPpQhd13ssk2PDLffBJNeef9jicOnhQLH6KtJ8gsQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH v13 net-next 07/11] bpf: verifier (add ability to receive verification log) [not found] ` <CAMEtUuw0aDPpQhd13ssk2PDLffBJNeef9jicOnhQLH6KtJ8gsQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2014-09-18 6:44 ` Daniel Borkmann 2014-09-18 14:34 ` Alexei Starovoitov 0 siblings, 1 reply; 14+ messages in thread From: Daniel Borkmann @ 2014-09-18 6:44 UTC (permalink / raw) To: Alexei Starovoitov Cc: David S. Miller, Ingo Molnar, Linus Torvalds, Andy Lutomirski, Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet, Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton, Kees Cook, Linux API, Network Development, LKML On 09/18/2014 01:45 AM, Alexei Starovoitov wrote: > On Wed, Sep 17, 2014 at 12:37 PM, Alexei Starovoitov <ast-uqk4Ao+rVK5Wk0Htik3J/w@public.gmane.org> wrote: >> >>> Hm, thinking out loudly ... perhaps this could be made a library problem. >>> Such that the library which wraps the syscall needs to be aware of a >>> marker where the initial version ends, and if the application doesn't >>> make use of any of the new features, it would just pass in the length up >>> to the marker as size attribute into the syscall. Similarly, if new >>> features are always added to the end of a structure and the library >>> truncates the overall-length after the last used member we might have >>> a chance to load something on older kernels, haven't tried that though. >> >> that's a 3rd option. I think it's cleaner than 2nd, since it solves it >> completely from user space. >> It can even be smarter than that. If this syscall wrapper library >> sees that newer features are used and it can workaround them: >> it can chop size and pass older fields into the older kernel >> and when it returns, do a workaround based on newer fields. > > the more I think about 'new user space + old kernel' problem, > the more certain I am that kernel should not try to help > user space, since most of the time it's not going to be enough, > but additional code in kernel would need to be maintained. > > syscall commands and size of bpf_attr is the least of problems. > New map_type and prog_type will be added, new helper > functions will be available to programs. > One would think that md5 of uapi/linux/bpf.h would be > enough to say that user app is compatible... In reality, > it's not. The 'state pruning' verifier optimization I've talked > about will not change a single bit in bpf.h, but it will be > able to recognize more programs as safe. > A program developed on a new kernel with more > advanced verifier will load just fine on new kernel, but > this valid program will not load on old kernel, only because > verifier is not smart enough. Now we would need a version > of verifier exposed all the way to user space?! > imo that's too much. I think for eBPF infra kernel > should only guarantee backwards compatibility > (old user space must work with new kernel) and that's it. > That's what this patch is trying to do. > Thoughts? Sure, you will never get a full compatibility on that regard while backwards compatibility needs to be guaranteed on the other hand. I looked at perf_copy_attr() implementation and I think that we should mimic it in a very similar way as it exactly solves what we need. For example, it will return with -EINVAL for (size > PAGE_SIZE) and (size < PERF_ATTR_SIZE_VER0) where PAGE_SIZE has been chosen as an arbitrary hard upper limit where it is believed that it will never grow beyond that large limit in future. So this is a more loose constraint than what we currently do, that is, -EINVAL on (size > sizeof(attr)) where attr is the currently known size of a specific kernel. That would at least be a start, you won't be able to cover everything though, but it would allow to address the issue raised when running with a basic feature set. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v13 net-next 07/11] bpf: verifier (add ability to receive verification log) 2014-09-18 6:44 ` Daniel Borkmann @ 2014-09-18 14:34 ` Alexei Starovoitov 2014-09-18 14:50 ` Daniel Borkmann 0 siblings, 1 reply; 14+ messages in thread From: Alexei Starovoitov @ 2014-09-18 14:34 UTC (permalink / raw) To: Daniel Borkmann Cc: David S. Miller, Ingo Molnar, Linus Torvalds, Andy Lutomirski, Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet, Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton, Kees Cook, Linux API, Network Development, LKML On Wed, Sep 17, 2014 at 11:44 PM, Daniel Borkmann <dborkman@redhat.com> wrote: > On 09/18/2014 01:45 AM, Alexei Starovoitov wrote: >> >> On Wed, Sep 17, 2014 at 12:37 PM, Alexei Starovoitov <ast@plumgrid.com> >> wrote: >>> >>> >>>> Hm, thinking out loudly ... perhaps this could be made a library >>>> problem. >>>> Such that the library which wraps the syscall needs to be aware of a >>>> marker where the initial version ends, and if the application doesn't >>>> make use of any of the new features, it would just pass in the length up >>>> to the marker as size attribute into the syscall. Similarly, if new >>>> features are always added to the end of a structure and the library >>>> truncates the overall-length after the last used member we might have >>>> a chance to load something on older kernels, haven't tried that though. >>> >>> >>> that's a 3rd option. I think it's cleaner than 2nd, since it solves it >>> completely from user space. >>> It can even be smarter than that. If this syscall wrapper library >>> sees that newer features are used and it can workaround them: >>> it can chop size and pass older fields into the older kernel >>> and when it returns, do a workaround based on newer fields. >> >> >> the more I think about 'new user space + old kernel' problem, >> the more certain I am that kernel should not try to help >> user space, since most of the time it's not going to be enough, >> but additional code in kernel would need to be maintained. >> >> syscall commands and size of bpf_attr is the least of problems. >> New map_type and prog_type will be added, new helper >> functions will be available to programs. >> One would think that md5 of uapi/linux/bpf.h would be >> enough to say that user app is compatible... In reality, >> it's not. The 'state pruning' verifier optimization I've talked >> about will not change a single bit in bpf.h, but it will be >> able to recognize more programs as safe. >> A program developed on a new kernel with more >> advanced verifier will load just fine on new kernel, but >> this valid program will not load on old kernel, only because >> verifier is not smart enough. Now we would need a version >> of verifier exposed all the way to user space?! >> imo that's too much. I think for eBPF infra kernel >> should only guarantee backwards compatibility >> (old user space must work with new kernel) and that's it. >> That's what this patch is trying to do. >> Thoughts? > > > Sure, you will never get a full compatibility on that regard > while backwards compatibility needs to be guaranteed on the > other hand. I looked at perf_copy_attr() implementation and I > think that we should mimic it in a very similar way as it > exactly solves what we need. > > For example, it will return with -EINVAL for (size > PAGE_SIZE) > and (size < PERF_ATTR_SIZE_VER0) where PAGE_SIZE has been chosen > as an arbitrary hard upper limit where it is believed that it will > never grow beyond that large limit in future. > > So this is a more loose constraint than what we currently do, > that is, -EINVAL on (size > sizeof(attr)) where attr is the > currently known size of a specific kernel. That would at least > be a start, you won't be able to cover everything though, but > it would allow to address the issue raised when running with > a basic feature set. you missed my point. We should not 'do a start', since it doesn't help user space in the long run and only makes kernel more complex. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v13 net-next 07/11] bpf: verifier (add ability to receive verification log) 2014-09-18 14:34 ` Alexei Starovoitov @ 2014-09-18 14:50 ` Daniel Borkmann [not found] ` <541AF11C.9050405-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 14+ messages in thread From: Daniel Borkmann @ 2014-09-18 14:50 UTC (permalink / raw) To: Alexei Starovoitov Cc: David S. Miller, Ingo Molnar, Linus Torvalds, Andy Lutomirski, Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet, Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton, Kees Cook, Linux API, Network Development, LKML On 09/18/2014 04:34 PM, Alexei Starovoitov wrote: > On Wed, Sep 17, 2014 at 11:44 PM, Daniel Borkmann <dborkman@redhat.com> wrote: ... >> Sure, you will never get a full compatibility on that regard >> while backwards compatibility needs to be guaranteed on the >> other hand. I looked at perf_copy_attr() implementation and I >> think that we should mimic it in a very similar way as it >> exactly solves what we need. >> >> For example, it will return with -EINVAL for (size > PAGE_SIZE) >> and (size < PERF_ATTR_SIZE_VER0) where PAGE_SIZE has been chosen >> as an arbitrary hard upper limit where it is believed that it will >> never grow beyond that large limit in future. >> >> So this is a more loose constraint than what we currently do, >> that is, -EINVAL on (size > sizeof(attr)) where attr is the >> currently known size of a specific kernel. That would at least >> be a start, you won't be able to cover everything though, but >> it would allow to address the issue raised when running with >> a basic feature set. > > you missed my point. We should not 'do a start', since it > doesn't help user space in the long run and only makes > kernel more complex. Sorry, I don't think I missed your point. But if you see things differently, fair enough, it was just a suggestion. ^ permalink raw reply [flat|nested] 14+ messages in thread
[parent not found: <541AF11C.9050405-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH v13 net-next 07/11] bpf: verifier (add ability to receive verification log) [not found] ` <541AF11C.9050405-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2014-09-18 15:24 ` Alexei Starovoitov [not found] ` <CAMEtUuykgG70sP0tAmovLvPxgjwaR8h=R-c6Tyo6m96+Lg3sXQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 14+ messages in thread From: Alexei Starovoitov @ 2014-09-18 15:24 UTC (permalink / raw) To: Daniel Borkmann Cc: David S. Miller, Ingo Molnar, Linus Torvalds, Andy Lutomirski, Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet, Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton, Kees Cook, Linux API, Network Development, LKML On Thu, Sep 18, 2014 at 7:50 AM, Daniel Borkmann <dborkman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote: > On 09/18/2014 04:34 PM, Alexei Starovoitov wrote: >> >> On Wed, Sep 17, 2014 at 11:44 PM, Daniel Borkmann <dborkman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> >> wrote: > > ... >>> >>> Sure, you will never get a full compatibility on that regard >>> while backwards compatibility needs to be guaranteed on the >>> other hand. I looked at perf_copy_attr() implementation and I >>> think that we should mimic it in a very similar way as it >>> exactly solves what we need. >>> >>> For example, it will return with -EINVAL for (size > PAGE_SIZE) >>> and (size < PERF_ATTR_SIZE_VER0) where PAGE_SIZE has been chosen >>> as an arbitrary hard upper limit where it is believed that it will >>> never grow beyond that large limit in future. >>> >>> So this is a more loose constraint than what we currently do, >>> that is, -EINVAL on (size > sizeof(attr)) where attr is the >>> currently known size of a specific kernel. That would at least >>> be a start, you won't be able to cover everything though, but >>> it would allow to address the issue raised when running with >>> a basic feature set. >> >> >> you missed my point. We should not 'do a start', since it >> doesn't help user space in the long run and only makes >> kernel more complex. > > > Sorry, I don't think I missed your point. But if you see things > differently, fair enough, it was just a suggestion. now you probably think I'm shutting you up. Sorry. Was not my intention. Let me rephrase what I meant: I think we should decide right now whether 'new user + old kernel' is really a problem we're going to solve or not. If we decide to solve it, we need to have a plan to solve it all the way. Partial fix for size of bpf_attr is not a plan. It's something that is not addressing the problem completely. Little bit of help is not useful for userspace. It would need to deal with new types, verifier differences and other things that I mentioned earlier. So we either decide that we're going to spend time to solve all of them (not necessarily today, but over long haul) or we're not doing any of it. ^ permalink raw reply [flat|nested] 14+ messages in thread
[parent not found: <CAMEtUuykgG70sP0tAmovLvPxgjwaR8h=R-c6Tyo6m96+Lg3sXQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH v13 net-next 07/11] bpf: verifier (add ability to receive verification log) [not found] ` <CAMEtUuykgG70sP0tAmovLvPxgjwaR8h=R-c6Tyo6m96+Lg3sXQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2014-09-18 17:28 ` Daniel Borkmann 0 siblings, 0 replies; 14+ messages in thread From: Daniel Borkmann @ 2014-09-18 17:28 UTC (permalink / raw) To: Alexei Starovoitov Cc: David S. Miller, Ingo Molnar, Linus Torvalds, Andy Lutomirski, Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet, Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton, Kees Cook, Linux API, Network Development, LKML On 09/18/2014 05:24 PM, Alexei Starovoitov wrote: ... > solve or not. If we decide to solve it, we need to have > a plan to solve it all the way. Partial fix for size of bpf_attr > is not a plan. It's something that is not addressing the problem > completely. Little bit of help is not useful for userspace. It > would need to deal with new types, verifier differences and > other things that I mentioned earlier. Hm, I don't think it would be a strict requirement to solve it all the way, and I think that perf_event_open() with perf_copy_attr() is not trying to do so either. It, however, is trying on a ``best effort basis'' to still load something if new features are unused by the binary (I guess you saw the comment in perf_copy_attr()). Iff, e.g. due to new types we fail at the verifier stage, sure, that's life since we only have backwards-compatible guarantee, but in case we tried to use features we support, we're still able to load the eBPF program while right now, we're rejecting it right up-front. That's just my $0.02 ... ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2014-09-19 21:38 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-09-19 21:04 [PATCH v13 net-next 07/11] bpf: verifier (add ability to receive verification log) Alexei Starovoitov
[not found] ` <CAMEtUux071cELLdoWs21WL0dqgdwj+O=P64aeXSfyUtFW9U69w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-09-19 21:38 ` David Miller
-- strict thread matches above, loose matches on Subject: below --
2014-09-17 0:39 [PATCH v13 net-next 00/11] eBPF syscall, verifier, testsuite Alexei Starovoitov
[not found] ` <1410914370-29883-1-git-send-email-ast-uqk4Ao+rVK5Wk0Htik3J/w@public.gmane.org>
2014-09-17 0:39 ` [PATCH v13 net-next 07/11] bpf: verifier (add ability to receive verification log) Alexei Starovoitov
[not found] ` <1410914370-29883-8-git-send-email-ast-uqk4Ao+rVK5Wk0Htik3J/w@public.gmane.org>
2014-09-17 6:51 ` Daniel Borkmann
2014-09-17 16:08 ` Alexei Starovoitov
2014-09-17 17:03 ` Daniel Borkmann
2014-09-17 19:17 ` Daniel Borkmann
[not found] ` <5419DE51.1060904-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-09-17 19:37 ` Alexei Starovoitov
2014-09-17 23:45 ` Alexei Starovoitov
[not found] ` <CAMEtUuw0aDPpQhd13ssk2PDLffBJNeef9jicOnhQLH6KtJ8gsQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-09-18 6:44 ` Daniel Borkmann
2014-09-18 14:34 ` Alexei Starovoitov
2014-09-18 14:50 ` Daniel Borkmann
[not found] ` <541AF11C.9050405-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-09-18 15:24 ` Alexei Starovoitov
[not found] ` <CAMEtUuykgG70sP0tAmovLvPxgjwaR8h=R-c6Tyo6m96+Lg3sXQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-09-18 17:28 ` Daniel Borkmann
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).