From mboxrd@z Thu Jan 1 00:00:00 1970
From: Arnaldo Carvalho de Melo
Subject: Re: Help with the BPF verifier
Date: Fri, 2 Nov 2018 12:02:39 -0300
Message-ID: <20181102150239.GG20495@kernel.org>
References: <20181101185217.GA20495@kernel.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Cc: Yonghong Song, Daniel Borkmann, Jiri Olsa, Martin Lau, Alexei Starovoitov, Linux Networking Development Mailing List
To: Edward Cree
Received: from mail.kernel.org ([198.145.29.99]:33968 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726557AbeKCAKI (ORCPT ); Fri, 2 Nov 2018 20:10:08 -0400
Content-Disposition: inline
Sender: netdev-owner@vger.kernel.org

On Thu, Nov 01, 2018 at 08:05:07PM +0000, Edward Cree wrote:
> On 01/11/18 18:52, Arnaldo Carvalho de Melo wrote:
> > R0=inv(id=0) R1=inv6 R2=inv6 R3=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> > 15: (b7) r2 = 0
> > 16: (63) *(u32 *)(r10 -260) = r2
> > 17: (67) r1 <<= 32
> > 18: (77) r1 >>= 32
> > 19: (67) r1 <<= 3
> > 20: (bf) r2 = r6
> > 21: (0f) r2 += r1
> > 22: (79) r3 = *(u64 *)(r2 +16)
> > R2 invalid mem access 'inv'
> I wonder if you could run this with verifier log level 2?  (I'm not sure how
>  you would go about plumbing that through the perf tooling.)  It seems very
>  odd that it ends up with R2=inv, and I'm wondering whether R1 becomes unknown
>  during the shifts or whether the addition in insn 21 somehow produces the
>  unknown-ness.  (I know we used to have a thing[1] where doing ptr += K and
>  then also having an offset in the LDX produced an error about
>  ptr+const+const, but that seems to have been fixed at some point.)
>
> Note however that even if we get past this, R1 at this point holds 6, so it
>  looks like the verifier is walking the impossible path where we're inside the
>  'if' even though filename_arg = 6.  This is a (slightly annoying) verifier
>  limitation, that it walks paths with impossible combinations of constraints
>  (we've previously had cases where assertions in the verifier would blow up
>  because of this, e.g. registers with max_val less than min_val).  So if the
>  check_ctx_access() is going to worry about whether you're off the end of the
>  array (I'm not sure what your program type is and thus which is_valid_access
>  callback is involved), then it'll complain about this.
> If filename_arg came from some external source you'd have a different
>  problem, because then it would have a totally unknown value, that on entering
>  the 'if' becomes "unknown but < 6", which is still too variable to have as
>  the offset of a ctx access.  Those have to be at a known constant offset, so
>  that we can determine the type of the returned value.
>
> As a way to fix this, how about [UNTESTED!]:
>     const void *filename_arg = NULL;
>     /* ... */
>     switch (augmented_args.args.syscall_nr) {
>         case SYS_OPEN: filename_arg = args->args[0]; break;
>         case SYS_OPENAT: filename_arg = args->args[1]; break;
>     }
>     /* ... */
>     if (filename_arg) {
>         /* stuff */
>         blah = probe_read_str(/* ... */, filename_arg);
>     } else {
>         /* the other stuff */
>     }
> That way, you're only ever dealing in constant pointers (although judging by
>  an old thread I found[1] about ptr+const+const, the compiler might decide to
>  make some optimisations that end up looking like your existing code).

Yeah, didn't work as well:

SEC("raw_syscalls:sys_enter")
int sys_enter(struct syscall_enter_args *args)
{
    struct {
        struct syscall_enter_args args;
        struct augmented_filename filename;
    } augmented_args;
    unsigned int len = sizeof(augmented_args);
    const void *filename_arg = NULL;

    probe_read(&augmented_args.args, sizeof(augmented_args.args), args);

    switch (augmented_args.args.syscall_nr) {
    case SYS_OPEN:   filename_arg = (const void *)args->args[0]; break;
    case SYS_OPENAT: filename_arg = (const void *)args->args[1]; break;
    }

    if (filename_arg != NULL) {
        augmented_args.filename.reserved = 0;
        augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
                                                      sizeof(augmented_args.filename.value),
                                                      filename_arg);
        if (augmented_args.filename.size < sizeof(augmented_args.filename.value)) {
            len -= sizeof(augmented_args.filename.value) - augmented_args.filename.size;
            len &= sizeof(augmented_args.filename.value) - 1;
        }
    } else {
        len = sizeof(augmented_args.args);
    }

    perf_event_output(args, &__augmented_syscalls__, BPF_F_CURRENT_CPU, &augmented_args, len);
    return 0;
}

And the -vv in 'perf trace' didn't seem to map to further details in the
output of the verifier debug:

# trace -vv -e tools/perf/examples/bpf/augmented_raw_syscalls.c sleep 1
bpf: builtin compilation failed: -95, try external compiler
Kernel build dir is set to /lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
set env: KBUILD_DIR=/lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
unset env: KBUILD_OPTS
include option is set to -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h
set env: NR_CPUS=4
set env: LINUX_VERSION_CODE=0x41300
set env: CLANG_EXEC=/usr/local/bin/clang
unset env: CLANG_OPTIONS
set env: KERNEL_INC_OPTIONS= -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h
set env: PERF_BPF_INC_OPTIONS=-I/home/acme/lib/perf/include/bpf
set env: WORKING_DIR=/lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
set env: CLANG_SOURCE=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c
llvm compiling command template: $CLANG_EXEC -D__KERNEL__ -D__NR_CPUS__=$NR_CPUS -DLINUX_VERSION_CODE=$LINUX_VERSION_CODE $CLANG_OPTIONS $PERF_BPF_INC_OPTIONS $KERNEL_INC_OPTIONS -Wno-unused-value -Wno-pointer-sign -working-directory $WORKING_DIR -c "$CLANG_SOURCE" -target bpf $CLANG_EMIT_LLVM -O2 -o - $LLVM_OPTIONS_PIPE
llvm compiling command : /usr/local/bin/clang -D__KERNEL__ -D__NR_CPUS__=4 -DLINUX_VERSION_CODE=0x41300 -I/home/acme/lib/perf/include/bpf -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h -Wno-unused-value -Wno-pointer-sign -working-directory /lib/modules/4.19.0-rc8-00014-gc0cff31be705/build -c /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c -target bpf -O2 -o -
libbpf: loading object 'tools/perf/examples/bpf/augmented_raw_syscalls.c' from buffer
libbpf: section(1) .strtab, size 168, link 0, flags 0, type=3
libbpf: skip section(1) .strtab
libbpf: section(2) .text, size 0, link 0, flags 6, type=1
libbpf: skip section(2) .text
libbpf: section(3) raw_syscalls:sys_enter, size 344, link 0, flags 6, type=1
libbpf: found program raw_syscalls:sys_enter
libbpf: section(4) .relraw_syscalls:sys_enter, size 16, link 10, flags 0, type=9
libbpf: section(5) raw_syscalls:sys_exit, size 16, link 0, flags 6, type=1
libbpf: found program raw_syscalls:sys_exit
libbpf: section(6) maps, size 56, link 0, flags 3, type=1
libbpf: section(7) license, size 4, link 0, flags 3, type=1
libbpf: license of tools/perf/examples/bpf/augmented_raw_syscalls.c is GPL
libbpf: section(8) version, size 4, link 0, flags 3, type=1
libbpf: kernel version of tools/perf/examples/bpf/augmented_raw_syscalls.c is 41300
libbpf: section(9) .llvm_addrsig, size 6, link 10, flags 80000000, type=1879002115
libbpf: skip section(9) .llvm_addrsig
libbpf: section(10) .symtab, size 240, link 1, flags 0, type=2
libbpf: maps in tools/perf/examples/bpf/augmented_raw_syscalls.c: 2 maps in 56 bytes
libbpf: map 0 is "__augmented_syscalls__"
libbpf: map 1 is "__bpf_stdout__"
libbpf: collecting relocating info for: 'raw_syscalls:sys_enter'
libbpf: relo for 4 value 28 name 124
libbpf: relocation: insn_idx=35
libbpf: relocation: find map 1 (__augmented_syscalls__) for insn 35
Added extra kernel map __entry_SYSCALL_64_trampoline fffffe0000006000-fffffe0000007000
Added extra kernel map __entry_SYSCALL_64_trampoline fffffe0000032000-fffffe0000033000
Added extra kernel map __entry_SYSCALL_64_trampoline fffffe000005e000-fffffe000005f000
Added extra kernel map __entry_SYSCALL_64_trampoline fffffe000008a000-fffffe000008b000
bpf: config program 'raw_syscalls:sys_enter'
bpf: config program 'raw_syscalls:sys_exit'
libbpf: create map __bpf_stdout__: fd=3
libbpf: create map __augmented_syscalls__: fd=4
libbpf: load bpf program failed: Permission denied
libbpf: -- BEGIN DUMP LOG ---
libbpf:
0: (bf) r6 = r1
1: (bf) r1 = r10
2: (07) r1 += -328
3: (b7) r7 = 64
4: (b7) r2 = 64
5: (bf) r3 = r6
6: (85) call bpf_probe_read#4
7: (79) r1 = *(u64 *)(r10 -320)
8: (15) if r1 == 0x101 goto pc+4
 R0=inv(id=0) R1=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
9: (55) if r1 != 0x2 goto pc+22
 R0=inv(id=0) R1=inv2 R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
10: (bf) r1 = r6
11: (07) r1 += 16
12: (05) goto pc+2
15: (79) r3 = *(u64 *)(r1 +0)
dereference of modified ctx ptr R1 off=16 disallowed

libbpf: -- END LOG --
libbpf: failed to load program 'raw_syscalls:sys_enter'
libbpf: failed to load object 'tools/perf/examples/bpf/augmented_raw_syscalls.c'
bpf: load objects failed: err=-4007: (Kernel verifier blocks program loading)
event syntax error: 'tools/perf/examples/bpf/augmented_raw_syscalls.c'
                     \___ Kernel verifier blocks program loading

(add -v to see detail)
Run 'perf list' for a list of valid events

 Usage: perf trace [] []
    or: perf trace [] -- []
    or: perf trace record [] []
    or: perf trace record [] -- []

    -e, --event event/syscall selector. use 'perf list' to list available events
[root@seventh perf]#

I'll check how to plumb that, but it's a holiday down here in Brazil, kids at home...

> As for what you want to do with the index coming from userspace, the verifier
>  will not like that at all, as mentioned above, so I think you'll need to do
>  something like:
>     switch (filename_arg_from_userspace) {
>         case 0: filename_arg = args->args[0]; break;
>         case 1: filename_arg = args->args[1]; break;
>         /* etc */
>         default: filename_arg = NULL;
>     }
>  thus ensuring that you only ever have ctx pointers with constant offsets.
>
> -Ed
>
> [1]: https://lists.iovisor.org/g/iovisor-dev/topic/21386327#1302
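
For reference, the "constant offsets only" shape that Ed describes can be sketched in plain user-space C. This is only an illustration, not the perf code: the struct layout is an assumption (chosen so that args[0] sits at offset 16, as in the log above), and pick_arg() is a hypothetical helper name. As the log shows, clang may still merge the cases into "ctx + offset" arithmetic when compiling for BPF, so this shape alone is not guaranteed to satisfy the verifier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-in for the tracepoint context discussed in the thread;
 * the real layout comes from the raw_syscalls:sys_enter format. */
struct syscall_enter_args {
    unsigned long long common_tp_fields;
    long               syscall_nr;
    unsigned long      args[6];
};

/*
 * Hypothetical helper: pick a syscall argument by index while only ever
 * writing constant subscripts, so each case is a load at a fixed offset
 * from 'ctx'.  Writing ctx->args[idx] instead would be a variable-offset
 * access, which is what the verifier rejects for ctx pointers.
 * Returns NULL when the index is out of range.
 */
const void *pick_arg(const struct syscall_enter_args *ctx, unsigned int idx)
{
    assert(ctx != NULL);

    switch (idx) {
    case 0: return (const void *)(uintptr_t)ctx->args[0];
    case 1: return (const void *)(uintptr_t)ctx->args[1];
    case 2: return (const void *)(uintptr_t)ctx->args[2];
    case 3: return (const void *)(uintptr_t)ctx->args[3];
    case 4: return (const void *)(uintptr_t)ctx->args[4];
    case 5: return (const void *)(uintptr_t)ctx->args[5];
    default: return NULL;
    }
}
```

With a context whose args[1] holds a pointer value, pick_arg(&ctx, 1) returns that value and pick_arg(&ctx, 6) returns NULL, so the caller can keep the "if (filename_arg != NULL)" test from the program above.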