* Question: CO-RE-enabled PT_REGS macros give strange results @ 2023-07-24 10:32 Timofei Pushkin 2023-07-24 12:36 ` Alan Maguire 0 siblings, 1 reply; 11+ messages in thread From: Timofei Pushkin @ 2023-07-24 10:32 UTC (permalink / raw) To: bpf Dear BPF community, I'm developing a perf_event BPF program which reads some register values (frame and instruction pointers in particular) from the context provided to it. I found that CO-RE-enabled PT_REGS macros give results different from the results of the usual PT_REGS macros. I run the program on the same system I compiled it on, and so I cannot understand why the results differ and which ones should I use? From my tests, the results of the usual macros are the correct ones (e.g. I can symbolize the instruction pointers I get this way), but since I try to follow the CO-RE principle, it seems like I should be using the CO-RE-enabled variants instead. I did some experiments and found out that it is the bpf_probe_read_kernel part of the CO-RE-enabled PT_REGS macros that change the results and not __builtin_preserve_access_index. But I still don't get why exactly it changes the results. Thank you in advance, Timofei ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Question: CO-RE-enabled PT_REGS macros give strange results 2023-07-24 10:32 Question: CO-RE-enabled PT_REGS macros give strange results Timofei Pushkin @ 2023-07-24 12:36 ` Alan Maguire 2023-07-24 15:04 ` Timofei Pushkin 0 siblings, 1 reply; 11+ messages in thread From: Alan Maguire @ 2023-07-24 12:36 UTC (permalink / raw) To: Timofei Pushkin, bpf On 24/07/2023 11:32, Timofei Pushkin wrote: > Dear BPF community, > > I'm developing a perf_event BPF program which reads some register > values (frame and instruction pointers in particular) from the context > provided to it. I found that CO-RE-enabled PT_REGS macros give results > different from the results of the usual PT_REGS macros. I run the > program on the same system I compiled it on, and so I cannot > understand why the results differ and which ones should I use? > > From my tests, the results of the usual macros are the correct ones > (e.g. I can symbolize the instruction pointers I get this way), but > since I try to follow the CO-RE principle, it seems like I should be > using the CO-RE-enabled variants instead. > > I did some experiments and found out that it is the > bpf_probe_read_kernel part of the CO-RE-enabled PT_REGS macros that > change the results and not __builtin_preserve_access_index. But I > still don't get why exactly it changes the results. > Can you provide the exact usage of the BPF CO-RE macros that isn't working, and the equivalent non-CO-RE version that is? Also if you can provide details on the platform you're running on that will help narrow down the issue. Thanks! Alan > Thank you in advance, > Timofei > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Question: CO-RE-enabled PT_REGS macros give strange results 2023-07-24 12:36 ` Alan Maguire @ 2023-07-24 15:04 ` Timofei Pushkin 2023-07-24 23:00 ` Alan Maguire 0 siblings, 1 reply; 11+ messages in thread From: Timofei Pushkin @ 2023-07-24 15:04 UTC (permalink / raw) To: Alan Maguire; +Cc: bpf On Mon, Jul 24, 2023 at 3:36 PM Alan Maguire <alan.maguire@oracle.com> wrote: > > On 24/07/2023 11:32, Timofei Pushkin wrote: > > Dear BPF community, > > > > I'm developing a perf_event BPF program which reads some register > > values (frame and instruction pointers in particular) from the context > > provided to it. I found that CO-RE-enabled PT_REGS macros give results > > different from the results of the usual PT_REGS macros. I run the > > program on the same system I compiled it on, and so I cannot > > understand why the results differ and which ones should I use? > > > > From my tests, the results of the usual macros are the correct ones > > (e.g. I can symbolize the instruction pointers I get this way), but > > since I try to follow the CO-RE principle, it seems like I should be > > using the CO-RE-enabled variants instead. > > > > I did some experiments and found out that it is the > > bpf_probe_read_kernel part of the CO-RE-enabled PT_REGS macros that > > change the results and not __builtin_preserve_access_index. But I > > still don't get why exactly it changes the results. > > > > Can you provide the exact usage of the BPF CO-RE macros that isn't > working, and the equivalent non-CO-RE version that is? Also if you As a minimal example, I wrote the following little BPF program which prints instruction pointers obtained with non-CO-RE and CO-RE macros: volatile const pid_t target_pid; SEC("perf_event") int do_test(struct bpf_perf_event_data *ctx) { pid_t pid = bpf_get_current_pid_tgid(); if (pid != target_pid) return 0; unsigned long p = PT_REGS_IP(&ctx->regs); unsigned long p_core = PT_REGS_IP_CORE(&ctx->regs); bpf_printk("non-CO-RE: %lx, CO-RE: %lx", p, p_core); return 0; } From user space, I set the target PID and attach the program to CPU clock perf events (error checking and cleanup omitted for brevity): int main(int argc, char *argv[]) { // Load the program also setting the target PID struct test_program_bpf *skel = test_program_bpf__open(); skel->rodata->target_pid = (pid_t) strtol(argv[1], NULL, 10); test_program_bpf__load(skel); // Attach to perf events struct perf_event_attr attr = { .type = PERF_TYPE_SOFTWARE, .size = sizeof(struct perf_event_attr), .config = PERF_COUNT_SW_CPU_CLOCK, .sample_freq = 1, .freq = true }; for (int cpu_i = 0; cpu_i < libbpf_num_possible_cpus(); cpu_i++) { int perf_fd = syscall(SYS_perf_event_open, &attr, -1, cpu_i, -1, 0); bpf_program__attach_perf_event(skel->progs.do_test, perf_fd); } // Wait for Ctrl-C pause(); return 0; } As an experiment, I launched a simple C program with an endless loop in main and started the BPF program above with its target PID set to the PID of this simple C program. Then by checking the virtual memory mapped for the C program (with "cat /proc/<PID>/maps"), I found out that its .text section got mapped into 55ca2577b000-55ca2577c000 address space. When I checked the output of the BPF program, I got "non-CO-RE: 55ca2577b131, CO-RE: ffffa58810527e48". As you can see, the non-CO-RE result maps into the .text section of the launched C program (as it should since this is the value of the instruction pointer), while the CO-RE result does not. Alternatively, if I replace PT_REGS_IP and PT_REGS_IP_CORE with the equivalents for the stack pointer (PT_REGS_SP and PT_REGS_SP_CORE), I get results that correspond to the stack address space from the non-CO-RE macro, but I always get 0 from the CO-RE macro. > can provide details on the platform you're running on that will > help narrow down the issue. Thanks! Sure. I'm running Ubuntu 22.04.1, kernel version 5.19.0-46-generic, the architecture is x86_64, clang 14.0.0 is used to compile BPF programs with flags -g -O2 -D__TARGET_ARCH_x86. Thanks, Timofei > > Alan > > > Thank you in advance, > > Timofei > > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Question: CO-RE-enabled PT_REGS macros give strange results 2023-07-24 15:04 ` Timofei Pushkin @ 2023-07-24 23:00 ` Alan Maguire 2023-07-25 14:04 ` Alan Maguire 0 siblings, 1 reply; 11+ messages in thread From: Alan Maguire @ 2023-07-24 23:00 UTC (permalink / raw) To: Timofei Pushkin; +Cc: bpf On 24/07/2023 16:04, Timofei Pushkin wrote: > On Mon, Jul 24, 2023 at 3:36 PM Alan Maguire <alan.maguire@oracle.com> wrote: >> >> On 24/07/2023 11:32, Timofei Pushkin wrote: >>> Dear BPF community, >>> >>> I'm developing a perf_event BPF program which reads some register >>> values (frame and instruction pointers in particular) from the context >>> provided to it. I found that CO-RE-enabled PT_REGS macros give results >>> different from the results of the usual PT_REGS macros. I run the >>> program on the same system I compiled it on, and so I cannot >>> understand why the results differ and which ones should I use? >>> >>> From my tests, the results of the usual macros are the correct ones >>> (e.g. I can symbolize the instruction pointers I get this way), but >>> since I try to follow the CO-RE principle, it seems like I should be >>> using the CO-RE-enabled variants instead. >>> >>> I did some experiments and found out that it is the >>> bpf_probe_read_kernel part of the CO-RE-enabled PT_REGS macros that >>> change the results and not __builtin_preserve_access_index. But I >>> still don't get why exactly it changes the results. >>> >> >> Can you provide the exact usage of the BPF CO-RE macros that isn't >> working, and the equivalent non-CO-RE version that is? Also if you > > As a minimal example, I wrote the following little BPF program which > prints instruction pointers obtained with non-CO-RE and CO-RE macros: > > volatile const pid_t target_pid; > > SEC("perf_event") > int do_test(struct bpf_perf_event_data *ctx) { > pid_t pid = bpf_get_current_pid_tgid(); > if (pid != target_pid) return 0; > > unsigned long p = PT_REGS_IP(&ctx->regs); > unsigned long p_core = PT_REGS_IP_CORE(&ctx->regs); > bpf_printk("non-CO-RE: %lx, CO-RE: %lx", p, p_core); > > return 0; > } > > From user space, I set the target PID and attach the program to CPU > clock perf events (error checking and cleanup omitted for brevity): > > int main(int argc, char *argv[]) { > // Load the program also setting the target PID > struct test_program_bpf *skel = test_program_bpf__open(); > skel->rodata->target_pid = (pid_t) strtol(argv[1], NULL, 10); > test_program_bpf__load(skel); > > // Attach to perf events > struct perf_event_attr attr = { > .type = PERF_TYPE_SOFTWARE, > .size = sizeof(struct perf_event_attr), > .config = PERF_COUNT_SW_CPU_CLOCK, > .sample_freq = 1, > .freq = true > }; > for (int cpu_i = 0; cpu_i < libbpf_num_possible_cpus(); cpu_i++) { > int perf_fd = syscall(SYS_perf_event_open, &attr, -1, cpu_i, -1, 0); > bpf_program__attach_perf_event(skel->progs.do_test, perf_fd); > } > > // Wait for Ctrl-C > pause(); > return 0; > } > > As an experiment, I launched a simple C program with an endless loop > in main and started the BPF program above with its target PID set to > the PID of this simple C program. Then by checking the virtual memory > mapped for the C program (with "cat /proc/<PID>/maps"), I found out > that its .text section got mapped into 55ca2577b000-55ca2577c000 > address space. When I checked the output of the BPF program, I got > "non-CO-RE: 55ca2577b131, CO-RE: ffffa58810527e48". As you can see, > the non-CO-RE result maps into the .text section of the launched C > program (as it should since this is the value of the instruction > pointer), while the CO-RE result does not. > > Alternatively, if I replace PT_REGS_IP and PT_REGS_IP_CORE with the > equivalents for the stack pointer (PT_REGS_SP and PT_REGS_SP_CORE), I > get results that correspond to the stack address space from the > non-CO-RE macro, but I always get 0 from the CO-RE macro. > >> can provide details on the platform you're running on that will >> help narrow down the issue. Thanks! > > Sure. I'm running Ubuntu 22.04.1, kernel version 5.19.0-46-generic, > the architecture is x86_64, clang 14.0.0 is used to compile BPF > programs with flags -g -O2 -D__TARGET_ARCH_x86. > Thanks for the additional details! I've reproduced this on bpf-next with LLVM 15; I'm seeing the same issues with the CO-RE macros, and with BPF_CORE_READ(). However with extra libbpf debugging I do see that we pick up the right type id/index for the ip field in pt_regs: libbpf: prog 'do_test': relo #4: matching candidate #0 <byte_off> [216] struct pt_regs.ip (0:16 @ offset 128) One thing I noticed - perhaps this will ring some bells for someone - if I use __builtin_preserve_access_index() I get the same (correct) value for ip as is retrieved with PT_REGS_IP(): __builtin_preserve_access_index(({ p_core = ctx->regs.ip; })); I'll check with latest LLVM to see if the issue persists there. Alan > Thanks, > Timofei > >> >> Alan >> >>> Thank you in advance, >>> Timofei >>> > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Question: CO-RE-enabled PT_REGS macros give strange results 2023-07-24 23:00 ` Alan Maguire @ 2023-07-25 14:04 ` Alan Maguire 2023-07-26 0:03 ` Eduard Zingerman 0 siblings, 1 reply; 11+ messages in thread From: Alan Maguire @ 2023-07-25 14:04 UTC (permalink / raw) To: Timofei Pushkin; +Cc: bpf On 25/07/2023 00:00, Alan Maguire wrote: > On 24/07/2023 16:04, Timofei Pushkin wrote: >> On Mon, Jul 24, 2023 at 3:36 PM Alan Maguire <alan.maguire@oracle.com> wrote: >>> >>> On 24/07/2023 11:32, Timofei Pushkin wrote: >>>> Dear BPF community, >>>> >>>> I'm developing a perf_event BPF program which reads some register >>>> values (frame and instruction pointers in particular) from the context >>>> provided to it. I found that CO-RE-enabled PT_REGS macros give results >>>> different from the results of the usual PT_REGS macros. I run the >>>> program on the same system I compiled it on, and so I cannot >>>> understand why the results differ and which ones should I use? >>>> >>>> From my tests, the results of the usual macros are the correct ones >>>> (e.g. I can symbolize the instruction pointers I get this way), but >>>> since I try to follow the CO-RE principle, it seems like I should be >>>> using the CO-RE-enabled variants instead. >>>> >>>> I did some experiments and found out that it is the >>>> bpf_probe_read_kernel part of the CO-RE-enabled PT_REGS macros that >>>> change the results and not __builtin_preserve_access_index. But I >>>> still don't get why exactly it changes the results. >>>> >>> >>> Can you provide the exact usage of the BPF CO-RE macros that isn't >>> working, and the equivalent non-CO-RE version that is? Also if you >> >> As a minimal example, I wrote the following little BPF program which >> prints instruction pointers obtained with non-CO-RE and CO-RE macros: >> >> volatile const pid_t target_pid; >> >> SEC("perf_event") >> int do_test(struct bpf_perf_event_data *ctx) { >> pid_t pid = bpf_get_current_pid_tgid(); >> if (pid != target_pid) return 0; >> >> unsigned long p = PT_REGS_IP(&ctx->regs); >> unsigned long p_core = PT_REGS_IP_CORE(&ctx->regs); >> bpf_printk("non-CO-RE: %lx, CO-RE: %lx", p, p_core); >> >> return 0; >> } >> >> From user space, I set the target PID and attach the program to CPU >> clock perf events (error checking and cleanup omitted for brevity): >> >> int main(int argc, char *argv[]) { >> // Load the program also setting the target PID >> struct test_program_bpf *skel = test_program_bpf__open(); >> skel->rodata->target_pid = (pid_t) strtol(argv[1], NULL, 10); >> test_program_bpf__load(skel); >> >> // Attach to perf events >> struct perf_event_attr attr = { >> .type = PERF_TYPE_SOFTWARE, >> .size = sizeof(struct perf_event_attr), >> .config = PERF_COUNT_SW_CPU_CLOCK, >> .sample_freq = 1, >> .freq = true >> }; >> for (int cpu_i = 0; cpu_i < libbpf_num_possible_cpus(); cpu_i++) { >> int perf_fd = syscall(SYS_perf_event_open, &attr, -1, cpu_i, -1, 0); >> bpf_program__attach_perf_event(skel->progs.do_test, perf_fd); >> } >> >> // Wait for Ctrl-C >> pause(); >> return 0; >> } >> >> As an experiment, I launched a simple C program with an endless loop >> in main and started the BPF program above with its target PID set to >> the PID of this simple C program. Then by checking the virtual memory >> mapped for the C program (with "cat /proc/<PID>/maps"), I found out >> that its .text section got mapped into 55ca2577b000-55ca2577c000 >> address space. When I checked the output of the BPF program, I got >> "non-CO-RE: 55ca2577b131, CO-RE: ffffa58810527e48". As you can see, >> the non-CO-RE result maps into the .text section of the launched C >> program (as it should since this is the value of the instruction >> pointer), while the CO-RE result does not. >> >> Alternatively, if I replace PT_REGS_IP and PT_REGS_IP_CORE with the >> equivalents for the stack pointer (PT_REGS_SP and PT_REGS_SP_CORE), I >> get results that correspond to the stack address space from the >> non-CO-RE macro, but I always get 0 from the CO-RE macro. >> >>> can provide details on the platform you're running on that will >>> help narrow down the issue. Thanks! >> >> Sure. I'm running Ubuntu 22.04.1, kernel version 5.19.0-46-generic, >> the architecture is x86_64, clang 14.0.0 is used to compile BPF >> programs with flags -g -O2 -D__TARGET_ARCH_x86. >> > > Thanks for the additional details! I've reproduced this on > bpf-next with LLVM 15; I'm seeing the same issues with the CO-RE > macros, and with BPF_CORE_READ(). However with extra libbpf debugging > I do see that we pick up the right type id/index for the ip field in > pt_regs: > > libbpf: prog 'do_test': relo #4: matching candidate #0 <byte_off> [216] > struct pt_regs.ip (0:16 @ offset 128) > > One thing I noticed - perhaps this will ring some bells for someone - > if I use __builtin_preserve_access_index() I get the same (correct) > value for ip as is retrieved with PT_REGS_IP(): > > __builtin_preserve_access_index(({ > p_core = ctx->regs.ip; > })); > > I'll check with latest LLVM to see if the issue persists there. > The problem occurs with latest bpf-next + latest LLVM too. Perf event programs fix up context accesses to the "struct bpf_perf_event_data *" context, so accessing ctx->regs in your program becomes accessing the "struct bpf_perf_event_data_kern *" regs, which is a pointer to struct pt_regs. So I _think_ that's why the __builtin_preserve_access_index(({ p_core = ctx->regs.ip; })); ...works; ctx->regs is fixed up to point at the right place, then CO-RE does its thing with the results. Contrast this with bpf_probe_read_kernel(&ip, sizeof(ip), &ctx->regs.ip); In the latter case, the fixups don't seem to happen and we get a bogus address which appears to be consistently 218 bytes after the ctx pointer. I've confirmed that a basic bpf_probe_read_kernel() exposes the issue (and gives the same wrong address as a CO-RE-wrapped bpf_probe_read_kernel()). I tried some permutations like defining struct pt_regs *regs = &ctx->regs; ...to see if that helps, but I think in that case the accesses aren't caught by the verifier because we use the & operator on the ctx->regs. Not sure how smart the verifier can be about context accesses like this; can someone who understands that code better than me take a look at this? In the meantime the workaround described above should do the trick. Thanks! Alan ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Question: CO-RE-enabled PT_REGS macros give strange results 2023-07-25 14:04 ` Alan Maguire @ 2023-07-26 0:03 ` Eduard Zingerman 2023-07-26 13:46 ` Alan Maguire 0 siblings, 1 reply; 11+ messages in thread From: Eduard Zingerman @ 2023-07-26 0:03 UTC (permalink / raw) To: Alan Maguire, Timofei Pushkin; +Cc: bpf On Tue, 2023-07-25 at 15:04 +0100, Alan Maguire wrote: > On 25/07/2023 00:00, Alan Maguire wrote: > > On 24/07/2023 16:04, Timofei Pushkin wrote: > > > On Mon, Jul 24, 2023 at 3:36 PM Alan Maguire <alan.maguire@oracle.com> wrote: > > > > > > > > On 24/07/2023 11:32, Timofei Pushkin wrote: > > > > > Dear BPF community, > > > > > > > > > > I'm developing a perf_event BPF program which reads some register > > > > > values (frame and instruction pointers in particular) from the context > > > > > provided to it. I found that CO-RE-enabled PT_REGS macros give results > > > > > different from the results of the usual PT_REGS macros. I run the > > > > > program on the same system I compiled it on, and so I cannot > > > > > understand why the results differ and which ones should I use? > > > > > > > > > > From my tests, the results of the usual macros are the correct ones > > > > > (e.g. I can symbolize the instruction pointers I get this way), but > > > > > since I try to follow the CO-RE principle, it seems like I should be > > > > > using the CO-RE-enabled variants instead. > > > > > > > > > > I did some experiments and found out that it is the > > > > > bpf_probe_read_kernel part of the CO-RE-enabled PT_REGS macros that > > > > > change the results and not __builtin_preserve_access_index. But I > > > > > still don't get why exactly it changes the results. > > > > > > > > > > > > > Can you provide the exact usage of the BPF CO-RE macros that isn't > > > > working, and the equivalent non-CO-RE version that is? Also if you > > > > > > As a minimal example, I wrote the following little BPF program which > > > prints instruction pointers obtained with non-CO-RE and CO-RE macros: > > > > > > volatile const pid_t target_pid; > > > > > > SEC("perf_event") > > > int do_test(struct bpf_perf_event_data *ctx) { > > > pid_t pid = bpf_get_current_pid_tgid(); > > > if (pid != target_pid) return 0; > > > > > > unsigned long p = PT_REGS_IP(&ctx->regs); > > > unsigned long p_core = PT_REGS_IP_CORE(&ctx->regs); > > > bpf_printk("non-CO-RE: %lx, CO-RE: %lx", p, p_core); > > > > > > return 0; > > > } > > > > > > From user space, I set the target PID and attach the program to CPU > > > clock perf events (error checking and cleanup omitted for brevity): > > > > > > int main(int argc, char *argv[]) { > > > // Load the program also setting the target PID > > > struct test_program_bpf *skel = test_program_bpf__open(); > > > skel->rodata->target_pid = (pid_t) strtol(argv[1], NULL, 10); > > > test_program_bpf__load(skel); > > > > > > // Attach to perf events > > > struct perf_event_attr attr = { > > > .type = PERF_TYPE_SOFTWARE, > > > .size = sizeof(struct perf_event_attr), > > > .config = PERF_COUNT_SW_CPU_CLOCK, > > > .sample_freq = 1, > > > .freq = true > > > }; > > > for (int cpu_i = 0; cpu_i < libbpf_num_possible_cpus(); cpu_i++) { > > > int perf_fd = syscall(SYS_perf_event_open, &attr, -1, cpu_i, -1, 0); > > > bpf_program__attach_perf_event(skel->progs.do_test, perf_fd); > > > } > > > > > > // Wait for Ctrl-C > > > pause(); > > > return 0; > > > } > > > > > > As an experiment, I launched a simple C program with an endless loop > > > in main and started the BPF program above with its target PID set to > > > the PID of this simple C program. Then by checking the virtual memory > > > mapped for the C program (with "cat /proc/<PID>/maps"), I found out > > > that its .text section got mapped into 55ca2577b000-55ca2577c000 > > > address space. When I checked the output of the BPF program, I got > > > "non-CO-RE: 55ca2577b131, CO-RE: ffffa58810527e48". As you can see, > > > the non-CO-RE result maps into the .text section of the launched C > > > program (as it should since this is the value of the instruction > > > pointer), while the CO-RE result does not. > > > > > > Alternatively, if I replace PT_REGS_IP and PT_REGS_IP_CORE with the > > > equivalents for the stack pointer (PT_REGS_SP and PT_REGS_SP_CORE), I > > > get results that correspond to the stack address space from the > > > non-CO-RE macro, but I always get 0 from the CO-RE macro. > > > > > > > can provide details on the platform you're running on that will > > > > help narrow down the issue. Thanks! > > > > > > Sure. I'm running Ubuntu 22.04.1, kernel version 5.19.0-46-generic, > > > the architecture is x86_64, clang 14.0.0 is used to compile BPF > > > programs with flags -g -O2 -D__TARGET_ARCH_x86. > > > > > > > Thanks for the additional details! I've reproduced this on > > bpf-next with LLVM 15; I'm seeing the same issues with the CO-RE > > macros, and with BPF_CORE_READ(). However with extra libbpf debugging > > I do see that we pick up the right type id/index for the ip field in > > pt_regs: > > > > libbpf: prog 'do_test': relo #4: matching candidate #0 <byte_off> [216] > > struct pt_regs.ip (0:16 @ offset 128) > > > > One thing I noticed - perhaps this will ring some bells for someone - > > if I use __builtin_preserve_access_index() I get the same (correct) > > value for ip as is retrieved with PT_REGS_IP(): > > > > __builtin_preserve_access_index(({ > > p_core = ctx->regs.ip; > > })); > > > > I'll check with latest LLVM to see if the issue persists there. > > > > > The problem occurs with latest bpf-next + latest LLVM too. Perf event > programs fix up context accesses to the "struct bpf_perf_event_data *" > context, so accessing ctx->regs in your program becomes accessing the > "struct bpf_perf_event_data_kern *" regs, which is a pointer to > struct pt_regs. So I _think_ that's why the > > __builtin_preserve_access_index(({ > p_core = ctx->regs.ip; > })); > > > ...works; ctx->regs is fixed up to point at the right place, then > CO-RE does its thing with the results. Contrast this with > > bpf_probe_read_kernel(&ip, sizeof(ip), &ctx->regs.ip); > > In the latter case, the fixups don't seem to happen and we get a > bogus address which appears to be consistently 218 bytes after the ctx > pointer. I've confirmed that a basic bpf_probe_read_kernel() > exposes the issue (and gives the same wrong address as a CO-RE-wrapped > bpf_probe_read_kernel()). > > I tried some permutations like defining > > struct pt_regs *regs = &ctx->regs; > > ...to see if that helps, but I think in that case the accesses aren't > caught by the verifier because we use the & operator on the ctx->regs. > > Not sure how smart the verifier can be about context accesses like this; > can someone who understands that code better than me take a look at this? Hi Alan, Your analysis is correct: verifier applies rewrites to instructions that read/write from/to certain context fields, including `struct bpf_perf_event_data`. This is done by function verifier.c:convert_ctx_accesses(). This function handles BPF_LDX, BPF_STX and BPF_ST instructions, but it does not handle calls to helpers like bpf_probe_read_kernel(). So, when code generated for PT_REGS_IP(&ctx->regs) is processed, the correct access sequence is inserted by function bpf_trace.c:pe_prog_convert_ctx_access() (see below). But code generated for `PT_REGS_IP_CORE(&ctx->regs)` is not modified. It looks like `PT_REGS_IP_CORE` macro should not be defined through bpf_probe_read_kernel(). I'll dig through commit history tomorrow to understand why is it defined like that now. Thanks, Eduard --- Below is annotated example, inpatient reader might skip it For the following test program: #include "vmlinux.h" ... SEC("perf_event") int do_test(struct bpf_perf_event_data *ctx) { unsigned long p = PT_REGS_IP(&ctx->regs); unsigned long p_core = PT_REGS_IP_CORE(&ctx->regs); bpf_printk("non-CO-RE: %lx, CO-RE: %lx", p, p_core); return 0; } Generated BPF code looks as follows: $ llvm-objdump --no-show-raw-insn -rd bpf.linked.o ... 0000000000000000 <do_test>: # Third argument for bpf_probe_read_kernel: offset of bpf_perf_event_data::regs.ip 0: r2 = 0x80 0000000000000000: CO-RE <byte_off> [2] struct bpf_perf_event_data::regs.ip (0:0:16) 1: r3 = r1 2: r3 += r2 # The "non CO-RE" version of PT_REGS_IP is, in fact, CO-RE # because `struct bpf_perf_event_data` has preserve_access_index # tag in the vmlinux.h. # Here the regs.ip is stored in r6 to be used after the call # to bpf_probe_read_kernel() (from PT_REGS_IP_CORE). 3: r6 = *(u64 *)(r1 + 0x80) 0000000000000018: CO-RE <byte_off> [2] struct bpf_perf_event_data::regs.ip (0:0:16) # First argument for bpf_probe_read_kernel: a place on stack to put read result to. 4: r1 = r10 5: r1 += -0x8 # Second argument for bpf_probe_read_kernel: the size of the field to read. 6: w2 = 0x8 # Call to bpf_probe_read_kernel() 7: call 0x71 # Fourth parameter of bpf_printk: p_core read from stack # (was written by call to bpf_probe_read_kernel) 8: r4 = *(u64 *)(r10 - 0x8) # First parameter of bpf_printk: control string 9: r1 = 0x0 ll 0000000000000048: R_BPF_64_64 .rodata # Second parameter of bpf_printk: size of the control string 11: w2 = 0x1b # Third parameter of bpf_printk: p (see addr 3) 12: r3 = r6 # Call to bpf_printk 13: call 0x6 ; return 0; 14: w0 = 0x0 15: exit I get the following BPF after all verifier rewrites are applied (including verifier.c:convert_ctx_accesses()): # ./tools/bpf/bpftool/bpftool prog dump xlated id 114 int do_test(struct bpf_perf_event_data * ctx): ; int do_test(struct bpf_perf_event_data *ctx) { 0: (b7) r2 = 128 | CO-RE replacement, 128 is a valid offset for | bpf_perf_event_data::regs.ip in my kernel 1: (bf) r3 = r1 2: (0f) r3 += r2 3: (79) r6 = *(u64 *)(r1 +0) | This is an expantion of the 4: (79) r6 = *(u64 *)(r6 +128) | r6 = *(u64 *)(r1 + 0x80) 5: (bf) r1 = r10 | Created by bpf_trace.c:pe_prog_convert_ctx_access() 6: (07) r1 += -8 7: (b4) w2 = 8 8: (85) call bpf_probe_read_kernel#-91984 9: (79) r4 = *(u64 *)(r10 -8) 10: (18) r1 = map[id:59][0]+0 12: (b4) w2 = 27 13: (bf) r3 = r6 14: (85) call bpf_trace_printk#-85520 15: (b4) w0 = 0 16: (95) exit ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Question: CO-RE-enabled PT_REGS macros give strange results 2023-07-26 0:03 ` Eduard Zingerman @ 2023-07-26 13:46 ` Alan Maguire 2023-07-26 20:03 ` Eduard Zingerman 0 siblings, 1 reply; 11+ messages in thread From: Alan Maguire @ 2023-07-26 13:46 UTC (permalink / raw) To: Eduard Zingerman, Timofei Pushkin; +Cc: bpf On 26/07/2023 01:03, Eduard Zingerman wrote: > On Tue, 2023-07-25 at 15:04 +0100, Alan Maguire wrote: >> On 25/07/2023 00:00, Alan Maguire wrote: >>> On 24/07/2023 16:04, Timofei Pushkin wrote: >>>> On Mon, Jul 24, 2023 at 3:36 PM Alan Maguire <alan.maguire@oracle.com> wrote: >>>>> >>>>> On 24/07/2023 11:32, Timofei Pushkin wrote: >>>>>> Dear BPF community, >>>>>> >>>>>> I'm developing a perf_event BPF program which reads some register >>>>>> values (frame and instruction pointers in particular) from the context >>>>>> provided to it. I found that CO-RE-enabled PT_REGS macros give results >>>>>> different from the results of the usual PT_REGS macros. I run the >>>>>> program on the same system I compiled it on, and so I cannot >>>>>> understand why the results differ and which ones should I use? >>>>>> >>>>>> From my tests, the results of the usual macros are the correct ones >>>>>> (e.g. I can symbolize the instruction pointers I get this way), but >>>>>> since I try to follow the CO-RE principle, it seems like I should be >>>>>> using the CO-RE-enabled variants instead. >>>>>> >>>>>> I did some experiments and found out that it is the >>>>>> bpf_probe_read_kernel part of the CO-RE-enabled PT_REGS macros that >>>>>> change the results and not __builtin_preserve_access_index. But I >>>>>> still don't get why exactly it changes the results. >>>>>> >>>>> >>>>> Can you provide the exact usage of the BPF CO-RE macros that isn't >>>>> working, and the equivalent non-CO-RE version that is? Also if you >>>> >>>> As a minimal example, I wrote the following little BPF program which >>>> prints instruction pointers obtained with non-CO-RE and CO-RE macros: >>>> >>>> volatile const pid_t target_pid; >>>> >>>> SEC("perf_event") >>>> int do_test(struct bpf_perf_event_data *ctx) { >>>> pid_t pid = bpf_get_current_pid_tgid(); >>>> if (pid != target_pid) return 0; >>>> >>>> unsigned long p = PT_REGS_IP(&ctx->regs); >>>> unsigned long p_core = PT_REGS_IP_CORE(&ctx->regs); >>>> bpf_printk("non-CO-RE: %lx, CO-RE: %lx", p, p_core); >>>> >>>> return 0; >>>> } >>>> >>>> From user space, I set the target PID and attach the program to CPU >>>> clock perf events (error checking and cleanup omitted for brevity): >>>> >>>> int main(int argc, char *argv[]) { >>>> // Load the program also setting the target PID >>>> struct test_program_bpf *skel = test_program_bpf__open(); >>>> skel->rodata->target_pid = (pid_t) strtol(argv[1], NULL, 10); >>>> test_program_bpf__load(skel); >>>> >>>> // Attach to perf events >>>> struct perf_event_attr attr = { >>>> .type = PERF_TYPE_SOFTWARE, >>>> .size = sizeof(struct perf_event_attr), >>>> .config = PERF_COUNT_SW_CPU_CLOCK, >>>> .sample_freq = 1, >>>> .freq = true >>>> }; >>>> for (int cpu_i = 0; cpu_i < libbpf_num_possible_cpus(); cpu_i++) { >>>> int perf_fd = syscall(SYS_perf_event_open, &attr, -1, cpu_i, -1, 0); >>>> bpf_program__attach_perf_event(skel->progs.do_test, perf_fd); >>>> } >>>> >>>> // Wait for Ctrl-C >>>> pause(); >>>> return 0; >>>> } >>>> >>>> As an experiment, I launched a simple C program with an endless loop >>>> in main and started the BPF program above with its target PID set to >>>> the PID of this simple C program. Then by checking the virtual memory >>>> mapped for the C program (with "cat /proc/<PID>/maps"), I found out >>>> that its .text section got mapped into 55ca2577b000-55ca2577c000 >>>> address space. When I checked the output of the BPF program, I got >>>> "non-CO-RE: 55ca2577b131, CO-RE: ffffa58810527e48". As you can see, >>>> the non-CO-RE result maps into the .text section of the launched C >>>> program (as it should since this is the value of the instruction >>>> pointer), while the CO-RE result does not. >>>> >>>> Alternatively, if I replace PT_REGS_IP and PT_REGS_IP_CORE with the >>>> equivalents for the stack pointer (PT_REGS_SP and PT_REGS_SP_CORE), I >>>> get results that correspond to the stack address space from the >>>> non-CO-RE macro, but I always get 0 from the CO-RE macro. >>>> >>>>> can provide details on the platform you're running on that will >>>>> help narrow down the issue. Thanks! >>>> >>>> Sure. I'm running Ubuntu 22.04.1, kernel version 5.19.0-46-generic, >>>> the architecture is x86_64, clang 14.0.0 is used to compile BPF >>>> programs with flags -g -O2 -D__TARGET_ARCH_x86. >>>> >>> >>> Thanks for the additional details! I've reproduced this on >>> bpf-next with LLVM 15; I'm seeing the same issues with the CO-RE >>> macros, and with BPF_CORE_READ(). However with extra libbpf debugging >>> I do see that we pick up the right type id/index for the ip field in >>> pt_regs: >>> >>> libbpf: prog 'do_test': relo #4: matching candidate #0 <byte_off> [216] >>> struct pt_regs.ip (0:16 @ offset 128) >>> >>> One thing I noticed - perhaps this will ring some bells for someone - >>> if I use __builtin_preserve_access_index() I get the same (correct) >>> value for ip as is retrieved with PT_REGS_IP(): >>> >>> __builtin_preserve_access_index(({ >>> p_core = ctx->regs.ip; >>> })); >>> >>> I'll check with latest LLVM to see if the issue persists there. >>> >> >> >> The problem occurs with latest bpf-next + latest LLVM too. Perf event >> programs fix up context accesses to the "struct bpf_perf_event_data *" >> context, so accessing ctx->regs in your program becomes accessing the >> "struct bpf_perf_event_data_kern *" regs, which is a pointer to >> struct pt_regs. So I _think_ that's why the >> >> __builtin_preserve_access_index(({ >> p_core = ctx->regs.ip; >> })); >> >> >> ...works; ctx->regs is fixed up to point at the right place, then >> CO-RE does its thing with the results. Contrast this with >> >> bpf_probe_read_kernel(&ip, sizeof(ip), &ctx->regs.ip); >> >> In the latter case, the fixups don't seem to happen and we get a >> bogus address which appears to be consistently 218 bytes after the ctx >> pointer. I've confirmed that a basic bpf_probe_read_kernel() >> exposes the issue (and gives the same wrong address as a CO-RE-wrapped >> bpf_probe_read_kernel()). >> >> I tried some permutations like defining >> >> struct pt_regs *regs = &ctx->regs; >> >> ...to see if that helps, but I think in that case the accesses aren't >> caught by the verifier because we use the & operator on the ctx->regs. >> >> Not sure how smart the verifier can be about context accesses like this; >> can someone who understands that code better than me take a look at this? > > Hi Alan, > > Your analysis is correct: verifier applies rewrites to instructions > that read/write from/to certain context fields, including > `struct bpf_perf_event_data`. > > This is done by function verifier.c:convert_ctx_accesses(). > This function handles BPF_LDX, BPF_STX and BPF_ST instructions, but it > does not handle calls to helpers like bpf_probe_read_kernel(). > > So, when code generated for PT_REGS_IP(&ctx->regs) is processed, the > correct access sequence is inserted by function > bpf_trace.c:pe_prog_convert_ctx_access() (see below). > > But code generated for `PT_REGS_IP_CORE(&ctx->regs)` is not modified. > Ah, makes sense. Would you consider it a bug that helper parameters don't get context conversions applied, or are there additional complexities here that mean that's not doable? (I'm wondering if we should fix versus document this?). I would have thought the only difference is the destination register, but the verifier is a mysterious land to me.. > It looks like `PT_REGS_IP_CORE` macro should not be defined through > bpf_probe_read_kernel(). I'll dig through commit history tomorrow to > understand why is it defined like that now. > help If I recall the rationale was to allow the macros to work for both BPF programs that can do direct dereference (fentry, fexit, tp_btf etc) and for kprobe-style that need to use bpf_probe_read_kernel(). Not sure if it would be worth having variants that are purely dereference-based, since we can just use PT_REGS_IP() due to the __builtin_preserve_access_index attributes applied in vmlinux.h. Thanks! Alan > Thanks, > Eduard > > --- > Below is annotated example, inpatient reader might skip it > > For the following test program: > > #include "vmlinux.h" > ... > SEC("perf_event") > int do_test(struct bpf_perf_event_data *ctx) { > unsigned long p = PT_REGS_IP(&ctx->regs); > unsigned long p_core = PT_REGS_IP_CORE(&ctx->regs); > bpf_printk("non-CO-RE: %lx, CO-RE: %lx", p, p_core); > return 0; > } > > Generated BPF code looks as follows: > > $ llvm-objdump --no-show-raw-insn -rd bpf.linked.o > ... > 0000000000000000 <do_test>: > # Third argument for bpf_probe_read_kernel: offset of bpf_perf_event_data::regs.ip > 0: r2 = 0x80 > 0000000000000000: CO-RE <byte_off> [2] struct bpf_perf_event_data::regs.ip (0:0:16) > 1: r3 = r1 > 2: r3 += r2 > # The "non CO-RE" version of PT_REGS_IP is, in fact, CO-RE > # because `struct bpf_perf_event_data` has preserve_access_index > # tag in the vmlinux.h. > # Here the regs.ip is stored in r6 to be used after the call > # to bpf_probe_read_kernel() (from PT_REGS_IP_CORE). > 3: r6 = *(u64 *)(r1 + 0x80) > 0000000000000018: CO-RE <byte_off> [2] struct bpf_perf_event_data::regs.ip (0:0:16) > # First argument for bpf_probe_read_kernel: a place on stack to put read result to. > 4: r1 = r10 > 5: r1 += -0x8 > # Second argument for bpf_probe_read_kernel: the size of the field to read. > 6: w2 = 0x8 > # Call to bpf_probe_read_kernel() > 7: call 0x71 > # Fourth parameter of bpf_printk: p_core read from stack > # (was written by call to bpf_probe_read_kernel) > 8: r4 = *(u64 *)(r10 - 0x8) > # First parameter of bpf_printk: control string > 9: r1 = 0x0 ll > 0000000000000048: R_BPF_64_64 .rodata > # Second parameter of bpf_printk: size of the control string > 11: w2 = 0x1b > # Third parameter of bpf_printk: p (see addr 3) > 12: r3 = r6 > # Call to bpf_printk > 13: call 0x6 > ; return 0; > 14: w0 = 0x0 > 15: exit > > I get the following BPF after all verifier rewrites are applied > (including verifier.c:convert_ctx_accesses()): > > # ./tools/bpf/bpftool/bpftool prog dump xlated id 114 > int do_test(struct bpf_perf_event_data * ctx): > ; int do_test(struct bpf_perf_event_data *ctx) { > 0: (b7) r2 = 128 | CO-RE replacement, 128 is a valid offset for > | bpf_perf_event_data::regs.ip in my kernel > 1: (bf) r3 = r1 > 2: (0f) r3 += r2 > > 3: (79) r6 = *(u64 *)(r1 +0) | This is an expantion of the > 4: (79) r6 = *(u64 *)(r6 +128) | r6 = *(u64 *)(r1 + 0x80) > 5: (bf) r1 = r10 | Created by bpf_trace.c:pe_prog_convert_ctx_access() > > 6: (07) r1 += -8 > 7: (b4) w2 = 8 > 8: (85) call bpf_probe_read_kernel#-91984 > 9: (79) r4 = *(u64 *)(r10 -8) > 10: (18) r1 = map[id:59][0]+0 > 12: (b4) w2 = 27 > 13: (bf) r3 = r6 > 14: (85) call bpf_trace_printk#-85520 > 15: (b4) w0 = 0 > 16: (95) exit > > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Question: CO-RE-enabled PT_REGS macros give strange results 2023-07-26 13:46 ` Alan Maguire @ 2023-07-26 20:03 ` Eduard Zingerman 2023-07-26 23:39 ` Eduard Zingerman 0 siblings, 1 reply; 11+ messages in thread From: Eduard Zingerman @ 2023-07-26 20:03 UTC (permalink / raw) To: Alan Maguire, Timofei Pushkin, Alexei Starovoitov; +Cc: bpf On Wed, 2023-07-26 at 14:46 +0100, Alan Maguire wrote: > On 26/07/2023 01:03, Eduard Zingerman wrote: > > On Tue, 2023-07-25 at 15:04 +0100, Alan Maguire wrote: > > > On 25/07/2023 00:00, Alan Maguire wrote: > > > > On 24/07/2023 16:04, Timofei Pushkin wrote: > > > > > On Mon, Jul 24, 2023 at 3:36 PM Alan Maguire <alan.maguire@oracle.com> wrote: > > > > > > > > > > > > On 24/07/2023 11:32, Timofei Pushkin wrote: > > > > > > > Dear BPF community, > > > > > > > > > > > > > > I'm developing a perf_event BPF program which reads some register > > > > > > > values (frame and instruction pointers in particular) from the context > > > > > > > provided to it. I found that CO-RE-enabled PT_REGS macros give results > > > > > > > different from the results of the usual PT_REGS macros. I run the > > > > > > > program on the same system I compiled it on, and so I cannot > > > > > > > understand why the results differ and which ones should I use? > > > > > > > > > > > > > > From my tests, the results of the usual macros are the correct ones > > > > > > > (e.g. I can symbolize the instruction pointers I get this way), but > > > > > > > since I try to follow the CO-RE principle, it seems like I should be > > > > > > > using the CO-RE-enabled variants instead. > > > > > > > > > > > > > > I did some experiments and found out that it is the > > > > > > > bpf_probe_read_kernel part of the CO-RE-enabled PT_REGS macros that > > > > > > > change the results and not __builtin_preserve_access_index. But I > > > > > > > still don't get why exactly it changes the results. > > > > > > > > > > > > > > > > > > > Can you provide the exact usage of the BPF CO-RE macros that isn't > > > > > > working, and the equivalent non-CO-RE version that is? Also if you > > > > > > > > > > As a minimal example, I wrote the following little BPF program which > > > > > prints instruction pointers obtained with non-CO-RE and CO-RE macros: > > > > > > > > > > volatile const pid_t target_pid; > > > > > > > > > > SEC("perf_event") > > > > > int do_test(struct bpf_perf_event_data *ctx) { > > > > > pid_t pid = bpf_get_current_pid_tgid(); > > > > > if (pid != target_pid) return 0; > > > > > > > > > > unsigned long p = PT_REGS_IP(&ctx->regs); > > > > > unsigned long p_core = PT_REGS_IP_CORE(&ctx->regs); > > > > > bpf_printk("non-CO-RE: %lx, CO-RE: %lx", p, p_core); > > > > > > > > > > return 0; > > > > > } > > > > > > > > > > From user space, I set the target PID and attach the program to CPU > > > > > clock perf events (error checking and cleanup omitted for brevity): > > > > > > > > > > int main(int argc, char *argv[]) { > > > > > // Load the program also setting the target PID > > > > > struct test_program_bpf *skel = test_program_bpf__open(); > > > > > skel->rodata->target_pid = (pid_t) strtol(argv[1], NULL, 10); > > > > > test_program_bpf__load(skel); > > > > > > > > > > // Attach to perf events > > > > > struct perf_event_attr attr = { > > > > > .type = PERF_TYPE_SOFTWARE, > > > > > .size = sizeof(struct perf_event_attr), > > > > > .config = PERF_COUNT_SW_CPU_CLOCK, > > > > > .sample_freq = 1, > > > > > .freq = true > > > > > }; > > > > > for (int cpu_i = 0; cpu_i < libbpf_num_possible_cpus(); cpu_i++) { > > > > > int perf_fd = syscall(SYS_perf_event_open, &attr, -1, cpu_i, -1, 0); > > > > > bpf_program__attach_perf_event(skel->progs.do_test, perf_fd); > > > > > } > > > > > > > > > > // Wait for Ctrl-C > > > > > pause(); > > > > > return 0; > > > > > } > > > > > > > > > > As an experiment, I launched a simple C program with an endless loop > > > > > in main and started the BPF program above with its target PID set to > > > > > the PID of this simple C program. Then by checking the virtual memory > > > > > mapped for the C program (with "cat /proc/<PID>/maps"), I found out > > > > > that its .text section got mapped into 55ca2577b000-55ca2577c000 > > > > > address space. When I checked the output of the BPF program, I got > > > > > "non-CO-RE: 55ca2577b131, CO-RE: ffffa58810527e48". As you can see, > > > > > the non-CO-RE result maps into the .text section of the launched C > > > > > program (as it should since this is the value of the instruction > > > > > pointer), while the CO-RE result does not. > > > > > > > > > > Alternatively, if I replace PT_REGS_IP and PT_REGS_IP_CORE with the > > > > > equivalents for the stack pointer (PT_REGS_SP and PT_REGS_SP_CORE), I > > > > > get results that correspond to the stack address space from the > > > > > non-CO-RE macro, but I always get 0 from the CO-RE macro. > > > > > > > > > > > can provide details on the platform you're running on that will > > > > > > help narrow down the issue. Thanks! > > > > > > > > > > Sure. I'm running Ubuntu 22.04.1, kernel version 5.19.0-46-generic, > > > > > the architecture is x86_64, clang 14.0.0 is used to compile BPF > > > > > programs with flags -g -O2 -D__TARGET_ARCH_x86. > > > > > > > > > > > > > Thanks for the additional details! I've reproduced this on > > > > bpf-next with LLVM 15; I'm seeing the same issues with the CO-RE > > > > macros, and with BPF_CORE_READ(). However with extra libbpf debugging > > > > I do see that we pick up the right type id/index for the ip field in > > > > pt_regs: > > > > > > > > libbpf: prog 'do_test': relo #4: matching candidate #0 <byte_off> [216] > > > > struct pt_regs.ip (0:16 @ offset 128) > > > > > > > > One thing I noticed - perhaps this will ring some bells for someone - > > > > if I use __builtin_preserve_access_index() I get the same (correct) > > > > value for ip as is retrieved with PT_REGS_IP(): > > > > > > > > __builtin_preserve_access_index(({ > > > > p_core = ctx->regs.ip; > > > > })); > > > > > > > > I'll check with latest LLVM to see if the issue persists there. > > > > > > > > > > > > > The problem occurs with latest bpf-next + latest LLVM too. Perf event > > > programs fix up context accesses to the "struct bpf_perf_event_data *" > > > context, so accessing ctx->regs in your program becomes accessing the > > > "struct bpf_perf_event_data_kern *" regs, which is a pointer to > > > struct pt_regs. So I _think_ that's why the > > > > > > __builtin_preserve_access_index(({ > > > p_core = ctx->regs.ip; > > > })); > > > > > > > > > ...works; ctx->regs is fixed up to point at the right place, then > > > CO-RE does its thing with the results. Contrast this with > > > > > > bpf_probe_read_kernel(&ip, sizeof(ip), &ctx->regs.ip); > > > > > > In the latter case, the fixups don't seem to happen and we get a > > > bogus address which appears to be consistently 218 bytes after the ctx > > > pointer. I've confirmed that a basic bpf_probe_read_kernel() > > > exposes the issue (and gives the same wrong address as a CO-RE-wrapped > > > bpf_probe_read_kernel()). > > > > > > I tried some permutations like defining > > > > > > struct pt_regs *regs = &ctx->regs; > > > > > > ...to see if that helps, but I think in that case the accesses aren't > > > caught by the verifier because we use the & operator on the ctx->regs. > > > > > > Not sure how smart the verifier can be about context accesses like this; > > > can someone who understands that code better than me take a look at this? > > > > Hi Alan, > > > > Your analysis is correct: verifier applies rewrites to instructions > > that read/write from/to certain context fields, including > > `struct bpf_perf_event_data`. > > > > This is done by function verifier.c:convert_ctx_accesses(). > > This function handles BPF_LDX, BPF_STX and BPF_ST instructions, but it > > does not handle calls to helpers like bpf_probe_read_kernel(). > > > > So, when code generated for PT_REGS_IP(&ctx->regs) is processed, the > > correct access sequence is inserted by function > > bpf_trace.c:pe_prog_convert_ctx_access() (see below). > > > > But code generated for `PT_REGS_IP_CORE(&ctx->regs)` is not modified. > > > > Ah, makes sense. Would you consider it a bug that helper parameters > don't get context conversions applied, or are there additional > complexities here that mean that's not doable? (I'm wondering if > we should fix versus document this?). I would have thought the > only difference is the destination register, but the verifier is > a mysterious land to me.. Sorry for delay, had to read through implementation. Verifier already reports an error when pointer to a context is passed to helper function if context is subject to convert_ctx_accesses(). This is done in function verifier.c:check_helper_mem_access() (see `case PTR_TO_CTX` at the end of the function). For example, the following program is rejected: struct { __uint(type, BPF_MAP_TYPE_HASH); __uint(max_entries, 1); __type(key, void *); __type(value, int); } map SEC(".maps"); SEC("perf_event") int do_test(struct bpf_perf_event_data *ctx) { void *p = bpf_map_lookup_elem(&map, &ctx->regs); return p ? 0 : 1; } Here is the log: ... 0: R1=ctx(off=0,imm=0) R10=fp0 ; int do_test(struct bpf_perf_event_data *ctx) { 0: (bf) r2 = r1 ; R1=ctx(off=0,imm=0) R2_w=ctx(off=0,imm=0) 1: (b7) r1 = 0 ; R1_w=0 2: (0f) r2 += r1 mark_precise: frame0: last_idx 2 first_idx 0 subseq_idx -1 mark_precise: frame0: regs=r1 stack= before 1: (b7) r1 = 0 3: R1_w=0 R2_w=ctx(off=0,imm=0) ; void *p = bpf_map_lookup_elem(&map, &ctx->regs); 3: (18) r1 = 0xffff888102c22000 ; R1_w=map_ptr(off=0,ks=8,vs=4,imm=0) 5: (85) call bpf_map_lookup_elem#1 R2 type=ctx expected=fp, pkt, pkt_meta, map_key, map_value, mem, ringbuf_mem, buf, trusted_ptr_ ^^^^^^ error message (a bit cryptic) ... However, this error is checked conditionally depending on the proto defined for the helper function. For example, here is the proto for bpf_map_lookup_elem(): const struct bpf_func_proto bpf_map_lookup_elem_proto = { .func = bpf_map_lookup_elem, .gpl_only = false, .pkt_access = true, .ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL, .arg1_type = ARG_CONST_MAP_PTR, // <-- check_helper_mem_access() is done .arg2_type = ARG_PTR_TO_MAP_KEY, // <-- check_helper_mem_access() is done }; And here is prototype for bpf_probe_read_kernel(): const struct bpf_func_proto bpf_probe_read_kernel_proto = { .func = bpf_probe_read_kernel, .gpl_only = true, .ret_type = RET_INTEGER, .arg1_type = ARG_PTR_TO_UNINIT_MEM, // <-- check_helper_mem_access() is done .arg2_type = ARG_CONST_SIZE_OR_ZERO, // <-- check_helper_mem_access() is done for // the previous argument (a bit complicated) .arg3_type = ARG_ANYTHING, // <-- check_helper_mem_access() is *NOT* done }; So, the way bpf_probe_read_kernel() is defined for verifier it does not check that last argument might point to "fake" location. I think this should be fixed, but would be interesting to hear what people on the mailing list think. > > It looks like `PT_REGS_IP_CORE` macro should not be defined through > > bpf_probe_read_kernel(). I'll dig through commit history tomorrow to > > understand why is it defined like that now. > > help > > If I recall the rationale was to allow the macros to work for both > BPF programs that can do direct dereference (fentry, fexit, tp_btf etc) > and for kprobe-style that need to use bpf_probe_read_kernel(). > Not sure if it would be worth having variants that are purely > dereference-based, since we can just use PT_REGS_IP() due to > the __builtin_preserve_access_index attributes applied in vmlinux.h. Sorry, need a bit more time, thanks for the context. > > Thanks! > > Alan > > > Thanks, > > Eduard > > > > --- > > Below is annotated example, inpatient reader might skip it > > > > For the following test program: > > > > #include "vmlinux.h" > > ... > > SEC("perf_event") > > int do_test(struct bpf_perf_event_data *ctx) { > > unsigned long p = PT_REGS_IP(&ctx->regs); > > unsigned long p_core = PT_REGS_IP_CORE(&ctx->regs); > > bpf_printk("non-CO-RE: %lx, CO-RE: %lx", p, p_core); > > return 0; > > } > > > > Generated BPF code looks as follows: > > > > $ llvm-objdump --no-show-raw-insn -rd bpf.linked.o > > ... > > 0000000000000000 <do_test>: > > # Third argument for bpf_probe_read_kernel: offset of bpf_perf_event_data::regs.ip > > 0: r2 = 0x80 > > 0000000000000000: CO-RE <byte_off> [2] struct bpf_perf_event_data::regs.ip (0:0:16) > > 1: r3 = r1 > > 2: r3 += r2 > > # The "non CO-RE" version of PT_REGS_IP is, in fact, CO-RE > > # because `struct bpf_perf_event_data` has preserve_access_index > > # tag in the vmlinux.h. > > # Here the regs.ip is stored in r6 to be used after the call > > # to bpf_probe_read_kernel() (from PT_REGS_IP_CORE). > > 3: r6 = *(u64 *)(r1 + 0x80) > > 0000000000000018: CO-RE <byte_off> [2] struct bpf_perf_event_data::regs.ip (0:0:16) > > # First argument for bpf_probe_read_kernel: a place on stack to put read result to. > > 4: r1 = r10 > > 5: r1 += -0x8 > > # Second argument for bpf_probe_read_kernel: the size of the field to read. > > 6: w2 = 0x8 > > # Call to bpf_probe_read_kernel() > > 7: call 0x71 > > # Fourth parameter of bpf_printk: p_core read from stack > > # (was written by call to bpf_probe_read_kernel) > > 8: r4 = *(u64 *)(r10 - 0x8) > > # First parameter of bpf_printk: control string > > 9: r1 = 0x0 ll > > 0000000000000048: R_BPF_64_64 .rodata > > # Second parameter of bpf_printk: size of the control string > > 11: w2 = 0x1b > > # Third parameter of bpf_printk: p (see addr 3) > > 12: r3 = r6 > > # Call to bpf_printk > > 13: call 0x6 > > ; return 0; > > 14: w0 = 0x0 > > 15: exit > > > > I get the following BPF after all verifier rewrites are applied > > (including verifier.c:convert_ctx_accesses()): > > > > # ./tools/bpf/bpftool/bpftool prog dump xlated id 114 > > int do_test(struct bpf_perf_event_data * ctx): > > ; int do_test(struct bpf_perf_event_data *ctx) { > > 0: (b7) r2 = 128 | CO-RE replacement, 128 is a valid offset for > > | bpf_perf_event_data::regs.ip in my kernel > > 1: (bf) r3 = r1 > > 2: (0f) r3 += r2 > > > > 3: (79) r6 = *(u64 *)(r1 +0) | This is an expantion of the > > 4: (79) r6 = *(u64 *)(r6 +128) | r6 = *(u64 *)(r1 + 0x80) > > 5: (bf) r1 = r10 | Created by bpf_trace.c:pe_prog_convert_ctx_access() > > > > 6: (07) r1 += -8 > > 7: (b4) w2 = 8 > > 8: (85) call bpf_probe_read_kernel#-91984 > > 9: (79) r4 = *(u64 *)(r10 -8) > > 10: (18) r1 = map[id:59][0]+0 > > 12: (b4) w2 = 27 > > 13: (bf) r3 = r6 > > 14: (85) call bpf_trace_printk#-85520 > > 15: (b4) w0 = 0 > > 16: (95) exit > > > > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Question: CO-RE-enabled PT_REGS macros give strange results 2023-07-26 20:03 ` Eduard Zingerman @ 2023-07-26 23:39 ` Eduard Zingerman 2023-07-28 3:03 ` Yonghong Song 0 siblings, 1 reply; 11+ messages in thread From: Eduard Zingerman @ 2023-07-26 23:39 UTC (permalink / raw) To: Alan Maguire, Timofei Pushkin, Alexei Starovoitov; +Cc: bpf On Wed, 2023-07-26 at 23:03 +0300, Eduard Zingerman wrote: [...] > > > It looks like `PT_REGS_IP_CORE` macro should not be defined through > > > bpf_probe_read_kernel(). I'll dig through commit history tomorrow to > > > understand why is it defined like that now. > > > help > > > > If I recall the rationale was to allow the macros to work for both > > BPF programs that can do direct dereference (fentry, fexit, tp_btf etc) > > and for kprobe-style that need to use bpf_probe_read_kernel(). > > Not sure if it would be worth having variants that are purely > > dereference-based, since we can just use PT_REGS_IP() due to > > the __builtin_preserve_access_index attributes applied in vmlinux.h. > > Sorry, need a bit more time, thanks for the context. The PT_REGS_*_CORE macros were added by Andrii Nakryiko in [1]. Stated intent there is to use those macros for raw tracepoint programs. Such programs have `struct pt_regs` as a parameter. Contexts of type `struct pt_regs` are *not* subject to rewrite by convert_ctx_access(), so it is valid to use PT_REGS_*_CORE for such programs. However, `struct pt_regs` is also a part of `struct bpf_perf_event_data`. Latter is used as a context parameter for "perf_event" programs and is a subject to rewrite by convert_ctx_access(). Thus, PT_REGS_*_CORE macros can't be used for such programs (because these macro are implemented through bpf_probe_read_kernel() of which convert_ctx_access() is not aware). If `struct pt_regs` is defined with `preserve_access_index` attribute CO-RE relocations are generated for both PT_REGS_IP_CORE and PT_REGS_IP invocations. So, there is no real need to use *_CORE variants in combination with `struct bpf_perf_event_data` to have all CO-RE benefits, e.g.: $ cat bpf.c #include "vmlinux.h" // ... SEC("perf_event") int do_test(struct bpf_perf_event_data *ctx) { return PT_REGS_IP(&ctx->regs); } // ... $ llvm-objdump --no-show-raw-insn -rd bpf.o ... 0000000000000000 <do_test>: 0: r0 = *(u64 *)(r1 + 0x80) 0000000000000000: CO-RE <byte_off> [11] struct bpf_perf_event_data::regs.ip (0:0:16) 1: exit [1] b8ebce86ffe6 ("libbpf: Provide CO-RE variants of PT_REGS macros") --- I think the following should be done: - Timofei's code should use PT_REGS_IP and make sure that `struct pt_regs` has preserve_access_index annotation (e.g. use vmlinux.h); - verifier should be adjusted to report error when bpf_probe_read_kernel() (and similar) are used to read from "fake" contexts. - (maybe?) update PT_REGS_*_CORE to use `__builtin_preserve_access_index` (to allow usage with `bpf_perf_event_data` context). [...] ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Question: CO-RE-enabled PT_REGS macros give strange results 2023-07-26 23:39 ` Eduard Zingerman @ 2023-07-28 3:03 ` Yonghong Song 2023-07-28 12:30 ` Eduard Zingerman 0 siblings, 1 reply; 11+ messages in thread From: Yonghong Song @ 2023-07-28 3:03 UTC (permalink / raw) To: Eduard Zingerman, Alan Maguire, Timofei Pushkin, Alexei Starovoitov; +Cc: bpf On 7/26/23 4:39 PM, Eduard Zingerman wrote: > On Wed, 2023-07-26 at 23:03 +0300, Eduard Zingerman wrote: > [...] >>>> It looks like `PT_REGS_IP_CORE` macro should not be defined through >>>> bpf_probe_read_kernel(). I'll dig through commit history tomorrow to >>>> understand why is it defined like that now. >>>> help >>> >>> If I recall the rationale was to allow the macros to work for both >>> BPF programs that can do direct dereference (fentry, fexit, tp_btf etc) >>> and for kprobe-style that need to use bpf_probe_read_kernel(). >>> Not sure if it would be worth having variants that are purely >>> dereference-based, since we can just use PT_REGS_IP() due to >>> the __builtin_preserve_access_index attributes applied in vmlinux.h. >> >> Sorry, need a bit more time, thanks for the context. > > The PT_REGS_*_CORE macros were added by Andrii Nakryiko in [1]. > Stated intent there is to use those macros for raw tracepoint > programs. Such programs have `struct pt_regs` as a parameter. > Contexts of type `struct pt_regs` are *not* subject to rewrite by > convert_ctx_access(), so it is valid to use PT_REGS_*_CORE for such > programs. > > However, `struct pt_regs` is also a part of `struct > bpf_perf_event_data`. Latter is used as a context parameter for > "perf_event" programs and is a subject to rewrite by > convert_ctx_access(). Thus, PT_REGS_*_CORE macros can't be used for > such programs (because these macro are implemented through > bpf_probe_read_kernel() of which convert_ctx_access() is not aware). > > If `struct pt_regs` is defined with `preserve_access_index` attribute > CO-RE relocations are generated for both PT_REGS_IP_CORE and > PT_REGS_IP invocations. So, there is no real need to use *_CORE > variants in combination with `struct bpf_perf_event_data` to have all > CO-RE benefits, e.g.: > > $ cat bpf.c > #include "vmlinux.h" > // ... > SEC("perf_event") > int do_test(struct bpf_perf_event_data *ctx) { > return PT_REGS_IP(&ctx->regs); > } > // ... > $ llvm-objdump --no-show-raw-insn -rd bpf.o > ... > 0000000000000000 <do_test>: > 0: r0 = *(u64 *)(r1 + 0x80) > 0000000000000000: CO-RE <byte_off> [11] struct bpf_perf_event_data::regs.ip (0:0:16) > 1: exit > > [1] b8ebce86ffe6 ("libbpf: Provide CO-RE variants of PT_REGS macros") > > --- > > I think the following should be done: > - Timofei's code should use PT_REGS_IP and make sure that `struct > pt_regs` has preserve_access_index annotation (e.g. use vmlinux.h); > - verifier should be adjusted to report error when > bpf_probe_read_kernel() (and similar) are used to read from "fake" > contexts. The func prototype of bpf_probe_read_kernel() is BPF_CALL_3(bpf_probe_read_kernel, void *, dst, u32, size, const void *, unsafe_ptr) { return bpf_probe_read_kernel_common(dst, size, unsafe_ptr); } Notice the argument name is 'unsafe_ptr'. So there is no checking in verifier for this argument. Some users may take advantage of this to initialize the 'dst' with 0 by providing an illegal address. > - (maybe?) update PT_REGS_*_CORE to use `__builtin_preserve_access_index` > (to allow usage with `bpf_perf_event_data` context). > > [...] > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Question: CO-RE-enabled PT_REGS macros give strange results 2023-07-28 3:03 ` Yonghong Song @ 2023-07-28 12:30 ` Eduard Zingerman 0 siblings, 0 replies; 11+ messages in thread From: Eduard Zingerman @ 2023-07-28 12:30 UTC (permalink / raw) To: yonghong.song, Alan Maguire, Timofei Pushkin, Alexei Starovoitov; +Cc: bpf On Thu, 2023-07-27 at 20:03 -0700, Yonghong Song wrote: > > On 7/26/23 4:39 PM, Eduard Zingerman wrote: > > On Wed, 2023-07-26 at 23:03 +0300, Eduard Zingerman wrote: > > [...] > > > > > It looks like `PT_REGS_IP_CORE` macro should not be defined through > > > > > bpf_probe_read_kernel(). I'll dig through commit history tomorrow to > > > > > understand why is it defined like that now. > > > > > help > > > > > > > > If I recall the rationale was to allow the macros to work for both > > > > BPF programs that can do direct dereference (fentry, fexit, tp_btf etc) > > > > and for kprobe-style that need to use bpf_probe_read_kernel(). > > > > Not sure if it would be worth having variants that are purely > > > > dereference-based, since we can just use PT_REGS_IP() due to > > > > the __builtin_preserve_access_index attributes applied in vmlinux.h. > > > > > > Sorry, need a bit more time, thanks for the context. > > > > The PT_REGS_*_CORE macros were added by Andrii Nakryiko in [1]. > > Stated intent there is to use those macros for raw tracepoint > > programs. Such programs have `struct pt_regs` as a parameter. > > Contexts of type `struct pt_regs` are *not* subject to rewrite by > > convert_ctx_access(), so it is valid to use PT_REGS_*_CORE for such > > programs. > > > > However, `struct pt_regs` is also a part of `struct > > bpf_perf_event_data`. Latter is used as a context parameter for > > "perf_event" programs and is a subject to rewrite by > > convert_ctx_access(). Thus, PT_REGS_*_CORE macros can't be used for > > such programs (because these macro are implemented through > > bpf_probe_read_kernel() of which convert_ctx_access() is not aware). > > > > If `struct pt_regs` is defined with `preserve_access_index` attribute > > CO-RE relocations are generated for both PT_REGS_IP_CORE and > > PT_REGS_IP invocations. So, there is no real need to use *_CORE > > variants in combination with `struct bpf_perf_event_data` to have all > > CO-RE benefits, e.g.: > > > > $ cat bpf.c > > #include "vmlinux.h" > > // ... > > SEC("perf_event") > > int do_test(struct bpf_perf_event_data *ctx) { > > return PT_REGS_IP(&ctx->regs); > > } > > // ... > > $ llvm-objdump --no-show-raw-insn -rd bpf.o > > ... > > 0000000000000000 <do_test>: > > 0: r0 = *(u64 *)(r1 + 0x80) > > 0000000000000000: CO-RE <byte_off> [11] struct bpf_perf_event_data::regs.ip (0:0:16) > > 1: exit > > > > [1] b8ebce86ffe6 ("libbpf: Provide CO-RE variants of PT_REGS macros") > > > > --- > > > > I think the following should be done: > > - Timofei's code should use PT_REGS_IP and make sure that `struct > > pt_regs` has preserve_access_index annotation (e.g. use vmlinux.h); > > - verifier should be adjusted to report error when > > bpf_probe_read_kernel() (and similar) are used to read from "fake" > > contexts. > > The func prototype of bpf_probe_read_kernel() is > > BPF_CALL_3(bpf_probe_read_kernel, void *, dst, u32, size, > const void *, unsafe_ptr) > { > return bpf_probe_read_kernel_common(dst, size, unsafe_ptr); > } > > Notice the argument name is 'unsafe_ptr'. So there is no checking > in verifier for this argument. Some users may take advantage of this > to initialize the 'dst' with 0 by providing an illegal address. On the one hand yes, but on the other hand the address of context parameter like bpf_perf_event_data is a kind of fake, it does not exist. It would be meaningful to use bpf_probe_read_kernel() for this address only if someone knows the layout of the internal verifier structure `bpf_perf_event_data_kern` and wants to access it. Tbh, this appears to be a "footgun". > > > > - (maybe?) update PT_REGS_*_CORE to use `__builtin_preserve_access_index` > > (to allow usage with `bpf_perf_event_data` context). > > > > [...] > > ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2023-07-28 12:30 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-07-24 10:32 Question: CO-RE-enabled PT_REGS macros give strange results Timofei Pushkin 2023-07-24 12:36 ` Alan Maguire 2023-07-24 15:04 ` Timofei Pushkin 2023-07-24 23:00 ` Alan Maguire 2023-07-25 14:04 ` Alan Maguire 2023-07-26 0:03 ` Eduard Zingerman 2023-07-26 13:46 ` Alan Maguire 2023-07-26 20:03 ` Eduard Zingerman 2023-07-26 23:39 ` Eduard Zingerman 2023-07-28 3:03 ` Yonghong Song 2023-07-28 12:30 ` Eduard Zingerman
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox