* Question: CO-RE-enabled PT_REGS macros give strange results
@ 2023-07-24 10:32 Timofei Pushkin
2023-07-24 12:36 ` Alan Maguire
0 siblings, 1 reply; 11+ messages in thread
From: Timofei Pushkin @ 2023-07-24 10:32 UTC (permalink / raw)
To: bpf
Dear BPF community,
I'm developing a perf_event BPF program which reads some register
values (frame and instruction pointers in particular) from the context
provided to it. I found that CO-RE-enabled PT_REGS macros give results
different from the results of the usual PT_REGS macros. I run the
program on the same system I compiled it on, and so I cannot
understand why the results differ and which ones should I use?
From my tests, the results of the usual macros are the correct ones
(e.g. I can symbolize the instruction pointers I get this way), but
since I try to follow the CO-RE principle, it seems like I should be
using the CO-RE-enabled variants instead.
I did some experiments and found out that it is the
bpf_probe_read_kernel part of the CO-RE-enabled PT_REGS macros that
change the results and not __builtin_preserve_access_index. But I
still don't get why exactly it changes the results.
Thank you in advance,
Timofei
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Question: CO-RE-enabled PT_REGS macros give strange results
2023-07-24 10:32 Question: CO-RE-enabled PT_REGS macros give strange results Timofei Pushkin
@ 2023-07-24 12:36 ` Alan Maguire
2023-07-24 15:04 ` Timofei Pushkin
0 siblings, 1 reply; 11+ messages in thread
From: Alan Maguire @ 2023-07-24 12:36 UTC (permalink / raw)
To: Timofei Pushkin, bpf
On 24/07/2023 11:32, Timofei Pushkin wrote:
> Dear BPF community,
>
> I'm developing a perf_event BPF program which reads some register
> values (frame and instruction pointers in particular) from the context
> provided to it. I found that CO-RE-enabled PT_REGS macros give results
> different from the results of the usual PT_REGS macros. I run the
> program on the same system I compiled it on, and so I cannot
> understand why the results differ and which ones should I use?
>
> From my tests, the results of the usual macros are the correct ones
> (e.g. I can symbolize the instruction pointers I get this way), but
> since I try to follow the CO-RE principle, it seems like I should be
> using the CO-RE-enabled variants instead.
>
> I did some experiments and found out that it is the
> bpf_probe_read_kernel part of the CO-RE-enabled PT_REGS macros that
> change the results and not __builtin_preserve_access_index. But I
> still don't get why exactly it changes the results.
>
Can you provide the exact usage of the BPF CO-RE macros that isn't
working, and the equivalent non-CO-RE version that is? Also if you
can provide details on the platform you're running on that will
help narrow down the issue. Thanks!
Alan
> Thank you in advance,
> Timofei
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Question: CO-RE-enabled PT_REGS macros give strange results
2023-07-24 12:36 ` Alan Maguire
@ 2023-07-24 15:04 ` Timofei Pushkin
2023-07-24 23:00 ` Alan Maguire
0 siblings, 1 reply; 11+ messages in thread
From: Timofei Pushkin @ 2023-07-24 15:04 UTC (permalink / raw)
To: Alan Maguire; +Cc: bpf
On Mon, Jul 24, 2023 at 3:36 PM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On 24/07/2023 11:32, Timofei Pushkin wrote:
> > Dear BPF community,
> >
> > I'm developing a perf_event BPF program which reads some register
> > values (frame and instruction pointers in particular) from the context
> > provided to it. I found that CO-RE-enabled PT_REGS macros give results
> > different from the results of the usual PT_REGS macros. I run the
> > program on the same system I compiled it on, and so I cannot
> > understand why the results differ and which ones should I use?
> >
> > From my tests, the results of the usual macros are the correct ones
> > (e.g. I can symbolize the instruction pointers I get this way), but
> > since I try to follow the CO-RE principle, it seems like I should be
> > using the CO-RE-enabled variants instead.
> >
> > I did some experiments and found out that it is the
> > bpf_probe_read_kernel part of the CO-RE-enabled PT_REGS macros that
> > change the results and not __builtin_preserve_access_index. But I
> > still don't get why exactly it changes the results.
> >
>
> Can you provide the exact usage of the BPF CO-RE macros that isn't
> working, and the equivalent non-CO-RE version that is? Also if you
As a minimal example, I wrote the following little BPF program which
prints instruction pointers obtained with non-CO-RE and CO-RE macros:
volatile const pid_t target_pid;
SEC("perf_event")
int do_test(struct bpf_perf_event_data *ctx) {
pid_t pid = bpf_get_current_pid_tgid();
if (pid != target_pid) return 0;
unsigned long p = PT_REGS_IP(&ctx->regs);
unsigned long p_core = PT_REGS_IP_CORE(&ctx->regs);
bpf_printk("non-CO-RE: %lx, CO-RE: %lx", p, p_core);
return 0;
}
From user space, I set the target PID and attach the program to CPU
clock perf events (error checking and cleanup omitted for brevity):
int main(int argc, char *argv[]) {
// Load the program also setting the target PID
struct test_program_bpf *skel = test_program_bpf__open();
skel->rodata->target_pid = (pid_t) strtol(argv[1], NULL, 10);
test_program_bpf__load(skel);
// Attach to perf events
struct perf_event_attr attr = {
.type = PERF_TYPE_SOFTWARE,
.size = sizeof(struct perf_event_attr),
.config = PERF_COUNT_SW_CPU_CLOCK,
.sample_freq = 1,
.freq = true
};
for (int cpu_i = 0; cpu_i < libbpf_num_possible_cpus(); cpu_i++) {
int perf_fd = syscall(SYS_perf_event_open, &attr, -1, cpu_i, -1, 0);
bpf_program__attach_perf_event(skel->progs.do_test, perf_fd);
}
// Wait for Ctrl-C
pause();
return 0;
}
As an experiment, I launched a simple C program with an endless loop
in main and started the BPF program above with its target PID set to
the PID of this simple C program. Then by checking the virtual memory
mapped for the C program (with "cat /proc/<PID>/maps"), I found out
that its .text section got mapped into 55ca2577b000-55ca2577c000
address space. When I checked the output of the BPF program, I got
"non-CO-RE: 55ca2577b131, CO-RE: ffffa58810527e48". As you can see,
the non-CO-RE result maps into the .text section of the launched C
program (as it should since this is the value of the instruction
pointer), while the CO-RE result does not.
Alternatively, if I replace PT_REGS_IP and PT_REGS_IP_CORE with the
equivalents for the stack pointer (PT_REGS_SP and PT_REGS_SP_CORE), I
get results that correspond to the stack address space from the
non-CO-RE macro, but I always get 0 from the CO-RE macro.
> can provide details on the platform you're running on that will
> help narrow down the issue. Thanks!
Sure. I'm running Ubuntu 22.04.1, kernel version 5.19.0-46-generic,
the architecture is x86_64, clang 14.0.0 is used to compile BPF
programs with flags -g -O2 -D__TARGET_ARCH_x86.
Thanks,
Timofei
>
> Alan
>
> > Thank you in advance,
> > Timofei
> >
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Question: CO-RE-enabled PT_REGS macros give strange results
2023-07-24 15:04 ` Timofei Pushkin
@ 2023-07-24 23:00 ` Alan Maguire
2023-07-25 14:04 ` Alan Maguire
0 siblings, 1 reply; 11+ messages in thread
From: Alan Maguire @ 2023-07-24 23:00 UTC (permalink / raw)
To: Timofei Pushkin; +Cc: bpf
On 24/07/2023 16:04, Timofei Pushkin wrote:
> On Mon, Jul 24, 2023 at 3:36 PM Alan Maguire <alan.maguire@oracle.com> wrote:
>>
>> On 24/07/2023 11:32, Timofei Pushkin wrote:
>>> Dear BPF community,
>>>
>>> I'm developing a perf_event BPF program which reads some register
>>> values (frame and instruction pointers in particular) from the context
>>> provided to it. I found that CO-RE-enabled PT_REGS macros give results
>>> different from the results of the usual PT_REGS macros. I run the
>>> program on the same system I compiled it on, and so I cannot
>>> understand why the results differ and which ones should I use?
>>>
>>> From my tests, the results of the usual macros are the correct ones
>>> (e.g. I can symbolize the instruction pointers I get this way), but
>>> since I try to follow the CO-RE principle, it seems like I should be
>>> using the CO-RE-enabled variants instead.
>>>
>>> I did some experiments and found out that it is the
>>> bpf_probe_read_kernel part of the CO-RE-enabled PT_REGS macros that
>>> change the results and not __builtin_preserve_access_index. But I
>>> still don't get why exactly it changes the results.
>>>
>>
>> Can you provide the exact usage of the BPF CO-RE macros that isn't
>> working, and the equivalent non-CO-RE version that is? Also if you
>
> As a minimal example, I wrote the following little BPF program which
> prints instruction pointers obtained with non-CO-RE and CO-RE macros:
>
> volatile const pid_t target_pid;
>
> SEC("perf_event")
> int do_test(struct bpf_perf_event_data *ctx) {
> pid_t pid = bpf_get_current_pid_tgid();
> if (pid != target_pid) return 0;
>
> unsigned long p = PT_REGS_IP(&ctx->regs);
> unsigned long p_core = PT_REGS_IP_CORE(&ctx->regs);
> bpf_printk("non-CO-RE: %lx, CO-RE: %lx", p, p_core);
>
> return 0;
> }
>
> From user space, I set the target PID and attach the program to CPU
> clock perf events (error checking and cleanup omitted for brevity):
>
> int main(int argc, char *argv[]) {
> // Load the program also setting the target PID
> struct test_program_bpf *skel = test_program_bpf__open();
> skel->rodata->target_pid = (pid_t) strtol(argv[1], NULL, 10);
> test_program_bpf__load(skel);
>
> // Attach to perf events
> struct perf_event_attr attr = {
> .type = PERF_TYPE_SOFTWARE,
> .size = sizeof(struct perf_event_attr),
> .config = PERF_COUNT_SW_CPU_CLOCK,
> .sample_freq = 1,
> .freq = true
> };
> for (int cpu_i = 0; cpu_i < libbpf_num_possible_cpus(); cpu_i++) {
> int perf_fd = syscall(SYS_perf_event_open, &attr, -1, cpu_i, -1, 0);
> bpf_program__attach_perf_event(skel->progs.do_test, perf_fd);
> }
>
> // Wait for Ctrl-C
> pause();
> return 0;
> }
>
> As an experiment, I launched a simple C program with an endless loop
> in main and started the BPF program above with its target PID set to
> the PID of this simple C program. Then by checking the virtual memory
> mapped for the C program (with "cat /proc/<PID>/maps"), I found out
> that its .text section got mapped into 55ca2577b000-55ca2577c000
> address space. When I checked the output of the BPF program, I got
> "non-CO-RE: 55ca2577b131, CO-RE: ffffa58810527e48". As you can see,
> the non-CO-RE result maps into the .text section of the launched C
> program (as it should since this is the value of the instruction
> pointer), while the CO-RE result does not.
>
> Alternatively, if I replace PT_REGS_IP and PT_REGS_IP_CORE with the
> equivalents for the stack pointer (PT_REGS_SP and PT_REGS_SP_CORE), I
> get results that correspond to the stack address space from the
> non-CO-RE macro, but I always get 0 from the CO-RE macro.
>
>> can provide details on the platform you're running on that will
>> help narrow down the issue. Thanks!
>
> Sure. I'm running Ubuntu 22.04.1, kernel version 5.19.0-46-generic,
> the architecture is x86_64, clang 14.0.0 is used to compile BPF
> programs with flags -g -O2 -D__TARGET_ARCH_x86.
>
Thanks for the additional details! I've reproduced this on
bpf-next with LLVM 15; I'm seeing the same issues with the CO-RE
macros, and with BPF_CORE_READ(). However with extra libbpf debugging
I do see that we pick up the right type id/index for the ip field in
pt_regs:
libbpf: prog 'do_test': relo #4: matching candidate #0 <byte_off> [216]
struct pt_regs.ip (0:16 @ offset 128)
One thing I noticed - perhaps this will ring some bells for someone -
if I use __builtin_preserve_access_index() I get the same (correct)
value for ip as is retrieved with PT_REGS_IP():
__builtin_preserve_access_index(({
p_core = ctx->regs.ip;
}));
I'll check with latest LLVM to see if the issue persists there.
Alan
> Thanks,
> Timofei
>
>>
>> Alan
>>
>>> Thank you in advance,
>>> Timofei
>>>
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Question: CO-RE-enabled PT_REGS macros give strange results
2023-07-24 23:00 ` Alan Maguire
@ 2023-07-25 14:04 ` Alan Maguire
2023-07-26 0:03 ` Eduard Zingerman
0 siblings, 1 reply; 11+ messages in thread
From: Alan Maguire @ 2023-07-25 14:04 UTC (permalink / raw)
To: Timofei Pushkin; +Cc: bpf
On 25/07/2023 00:00, Alan Maguire wrote:
> On 24/07/2023 16:04, Timofei Pushkin wrote:
>> On Mon, Jul 24, 2023 at 3:36 PM Alan Maguire <alan.maguire@oracle.com> wrote:
>>>
>>> On 24/07/2023 11:32, Timofei Pushkin wrote:
>>>> Dear BPF community,
>>>>
>>>> I'm developing a perf_event BPF program which reads some register
>>>> values (frame and instruction pointers in particular) from the context
>>>> provided to it. I found that CO-RE-enabled PT_REGS macros give results
>>>> different from the results of the usual PT_REGS macros. I run the
>>>> program on the same system I compiled it on, and so I cannot
>>>> understand why the results differ and which ones should I use?
>>>>
>>>> From my tests, the results of the usual macros are the correct ones
>>>> (e.g. I can symbolize the instruction pointers I get this way), but
>>>> since I try to follow the CO-RE principle, it seems like I should be
>>>> using the CO-RE-enabled variants instead.
>>>>
>>>> I did some experiments and found out that it is the
>>>> bpf_probe_read_kernel part of the CO-RE-enabled PT_REGS macros that
>>>> change the results and not __builtin_preserve_access_index. But I
>>>> still don't get why exactly it changes the results.
>>>>
>>>
>>> Can you provide the exact usage of the BPF CO-RE macros that isn't
>>> working, and the equivalent non-CO-RE version that is? Also if you
>>
>> As a minimal example, I wrote the following little BPF program which
>> prints instruction pointers obtained with non-CO-RE and CO-RE macros:
>>
>> volatile const pid_t target_pid;
>>
>> SEC("perf_event")
>> int do_test(struct bpf_perf_event_data *ctx) {
>> pid_t pid = bpf_get_current_pid_tgid();
>> if (pid != target_pid) return 0;
>>
>> unsigned long p = PT_REGS_IP(&ctx->regs);
>> unsigned long p_core = PT_REGS_IP_CORE(&ctx->regs);
>> bpf_printk("non-CO-RE: %lx, CO-RE: %lx", p, p_core);
>>
>> return 0;
>> }
>>
>> From user space, I set the target PID and attach the program to CPU
>> clock perf events (error checking and cleanup omitted for brevity):
>>
>> int main(int argc, char *argv[]) {
>> // Load the program also setting the target PID
>> struct test_program_bpf *skel = test_program_bpf__open();
>> skel->rodata->target_pid = (pid_t) strtol(argv[1], NULL, 10);
>> test_program_bpf__load(skel);
>>
>> // Attach to perf events
>> struct perf_event_attr attr = {
>> .type = PERF_TYPE_SOFTWARE,
>> .size = sizeof(struct perf_event_attr),
>> .config = PERF_COUNT_SW_CPU_CLOCK,
>> .sample_freq = 1,
>> .freq = true
>> };
>> for (int cpu_i = 0; cpu_i < libbpf_num_possible_cpus(); cpu_i++) {
>> int perf_fd = syscall(SYS_perf_event_open, &attr, -1, cpu_i, -1, 0);
>> bpf_program__attach_perf_event(skel->progs.do_test, perf_fd);
>> }
>>
>> // Wait for Ctrl-C
>> pause();
>> return 0;
>> }
>>
>> As an experiment, I launched a simple C program with an endless loop
>> in main and started the BPF program above with its target PID set to
>> the PID of this simple C program. Then by checking the virtual memory
>> mapped for the C program (with "cat /proc/<PID>/maps"), I found out
>> that its .text section got mapped into 55ca2577b000-55ca2577c000
>> address space. When I checked the output of the BPF program, I got
>> "non-CO-RE: 55ca2577b131, CO-RE: ffffa58810527e48". As you can see,
>> the non-CO-RE result maps into the .text section of the launched C
>> program (as it should since this is the value of the instruction
>> pointer), while the CO-RE result does not.
>>
>> Alternatively, if I replace PT_REGS_IP and PT_REGS_IP_CORE with the
>> equivalents for the stack pointer (PT_REGS_SP and PT_REGS_SP_CORE), I
>> get results that correspond to the stack address space from the
>> non-CO-RE macro, but I always get 0 from the CO-RE macro.
>>
>>> can provide details on the platform you're running on that will
>>> help narrow down the issue. Thanks!
>>
>> Sure. I'm running Ubuntu 22.04.1, kernel version 5.19.0-46-generic,
>> the architecture is x86_64, clang 14.0.0 is used to compile BPF
>> programs with flags -g -O2 -D__TARGET_ARCH_x86.
>>
>
> Thanks for the additional details! I've reproduced this on
> bpf-next with LLVM 15; I'm seeing the same issues with the CO-RE
> macros, and with BPF_CORE_READ(). However with extra libbpf debugging
> I do see that we pick up the right type id/index for the ip field in
> pt_regs:
>
> libbpf: prog 'do_test': relo #4: matching candidate #0 <byte_off> [216]
> struct pt_regs.ip (0:16 @ offset 128)
>
> One thing I noticed - perhaps this will ring some bells for someone -
> if I use __builtin_preserve_access_index() I get the same (correct)
> value for ip as is retrieved with PT_REGS_IP():
>
> __builtin_preserve_access_index(({
> p_core = ctx->regs.ip;
> }));
>
> I'll check with latest LLVM to see if the issue persists there.
>
The problem occurs with latest bpf-next + latest LLVM too. Perf event
programs fix up context accesses to the "struct bpf_perf_event_data *"
context, so accessing ctx->regs in your program becomes accessing the
"struct bpf_perf_event_data_kern *" regs, which is a pointer to
struct pt_regs. So I _think_ that's why the
__builtin_preserve_access_index(({
p_core = ctx->regs.ip;
}));
...works; ctx->regs is fixed up to point at the right place, then
CO-RE does its thing with the results. Contrast this with
bpf_probe_read_kernel(&ip, sizeof(ip), &ctx->regs.ip);
In the latter case, the fixups don't seem to happen and we get a
bogus address which appears to be consistently 218 bytes after the ctx
pointer. I've confirmed that a basic bpf_probe_read_kernel()
exposes the issue (and gives the same wrong address as a CO-RE-wrapped
bpf_probe_read_kernel()).
I tried some permutations like defining
struct pt_regs *regs = &ctx->regs;
...to see if that helps, but I think in that case the accesses aren't
caught by the verifier because we use the & operator on the ctx->regs.
Not sure how smart the verifier can be about context accesses like this;
can someone who understands that code better than me take a look at this?
In the meantime the workaround described above should do the trick.
Thanks!
Alan
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Question: CO-RE-enabled PT_REGS macros give strange results
2023-07-25 14:04 ` Alan Maguire
@ 2023-07-26 0:03 ` Eduard Zingerman
2023-07-26 13:46 ` Alan Maguire
0 siblings, 1 reply; 11+ messages in thread
From: Eduard Zingerman @ 2023-07-26 0:03 UTC (permalink / raw)
To: Alan Maguire, Timofei Pushkin; +Cc: bpf
On Tue, 2023-07-25 at 15:04 +0100, Alan Maguire wrote:
> On 25/07/2023 00:00, Alan Maguire wrote:
> > On 24/07/2023 16:04, Timofei Pushkin wrote:
> > > On Mon, Jul 24, 2023 at 3:36 PM Alan Maguire <alan.maguire@oracle.com> wrote:
> > > >
> > > > On 24/07/2023 11:32, Timofei Pushkin wrote:
> > > > > Dear BPF community,
> > > > >
> > > > > I'm developing a perf_event BPF program which reads some register
> > > > > values (frame and instruction pointers in particular) from the context
> > > > > provided to it. I found that CO-RE-enabled PT_REGS macros give results
> > > > > different from the results of the usual PT_REGS macros. I run the
> > > > > program on the same system I compiled it on, and so I cannot
> > > > > understand why the results differ and which ones should I use?
> > > > >
> > > > > From my tests, the results of the usual macros are the correct ones
> > > > > (e.g. I can symbolize the instruction pointers I get this way), but
> > > > > since I try to follow the CO-RE principle, it seems like I should be
> > > > > using the CO-RE-enabled variants instead.
> > > > >
> > > > > I did some experiments and found out that it is the
> > > > > bpf_probe_read_kernel part of the CO-RE-enabled PT_REGS macros that
> > > > > change the results and not __builtin_preserve_access_index. But I
> > > > > still don't get why exactly it changes the results.
> > > > >
> > > >
> > > > Can you provide the exact usage of the BPF CO-RE macros that isn't
> > > > working, and the equivalent non-CO-RE version that is? Also if you
> > >
> > > As a minimal example, I wrote the following little BPF program which
> > > prints instruction pointers obtained with non-CO-RE and CO-RE macros:
> > >
> > > volatile const pid_t target_pid;
> > >
> > > SEC("perf_event")
> > > int do_test(struct bpf_perf_event_data *ctx) {
> > > pid_t pid = bpf_get_current_pid_tgid();
> > > if (pid != target_pid) return 0;
> > >
> > > unsigned long p = PT_REGS_IP(&ctx->regs);
> > > unsigned long p_core = PT_REGS_IP_CORE(&ctx->regs);
> > > bpf_printk("non-CO-RE: %lx, CO-RE: %lx", p, p_core);
> > >
> > > return 0;
> > > }
> > >
> > > From user space, I set the target PID and attach the program to CPU
> > > clock perf events (error checking and cleanup omitted for brevity):
> > >
> > > int main(int argc, char *argv[]) {
> > > // Load the program also setting the target PID
> > > struct test_program_bpf *skel = test_program_bpf__open();
> > > skel->rodata->target_pid = (pid_t) strtol(argv[1], NULL, 10);
> > > test_program_bpf__load(skel);
> > >
> > > // Attach to perf events
> > > struct perf_event_attr attr = {
> > > .type = PERF_TYPE_SOFTWARE,
> > > .size = sizeof(struct perf_event_attr),
> > > .config = PERF_COUNT_SW_CPU_CLOCK,
> > > .sample_freq = 1,
> > > .freq = true
> > > };
> > > for (int cpu_i = 0; cpu_i < libbpf_num_possible_cpus(); cpu_i++) {
> > > int perf_fd = syscall(SYS_perf_event_open, &attr, -1, cpu_i, -1, 0);
> > > bpf_program__attach_perf_event(skel->progs.do_test, perf_fd);
> > > }
> > >
> > > // Wait for Ctrl-C
> > > pause();
> > > return 0;
> > > }
> > >
> > > As an experiment, I launched a simple C program with an endless loop
> > > in main and started the BPF program above with its target PID set to
> > > the PID of this simple C program. Then by checking the virtual memory
> > > mapped for the C program (with "cat /proc/<PID>/maps"), I found out
> > > that its .text section got mapped into 55ca2577b000-55ca2577c000
> > > address space. When I checked the output of the BPF program, I got
> > > "non-CO-RE: 55ca2577b131, CO-RE: ffffa58810527e48". As you can see,
> > > the non-CO-RE result maps into the .text section of the launched C
> > > program (as it should since this is the value of the instruction
> > > pointer), while the CO-RE result does not.
> > >
> > > Alternatively, if I replace PT_REGS_IP and PT_REGS_IP_CORE with the
> > > equivalents for the stack pointer (PT_REGS_SP and PT_REGS_SP_CORE), I
> > > get results that correspond to the stack address space from the
> > > non-CO-RE macro, but I always get 0 from the CO-RE macro.
> > >
> > > > can provide details on the platform you're running on that will
> > > > help narrow down the issue. Thanks!
> > >
> > > Sure. I'm running Ubuntu 22.04.1, kernel version 5.19.0-46-generic,
> > > the architecture is x86_64, clang 14.0.0 is used to compile BPF
> > > programs with flags -g -O2 -D__TARGET_ARCH_x86.
> > >
> >
> > Thanks for the additional details! I've reproduced this on
> > bpf-next with LLVM 15; I'm seeing the same issues with the CO-RE
> > macros, and with BPF_CORE_READ(). However with extra libbpf debugging
> > I do see that we pick up the right type id/index for the ip field in
> > pt_regs:
> >
> > libbpf: prog 'do_test': relo #4: matching candidate #0 <byte_off> [216]
> > struct pt_regs.ip (0:16 @ offset 128)
> >
> > One thing I noticed - perhaps this will ring some bells for someone -
> > if I use __builtin_preserve_access_index() I get the same (correct)
> > value for ip as is retrieved with PT_REGS_IP():
> >
> > __builtin_preserve_access_index(({
> > p_core = ctx->regs.ip;
> > }));
> >
> > I'll check with latest LLVM to see if the issue persists there.
> >
>
>
> The problem occurs with latest bpf-next + latest LLVM too. Perf event
> programs fix up context accesses to the "struct bpf_perf_event_data *"
> context, so accessing ctx->regs in your program becomes accessing the
> "struct bpf_perf_event_data_kern *" regs, which is a pointer to
> struct pt_regs. So I _think_ that's why the
>
> __builtin_preserve_access_index(({
> p_core = ctx->regs.ip;
> }));
>
>
> ...works; ctx->regs is fixed up to point at the right place, then
> CO-RE does its thing with the results. Contrast this with
>
> bpf_probe_read_kernel(&ip, sizeof(ip), &ctx->regs.ip);
>
> In the latter case, the fixups don't seem to happen and we get a
> bogus address which appears to be consistently 218 bytes after the ctx
> pointer. I've confirmed that a basic bpf_probe_read_kernel()
> exposes the issue (and gives the same wrong address as a CO-RE-wrapped
> bpf_probe_read_kernel()).
>
> I tried some permutations like defining
>
> struct pt_regs *regs = &ctx->regs;
>
> ...to see if that helps, but I think in that case the accesses aren't
> caught by the verifier because we use the & operator on the ctx->regs.
>
> Not sure how smart the verifier can be about context accesses like this;
> can someone who understands that code better than me take a look at this?
Hi Alan,
Your analysis is correct: verifier applies rewrites to instructions
that read/write from/to certain context fields, including
`struct bpf_perf_event_data`.
This is done by function verifier.c:convert_ctx_accesses().
This function handles BPF_LDX, BPF_STX and BPF_ST instructions, but it
does not handle calls to helpers like bpf_probe_read_kernel().
So, when code generated for PT_REGS_IP(&ctx->regs) is processed, the
correct access sequence is inserted by function
bpf_trace.c:pe_prog_convert_ctx_access() (see below).
But code generated for `PT_REGS_IP_CORE(&ctx->regs)` is not modified.
It looks like `PT_REGS_IP_CORE` macro should not be defined through
bpf_probe_read_kernel(). I'll dig through commit history tomorrow to
understand why is it defined like that now.
Thanks,
Eduard
---
Below is annotated example, inpatient reader might skip it
For the following test program:
#include "vmlinux.h"
...
SEC("perf_event")
int do_test(struct bpf_perf_event_data *ctx) {
unsigned long p = PT_REGS_IP(&ctx->regs);
unsigned long p_core = PT_REGS_IP_CORE(&ctx->regs);
bpf_printk("non-CO-RE: %lx, CO-RE: %lx", p, p_core);
return 0;
}
Generated BPF code looks as follows:
$ llvm-objdump --no-show-raw-insn -rd bpf.linked.o
...
0000000000000000 <do_test>:
# Third argument for bpf_probe_read_kernel: offset of bpf_perf_event_data::regs.ip
0: r2 = 0x80
0000000000000000: CO-RE <byte_off> [2] struct bpf_perf_event_data::regs.ip (0:0:16)
1: r3 = r1
2: r3 += r2
# The "non CO-RE" version of PT_REGS_IP is, in fact, CO-RE
# because `struct bpf_perf_event_data` has preserve_access_index
# tag in the vmlinux.h.
# Here the regs.ip is stored in r6 to be used after the call
# to bpf_probe_read_kernel() (from PT_REGS_IP_CORE).
3: r6 = *(u64 *)(r1 + 0x80)
0000000000000018: CO-RE <byte_off> [2] struct bpf_perf_event_data::regs.ip (0:0:16)
# First argument for bpf_probe_read_kernel: a place on stack to put read result to.
4: r1 = r10
5: r1 += -0x8
# Second argument for bpf_probe_read_kernel: the size of the field to read.
6: w2 = 0x8
# Call to bpf_probe_read_kernel()
7: call 0x71
# Fourth parameter of bpf_printk: p_core read from stack
# (was written by call to bpf_probe_read_kernel)
8: r4 = *(u64 *)(r10 - 0x8)
# First parameter of bpf_printk: control string
9: r1 = 0x0 ll
0000000000000048: R_BPF_64_64 .rodata
# Second parameter of bpf_printk: size of the control string
11: w2 = 0x1b
# Third parameter of bpf_printk: p (see addr 3)
12: r3 = r6
# Call to bpf_printk
13: call 0x6
; return 0;
14: w0 = 0x0
15: exit
I get the following BPF after all verifier rewrites are applied
(including verifier.c:convert_ctx_accesses()):
# ./tools/bpf/bpftool/bpftool prog dump xlated id 114
int do_test(struct bpf_perf_event_data * ctx):
; int do_test(struct bpf_perf_event_data *ctx) {
0: (b7) r2 = 128 | CO-RE replacement, 128 is a valid offset for
| bpf_perf_event_data::regs.ip in my kernel
1: (bf) r3 = r1
2: (0f) r3 += r2
3: (79) r6 = *(u64 *)(r1 +0) | This is an expantion of the
4: (79) r6 = *(u64 *)(r6 +128) | r6 = *(u64 *)(r1 + 0x80)
5: (bf) r1 = r10 | Created by bpf_trace.c:pe_prog_convert_ctx_access()
6: (07) r1 += -8
7: (b4) w2 = 8
8: (85) call bpf_probe_read_kernel#-91984
9: (79) r4 = *(u64 *)(r10 -8)
10: (18) r1 = map[id:59][0]+0
12: (b4) w2 = 27
13: (bf) r3 = r6
14: (85) call bpf_trace_printk#-85520
15: (b4) w0 = 0
16: (95) exit
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Question: CO-RE-enabled PT_REGS macros give strange results
2023-07-26 0:03 ` Eduard Zingerman
@ 2023-07-26 13:46 ` Alan Maguire
2023-07-26 20:03 ` Eduard Zingerman
0 siblings, 1 reply; 11+ messages in thread
From: Alan Maguire @ 2023-07-26 13:46 UTC (permalink / raw)
To: Eduard Zingerman, Timofei Pushkin; +Cc: bpf
On 26/07/2023 01:03, Eduard Zingerman wrote:
> On Tue, 2023-07-25 at 15:04 +0100, Alan Maguire wrote:
>> On 25/07/2023 00:00, Alan Maguire wrote:
>>> On 24/07/2023 16:04, Timofei Pushkin wrote:
>>>> On Mon, Jul 24, 2023 at 3:36 PM Alan Maguire <alan.maguire@oracle.com> wrote:
>>>>>
>>>>> On 24/07/2023 11:32, Timofei Pushkin wrote:
>>>>>> Dear BPF community,
>>>>>>
>>>>>> I'm developing a perf_event BPF program which reads some register
>>>>>> values (frame and instruction pointers in particular) from the context
>>>>>> provided to it. I found that CO-RE-enabled PT_REGS macros give results
>>>>>> different from the results of the usual PT_REGS macros. I run the
>>>>>> program on the same system I compiled it on, and so I cannot
>>>>>> understand why the results differ and which ones should I use?
>>>>>>
>>>>>> From my tests, the results of the usual macros are the correct ones
>>>>>> (e.g. I can symbolize the instruction pointers I get this way), but
>>>>>> since I try to follow the CO-RE principle, it seems like I should be
>>>>>> using the CO-RE-enabled variants instead.
>>>>>>
>>>>>> I did some experiments and found out that it is the
>>>>>> bpf_probe_read_kernel part of the CO-RE-enabled PT_REGS macros that
>>>>>> change the results and not __builtin_preserve_access_index. But I
>>>>>> still don't get why exactly it changes the results.
>>>>>>
>>>>>
>>>>> Can you provide the exact usage of the BPF CO-RE macros that isn't
>>>>> working, and the equivalent non-CO-RE version that is? Also if you
>>>>
>>>> As a minimal example, I wrote the following little BPF program which
>>>> prints instruction pointers obtained with non-CO-RE and CO-RE macros:
>>>>
>>>> volatile const pid_t target_pid;
>>>>
>>>> SEC("perf_event")
>>>> int do_test(struct bpf_perf_event_data *ctx) {
>>>> pid_t pid = bpf_get_current_pid_tgid();
>>>> if (pid != target_pid) return 0;
>>>>
>>>> unsigned long p = PT_REGS_IP(&ctx->regs);
>>>> unsigned long p_core = PT_REGS_IP_CORE(&ctx->regs);
>>>> bpf_printk("non-CO-RE: %lx, CO-RE: %lx", p, p_core);
>>>>
>>>> return 0;
>>>> }
>>>>
>>>> From user space, I set the target PID and attach the program to CPU
>>>> clock perf events (error checking and cleanup omitted for brevity):
>>>>
>>>> int main(int argc, char *argv[]) {
>>>> // Load the program also setting the target PID
>>>> struct test_program_bpf *skel = test_program_bpf__open();
>>>> skel->rodata->target_pid = (pid_t) strtol(argv[1], NULL, 10);
>>>> test_program_bpf__load(skel);
>>>>
>>>> // Attach to perf events
>>>> struct perf_event_attr attr = {
>>>> .type = PERF_TYPE_SOFTWARE,
>>>> .size = sizeof(struct perf_event_attr),
>>>> .config = PERF_COUNT_SW_CPU_CLOCK,
>>>> .sample_freq = 1,
>>>> .freq = true
>>>> };
>>>> for (int cpu_i = 0; cpu_i < libbpf_num_possible_cpus(); cpu_i++) {
>>>> int perf_fd = syscall(SYS_perf_event_open, &attr, -1, cpu_i, -1, 0);
>>>> bpf_program__attach_perf_event(skel->progs.do_test, perf_fd);
>>>> }
>>>>
>>>> // Wait for Ctrl-C
>>>> pause();
>>>> return 0;
>>>> }
>>>>
>>>> As an experiment, I launched a simple C program with an endless loop
>>>> in main and started the BPF program above with its target PID set to
>>>> the PID of this simple C program. Then by checking the virtual memory
>>>> mapped for the C program (with "cat /proc/<PID>/maps"), I found out
>>>> that its .text section got mapped into 55ca2577b000-55ca2577c000
>>>> address space. When I checked the output of the BPF program, I got
>>>> "non-CO-RE: 55ca2577b131, CO-RE: ffffa58810527e48". As you can see,
>>>> the non-CO-RE result maps into the .text section of the launched C
>>>> program (as it should since this is the value of the instruction
>>>> pointer), while the CO-RE result does not.
>>>>
>>>> Alternatively, if I replace PT_REGS_IP and PT_REGS_IP_CORE with the
>>>> equivalents for the stack pointer (PT_REGS_SP and PT_REGS_SP_CORE), I
>>>> get results that correspond to the stack address space from the
>>>> non-CO-RE macro, but I always get 0 from the CO-RE macro.
>>>>
>>>>> can provide details on the platform you're running on that will
>>>>> help narrow down the issue. Thanks!
>>>>
>>>> Sure. I'm running Ubuntu 22.04.1, kernel version 5.19.0-46-generic,
>>>> the architecture is x86_64, clang 14.0.0 is used to compile BPF
>>>> programs with flags -g -O2 -D__TARGET_ARCH_x86.
>>>>
>>>
>>> Thanks for the additional details! I've reproduced this on
>>> bpf-next with LLVM 15; I'm seeing the same issues with the CO-RE
>>> macros, and with BPF_CORE_READ(). However with extra libbpf debugging
>>> I do see that we pick up the right type id/index for the ip field in
>>> pt_regs:
>>>
>>> libbpf: prog 'do_test': relo #4: matching candidate #0 <byte_off> [216]
>>> struct pt_regs.ip (0:16 @ offset 128)
>>>
>>> One thing I noticed - perhaps this will ring some bells for someone -
>>> if I use __builtin_preserve_access_index() I get the same (correct)
>>> value for ip as is retrieved with PT_REGS_IP():
>>>
>>> __builtin_preserve_access_index(({
>>> p_core = ctx->regs.ip;
>>> }));
>>>
>>> I'll check with latest LLVM to see if the issue persists there.
>>>
>>
>>
>> The problem occurs with latest bpf-next + latest LLVM too. Perf event
>> programs fix up context accesses to the "struct bpf_perf_event_data *"
>> context, so accessing ctx->regs in your program becomes accessing the
>> "struct bpf_perf_event_data_kern *" regs, which is a pointer to
>> struct pt_regs. So I _think_ that's why the
>>
>> __builtin_preserve_access_index(({
>> p_core = ctx->regs.ip;
>> }));
>>
>>
>> ...works; ctx->regs is fixed up to point at the right place, then
>> CO-RE does its thing with the results. Contrast this with
>>
>> bpf_probe_read_kernel(&ip, sizeof(ip), &ctx->regs.ip);
>>
>> In the latter case, the fixups don't seem to happen and we get a
>> bogus address which appears to be consistently 218 bytes after the ctx
>> pointer. I've confirmed that a basic bpf_probe_read_kernel()
>> exposes the issue (and gives the same wrong address as a CO-RE-wrapped
>> bpf_probe_read_kernel()).
>>
>> I tried some permutations like defining
>>
>> struct pt_regs *regs = &ctx->regs;
>>
>> ...to see if that helps, but I think in that case the accesses aren't
>> caught by the verifier because we use the & operator on the ctx->regs.
>>
>> Not sure how smart the verifier can be about context accesses like this;
>> can someone who understands that code better than me take a look at this?
>
> Hi Alan,
>
> Your analysis is correct: verifier applies rewrites to instructions
> that read/write from/to certain context fields, including
> `struct bpf_perf_event_data`.
>
> This is done by function verifier.c:convert_ctx_accesses().
> This function handles BPF_LDX, BPF_STX and BPF_ST instructions, but it
> does not handle calls to helpers like bpf_probe_read_kernel().
>
> So, when code generated for PT_REGS_IP(&ctx->regs) is processed, the
> correct access sequence is inserted by function
> bpf_trace.c:pe_prog_convert_ctx_access() (see below).
>
> But code generated for `PT_REGS_IP_CORE(&ctx->regs)` is not modified.
>
Ah, makes sense. Would you consider it a bug that helper parameters
don't get context conversions applied, or are there additional
complexities here that mean that's not doable? (I'm wondering if
we should fix versus document this?). I would have thought the
only difference is the destination register, but the verifier is
a mysterious land to me..
> It looks like `PT_REGS_IP_CORE` macro should not be defined through
> bpf_probe_read_kernel(). I'll dig through commit history tomorrow to
> understand why is it defined like that now.
> help
If I recall the rationale was to allow the macros to work for both
BPF programs that can do direct dereference (fentry, fexit, tp_btf etc)
and for kprobe-style that need to use bpf_probe_read_kernel().
Not sure if it would be worth having variants that are purely
dereference-based, since we can just use PT_REGS_IP() due to
the __builtin_preserve_access_index attributes applied in vmlinux.h.
Thanks!
Alan
> Thanks,
> Eduard
>
> ---
> Below is annotated example, inpatient reader might skip it
>
> For the following test program:
>
> #include "vmlinux.h"
> ...
> SEC("perf_event")
> int do_test(struct bpf_perf_event_data *ctx) {
> unsigned long p = PT_REGS_IP(&ctx->regs);
> unsigned long p_core = PT_REGS_IP_CORE(&ctx->regs);
> bpf_printk("non-CO-RE: %lx, CO-RE: %lx", p, p_core);
> return 0;
> }
>
> Generated BPF code looks as follows:
>
> $ llvm-objdump --no-show-raw-insn -rd bpf.linked.o
> ...
> 0000000000000000 <do_test>:
> # Third argument for bpf_probe_read_kernel: offset of bpf_perf_event_data::regs.ip
> 0: r2 = 0x80
> 0000000000000000: CO-RE <byte_off> [2] struct bpf_perf_event_data::regs.ip (0:0:16)
> 1: r3 = r1
> 2: r3 += r2
> # The "non CO-RE" version of PT_REGS_IP is, in fact, CO-RE
> # because `struct bpf_perf_event_data` has preserve_access_index
> # tag in the vmlinux.h.
> # Here the regs.ip is stored in r6 to be used after the call
> # to bpf_probe_read_kernel() (from PT_REGS_IP_CORE).
> 3: r6 = *(u64 *)(r1 + 0x80)
> 0000000000000018: CO-RE <byte_off> [2] struct bpf_perf_event_data::regs.ip (0:0:16)
> # First argument for bpf_probe_read_kernel: a place on stack to put read result to.
> 4: r1 = r10
> 5: r1 += -0x8
> # Second argument for bpf_probe_read_kernel: the size of the field to read.
> 6: w2 = 0x8
> # Call to bpf_probe_read_kernel()
> 7: call 0x71
> # Fourth parameter of bpf_printk: p_core read from stack
> # (was written by call to bpf_probe_read_kernel)
> 8: r4 = *(u64 *)(r10 - 0x8)
> # First parameter of bpf_printk: control string
> 9: r1 = 0x0 ll
> 0000000000000048: R_BPF_64_64 .rodata
> # Second parameter of bpf_printk: size of the control string
> 11: w2 = 0x1b
> # Third parameter of bpf_printk: p (see addr 3)
> 12: r3 = r6
> # Call to bpf_printk
> 13: call 0x6
> ; return 0;
> 14: w0 = 0x0
> 15: exit
>
> I get the following BPF after all verifier rewrites are applied
> (including verifier.c:convert_ctx_accesses()):
>
> # ./tools/bpf/bpftool/bpftool prog dump xlated id 114
> int do_test(struct bpf_perf_event_data * ctx):
> ; int do_test(struct bpf_perf_event_data *ctx) {
> 0: (b7) r2 = 128 | CO-RE replacement, 128 is a valid offset for
> | bpf_perf_event_data::regs.ip in my kernel
> 1: (bf) r3 = r1
> 2: (0f) r3 += r2
>
> 3: (79) r6 = *(u64 *)(r1 +0) | This is an expantion of the
> 4: (79) r6 = *(u64 *)(r6 +128) | r6 = *(u64 *)(r1 + 0x80)
> 5: (bf) r1 = r10 | Created by bpf_trace.c:pe_prog_convert_ctx_access()
>
> 6: (07) r1 += -8
> 7: (b4) w2 = 8
> 8: (85) call bpf_probe_read_kernel#-91984
> 9: (79) r4 = *(u64 *)(r10 -8)
> 10: (18) r1 = map[id:59][0]+0
> 12: (b4) w2 = 27
> 13: (bf) r3 = r6
> 14: (85) call bpf_trace_printk#-85520
> 15: (b4) w0 = 0
> 16: (95) exit
>
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Question: CO-RE-enabled PT_REGS macros give strange results
2023-07-26 13:46 ` Alan Maguire
@ 2023-07-26 20:03 ` Eduard Zingerman
2023-07-26 23:39 ` Eduard Zingerman
0 siblings, 1 reply; 11+ messages in thread
From: Eduard Zingerman @ 2023-07-26 20:03 UTC (permalink / raw)
To: Alan Maguire, Timofei Pushkin, Alexei Starovoitov; +Cc: bpf
On Wed, 2023-07-26 at 14:46 +0100, Alan Maguire wrote:
> On 26/07/2023 01:03, Eduard Zingerman wrote:
> > On Tue, 2023-07-25 at 15:04 +0100, Alan Maguire wrote:
> > > On 25/07/2023 00:00, Alan Maguire wrote:
> > > > On 24/07/2023 16:04, Timofei Pushkin wrote:
> > > > > On Mon, Jul 24, 2023 at 3:36 PM Alan Maguire <alan.maguire@oracle.com> wrote:
> > > > > >
> > > > > > On 24/07/2023 11:32, Timofei Pushkin wrote:
> > > > > > > Dear BPF community,
> > > > > > >
> > > > > > > I'm developing a perf_event BPF program which reads some register
> > > > > > > values (frame and instruction pointers in particular) from the context
> > > > > > > provided to it. I found that CO-RE-enabled PT_REGS macros give results
> > > > > > > different from the results of the usual PT_REGS macros. I run the
> > > > > > > program on the same system I compiled it on, and so I cannot
> > > > > > > understand why the results differ and which ones should I use?
> > > > > > >
> > > > > > > From my tests, the results of the usual macros are the correct ones
> > > > > > > (e.g. I can symbolize the instruction pointers I get this way), but
> > > > > > > since I try to follow the CO-RE principle, it seems like I should be
> > > > > > > using the CO-RE-enabled variants instead.
> > > > > > >
> > > > > > > I did some experiments and found out that it is the
> > > > > > > bpf_probe_read_kernel part of the CO-RE-enabled PT_REGS macros that
> > > > > > > change the results and not __builtin_preserve_access_index. But I
> > > > > > > still don't get why exactly it changes the results.
> > > > > > >
> > > > > >
> > > > > > Can you provide the exact usage of the BPF CO-RE macros that isn't
> > > > > > working, and the equivalent non-CO-RE version that is? Also if you
> > > > >
> > > > > As a minimal example, I wrote the following little BPF program which
> > > > > prints instruction pointers obtained with non-CO-RE and CO-RE macros:
> > > > >
> > > > > volatile const pid_t target_pid;
> > > > >
> > > > > SEC("perf_event")
> > > > > int do_test(struct bpf_perf_event_data *ctx) {
> > > > > pid_t pid = bpf_get_current_pid_tgid();
> > > > > if (pid != target_pid) return 0;
> > > > >
> > > > > unsigned long p = PT_REGS_IP(&ctx->regs);
> > > > > unsigned long p_core = PT_REGS_IP_CORE(&ctx->regs);
> > > > > bpf_printk("non-CO-RE: %lx, CO-RE: %lx", p, p_core);
> > > > >
> > > > > return 0;
> > > > > }
> > > > >
> > > > > From user space, I set the target PID and attach the program to CPU
> > > > > clock perf events (error checking and cleanup omitted for brevity):
> > > > >
> > > > > int main(int argc, char *argv[]) {
> > > > > // Load the program also setting the target PID
> > > > > struct test_program_bpf *skel = test_program_bpf__open();
> > > > > skel->rodata->target_pid = (pid_t) strtol(argv[1], NULL, 10);
> > > > > test_program_bpf__load(skel);
> > > > >
> > > > > // Attach to perf events
> > > > > struct perf_event_attr attr = {
> > > > > .type = PERF_TYPE_SOFTWARE,
> > > > > .size = sizeof(struct perf_event_attr),
> > > > > .config = PERF_COUNT_SW_CPU_CLOCK,
> > > > > .sample_freq = 1,
> > > > > .freq = true
> > > > > };
> > > > > for (int cpu_i = 0; cpu_i < libbpf_num_possible_cpus(); cpu_i++) {
> > > > > int perf_fd = syscall(SYS_perf_event_open, &attr, -1, cpu_i, -1, 0);
> > > > > bpf_program__attach_perf_event(skel->progs.do_test, perf_fd);
> > > > > }
> > > > >
> > > > > // Wait for Ctrl-C
> > > > > pause();
> > > > > return 0;
> > > > > }
> > > > >
> > > > > As an experiment, I launched a simple C program with an endless loop
> > > > > in main and started the BPF program above with its target PID set to
> > > > > the PID of this simple C program. Then by checking the virtual memory
> > > > > mapped for the C program (with "cat /proc/<PID>/maps"), I found out
> > > > > that its .text section got mapped into 55ca2577b000-55ca2577c000
> > > > > address space. When I checked the output of the BPF program, I got
> > > > > "non-CO-RE: 55ca2577b131, CO-RE: ffffa58810527e48". As you can see,
> > > > > the non-CO-RE result maps into the .text section of the launched C
> > > > > program (as it should since this is the value of the instruction
> > > > > pointer), while the CO-RE result does not.
> > > > >
> > > > > Alternatively, if I replace PT_REGS_IP and PT_REGS_IP_CORE with the
> > > > > equivalents for the stack pointer (PT_REGS_SP and PT_REGS_SP_CORE), I
> > > > > get results that correspond to the stack address space from the
> > > > > non-CO-RE macro, but I always get 0 from the CO-RE macro.
> > > > >
> > > > > > can provide details on the platform you're running on that will
> > > > > > help narrow down the issue. Thanks!
> > > > >
> > > > > Sure. I'm running Ubuntu 22.04.1, kernel version 5.19.0-46-generic,
> > > > > the architecture is x86_64, clang 14.0.0 is used to compile BPF
> > > > > programs with flags -g -O2 -D__TARGET_ARCH_x86.
> > > > >
> > > >
> > > > Thanks for the additional details! I've reproduced this on
> > > > bpf-next with LLVM 15; I'm seeing the same issues with the CO-RE
> > > > macros, and with BPF_CORE_READ(). However with extra libbpf debugging
> > > > I do see that we pick up the right type id/index for the ip field in
> > > > pt_regs:
> > > >
> > > > libbpf: prog 'do_test': relo #4: matching candidate #0 <byte_off> [216]
> > > > struct pt_regs.ip (0:16 @ offset 128)
> > > >
> > > > One thing I noticed - perhaps this will ring some bells for someone -
> > > > if I use __builtin_preserve_access_index() I get the same (correct)
> > > > value for ip as is retrieved with PT_REGS_IP():
> > > >
> > > > __builtin_preserve_access_index(({
> > > > p_core = ctx->regs.ip;
> > > > }));
> > > >
> > > > I'll check with latest LLVM to see if the issue persists there.
> > > >
> > >
> > >
> > > The problem occurs with latest bpf-next + latest LLVM too. Perf event
> > > programs fix up context accesses to the "struct bpf_perf_event_data *"
> > > context, so accessing ctx->regs in your program becomes accessing the
> > > "struct bpf_perf_event_data_kern *" regs, which is a pointer to
> > > struct pt_regs. So I _think_ that's why the
> > >
> > > __builtin_preserve_access_index(({
> > > p_core = ctx->regs.ip;
> > > }));
> > >
> > >
> > > ...works; ctx->regs is fixed up to point at the right place, then
> > > CO-RE does its thing with the results. Contrast this with
> > >
> > > bpf_probe_read_kernel(&ip, sizeof(ip), &ctx->regs.ip);
> > >
> > > In the latter case, the fixups don't seem to happen and we get a
> > > bogus address which appears to be consistently 218 bytes after the ctx
> > > pointer. I've confirmed that a basic bpf_probe_read_kernel()
> > > exposes the issue (and gives the same wrong address as a CO-RE-wrapped
> > > bpf_probe_read_kernel()).
> > >
> > > I tried some permutations like defining
> > >
> > > struct pt_regs *regs = &ctx->regs;
> > >
> > > ...to see if that helps, but I think in that case the accesses aren't
> > > caught by the verifier because we use the & operator on the ctx->regs.
> > >
> > > Not sure how smart the verifier can be about context accesses like this;
> > > can someone who understands that code better than me take a look at this?
> >
> > Hi Alan,
> >
> > Your analysis is correct: verifier applies rewrites to instructions
> > that read/write from/to certain context fields, including
> > `struct bpf_perf_event_data`.
> >
> > This is done by function verifier.c:convert_ctx_accesses().
> > This function handles BPF_LDX, BPF_STX and BPF_ST instructions, but it
> > does not handle calls to helpers like bpf_probe_read_kernel().
> >
> > So, when code generated for PT_REGS_IP(&ctx->regs) is processed, the
> > correct access sequence is inserted by function
> > bpf_trace.c:pe_prog_convert_ctx_access() (see below).
> >
> > But code generated for `PT_REGS_IP_CORE(&ctx->regs)` is not modified.
> >
>
> Ah, makes sense. Would you consider it a bug that helper parameters
> don't get context conversions applied, or are there additional
> complexities here that mean that's not doable? (I'm wondering if
> we should fix versus document this?). I would have thought the
> only difference is the destination register, but the verifier is
> a mysterious land to me..
Sorry for delay, had to read through implementation.
Verifier already reports an error when pointer to a context is passed
to helper function if context is subject to convert_ctx_accesses().
This is done in function verifier.c:check_helper_mem_access()
(see `case PTR_TO_CTX` at the end of the function).
For example, the following program is rejected:
struct {
__uint(type, BPF_MAP_TYPE_HASH);
__uint(max_entries, 1);
__type(key, void *);
__type(value, int);
} map SEC(".maps");
SEC("perf_event")
int do_test(struct bpf_perf_event_data *ctx) {
void *p = bpf_map_lookup_elem(&map, &ctx->regs);
return p ? 0 : 1;
}
Here is the log:
...
0: R1=ctx(off=0,imm=0) R10=fp0
; int do_test(struct bpf_perf_event_data *ctx) {
0: (bf) r2 = r1 ; R1=ctx(off=0,imm=0) R2_w=ctx(off=0,imm=0)
1: (b7) r1 = 0 ; R1_w=0
2: (0f) r2 += r1
mark_precise: frame0: last_idx 2 first_idx 0 subseq_idx -1
mark_precise: frame0: regs=r1 stack= before 1: (b7) r1 = 0
3: R1_w=0 R2_w=ctx(off=0,imm=0)
; void *p = bpf_map_lookup_elem(&map, &ctx->regs);
3: (18) r1 = 0xffff888102c22000 ; R1_w=map_ptr(off=0,ks=8,vs=4,imm=0)
5: (85) call bpf_map_lookup_elem#1
R2 type=ctx expected=fp, pkt, pkt_meta, map_key, map_value, mem, ringbuf_mem, buf, trusted_ptr_
^^^^^^
error message (a bit cryptic)
...
However, this error is checked conditionally depending on the
proto defined for the helper function. For example, here is the
proto for bpf_map_lookup_elem():
const struct bpf_func_proto bpf_map_lookup_elem_proto = {
.func = bpf_map_lookup_elem,
.gpl_only = false,
.pkt_access = true,
.ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL,
.arg1_type = ARG_CONST_MAP_PTR, // <-- check_helper_mem_access() is done
.arg2_type = ARG_PTR_TO_MAP_KEY, // <-- check_helper_mem_access() is done
};
And here is prototype for bpf_probe_read_kernel():
const struct bpf_func_proto bpf_probe_read_kernel_proto = {
.func = bpf_probe_read_kernel,
.gpl_only = true,
.ret_type = RET_INTEGER,
.arg1_type = ARG_PTR_TO_UNINIT_MEM, // <-- check_helper_mem_access() is done
.arg2_type = ARG_CONST_SIZE_OR_ZERO, // <-- check_helper_mem_access() is done for
// the previous argument (a bit complicated)
.arg3_type = ARG_ANYTHING, // <-- check_helper_mem_access() is *NOT* done
};
So, the way bpf_probe_read_kernel() is defined for verifier it does
not check that last argument might point to "fake" location.
I think this should be fixed, but would be interesting to hear what
people on the mailing list think.
> > It looks like `PT_REGS_IP_CORE` macro should not be defined through
> > bpf_probe_read_kernel(). I'll dig through commit history tomorrow to
> > understand why is it defined like that now.
> > help
>
> If I recall the rationale was to allow the macros to work for both
> BPF programs that can do direct dereference (fentry, fexit, tp_btf etc)
> and for kprobe-style that need to use bpf_probe_read_kernel().
> Not sure if it would be worth having variants that are purely
> dereference-based, since we can just use PT_REGS_IP() due to
> the __builtin_preserve_access_index attributes applied in vmlinux.h.
Sorry, need a bit more time, thanks for the context.
>
> Thanks!
>
> Alan
>
> > Thanks,
> > Eduard
> >
> > ---
> > Below is annotated example, inpatient reader might skip it
> >
> > For the following test program:
> >
> > #include "vmlinux.h"
> > ...
> > SEC("perf_event")
> > int do_test(struct bpf_perf_event_data *ctx) {
> > unsigned long p = PT_REGS_IP(&ctx->regs);
> > unsigned long p_core = PT_REGS_IP_CORE(&ctx->regs);
> > bpf_printk("non-CO-RE: %lx, CO-RE: %lx", p, p_core);
> > return 0;
> > }
> >
> > Generated BPF code looks as follows:
> >
> > $ llvm-objdump --no-show-raw-insn -rd bpf.linked.o
> > ...
> > 0000000000000000 <do_test>:
> > # Third argument for bpf_probe_read_kernel: offset of bpf_perf_event_data::regs.ip
> > 0: r2 = 0x80
> > 0000000000000000: CO-RE <byte_off> [2] struct bpf_perf_event_data::regs.ip (0:0:16)
> > 1: r3 = r1
> > 2: r3 += r2
> > # The "non CO-RE" version of PT_REGS_IP is, in fact, CO-RE
> > # because `struct bpf_perf_event_data` has preserve_access_index
> > # tag in the vmlinux.h.
> > # Here the regs.ip is stored in r6 to be used after the call
> > # to bpf_probe_read_kernel() (from PT_REGS_IP_CORE).
> > 3: r6 = *(u64 *)(r1 + 0x80)
> > 0000000000000018: CO-RE <byte_off> [2] struct bpf_perf_event_data::regs.ip (0:0:16)
> > # First argument for bpf_probe_read_kernel: a place on stack to put read result to.
> > 4: r1 = r10
> > 5: r1 += -0x8
> > # Second argument for bpf_probe_read_kernel: the size of the field to read.
> > 6: w2 = 0x8
> > # Call to bpf_probe_read_kernel()
> > 7: call 0x71
> > # Fourth parameter of bpf_printk: p_core read from stack
> > # (was written by call to bpf_probe_read_kernel)
> > 8: r4 = *(u64 *)(r10 - 0x8)
> > # First parameter of bpf_printk: control string
> > 9: r1 = 0x0 ll
> > 0000000000000048: R_BPF_64_64 .rodata
> > # Second parameter of bpf_printk: size of the control string
> > 11: w2 = 0x1b
> > # Third parameter of bpf_printk: p (see addr 3)
> > 12: r3 = r6
> > # Call to bpf_printk
> > 13: call 0x6
> > ; return 0;
> > 14: w0 = 0x0
> > 15: exit
> >
> > I get the following BPF after all verifier rewrites are applied
> > (including verifier.c:convert_ctx_accesses()):
> >
> > # ./tools/bpf/bpftool/bpftool prog dump xlated id 114
> > int do_test(struct bpf_perf_event_data * ctx):
> > ; int do_test(struct bpf_perf_event_data *ctx) {
> > 0: (b7) r2 = 128 | CO-RE replacement, 128 is a valid offset for
> > | bpf_perf_event_data::regs.ip in my kernel
> > 1: (bf) r3 = r1
> > 2: (0f) r3 += r2
> >
> > 3: (79) r6 = *(u64 *)(r1 +0) | This is an expantion of the
> > 4: (79) r6 = *(u64 *)(r6 +128) | r6 = *(u64 *)(r1 + 0x80)
> > 5: (bf) r1 = r10 | Created by bpf_trace.c:pe_prog_convert_ctx_access()
> >
> > 6: (07) r1 += -8
> > 7: (b4) w2 = 8
> > 8: (85) call bpf_probe_read_kernel#-91984
> > 9: (79) r4 = *(u64 *)(r10 -8)
> > 10: (18) r1 = map[id:59][0]+0
> > 12: (b4) w2 = 27
> > 13: (bf) r3 = r6
> > 14: (85) call bpf_trace_printk#-85520
> > 15: (b4) w0 = 0
> > 16: (95) exit
> >
> >
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Question: CO-RE-enabled PT_REGS macros give strange results
2023-07-26 20:03 ` Eduard Zingerman
@ 2023-07-26 23:39 ` Eduard Zingerman
2023-07-28 3:03 ` Yonghong Song
0 siblings, 1 reply; 11+ messages in thread
From: Eduard Zingerman @ 2023-07-26 23:39 UTC (permalink / raw)
To: Alan Maguire, Timofei Pushkin, Alexei Starovoitov; +Cc: bpf
On Wed, 2023-07-26 at 23:03 +0300, Eduard Zingerman wrote:
[...]
> > > It looks like `PT_REGS_IP_CORE` macro should not be defined through
> > > bpf_probe_read_kernel(). I'll dig through commit history tomorrow to
> > > understand why is it defined like that now.
> > > help
> >
> > If I recall the rationale was to allow the macros to work for both
> > BPF programs that can do direct dereference (fentry, fexit, tp_btf etc)
> > and for kprobe-style that need to use bpf_probe_read_kernel().
> > Not sure if it would be worth having variants that are purely
> > dereference-based, since we can just use PT_REGS_IP() due to
> > the __builtin_preserve_access_index attributes applied in vmlinux.h.
>
> Sorry, need a bit more time, thanks for the context.
The PT_REGS_*_CORE macros were added by Andrii Nakryiko in [1].
Stated intent there is to use those macros for raw tracepoint
programs. Such programs have `struct pt_regs` as a parameter.
Contexts of type `struct pt_regs` are *not* subject to rewrite by
convert_ctx_access(), so it is valid to use PT_REGS_*_CORE for such
programs.
However, `struct pt_regs` is also a part of `struct
bpf_perf_event_data`. Latter is used as a context parameter for
"perf_event" programs and is a subject to rewrite by
convert_ctx_access(). Thus, PT_REGS_*_CORE macros can't be used for
such programs (because these macro are implemented through
bpf_probe_read_kernel() of which convert_ctx_access() is not aware).
If `struct pt_regs` is defined with `preserve_access_index` attribute
CO-RE relocations are generated for both PT_REGS_IP_CORE and
PT_REGS_IP invocations. So, there is no real need to use *_CORE
variants in combination with `struct bpf_perf_event_data` to have all
CO-RE benefits, e.g.:
$ cat bpf.c
#include "vmlinux.h"
// ...
SEC("perf_event")
int do_test(struct bpf_perf_event_data *ctx) {
return PT_REGS_IP(&ctx->regs);
}
// ...
$ llvm-objdump --no-show-raw-insn -rd bpf.o
...
0000000000000000 <do_test>:
0: r0 = *(u64 *)(r1 + 0x80)
0000000000000000: CO-RE <byte_off> [11] struct bpf_perf_event_data::regs.ip (0:0:16)
1: exit
[1] b8ebce86ffe6 ("libbpf: Provide CO-RE variants of PT_REGS macros")
---
I think the following should be done:
- Timofei's code should use PT_REGS_IP and make sure that `struct
pt_regs` has preserve_access_index annotation (e.g. use vmlinux.h);
- verifier should be adjusted to report error when
bpf_probe_read_kernel() (and similar) are used to read from "fake"
contexts.
- (maybe?) update PT_REGS_*_CORE to use `__builtin_preserve_access_index`
(to allow usage with `bpf_perf_event_data` context).
[...]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Question: CO-RE-enabled PT_REGS macros give strange results
2023-07-26 23:39 ` Eduard Zingerman
@ 2023-07-28 3:03 ` Yonghong Song
2023-07-28 12:30 ` Eduard Zingerman
0 siblings, 1 reply; 11+ messages in thread
From: Yonghong Song @ 2023-07-28 3:03 UTC (permalink / raw)
To: Eduard Zingerman, Alan Maguire, Timofei Pushkin,
Alexei Starovoitov; +Cc: bpf
On 7/26/23 4:39 PM, Eduard Zingerman wrote:
> On Wed, 2023-07-26 at 23:03 +0300, Eduard Zingerman wrote:
> [...]
>>>> It looks like `PT_REGS_IP_CORE` macro should not be defined through
>>>> bpf_probe_read_kernel(). I'll dig through commit history tomorrow to
>>>> understand why is it defined like that now.
>>>> help
>>>
>>> If I recall the rationale was to allow the macros to work for both
>>> BPF programs that can do direct dereference (fentry, fexit, tp_btf etc)
>>> and for kprobe-style that need to use bpf_probe_read_kernel().
>>> Not sure if it would be worth having variants that are purely
>>> dereference-based, since we can just use PT_REGS_IP() due to
>>> the __builtin_preserve_access_index attributes applied in vmlinux.h.
>>
>> Sorry, need a bit more time, thanks for the context.
>
> The PT_REGS_*_CORE macros were added by Andrii Nakryiko in [1].
> Stated intent there is to use those macros for raw tracepoint
> programs. Such programs have `struct pt_regs` as a parameter.
> Contexts of type `struct pt_regs` are *not* subject to rewrite by
> convert_ctx_access(), so it is valid to use PT_REGS_*_CORE for such
> programs.
>
> However, `struct pt_regs` is also a part of `struct
> bpf_perf_event_data`. Latter is used as a context parameter for
> "perf_event" programs and is a subject to rewrite by
> convert_ctx_access(). Thus, PT_REGS_*_CORE macros can't be used for
> such programs (because these macro are implemented through
> bpf_probe_read_kernel() of which convert_ctx_access() is not aware).
>
> If `struct pt_regs` is defined with `preserve_access_index` attribute
> CO-RE relocations are generated for both PT_REGS_IP_CORE and
> PT_REGS_IP invocations. So, there is no real need to use *_CORE
> variants in combination with `struct bpf_perf_event_data` to have all
> CO-RE benefits, e.g.:
>
> $ cat bpf.c
> #include "vmlinux.h"
> // ...
> SEC("perf_event")
> int do_test(struct bpf_perf_event_data *ctx) {
> return PT_REGS_IP(&ctx->regs);
> }
> // ...
> $ llvm-objdump --no-show-raw-insn -rd bpf.o
> ...
> 0000000000000000 <do_test>:
> 0: r0 = *(u64 *)(r1 + 0x80)
> 0000000000000000: CO-RE <byte_off> [11] struct bpf_perf_event_data::regs.ip (0:0:16)
> 1: exit
>
> [1] b8ebce86ffe6 ("libbpf: Provide CO-RE variants of PT_REGS macros")
>
> ---
>
> I think the following should be done:
> - Timofei's code should use PT_REGS_IP and make sure that `struct
> pt_regs` has preserve_access_index annotation (e.g. use vmlinux.h);
> - verifier should be adjusted to report error when
> bpf_probe_read_kernel() (and similar) are used to read from "fake"
> contexts.
The func prototype of bpf_probe_read_kernel() is
BPF_CALL_3(bpf_probe_read_kernel, void *, dst, u32, size,
const void *, unsafe_ptr)
{
return bpf_probe_read_kernel_common(dst, size, unsafe_ptr);
}
Notice the argument name is 'unsafe_ptr'. So there is no checking
in verifier for this argument. Some users may take advantage of this
to initialize the 'dst' with 0 by providing an illegal address.
> - (maybe?) update PT_REGS_*_CORE to use `__builtin_preserve_access_index`
> (to allow usage with `bpf_perf_event_data` context).
>
> [...]
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Question: CO-RE-enabled PT_REGS macros give strange results
2023-07-28 3:03 ` Yonghong Song
@ 2023-07-28 12:30 ` Eduard Zingerman
0 siblings, 0 replies; 11+ messages in thread
From: Eduard Zingerman @ 2023-07-28 12:30 UTC (permalink / raw)
To: yonghong.song, Alan Maguire, Timofei Pushkin, Alexei Starovoitov; +Cc: bpf
On Thu, 2023-07-27 at 20:03 -0700, Yonghong Song wrote:
>
> On 7/26/23 4:39 PM, Eduard Zingerman wrote:
> > On Wed, 2023-07-26 at 23:03 +0300, Eduard Zingerman wrote:
> > [...]
> > > > > It looks like `PT_REGS_IP_CORE` macro should not be defined through
> > > > > bpf_probe_read_kernel(). I'll dig through commit history tomorrow to
> > > > > understand why is it defined like that now.
> > > > > help
> > > >
> > > > If I recall the rationale was to allow the macros to work for both
> > > > BPF programs that can do direct dereference (fentry, fexit, tp_btf etc)
> > > > and for kprobe-style that need to use bpf_probe_read_kernel().
> > > > Not sure if it would be worth having variants that are purely
> > > > dereference-based, since we can just use PT_REGS_IP() due to
> > > > the __builtin_preserve_access_index attributes applied in vmlinux.h.
> > >
> > > Sorry, need a bit more time, thanks for the context.
> >
> > The PT_REGS_*_CORE macros were added by Andrii Nakryiko in [1].
> > Stated intent there is to use those macros for raw tracepoint
> > programs. Such programs have `struct pt_regs` as a parameter.
> > Contexts of type `struct pt_regs` are *not* subject to rewrite by
> > convert_ctx_access(), so it is valid to use PT_REGS_*_CORE for such
> > programs.
> >
> > However, `struct pt_regs` is also a part of `struct
> > bpf_perf_event_data`. Latter is used as a context parameter for
> > "perf_event" programs and is a subject to rewrite by
> > convert_ctx_access(). Thus, PT_REGS_*_CORE macros can't be used for
> > such programs (because these macro are implemented through
> > bpf_probe_read_kernel() of which convert_ctx_access() is not aware).
> >
> > If `struct pt_regs` is defined with `preserve_access_index` attribute
> > CO-RE relocations are generated for both PT_REGS_IP_CORE and
> > PT_REGS_IP invocations. So, there is no real need to use *_CORE
> > variants in combination with `struct bpf_perf_event_data` to have all
> > CO-RE benefits, e.g.:
> >
> > $ cat bpf.c
> > #include "vmlinux.h"
> > // ...
> > SEC("perf_event")
> > int do_test(struct bpf_perf_event_data *ctx) {
> > return PT_REGS_IP(&ctx->regs);
> > }
> > // ...
> > $ llvm-objdump --no-show-raw-insn -rd bpf.o
> > ...
> > 0000000000000000 <do_test>:
> > 0: r0 = *(u64 *)(r1 + 0x80)
> > 0000000000000000: CO-RE <byte_off> [11] struct bpf_perf_event_data::regs.ip (0:0:16)
> > 1: exit
> >
> > [1] b8ebce86ffe6 ("libbpf: Provide CO-RE variants of PT_REGS macros")
> >
> > ---
> >
> > I think the following should be done:
> > - Timofei's code should use PT_REGS_IP and make sure that `struct
> > pt_regs` has preserve_access_index annotation (e.g. use vmlinux.h);
> > - verifier should be adjusted to report error when
> > bpf_probe_read_kernel() (and similar) are used to read from "fake"
> > contexts.
>
> The func prototype of bpf_probe_read_kernel() is
>
> BPF_CALL_3(bpf_probe_read_kernel, void *, dst, u32, size,
> const void *, unsafe_ptr)
> {
> return bpf_probe_read_kernel_common(dst, size, unsafe_ptr);
> }
>
> Notice the argument name is 'unsafe_ptr'. So there is no checking
> in verifier for this argument. Some users may take advantage of this
> to initialize the 'dst' with 0 by providing an illegal address.
On the one hand yes, but on the other hand the address of context
parameter like bpf_perf_event_data is a kind of fake, it does not exist.
It would be meaningful to use bpf_probe_read_kernel() for this address
only if someone knows the layout of the internal verifier structure
`bpf_perf_event_data_kern` and wants to access it.
Tbh, this appears to be a "footgun".
>
>
> > - (maybe?) update PT_REGS_*_CORE to use `__builtin_preserve_access_index`
> > (to allow usage with `bpf_perf_event_data` context).
> >
> > [...]
> >
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2023-07-28 12:30 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-07-24 10:32 Question: CO-RE-enabled PT_REGS macros give strange results Timofei Pushkin
2023-07-24 12:36 ` Alan Maguire
2023-07-24 15:04 ` Timofei Pushkin
2023-07-24 23:00 ` Alan Maguire
2023-07-25 14:04 ` Alan Maguire
2023-07-26 0:03 ` Eduard Zingerman
2023-07-26 13:46 ` Alan Maguire
2023-07-26 20:03 ` Eduard Zingerman
2023-07-26 23:39 ` Eduard Zingerman
2023-07-28 3:03 ` Yonghong Song
2023-07-28 12:30 ` Eduard Zingerman
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox