[PATCH RFC bpf-next 0/3] bpf: report probe fault to BPF stderr

linux-trace-kernel.vger.kernel.org archive mirror

* [PATCH RFC bpf-next 0/3] bpf: report probe fault to BPF stderr
@ 2025-09-27  6:12 Menglong Dong
  2025-09-27  6:12 ` [PATCH RFC bpf-next 1/3] " Menglong Dong
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Menglong Dong @ 2025-09-27  6:12 UTC (permalink / raw)
  To: ast; +Cc: bpf, linux-kernel, linux-trace-kernel, jiang.biao

Currently, we can read memory through bpf_core_cast, which is faster
than bpf_probe_read_kernel. However, such a direct read gives the program
no indication when it faults, so the user can't learn that the read failed.

I originally wanted to introduce a fault_callback for the BPF program,
to be called when a memory read fails. Then I saw the BPF stream
interface, which is already used for error reporting, so I implemented
probe fault reporting based on the BPF stream.

This series adds a new function, bpf_prog_report_probe_violation(), which
reports probe faults to BPF stderr. It is used for both probe read faults
and probe write faults.

The shortcoming of this approach is that we can't report the fault if the
memory address is not a kernel address. As far as I remember, the JIT
compiler emits a check on whether the address is a kernel address, and a
non-kernel address never triggers the fault event. If we implemented the
fault callback instead, we could call it from that address check in the
JIT.

Menglong Dong (3):
  bpf: report probe fault to BPF stderr
  x86,bpf: use bpf_prog_report_probe_violation for x86
  selftests/bpf: add testcase for probe read fault

 arch/x86/net/bpf_jit_comp.c                   |  2 ++
 include/linux/bpf.h                           |  1 +
 kernel/trace/bpf_trace.c                      | 18 +++++++++++++++
 .../testing/selftests/bpf/prog_tests/stream.c | 22 ++++++++++++++++++-
 tools/testing/selftests/bpf/progs/stream.c    | 21 ++++++++++++++++++
 5 files changed, 63 insertions(+), 1 deletion(-)

-- 
2.51.0


* [PATCH RFC bpf-next 1/3] bpf: report probe fault to BPF stderr
  2025-09-27  6:12 [PATCH RFC bpf-next 0/3] bpf: report probe fault to BPF stderr Menglong Dong
@ 2025-09-27  6:12 ` Menglong Dong
  2025-10-02  2:03   ` Alexei Starovoitov
  2025-09-27  6:12 ` [PATCH RFC bpf-next 2/3] x86,bpf: use bpf_prog_report_probe_violation for x86 Menglong Dong
  2025-09-27  6:12 ` [PATCH RFC bpf-next 3/3] selftests/bpf: add testcase for probe read fault Menglong Dong
  2 siblings, 1 reply; 22+ messages in thread
From: Menglong Dong @ 2025-09-27  6:12 UTC (permalink / raw)
  To: ast; +Cc: bpf, linux-kernel, linux-trace-kernel, jiang.biao

Introduce the function bpf_prog_report_probe_violation(), which reports
memory probe faults to the user via BPF stderr.

Signed-off-by: Menglong Dong <menglong.dong@linux.dev>
---
 include/linux/bpf.h      |  1 +
 kernel/trace/bpf_trace.c | 18 ++++++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 6338e54a9b1f..a31c5ce56c32 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2902,6 +2902,7 @@ void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data,
 void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
 void bpf_dynptr_set_rdonly(struct bpf_dynptr_kern *ptr);
 void bpf_prog_report_arena_violation(bool write, unsigned long addr, unsigned long fault_ip);
+void bpf_prog_report_probe_violation(bool write, unsigned long fault_ip);
 
 #else /* !CONFIG_BPF_SYSCALL */
 static inline struct bpf_prog *bpf_prog_get(u32 ufd)
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 8f23f5273bab..9bd03a9f53db 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2055,6 +2055,24 @@ void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp)
 	module_put(mod);
 }
 
+void bpf_prog_report_probe_violation(bool write, unsigned long fault_ip)
+{
+	struct bpf_stream_stage ss;
+	struct bpf_prog *prog;
+
+	rcu_read_lock();
+	prog = bpf_prog_ksym_find(fault_ip);
+	rcu_read_unlock();
+	if (!prog)
+		return;
+
+	bpf_stream_stage(ss, prog, BPF_STDERR, ({
+		bpf_stream_printk(ss, "ERROR: Probe %s access faule, insn=0x%lx\n",
+				  write ? "WRITE" : "READ", fault_ip);
+		bpf_stream_dump_stack(ss);
+	}));
+}
+
 static __always_inline
 void __bpf_trace_run(struct bpf_raw_tp_link *link, u64 *args)
 {
-- 
2.51.0



* [PATCH RFC bpf-next 2/3] x86,bpf: use bpf_prog_report_probe_violation for x86
  2025-09-27  6:12 [PATCH RFC bpf-next 0/3] bpf: report probe fault to BPF stderr Menglong Dong
  2025-09-27  6:12 ` [PATCH RFC bpf-next 1/3] " Menglong Dong
@ 2025-09-27  6:12 ` Menglong Dong
  2025-09-27  6:12 ` [PATCH RFC bpf-next 3/3] selftests/bpf: add testcase for probe read fault Menglong Dong
  2 siblings, 0 replies; 22+ messages in thread
From: Menglong Dong @ 2025-09-27  6:12 UTC (permalink / raw)
  To: ast; +Cc: bpf, linux-kernel, linux-trace-kernel, jiang.biao

Use bpf_prog_report_probe_violation() to report the memory probe fault
in ex_handler_bpf().

Signed-off-by: Menglong Dong <menglong.dong@linux.dev>
---
 arch/x86/net/bpf_jit_comp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index fc13306af15f..03d4d8385f4c 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1470,6 +1470,8 @@ bool ex_handler_bpf(const struct exception_table_entry *x, struct pt_regs *regs)
 		off = FIELD_GET(DATA_ARENA_OFFSET_MASK, x->data);
 		addr = *(unsigned long *)((void *)regs + arena_reg) + off;
 		bpf_prog_report_arena_violation(is_write, addr, regs->ip);
+	} else {
+		bpf_prog_report_probe_violation(is_write, regs->ip);
 	}
 
 	/* jump over faulting load and clear dest register */
-- 
2.51.0



* [PATCH RFC bpf-next 3/3] selftests/bpf: add testcase for probe read fault
  2025-09-27  6:12 [PATCH RFC bpf-next 0/3] bpf: report probe fault to BPF stderr Menglong Dong
  2025-09-27  6:12 ` [PATCH RFC bpf-next 1/3] " Menglong Dong
  2025-09-27  6:12 ` [PATCH RFC bpf-next 2/3] x86,bpf: use bpf_prog_report_probe_violation for x86 Menglong Dong
@ 2025-09-27  6:12 ` Menglong Dong
  2 siblings, 0 replies; 22+ messages in thread
From: Menglong Dong @ 2025-09-27  6:12 UTC (permalink / raw)
  To: ast; +Cc: bpf, linux-kernel, linux-trace-kernel, jiang.biao

Add a testcase for the probe read fault to stream.c.

Signed-off-by: Menglong Dong <menglong.dong@linux.dev>
---
 .../testing/selftests/bpf/prog_tests/stream.c | 22 ++++++++++++++++++-
 tools/testing/selftests/bpf/progs/stream.c    | 21 ++++++++++++++++++
 2 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/stream.c b/tools/testing/selftests/bpf/prog_tests/stream.c
index c3cce5c292bd..81fd258b97e0 100644
--- a/tools/testing/selftests/bpf/prog_tests/stream.c
+++ b/tools/testing/selftests/bpf/prog_tests/stream.c
@@ -72,7 +72,8 @@ static void test_address(struct bpf_program *prog, unsigned long *fault_addr_p)
 	ASSERT_OK(ret, "ret");
 	ASSERT_OK(opts.retval, "retval");
 
-	sprintf(fault_addr, "0x%lx", *fault_addr_p);
+	if (fault_addr_p)
+		sprintf(fault_addr, "0x%lx", *fault_addr_p);
 
 	ret = bpf_prog_stream_read(prog_fd, BPF_STREAM_STDERR, buf, sizeof(buf), &ropts);
 	ASSERT_GT(ret, 0, "stream read");
@@ -106,3 +107,22 @@ void test_stream_arena_fault_address(void)
 
 	stream__destroy(skel);
 }
+
+void test_stream_probe_read_fault(void)
+{
+	struct stream *skel;
+
+#if !defined(__x86_64__)
+	printf("%s:SKIP: probe fault reporting not supported\n", __func__);
+	test__skip();
+	return;
+#endif
+
+	skel = stream__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "stream__open_and_load"))
+		return;
+
+	test_address(skel->progs.stream_probe_read_fault, NULL);
+
+	stream__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/stream.c b/tools/testing/selftests/bpf/progs/stream.c
index 4a5bd852f10c..290c40463522 100644
--- a/tools/testing/selftests/bpf/progs/stream.c
+++ b/tools/testing/selftests/bpf/progs/stream.c
@@ -7,6 +7,8 @@
 #include "bpf_experimental.h"
 #include "bpf_arena_common.h"
 
+#define READ_ONCE(x) (*(volatile typeof(x) *)&(x))
+
 struct arr_elem {
 	struct bpf_res_spin_lock lock;
 };
@@ -234,4 +236,23 @@ int stream_arena_callback_fault(void *ctx)
 	return 0;
 }
 
+SEC("syscall")
+__arch_x86_64
+__success __retval(0)
+__stderr("ERROR: Probe READ access faule, insn=0x[0-9a-fA-F]+")
+__stderr("CPU: {{[0-9]+}} UID: 0 PID: {{[0-9]+}} Comm: {{.*}}")
+__stderr("Call trace:\n"
+"{{([a-zA-Z_][a-zA-Z0-9_]*\\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+\n"
+"|[ \t]+[^\n]+\n)*}}")
+int stream_probe_read_fault(void *ctx)
+{
+	struct sk_buff *skb = bpf_core_cast((void *)0xFFFFFFFF00000000,
+					    struct sk_buff);
+
+	/* do the memory read */
+	READ_ONCE(skb->network_header);
+
+	return 0;
+}
+
 char _license[] SEC("license") = "GPL";
-- 
2.51.0



* Re: [PATCH RFC bpf-next 1/3] bpf: report probe fault to BPF stderr
  2025-09-27  6:12 ` [PATCH RFC bpf-next 1/3] " Menglong Dong
@ 2025-10-02  2:03   ` Alexei Starovoitov
  2025-10-07  6:14     ` Menglong Dong
  0 siblings, 1 reply; 22+ messages in thread
From: Alexei Starovoitov @ 2025-10-02  2:03 UTC (permalink / raw)
  To: Menglong Dong
  Cc: Alexei Starovoitov, bpf, LKML, linux-trace-kernel, jiang.biao

On Fri, Sep 26, 2025 at 11:12 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
>
> [...]
>
> +void bpf_prog_report_probe_violation(bool write, unsigned long fault_ip)
> +{
> +       struct bpf_stream_stage ss;
> +       struct bpf_prog *prog;
> +
> +       rcu_read_lock();
> +       prog = bpf_prog_ksym_find(fault_ip);
> +       rcu_read_unlock();
> +       if (!prog)
> +               return;
> +
> +       bpf_stream_stage(ss, prog, BPF_STDERR, ({
> +               bpf_stream_printk(ss, "ERROR: Probe %s access faule, insn=0x%lx\n",
> +                                 write ? "WRITE" : "READ", fault_ip);
> +               bpf_stream_dump_stack(ss);
> +       }));

Interesting idea, but the above message is not helpful.
Users cannot decipher a fault_ip within a bpf prog.
It's just a random number.

But stepping back... just faults are common in tracing.
If we start printing them we will just fill the stream to the max,
but users won't know that the message is there, since no one
expects it. arena and lock errors are rare and arena faults
were specifically requested by folks who develop progs that use arena.
This one is different. These faults have been around for a long time
and I don't recall people asking for more verbosity.
We can add them with an extra flag specified at prog load time,
but even then. Doesn't feel that useful.


* Re: [PATCH RFC bpf-next 1/3] bpf: report probe fault to BPF stderr
  2025-10-02  2:03   ` Alexei Starovoitov
@ 2025-10-07  6:14     ` Menglong Dong
  2025-10-08 14:40       ` Leon Hwang
  0 siblings, 1 reply; 22+ messages in thread
From: Menglong Dong @ 2025-10-07  6:14 UTC (permalink / raw)
  To: Menglong Dong, Alexei Starovoitov
  Cc: Alexei Starovoitov, bpf, LKML, linux-trace-kernel, jiang.biao

On 2025/10/2 10:03, Alexei Starovoitov wrote:
> On Fri, Sep 26, 2025 at 11:12 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
> >
> > [...]
> >
> > +void bpf_prog_report_probe_violation(bool write, unsigned long fault_ip)
> > +{
> > +       struct bpf_stream_stage ss;
> > +       struct bpf_prog *prog;
> > +
> > +       rcu_read_lock();
> > +       prog = bpf_prog_ksym_find(fault_ip);
> > +       rcu_read_unlock();
> > +       if (!prog)
> > +               return;
> > +
> > +       bpf_stream_stage(ss, prog, BPF_STDERR, ({
> > +               bpf_stream_printk(ss, "ERROR: Probe %s access faule, insn=0x%lx\n",
> > +                                 write ? "WRITE" : "READ", fault_ip);
> > +               bpf_stream_dump_stack(ss);
> > +       }));
> 
> Interesting idea, but the above message is not helpful.
> Users cannot decipher a fault_ip within a bpf prog.
> It's just a random number.

Yeah, I have noticed this too. What is useful is
bpf_stream_dump_stack(), which prints the source
line that triggered the fault.

> But stepping back... just faults are common in tracing.
> If we start printing them we will just fill the stream to the max,
> but users won't know that the message is there, since no one

You are right, we definitely can't output this message
to STDERR unconditionally. We can add an extra flag for it,
as you said below.

Or maybe we can introduce an enum stream_type, so that
users can subscribe to the kinds of messages they
want to receive.

> expects it. arena and lock errors are rare and arena faults
> were specifically requested by folks who develop progs that use arena.
> This one is different. These faults have been around for a long time
> and I don't recall people asking for more verbosity.
> We can add them with an extra flag specified at prog load time,
> but even then. Doesn't feel that useful.

Generally speaking, users can validate a pointer before
reading through it, for example with a NULL check. And
the pointers in the arguments of the functions we hook are
initialized in most cases. So the fault is something that can
be prevented.

I have a BPF tool that was written for 4.X kernels and uses
kprobe-based BPF. Now I'm planning to migrate it to 6.X kernels
and replace bpf_probe_read_kernel() with bpf_core_cast() for
better performance. Then I found that I can't check whether a
memory read succeeded, which is a potential risk.
So my tool will be happy to get such a fault event :)

Leon suggested adding a global errno for each BPF program,
and I haven't dug deeply into this idea yet.

Thanks!
Menglong Dong


* Re: [PATCH RFC bpf-next 1/3] bpf: report probe fault to BPF stderr
  2025-10-07  6:14     ` Menglong Dong
@ 2025-10-08 14:40       ` Leon Hwang
  2025-10-08 16:27         ` bpf_errno. Was: " Alexei Starovoitov
  0 siblings, 1 reply; 22+ messages in thread
From: Leon Hwang @ 2025-10-08 14:40 UTC (permalink / raw)
  To: Menglong Dong, Menglong Dong, Alexei Starovoitov
  Cc: Alexei Starovoitov, bpf, LKML, linux-trace-kernel, jiang.biao



On 2025/10/7 14:14, Menglong Dong wrote:
> [...]
>
> Leon suggested adding a global errno for each BPF program,
> and I haven't dug deeply into this idea yet.
>

Yeah, as we discussed, a global errno would be a much more lightweight
approach for handling such faults.

The idea would look like this:

DEFINE_PER_CPU(int, bpf_errno);

__bpf_kfunc void bpf_errno_clear(void);
__bpf_kfunc void bpf_errno_set(int errno);
__bpf_kfunc int bpf_errno_get(void);

When a fault occurs, the kernel can simply call
'bpf_errno_set(-EFAULT);'.

If users want to detect whether a fault happened, they can do:

bpf_errno_clear();
header = READ_ONCE(skb->network_header);
if (header == 0 && bpf_errno_get() == -EFAULT)
        /* handle fault */;

This way, users can identify faults immediately and handle them gracefully.

Furthermore, these kfuncs can be inlined by the verifier, so there would
be no runtime function call overhead.

Thanks,
Leon
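
The proposal above can be tried out in plain user-space C. This is only a
minimal sketch under stated assumptions: every sim_* name is hypothetical, a
single global stands in for the per-CPU variable, and a NULL pointer stands in
for a faulting kernel address. It is not the proposed kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for DEFINE_PER_CPU(int, bpf_errno). */
static int sim_bpf_errno;

static void sim_errno_clear(void) { sim_bpf_errno = 0; }
static void sim_errno_set(int err) { sim_bpf_errno = err; }
static int sim_errno_get(void) { return sim_bpf_errno; }

/*
 * Models a fault-tolerant load. A "bad" pointer (NULL in this sketch)
 * yields 0 and records -EFAULT, mirroring how the exception handler
 * clears the destination register after a faulting load.
 */
static unsigned int sim_probe_load(const unsigned int *p)
{
	if (!p) {
		sim_errno_set(-EFAULT);
		return 0;
	}
	return *p;
}

/* Returns 1 if the load faulted, 0 otherwise; *out receives the value. */
static int read_checked(const unsigned int *p, unsigned int *out)
{
	sim_errno_clear();
	*out = sim_probe_load(p);
	return *out == 0 && sim_errno_get() == -EFAULT;
}
```

The point of the pattern is that a legitimate zero value and a faulted read
both produce 0, and only the extra errno check tells them apart.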


* bpf_errno. Was: [PATCH RFC bpf-next 1/3] bpf: report probe fault to BPF stderr
  2025-10-08 14:40       ` Leon Hwang
@ 2025-10-08 16:27         ` Alexei Starovoitov
  2025-10-08 17:08           ` Kumar Kartikeya Dwivedi
  2025-10-09 14:15           ` Leon Hwang
  0 siblings, 2 replies; 22+ messages in thread
From: Alexei Starovoitov @ 2025-10-08 16:27 UTC (permalink / raw)
  To: Leon Hwang, Andrii Nakryiko, Kumar Kartikeya Dwivedi
  Cc: Menglong Dong, Menglong Dong, Alexei Starovoitov, bpf, LKML,
	linux-trace-kernel, jiang.biao

On Wed, Oct 8, 2025 at 7:41 AM Leon Hwang <hffilwlqm@gmail.com> wrote:
>
>
>
> > [...]
>
> Yeah, as we discussed, a global errno would be a much more lightweight
> approach for handling such faults.
>
> The idea would look like this:
>
> DEFINE_PER_CPU(int, bpf_errno);
>
> __bpf_kfunc void bpf_errno_clear(void);
> __bpf_kfunc void bpf_errno_set(int errno);
> __bpf_kfunc int bpf_errno_get(void);
>
> When a fault occurs, the kernel can simply call
> 'bpf_errno_set(-EFAULT);'.
>
> If users want to detect whether a fault happened, they can do:
>
> bpf_errno_clear();
> header = READ_ONCE(skb->network_header);
> if (header == 0 && bpf_errno_get() == -EFAULT)
>         /* handle fault */;
>
> This way, users can identify faults immediately and handle them gracefully.
>
> Furthermore, these kfuncs can be inlined by the verifier, so there would
> be no runtime function call overhead.

Interesting idea, but errno as-is doesn't quite fit,
since we only have 2 (or 3 ?) cases without explicit error return:
probe_read_kernel above, arena read, arena write.
I guess we can add may_goto to this set as well.
But in all these cases we'll struggle to find an appropriate errno code,
so it probably should be a custom enum and not called "errno".
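
One possible shape for such a custom enum is sketched below. The names and the
set of values are assumptions made for illustration, taken only from the cases
listed above (probe read, arena read, arena write, may_goto); nothing here is
an existing kernel definition.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical fault kinds; not an existing kernel enum. */
enum sim_bpf_fault {
	SIM_FAULT_NONE = 0,	/* no fault recorded */
	SIM_FAULT_PROBE_READ,	/* faulting probe read of kernel memory */
	SIM_FAULT_ARENA_READ,	/* faulting arena load */
	SIM_FAULT_ARENA_WRITE,	/* faulting arena store */
	SIM_FAULT_MAY_GOTO,	/* may_goto ended the loop early */
};

/* Maps a fault kind to a short human-readable name. */
static const char *sim_fault_name(enum sim_bpf_fault f)
{
	switch (f) {
	case SIM_FAULT_NONE:		return "none";
	case SIM_FAULT_PROBE_READ:	return "probe read";
	case SIM_FAULT_ARENA_READ:	return "arena read";
	case SIM_FAULT_ARENA_WRITE:	return "arena write";
	case SIM_FAULT_MAY_GOTO:	return "may_goto";
	}
	return "unknown";
}
```

Keeping the "no fault" value at zero lets a program test the result with a
plain truthiness check.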


* Re: bpf_errno. Was: [PATCH RFC bpf-next 1/3] bpf: report probe fault to BPF stderr
  2025-10-08 16:27         ` bpf_errno. Was: " Alexei Starovoitov
@ 2025-10-08 17:08           ` Kumar Kartikeya Dwivedi
  2025-10-08 19:34             ` Eduard Zingerman
  2025-10-09 14:15           ` Leon Hwang
  1 sibling, 1 reply; 22+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-10-08 17:08 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Leon Hwang, Andrii Nakryiko, Menglong Dong, Menglong Dong,
	Alexei Starovoitov, bpf, LKML, linux-trace-kernel, jiang.biao

On Wed, 8 Oct 2025 at 18:27, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Oct 8, 2025 at 7:41 AM Leon Hwang <hffilwlqm@gmail.com> wrote:
> >
> >
> >
> > [...]
>
> Interesting idea, but errno as-is doesn't quite fit,
> since we only have 2 (or 3 ?) cases without explicit error return:
> probe_read_kernel above, arena read, arena write.
> I guess we can add may_goto to this set as well.
> But in all these cases we'll struggle to find an appropriate errno code,
> so it probably should be a custom enum and not called "errno".

Yeah, agreed that this would be useful, particularly in this case. I'm
wondering how we'll end up implementing this.
Sounds like it needs to be tied to the program's invocation, so it
cannot be per-cpu per-program, since they nest. Most likely should be
backed by run_ctx, but that is unavailable in all program types. Next
best thing that comes to mind is reserving some space in the stack
frame at a known offset in each subprog that invokes this helper, and
use that to signal (by finding the program's bp and writing to the
stack), the downside being it likely becomes yet-another arch-specific
thing. Any other better ideas?
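
The nesting concern can be illustrated with a small user-space simulation. It
is a sketch under assumptions: the sim_* names are hypothetical, a global
stands in for the per-CPU slot, a struct loosely stands in for per-invocation
state such as run_ctx, and the nested program is assumed to clear the slot on
entry as in the usage pattern proposed earlier in the thread.

```c
#include <assert.h>
#include <errno.h>

/* Shared slot, standing in for a per-CPU variable. */
static int sim_percpu_err;

/* Hypothetical per-invocation state, loosely modeled on the run_ctx idea. */
struct sim_run_ctx {
	int err;
};

/* A nested program that runs cleanly and clears its error state on entry. */
static void sim_nested_prog(struct sim_run_ctx *ctx)
{
	sim_percpu_err = 0;
	ctx->err = 0;
}

/*
 * Outer program: its load faults, then a nested program runs on the same
 * CPU before the outer one gets to check for the fault. Returns the error
 * visible through the shared slot; *ctx_err receives the error visible
 * through the outer invocation's own context.
 */
static int sim_outer_prog(int *ctx_err)
{
	struct sim_run_ctx outer = { 0 }, nested = { 0 };

	/* The outer program's read faults and is recorded in both places. */
	sim_percpu_err = -EFAULT;
	outer.err = -EFAULT;

	/* Another program nests on the same CPU, e.g. from a tracing hook. */
	sim_nested_prog(&nested);

	*ctx_err = outer.err;
	return sim_percpu_err;
}
```

In this model the shared slot reads back 0 because the nested program
overwrote it, while the per-invocation copy still holds -EFAULT.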


* Re: bpf_errno. Was: [PATCH RFC bpf-next 1/3] bpf: report probe fault to BPF stderr
  2025-10-08 17:08           ` Kumar Kartikeya Dwivedi
@ 2025-10-08 19:34             ` Eduard Zingerman
  2025-10-08 20:08               ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 22+ messages in thread
From: Eduard Zingerman @ 2025-10-08 19:34 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, Alexei Starovoitov
  Cc: Leon Hwang, Andrii Nakryiko, Menglong Dong, Menglong Dong,
	Alexei Starovoitov, bpf, LKML, linux-trace-kernel, jiang.biao

On Wed, 2025-10-08 at 19:08 +0200, Kumar Kartikeya Dwivedi wrote:
> On Wed, 8 Oct 2025 at 18:27, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > 
> > On Wed, Oct 8, 2025 at 7:41 AM Leon Hwang <hffilwlqm@gmail.com> wrote:
> > > 
> > > 
> > > 
> > > [...]
> > 
> > Interesting idea, but errno as-is doesn't quite fit,
> > since we only have 2 (or 3 ?) cases without explicit error return:
> > probe_read_kernel above, arena read, arena write.
> > I guess we can add may_goto to this set as well.
> > But in all these cases we'll struggle to find an appropriate errno code,
> > so it probably should be a custom enum and not called "errno".
> 
> Yeah, agreed that this would be useful, particularly in this case. I'm
> wondering how we'll end up implementing this.
> Sounds like it needs to be tied to the program's invocation, so it
> cannot be per-cpu per-program, since they nest. Most likely should be
> backed by run_ctx, but that is unavailable in all program types. Next
> best thing that comes to mind is reserving some space in the stack
> frame at a known offset in each subprog that invokes this helper, and
> use that to signal (by finding the program's bp and writing to the
> stack), the downside being it likely becomes yet-another arch-specific
> thing. Any other better ideas?

Another option is to lower probe_read to a BPF_PROBE_MEM instruction
and generate a special kind of exception handler, that would set r0 to
-EFAULT. (We don't do this already, right? Don't see anything like that
in verifier.c or x86/../bpf_jit_comp.c).

This would avoid any user-visible changes and address performance
concern. Not so convenient for a chain of dereferences a->b->c->d,
though.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: bpf_errno. Was: [PATCH RFC bpf-next 1/3] bpf: report probe fault to BPF stderr
  2025-10-08 19:34             ` Eduard Zingerman
@ 2025-10-08 20:08               ` Kumar Kartikeya Dwivedi
  2025-10-08 20:30                 ` Eduard Zingerman
                                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-10-08 20:08 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Alexei Starovoitov, Leon Hwang, Andrii Nakryiko, Menglong Dong,
	Menglong Dong, Alexei Starovoitov, bpf, LKML, linux-trace-kernel,
	jiang.biao

On Wed, 8 Oct 2025 at 21:34, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Wed, 2025-10-08 at 19:08 +0200, Kumar Kartikeya Dwivedi wrote:
> > On Wed, 8 Oct 2025 at 18:27, Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Wed, Oct 8, 2025 at 7:41 AM Leon Hwang <hffilwlqm@gmail.com> wrote:
> > > >
> > > >
> > > >
> > > > On 2025/10/7 14:14, Menglong Dong wrote:
> > > > > On 2025/10/2 10:03, Alexei Starovoitov wrote:
> > > > > > On Fri, Sep 26, 2025 at 11:12 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
> > > > > > >
> > > > > > > Introduce the function bpf_prog_report_probe_violation(), which is used
> > > > > > > to report the memory probe fault to the user by the BPF stderr.
> > > > > > >
> > > > > > > Signed-off-by: Menglong Dong <menglong.dong@linux.dev>
> > > >
> > > > [...]
> > > >
> > > > > >
> > > > > > Interesting idea, but the above message is not helpful.
> > > > > > Users cannot decipher a fault_ip within a bpf prog.
> > > > > > It's just a random number.
> > > > >
> > > > > Yeah, I have noticed this too. What useful is the
> > > > > bpf_stream_dump_stack(), which will print the code
> > > > > line that trigger the fault.
> > > > >
> > > > > > But stepping back... just faults are common in tracing.
> > > > > > If we start printing them we will just fill the stream to the max,
> > > > > > but users won't know that the message is there, since no one
> > > > >
> > > > > You are right, we definitely can't output this message
> > > > > to STDERR directly. We can add an extra flag for it, as you
> > > > > said below.
> > > > >
> > > > > Or, maybe we can introduce a enum stream_type, and
> > > > > the users can subscribe what kind of messages they
> > > > > want to receive.
> > > > >
> > > > > > expects it. arena and lock errors are rare and arena faults
> > > > > > were specifically requested by folks who develop progs that use arena.
> > > > > > This one is different. These faults have been around for a long time
> > > > > > and I don't recall people asking for more verbosity.
> > > > > > We can add them with an extra flag specified at prog load time,
> > > > > > but even then. Doesn't feel that useful.
> > > > >
> > > > > Generally speaking, users can do invalid checking before
> > > > > they do the memory reading, such as NULL checking. And
> > > > > the pointer in function arguments that we hook is initialized
> > > > > in most case. So the fault is someting that can be prevented.
> > > > >
> > > > > I have a BPF tools which is writed for 4.X kernel and kprobe
> > > > > based BPF is used. Now I'm planing to migrate it to 6.X kernel
> > > > > and replace bpf_probe_read_kernel() with bpf_core_cast() to
> > > > > obtain better performance. Then I find that I can't check if the
> > > > > memory reading is success, which can lead to potential risk.
> > > > > So my tool will be happy to get such fault event :)
> > > > >
> > > > > Leon suggested to add a global errno for each BPF programs,
> > > > > and I haven't dig deeply on this idea yet.
> > > > >
> > > >
> > > > Yeah, as we discussed, a global errno would be a much more lightweight
> > > > approach for handling such faults.
> > > >
> > > > The idea would look like this:
> > > >
> > > > DEFINE_PER_CPU(int, bpf_errno);
> > > >
> > > > __bpf_kfunc void bpf_errno_clear(void);
> > > > __bpf_kfunc void bpf_errno_set(int errno);
> > > > __bpf_kfunc int bpf_errno_get(void);
> > > >
> > > > When a fault occurs, the kernel can simply call
> > > > 'bpf_errno_set(-EFAULT);'.
> > > >
> > > > If users want to detect whether a fault happened, they can do:
> > > >
> > > > bpf_errno_clear();
> > > > header = READ_ONCE(skb->network_header);
> > > > if (header == 0 && bpf_errno_get() == -EFAULT)
> > > >         /* handle fault */;
> > > >
> > > > This way, users can identify faults immediately and handle them gracefully.
> > > >
> > > > Furthermore, these kfuncs can be inlined by the verifier, so there would
> > > > be no runtime function call overhead.
> > >
> > > Interesting idea, but errno as-is doesn't quite fit,
> > > since we only have 2 (or 3 ?) cases without explicit error return:
> > > probe_read_kernel above, arena read, arena write.
> > > I guess we can add may_goto to this set as well.
> > > But in all these cases we'll struggle to find an appropriate errno code,
> > > so it probably should be a custom enum and not called "errno".
> >
> > Yeah, agreed that this would be useful, particularly in this case. I'm
> > wondering how we'll end up implementing this.
> > Sounds like it needs to be tied to the program's invocation, so it
> > cannot be per-cpu per-program, since they nest. Most likely should be
> > backed by run_ctx, but that is unavailable in all program types. Next
> > best thing that comes to mind is reserving some space in the stack
> > frame at a known offset in each subprog that invokes this helper, and
> > use that to signal (by finding the program's bp and writing to the
> > stack), the downside being it likely becomes yet-another arch-specific
> > thing. Any other better ideas?
>
> Another option is to lower probe_read to a BPF_PROBE_MEM instruction
> and generate a special kind of exception handler, that would set r0 to
> -EFAULT. (We don't do this already, right? Don't see anything like that
> in verifier.c or x86/../bpf_jit_comp.c).
>
> This would avoid any user-visible changes and address performance
> concern. Not so convenient for a chain of dereferences a->b->c->d,
> though.

Since we're piling on ideas, one of the other things that I think
could be useful in general (and maybe should be done orthogonally to
bpf_errno)
is making some empty nop function and making it not traceable reliably
across arches and invoke it in the bpf exception handler.
Then if we expose prog_stream_dump_stack() as a kfunc (should be
trivial), the user can write anything to stderr that is relevant to
get more information on the fault.

It is then up to the user to decide the rate of messages for such
faults etc. and get more information if needed.


* Re: bpf_errno. Was: [PATCH RFC bpf-next 1/3] bpf: report probe fault to BPF stderr
  2025-10-08 20:08               ` Kumar Kartikeya Dwivedi
@ 2025-10-08 20:30                 ` Eduard Zingerman
  2025-10-08 20:59                   ` Kumar Kartikeya Dwivedi
  2025-10-09 14:29                 ` Leon Hwang
  2025-10-10 12:05                 ` Menglong Dong
  2 siblings, 1 reply; 22+ messages in thread
From: Eduard Zingerman @ 2025-10-08 20:30 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: Alexei Starovoitov, Leon Hwang, Andrii Nakryiko, Menglong Dong,
	Menglong Dong, Alexei Starovoitov, bpf, LKML, linux-trace-kernel,
	jiang.biao

On Wed, 2025-10-08 at 22:08 +0200, Kumar Kartikeya Dwivedi wrote:

[...]

> Since we're piling on ideas, one of the other things that I think
> could be useful in general (and maybe should be done orthogonally to
> bpf_errno)
> is making some empty nop function and making it not traceable reliably
                                                  ^^^^^^^^^^^^^
                                   You mean traceable, right?
		   So that user attaches a bpf program to it,
		  and debugs bpf programs using bpf programs?

> across arches and invoke it in the bpf exception handler.
> Then if we expose prog_stream_dump_stack() as a kfunc (should be
> trivial), the user can write anything to stderr that is relevant to
> get more information on the fault.
> 
> It is then up to the user to decide the rate of messages for such
> faults etc. and get more information if needed.


* Re: bpf_errno. Was: [PATCH RFC bpf-next 1/3] bpf: report probe fault to BPF stderr
  2025-10-08 20:30                 ` Eduard Zingerman
@ 2025-10-08 20:59                   ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 22+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-10-08 20:59 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Alexei Starovoitov, Leon Hwang, Andrii Nakryiko, Menglong Dong,
	Menglong Dong, Alexei Starovoitov, bpf, LKML, linux-trace-kernel,
	jiang.biao

On Wed, 8 Oct 2025 at 22:30, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Wed, 2025-10-08 at 22:08 +0200, Kumar Kartikeya Dwivedi wrote:
>
> [...]
>
> > Since we're piling on ideas, one of the other things that I think
> > could be useful in general (and maybe should be done orthogonally to
> > bpf_errno)
> > is making some empty nop function and making it not traceable reliably
>                                                   ^^^^^^^^^^^^^
>                                    You mean traceable, right?
>                    So that user attaches a bpf program to it,
>                   and debugs bpf programs using bpf programs?

Yeah, sorry, typo.

>
> > across arches and invoke it in the bpf exception handler.
> > Then if we expose prog_stream_dump_stack() as a kfunc (should be
> > trivial), the user can write anything to stderr that is relevant to
> > get more information on the fault.
> >
> > It is then up to the user to decide the rate of messages for such
> > faults etc. and get more information if needed.


* Re: bpf_errno. Was: [PATCH RFC bpf-next 1/3] bpf: report probe fault to BPF stderr
  2025-10-08 16:27         ` bpf_errno. Was: " Alexei Starovoitov
  2025-10-08 17:08           ` Kumar Kartikeya Dwivedi
@ 2025-10-09 14:15           ` Leon Hwang
  2025-10-09 14:45             ` Alexei Starovoitov
  1 sibling, 1 reply; 22+ messages in thread
From: Leon Hwang @ 2025-10-09 14:15 UTC (permalink / raw)
  To: Alexei Starovoitov, Andrii Nakryiko, Kumar Kartikeya Dwivedi
  Cc: Menglong Dong, Menglong Dong, Alexei Starovoitov, bpf, LKML,
	linux-trace-kernel, jiang.biao



On 2025/10/9 00:27, Alexei Starovoitov wrote:
> On Wed, Oct 8, 2025 at 7:41 AM Leon Hwang <hffilwlqm@gmail.com> wrote:
>>
>>
>>
>> On 2025/10/7 14:14, Menglong Dong wrote:
>>> On 2025/10/2 10:03, Alexei Starovoitov wrote:
>>>> On Fri, Sep 26, 2025 at 11:12 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
>>>>>

[...]

>>>
>>> Leon suggested to add a global errno for each BPF programs,
>>> and I haven't dig deeply on this idea yet.
>>>
>>
>> Yeah, as we discussed, a global errno would be a much more lightweight
>> approach for handling such faults.
>>
>> The idea would look like this:
>>
>> DEFINE_PER_CPU(int, bpf_errno);
>>
>> __bpf_kfunc void bpf_errno_clear(void);
>> __bpf_kfunc void bpf_errno_set(int errno);
>> __bpf_kfunc int bpf_errno_get(void);
>>
>> When a fault occurs, the kernel can simply call
>> 'bpf_errno_set(-EFAULT);'.
>>
>> If users want to detect whether a fault happened, they can do:
>>
>> bpf_errno_clear();
>> header = READ_ONCE(skb->network_header);
>> if (header == 0 && bpf_errno_get() == -EFAULT)
>>         /* handle fault */;
>>
>> This way, users can identify faults immediately and handle them gracefully.
>>
>> Furthermore, these kfuncs can be inlined by the verifier, so there would
>> be no runtime function call overhead.
>
> Interesting idea, but errno as-is doesn't quite fit,
> since we only have 2 (or 3 ?) cases without explicit error return:
> probe_read_kernel above, arena read, arena write.
> I guess we can add may_goto to this set as well.
> But in all these cases we'll struggle to find an appropriate errno code,
> so it probably should be a custom enum and not called "errno".

To avoid introducing a global errno, here's a more lightweight approach:

1. Introduce an internal BPF_REG_AUX and a helper
   'bpf_jit_supports_reg_aux()'.
2. Introduce a kfunc 'int bpf_reg_aux(void)'.

When a fault occurs, we can set 'BPF_REG_AUX = -EFAULT;' in
'ex_handler_bpf()'.
Otherwise, 'BPF_REG_AUX = 0;'.

(Alternatively, BPF_REG_AUX can use a custom enum instead of '-EFAULT'.)

If users want to check whether a fault happened, they can do:

header = READ_ONCE(skb->network_header);
if (header == 0 && bpf_reg_aux() == -EFAULT)
        /* handle fault */;

This allows users to detect faults immediately without any extra global
state.

The verifier can rewrite 'bpf_reg_aux()' into the following instructions:

dst_reg = BPF_REG_AUX;
BPF_REG_AUX = 0; /* clear BPF_REG_AUX */

As for the architecture-specific implementation, BPF_REG_AUX can be
mapped to an appropriate register per arch — for example, r11 on x86_64.
The verifier would ensure that BPF_REG_AUX is not clobbered after a
probe read.

As a result, this avoids the need for a global errno and introduces no
runtime function call overhead.
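
A minimal user-space sketch of the read-and-clear semantics described
above. It is a model only: the static variable stands in for the proposed
BPF_REG_AUX (which would be a JIT-reserved register), a NULL pointer stands
in for a faulting address, and all model_* names are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in for the proposed BPF_REG_AUX (hypothetical; in the proposal
 * this is a machine register chosen per arch, not a C variable). */
static long model_reg_aux_storage;

/* Models a probe read. A NULL source stands for a faulting address:
 * the result is 0 and the aux slot is set to -EFAULT, as the exception
 * handler would do in the proposal. Otherwise the aux slot is 0. */
static long model_probe_read(const long *src)
{
    if (src == NULL) {
        model_reg_aux_storage = -EFAULT;
        return 0;
    }
    model_reg_aux_storage = 0;
    return *src;
}

/* Models the proposed verifier rewrite of bpf_reg_aux():
 * dst_reg = BPF_REG_AUX; BPF_REG_AUX = 0; */
static long model_bpf_reg_aux(void)
{
    long v = model_reg_aux_storage;

    model_reg_aux_storage = 0;
    return v;
}
```

With this model, a read of a valid pointer that happens to hold 0 leaves
model_bpf_reg_aux() at 0, while a faulting read makes it return -EFAULT
once and 0 on the next call.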

Thanks,
Leon


* Re: bpf_errno. Was: [PATCH RFC bpf-next 1/3] bpf: report probe fault to BPF stderr
  2025-10-08 20:08               ` Kumar Kartikeya Dwivedi
  2025-10-08 20:30                 ` Eduard Zingerman
@ 2025-10-09 14:29                 ` Leon Hwang
  2025-10-09 15:15                   ` Leon Hwang
  2025-10-10 12:05                 ` Menglong Dong
  2 siblings, 1 reply; 22+ messages in thread
From: Leon Hwang @ 2025-10-09 14:29 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, Eduard Zingerman
  Cc: Alexei Starovoitov, Andrii Nakryiko, Menglong Dong, Menglong Dong,
	Alexei Starovoitov, bpf, LKML, linux-trace-kernel, jiang.biao



On 2025/10/9 04:08, Kumar Kartikeya Dwivedi wrote:
> On Wed, 8 Oct 2025 at 21:34, Eduard Zingerman <eddyz87@gmail.com> wrote:
>>
>> On Wed, 2025-10-08 at 19:08 +0200, Kumar Kartikeya Dwivedi wrote:
>>> On Wed, 8 Oct 2025 at 18:27, Alexei Starovoitov
>>> <alexei.starovoitov@gmail.com> wrote:
>>>>
>>>> On Wed, Oct 8, 2025 at 7:41 AM Leon Hwang <hffilwlqm@gmail.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 2025/10/7 14:14, Menglong Dong wrote:
>>>>>> On 2025/10/2 10:03, Alexei Starovoitov wrote:
>>>>>>> On Fri, Sep 26, 2025 at 11:12 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Introduce the function bpf_prog_report_probe_violation(), which is used
>>>>>>>> to report the memory probe fault to the user by the BPF stderr.
>>>>>>>>
>>>>>>>> Signed-off-by: Menglong Dong <menglong.dong@linux.dev>
>>>>>
>>>>> [...]
>>>>>
>>>>>>>
>>>>>>> Interesting idea, but the above message is not helpful.
>>>>>>> Users cannot decipher a fault_ip within a bpf prog.
>>>>>>> It's just a random number.
>>>>>>
>>>>>> Yeah, I have noticed this too. What useful is the
>>>>>> bpf_stream_dump_stack(), which will print the code
>>>>>> line that trigger the fault.
>>>>>>
>>>>>>> But stepping back... just faults are common in tracing.
>>>>>>> If we start printing them we will just fill the stream to the max,
>>>>>>> but users won't know that the message is there, since no one
>>>>>>
>>>>>> You are right, we definitely can't output this message
>>>>>> to STDERR directly. We can add an extra flag for it, as you
>>>>>> said below.
>>>>>>
>>>>>> Or, maybe we can introduce a enum stream_type, and
>>>>>> the users can subscribe what kind of messages they
>>>>>> want to receive.
>>>>>>
>>>>>>> expects it. arena and lock errors are rare and arena faults
>>>>>>> were specifically requested by folks who develop progs that use arena.
>>>>>>> This one is different. These faults have been around for a long time
>>>>>>> and I don't recall people asking for more verbosity.
>>>>>>> We can add them with an extra flag specified at prog load time,
>>>>>>> but even then. Doesn't feel that useful.
>>>>>>
>>>>>> Generally speaking, users can do invalid checking before
>>>>>> they do the memory reading, such as NULL checking. And
>>>>>> the pointer in function arguments that we hook is initialized
>>>>>> in most case. So the fault is someting that can be prevented.
>>>>>>
>>>>>> I have a BPF tools which is writed for 4.X kernel and kprobe
>>>>>> based BPF is used. Now I'm planing to migrate it to 6.X kernel
>>>>>> and replace bpf_probe_read_kernel() with bpf_core_cast() to
>>>>>> obtain better performance. Then I find that I can't check if the
>>>>>> memory reading is success, which can lead to potential risk.
>>>>>> So my tool will be happy to get such fault event :)
>>>>>>
>>>>>> Leon suggested to add a global errno for each BPF programs,
>>>>>> and I haven't dig deeply on this idea yet.
>>>>>>
>>>>>
>>>>> Yeah, as we discussed, a global errno would be a much more lightweight
>>>>> approach for handling such faults.
>>>>>
>>>>> The idea would look like this:
>>>>>
>>>>> DEFINE_PER_CPU(int, bpf_errno);
>>>>>
>>>>> __bpf_kfunc void bpf_errno_clear(void);
>>>>> __bpf_kfunc void bpf_errno_set(int errno);
>>>>> __bpf_kfunc int bpf_errno_get(void);
>>>>>
>>>>> When a fault occurs, the kernel can simply call
>>>>> 'bpf_errno_set(-EFAULT);'.
>>>>>
>>>>> If users want to detect whether a fault happened, they can do:
>>>>>
>>>>> bpf_errno_clear();
>>>>> header = READ_ONCE(skb->network_header);
>>>>> if (header == 0 && bpf_errno_get() == -EFAULT)
>>>>>         /* handle fault */;
>>>>>
>>>>> This way, users can identify faults immediately and handle them gracefully.
>>>>>
>>>>> Furthermore, these kfuncs can be inlined by the verifier, so there would
>>>>> be no runtime function call overhead.
>>>>
>>>> Interesting idea, but errno as-is doesn't quite fit,
>>>> since we only have 2 (or 3 ?) cases without explicit error return:
>>>> probe_read_kernel above, arena read, arena write.
>>>> I guess we can add may_goto to this set as well.
>>>> But in all these cases we'll struggle to find an appropriate errno code,
>>>> so it probably should be a custom enum and not called "errno".
>>>
>>> Yeah, agreed that this would be useful, particularly in this case. I'm
>>> wondering how we'll end up implementing this.
>>> Sounds like it needs to be tied to the program's invocation, so it
>>> cannot be per-cpu per-program, since they nest. Most likely should be
>>> backed by run_ctx, but that is unavailable in all program types. Next
>>> best thing that comes to mind is reserving some space in the stack
>>> frame at a known offset in each subprog that invokes this helper, and
>>> use that to signal (by finding the program's bp and writing to the
>>> stack), the downside being it likely becomes yet-another arch-specific
>>> thing. Any other better ideas?
>>
>> Another option is to lower probe_read to a BPF_PROBE_MEM instruction
>> and generate a special kind of exception handler, that would set r0 to
>> -EFAULT. (We don't do this already, right? Don't see anything like that
>> in verifier.c or x86/../bpf_jit_comp.c).
>>
>> This would avoid any user-visible changes and address performance
>> concern. Not so convenient for a chain of dereferences a->b->c->d,
>> though.
> 
> Since we're piling on ideas, one of the other things that I think
> could be useful in general (and maybe should be done orthogonally to
> bpf_errno)
> is making some empty nop function and making it not traceable reliably
> across arches and invoke it in the bpf exception handler.

No new traceable function is needed, since ex_handler_bpf itself can
already be traced via fentry.

If users really want to detect whether a fault occurred, they could
attach a program to ex_handler_bpf and record fault events into a map.
However, this approach would be too heavyweight just to check for a
simple fault condition.

Thanks,
Leon

> Then if we expose prog_stream_dump_stack() as a kfunc (should be
> trivial), the user can write anything to stderr that is relevant to
> get more information on the fault.
> 
> It is then up to the user to decide the rate of messages for such
> faults etc. and get more information if needed.



* Re: bpf_errno. Was: [PATCH RFC bpf-next 1/3] bpf: report probe fault to BPF stderr
  2025-10-09 14:15           ` Leon Hwang
@ 2025-10-09 14:45             ` Alexei Starovoitov
  2025-10-10 14:22               ` Leon Hwang
  0 siblings, 1 reply; 22+ messages in thread
From: Alexei Starovoitov @ 2025-10-09 14:45 UTC (permalink / raw)
  To: Leon Hwang
  Cc: Andrii Nakryiko, Kumar Kartikeya Dwivedi, Menglong Dong,
	Menglong Dong, Alexei Starovoitov, bpf, LKML, linux-trace-kernel,
	jiang.biao

On Thu, Oct 9, 2025 at 7:15 AM Leon Hwang <hffilwlqm@gmail.com> wrote:
>
>
> The verifier can rewrite 'bpf_reg_aux()' into the following instructions:
>
> dst_reg = BPF_REG_AUX;
> BPF_REG_AUX = 0; /* clear BPF_REG_AUX */
>
> As for the architecture-specific implementation, BPF_REG_AUX can be
> mapped to an appropriate register per arch — for example, r11 on x86_64.

it's taken. There are no free registers.


* Re: bpf_errno. Was: [PATCH RFC bpf-next 1/3] bpf: report probe fault to BPF stderr
  2025-10-09 14:29                 ` Leon Hwang
@ 2025-10-09 15:15                   ` Leon Hwang
  0 siblings, 0 replies; 22+ messages in thread
From: Leon Hwang @ 2025-10-09 15:15 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, Eduard Zingerman
  Cc: Alexei Starovoitov, Andrii Nakryiko, Menglong Dong, Menglong Dong,
	Alexei Starovoitov, bpf, LKML, linux-trace-kernel, jiang.biao


>>
>> Since we're piling on ideas, one of the other things that I think
>> could be useful in general (and maybe should be done orthogonally to
>> bpf_errno)
>> is making some empty nop function and making it not traceable reliably
>> across arches and invoke it in the bpf exception handler.
> 
> No new traceable function is needed, since ex_handler_bpf itself can
> already be traced via fentry.
> 
> If users really want to detect whether a fault occurred, they could
> attach a program to ex_handler_bpf and record fault events into a map.
> However, this approach would be too heavyweight just to check for a
> simple fault condition.
> 

As ex_handler_bpf can already be traced using fentry, a potential
approach without modifying the kernel would be:

1. In the fentry program:

int is_fault SEC(".percpu.fault");

SEC("fentry/ex_handler_bpf")
int BPF_PROG(f__ex, const struct exception_table_entry *x,
             struct pt_regs *regs)
{
    is_fault = 1;
    return 0;
}

2. In the main program:

int is_fault SEC(".percpu.fault");

is_fault = 0;
/* probe read */
if (is_fault)
    /* handle fault */;

The main idea is that both programs share the same ".percpu.fault" map,
so the variable 'is_fault' can be accessed from both sides.

Here, ".percpu.fault" represents a percpu_array map section, which is
expected to be supported in the future.
In the meantime, it can simply be replaced with a regular percpu_array map.

Finally, this approach is conceptually similar to the idea of using a
global errno.
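
A minimal user-space sketch of the clear/read/check pattern above, and of
the nesting concern raised earlier in the thread for per-CPU state. It is a
model only: one static int stands in for this CPU's slot of 'is_fault', a
NULL pointer stands in for a faulting address, and all percpu_model_* names
are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for one CPU's slot of the shared per-CPU 'is_fault' variable. */
static int percpu_model_is_fault;

/* Models the fentry program on ex_handler_bpf: it only sets the flag. */
static void percpu_model_fentry(void)
{
    percpu_model_is_fault = 1;
}

/* Models a probe read; a NULL source stands for a faulting address. */
static long percpu_model_read(const long *src)
{
    if (src == NULL) {
        percpu_model_fentry();
        return 0;
    }
    return *src;
}

/* Main-program pattern: clear, read, check. 'nested' optionally models
 * another program running on the same CPU between the read and the check. */
static bool percpu_model_main_prog(const long *src, void (*nested)(void))
{
    percpu_model_is_fault = 0;
    (void)percpu_model_read(src);
    if (nested)
        nested();
    return percpu_model_is_fault != 0;
}

/* A nested program that uses the same pattern with a valid pointer. */
static void percpu_model_nested_prog(void)
{
    static const long ok = 42;

    percpu_model_is_fault = 0;
    (void)percpu_model_read(&ok);
}
```

In this model, percpu_model_main_prog(NULL, NULL) reports the fault, but
percpu_model_main_prog(NULL, percpu_model_nested_prog) does not, because
the nested program clears the shared flag first. Whether and how often
such nesting happens in practice depends on the program types involved.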

Thanks,
Leon



* Re: bpf_errno. Was: [PATCH RFC bpf-next 1/3] bpf: report probe fault to BPF stderr
  2025-10-08 20:08               ` Kumar Kartikeya Dwivedi
  2025-10-08 20:30                 ` Eduard Zingerman
  2025-10-09 14:29                 ` Leon Hwang
@ 2025-10-10 12:05                 ` Menglong Dong
  2025-10-10 15:10                   ` Menglong Dong
  2025-10-10 18:55                   ` Eduard Zingerman
  2 siblings, 2 replies; 22+ messages in thread
From: Menglong Dong @ 2025-10-10 12:05 UTC (permalink / raw)
  To: Eduard Zingerman, Kumar Kartikeya Dwivedi, Alexei Starovoitov,
	Leon Hwang
  Cc: Andrii Nakryiko, Menglong Dong, Alexei Starovoitov, bpf, LKML,
	linux-trace-kernel, jiang.biao

On 2025/10/9 04:08, Kumar Kartikeya Dwivedi wrote:
> On Wed, 8 Oct 2025 at 21:34, Eduard Zingerman <eddyz87@gmail.com> wrote:
> >
> > On Wed, 2025-10-08 at 19:08 +0200, Kumar Kartikeya Dwivedi wrote:
> > > On Wed, 8 Oct 2025 at 18:27, Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > > On Wed, Oct 8, 2025 at 7:41 AM Leon Hwang <hffilwlqm@gmail.com> wrote:
> > > > >
> > > > >
> > > > >
> > > > > On 2025/10/7 14:14, Menglong Dong wrote:
> > > > > > On 2025/10/2 10:03, Alexei Starovoitov wrote:
> > > > > > > On Fri, Sep 26, 2025 at 11:12 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Introduce the function bpf_prog_report_probe_violation(), which is used
> > > > > > > > to report the memory probe fault to the user by the BPF stderr.
> > > > > > > >
> > > > > > > > Signed-off-by: Menglong Dong <menglong.dong@linux.dev>
> > > > >
> > > > > [...]
> > > > >
[......]
> > >
> > > Yeah, agreed that this would be useful, particularly in this case. I'm
> > > wondering how we'll end up implementing this.
> > > Sounds like it needs to be tied to the program's invocation, so it
> > > cannot be per-cpu per-program, since they nest. Most likely should be
> > > backed by run_ctx, but that is unavailable in all program types. Next
> > > best thing that comes to mind is reserving some space in the stack
> > > frame at a known offset in each subprog that invokes this helper, and
> > > use that to signal (by finding the program's bp and writing to the
> > > stack), the downside being it likely becomes yet-another arch-specific
> > > thing. Any other better ideas?
> >
> > Another option is to lower probe_read to a BPF_PROBE_MEM instruction
> > and generate a special kind of exception handler, that would set r0 to
> > -EFAULT. (We don't do this already, right? Don't see anything like that
> > in verifier.c or x86/../bpf_jit_comp.c).
> >
> > This would avoid any user-visible changes and address performance
> > concern. Not so convenient for a chain of dereferences a->b->c->d,
> > though.
> 
> Since we're piling on ideas, one of the other things that I think
> could be useful in general (and maybe should be done orthogonally to
> bpf_errno)
> is making some empty nop function and making it not traceable reliably
> across arches and invoke it in the bpf exception handler.
> Then if we expose prog_stream_dump_stack() as a kfunc (should be
> trivial), the user can write anything to stderr that is relevant to
> get more information on the fault.

Thanks for all the ideas! So we have the following approaches
to this problem:

may_goto (Kumar)
--------------------------
Hmm... I haven't figured out how this would work for this problem yet.

"may_goto" is a conditional break; does that mean we would introduce
a "condition_fault"? Would it need compiler support?

I'm not sure if this is the right understanding: save the fault
type (PROBE_FAULT, ARENA_READ_FAULT, ARENA_WRITE_FAULT) to
the stack or the run_ctx, and the "if (condition_fault)" will be
replaced with "if (__stack or run_ctx)".

save errno to r0 (Eduard)
-----------------------------------
Save the errno to r0 in the exception handler of BPF_PROBE_MEM,
and read r0 with a kfunc in the BPF program. (Not sure if I understand
it correctly.)

This sounds effective, but won't this break the usage of r0? I mean,
r0 may already be in use by the BPF program elsewhere.

trace error event (Kumar)
------------------------------------
Call an empty, traceable function in the exception handler.

This may be the simplest way, and I think the only shortcoming
is that there may be some noise, as the BPF program can
receive fault events from other BPF users.

And maybe it's better to pass the bpf prog as an argument of
the empty function, so that users can do some filtering. Or, we
can introduce a tracepoint for this purpose.

And I think this is similar to what Leon suggested later.

bpf errno (Leon)
----------------------
Introduce a percpu variable and save -EFAULT to it in the
exception handler. Introduce kfuncs to read, set and
clear the errno.

output the error information directly to STDERR (Menglong)
--------------------------------------------------------------------------------------
As described in this series.

Ah... it seems we have many approaches here, and most
of them work. Does anyone have thoughts on these ideas?
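
For the "trace error event" approach, a minimal user-space sketch of why
passing the faulting program to the hook helps with filtering. It is a
model only: the enum values follow the three fault cases named in the
thread, programs are identified by a plain integer id, and all model_*
names are hypothetical rather than existing kernel interfaces.

```c
#include <stddef.h>

/* Hypothetical fault kinds, following the cases named in the thread. */
enum model_fault_kind {
    MODEL_PROBE_FAULT,
    MODEL_ARENA_READ_FAULT,
    MODEL_ARENA_WRITE_FAULT,
};

/* Hypothetical listener state: counts faults of one program only. */
struct model_listener {
    unsigned int want_prog_id;
    unsigned int seen[3];
};

/* The listener currently attached to the hook, if any. */
static struct model_listener *model_attached;

/* Models the program attached to the hook; it filters on the prog id
 * so that faults from other BPF users are ignored. */
static void model_listener_run(struct model_listener *l,
                               unsigned int prog_id,
                               enum model_fault_kind kind)
{
    if (prog_id != l->want_prog_id)
        return;
    l->seen[kind]++;
}

/* Models the empty, traceable function called from the exception
 * handler, with the faulting program passed as an argument. */
static void model_fault_hook(unsigned int prog_id,
                             enum model_fault_kind kind)
{
    if (model_attached)
        model_listener_run(model_attached, prog_id, kind);
}
```

Without the prog_id argument the listener would have to count every fault
on the system, which is the noise mentioned above.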

Thanks!
Menglong Dong

> 
> It is then up to the user to decide the rate of messages for such
> faults etc. and get more information if needed.
> 






* Re: bpf_errno. Was: [PATCH RFC bpf-next 1/3] bpf: report probe fault to BPF stderr
  2025-10-09 14:45             ` Alexei Starovoitov
@ 2025-10-10 14:22               ` Leon Hwang
  0 siblings, 0 replies; 22+ messages in thread
From: Leon Hwang @ 2025-10-10 14:22 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, Kumar Kartikeya Dwivedi, Menglong Dong,
	Menglong Dong, Alexei Starovoitov, bpf, LKML, linux-trace-kernel,
	jiang.biao

On 2025/10/9 22:45, Alexei Starovoitov wrote:
> On Thu, Oct 9, 2025 at 7:15 AM Leon Hwang <hffilwlqm@gmail.com> wrote:
>>
>>
>> The verifier can rewrite 'bpf_reg_aux()' into the following instructions:
>>
>> dst_reg = BPF_REG_AUX;
>> BPF_REG_AUX = 0; /* clear BPF_REG_AUX */
>>
>> As for the architecture-specific implementation, BPF_REG_AUX can be
>> mapped to an appropriate register per arch — for example, r11 on x86_64.
> 
> it's taken. There are no free registers.

Understood.

It would certainly be beneficial if there were available registers on
x86_64, as that would enable certain optimizations and improvements.

In a similar direction, I have been exploring the idea of introducing a
dedicated BPF_REG_TAIL_CALL register to unify the handling of
tail_call_cnt in the verifier. This could help standardize the logic
across architectures, particularly for those that already employ a
dedicated register for tail calls, and allow JIT backends to simplify
their tail call implementations accordingly.

Thanks,
Leon

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: bpf_errno. Was: [PATCH RFC bpf-next 1/3] bpf: report probe fault to BPF stderr
  2025-10-10 12:05                 ` Menglong Dong
@ 2025-10-10 15:10                   ` Menglong Dong
  2025-10-10 18:55                   ` Eduard Zingerman
  1 sibling, 0 replies; 22+ messages in thread
From: Menglong Dong @ 2025-10-10 15:10 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Kumar Kartikeya Dwivedi, Alexei Starovoitov, Leon Hwang,
	Andrii Nakryiko, Menglong Dong, Alexei Starovoitov, bpf, LKML,
	linux-trace-kernel, jiang.biao

On 2025/10/10 20:05, Menglong Dong wrote:
> On 2025/10/9 04:08, Kumar Kartikeya Dwivedi wrote:
> > On Wed, 8 Oct 2025 at 21:34, Eduard Zingerman <eddyz87@gmail.com> wrote:
> > >
> > > On Wed, 2025-10-08 at 19:08 +0200, Kumar Kartikeya Dwivedi wrote:
> > > > On Wed, 8 Oct 2025 at 18:27, Alexei Starovoitov
> > > > <alexei.starovoitov@gmail.com> wrote:
> > > > >
> > > > > On Wed, Oct 8, 2025 at 7:41 AM Leon Hwang <hffilwlqm@gmail.com> wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > On 2025/10/7 14:14, Menglong Dong wrote:
> > > > > > > On 2025/10/2 10:03, Alexei Starovoitov wrote:
> > > > > > > > On Fri, Sep 26, 2025 at 11:12 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Introduce the function bpf_prog_report_probe_violation(), which is used
> > > > > > > > > to report the memory probe fault to the user by the BPF stderr.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Menglong Dong <menglong.dong@linux.dev>
> > > > > >
> > > > > > [...]
> > > > > >
> [......]
> > > >
> > > > Yeah, agreed that this would be useful, particularly in this case. I'm
> > > > wondering how we'll end up implementing this.
> > > > Sounds like it needs to be tied to the program's invocation, so it
> > > > cannot be per-cpu per-program, since they nest. Most likely should be
> > > > backed by run_ctx, but that is unavailable in all program types. Next
> > > > best thing that comes to mind is reserving some space in the stack
> > > > frame at a known offset in each subprog that invokes this helper, and
> > > > use that to signal (by finding the program's bp and writing to the
> > > > stack), the downside being it likely becomes yet-another arch-specific
> > > > thing. Any other better ideas?
> > >
> > > Another option is to lower probe_read to a BPF_PROBE_MEM instruction
> > > and generate a special kind of exception handler, that would set r0 to
> > > -EFAULT. (We don't do this already, right? Don't see anything like that
> > > in verifier.c or x86/../bpf_jit_comp.c).
> > >
> > > This would avoid any user-visible changes and address performance
> > > concern. Not so convenient for a chain of dereferences a->b->c->d,
> > > though.
> > 
> > Since we're piling on ideas, one of the other things that I think
> > could be useful in general (and maybe should be done orthogonally to
> > bpf_errno)
> > is making some empty nop function and making it not traceable reliably
> > across arches and invoke it in the bpf exception handler.
> > Then if we expose prog_stream_dump_stack() as a kfunc (should be
> > trivial), the user can write anything to stderr that is relevant to
> > get more information on the fault.
> 
> Thanks for all the ideas! So we have the following approaches
> to this problem:
> 
> may_goto(Kumar)
> --------------------------
> Hmm......I haven't figured out how this works for this problem yet.
> 
> "may_goto" is a conditional break; does it mean that we introduce
> a "condition_fault"? Will it need compiler support?
> 
> I'm not sure if this is the right understanding: save the fault
> type(PROBE_FAULT, AREA_READ_FAULT, AREA_WRITE_FAULT) to
> the stack or the run_ctx, and the "if (condition_fault)" will be
> replaced with "if (__stack or run_ctx)".
> 
> save errno to r0(Eduard)
> -----------------------------------
> Save the errno to r0 in the exception handler of BPF_PROBE_MEM,
> and read r0 with a __kfunc in the BPF program. (Not sure if I
> understand it correctly.)
> 
> This sounds effective, but won't it break the usage of r0? I mean,
> r0 may already be in use elsewhere in the BPF program.

I think I understand it a bit better now:

int a, *b;

b = xxx;
a = *b; // the verifier inserts "r0 = 0" before this insn;
        // if a fault happens, r0 becomes -EFAULT
if (bpf_probe_fault()) // the verifier rewrites this to "if (r0)"
	return;

Am I right?

Thanks!
Menglong Dong

> 
> trace error event(Kumar)
> ------------------------------------
> Call an empty, traceable function in the exception handler.
> 
> This may be the simplest way, and I think the only shortcoming
> is that there may be some noise, as the BPF program can
> receive fault events from other BPF users.
> 
> And maybe it's better to pass the bpf prog as an argument of
> the empty function, so that users can do some filtering. Or, we
> can introduce a tracepoint for this purpose.
> 
> And I think this is similar to what Leon suggested later.
> 
> bpf errno(Leon)
> ----------------------
> Introduce a percpu variable and save -EFAULT to it in the
> exception handler. Introduce __kfuncs to read, set and
> clear the errno.
> 
> output the error information directly to STDERR(Menglong)
> --------------------------------------------------------------------------------------
> As described in this series.
> 
> Ah......it seems we have many approaches here, and most
> of them work. Any thoughts on which one we should pursue?
> 
> Thanks!
> Menglong Dong
> 
> > 
> > It is then up to the user to decide the rate of messages for such
> > faults etc. and get more information if needed.
> > 
> 
> 
> 
> 
> 
> 





^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: bpf_errno. Was: [PATCH RFC bpf-next 1/3] bpf: report probe fault to BPF stderr
  2025-10-10 12:05                 ` Menglong Dong
  2025-10-10 15:10                   ` Menglong Dong
@ 2025-10-10 18:55                   ` Eduard Zingerman
  2025-10-11  1:23                     ` Menglong Dong
  1 sibling, 1 reply; 22+ messages in thread
From: Eduard Zingerman @ 2025-10-10 18:55 UTC (permalink / raw)
  To: Menglong Dong, Kumar Kartikeya Dwivedi, Alexei Starovoitov,
	Leon Hwang
  Cc: Andrii Nakryiko, Menglong Dong, Alexei Starovoitov, bpf, LKML,
	linux-trace-kernel, jiang.biao

On Fri, 2025-10-10 at 20:05 +0800, Menglong Dong wrote:

[...]

> save errno to r0(Eduard)
> -----------------------------------
> Save the errno to r0 in the exception handler of BPF_PROBE_MEM,
> and read r0 with a __kfunc in the BPF program. (Not sure if I
> understand it correctly.)
> 
> This sounds effective, but won't it break the usage of r0? I mean,
> r0 may already be in use elsewhere in the BPF program.

What I meant is that for cases when someone wants to check for a memory
access error, there is already bpf_probe_read_kernel(). Its return
value is in r0 and is defined for both success and failure cases.

The problem with it is that it has function call overhead.
But we can work around that for 1-, 2-, 4- and 8-byte accesses by
replacing the helper call with some `BPF_LDX | BPF_PROBE_MEM1 | <size>`,
where BPF_PROBE_MEM1 is different from BPF_PROBE_MEM and tells the
JIT that the exception handler for this memory access needs to set
r0 to -EFAULT if it is executed.

The inconvenient part here is that one can't do chaining,
like a->b->c, using bpf_probe_read_kernel().
One needs to insert a bpf_probe_read_kernel() call at each step of the
chain, which is a bit of a pain. Maybe it can be alleviated using
some vararg macro.
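
To illustrate the per-step checking described above, here is a small
user-space sketch. mock_probe_read() is only a stand-in for
bpf_probe_read_kernel() (it takes a base pointer plus offset, and a NULL
base plays the role of a faulting address); the struct names and the
READ_FIELD_OR_RETURN macro are hypothetical and not an existing API:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct c_node { int value; };
struct b_node { struct c_node *c; };
struct a_node { struct b_node *b; };

/* Mock checked read: 0 on success, -EFAULT (and zeroed dst) on "fault". */
static long mock_probe_read(void *dst, size_t size,
			    const void *base, size_t off)
{
	if (!base) {
		memset(dst, 0, size);
		return -EFAULT;
	}
	memcpy(dst, (const char *)base + off, size);
	return 0;
}

/* One step of the chain: read a field, or return the error to the caller. */
#define READ_FIELD_OR_RETURN(dst, base, type, field)			\
	do {								\
		long err_ = mock_probe_read(&(dst), sizeof(dst), (base), \
					    offsetof(type, field));	\
		if (err_)						\
			return err_;					\
	} while (0)

/* a->b->c->value, with an explicit check at every dereference. */
static long read_a_b_c(const struct a_node *a, int *out)
{
	struct b_node *b;
	struct c_node *c;

	READ_FIELD_OR_RETURN(b, a, struct a_node, b);
	READ_FIELD_OR_RETURN(c, b, struct b_node, c);
	READ_FIELD_OR_RETURN(*out, c, struct c_node, value);
	return 0;
}
```

The macro hides the repeated "read, check, bail out" pattern, but each
step is still a separate checked access, which is the inconvenience
compared with writing a->b->c directly.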

[...]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: bpf_errno. Was: [PATCH RFC bpf-next 1/3] bpf: report probe fault to BPF stderr
  2025-10-10 18:55                   ` Eduard Zingerman
@ 2025-10-11  1:23                     ` Menglong Dong
  0 siblings, 0 replies; 22+ messages in thread
From: Menglong Dong @ 2025-10-11  1:23 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Kumar Kartikeya Dwivedi, Alexei Starovoitov, Leon Hwang,
	Andrii Nakryiko, Menglong Dong, Alexei Starovoitov, bpf, LKML,
	linux-trace-kernel, jiang.biao

On 2025/10/11 02:55, Eduard Zingerman wrote:
> On Fri, 2025-10-10 at 20:05 +0800, Menglong Dong wrote:
> 
> [...]
> 
> > save errno to r0(Eduard)
> > -----------------------------------
> > Save the errno to r0 in the exception handler of BPF_PROBE_MEM,
> > and read r0 with a __kfunc in the BPF program. (Not sure if I
> > understand it correctly.)
> > 
> > This sounds effective, but won't it break the usage of r0? I mean,
> > r0 may already be in use elsewhere in the BPF program.
> 
> What I meant is that for cases when someone wants to check for a memory
> access error, there is already bpf_probe_read_kernel(). Its return
> value is in r0 and is defined for both success and failure cases.
> 
> The problem with it is that it has function call overhead.
> But we can work around that for 1-, 2-, 4- and 8-byte accesses by
> replacing the helper call with some `BPF_LDX | BPF_PROBE_MEM1 | <size>`,
> where BPF_PROBE_MEM1 is different from BPF_PROBE_MEM and tells the
> JIT that the exception handler for this memory access needs to set
> r0 to -EFAULT if it is executed.
> 
> The inconvenient part here is that one can't do chaining,
> like a->b->c, using bpf_probe_read_kernel().
> One needs to insert a bpf_probe_read_kernel() call at each step of the
> chain, which is a bit of a pain. Maybe it can be alleviated using
> some vararg macro.

Thanks for the explanation, I see now.

Interesting idea. I think this is something we can do despite
the chaining inconvenience, as it can optimize the performance of
bpf_probe_read_kernel().

Thanks!
Menglong Dong

> 
> [...]
> 
> 





^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2025-10-11  1:23 UTC | newest]

Thread overview: 22+ messages
2025-09-27  6:12 [PATCH RFC bpf-next 0/3] bpf: report probe fault to BPF stderr Menglong Dong
2025-09-27  6:12 ` [PATCH RFC bpf-next 1/3] " Menglong Dong
2025-10-02  2:03   ` Alexei Starovoitov
2025-10-07  6:14     ` Menglong Dong
2025-10-08 14:40       ` Leon Hwang
2025-10-08 16:27         ` bpf_errno. Was: " Alexei Starovoitov
2025-10-08 17:08           ` Kumar Kartikeya Dwivedi
2025-10-08 19:34             ` Eduard Zingerman
2025-10-08 20:08               ` Kumar Kartikeya Dwivedi
2025-10-08 20:30                 ` Eduard Zingerman
2025-10-08 20:59                   ` Kumar Kartikeya Dwivedi
2025-10-09 14:29                 ` Leon Hwang
2025-10-09 15:15                   ` Leon Hwang
2025-10-10 12:05                 ` Menglong Dong
2025-10-10 15:10                   ` Menglong Dong
2025-10-10 18:55                   ` Eduard Zingerman
2025-10-11  1:23                     ` Menglong Dong
2025-10-09 14:15           ` Leon Hwang
2025-10-09 14:45             ` Alexei Starovoitov
2025-10-10 14:22               ` Leon Hwang
2025-09-27  6:12 ` [PATCH RFC bpf-next 2/3] x86,bpf: use bpf_prog_report_probe_violation for x86 Menglong Dong
2025-09-27  6:12 ` [PATCH RFC bpf-next 3/3] selftests/bpf: add testcase for probe read fault Menglong Dong
