[PATCH bpf-next] bpf: remove is_return in struct bpf_session_run

linux-trace-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH bpf-next] bpf: remove is_return in struct bpf_session_run_ctx
@ 2025-09-22  9:57 Menglong Dong
  2025-09-22 14:08 ` Song Liu
  2025-09-23  7:52 ` kernel test robot
  0 siblings, 2 replies; 6+ messages in thread
From: Menglong Dong @ 2025-09-22  9:57 UTC (permalink / raw)
  To: jolsa
  Cc: kpsingh, mattbobrowski, song, ast, daniel, andrii, martin.lau,
	eddyz87, yonghong.song, john.fastabend, sdf, haoluo, rostedt,
	mhiramat, mathieu.desnoyers, bpf, linux-kernel,
	linux-trace-kernel

The "data" in struct bpf_session_run_ctx is always 8-bytes aligned.
Therefore, we can store the "is_return" to the last bit of the "data",
which can make bpf_session_run_ctx 8-bytes aligned and save memory.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
 kernel/trace/bpf_trace.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index f2360579658e..b6a34aa039f6 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2269,7 +2269,6 @@ fs_initcall(bpf_event_init);
 
 struct bpf_session_run_ctx {
 	struct bpf_run_ctx run_ctx;
-	bool is_return;
 	void *data;
 };
 
@@ -2535,8 +2534,7 @@ kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,
 {
 	struct bpf_kprobe_multi_run_ctx run_ctx = {
 		.session_ctx = {
-			.is_return = is_return,
-			.data = data,
+			.data = (void *)((unsigned long)data | !!is_return),
 		},
 		.link = link,
 		.entry_ip = entry_ip,
@@ -3058,8 +3056,7 @@ static int uprobe_prog_run(struct bpf_uprobe *uprobe,
 	struct bpf_uprobe_multi_link *link = uprobe->link;
 	struct bpf_uprobe_multi_run_ctx run_ctx = {
 		.session_ctx = {
-			.is_return = is_return,
-			.data = data,
+			.data = (void *)((unsigned long)data | !!is_return),
 		},
 		.entry_ip = entry_ip,
 		.uprobe = uprobe,
@@ -3316,7 +3313,7 @@ __bpf_kfunc bool bpf_session_is_return(void)
 	struct bpf_session_run_ctx *session_ctx;
 
 	session_ctx = container_of(current->bpf_ctx, struct bpf_session_run_ctx, run_ctx);
-	return session_ctx->is_return;
+	return (unsigned long)session_ctx->data & 1;
 }
 
 __bpf_kfunc __u64 *bpf_session_cookie(void)
@@ -3324,7 +3321,7 @@ __bpf_kfunc __u64 *bpf_session_cookie(void)
 	struct bpf_session_run_ctx *session_ctx;
 
 	session_ctx = container_of(current->bpf_ctx, struct bpf_session_run_ctx, run_ctx);
-	return session_ctx->data;
+	return (__u64 *)((unsigned long)session_ctx->data & ~1);
 }
 
 __bpf_kfunc_end_defs();
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH bpf-next] bpf: remove is_return in struct bpf_session_run_ctx
  2025-09-22  9:57 [PATCH bpf-next] bpf: remove is_return in struct bpf_session_run_ctx Menglong Dong
@ 2025-09-22 14:08 ` Song Liu
  2025-09-22 14:10   ` Menglong Dong
  2025-09-23  7:52 ` kernel test robot
  1 sibling, 1 reply; 6+ messages in thread
From: Song Liu @ 2025-09-22 14:08 UTC (permalink / raw)
  To: Menglong Dong
  Cc: jolsa, kpsingh, mattbobrowski, ast, daniel, andrii, martin.lau,
	eddyz87, yonghong.song, john.fastabend, sdf, haoluo, rostedt,
	mhiramat, mathieu.desnoyers, bpf, linux-kernel,
	linux-trace-kernel

On Mon, Sep 22, 2025 at 11:57 AM Menglong Dong <menglong8.dong@gmail.com> wrote:
>
> The "data" in struct bpf_session_run_ctx is always 8-bytes aligned.
> Therefore, we can store the "is_return" to the last bit of the "data",
> which can make bpf_session_run_ctx 8-bytes aligned and save memory.

Does this really save anything? AFAICT, bpf_session_run_ctx is
only allocated on the stack. Therefore, we don't save any memory
unless there is potential risk of stack overflow.

OTOH, this last-bit logic is confusing and error prone. I would argue
against this type of optimization.

Thanks,
Song

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH bpf-next] bpf: remove is_return in struct bpf_session_run_ctx
  2025-09-22 14:08 ` Song Liu
@ 2025-09-22 14:10   ` Menglong Dong
  2025-09-23 19:23     ` Alexei Starovoitov
  0 siblings, 1 reply; 6+ messages in thread
From: Menglong Dong @ 2025-09-22 14:10 UTC (permalink / raw)
  To: Song Liu
  Cc: jolsa, kpsingh, mattbobrowski, ast, daniel, andrii, martin.lau,
	eddyz87, yonghong.song, john.fastabend, sdf, haoluo, rostedt,
	mhiramat, mathieu.desnoyers, bpf, linux-kernel,
	linux-trace-kernel

On Mon, Sep 22, 2025 at 10:08 PM Song Liu <song@kernel.org> wrote:
>
> On Mon, Sep 22, 2025 at 11:57 AM Menglong Dong <menglong8.dong@gmail.com> wrote:
> >
> > The "data" in struct bpf_session_run_ctx is always 8-bytes aligned.
> > Therefore, we can store the "is_return" to the last bit of the "data",
> > which can make bpf_session_run_ctx 8-bytes aligned and save memory.
>
> Does this really save anything? AFAICT, bpf_session_run_ctx is
> only allocated on the stack. Therefore, we don't save any memory
> unless there is potential risk of stack overflow.

Hi, Song. My original intention is to save the usage of the
stack to prevent potential stack overflow, especially when we
trace all the kernel functions with kprobe-multi.

The most thing for me is that the unaligned field in the struct
looks very awkward, and it consumes 8-bytes only for a bit.

>
> OTOH, this last-bit logic is confusing and error prone. I would argue
> against this type of optimization.

Ah, you are right about this part. It does make the code more
confusing :/

Thanks!
Menglong Dong

>
> Thanks,
> Song

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH bpf-next] bpf: remove is_return in struct bpf_session_run_ctx
  2025-09-22  9:57 [PATCH bpf-next] bpf: remove is_return in struct bpf_session_run_ctx Menglong Dong
  2025-09-22 14:08 ` Song Liu
@ 2025-09-23  7:52 ` kernel test robot
  1 sibling, 0 replies; 6+ messages in thread
From: kernel test robot @ 2025-09-23  7:52 UTC (permalink / raw)
  To: Menglong Dong, jolsa
  Cc: oe-kbuild-all, kpsingh, mattbobrowski, song, ast, daniel, andrii,
	martin.lau, eddyz87, yonghong.song, john.fastabend, sdf, haoluo,
	rostedt, mhiramat, mathieu.desnoyers, bpf, linux-kernel,
	linux-trace-kernel

Hi Menglong,

kernel test robot noticed the following build warnings:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Menglong-Dong/bpf-remove-is_return-in-struct-bpf_session_run_ctx/20250922-175833
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20250922095705.252519-1-dongml2%40chinatelecom.cn
patch subject: [PATCH bpf-next] bpf: remove is_return in struct bpf_session_run_ctx
config: i386-randconfig-r113-20250923 (https://download.01.org/0day-ci/archive/20250923/202509231550.BWhLcP9e-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250923/202509231550.BWhLcP9e-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202509231550.BWhLcP9e-lkp@intel.com/

sparse warnings: (new ones prefixed by >>)
   kernel/trace/bpf_trace.c:833:41: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected void [noderef] __user *[addressable] [assigned] [usertype] sival_ptr @@     got void * @@
   kernel/trace/bpf_trace.c:833:41: sparse:     expected void [noderef] __user *[addressable] [assigned] [usertype] sival_ptr
   kernel/trace/bpf_trace.c:833:41: sparse:     got void *
>> kernel/trace/bpf_trace.c:3059:62: sparse: sparse: dubious: x | !y
   kernel/trace/bpf_trace.c:3512:52: sparse: sparse: cast removes address space '__user' of expression
   kernel/trace/bpf_trace.c:3526:56: sparse: sparse: cast removes address space '__user' of expression
   kernel/trace/bpf_trace.c:3540:52: sparse: sparse: cast removes address space '__user' of expression
   kernel/trace/bpf_trace.c:3547:56: sparse: sparse: cast removes address space '__user' of expression
   kernel/trace/bpf_trace.c:3555:52: sparse: sparse: cast removes address space '__user' of expression
   kernel/trace/bpf_trace.c:3563:56: sparse: sparse: cast removes address space '__user' of expression
   kernel/trace/bpf_trace.c: note: in included file (through include/linux/rbtree.h, include/linux/mm_types.h, include/linux/mmzone.h, ...):
   include/linux/rcupdate.h:869:9: sparse: sparse: context imbalance in 'uprobe_prog_run' - unexpected unlock

vim +3059 kernel/trace/bpf_trace.c

  3050	
  3051	static int uprobe_prog_run(struct bpf_uprobe *uprobe,
  3052				   unsigned long entry_ip,
  3053				   struct pt_regs *regs,
  3054				   bool is_return, void *data)
  3055	{
  3056		struct bpf_uprobe_multi_link *link = uprobe->link;
  3057		struct bpf_uprobe_multi_run_ctx run_ctx = {
  3058			.session_ctx = {
> 3059				.data = (void *)((unsigned long)data | !!is_return),
  3060			},
  3061			.entry_ip = entry_ip,
  3062			.uprobe = uprobe,
  3063		};
  3064		struct bpf_prog *prog = link->link.prog;
  3065		bool sleepable = prog->sleepable;
  3066		struct bpf_run_ctx *old_run_ctx;
  3067		int err;
  3068	
  3069		if (link->task && !same_thread_group(current, link->task))
  3070			return 0;
  3071	
  3072		if (sleepable)
  3073			rcu_read_lock_trace();
  3074		else
  3075			rcu_read_lock();
  3076	
  3077		migrate_disable();
  3078	
  3079		old_run_ctx = bpf_set_run_ctx(&run_ctx.session_ctx.run_ctx);
  3080		err = bpf_prog_run(link->link.prog, regs);
  3081		bpf_reset_run_ctx(old_run_ctx);
  3082	
  3083		migrate_enable();
  3084	
  3085		if (sleepable)
  3086			rcu_read_unlock_trace();
  3087		else
  3088			rcu_read_unlock();
  3089		return err;
  3090	}
  3091	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH bpf-next] bpf: remove is_return in struct bpf_session_run_ctx
  2025-09-22 14:10   ` Menglong Dong
@ 2025-09-23 19:23     ` Alexei Starovoitov
  2025-09-24  2:03       ` menglong.dong
  0 siblings, 1 reply; 6+ messages in thread
From: Alexei Starovoitov @ 2025-09-23 19:23 UTC (permalink / raw)
  To: Menglong Dong
  Cc: Song Liu, Jiri Olsa, KP Singh, Matt Bobrowski, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Eduard,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, bpf, LKML,
	linux-trace-kernel

On Mon, Sep 22, 2025 at 7:11 AM Menglong Dong <menglong8.dong@gmail.com> wrote:
>
> On Mon, Sep 22, 2025 at 10:08 PM Song Liu <song@kernel.org> wrote:
> >
> > On Mon, Sep 22, 2025 at 11:57 AM Menglong Dong <menglong8.dong@gmail.com> wrote:
> > >
> > > The "data" in struct bpf_session_run_ctx is always 8-bytes aligned.
> > > Therefore, we can store the "is_return" to the last bit of the "data",
> > > which can make bpf_session_run_ctx 8-bytes aligned and save memory.
> >
> > Does this really save anything? AFAICT, bpf_session_run_ctx is
> > only allocated on the stack. Therefore, we don't save any memory
> > unless there is potential risk of stack overflow.
>
> Hi, Song. My original intention is to save the usage of the
> stack to prevent potential stack overflow,

8 bytes won't matter, but wasting 8 bytes for 1 bit is indeed annoying.

> especially when we
> trace all the kernel functions with kprobe-multi.

What do you mean? kprobe-multi won't recurse,
so tracing all or a few functions is the same concern
from stack overflow pov, no ?

> The most thing for me is that the unaligned field in the struct
> looks very awkward, and it consumes 8-bytes only for a bit.

let's keep it as-is. If stack overflow is indeed an issue we need
a generic way to detect it and prevent it.
We've been thinking whether vmap stack guard pages
can become JIT's extable-like things, so when stack overflow
happens we unwind stack and stop bpf prog instead of panicing.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH bpf-next] bpf: remove is_return in struct bpf_session_run_ctx
  2025-09-23 19:23     ` Alexei Starovoitov
@ 2025-09-24  2:03       ` menglong.dong
  0 siblings, 0 replies; 6+ messages in thread
From: menglong.dong @ 2025-09-24  2:03 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Song Liu, Jiri Olsa, KP Singh, Matt Bobrowski, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Eduard,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, bpf, LKML,
	linux-trace-kernel

On 2025/9/24 03:23 Alexei Starovoitov <alexei.starovoitov@gmail.com> write:
> On Mon, Sep 22, 2025 at 7:11 AM Menglong Dong <menglong8.dong@gmail.com> wrote:
> >
> > On Mon, Sep 22, 2025 at 10:08 PM Song Liu <song@kernel.org> wrote:
> > >
> > > On Mon, Sep 22, 2025 at 11:57 AM Menglong Dong <menglong8.dong@gmail.com> wrote:
> > > >
> > > > The "data" in struct bpf_session_run_ctx is always 8-bytes aligned.
> > > > Therefore, we can store the "is_return" to the last bit of the "data",
> > > > which can make bpf_session_run_ctx 8-bytes aligned and save memory.
> > >
> > > Does this really save anything? AFAICT, bpf_session_run_ctx is
> > > only allocated on the stack. Therefore, we don't save any memory
> > > unless there is potential risk of stack overflow.
> >
> > Hi, Song. My original intention is to save the usage of the
> > stack to prevent potential stack overflow,
> 
> 8 bytes won't matter, but wasting 8 bytes for 1 bit is indeed annoying.
> 
> > especially when we
> > trace all the kernel functions with kprobe-multi.
> 
> What do you mean? kprobe-multi won't recurse,
> so tracing all or a few functions is the same concern
> from stack overflow pov, no ?

You are right, I made something wrong. I mixed it with origin
call case of the bpf trampoline, which will store all the things
in the stack for every function call.

> 
> > The most thing for me is that the unaligned field in the struct
> > looks very awkward, and it consumes 8-bytes only for a bit.
> 
> let's keep it as-is. If stack overflow is indeed an issue we need
> a generic way to detect it and prevent it.
> We've been thinking whether vmap stack guard pages
> can become JIT's extable-like things, so when stack overflow

Interesting

> happens we unwind stack and stop bpf prog instead of panicing.

Yeah, I think it's OK to keep it still.

Thanks!
Menglong Dong

> 
> 





^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-09-24  2:04 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-22  9:57 [PATCH bpf-next] bpf: remove is_return in struct bpf_session_run_ctx Menglong Dong
2025-09-22 14:08 ` Song Liu
2025-09-22 14:10   ` Menglong Dong
2025-09-23 19:23     ` Alexei Starovoitov
2025-09-24  2:03       ` menglong.dong
2025-09-23  7:52 ` kernel test robot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).