From: Yonghong Song <yonghong.song@linux.dev>
To: bpf@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
kernel-team@fb.com, Martin KaFai Lau <martin.lau@kernel.org>,
Tejun Heo <tj@kernel.org>
Subject: [PATCH bpf-next v6 0/9] bpf: Support private stack for bpf progs
Date: Sun, 20 Oct 2024 12:13:41 -0700 [thread overview]
Message-ID: <20241020191341.2104841-1-yonghong.song@linux.dev> (raw)
The main motivation for private stack comes from nested scheduler in
sched-ext from Tejun. The basic idea is that
- each cgroup will its own associated bpf program,
- bpf program with parent cgroup will call bpf programs
in immediate child cgroups.
Let us say we have the following cgroup hierarchy:
root_cg (prog0):
cg1 (prog1):
cg11 (prog11):
cg111 (prog111)
cg112 (prog112)
cg12 (prog12):
cg121 (prog121)
cg122 (prog122)
cg2 (prog2):
cg21 (prog21)
cg22 (prog22)
cg23 (prog23)
In the above example, prog0 will call a kfunc which will call prog1 and
prog2 to get sched info for cg1 and cg2 and then the information is
summarized and sent back to prog0. Similarly, prog11 and prog12 will be
invoked in the kfunc and the result will be summarized and sent back to
prog1, etc. The following illustrates a possible call sequence:
... -> bpf prog A -> kfunc -> ops.<callback_fn> (bpf prog B) ...
Currently, for each thread, the x86 kernel allocate 16KB stack. Each
bpf program (including its subprograms) has maximum 512B stack size to
avoid potential stack overflow. Nested bpf programs further increase the
risk of stack overflow. To avoid potential stack overflow caused by bpf
programs, this patch set supported private stack and bpf program stack
space is allocated during verification time. Such private stack is applied
to tracing programs like kprobe/uprobe, perf_event, tracepoint, raw
tracepoint and struct_ops progs. For struct_ops progs, if the callback
stub function name has format like
<st_ops_name>__<member_name>__priv_stack
that callback func prog will use the private stack. For other tracing
programs, if the prog (including subprogs, but not including callback
functions) stack depth is greater than or equals to 128 bytes, private
stack will be used.
But more than one instance of the same bpf program may run in the system.
To make things simple, percpu private stack is allocated for each program,
so if the same program is running on different cpus concurrently, we won't
have any issue. Note that the kernel already have logic to prevent the
recursion for the same bpf program on the same cpu (kprobe, fentry, etc.).
This patch set implemented a percpu private stack based approach for x86
arch. Please see each individual patch for details.
Change logs:
v5 -> v6:
- v5 link: https://lore.kernel.org/bpf/20241017223138.3175885-1-yonghong.song@linux.dev/
- Instead of using (or not using) private stack at struct_ops level,
each prog in struct_ops can decide whether to use private stack or not.
v4 -> v5:
- v4 link: https://lore.kernel.org/bpf/20241010175552.1895980-1-yonghong.song@linux.dev/
- Remove bpf_prog_call() related implementation.
- Allow (opt-in) private stack for sched-ext progs.
v3 -> v4:
- v3 link: https://lore.kernel.org/bpf/20240926234506.1769256-1-yonghong.song@linux.dev/
There is a long discussion in the above v3 link trying to allow private
stack to be used by kernel functions in order to simplify implementation.
But unfortunately we didn't find a workable solution yet, so we return
to the approach where private stack is only used by bpf programs.
- Add bpf_prog_call() kfunc.
v2 -> v3:
- Instead of per-subprog private stack allocation, allocate private
stacks at main prog or callback entry prog. Subprogs not main or callback
progs will increment the inherited stack pointer to be their
frame pointer.
- Private stack allows each prog max stack size to be 512 bytes, intead
of the whole prog hierarchy to be 512 bytes.
- Add some tests.
Yonghong Song (9):
bpf: Allow each subprog having stack size of 512 bytes
bpf: Rename bpf_struct_ops_arg_info to bpf_struct_ops_func_info
bpf: Support private stack for struct ops programs
bpf: Mark each subprog with proper private stack modes
bpf, x86: Refactor func emit_prologue
bpf, x86: Create a helper for certain "reg <op>= imm" operations
bpf, x86: Add jit support for private stack
selftests/bpf: Add tracing prog private stack tests
selftests/bpf: Add struct_ops prog private stack tests
arch/x86/net/bpf_jit_comp.c | 187 +++++++++++----
include/linux/bpf.h | 16 +-
include/linux/bpf_verifier.h | 4 +
include/linux/filter.h | 1 +
kernel/bpf/bpf_struct_ops.c | 71 +++---
kernel/bpf/core.c | 24 ++
kernel/bpf/verifier.c | 139 +++++++++--
.../selftests/bpf/bpf_testmod/bpf_testmod.c | 77 +++++++
.../selftests/bpf/bpf_testmod/bpf_testmod.h | 6 +
.../bpf/prog_tests/struct_ops_private_stack.c | 106 +++++++++
.../selftests/bpf/prog_tests/verifier.c | 2 +
.../bpf/progs/struct_ops_private_stack.c | 62 +++++
.../bpf/progs/struct_ops_private_stack_fail.c | 62 +++++
.../progs/struct_ops_private_stack_recur.c | 50 ++++
.../bpf/progs/verifier_private_stack.c | 216 ++++++++++++++++++
15 files changed, 933 insertions(+), 90 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/struct_ops_private_stack.c
create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_private_stack.c
create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_private_stack_fail.c
create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_private_stack_recur.c
create mode 100644 tools/testing/selftests/bpf/progs/verifier_private_stack.c
--
2.43.5
next reply other threads:[~2024-10-20 19:13 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-20 19:13 Yonghong Song [this message]
2024-10-20 19:13 ` [PATCH bpf-next v6 1/9] bpf: Allow each subprog having stack size of 512 bytes Yonghong Song
2024-10-22 1:18 ` Alexei Starovoitov
2024-10-22 3:21 ` Yonghong Song
2024-10-22 3:43 ` Alexei Starovoitov
2024-10-22 4:08 ` Yonghong Song
2024-10-22 20:13 ` Yonghong Song
2024-10-22 20:41 ` Alexei Starovoitov
2024-10-22 21:29 ` Kumar Kartikeya Dwivedi
2024-10-22 21:36 ` Kumar Kartikeya Dwivedi
2024-10-22 21:43 ` Yonghong Song
2024-10-22 21:57 ` Alexei Starovoitov
2024-10-22 22:41 ` Yonghong Song
2024-10-22 22:59 ` Alexei Starovoitov
2024-10-22 23:53 ` Yonghong Song
2024-10-20 19:13 ` [PATCH bpf-next v6 2/9] bpf: Rename bpf_struct_ops_arg_info to bpf_struct_ops_func_info Yonghong Song
2024-10-20 19:13 ` [PATCH bpf-next v6 3/9] bpf: Support private stack for struct ops programs Yonghong Song
2024-10-22 1:34 ` Alexei Starovoitov
2024-10-22 2:59 ` Yonghong Song
2024-10-22 17:26 ` Martin KaFai Lau
2024-10-22 20:19 ` Alexei Starovoitov
2024-10-23 21:00 ` Tejun Heo
2024-10-23 23:07 ` Alexei Starovoitov
2024-10-24 0:56 ` Tejun Heo
2024-10-20 19:14 ` [PATCH bpf-next v6 4/9] bpf: Mark each subprog with proper private stack modes Yonghong Song
2024-10-20 22:01 ` Jiri Olsa
2024-10-21 4:22 ` Yonghong Song
2024-10-20 19:14 ` [PATCH bpf-next v6 5/9] bpf, x86: Refactor func emit_prologue Yonghong Song
2024-10-20 19:14 ` [PATCH bpf-next v6 6/9] bpf, x86: Create a helper for certain "reg <op>= imm" operations Yonghong Song
2024-10-20 19:14 ` [PATCH bpf-next v6 7/9] bpf, x86: Add jit support for private stack Yonghong Song
2024-10-20 19:14 ` [PATCH bpf-next v6 8/9] selftests/bpf: Add tracing prog private stack tests Yonghong Song
2024-10-20 21:59 ` Jiri Olsa
2024-10-21 4:32 ` Yonghong Song
2024-10-21 10:40 ` Jiri Olsa
2024-10-21 16:19 ` Yonghong Song
2024-10-21 21:13 ` Jiri Olsa
2024-10-20 19:14 ` [PATCH bpf-next v6 9/9] selftests/bpf: Add struct_ops " Yonghong Song
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241020191341.2104841-1-yonghong.song@linux.dev \
--to=yonghong.song@linux.dev \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=kernel-team@fb.com \
--cc=martin.lau@kernel.org \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox