* [PATCH bpf-next v2] bpf: Do not allocate percpu memory at init stage
@ 2023-11-10 17:20 Yonghong Song
2023-11-10 20:32 ` Alexei Starovoitov
0 siblings, 1 reply; 3+ messages in thread
From: Yonghong Song @ 2023-11-10 17:20 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
Martin KaFai Lau, Kirill A . Shutemov
Kirill Shutemov reported a significant increase in percpu memory consumption after
booting a 288-cpu VM ([1]), caused by commit 41a5db8d8161 ("bpf: Add support for
non-fix-size percpu mem allocation"). Percpu memory consumption increased
from 111MB to 969MB. The numbers are from /proc/meminfo.
I tried to reproduce the issue with my local VM, which supports at most
255 cpus. With 252 cpus, percpu memory consumption immediately after boot is
57MB without the above commit and 231MB with it.
This is not good since percpu memory from the bpf memory allocator is not
widely used yet. Let us change pre-allocation at init stage to on-demand
allocation, done when the verifier detects that a bpf program needs percpu
memory. With this change, percpu memory consumption after boot can be reduced
significantly.
[1] https://lore.kernel.org/lkml/20231109154934.4saimljtqx625l3v@box.shutemov.name/
Fixes: 41a5db8d8161 ("bpf: Add support for non-fix-size percpu mem allocation")
Reported-and-tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
include/linux/bpf.h | 2 +-
kernel/bpf/core.c | 8 +++-----
kernel/bpf/verifier.c | 19 +++++++++++++++++--
3 files changed, 21 insertions(+), 8 deletions(-)
Changelog:
v1 -> v2:
- Add proper Reported-and-tested-by tag.
- Do a check of !bpf_global_percpu_ma_set before acquiring verifier_lock.
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index b4825d3cdb29..3df67a04d32e 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -56,7 +56,7 @@ extern struct idr btf_idr;
extern spinlock_t btf_idr_lock;
extern struct kobject *btf_kobj;
extern struct bpf_mem_alloc bpf_global_ma, bpf_global_percpu_ma;
-extern bool bpf_global_ma_set, bpf_global_percpu_ma_set;
+extern bool bpf_global_ma_set;
typedef u64 (*bpf_callback_t)(u64, u64, u64, u64, u64);
typedef int (*bpf_iter_init_seq_priv_t)(void *private_data,
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 08626b519ce2..cd3afe57ece3 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -64,8 +64,8 @@
#define OFF insn->off
#define IMM insn->imm
-struct bpf_mem_alloc bpf_global_ma, bpf_global_percpu_ma;
-bool bpf_global_ma_set, bpf_global_percpu_ma_set;
+struct bpf_mem_alloc bpf_global_ma;
+bool bpf_global_ma_set;
/* No hurry in this branch
*
@@ -2934,9 +2934,7 @@ static int __init bpf_global_ma_init(void)
ret = bpf_mem_alloc_init(&bpf_global_ma, 0, false);
bpf_global_ma_set = !ret;
- ret = bpf_mem_alloc_init(&bpf_global_percpu_ma, 0, true);
- bpf_global_percpu_ma_set = !ret;
- return !bpf_global_ma_set || !bpf_global_percpu_ma_set;
+ return ret;
}
late_initcall(bpf_global_ma_init);
#endif
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index bd1c42eb540f..fadbabfdef60 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -26,6 +26,7 @@
#include <linux/poison.h>
#include <linux/module.h>
#include <linux/cpumask.h>
+#include <linux/bpf_mem_alloc.h>
#include <net/xdp.h>
#include "disasm.h"
@@ -41,6 +42,9 @@ static const struct bpf_verifier_ops * const bpf_verifier_ops[] = {
#undef BPF_LINK_TYPE
};
+struct bpf_mem_alloc bpf_global_percpu_ma;
+static bool bpf_global_percpu_ma_set;
+
/* bpf_check() is a static code analyzer that walks eBPF program
* instruction by instruction and updates register/stack state.
* All paths of conditional branches are analyzed until 'bpf_exit' insn.
@@ -12074,8 +12078,19 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
if (meta.func_id == special_kfunc_list[KF_bpf_obj_new_impl] && !bpf_global_ma_set)
return -ENOMEM;
- if (meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl] && !bpf_global_percpu_ma_set)
- return -ENOMEM;
+ if (meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl]) {
+ if (!bpf_global_percpu_ma_set) {
+ mutex_lock(&bpf_verifier_lock);
+ if (!bpf_global_percpu_ma_set) {
+ err = bpf_mem_alloc_init(&bpf_global_percpu_ma, 0, true);
+ if (!err)
+ bpf_global_percpu_ma_set = true;
+ }
+ mutex_unlock(&bpf_verifier_lock);
+ if (err)
+ return err;
+ }
+ }
if (((u64)(u32)meta.arg_constant.value) != meta.arg_constant.value) {
verbose(env, "local type ID argument must be in range [0, U32_MAX]\n");
--
2.34.1
* Re: [PATCH bpf-next v2] bpf: Do not allocate percpu memory at init stage
2023-11-10 17:20 [PATCH bpf-next v2] bpf: Do not allocate percpu memory at init stage Yonghong Song
@ 2023-11-10 20:32 ` Alexei Starovoitov
2023-11-10 21:05 ` Yonghong Song
0 siblings, 1 reply; 3+ messages in thread
From: Alexei Starovoitov @ 2023-11-10 20:32 UTC (permalink / raw)
To: Yonghong Song
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Kernel Team, Martin KaFai Lau, Kirill A . Shutemov
On Fri, Nov 10, 2023 at 9:23 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>
> + if (meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl]) {
> + if (!bpf_global_percpu_ma_set) {
> + mutex_lock(&bpf_verifier_lock);
> + if (!bpf_global_percpu_ma_set) {
> + err = bpf_mem_alloc_init(&bpf_global_percpu_ma, 0, true);
> + if (!err)
> + bpf_global_percpu_ma_set = true;
> + }
> + mutex_unlock(&bpf_verifier_lock);
I feel we're taking unnecessary risk here by reusing the mutex.
bpf_obj_new kfunc is a privileged operation and the verifier lock
is not held in such a scenario, so it won't deadlock,
but let's just add another mutex to protect percpu_ma init.
Much easier to reason about.
* Re: [PATCH bpf-next v2] bpf: Do not allocate percpu memory at init stage
2023-11-10 20:32 ` Alexei Starovoitov
@ 2023-11-10 21:05 ` Yonghong Song
0 siblings, 0 replies; 3+ messages in thread
From: Yonghong Song @ 2023-11-10 21:05 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Kernel Team, Martin KaFai Lau, Kirill A . Shutemov
On 11/10/23 12:32 PM, Alexei Starovoitov wrote:
> On Fri, Nov 10, 2023 at 9:23 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>> + if (meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl]) {
>> + if (!bpf_global_percpu_ma_set) {
>> + mutex_lock(&bpf_verifier_lock);
>> + if (!bpf_global_percpu_ma_set) {
>> + err = bpf_mem_alloc_init(&bpf_global_percpu_ma, 0, true);
>> + if (!err)
>> + bpf_global_percpu_ma_set = true;
>> + }
>> + mutex_unlock(&bpf_verifier_lock);
> I feel we're taking unnecessary risk here by reusing the mutex.
> bpf_obj_new kfunc is a privileged operation and the verifier lock
> is not held in such scenario, so it won't deadlock,
That is true. A deadlock situation won't happen.
> but let's just add another mutex to protect percpu_ma init.
> Much easier to reason about.
Okay. will do.