From: Yonghong Song <yonghong.song@linux.dev>
To: Hou Tao <houtao@huaweicloud.com>, bpf@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
kernel-team@fb.com, Martin KaFai Lau <martin.lau@kernel.org>
Subject: Re: [PATCH bpf-next v5 3/8] bpf: Allow per unit prefill for non-fix-size percpu memory allocator
Date: Wed, 20 Dec 2023 23:16:04 -0800
Message-ID: <ea395971-25f0-4b5c-8303-1620154e9b9d@linux.dev>
In-Reply-To: <58e11994-6f73-20de-eab8-f4d7a4f71d80@huaweicloud.com>
On 12/20/23 10:26 PM, Hou Tao wrote:
> Hi,
>
> On 12/21/2023 1:00 PM, Yonghong Song wrote:
>> Commit 41a5db8d8161 ("Add support for non-fix-size percpu mem allocation")
>> added support for non-fix-size percpu memory allocation.
>> Such allocation will allocate percpu memory for all buckets on all
>> cpus, and the memory consumption grows quadratically with the number of cpus.
>> For example, let us say, 4 cpus, unit size 16 bytes, so each
>> cpu has 16 * 4 = 64 bytes, with 4 cpus, total will be 64 * 4 = 256 bytes.
>> Then let us say, 8 cpus with the same unit size, each cpu
>> has 16 * 8 = 128 bytes, with 8 cpus, total will be 128 * 8 = 1024 bytes.
>> So if the number of cpus doubles, the memory consumption
>> will be 4 times as large. So for a system with a large number of cpus, the
>> memory consumption goes up quickly, in quadratic order.
>> For example, for a 4KB percpu allocation with 128 cpus, the total memory
>> consumption will be 4KB * 128 * 128 = 64MB. Things will become
>> worse if the number of cpus is bigger (e.g., 512, 1024, etc.)
>>
>> In commit 41a5db8d8161, the non-fix-size percpu memory allocation is
>> done at boot time, so for a system with a large number of cpus, the initial
>> percpu memory consumption is very visible. For example, for a 128-cpu
>> system, the total percpu memory allocation will be at least
>> (16 + 32 + 64 + 96 + 128 + 196 + 256 + 512 + 1024 + 2048 + 4096)
>> * 128 * 128 = ~138MB.
>> which is pretty big. It will be even bigger for larger number of cpus.
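The arithmetic in the commit message above can be checked with a small userspace C helper. This is only a sketch: percpu_prefill_bytes() is a hypothetical name, not a kernel function, and it assumes exactly one prefilled object per cpu for a given unit size, as the examples in the commit message do.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code.
 *
 * One percpu object of unit_size bytes occupies unit_size * nr_cpus bytes,
 * because it has a copy on every cpu. If each cpu's cache prefills one such
 * object, there are nr_cpus objects, so the total is quadratic in nr_cpus.
 */
static uint64_t percpu_prefill_bytes(uint64_t unit_size, uint64_t nr_cpus)
{
	uint64_t bytes_per_object = unit_size * nr_cpus;

	return bytes_per_object * nr_cpus;
}
```

With this model, percpu_prefill_bytes(16, 4) is 256, percpu_prefill_bytes(16, 8) is 1024, and percpu_prefill_bytes(4096, 128) is 67108864 bytes (64MB), matching the numbers quoted above.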
> SNIP
>> +
>> static void drain_mem_cache(struct bpf_mem_cache *c)
>> {
>> bool percpu = !!c->percpu_size;
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index f13008d27f35..08f9a49cc11c 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -12141,20 +12141,6 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>> if (meta.func_id == special_kfunc_list[KF_bpf_obj_new_impl] && !bpf_global_ma_set)
>> return -ENOMEM;
>>
>> - if (meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl]) {
>> - if (!bpf_global_percpu_ma_set) {
>> - mutex_lock(&bpf_percpu_ma_lock);
>> - if (!bpf_global_percpu_ma_set) {
>> - err = bpf_mem_alloc_init(&bpf_global_percpu_ma, 0, true);
>> - if (!err)
>> - bpf_global_percpu_ma_set = true;
>> - }
>> - mutex_unlock(&bpf_percpu_ma_lock);
>> - if (err)
>> - return err;
>> - }
>> - }
>> -
>> if (((u64)(u32)meta.arg_constant.value) != meta.arg_constant.value) {
>> verbose(env, "local type ID argument must be in range [0, U32_MAX]\n");
>> return -EINVAL;
>> @@ -12175,6 +12161,26 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>> return -EINVAL;
>> }
>>
>> + if (meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl]) {
>> + if (!bpf_global_percpu_ma_set) {
>> + mutex_lock(&bpf_percpu_ma_lock);
>> + if (!bpf_global_percpu_ma_set) {
>> + err = bpf_mem_alloc_percpu_init(&bpf_global_percpu_ma);
> Because ma->objcg is assigned from get_obj_cgroup_from_current(), I
> think the memory accounting will be incorrect, right? Maybe we should pass
> objcg to bpf_mem_alloc_percpu_init() explicitly. For the root memcg, I think
> the objcg is NULL.
You are correct. Calling bpf_mem_alloc_percpu_init() at init stage
is exactly what gave objcg the proper root memcg. Sorry, I missed it.
I remember tracing it a few days ago, and objcg was indeed NULL.
There are three ways to resolve this:
1 Just do 'ma->objcg = NULL' unconditionally in bpf_mem_alloc_percpu_init().
2 Remember objcg = get_obj_cgroup_from_current() at init stage,
e.g., in the bpf_global_ma_init() init function (core.c), so that it can
be used later in bpf_mem_alloc_percpu_init().
3 Still do bpf_mem_alloc_percpu_init() at init stage to initialize ma->objcg
properly, but delay __alloc_percpu_gfp() until the verifier finds a call
to bpf_percpu_obj_new(). We could add a function bpf_mem_alloc_percpu_init_caches()
to do the __alloc_percpu_gfp().
I prefer option 3, what do you think?
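To make option 3 concrete, here is a rough userspace model of the two-phase idea, not kernel code. Everything in it is a stand-in: struct ma_stub and struct objcg_stub are hypothetical simplifications of struct bpf_mem_alloc and struct obj_cgroup, calloc() stands in for __alloc_percpu_gfp(), and locking (bpf_percpu_ma_lock in the patch) is left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct obj_cgroup. */
struct objcg_stub {
	int id;
};

/* Hypothetical stand-in for struct bpf_mem_alloc. */
struct ma_stub {
	struct objcg_stub *objcg; /* NULL models the root memcg */
	bool objcg_set;           /* phase 1 has run */
	void *caches;             /* models the deferred percpu allocation */
};

/* Phase 1 (init stage): record objcg only, allocate nothing. */
static int ma_percpu_init(struct ma_stub *ma, struct objcg_stub *objcg)
{
	ma->objcg = objcg;
	ma->objcg_set = true;
	ma->caches = NULL;
	return 0;
}

/*
 * Phase 2 (when the verifier sees bpf_percpu_obj_new()): allocate the
 * caches on first use. Later calls are no-ops.
 */
static int ma_percpu_init_caches(struct ma_stub *ma, size_t bytes)
{
	if (!ma->objcg_set)
		return -EINVAL; /* phase 1 must run first */
	if (ma->caches)
		return 0;       /* already allocated */

	ma->caches = calloc(1, bytes); /* stand-in for __alloc_percpu_gfp() */
	return ma->caches ? 0 : -ENOMEM;
}
```

The point of the split is that objcg is captured in the init-stage context, while the memory cost is only paid once a program actually calls bpf_percpu_obj_new().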
>> + if (!err)
>> + bpf_global_percpu_ma_set = true;
>> + }
>> + mutex_unlock(&bpf_percpu_ma_lock);
>> + if (err)
>> + return err;
>> + }
>> +
>> + mutex_lock(&bpf_percpu_ma_lock);
>> + err = bpf_mem_alloc_percpu_unit_init(&bpf_global_percpu_ma, ret_t->size);
>> + mutex_unlock(&bpf_percpu_ma_lock);
>> + if (err)
>> + return err;
>> + }
>> +
>> struct_meta = btf_find_struct_meta(ret_btf, ret_btf_id);
>> if (meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl]) {
>> if (!__btf_type_is_scalar_struct(env, ret_btf, ret_t, 0)) {
>