From: Yonghong Song <yonghong.song@linux.dev>
To: Hou Tao <houtao@huaweicloud.com>, bpf@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
kernel-team@fb.com, Martin KaFai Lau <martin.lau@kernel.org>
Subject: Re: [PATCH bpf-next v5 3/8] bpf: Allow per unit prefill for non-fix-size percpu memory allocator
Date: Wed, 20 Dec 2023 23:16:04 -0800
Message-ID: <ea395971-25f0-4b5c-8303-1620154e9b9d@linux.dev>
In-Reply-To: <58e11994-6f73-20de-eab8-f4d7a4f71d80@huaweicloud.com>
On 12/20/23 10:26 PM, Hou Tao wrote:
> Hi,
>
> On 12/21/2023 1:00 PM, Yonghong Song wrote:
>> Commit 41a5db8d8161 ("Add support for non-fix-size percpu mem allocation")
>> added support for non-fix-size percpu memory allocation.
>> Such allocation will allocate percpu memory for all buckets on all
>> cpus, and the memory consumption grows quadratically with the number of cpus.
>> For example, with 4 cpus and a unit size of 16 bytes, each
>> cpu has 16 * 4 = 64 bytes; with 4 cpus, the total is 64 * 4 = 256 bytes.
>> With 8 cpus and the same unit size, each cpu
>> has 16 * 8 = 128 bytes; with 8 cpus, the total is 128 * 8 = 1024 bytes.
>> So if the number of cpus doubles, the memory consumption
>> quadruples. For a system with a large number of cpus, the
>> memory consumption therefore goes up quickly.
>> For example, for a 4KB percpu allocation on 128 cpus, the total memory
>> consumption will be 4KB * 128 * 128 = 64MB. Things become
>> worse if the number of cpus is bigger (e.g., 512, 1024, etc.)
>>
>> In commit 41a5db8d8161, the non-fix-size percpu memory allocation is
>> done at boot time, so for a system with a large number of cpus, the initial
>> percpu memory consumption is very visible. For example, for a 128-cpu
>> system, the total percpu memory allocation will be at least
>> (16 + 32 + 64 + 96 + 128 + 192 + 256 + 512 + 1024 + 2048 + 4096)
>> * 128 * 128 = ~138MB,
>> which is pretty big. It will be even bigger for a larger number of cpus.
> SNIP
>> +
>>  static void drain_mem_cache(struct bpf_mem_cache *c)
>>  {
>>  	bool percpu = !!c->percpu_size;
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index f13008d27f35..08f9a49cc11c 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -12141,20 +12141,6 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>  				if (meta.func_id == special_kfunc_list[KF_bpf_obj_new_impl] && !bpf_global_ma_set)
>>  					return -ENOMEM;
>>
>> -				if (meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl]) {
>> -					if (!bpf_global_percpu_ma_set) {
>> -						mutex_lock(&bpf_percpu_ma_lock);
>> -						if (!bpf_global_percpu_ma_set) {
>> -							err = bpf_mem_alloc_init(&bpf_global_percpu_ma, 0, true);
>> -							if (!err)
>> -								bpf_global_percpu_ma_set = true;
>> -						}
>> -						mutex_unlock(&bpf_percpu_ma_lock);
>> -						if (err)
>> -							return err;
>> -					}
>> -				}
>> -
>>  				if (((u64)(u32)meta.arg_constant.value) != meta.arg_constant.value) {
>>  					verbose(env, "local type ID argument must be in range [0, U32_MAX]\n");
>>  					return -EINVAL;
>> @@ -12175,6 +12161,26 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>  					return -EINVAL;
>>  				}
>>
>> +				if (meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl]) {
>> +					if (!bpf_global_percpu_ma_set) {
>> +						mutex_lock(&bpf_percpu_ma_lock);
>> +						if (!bpf_global_percpu_ma_set) {
>> +							err = bpf_mem_alloc_percpu_init(&bpf_global_percpu_ma);
> Because ma->objcg is assigned from get_obj_cgroup_from_current(), I
> think the memory accounting will be incorrect, right? Maybe we should pass
> objcg to bpf_mem_alloc_percpu_init() explicitly. For root memcg, I think
> the objcg is NULL.
You are correct. Calling bpf_mem_alloc_percpu_init() at init stage
is exactly what gives objcg the proper root memcg. Sorry I missed it.
I remember tracing it a few days ago, and it was indeed NULL.
There are three ways to resolve this:
1. Just do 'ma->objcg = NULL' unconditionally in bpf_mem_alloc_percpu_init().
2. Remember the objcg at init stage, e.g., in the bpf_global_ma_init()
   init function (core.c), so that it can later be used in
   bpf_mem_alloc_percpu_init().
3. Still call bpf_mem_alloc_percpu_init() at init stage to initialize ma->objcg
   properly, but delay __alloc_percpu_gfp() until the verifier finds a call
   to bpf_percpu_obj_new(). We could add a function
   bpf_mem_alloc_percpu_init_caches() to do the __alloc_percpu_gfp().
I prefer option 3. What do you think?
>> +							if (!err)
>> +								bpf_global_percpu_ma_set = true;
>> +						}
>> +						mutex_unlock(&bpf_percpu_ma_lock);
>> +						if (err)
>> +							return err;
>> +					}
>> +
>> +					mutex_lock(&bpf_percpu_ma_lock);
>> +					err = bpf_mem_alloc_percpu_unit_init(&bpf_global_percpu_ma, ret_t->size);
>> +					mutex_unlock(&bpf_percpu_ma_lock);
>> +					if (err)
>> +						return err;
>> +				}
>> +
>>  				struct_meta = btf_find_struct_meta(ret_btf, ret_btf_id);
>>  				if (meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl]) {
>>  					if (!__btf_type_is_scalar_struct(env, ret_btf, ret_t, 0)) {
>
Thread overview (14+ messages):
2023-12-21 4:59 [PATCH bpf-next v5 0/8] bpf: Reduce memory usage for bpf_global_percpu_ma Yonghong Song
2023-12-21 5:00 ` [PATCH bpf-next v5 1/8] bpf: Avoid unnecessary extra percpu memory allocation Yonghong Song
2023-12-21 5:00 ` [PATCH bpf-next v5 2/8] bpf: Add objcg to bpf_mem_alloc Yonghong Song
2023-12-21 5:00 ` [PATCH bpf-next v5 3/8] bpf: Allow per unit prefill for non-fix-size percpu memory allocator Yonghong Song
2023-12-21 6:26 ` Hou Tao
2023-12-21 7:16 ` Yonghong Song [this message]
2023-12-21 7:52 ` Yonghong Song
2023-12-21 8:42 ` Hou Tao
2023-12-21 16:53 ` Yonghong Song
2023-12-21 5:00 ` [PATCH bpf-next v5 4/8] bpf: Refill only one percpu element in memalloc Yonghong Song
2023-12-21 5:00 ` [PATCH bpf-next v5 5/8] bpf: Use smaller low/high marks for percpu allocation Yonghong Song
2023-12-21 5:00 ` [PATCH bpf-next v5 6/8] bpf: Limit up to 512 bytes for bpf_global_percpu_ma allocation Yonghong Song
2023-12-21 5:00 ` [PATCH bpf-next v5 7/8] selftests/bpf: Cope with 512 bytes limit with bpf_global_percpu_ma Yonghong Song
2023-12-21 5:00 ` [PATCH bpf-next v5 8/8] selftests/bpf: Add a selftest with > 512-byte percpu allocation size Yonghong Song