From: Yonghong Song <yonghong.song@linux.dev>
To: Hou Tao <houtao@huaweicloud.com>, bpf@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
kernel-team@fb.com, Martin KaFai Lau <martin.lau@kernel.org>
Subject: Re: [PATCH bpf-next v3 2/6] bpf: Allow per unit prefill for non-fix-size percpu memory allocator
Date: Sat, 16 Dec 2023 23:11:48 -0800 [thread overview]
Message-ID: <5c13f568-325c-4e5c-9f9e-ca5da5c2c75b@linux.dev> (raw)
In-Reply-To: <ca0512d6-fa01-9d94-017f-a717756dcf86@huaweicloud.com>
On 12/15/23 7:12 PM, Hou Tao wrote:
> Hi,
>
> On 12/16/2023 10:30 AM, Yonghong Song wrote:
>> Commit 41a5db8d8161 ("Add support for non-fix-size percpu mem allocation")
>> added support for non-fix-size percpu memory allocation.
>> Such an allocation allocates percpu memory for all buckets on all
>> cpus, so the memory consumption is quadratic in the number of cpus.
>> For example, with 4 cpus and a unit size of 16 bytes, each
>> cpu holds 16 * 4 = 64 bytes, so with 4 cpus the total is 64 * 4 = 256 bytes.
>> With 8 cpus and the same unit size, each cpu
>> holds 16 * 8 = 128 bytes, so with 8 cpus the total is 128 * 8 = 1024 bytes.
>> So if the number of cpus doubles, the memory consumption
>> quadruples. For a system with a large number of cpus, the
>> memory consumption therefore grows quickly.
>> For example, for a 4KB percpu allocation with 128 cpus, the total memory
>> consumption will be 4KB * 128 * 128 = 64MB. Things become
>> worse as the number of cpus grows (e.g., 512, 1024, etc.)
> SNIP
>> +__init int bpf_mem_alloc_percpu_init(struct bpf_mem_alloc *ma)
>> +{
>> + struct bpf_mem_caches __percpu *pcc;
>> +
>> + pcc = __alloc_percpu_gfp(sizeof(struct bpf_mem_caches), 8, GFP_KERNEL);
>> + if (!pcc)
>> + return -ENOMEM;
>> +
>> + ma->caches = pcc;
>> + ma->percpu = true;
>> + return 0;
>> +}
>> +
>> +int bpf_mem_alloc_percpu_unit_init(struct bpf_mem_alloc *ma, int size)
>> +{
>> + int cpu, i, err = 0, unit_size, percpu_size;
>> + struct bpf_mem_caches *cc, __percpu *pcc;
>> + struct obj_cgroup *objcg;
>> + struct bpf_mem_cache *c;
>> +
>> + i = bpf_mem_cache_idx(size);
>> + if (i < 0)
>> + return -EINVAL;
>> +
>> + /* room for llist_node and per-cpu pointer */
>> + percpu_size = LLIST_NODE_SZ + sizeof(void *);
>> +
>> + pcc = ma->caches;
>> + unit_size = sizes[i];
>> +
>> +#ifdef CONFIG_MEMCG_KMEM
>> + objcg = get_obj_cgroup_from_current();
>> +#endif
> For bpf_global_percpu_ma, we also need to account the allocated memory
> to the root memory cgroup just like bpf_global_ma does, don't we? So it seems
> that we need to initialize c->objcg early, in bpf_mem_alloc_percpu_init().
Good point, agreed. The original behavior of percpu non-fix-size mem
allocation is to do get_obj_cgroup_from_current() at the init stage
and charge to the root memory cgroup, so we indeed should move
the above into bpf_mem_alloc_percpu_init().
>> + for_each_possible_cpu(cpu) {
>> + cc = per_cpu_ptr(pcc, cpu);
>> + c = &cc->cache[i];
>> + if (cpu == 0 && c->unit_size)
>> + goto out;
>> +
>> + c->unit_size = unit_size;
>> + c->objcg = objcg;
>> + c->percpu_size = percpu_size;
>> + c->tgt = c;
>> +
>> + init_refill_work(c);
>> + prefill_mem_cache(c, cpu);
>> +
>> + if (cpu == 0) {
>> + err = check_obj_size(c, i);
>> + if (err) {
>> + drain_mem_cache(c);
>> + memset(c, 0, sizeof(*c));
> I also forgot about c->objcg. objcg may be leaked if we do memset() here.
The objcg gets a reference at the bpf_mem_alloc_init() stage,
which is released in bpf_mem_alloc_destroy(). For bpf_global_ma,
if there is a failure, bpf_mem_alloc_destroy() will indeed be
called and the c->objcg reference will be released.
So if we move get_obj_cgroup_from_current() to the
bpf_mem_alloc_percpu_init() stage, we should be okay here.
BTW, is check_obj_size() really necessary here? My answer is no
since, as you mentioned, the size -> cache_index mapping is pretty stable,
so check_obj_size() should not return an error in such cases.
What do you think?
>> + goto out;
>> + }
>> + }
>> + }
>> +
>> +out:
>> + return err;
>> +}
>> +
>>
Thread overview: 16+ messages
2023-12-16 2:30 [PATCH bpf-next v3 0/6] bpf: Reduce memory usage for bpf_global_percpu_ma Yonghong Song
2023-12-16 2:30 ` [PATCH bpf-next v3 1/6] bpf: Avoid unnecessary extra percpu memory allocation Yonghong Song
2023-12-16 2:30 ` [PATCH bpf-next v3 2/6] bpf: Allow per unit prefill for non-fix-size percpu memory allocator Yonghong Song
2023-12-16 3:12 ` Hou Tao
2023-12-17 7:11 ` Yonghong Song [this message]
2023-12-17 17:21 ` Yonghong Song
2023-12-18 1:15 ` Hou Tao
2023-12-16 15:27 ` kernel test robot
2023-12-16 2:30 ` [PATCH bpf-next v3 3/6] bpf: Refill only one percpu element in memalloc Yonghong Song
2023-12-16 3:13 ` Hou Tao
2023-12-16 2:30 ` [PATCH bpf-next v3 4/6] bpf: Limit up to 512 bytes for bpf_global_percpu_ma allocation Yonghong Song
2023-12-16 4:05 ` Hou Tao
2023-12-16 2:30 ` [PATCH bpf-next v3 5/6] selftests/bpf: Cope with 512 bytes limit with bpf_global_percpu_ma Yonghong Song
2023-12-16 4:04 ` Hou Tao
2023-12-16 2:30 ` [PATCH bpf-next v3 6/6] selftests/bpf: Add a selftest with > 512-byte percpu allocation size Yonghong Song
2023-12-16 4:07 ` Hou Tao