From: Yonghong Song <yonghong.song@linux.dev>
To: Hou Tao <houtao@huaweicloud.com>, bpf@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
kernel-team@fb.com, Martin KaFai Lau <martin.lau@kernel.org>
Subject: Re: [PATCH bpf-next 2/5] bpf: Allow per unit prefill for non-fix-size percpu memory allocator
Date: Wed, 13 Dec 2023 09:25:55 -0800 [thread overview]
Message-ID: <b917b011-545a-4804-b8b4-06b412e16610@linux.dev> (raw)
In-Reply-To: <ab1146fa-db28-c1a3-bfef-3dc7b1575738@huaweicloud.com>
On 12/13/23 3:03 AM, Hou Tao wrote:
> Hi,
>
> On 12/13/2023 6:30 AM, Yonghong Song wrote:
>> Commit 41a5db8d8161 ("Add support for non-fix-size percpu mem allocation")
>> added support for non-fix-size percpu memory allocation.
>> Such allocation will allocate percpu memory for all buckets on all
>> cpus and the memory consumption is in the order to quadratic.
>> For example, let us say, 4 cpus, unit size 16 bytes, so each
>> cpu has 16 * 4 = 64 bytes, with 4 cpus, total will be 64 * 4 = 256 bytes.
>> Then let us say, 8 cpus with the same unit size, each cpu
>> has 16 * 8 = 128 bytes, with 8 cpus, total will be 128 * 8 = 1024 bytes.
>> So if the number of cpus doubles, the number of memory consumption
>> will be 4 times. So for a system with large number of cpus, the
>> memory consumption goes up quickly with quadratic order.
>> For example, for 4KB percpu allocation, 128 cpus. The total memory
>> consumption will 4KB * 128 * 128 = 64MB. Things will become
>> worse if the number of cpus is bigger (e.g., 512, 1024, etc.)
>>
>> In Commit 41a5db8d8161, the non-fix-size percpu memory allocation is
>> done in boot time, so for system with large number of cpus, the initial
>> percpu memory consumption is very visible. For example, for 128 cpu
>> system, the total percpu memory allocation will be at least
>> (16 + 32 + 64 + 96 + 128 + 196 + 256 + 512 + 1024 + 2048 + 4096)
>> * 128 * 128 = ~138MB.
>> which is pretty big. It will be even bigger for larger number of cpus.
>>
>> Note that the current prefill also allocates 4 entries if the unit size
>> is less than 256. So on top of 138MB memory consumption, this will
>> add more consumption with
>> 3 * (16 + 32 + 64 + 96 + 128 + 196 + 256) * 128 * 128 = ~38MB.
>> Next patch will try to reduce this memory consumption.
>>
>> Later on, Commit 1fda5bb66ad8 ("bpf: Do not allocate percpu memory
>> at init stage") moved the non-fix-size percpu memory allocation
>> to bpf verificaiton stage. Once a particular bpf_percpu_obj_new()
>> is called by bpf program, the memory allocator will try to fill in
>> the cache with all sizes, causing the same amount of percpu memory
>> consumption as in the boot stage.
>>
>> To reduce the initial percpu memory consumption for non-fix-size
>> percpu memory allocation, instead of filling the cache with all
>> supported allocation sizes, this patch intends to fill the cache
>> only for the requested size. As typically users will not use large
>> percpu data structure, this can save memory significantly.
>> For example, the allocation size is 64 bytes with 128 cpus.
>> Then total percpu memory amount will be 64 * 128 * 128 = 1MB,
>> much less than previous 138MB.
>>
>> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
>> ---
>> include/linux/bpf_mem_alloc.h | 5 +++
>> kernel/bpf/memalloc.c | 62 +++++++++++++++++++++++++++++++++++
>> kernel/bpf/verifier.c | 23 +++++--------
>> 3 files changed, 75 insertions(+), 15 deletions(-)
>>
>> diff --git a/include/linux/bpf_mem_alloc.h b/include/linux/bpf_mem_alloc.h
>> index bb1223b21308..b049c580e7fb 100644
>> --- a/include/linux/bpf_mem_alloc.h
>> +++ b/include/linux/bpf_mem_alloc.h
>> @@ -21,8 +21,13 @@ struct bpf_mem_alloc {
>> * 'size = 0' is for bpf_mem_alloc which manages many fixed-size objects.
>> * Alloc and free are done with bpf_mem_{alloc,free}() and the size of
>> * the returned object is given by the size argument of bpf_mem_alloc().
>> + * If percpu equals true, error will be returned in order to avoid
>> + * large memory consumption and the below bpf_mem_alloc_percpu_unit_init()
>> + * should be used to do on-demand per-cpu allocation for each size.
>> */
>> int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu);
>> +/* The percpu allocation is allowed for different unit size. */
>> +int bpf_mem_alloc_percpu_unit_init(struct bpf_mem_alloc *ma, int size);
>> void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma);
>>
>> /* kmalloc/kfree equivalent: */
>> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
>> index 75068167e745..84987e97fd0a 100644
>> --- a/kernel/bpf/memalloc.c
>> +++ b/kernel/bpf/memalloc.c
>> @@ -526,6 +526,9 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
>> struct bpf_mem_cache *c, __percpu *pc;
>> struct obj_cgroup *objcg = NULL;
>>
>> + if (percpu && size == 0)
>> + return -EINVAL;
>> +
>> /* room for llist_node and per-cpu pointer */
>> if (percpu)
>> percpu_size = LLIST_NODE_SZ + sizeof(void *);
>> @@ -625,6 +628,65 @@ static void bpf_mem_alloc_destroy_cache(struct bpf_mem_cache *c)
>> drain_mem_cache(c);
>> }
>>
>> +int bpf_mem_alloc_percpu_unit_init(struct bpf_mem_alloc *ma, int size)
>> +{
>> + static u16 sizes[NUM_CACHES] = {96, 192, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096};
> Why duplicate the sizes array ? It is better to move it out of these
> functions and share it between both bpf_mem_alloc_ini() and
> bpf_mem_alloc_percpu_unit_init().
Good point. Will do in the next revision.
>
>> + int cpu, i, err, unit_size, percpu_size = 0;
>> + struct bpf_mem_caches *cc, __percpu *pcc;
>> + struct obj_cgroup *objcg = NULL;
>> + struct bpf_mem_cache *c;
>> +
>> + /* room for llist_node and per-cpu pointer */
>> + percpu_size = LLIST_NODE_SZ + sizeof(void *);
>> +
>> + if (ma->caches) {
>> + pcc = ma->caches;
>> + } else {
>> + ma->percpu = true;
>> + pcc = __alloc_percpu_gfp(sizeof(*cc), 8, GFP_KERNEL | __GFP_ZERO);
>> + if (!pcc)
>> + return -ENOMEM;
>> + ma->caches = pcc;
>> + }
> It is a little weird to me that a single API does two things:
> initialization and incremental refill. How about introducing two APIs to
> reduce the memory usage of global per-cpu ma: one API to initialize the
> global per-cpu ma in bpf_global_ma_init(), and another API to
> incremental refill global per-cpu ma accordingly ?
This can ineed to make semantics and code easy to understand.
Will make the change in the next revision.
>> +
>> + err = 0;
>> + i = bpf_mem_cache_idx(size + LLIST_NODE_SZ);
>> + if (i < 0) {
>> + err = -EINVAL;
>> + goto out;
>> + }
>> + unit_size = sizes[i];
>> +
[...]
next prev parent reply other threads:[~2023-12-13 17:26 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-12 22:30 [PATCH bpf-next 0/5] bpf: Reduce memory usage for bpf_global_percpu_ma Yonghong Song
2023-12-12 22:30 ` [PATCH bpf-next 1/5] bpf: Refactor to have a memalloc cache destroying function Yonghong Song
2023-12-13 11:05 ` Hou Tao
2023-12-12 22:30 ` [PATCH bpf-next 2/5] bpf: Allow per unit prefill for non-fix-size percpu memory allocator Yonghong Song
2023-12-13 11:03 ` Hou Tao
2023-12-13 17:25 ` Yonghong Song [this message]
2023-12-12 22:30 ` [PATCH bpf-next 3/5] bpf: Refill only one percpu element in memalloc Yonghong Song
2023-12-13 11:05 ` Hou Tao
2023-12-13 17:26 ` Yonghong Song
2023-12-12 22:31 ` [PATCH bpf-next 4/5] bpf: Limit up to 512 bytes for bpf_global_percpu_ma allocation Yonghong Song
2023-12-13 10:15 ` kernel test robot
2023-12-13 17:20 ` Yonghong Song
2023-12-13 11:09 ` Hou Tao
2023-12-13 17:28 ` Yonghong Song
2023-12-13 14:13 ` kernel test robot
2023-12-12 22:31 ` [PATCH bpf-next 5/5] selftests/bpf: Cope with 512 bytes limit with bpf_global_percpu_ma Yonghong Song
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b917b011-545a-4804-b8b4-06b412e16610@linux.dev \
--to=yonghong.song@linux.dev \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=houtao@huaweicloud.com \
--cc=kernel-team@fb.com \
--cc=martin.lau@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.