From: Yonghong Song <yonghong.song@linux.dev>
To: Hou Tao <houtao@huaweicloud.com>, bpf@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
kernel-team@fb.com, Martin KaFai Lau <martin.lau@kernel.org>
Subject: Re: [PATCH bpf-next v3 2/6] bpf: Allow per unit prefill for non-fix-size percpu memory allocator
Date: Sat, 16 Dec 2023 23:11:48 -0800 [thread overview]
Message-ID: <5c13f568-325c-4e5c-9f9e-ca5da5c2c75b@linux.dev> (raw)
In-Reply-To: <ca0512d6-fa01-9d94-017f-a717756dcf86@huaweicloud.com>
On 12/15/23 7:12 PM, Hou Tao wrote:
> Hi,
>
> On 12/16/2023 10:30 AM, Yonghong Song wrote:
>> Commit 41a5db8d8161 ("Add support for non-fix-size percpu mem allocation")
>> added support for non-fix-size percpu memory allocation.
>> Such an allocation allocates percpu memory for all buckets on all
>> cpus, so the memory consumption is quadratic in the number of cpus.
>> For example, with 4 cpus and a unit size of 16 bytes, each
>> cpu holds 16 * 4 = 64 bytes, so with 4 cpus the total is 64 * 4 = 256 bytes.
>> With 8 cpus and the same unit size, each cpu
>> holds 16 * 8 = 128 bytes, so with 8 cpus the total is 128 * 8 = 1024 bytes.
>> So if the number of cpus doubles, the memory consumption
>> quadruples. For a system with a large number of cpus, the
>> memory consumption therefore grows quickly.
>> For example, for a 4KB percpu allocation with 128 cpus, the total memory
>> consumption will be 4KB * 128 * 128 = 64MB. Things become
>> worse as the number of cpus grows (e.g., 512, 1024, etc.)
> SNIP
>> +__init int bpf_mem_alloc_percpu_init(struct bpf_mem_alloc *ma)
>> +{
>> + struct bpf_mem_caches __percpu *pcc;
>> +
>> + pcc = __alloc_percpu_gfp(sizeof(struct bpf_mem_caches), 8, GFP_KERNEL);
>> + if (!pcc)
>> + return -ENOMEM;
>> +
>> + ma->caches = pcc;
>> + ma->percpu = true;
>> + return 0;
>> +}
>> +
>> +int bpf_mem_alloc_percpu_unit_init(struct bpf_mem_alloc *ma, int size)
>> +{
>> + int cpu, i, err = 0, unit_size, percpu_size;
>> + struct bpf_mem_caches *cc, __percpu *pcc;
>> + struct obj_cgroup *objcg;
>> + struct bpf_mem_cache *c;
>> +
>> + i = bpf_mem_cache_idx(size);
>> + if (i < 0)
>> + return -EINVAL;
>> +
>> + /* room for llist_node and per-cpu pointer */
>> + percpu_size = LLIST_NODE_SZ + sizeof(void *);
>> +
>> + pcc = ma->caches;
>> + unit_size = sizes[i];
>> +
>> +#ifdef CONFIG_MEMCG_KMEM
>> + objcg = get_obj_cgroup_from_current();
>> +#endif
> For bpf_global_percpu_ma, we also need to account the allocated memory
> to the root memory cgroup just like bpf_global_ma does, don't we? So it seems
> that we need to initialize c->objcg early, in bpf_mem_alloc_percpu_init().
Good point, agreed. The original behavior of percpu non-fix-size mem
allocation is to do get_obj_cgroup_from_current() at the init stage
and charge to the root memory cgroup, so we indeed should move
the above into bpf_mem_alloc_percpu_init().
>> + for_each_possible_cpu(cpu) {
>> + cc = per_cpu_ptr(pcc, cpu);
>> + c = &cc->cache[i];
>> + if (cpu == 0 && c->unit_size)
>> + goto out;
>> +
>> + c->unit_size = unit_size;
>> + c->objcg = objcg;
>> + c->percpu_size = percpu_size;
>> + c->tgt = c;
>> +
>> + init_refill_work(c);
>> + prefill_mem_cache(c, cpu);
>> +
>> + if (cpu == 0) {
>> + err = check_obj_size(c, i);
>> + if (err) {
>> + drain_mem_cache(c);
>> + memset(c, 0, sizeof(*c));
> I also forgot about c->objcg. objcg may be leaked if we do memset() here.
The objcg gets a reference at the bpf_mem_alloc_init() stage,
which is released in bpf_mem_alloc_destroy(). For bpf_global_ma,
if there is a failure, bpf_mem_alloc_destroy() will indeed be
called and the c->objcg reference will be released.
So if we move get_obj_cgroup_from_current() to the
bpf_mem_alloc_percpu_init() stage, we should be okay here.
BTW, is check_obj_size() really necessary here? My answer is no
since, as you mentioned, the size -> cache_index mapping is pretty stable,
so check_obj_size() should not return an error in such cases.
What do you think?
>> + goto out;
>> + }
>> + }
>> + }
>> +
>> +out:
>> + return err;
>> +}
>> +
>>
Thread overview: 16+ messages
2023-12-16 2:30 [PATCH bpf-next v3 0/6] bpf: Reduce memory usage for bpf_global_percpu_ma Yonghong Song
2023-12-16 2:30 ` [PATCH bpf-next v3 1/6] bpf: Avoid unnecessary extra percpu memory allocation Yonghong Song
2023-12-16 2:30 ` [PATCH bpf-next v3 2/6] bpf: Allow per unit prefill for non-fix-size percpu memory allocator Yonghong Song
2023-12-16 3:12 ` Hou Tao
2023-12-17 7:11 ` Yonghong Song [this message]
2023-12-17 17:21 ` Yonghong Song
2023-12-18 1:15 ` Hou Tao
2023-12-16 15:27 ` kernel test robot
2023-12-16 2:30 ` [PATCH bpf-next v3 3/6] bpf: Refill only one percpu element in memalloc Yonghong Song
2023-12-16 3:13 ` Hou Tao
2023-12-16 2:30 ` [PATCH bpf-next v3 4/6] bpf: Limit up to 512 bytes for bpf_global_percpu_ma allocation Yonghong Song
2023-12-16 4:05 ` Hou Tao
2023-12-16 2:30 ` [PATCH bpf-next v3 5/6] selftests/bpf: Cope with 512 bytes limit with bpf_global_percpu_ma Yonghong Song
2023-12-16 4:04 ` Hou Tao
2023-12-16 2:30 ` [PATCH bpf-next v3 6/6] selftests/bpf: Add a selftest with > 512-byte percpu allocation size Yonghong Song
2023-12-16 4:07 ` Hou Tao