Re: [PATCH bpf-next v2 3/6] bpf: Allow per unit prefill for non-fix-size percpu memory allocator

From: Yonghong Song <yonghong.song@linux.dev>
To: Hou Tao <houtao@huaweicloud.com>, bpf@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	kernel-team@fb.com, Martin KaFai Lau <martin.lau@kernel.org>
Subject: Re: [PATCH bpf-next v2 3/6] bpf: Allow per unit prefill for non-fix-size percpu memory allocator
Date: Fri, 15 Dec 2023 06:20:09 -0800
Message-ID: <f2a42641-d9a7-47c4-9993-9a35555ed6bc@linux.dev>
In-Reply-To: <ee755de6-b86c-a80f-271d-4e34ee7d0f94@huaweicloud.com>


On 12/14/23 11:40 PM, Hou Tao wrote:
> Hi,
>
> On 12/15/2023 3:27 PM, Yonghong Song wrote:
>> On 12/14/23 10:50 PM, Yonghong Song wrote:
>>> On 12/14/23 7:19 PM, Hou Tao wrote:
>>>> On 12/15/2023 8:12 AM, Yonghong Song wrote:
>>>>> Commit 41a5db8d8161 ("Add support for non-fix-size percpu mem
>>>>> allocation")
>>>>> added support for non-fix-size percpu memory allocation.
>>>>> Such allocation will allocate percpu memory for all buckets on all
>>>>> cpus, and the memory consumption is quadratic in the number of cpus.
>>>>> For example, let us say, 4 cpus, unit size 16 bytes, so each
>>>>> cpu has 16 * 4 = 64 bytes, with 4 cpus, total will be 64 * 4 = 256
>>>>> bytes.
>>>>> Then let us say, 8 cpus with the same unit size, each cpu
>>>>> has 16 * 8 = 128 bytes, with 8 cpus, total will be 128 * 8 = 1024
>>>>> bytes.
>>>>> So if the number of cpus doubles, the memory consumption
>>>>> quadruples. So for a system with a large number of cpus, the
>>>>> memory consumption goes up quickly, in quadratic order.
>>>>> For example, with a 4KB percpu allocation and 128 cpus, the total
>>>>> memory consumption will be 4KB * 128 * 128 = 64MB. Things get
>>>>> worse as the number of cpus grows (e.g., 512, 1024, etc.)
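
The arithmetic in the commit message above can be checked with a small
user-space sketch. The helper names percpu_bytes_per_cpu() and
percpu_bytes_total() are hypothetical and exist only for illustration;
they are not kernel functions, and the sketch assumes one unit per
bucket as in the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: bytes one cpu holds for a single unit.
 * A percpu unit of unit_size bytes has one copy per cpu, so a
 * single unit cached on one cpu costs unit_size * nr_cpus bytes.
 */
static uint64_t percpu_bytes_per_cpu(uint64_t unit_size, uint64_t nr_cpus)
{
    assert(nr_cpus > 0);
    return unit_size * nr_cpus;
}

/*
 * Hypothetical helper: total bytes when every cpu caches one such
 * unit. This is unit_size * nr_cpus * nr_cpus, i.e. quadratic in
 * the number of cpus.
 */
static uint64_t percpu_bytes_total(uint64_t unit_size, uint64_t nr_cpus)
{
    return percpu_bytes_per_cpu(unit_size, nr_cpus) * nr_cpus;
}
```

With unit size 16, percpu_bytes_total(16, 4) gives 256 and
percpu_bytes_total(16, 8) gives 1024, matching the numbers quoted above.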
> SNIP
>>>>> +#ifdef CONFIG_MEMCG_KMEM
>>>>> +    objcg = get_obj_cgroup_from_current();
>>>>> +#endif
>>>>> +    for_each_possible_cpu(cpu) {
>>>>> +        cc = per_cpu_ptr(pcc, cpu);
>>>>> +        c = &cc->cache[i];
>>>>> +        if (cpu == 0 && c->unit_size)
>>>>> +            goto out;
>>>>> +
>>>>> +        c->unit_size = unit_size;
>>>>> +        c->objcg = objcg;
>>>>> +        c->percpu_size = percpu_size;
>>>>> +        c->tgt = c;
>>>>> +
>>>>> +        init_refill_work(c);
>>>>> +        prefill_mem_cache(c, cpu);
>>>>> +
>>>>> +        if (cpu == 0) {
>>>>> +            err = check_obj_size(c, i);
>>>>> +            if (err) {
>>>>> +                bpf_mem_alloc_destroy_cache(c);
>>>> It seems drain_mem_cache() will be enough. Have you considered setting
>>> At the prefill stage, it looks like the following is enough:
>>>      free_all(__llist_del_all(&c->free_llist), percpu);
>>> But I agree that drain_mem_cache() is simpler and makes
>>> potential future code changes easier.
>>>
>>>> low_watermark as 0 to prevent potential refill in unit_alloc() if the
>>>> initialization of the current unit fails ?
>>> I think it does make sense. For non-fix-size non-percpu prefill,
>>> if check_obj_size() fails, the whole prefill fails, which includes
>>> all buckets.
>>>
>>> In this case, if it fails for a particular bucket, we should
>>> make sure that bucket always returns a NULL ptr, so setting the
>>> low_watermark to 0 does make sense.
>> Thinking again. If the initialization of the current unit
>> failed, the verification will fail and the corresponding
>> bpf program will not be able to do memory alloc, so we
>> should be fine.
>>
>> But it is totally possible that some prog later may
>> call bpf_mem_alloc_percpu_unit_init() again with the
>> same size/bucket. So we should simply reset bpf_mem_cache
>> to 0 during the previous failed bpf_mem_alloc_percpu_unit_init()
>> call. Is it possible that check_obj_size() initially
>> returns an error, but some time later something in
>> the kernel changes and check_obj_size() with the
>> same size succeeds?
> Resetting bpf_mem_cache to 0 is much simpler and easier to understand
> than resetting low_watermark to 0. For per-cpu allocation, the return
> value of pcpu_alloc_size() is stable and I don't think it will change
> the way the return value of ksize() does, so it is not possible that
> the previous check_obj_size() failed but a new check_obj_size() for
> the same unit_size succeeds.

Thanks for the clarification. I will just reset bpf_mem_cache to 0 then.
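
A minimal user-space sketch of the reset-on-failure idea, not the actual
patch: struct mock_cache, mock_check_obj_size() and mock_unit_init() are
hypothetical stand-ins for bpf_mem_cache, check_obj_size() and the
per-unit init path, and the max_ok parameter is an assumption used only
to force a failure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct bpf_mem_cache. */
struct mock_cache {
    size_t unit_size;    /* non-zero means "already initialized" */
    size_t percpu_size;
    int low_watermark;
};

/* Hypothetical stand-in for check_obj_size(); 0 means success. */
static int mock_check_obj_size(const struct mock_cache *c, size_t max_ok)
{
    return c->unit_size <= max_ok ? 0 : -1;
}

/*
 * Initialize one unit. If the size check fails, zero the whole
 * cache so that no half-initialized state (unit_size, watermark)
 * is left behind for a later call with the same bucket.
 */
static int mock_unit_init(struct mock_cache *c, size_t unit_size,
                          size_t max_ok)
{
    int err;

    assert(c != NULL);
    if (c->unit_size)
        return 0;    /* already set up by an earlier caller */

    c->unit_size = unit_size;
    c->percpu_size = unit_size;
    c->low_watermark = 1;

    err = mock_check_obj_size(c, max_ok);
    if (err)
        memset(c, 0, sizeof(*c));    /* reset to 0 on failure */
    return err;
}
```

Because unit_size is back to 0 after a failed init, a later call is not
short-circuited by the "already initialized" check, and low_watermark is
0 as well, so nothing would trigger a refill for that unit.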

>
>>
>>>>> +                goto out;
>>>>> +            }
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>> +out:
>>>>> +    return err;
>>>>> +}
>>>>> +
>>>>>    static void check_mem_cache(struct bpf_mem_cache *c)
>>>>>    {
>>>>>        WARN_ON_ONCE(!llist_empty(&c->free_by_rcu_ttrace));
>>>>>
>>>> .
>>>>
