From: Yonghong Song <yonghong.song@linux.dev>
To: Hou Tao <houtao@huaweicloud.com>, bpf@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
kernel-team@fb.com, Martin KaFai Lau <martin.lau@kernel.org>
Subject: Re: [PATCH bpf-next v3 2/6] bpf: Allow per unit prefill for non-fix-size percpu memory allocator
Date: Sun, 17 Dec 2023 09:21:30 -0800 [thread overview]
Message-ID: <9bf96304-2cfe-453c-a709-00eb56fdf136@linux.dev> (raw)
In-Reply-To: <5c13f568-325c-4e5c-9f9e-ca5da5c2c75b@linux.dev>
On 12/16/23 11:11 PM, Yonghong Song wrote:
>
> On 12/15/23 7:12 PM, Hou Tao wrote:
>> Hi,
>>
>> On 12/16/2023 10:30 AM, Yonghong Song wrote:
>>> Commit 41a5db8d8161 ("Add support for non-fix-size percpu mem
>>> allocation")
>>> added support for non-fix-size percpu memory allocation.
>>> Such allocation will allocate percpu memory for all buckets on all
>>> cpus and the memory consumption is quadratic in the number of cpus.
>>> For example, let us say, 4 cpus, unit size 16 bytes, so each
>>> cpu has 16 * 4 = 64 bytes, with 4 cpus, total will be 64 * 4 = 256
>>> bytes.
>>> Then let us say, 8 cpus with the same unit size, each cpu
>>> has 16 * 8 = 128 bytes, with 8 cpus, total will be 128 * 8 = 1024
>>> bytes.
>>> So if the number of cpus doubles, the memory consumption
>>> quadruples. So for a system with a large number of cpus, the
>>> memory consumption grows quickly with quadratic order.
>>> For example, for a 4KB percpu allocation on 128 cpus, the total
>>> memory consumption will be 4KB * 128 * 128 = 64MB. Things become
>>> worse if the number of cpus is bigger (e.g., 512, 1024, etc.)
>> SNIP
>>> +__init int bpf_mem_alloc_percpu_init(struct bpf_mem_alloc *ma)
>>> +{
>>> + struct bpf_mem_caches __percpu *pcc;
>>> +
>>> + pcc = __alloc_percpu_gfp(sizeof(struct bpf_mem_caches), 8,
>>> GFP_KERNEL);
>>> + if (!pcc)
>>> + return -ENOMEM;
>>> +
>>> + ma->caches = pcc;
>>> + ma->percpu = true;
>>> + return 0;
>>> +}
>>> +
>>> +int bpf_mem_alloc_percpu_unit_init(struct bpf_mem_alloc *ma, int size)
>>> +{
>>> + int cpu, i, err = 0, unit_size, percpu_size;
>>> + struct bpf_mem_caches *cc, __percpu *pcc;
>>> + struct obj_cgroup *objcg;
>>> + struct bpf_mem_cache *c;
>>> +
>>> + i = bpf_mem_cache_idx(size);
>>> + if (i < 0)
>>> + return -EINVAL;
>>> +
>>> + /* room for llist_node and per-cpu pointer */
>>> + percpu_size = LLIST_NODE_SZ + sizeof(void *);
>>> +
>>> + pcc = ma->caches;
>>> + unit_size = sizes[i];
>>> +
>>> +#ifdef CONFIG_MEMCG_KMEM
>>> + objcg = get_obj_cgroup_from_current();
>>> +#endif
>> For bpf_global_percpu_ma, we also need to account the allocated memory
>> to the root memory cgroup just like bpf_global_ma does, don't we? So it
>> seems that we need to initialize c->objcg early, in
>> bpf_mem_alloc_percpu_init().
>
> Good point. Agree. The original behavior of percpu non-fix-size mem
> allocation is to do get_obj_cgroup_from_current() at the init stage
> and charge to the root memory cgroup, and we indeed should move
> the above into bpf_mem_alloc_percpu_init().
>
>>> + for_each_possible_cpu(cpu) {
>>> + cc = per_cpu_ptr(pcc, cpu);
>>> + c = &cc->cache[i];
>>> + if (cpu == 0 && c->unit_size)
>>> + goto out;
>>> +
>>> + c->unit_size = unit_size;
>>> + c->objcg = objcg;
>>> + c->percpu_size = percpu_size;
>>> + c->tgt = c;
>>> +
>>> + init_refill_work(c);
>>> + prefill_mem_cache(c, cpu);
>>> +
>>> + if (cpu == 0) {
>>> + err = check_obj_size(c, i);
>>> + if (err) {
>>> + drain_mem_cache(c);
>>> + memset(c, 0, sizeof(*c));
>> I also forgot about c->objcg. objcg may be leaked if we do memset()
>> here.
>
> The objcg gets a reference at the bpf_mem_alloc_init() stage
> and is released at bpf_mem_alloc_destroy(). For bpf_global_ma,
> if there is a failure, indeed bpf_mem_alloc_destroy() will be
> called and the reference held by c->objcg will be released.
>
> So if we move get_obj_cgroup_from_current() to
> bpf_mem_alloc_percpu_init() stage, we should be okay here.
>
> BTW, is check_obj_size() really necessary here? My answer is no
> since, as you mentioned, the size->cache_index mapping is pretty
> stable, so check_obj_size() should not return an error in such
> cases. What do you think?
How about the following change on top of this patch?
diff --git a/include/linux/bpf_mem_alloc.h b/include/linux/bpf_mem_alloc.h
index 43e635c67150..d1403204379e 100644
--- a/include/linux/bpf_mem_alloc.h
+++ b/include/linux/bpf_mem_alloc.h
@@ -11,6 +11,7 @@ struct bpf_mem_caches;
struct bpf_mem_alloc {
struct bpf_mem_caches __percpu *caches;
struct bpf_mem_cache __percpu *cache;
+ struct obj_cgroup *objcg;
bool percpu;
struct work_struct work;
};
diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 5cf2738c20a9..6486da4ba097 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -553,6 +553,8 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
if (memcg_bpf_enabled())
objcg = get_obj_cgroup_from_current();
#endif
+ ma->objcg = objcg;
+
for_each_possible_cpu(cpu) {
c = per_cpu_ptr(pc, cpu);
c->unit_size = unit_size;
@@ -573,6 +575,7 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
#ifdef CONFIG_MEMCG_KMEM
objcg = get_obj_cgroup_from_current();
#endif
+ ma->objcg = objcg;
for_each_possible_cpu(cpu) {
cc = per_cpu_ptr(pcc, cpu);
for (i = 0; i < NUM_CACHES; i++) {
@@ -637,6 +640,12 @@ __init int bpf_mem_alloc_percpu_init(struct bpf_mem_alloc *ma)
ma->caches = pcc;
ma->percpu = true;
+
+#ifdef CONFIG_MEMCG_KMEM
+ ma->objcg = get_obj_cgroup_from_current();
+#else
+ ma->objcg = NULL;
+#endif
return 0;
}
@@ -656,10 +665,8 @@ int bpf_mem_alloc_percpu_unit_init(struct bpf_mem_alloc *ma, int size)
pcc = ma->caches;
unit_size = sizes[i];
+ objcg = ma->objcg;
-#ifdef CONFIG_MEMCG_KMEM
- objcg = get_obj_cgroup_from_current();
-#endif
for_each_possible_cpu(cpu) {
cc = per_cpu_ptr(pcc, cpu);
c = &cc->cache[i];
@@ -799,9 +806,8 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
rcu_in_progress += atomic_read(&c->call_rcu_ttrace_in_progress);
rcu_in_progress += atomic_read(&c->call_rcu_in_progress);
}
- /* objcg is the same across cpus */
- if (c->objcg)
- obj_cgroup_put(c->objcg);
+ if (ma->objcg)
+ obj_cgroup_put(ma->objcg);
destroy_mem_alloc(ma, rcu_in_progress);
}
if (ma->caches) {
@@ -817,8 +823,8 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
rcu_in_progress += atomic_read(&c->call_rcu_in_progress);
}
}
- if (c->objcg)
- obj_cgroup_put(c->objcg);
+ if (ma->objcg)
+ obj_cgroup_put(ma->objcg);
destroy_mem_alloc(ma, rcu_in_progress);
}
}
I still think check_obj_size() for percpu allocation is not needed.
But I guess we can address that issue later on.
>
>>> + goto out;
>>> + }
>>> + }
>>> + }
>>> +
>>> +out:
>>> + return err;
>>> +}
>>> +