BPF List
From: Yonghong Song <yonghong.song@linux.dev>
To: bpf@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	kernel-team@fb.com, Martin KaFai Lau <martin.lau@kernel.org>
Subject: Re: [PATCH bpf-next v2 3/6] bpf: Allow per unit prefill for non-fix-size percpu memory allocator
Date: Thu, 14 Dec 2023 18:45:01 -0800
Message-ID: <44612e98-ff40-432e-80db-510c8817bd13@linux.dev>
In-Reply-To: <20231215001209.3252729-1-yonghong.song@linux.dev>


On 12/14/23 4:12 PM, Yonghong Song wrote:
> Commit 41a5db8d8161 ("Add support for non-fix-size percpu mem allocation")
> added support for non-fix-size percpu memory allocation.
> Such allocation will allocate percpu memory for all buckets on all
> cpus, so the memory consumption grows quadratically with the number of cpus.
> For example, let us say, 4 cpus, unit size 16 bytes, so each
> cpu has 16 * 4 = 64 bytes, with 4 cpus, total will be 64 * 4 = 256 bytes.
> Then let us say, 8 cpus with the same unit size, each cpu
> has 16 * 8 = 128 bytes, with 8 cpus, total will be 128 * 8 = 1024 bytes.
> So if the number of cpus doubles, the memory consumption
> will be 4 times as large. So for a system with a large number of cpus,
> the memory consumption grows quickly, in quadratic order.
> For example, for a 4KB percpu allocation with 128 cpus, the total memory
> consumption will be 4KB * 128 * 128 = 64MB. Things will become
> worse if the number of cpus is bigger (e.g., 512, 1024, etc.)
>
> In Commit 41a5db8d8161, the non-fix-size percpu memory allocation is
> done at boot time, so for a system with a large number of cpus, the initial
> percpu memory consumption is very visible. For example, for a 128-cpu
> system, the total percpu memory allocation will be at least
> (16 + 32 + 64 + 96 + 128 + 192 + 256 + 512 + 1024 + 2048 + 4096)
>    * 128 * 128 = ~138MB.
> which is pretty big. It will be even bigger for a larger number of cpus.
>
> Note that the current prefill also allocates 4 entries if the unit size
> is less than 256. So on top of the 138MB memory consumption, this will
> add more consumption with
> 3 * (16 + 32 + 64 + 96 + 128 + 192 + 256) * 128 * 128 = ~38MB.
> The next patch will try to reduce this memory consumption.
>
> Later on, Commit 1fda5bb66ad8 ("bpf: Do not allocate percpu memory
> at init stage") moved the non-fix-size percpu memory allocation
> to the bpf verification stage. Once a particular bpf_percpu_obj_new()
> is called by a bpf program, the memory allocator will try to fill in
> the cache with all sizes, causing the same amount of percpu memory
> consumption as in the boot stage.
>
> To reduce the initial percpu memory consumption for non-fix-size
> percpu memory allocation, instead of filling the cache with all
> supported allocation sizes, this patch intends to fill the cache
> only for the requested size. As users typically will not use large
> percpu data structures, this can save memory significantly.
> For example, say the allocation size is 64 bytes with 128 cpus.
> Then the total percpu memory amount will be 64 * 128 * 128 = 1MB,
> much less than the previous 138MB.
>
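The arithmetic in the commit message can be checked with a small userspace sketch. This is not kernel code: the helper names percpu_bytes_one_size() and percpu_bytes_all_sizes() are made up for illustration, the size classes are copied from the sizes[] array in kernel/bpf/memalloc.c, and the per-object overhead (llist_node plus the per-cpu pointer) and the 4-entry prefill are deliberately ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Size classes copied from sizes[] in kernel/bpf/memalloc.c. */
static const uint16_t size_classes[] = {
	96, 192, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096
};
#define NUM_CLASSES (sizeof(size_classes) / sizeof(size_classes[0]))

/*
 * Hypothetical helper: each of the ncpus caches prefills one percpu
 * object, and every percpu object occupies unit_size bytes on each of
 * the ncpus cpus, hence unit_size * ncpus * ncpus.
 */
static uint64_t percpu_bytes_one_size(uint64_t unit_size, uint64_t ncpus)
{
	return unit_size * ncpus * ncpus;
}

/* Hypothetical helper: cost of prefilling one object for every size class. */
static uint64_t percpu_bytes_all_sizes(uint64_t ncpus)
{
	uint64_t total = 0;

	for (size_t i = 0; i < NUM_CLASSES; i++)
		total += percpu_bytes_one_size(size_classes[i], ncpus);
	return total;
}
```

With 128 cpus, percpu_bytes_all_sizes(128) gives 138674176 bytes (the ~138MB above), while percpu_bytes_one_size(64, 128) gives 1048576 bytes (1MB), which is the saving this patch is after.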
> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> ---
>   include/linux/bpf.h           |  2 +-
>   include/linux/bpf_mem_alloc.h |  7 ++++
>   kernel/bpf/core.c             |  8 +++--
>   kernel/bpf/memalloc.c         | 68 ++++++++++++++++++++++++++++++++++-
>   kernel/bpf/verifier.c         | 28 ++++++---------
>   5 files changed, 91 insertions(+), 22 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index c87c608a3689..f1f16449fbc4 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -60,7 +60,7 @@ extern struct idr btf_idr;
>   extern spinlock_t btf_idr_lock;
>   extern struct kobject *btf_kobj;
>   extern struct bpf_mem_alloc bpf_global_ma, bpf_global_percpu_ma;
> -extern bool bpf_global_ma_set;
> +extern bool bpf_global_ma_set, bpf_global_percpu_ma_set;
>   
>   typedef u64 (*bpf_callback_t)(u64, u64, u64, u64, u64);
>   typedef int (*bpf_iter_init_seq_priv_t)(void *private_data,
> diff --git a/include/linux/bpf_mem_alloc.h b/include/linux/bpf_mem_alloc.h
> index bb1223b21308..43e635c67150 100644
> --- a/include/linux/bpf_mem_alloc.h
> +++ b/include/linux/bpf_mem_alloc.h
> @@ -21,8 +21,15 @@ struct bpf_mem_alloc {
>    * 'size = 0' is for bpf_mem_alloc which manages many fixed-size objects.
>    * Alloc and free are done with bpf_mem_{alloc,free}() and the size of
>    * the returned object is given by the size argument of bpf_mem_alloc().
> + * If percpu equals true, error will be returned in order to avoid
> + * large memory consumption and the below bpf_mem_alloc_percpu_unit_init()
> + * should be used to do on-demand per-cpu allocation for each size.
>    */
>   int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu);
> +/* Initialize a non-fix-size percpu memory allocator */
> +int bpf_mem_alloc_percpu_init(struct bpf_mem_alloc *ma);
> +/* The percpu allocation with a specific unit size. */
> +int bpf_mem_alloc_percpu_unit_init(struct bpf_mem_alloc *ma, int size);
>   void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma);
>   
>   /* kmalloc/kfree equivalent: */
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index c34513d645c4..4a9177770f93 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -64,8 +64,8 @@
>   #define OFF	insn->off
>   #define IMM	insn->imm
>   
> -struct bpf_mem_alloc bpf_global_ma;
> -bool bpf_global_ma_set;
> +struct bpf_mem_alloc bpf_global_ma, bpf_global_percpu_ma;
> +bool bpf_global_ma_set, bpf_global_percpu_ma_set;
>   
>   /* No hurry in this branch
>    *
> @@ -2938,7 +2938,9 @@ static int __init bpf_global_ma_init(void)
>   
>   	ret = bpf_mem_alloc_init(&bpf_global_ma, 0, false);
>   	bpf_global_ma_set = !ret;
> -	return ret;
> +	ret = bpf_mem_alloc_percpu_init(&bpf_global_percpu_ma);
> +	bpf_global_percpu_ma_set = !ret;
> +	return !bpf_global_ma_set || !bpf_global_percpu_ma_set;
>   }
>   late_initcall(bpf_global_ma_init);
>   #endif
> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
> index 472158f1fb08..aea4cd07c7b6 100644
> --- a/kernel/bpf/memalloc.c
> +++ b/kernel/bpf/memalloc.c
> @@ -121,6 +121,8 @@ struct bpf_mem_caches {
>   	struct bpf_mem_cache cache[NUM_CACHES];
>   };
>   
> +static u16 sizes[NUM_CACHES] = {96, 192, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096};
> +
>   static struct llist_node notrace *__llist_del_first(struct llist_head *head)
>   {
>   	struct llist_node *entry, *next;
> @@ -520,12 +522,14 @@ static int check_obj_size(struct bpf_mem_cache *c, unsigned int idx)
>    */
>   int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
>   {
> -	static u16 sizes[NUM_CACHES] = {96, 192, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096};
>   	int cpu, i, err, unit_size, percpu_size = 0;
>   	struct bpf_mem_caches *cc, __percpu *pcc;
>   	struct bpf_mem_cache *c, __percpu *pc;
>   	struct obj_cgroup *objcg = NULL;
>   
> +	if (percpu && size == 0)
> +		return -EINVAL;
> +
>   	/* room for llist_node and per-cpu pointer */
>   	if (percpu)
>   		percpu_size = LLIST_NODE_SZ + sizeof(void *);
> @@ -625,6 +629,68 @@ static void bpf_mem_alloc_destroy_cache(struct bpf_mem_cache *c)
>   	drain_mem_cache(c);
>   }
>   
> +int bpf_mem_alloc_percpu_init(struct bpf_mem_alloc *ma)
> +{
> +	struct bpf_mem_caches __percpu *pcc;
> +
> +	pcc = __alloc_percpu_gfp(sizeof(struct bpf_mem_caches), 8, GFP_KERNEL | __GFP_ZERO);
> +	if (!pcc)
> +		return -ENOMEM;
> +
> +	ma->caches = pcc;
> +	ma->percpu = true;
> +	return 0;
> +}
> +
> +int bpf_mem_alloc_percpu_unit_init(struct bpf_mem_alloc *ma, int size)
> +{
> +	static u16 sizes[NUM_CACHES] = {96, 192, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096};

Sorry, an oversight here. The function-local 'sizes' array above should be
removed, since this patch already moves 'sizes' to file scope. Will fix in
the next revision.

> +	int cpu, i, err, unit_size, percpu_size = 0;
> +	struct bpf_mem_caches *cc, __percpu *pcc;
> +	struct obj_cgroup *objcg = NULL;
> +	struct bpf_mem_cache *c;
> +
> +	/* room for llist_node and per-cpu pointer */
> +	percpu_size = LLIST_NODE_SZ + sizeof(void *);
> +
> +	i = bpf_mem_cache_idx(size);
> +	if (i < 0)
> +		return -EINVAL;
> +
> +	err = 0;
> +	pcc = ma->caches;
> +	unit_size = sizes[i];
> +
> +#ifdef CONFIG_MEMCG_KMEM
> +	objcg = get_obj_cgroup_from_current();
> +#endif
> +	for_each_possible_cpu(cpu) {
> +		cc = per_cpu_ptr(pcc, cpu);
> +		c = &cc->cache[i];
> +		if (cpu == 0 && c->unit_size)
> +			goto out;
> +
> +		c->unit_size = unit_size;
> +		c->objcg = objcg;
> +		c->percpu_size = percpu_size;
> +		c->tgt = c;
> +
> +		init_refill_work(c);
> +		prefill_mem_cache(c, cpu);
> +
> +		if (cpu == 0) {
> +			err = check_obj_size(c, i);
> +			if (err) {
> +				bpf_mem_alloc_destroy_cache(c);
> +				goto out;
> +			}
> +		}
> +	}
> +
> +out:
> +	return err;
> +}
> +
>   static void check_mem_cache(struct bpf_mem_cache *c)
>   {
>   	WARN_ON_ONCE(!llist_empty(&c->free_by_rcu_ttrace));
[...]
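
For readers following the control flow of bpf_mem_alloc_percpu_unit_init(), here is a rough userspace model of the "initialize only the requested size class, and only once" idea. It is a sketch under assumptions, not the kernel implementation: struct cache_model, NCPUS, size_to_idx() and unit_init_model() are all hypothetical names, size_to_idx() is a simplified stand-in for bpf_mem_cache_idx() (it just picks the smallest class that fits), the prefill is reduced to a counter, and failure is reported as -1 rather than -EINVAL.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_CACHES 11
#define NCPUS 4	/* hypothetical cpu count for the model */

static const uint16_t sizes[NUM_CACHES] = {
	96, 192, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096
};

/* Hypothetical, much-reduced model of struct bpf_mem_cache. */
struct cache_model {
	unsigned int unit_size;	/* 0 means "not initialized yet" */
	unsigned int prefilled;	/* how many times prefill ran */
};

/* One set of size-class caches per cpu, zero-initialized like __GFP_ZERO. */
static struct cache_model percpu_caches[NCPUS][NUM_CACHES];

/* Simplified stand-in for bpf_mem_cache_idx(): smallest class that fits. */
static int size_to_idx(size_t size)
{
	int best = -1;

	if (size == 0)
		return -1;
	for (int i = 0; i < NUM_CACHES; i++)
		if (sizes[i] >= size && (best < 0 || sizes[i] < sizes[best]))
			best = i;
	return best;
}

/* Model of the on-demand init: touch only the requested size class. */
static int unit_init_model(size_t size)
{
	int i = size_to_idx(size);

	if (i < 0)
		return -1;
	for (int cpu = 0; cpu < NCPUS; cpu++) {
		struct cache_model *c = &percpu_caches[cpu][i];

		/* Same early exit as the patch: already set up on cpu 0. */
		if (cpu == 0 && c->unit_size)
			return 0;
		c->unit_size = sizes[i];
		c->prefilled++;
	}
	return 0;
}
```

In this model, calling unit_init_model(64) twice prefills the 64-byte class exactly once per cpu and leaves every other class untouched, which mirrors the intent of checking c->unit_size on cpu 0 before doing any work.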
