From: Heiko Carstens <hca@linux.ibm.com>
To: Yonghong Song <yonghong.song@linux.dev>
Cc: bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	kernel-team@fb.com, Martin KaFai Lau <martin.lau@kernel.org>,
	Marc Hartmayer <mhartmay@linux.ibm.com>,
	Mikhail Zaslonko <zaslonko@linux.ibm.com>,
	linux-s390@vger.kernel.org
Subject: Re: [PATCH bpf-next v3 01/13] bpf: Add support for non-fix-size percpu mem allocation
Date: Wed, 15 Nov 2023 16:31:39 +0100
Message-ID: <20231115153139.29313-A-hca@linux.ibm.com>
In-Reply-To: <20230827152734.1995725-1-yonghong.song@linux.dev>

On Sun, Aug 27, 2023 at 08:27:34AM -0700, Yonghong Song wrote:
> This is needed for later percpu mem allocation when the
> allocation is done by bpf program. For such cases, a global
> bpf_global_percpu_ma is added where a flexible allocation
> size is needed.
> 
> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> ---
>  include/linux/bpf.h   |  4 ++--
>  kernel/bpf/core.c     |  8 +++++---
>  kernel/bpf/memalloc.c | 14 ++++++--------
>  3 files changed, 13 insertions(+), 13 deletions(-)

Both Marc and Mikhail reported out-of-memory conditions on s390 machines
and bisected them down to upstream commit 41a5db8d8161 ("bpf: Add
support for non-fix-size percpu mem allocation").
The commit seems to consume a lot of memory, and the amount appears to
depend only on the number of possible CPUs.

On a machine with 8 GB of memory, 6 present CPUs and 512 possible CPUs
(yes, this is a realistic scenario), the memory consumption directly
after boot is:

$ cat /sys/devices/system/cpu/present
0-5
$ cat /sys/devices/system/cpu/possible
0-511

Before this commit:

$ cat /proc/meminfo
MemTotal:        8141924 kB
MemFree:         7639872 kB

With this commit:

$ cat /proc/meminfo
MemTotal:        8141924 kB
MemFree:         4852248 kB

So, this appears to be a significant regression: MemFree drops by
roughly 2.7 GB, about a third of total memory.
I'm quoting the rest of the original patch below for reference only.

> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 12596af59c00..144dbddf53bd 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -55,8 +55,8 @@ struct cgroup;
>  extern struct idr btf_idr;
>  extern spinlock_t btf_idr_lock;
>  extern struct kobject *btf_kobj;
> -extern struct bpf_mem_alloc bpf_global_ma;
> -extern bool bpf_global_ma_set;
> +extern struct bpf_mem_alloc bpf_global_ma, bpf_global_percpu_ma;
> +extern bool bpf_global_ma_set, bpf_global_percpu_ma_set;
>  
>  typedef u64 (*bpf_callback_t)(u64, u64, u64, u64, u64);
>  typedef int (*bpf_iter_init_seq_priv_t)(void *private_data,
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 0f8f036d8bd1..95599df82ee4 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -64,8 +64,8 @@
>  #define OFF	insn->off
>  #define IMM	insn->imm
>  
> -struct bpf_mem_alloc bpf_global_ma;
> -bool bpf_global_ma_set;
> +struct bpf_mem_alloc bpf_global_ma, bpf_global_percpu_ma;
> +bool bpf_global_ma_set, bpf_global_percpu_ma_set;
>  
>  /* No hurry in this branch
>   *
> @@ -2921,7 +2921,9 @@ static int __init bpf_global_ma_init(void)
>  
>  	ret = bpf_mem_alloc_init(&bpf_global_ma, 0, false);
>  	bpf_global_ma_set = !ret;
> -	return ret;
> +	ret = bpf_mem_alloc_init(&bpf_global_percpu_ma, 0, true);
> +	bpf_global_percpu_ma_set = !ret;
> +	return !bpf_global_ma_set || !bpf_global_percpu_ma_set;
>  }
>  late_initcall(bpf_global_ma_init);
>  #endif
> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
> index 9c49ae53deaf..cb60445de98a 100644
> --- a/kernel/bpf/memalloc.c
> +++ b/kernel/bpf/memalloc.c
> @@ -499,15 +499,16 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
>  	struct obj_cgroup *objcg = NULL;
>  	int cpu, i, unit_size, percpu_size = 0;
>  
> +	/* room for llist_node and per-cpu pointer */
> +	if (percpu)
> +		percpu_size = LLIST_NODE_SZ + sizeof(void *);
> +
>  	if (size) {
>  		pc = __alloc_percpu_gfp(sizeof(*pc), 8, GFP_KERNEL);
>  		if (!pc)
>  			return -ENOMEM;
>  
> -		if (percpu)
> -			/* room for llist_node and per-cpu pointer */
> -			percpu_size = LLIST_NODE_SZ + sizeof(void *);
> -		else
> +		if (!percpu)
>  			size += LLIST_NODE_SZ; /* room for llist_node */
>  		unit_size = size;
>  
> @@ -527,10 +528,6 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
>  		return 0;
>  	}
>  
> -	/* size == 0 && percpu is an invalid combination */
> -	if (WARN_ON_ONCE(percpu))
> -		return -EINVAL;
> -
>  	pcc = __alloc_percpu_gfp(sizeof(*cc), 8, GFP_KERNEL);
>  	if (!pcc)
>  		return -ENOMEM;
> @@ -543,6 +540,7 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
>  			c = &cc->cache[i];
>  			c->unit_size = sizes[i];
>  			c->objcg = objcg;
> +			c->percpu_size = percpu_size;
>  			c->tgt = c;
>  			prefill_mem_cache(c, cpu);
>  		}
> -- 
> 2.34.1
> 
