RE: [PATCH bpf-next v8] selftests/bpf: Add benchmark for bpf memory allocator

All of lore.kernel.org
 help / color / mirror / Atom feed

From: John Fastabend <john.fastabend@gmail.com>
To: Hou Tao <houtao@huaweicloud.com>,
	 bpf@vger.kernel.org,  Martin KaFai Lau <martin.lau@linux.dev>,
	 Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andrii Nakryiko <andrii@kernel.org>,  Song Liu <song@kernel.org>,
	 Hao Luo <haoluo@google.com>,  Yonghong Song <yhs@fb.com>,
	 Daniel Borkmann <daniel@iogearbox.net>,
	 KP Singh <kpsingh@kernel.org>,
	 Stanislav Fomichev <sdf@google.com>,
	 Jiri Olsa <jolsa@kernel.org>,
	 John Fastabend <john.fastabend@gmail.com>,
	 "Paul E . McKenney" <paulmck@kernel.org>,
	 rcu@vger.kernel.org,  houtao1@huawei.com
Subject: RE: [PATCH bpf-next v8] selftests/bpf: Add benchmark for bpf memory allocator
Date: Mon, 03 Jul 2023 14:52:09 -0700	[thread overview]
Message-ID: <64a34309e42aa_652052084f@john.notmuch> (raw)
In-Reply-To: <20230703141332.3319271-1-houtao@huaweicloud.com>

Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> The benchmark could be used to compare the performance of hash map
> operations and the memory usage between different flavors of bpf memory
> allocator (e.g., no bpf ma vs bpf ma vs reuse-after-gp bpf ma). It also
> could be used to check the performance improvement or the memory saving
> provided by optimization.
> 
> The benchmark creates a non-preallocated hash map which uses bpf memory
> allocator and shows the operation performance and the memory usage of
> the hash map under different use cases:
> (1) overwrite
> Each CPU overwrites nonoverlapping part of hash map. When each CPU
> completes overwriting of 64 elements in hash map, it increases the
> op_count.
> (2) batch_add_batch_del
> Each CPU adds then deletes nonoverlapping part of hash map in batch.
> When each CPU adds and deletes 64 elements in hash map, it increases
> the op_count twice.
> (3) add_del_on_diff_cpu
> Each two-CPUs pair adds and deletes nonoverlapping part of map
> cooperatively. When each CPU adds or deletes 64 elements in hash map,
> it will increase the op_count.
> 
> The following is the benchmark results when comparing between different
> flavors of bpf memory allocator. These tests are conducted on a KVM guest
> with 8 CPUs and 16 GB memory. The command line below is used to do all
> the following benchmarks:
> 
>   ./bench htab-mem --use-case $name ${OPTS} -w3 -d10 -a -p8
> 
> These results show that preallocated hash map has both better performance
> and smaller memory footprint.
> 
> (1) non-preallocated + no bpf memory allocator (v6.0.19)
> use kmalloc() + call_rcu
> 
> overwrite            per-prod-op: 11.24 ± 0.07k/s, avg mem: 82.64 ± 26.32MiB, peak mem: 119.18MiB
> batch_add_batch_del  per-prod-op: 18.45 ± 0.10k/s, avg mem: 50.47 ± 14.51MiB, peak mem: 94.96MiB
> add_del_on_diff_cpu  per-prod-op: 14.50 ± 0.03k/s, avg mem: 4.64 ± 0.73MiB, peak mem: 7.20MiB
> 
> (2) preallocated
> OPTS=--preallocated
> 
> overwrite            per-prod-op: 191.92 ± 0.07k/s, avg mem: 1.23 ± 0.00MiB, peak mem: 1.49MiB
> batch_add_batch_del  per-prod-op: 218.10 ± 0.25k/s, avg mem: 1.23 ± 0.00MiB, peak mem: 1.49MiB
> add_del_on_diff_cpu  per-prod-op: 39.59 ± 0.41k/s, avg mem: 1.48 ± 0.11MiB, peak mem: 1.74MiB
> 
> (3) normal bpf memory allocator
> 
> overwrite            per-prod-op: 134.81 ± 0.22k/s, avg mem: 1.67 ± 0.12MiB, peak mem: 2.74MiB
> batch_add_batch_del  per-prod-op: 90.44 ± 0.34k/s, avg mem: 2.27 ± 0.00MiB, peak mem: 2.74MiB
> add_del_on_diff_cpu  per-prod-op: 28.20 ± 0.15k/s, avg mem: 1.73 ± 0.17MiB, peak mem: 2.06MiB

Acked-by: John Fastabend <john.fastabend@gmail.com>

> +
> +static error_t htab_mem_parse_arg(int key, char *arg, struct argp_state *state)
> +{
> +	switch (key) {
> +	case ARG_VALUE_SIZE:
> +		args.value_size = strtoul(arg, NULL, 10);
> +		if (args.value_size > 4096) {
> +			fprintf(stderr, "too big value size %u\n", args.value_size);
> +			argp_usage(state);
> +		}
> +		break;
> +	case ARG_USE_CASE:
> +		args.use_case = strdup(arg);

might be worth checking for null and returning an error? Only matters if we
run from CI or something and then this looks like a flake.

> +		break;
> +	case ARG_PREALLOCATED:
> +		args.preallocated = true;
> +		break;
> +	default:
> +		return ARGP_ERR_UNKNOWN;
> +	}
> +
> +	return 0;
> +}

next prev parent reply	other threads:[~2023-07-03 21:52 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-03 14:13 [PATCH bpf-next v8] selftests/bpf: Add benchmark for bpf memory allocator Hou Tao
2023-07-03 21:52 ` John Fastabend [this message]
2023-07-04  1:13   ` Hou Tao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=64a34309e42aa_652052084f@john.notmuch \
    --to=john.fastabend@gmail.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=haoluo@google.com \
    --cc=houtao1@huawei.com \
    --cc=houtao@huaweicloud.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=paulmck@kernel.org \
    --cc=rcu@vger.kernel.org \
    --cc=sdf@google.com \
    --cc=song@kernel.org \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.