From: John Fastabend <john.fastabend@gmail.com>
To: Hou Tao <houtao@huaweicloud.com>,
bpf@vger.kernel.org, Martin KaFai Lau <martin.lau@linux.dev>,
Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andrii Nakryiko <andrii@kernel.org>, Song Liu <song@kernel.org>,
Hao Luo <haoluo@google.com>, Yonghong Song <yhs@fb.com>,
Daniel Borkmann <daniel@iogearbox.net>,
KP Singh <kpsingh@kernel.org>,
Stanislav Fomichev <sdf@google.com>,
Jiri Olsa <jolsa@kernel.org>,
John Fastabend <john.fastabend@gmail.com>,
"Paul E . McKenney" <paulmck@kernel.org>,
rcu@vger.kernel.org, houtao1@huawei.com
Subject: RE: [PATCH bpf-next v8] selftests/bpf: Add benchmark for bpf memory allocator
Date: Mon, 03 Jul 2023 14:52:09 -0700 [thread overview]
Message-ID: <64a34309e42aa_652052084f@john.notmuch> (raw)
In-Reply-To: <20230703141332.3319271-1-houtao@huaweicloud.com>
Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
>
> The benchmark could be used to compare the performance of hash map
> operations and the memory usage between different flavors of bpf memory
> allocator (e.g., no bpf ma vs bpf ma vs reuse-after-gp bpf ma). It also
> could be used to check the performance improvement or the memory saving
> provided by optimization.
>
> The benchmark creates a non-preallocated hash map which uses bpf memory
> allocator and shows the operation performance and the memory usage of
> the hash map under different use cases:
> (1) overwrite
> Each CPU overwrites nonoverlapping part of hash map. When each CPU
> completes overwriting of 64 elements in hash map, it increases the
> op_count.
> (2) batch_add_batch_del
> Each CPU adds then deletes nonoverlapping part of hash map in batch.
> When each CPU adds and deletes 64 elements in hash map, it increases
> the op_count twice.
> (3) add_del_on_diff_cpu
> Each two-CPUs pair adds and deletes nonoverlapping part of map
> cooperatively. When each CPU adds or deletes 64 elements in hash map,
> it will increase the op_count.
>
> The following is the benchmark results when comparing between different
> flavors of bpf memory allocator. These tests are conducted on a KVM guest
> with 8 CPUs and 16 GB memory. The command line below is used to do all
> the following benchmarks:
>
> ./bench htab-mem --use-case $name ${OPTS} -w3 -d10 -a -p8
>
> These results show that preallocated hash map has both better performance
> and smaller memory footprint.
>
> (1) non-preallocated + no bpf memory allocator (v6.0.19)
> use kmalloc() + call_rcu
>
> overwrite per-prod-op: 11.24 ± 0.07k/s, avg mem: 82.64 ± 26.32MiB, peak mem: 119.18MiB
> batch_add_batch_del per-prod-op: 18.45 ± 0.10k/s, avg mem: 50.47 ± 14.51MiB, peak mem: 94.96MiB
> add_del_on_diff_cpu per-prod-op: 14.50 ± 0.03k/s, avg mem: 4.64 ± 0.73MiB, peak mem: 7.20MiB
>
> (2) preallocated
> OPTS=--preallocated
>
> overwrite per-prod-op: 191.92 ± 0.07k/s, avg mem: 1.23 ± 0.00MiB, peak mem: 1.49MiB
> batch_add_batch_del per-prod-op: 218.10 ± 0.25k/s, avg mem: 1.23 ± 0.00MiB, peak mem: 1.49MiB
> add_del_on_diff_cpu per-prod-op: 39.59 ± 0.41k/s, avg mem: 1.48 ± 0.11MiB, peak mem: 1.74MiB
>
> (3) normal bpf memory allocator
>
> overwrite per-prod-op: 134.81 ± 0.22k/s, avg mem: 1.67 ± 0.12MiB, peak mem: 2.74MiB
> batch_add_batch_del per-prod-op: 90.44 ± 0.34k/s, avg mem: 2.27 ± 0.00MiB, peak mem: 2.74MiB
> add_del_on_diff_cpu per-prod-op: 28.20 ± 0.15k/s, avg mem: 1.73 ± 0.17MiB, peak mem: 2.06MiB
Acked-by: John Fastabend <john.fastabend@gmail.com>
> +
> +static error_t htab_mem_parse_arg(int key, char *arg, struct argp_state *state)
> +{
> + switch (key) {
> + case ARG_VALUE_SIZE:
> + args.value_size = strtoul(arg, NULL, 10);
> + if (args.value_size > 4096) {
> + fprintf(stderr, "too big value size %u\n", args.value_size);
> + argp_usage(state);
> + }
> + break;
> + case ARG_USE_CASE:
> + args.use_case = strdup(arg);
might be worth checking for null and returning an error? Only matters if we
run from CI or something and then this looks like a flake.
> + break;
> + case ARG_PREALLOCATED:
> + args.preallocated = true;
> + break;
> + default:
> + return ARGP_ERR_UNKNOWN;
> + }
> +
> + return 0;
> +}
next prev parent reply other threads:[~2023-07-03 21:52 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-03 14:13 [PATCH bpf-next v8] selftests/bpf: Add benchmark for bpf memory allocator Hou Tao
2023-07-03 21:52 ` John Fastabend [this message]
2023-07-04 1:13 ` Hou Tao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=64a34309e42aa_652052084f@john.notmuch \
--to=john.fastabend@gmail.com \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=haoluo@google.com \
--cc=houtao1@huawei.com \
--cc=houtao@huaweicloud.com \
--cc=jolsa@kernel.org \
--cc=kpsingh@kernel.org \
--cc=martin.lau@linux.dev \
--cc=paulmck@kernel.org \
--cc=rcu@vger.kernel.org \
--cc=sdf@google.com \
--cc=song@kernel.org \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.