From: Andrii Nakryiko <andrii@kernel.org>
To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net,
martin.lau@kernel.org
Cc: andrii@kernel.org, kernel-team@meta.com
Subject: [PATCH bpf-next 0/4] Add internal-only BPF per-CPU instructions
Date: Fri, 29 Mar 2024 11:47:36 -0700
Message-ID: <20240329184740.4084786-1-andrii@kernel.org>

Add two new BPF instructions for dealing with per-CPU memory.

One, BPF_LDX | BPF_ADDR_PERCPU | BPF_DW (where BPF_ADDR_PERCPU uses the
previously unused 0xe0 mode value), resolves a provided per-CPU address
(offset) to the absolute address where the per-CPU data resides for "this"
CPU. This is the most universal and, strictly speaking, the only per-CPU BPF
instruction necessary.

I also added BPF_LDX | BPF_MEM_PERCPU | BPF_{B,H,W,DW} (BPF_MEM_PERCPU uses
another unused value, 0xc0), which can be considered an optimization: it
allows *reading* up to 8 bytes of per-CPU data in one instruction, without
having to first resolve the address and then dereference the memory. This one
is used for inlining bpf_get_smp_processor_id(), but it would be fine to
implement the latter with BPF_ADDR_PERCPU followed by a normal
BPF_LDX | BPF_MEM, so I'm fine dropping this one if requested.
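
For illustration, here is a minimal C sketch of how these opcodes compose
from the standard class/size/mode fields. The BPF_CLASS/BPF_SIZE/BPF_MODE
masks and the BPF_LDX, BPF_{B,H,W,DW} and BPF_MEM values mirror the uapi
definitions; the BPF_ADDR_PERCPU and BPF_MEM_PERCPU values are taken from the
description above, and the helpers ldx_opcode() and is_percpu_ldx() are
hypothetical names, not kernel functions.

```c
#include <stdint.h>

/* Opcode field masks, mirroring the uapi bpf_common.h layout. */
#define BPF_CLASS(code) ((code) & 0x07)
#define BPF_SIZE(code)  ((code) & 0x18)
#define BPF_MODE(code)  ((code) & 0xe0)

/* Existing class, size and mode values. */
#define BPF_LDX 0x01
#define BPF_W   0x00
#define BPF_H   0x08
#define BPF_B   0x10
#define BPF_DW  0x18
#define BPF_MEM 0x60

/* Mode values proposed in this series (unused by LDX before). */
#define BPF_MEM_PERCPU  0xc0
#define BPF_ADDR_PERCPU 0xe0

/* Hypothetical helper: build an LDX opcode from a mode and a size. */
static uint8_t ldx_opcode(uint8_t mode, uint8_t size)
{
	return BPF_LDX | mode | size;
}

/* Hypothetical helper: is this one of the new per-CPU LDX instructions?
 * The class check matters because 0xc0 is also BPF_ATOMIC in the STX class. */
static int is_percpu_ldx(uint8_t code)
{
	return BPF_CLASS(code) == BPF_LDX &&
	       (BPF_MODE(code) == BPF_ADDR_PERCPU ||
		BPF_MODE(code) == BPF_MEM_PERCPU);
}
```

In this sketch BPF_LDX | BPF_ADDR_PERCPU | BPF_DW encodes to 0xf9, while an
STX atomic opcode sharing the 0xc0 mode bits is rejected by the class check.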

These instructions are currently supported only by the x86-64 BPF JIT, but it
would be great to add support for other architectures ASAP, of course.

Using these instructions, we also implement inlining for three cases:
- bpf_get_smp_processor_id(), which avoids an unnecessary trivial function
  call, saving a bit of performance and also not polluting LBR records with
  unnecessary function call/return records;
- PERCPU_ARRAY's bpf_map_lookup_elem() is completely inlined, bringing its
  performance on par with implementing per-CPU data structures using global
  variables in BPF (which is an awesome improvement, see benchmarks below);
- PERCPU_HASH's bpf_map_lookup_elem() is partially inlined, just like for the
  non-PERCPU HASH map; this still saves a bit of overhead.
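
To make the inlined PERCPU_ARRAY lookup more concrete, below is a userspace C
model of the steps it reduces to: a bounds check on the key, fetching the
element's per-CPU offset, and resolving that offset against the current CPU's
base address (the job BPF_ADDR_PERCPU does in the JIT-ed code). This is a
simplified sketch, not kernel code: the storage layout, NR_CPUS_MODEL,
cpu_base[], resolve_percpu() and percpu_array_lookup() are hypothetical
stand-ins, and the CPU id is passed explicitly instead of being read from
per-CPU state.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS_MODEL 4 /* hypothetical CPU count for the model */
#define MAX_ENTRIES   8 /* hypothetical map size */

struct counter {
	uint64_t value;
};

/* Backing storage: one copy of every element per CPU. */
static struct counter storage[NR_CPUS_MODEL][MAX_ENTRIES];

/* Per-element per-CPU "pointers", modeled as offsets from a CPU's base. */
static size_t pptrs[MAX_ENTRIES];

/* Per-CPU base addresses, playing the role of the per-CPU offset table. */
static uintptr_t cpu_base[NR_CPUS_MODEL];

static void model_init(void)
{
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		cpu_base[cpu] = (uintptr_t)&storage[cpu][0];
	for (int i = 0; i < MAX_ENTRIES; i++)
		pptrs[i] = (size_t)i * sizeof(struct counter);
}

/* Model of BPF_ADDR_PERCPU: per-CPU offset -> absolute address on @cpu. */
static void *resolve_percpu(size_t off, int cpu)
{
	return (void *)(cpu_base[cpu] + off);
}

/* Model of the inlined lookup: bounds check, load offset, resolve it. */
static struct counter *percpu_array_lookup(uint32_t key, int cpu)
{
	if (key >= MAX_ENTRIES)
		return NULL;
	return resolve_percpu(pptrs[key], cpu);
}
```

In this model, looking up the same key on two different CPUs yields two
distinct copies, so each CPU only ever touches its own element.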

To validate the performance benefits, I hacked together a tiny benchmark that
only does bpf_map_lookup_elem() and increments the value by 1 for PERCPU_ARRAY
(arr-inc benchmark below) and PERCPU_HASH (hash-inc benchmark below) maps. To
establish a baseline, I also implemented logic similar to PERCPU_ARRAY based
on a global variable array, using bpf_get_smp_processor_id() to index the
array for the current CPU (glob-arr-inc benchmark below).
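
The glob-arr-inc baseline follows a common pattern: a global array with one
slot per CPU, indexed by the current CPU id. The following is a userspace C
model of that pattern, not the benchmark's actual BPF source; MAX_CPUS,
glob_counters, glob_arr_inc() and glob_arr_sum() are hypothetical names, and
the CPU id is passed in rather than obtained from bpf_get_smp_processor_id().

```c
#include <stdint.h>

/* Hypothetical upper bound on CPUs; a power of two so masking works. */
#define MAX_CPUS 64

/* One slot per CPU, cache-line aligned so CPUs do not share a line. */
struct cpu_counter {
	_Alignas(64) uint64_t value;
};

static struct cpu_counter glob_counters[MAX_CPUS];

/* Increment the calling CPU's slot. The mask keeps the index provably in
 * bounds, similar to what a BPF program needs to satisfy the verifier. */
static void glob_arr_inc(uint32_t cpu)
{
	glob_counters[cpu & (MAX_CPUS - 1)].value += 1;
}

/* Aggregate across all CPUs, as a reader of per-CPU stats would. */
static uint64_t glob_arr_sum(void)
{
	uint64_t sum = 0;

	for (int i = 0; i < MAX_CPUS; i++)
		sum += glob_counters[i].value;
	return sum;
}
```

PERCPU_ARRAY provides the same one-copy-per-CPU layout without the manual
indexing, which is why fully inlining its lookup lets arr-inc catch up with
this baseline.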

BEFORE
======
glob-arr-inc : 163.685 ± 0.092M/s
arr-inc      : 138.096 ± 0.160M/s
hash-inc     :  66.855 ± 0.123M/s

AFTER
=====
glob-arr-inc : 173.921 ± 0.039M/s (+6%)
arr-inc      : 170.729 ± 0.210M/s (+23.7%)
hash-inc     :  68.673 ± 0.070M/s (+2.7%)

As can be seen, PERCPU_HASH gets a modest +2.7% improvement, while the global
array-based benchmark gets a nice +6% due to inlining of
bpf_get_smp_processor_id(). But what's really important is that the arr-inc
benchmark basically catches up with glob-arr-inc, resulting in a +23.7%
improvement. This means that in practice it won't be necessary to avoid
PERCPU_ARRAY anymore when performance is critical (e.g., high-frequency stats
collection, which is often a practical use for PERCPU_ARRAY today).

Andrii Nakryiko (4):
  bpf: add internal-only per-CPU LDX instructions
  bpf: inline bpf_get_smp_processor_id() helper
  bpf: inline bpf_map_lookup_elem() for PERCPU_ARRAY maps
  bpf: inline bpf_map_lookup_elem() helper for PERCPU_HASH map

 arch/x86/net/bpf_jit_comp.c | 29 +++++++++++++++++++++++++++++
 include/linux/filter.h      | 27 +++++++++++++++++++++++++++
 kernel/bpf/arraymap.c       | 33 +++++++++++++++++++++++++++++++++
 kernel/bpf/core.c           |  5 +++++
 kernel/bpf/disasm.c         | 33 ++++++++++++++++++++++++++-------
 kernel/bpf/hashtab.c        | 21 +++++++++++++++++++++
 kernel/bpf/verifier.c       | 17 +++++++++++++++++
 7 files changed, 158 insertions(+), 7 deletions(-)
--
2.43.0