From: Alexei Starovoitov <ast@fb.com>
To: "David S . Miller" <davem@davemloft.net>
Cc: Daniel Borkmann <daniel@iogearbox.net>,
Daniel Wagner <daniel.wagner@bmw-carit.de>,
Tom Zanussi <tom.zanussi@linux.intel.com>,
Wang Nan <wangnan0@huawei.com>, He Kuang <hekuang@huawei.com>,
Martin KaFai Lau <kafai@fb.com>,
Brendan Gregg <brendan.d.gregg@gmail.com>,
<netdev@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<kernel-team@fb.com>
Subject: [PATCH net-next 0/9] bpf: hash map pre-alloc
Date: Sun, 6 Mar 2016 17:58:28 -0800
Message-ID: <1457315917-1970307-1-git-send-email-ast@fb.com>
Hi,
this patch set switches the bpf hash map to use pre-allocation by default
and introduces the BPF_F_NO_PREALLOC flag to keep the old behavior for cases
where full map pre-allocation is too memory expensive.
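
To illustrate how user space could opt out of pre-allocation, here is a minimal sketch, assuming kernel headers that already define BPF_F_NO_PREALLOC and the map_flags field of union bpf_attr. The helper names fill_hash_map_attr() and create_hash_map() are made up for this example and are not part of the patch set.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* Hypothetical helper: fill the attributes for a BPF_MAP_TYPE_HASH map.
 * Pass 0 in map_flags for the pre-allocated default, or
 * BPF_F_NO_PREALLOC to keep allocating elements on demand. */
static void fill_hash_map_attr(union bpf_attr *attr,
                               unsigned int key_size,
                               unsigned int value_size,
                               unsigned int max_entries,
                               unsigned int map_flags)
{
    memset(attr, 0, sizeof(*attr));
    attr->map_type = BPF_MAP_TYPE_HASH;
    attr->key_size = key_size;
    attr->value_size = value_size;
    attr->max_entries = max_entries;
    attr->map_flags = map_flags;
}

/* Hypothetical helper: create the map through the bpf(2) syscall.
 * Returns a map fd on success, -1 on error (errno is set).
 * Needs sufficient privileges to succeed. */
static int create_hash_map(unsigned int key_size, unsigned int value_size,
                           unsigned int max_entries, unsigned int map_flags)
{
    union bpf_attr attr;

    fill_hash_map_attr(&attr, key_size, value_size, max_entries, map_flags);
    return (int)syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
}
```

A very large, sparsely used map might call create_hash_map(8, 8, 1000000, BPF_F_NO_PREALLOC), while a map updated from kprobes would presumably pass 0 and accept the up-front memory cost.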
Some time back Daniel Wagner reported crashes when a bpf hash map is
used to compute time intervals between preempt_disable->preempt_enable,
and recently Tom Zanussi reported a deadlock in the iovisor/bcc/funccount
tool when it's used to count the number of invocations of kernel
'*spin*' functions. Both problems are due to the recursive use of
slub and can only be solved by pre-allocating all map elements.
A lot of different solutions were considered. Many were implemented,
but in the end pre-allocation seems to be the only feasible answer.
Pre-allocation itself was also implemented in 4 different ways:
- simple free-list with single lock
- percpu_ida with optimizations
- blk-mq-tag variant customized for bpf use case
- percpu_freelist
For the bpf style of alloc/free patterns percpu_freelist is the best
and is what this patch set implements.
Detailed performance numbers are in patch 3.
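
To give a rough idea of the per-cpu free list approach, here is a minimal single-threaded user-space sketch. It assumes one list per cpu, with pop falling back to the other cpus' lists when the local one is empty. All names (SKETCH_NR_CPUS, struct percpu_fl, fl_push, fl_pop, fl_populate) are hypothetical and are not taken from kernel/bpf/percpu_freelist.c, and the locking a real implementation needs is left out.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 4	/* hypothetical fixed cpu count for the sketch */

struct fl_node {
	struct fl_node *next;
};

struct percpu_fl {
	/* one singly linked free list per (simulated) cpu; a real
	 * implementation would also need a lock per list */
	struct fl_node *head[SKETCH_NR_CPUS];
};

static void fl_init(struct percpu_fl *fl)
{
	for (int i = 0; i < SKETCH_NR_CPUS; i++)
		fl->head[i] = NULL;
}

/* Return a free element to the list of the given cpu. */
static void fl_push(struct percpu_fl *fl, int cpu, struct fl_node *node)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	node->next = fl->head[cpu];
	fl->head[cpu] = node;
}

/* Take a free element, preferring the local list and falling back to
 * the other cpus' lists. Returns NULL when every list is empty. */
static struct fl_node *fl_pop(struct percpu_fl *fl, int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	for (int i = 0; i < SKETCH_NR_CPUS; i++) {
		int c = (cpu + i) % SKETCH_NR_CPUS;
		struct fl_node *node = fl->head[c];

		if (node) {
			fl->head[c] = node->next;
			return node;
		}
	}
	return NULL;
}

/* Spread pre-allocated elements round-robin over the per-cpu lists. */
static void fl_populate(struct percpu_fl *fl, struct fl_node *nodes, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		fl_push(fl, (int)(i % SKETCH_NR_CPUS), &nodes[i]);
}
```

In this sketch an update would call fl_pop() for the current cpu instead of calling into the slab allocator, and a delete would hand the element back with fl_push(), so the common case touches only the local list.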
Patch 1: fixes simple deadlocks due to missing recursion checks
Patch 2: introduces percpu_freelist
Patch 3: pre-allocates hash map elements
Patches 4-7: prepare test infra
Patch 8: stress test for hash map infra. It attaches to spin_lock
functions, and bpf_map_update/delete are called from different contexts
(except nmi, which is still unsupported by bpf)
Patch 9: map performance test
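
The kind of recursion check mentioned for patch 1 can be sketched in user space as follows. This is an assumption about the general technique, not the code from the patch: a thread-local counter stands in for a per-cpu counter, and prog_active, run_prog_guarded() and the demo programs are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Stands in for a per-cpu "a program is already running here" counter. */
static _Thread_local int prog_active;

typedef void (*prog_fn)(void *ctx);

/* Run fn unless another guarded program is already active in this
 * context. Returns true if fn ran, false if it was skipped. */
static bool run_prog_guarded(prog_fn fn, void *ctx)
{
	bool ran = false;

	if (++prog_active == 1) {
		fn(ctx);
		ran = true;
	}
	prog_active--;
	return ran;
}

/* Demo: outer_prog triggers a nested invocation, the way a probe
 * firing inside another program's helper would. */
struct demo_ctx {
	int outer_runs;
	int inner_runs;
	int inner_skipped;
};

static void inner_prog(void *ctx)
{
	struct demo_ctx *d = ctx;

	d->inner_runs++;
}

static void outer_prog(void *ctx)
{
	struct demo_ctx *d = ctx;

	d->outer_runs++;
	if (!run_prog_guarded(inner_prog, d))
		d->inner_skipped++;
}
```

Skipping the nested invocation loses that one event, but it prevents the nested program from trying to take a lock the outer one already holds.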
Reported-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Reported-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Alexei Starovoitov (9):
  bpf: prevent kprobe+bpf deadlocks
  bpf: introduce percpu_freelist
  bpf: pre-allocate hash map elements
  samples/bpf: make map creation more verbose
  samples/bpf: move ksym_search() into library
  samples/bpf: add map_flags to bpf loader
  samples/bpf: test both pre-alloc and normal maps
  samples/bpf: add bpf map stress test
  samples/bpf: add map performance test
include/linux/bpf.h | 4 +
include/uapi/linux/bpf.h | 3 +
kernel/bpf/Makefile | 2 +-
kernel/bpf/hashtab.c | 264 ++++++++++++++++++++++++++++-----------
kernel/bpf/percpu_freelist.c | 81 ++++++++++++
kernel/bpf/percpu_freelist.h | 31 +++++
kernel/bpf/syscall.c | 15 ++-
kernel/trace/bpf_trace.c | 2 -
samples/bpf/Makefile | 8 ++
samples/bpf/bpf_helpers.h | 1 +
samples/bpf/bpf_load.c | 70 ++++++++++-
samples/bpf/bpf_load.h | 6 +
samples/bpf/fds_example.c | 2 +-
samples/bpf/libbpf.c | 5 +-
samples/bpf/libbpf.h | 2 +-
samples/bpf/map_perf_test_kern.c | 100 +++++++++++++++
samples/bpf/map_perf_test_user.c | 155 +++++++++++++++++++++++
samples/bpf/offwaketime_user.c | 67 +---------
samples/bpf/sock_example.c | 2 +-
samples/bpf/spintest_kern.c | 59 +++++++++
samples/bpf/spintest_user.c | 50 ++++++++
samples/bpf/test_maps.c | 29 +++--
samples/bpf/test_verifier.c | 4 +-
23 files changed, 802 insertions(+), 160 deletions(-)
create mode 100644 kernel/bpf/percpu_freelist.c
create mode 100644 kernel/bpf/percpu_freelist.h
create mode 100644 samples/bpf/map_perf_test_kern.c
create mode 100644 samples/bpf/map_perf_test_user.c
create mode 100644 samples/bpf/spintest_kern.c
create mode 100644 samples/bpf/spintest_user.c
--
2.6.5