From: Roman Gushchin <guro@fb.com>
To: Alexei Starovoitov <ast@kernel.org>
Cc: "davem@davemloft.net" <davem@davemloft.net>,
"daniel@iogearbox.net" <daniel@iogearbox.net>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
Kernel Team <Kernel-team@fb.com>
Subject: Re: [PATCH bpf-next 1/4] bpf: enable program stats
Date: Sat, 23 Feb 2019 00:34:38 +0000 [thread overview]
Message-ID: <20190223003434.GA17559@tower.DHCP.thefacebook.com> (raw)
In-Reply-To: <20190222233644.1487087-2-ast@kernel.org>
On Fri, Feb 22, 2019 at 03:36:41PM -0800, Alexei Starovoitov wrote:
> JITed BPF programs are indistinguishable from kernel functions, but unlike
> kernel code BPF code can be changed often.
> Typical approach of "perf record" + "perf report" profiling and tunning of
> kernel code works just as well for BPF programs, but kernel code doesn't
> need to be monitored whereas BPF programs do.
> Users load and run large amount of BPF programs.
> These BPF stats allow tools monitor the usage of BPF on the server.
> The monitoring tools will turn sysctl kernel.bpf_stats_enabled
> on and off for few seconds to sample average cost of the programs.
> Aggregated data over hours and days will provide an insight into cost of BPF
> and alarms can trigger in case given program suddenly gets more expensive.
>
> The cost of two sched_clock() per program invocation adds ~20 nsec.
> Fast BPF progs (like selftests/bpf/progs/test_pkt_access.c) will slow down
> from ~40 nsec to ~60 nsec.
> static_key minimizes the cost of the stats collection.
> There is no measurable difference before/after this patch
> with kernel.bpf_stats_enabled=0
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
> include/linux/bpf.h | 7 +++++++
> include/linux/filter.h | 16 +++++++++++++++-
> kernel/bpf/core.c | 13 +++++++++++++
> kernel/bpf/syscall.c | 24 ++++++++++++++++++++++--
> kernel/bpf/verifier.c | 5 +++++
> kernel/sysctl.c | 34 ++++++++++++++++++++++++++++++++++
> 6 files changed, 96 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index de18227b3d95..14260674bc57 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -340,6 +340,11 @@ enum bpf_cgroup_storage_type {
>
> #define MAX_BPF_CGROUP_STORAGE_TYPE __BPF_CGROUP_STORAGE_MAX
>
> +struct bpf_prog_stats {
> + u64 cnt;
> + u64 nsecs;
> +};
> +
> struct bpf_prog_aux {
> atomic_t refcnt;
> u32 used_map_cnt;
> @@ -389,6 +394,7 @@ struct bpf_prog_aux {
> * main prog always has linfo_idx == 0
> */
> u32 linfo_idx;
> + struct bpf_prog_stats __percpu *stats;
> union {
> struct work_struct work;
> struct rcu_head rcu;
> @@ -559,6 +565,7 @@ void bpf_map_area_free(void *base);
> void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr);
>
> extern int sysctl_unprivileged_bpf_disabled;
> +extern int sysctl_bpf_stats_enabled;
>
> int bpf_map_new_fd(struct bpf_map *map, int flags);
> int bpf_prog_new_fd(struct bpf_prog *prog);
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index f32b3eca5a04..7b14235b6f7d 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -533,7 +533,21 @@ struct sk_filter {
> struct bpf_prog *prog;
> };
>
> -#define BPF_PROG_RUN(filter, ctx) ({ cant_sleep(); (*(filter)->bpf_func)(ctx, (filter)->insnsi); })
> +DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
> +
> +#define BPF_PROG_RUN(prog, ctx) ({ \
> + u32 ret; \
> + cant_sleep(); \
> + if (static_branch_unlikely(&bpf_stats_enabled_key)) { \
> + u64 start = sched_clock(); \
> + ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi); \
> + this_cpu_inc(prog->aux->stats->cnt); \
> + this_cpu_add(prog->aux->stats->nsecs, \
> + sched_clock() - start); \
> + } else { \
> + ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi); \
> + } \
> + ret; })
Can we bump "cnt" unconditionally and gate only the "nsecs" calculation?
Why it might be useful?
If the performance cost is too high to keep the statistics gathering
always enabled, it will be possible to measure the average cost
periodically and extrapolate.
Thanks!
next prev parent reply other threads:[~2019-02-23 0:35 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-02-22 23:36 [PATCH bpf-next 0/4] bpf: per program stats Alexei Starovoitov
2019-02-22 23:36 ` [PATCH bpf-next 1/4] bpf: enable " Alexei Starovoitov
2019-02-23 0:05 ` Eric Dumazet
2019-02-23 2:29 ` Alexei Starovoitov
2019-02-23 0:34 ` Roman Gushchin [this message]
2019-02-23 1:06 ` Daniel Borkmann
2019-02-23 2:38 ` Alexei Starovoitov
2019-02-25 10:36 ` Daniel Borkmann
2019-02-26 4:27 ` Alexei Starovoitov
2019-02-26 15:29 ` Daniel Borkmann
2019-02-27 4:05 ` Alexei Starovoitov
2019-02-23 2:31 ` Alexei Starovoitov
2019-02-22 23:36 ` [PATCH bpf-next 2/4] bpf: expose program stats via bpf_prog_info Alexei Starovoitov
2019-02-22 23:36 ` [PATCH bpf-next 3/4] tools/bpf: sync bpf.h into tools Alexei Starovoitov
2019-02-22 23:36 ` [PATCH bpf-next 4/4] tools/bpftool: recognize bpf_prog_info runtime and runcnt Alexei Starovoitov
2019-02-23 4:17 ` Andrii Nakryiko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190223003434.GA17559@tower.DHCP.thefacebook.com \
--to=guro@fb.com \
--cc=Kernel-team@fb.com \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).