From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
To: He Kuang <hekuang@huawei.com>,
ast@plumgrid.com, rostedt@goodmis.org, mingo@redhat.com,
acme@redhat.com
Cc: wangnan0@huawei.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] bpf: Put perf_events check ahead of bpf prog
Date: Sat, 27 Jun 2015 17:52:25 +0900 [thread overview]
Message-ID: <558E6449.8080103@hitachi.com> (raw)
In-Reply-To: <1435387495-141813-1-git-send-email-hekuang@huawei.com>
On 2015/06/27 15:44, He Kuang wrote:
> When we add a kprobe point and record events by perf, the execution path
> of all threads on each cpu will enter this point, but perf may only
> record events on a particular thread or cpu at this kprobe point, a
> check on call->perf_events list filters out the threads which perf is
> not recording.
>
> Currently, bpf_prog will be entered at the beginning of
> kprobe_perf_func() before the above check, which makes bpf_prog be
> executed in every threads including determined not to be recorded
> threads. A simple test can demonstrate this:
>
> 'bpf_prog_on_write.o' contains a bpf prog which outputs to trace buffer
> when it is entered. Run a background thread 'another-dd' and 'dd'
> simultaneously, but only record 'dd' thread by perf. The result shows
> all threads trigger bpf_prog.
>
> $ another-dd if=/dev/zero of=test1 bs=4k count=1000000
> $ perf record -v --event bpf_prog_on_write.o -- dd if=/dev/zero of=test2 bs=4k count=3
> $ cat /sys/kernel/debug/tracing/trace
> another-dd-1007 [000] d... 120.225835: : generic_perform_write: tgid=1007, pid=1007
> another-dd-1007 [000] d... 120.227123: : generic_perform_write: tgid=1007, pid=1007
> [repeat many times...]
> another-dd-1007 [000] d... 120.412395: : generic_perform_write: tgid=1007, pid=1007
> another-dd-1007 [000] d... 120.412524: : generic_perform_write: tgid=1007, pid=1007
> dd-1009 [000] d... 120.413080: : generic_perform_write: tgid=1009, pid=1009
> dd-1009 [000] d... 120.414846: : generic_perform_write: tgid=1009, pid=1009
> dd-1009 [000] d... 120.415013: : generic_perform_write: tgid=1009, pid=1009
> another-dd-1007 [000] d... 120.416128: : generic_perform_write: tgid=1007, pid=1007
> another-dd-1007 [000] d... 120.416295: : generic_perform_write: tgid=1007, pid=1007
>
> This patch moves the check on perf_events list ahead and skip running
> bpf_prog on threads perf not care.
>
> After this patch:
>
> $ another-dd if=/dev/zero of=test1 bs=4k count=1000000
> $ perf record -v --event bpf_prog_on_write.o -- dd if=/dev/zero of=test2 bs=4k count=3
> $ cat /sys/kernel/debug/tracing/trace
> dd-994 [000] d... 46.386754: : generic_perform_write: tgid=994, pid=994
> dd-994 [000] d... 46.389167: : generic_perform_write: tgid=994, pid=994
> dd-994 [000] d... 46.389551: : generic_perform_write: tgid=994, pid=994
This looks nice :)
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Thank you!
>
> Signed-off-by: He Kuang <hekuang@huawei.com>
> ---
> kernel/trace/trace_kprobe.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index d0ce590..5600df8 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -1141,13 +1141,13 @@ kprobe_perf_func(struct trace_kprobe *tk, struct pt_regs *regs)
> int size, __size, dsize;
> int rctx;
>
> - if (prog && !trace_call_bpf(prog, regs))
> - return;
> -
> head = this_cpu_ptr(call->perf_events);
> if (hlist_empty(head))
> return;
>
> + if (prog && !trace_call_bpf(prog, regs))
> + return;
> +
> dsize = __get_data_size(&tk->tp, regs);
> __size = sizeof(*entry) + tk->tp.size + dsize;
> size = ALIGN(__size + sizeof(u32), sizeof(u64));
> @@ -1176,13 +1176,13 @@ kretprobe_perf_func(struct trace_kprobe *tk, struct kretprobe_instance *ri,
> int size, __size, dsize;
> int rctx;
>
> - if (prog && !trace_call_bpf(prog, regs))
> - return;
> -
> head = this_cpu_ptr(call->perf_events);
> if (hlist_empty(head))
> return;
>
> + if (prog && !trace_call_bpf(prog, regs))
> + return;
> +
> dsize = __get_data_size(&tk->tp, regs);
> __size = sizeof(*entry) + tk->tp.size + dsize;
> size = ALIGN(__size + sizeof(u32), sizeof(u64));
>
--
Masami HIRAMATSU
Linux Technology Research Center, System Productivity Research Dept.
Center for Technology Innovation - Systems Engineering
Hitachi, Ltd., Research & Development Group
E-mail: masami.hiramatsu.pt@hitachi.com
prev parent reply other threads:[~2015-06-27 8:52 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-06-27 6:44 [PATCH] bpf: Put perf_events check ahead of bpf prog He Kuang
2015-06-27 7:55 ` Alexei Starovoitov
2015-06-27 8:52 ` Masami Hiramatsu [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=558E6449.8080103@hitachi.com \
--to=masami.hiramatsu.pt@hitachi.com \
--cc=acme@redhat.com \
--cc=ast@plumgrid.com \
--cc=hekuang@huawei.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=rostedt@goodmis.org \
--cc=wangnan0@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.