All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Wangnan (F)" <wangnan0@huawei.com>
To: Brendan Gregg <brendan.d.gregg@gmail.com>,
	"linux-perf-use." <linux-perf-users@vger.kernel.org>,
	Alexei Starovoitov <ast@plumgrid.com>
Subject: Re: linux 4.4, perf & BPF, and bpf_perf_event_output
Date: Tue, 12 Jan 2016 10:36:41 +0800	[thread overview]
Message-ID: <569466B9.2040702@huawei.com> (raw)
In-Reply-To: <CAE40pde1XCYDCRXgThopk_wiDN_yw26BWLGgL4iaFnkWCkpW-Q@mail.gmail.com>



On 2016/1/12 8:07, Brendan Gregg wrote:
> G'Day,
>
> Congrats Wang Nan on the new perf bpf features in 4.4! I just
> revisited where I was at with them now that it's official, and have a
> couple of questions.
>
> So I can run something like this:
>
> ---syncsnoop.c---
> /* perf.data: default event record only; trace_pipe:
> bpf_trace_printk() output */
> #include <uapi/linux/bpf.h>
> #include "bpf_helpers.h"
>
> SEC("func=sys_sync")
> int bpf_func__sys_sync(void *ctx)
> {
>      char fmt[] = "sync()\n";
>      bpf_trace_printk(fmt, sizeof (fmt));
>      return 1;
> };
>
> char _license[] SEC("license") = "GPL";
> int _version SEC("version") = LINUX_VERSION_CODE;
> ------
>
> With: perf record -a --clang-opt "-DLINUX_VERSION_CODE=0x40400"
> --event syncsnoop.c sleep 5
>
> And it works. I get PERF_RECORD_SAMPLE entries in perf.data, and my
> custom strings in /sys/kernel/debug/tracing/trace_pipe.
>
> Q1. Given bpf_trace_printk() & trace_pipe is a hack, is there a way
> yet to print my custom strings to perf.data? Eg, something like
> bpf_perf_event_output() to change the format string of the event.
> Currently I just get:
>
> # perf script
>              sync 30207 [002]  2509.576984: perf_bpf_probe:func:
> (ffffffff81221b20)
>              sync 30209 [003]  2509.766632: perf_bpf_probe:func:
> (ffffffff81221b20)
>              sync 30229 [003]  2509.936299: perf_bpf_probe:func:
> (ffffffff81221b20)
>              sync 30230 [004]  2510.099059: perf_bpf_probe:func:
> (ffffffff81221b20)
>              sync 30231 [004]  2510.289351: perf_bpf_probe:func:
> (ffffffff81221b20)
Yes. I have implemented this feature. Patch has posted, but not
in 4.4. I hope you will be able to use this feature in v4.5.
It depends on Arnaldo.

There is a small example at commit message of [1]. The basic workflow is:

  1. Create a bpf-output map in your BPF file
  2. Output data to it by bpf_perf_event_output in BPF source
  3. Create bpf-output event in perf cmdline
  4. Use perf-CTF conversion
  5. User babeltrace to see your output in raw format
  6. Use python script for decoding

Currently you are unable to see string in normal 'perf script' output.
I sent a mail for suggestion about perf script output format in [2]
at Oct. 28. Please check your mailbox.

At the end of this mail I provide some detail description.

[1] 
http://lkml.kernel.org/g/1452520124-2073-26-git-send-email-wangnan0@huawei.com
[2] http://lkml.kernel.org/g/5630AC2A.7030308@huawei.com

> Q2. Apart from per-event strings, is there a way to emit the maps?
> bpf-script-example.c populates a map, but I don't think it ends up in
> perf.data. I think a common case would be to emit it once for the run.
> Another common case would be to emit it once per second. Any way we
> can do this? I did try bpf_perf_event_output(), but I don't think it
> ended up in perf.data (at least, I don't see it using "perf script
> -D").

Not support yet. Map data don't go to 'perf.data' at all. But I have
another idea.

I'm thinking about linking user-mode BPF into perf, then we can put all
things together into one BPF source, including:

data sampling, filtering, accumulation, pretty-printing

makes it similar to dtrace.

Here's an example. In this example we dump a perf.data output when
we found a write system call takes longer than expected (in my recent
patch series perf can act as a flight recorder), then print the
map data we collected during recording.

/* Define message passed between BPF script and perf */
enum perf_cmd {
   DUMP,
}
enum perf_cmd cmd;

/* define histogram map */
SEC("map")
struct bpf_map_def hist_map = { ... };

SEC("func=sys_write")
int sys_write_enter(...) {  }

SEC("func=sys_write%return")
int sys_write_exit(...) {
   enum perf_cmd cmd;

   /* check how long this syscall takes */
   /* maintain histogram */
   if (syscall_too_long) {
     cmd = DUMP;
     bpf_perf_event_output(..., &cmd, ...);   <-- trigger a dump
   }
}

/* Following are user mode BPF scripts which are run in perf context */

SEC("mode=perf;action=recv_bpf_msg")
int perf_recv_bpf_msg(void *pcmd)
{
    ...
    switch (*pcmd) {
      ...
      case DUMP:
        /* make perf dump a perf.data. */
        perf_hist_output(&hist_map);  /* let perf output histogram */
        ...
    }
}

SEC("mode=perf;action=perf_end")
int perf_end(...)
{
   print_the_final_result(&hist_map);
   ...
}

Thank you.

> thanks,
>
> Brendan

Ddetail description about perf event output:

First, you need create a bpf-output map in your BPF file:

  struct bpf_map_def SEC("maps") channel = {
      .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
      .key_size = sizeof(int),
      .value_size = sizeof(u32),
      .max_entries = __NR_CPUS__,
  };

Then you output data to it by bpf_perf_event_output (this is its name in 
samples/bpf/bpf_helper.h. In my
commit message it is perf_event_output):

  SEC("func=sys_write")
  int bpf_func__sys_write(void *ctx)
  {
      char fmt[] = "write()\n";
      bpf_perf_event_output(ctx, &channel,
                            bpf_get_smp_processor_id(),
                            fmt, sizeof (fmt));
      return 1;
  };

When running perf you need create the channel for it by opening a 
bpf-output event and connect it with
'channel' map you defined in your BPF source file:

# perf record -e evt=bpf-output/no-inherit/ -e 
./test.c/maps:channel.event=evt/ ls

Then in perf.data you will see two samples, one is the function you 
probe in (sys_write), another is the
BPF output.

# perf script
               ls 32115 1009664.006228: evt=bpf-output/no-inherit/: 
ffffffff811fcac1 sys_write ([kernel.kallsyms])
               ls 32115 [002] 1009664.006229: perf_bpf_probe:func: 
(ffffffff811fcac0)

You can't see the string you output now, because perf script doesn't support
decoding data you passed through bpf-output event. Now you can see your
string by perf script by 'perf script -D' like this:

  . ... raw event: size 56 bytes
  .  0000:  09 00 00 00 01 00 38 00 01 16 00 00 00 00 00 00 ......8.........
  .  0010:  c1 ca 1f 81 ff ff ff ff 73 7d 00 00 73 7d 00 00 ........s}..s}..
  .  0020:  bb 09 6f b8 48 96 03 00 0c 00 00 00 77 72 69 74 
..o.H.......writ      <--- *HERE*
  .  0030:  65 28 29 0a 00 00 00 00                          e().....
  .
  4294967295 1009664006228411 0x2ed0 [0x38]: PERF_RECORD_SAMPLE(IP, 
0x1): 32115/32115: 0xffffffff811fcac1 period: 1 addr: 0
   ... thread: ls:32115
   ...... dso: 
/home/wangnan/.debug/.build-id/fb/0f83d011364583af77e563a390dfc1b78cce8d
                ls 32115 1009664.006228: evt=bpf-output/no-inherit/: 
ffffffff811fcac1 sys_write ([kernel.kallsyms])

For me, I prefer CTF conversion:

  # ~/perf data convert --to-ctf ./out.ctf
  [ perf data convert: Converted 'perf.data' into CTF data './out.ctf' ]
  [ perf data convert: Converted and wrote 0.000 MB (2 samples) ]
  # babeltrace ./out.ctf/
  [16:27:44.006228411] (+?.?????????) evt=bpf-output/no-inherit/: { 
cpu_id = 0 }, { perf_ip = 0xFFFFFFFF811FCAC1, perf_tid = 32115, perf_pid 
= 32115, perf_id = 5633,  raw_len = 3, raw_data = [ [0] = 0x74697277, 
[1] = 0xA292865, [2] = 0x0 ] }
  [16:27:44.006229678] (+0.000001267) perf_bpf_probe:func: { cpu_id = 2 
}, { perf_ip = 0xFFFFFFFF811FCAC1, perf_tid = 32115, perf_pid = 32115, 
perf_id = 5641, perf_period = 1, common_type = 1177, common_flags = 1, 
common_preempt_count = 0, common_pid = 32115, _probe_ip = 
0xFFFFFFFF811FCAC0 }

Your string is there:

  raw_data = [ [0] = 0x74697277, [1] = 0xA292865, [2] = 0x0 ] }

Still not good, but don't forget we have babeltrace python binding. A
simple python script can decode it for you.

  reply	other threads:[~2016-01-12  2:37 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-01-12  0:07 linux 4.4, perf & BPF, and bpf_perf_event_output Brendan Gregg
2016-01-12  2:36 ` Wangnan (F) [this message]
2016-01-12 15:27   ` Arnaldo Carvalho de Melo
2016-01-12 20:56   ` Brendan Gregg
2016-01-13  2:54     ` Wangnan (F)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=569466B9.2040702@huawei.com \
    --to=wangnan0@huawei.com \
    --cc=ast@plumgrid.com \
    --cc=brendan.d.gregg@gmail.com \
    --cc=linux-perf-users@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.