* [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support
@ 2016-02-22 9:10 Wang Nan
2016-02-22 9:10 ` [PATCH 01/48] perf tools: Record text offset in dso to calculate objdump address Wang Nan
` (47 more replies)
0 siblings, 48 replies; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:10 UTC
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
Hi Arnaldo,
I changed all 'maps:' to 'map:' in the examples in the commit messages and
rebased this patch set onto your newest perf/core.
The following changes since commit 9dd130f324b9ba2c7cf9292d5addb1aef100e751:
perf tools: Remove duplicate typedef config_term_func_t definition (2016-02-19 19:51:13 -0300)
are available in the git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/pi3orama/linux.git tags/perf-core-for-acme
for you to fetch changes up to 84f9a58d51acfc4d1677fc24dbc13c07f60d7b7d:
perf tools: Don't warn about out of order event if write_backward is used (2016-02-22 08:58:48 +0000)
----------------------------------------------------------------
Change 'maps:' to 'map:' in examples in commit messages.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
----------------------------------------------------------------
Wang Nan (48):
perf tools: Record text offset in dso to calculate objdump address
perf tools: Adjust symbol for shared objects
perf bpf: Add API to set values to map entries in a bpf object
perf tools: Enable BPF object configure syntax
perf record: Apply config to BPF objects before recording
perf tools: Enable passing event to BPF object
perf tools: Support setting different slots in a BPF map separately
perf tools: Enable indices setting syntax for BPF map
perf tools: Pass tracepoint options to BPF script
perf tools: Introduce bpf-output event
perf data: Support converting data from bpf_perf_event_output()
perf data: Explicitly set byte order for integer types
perf core: Introduce new ioctl options to pause and resume ring buffer
perf core: Set event's default overflow_handler
perf core: Prepare writing into ring buffer from end
perf core: Add backward attribute to perf event
perf core: Reduce perf event output overhead by new overflow handler
perf tools: Only validate is_pos for tracking evsels
perf tools: Print write_backward value in perf_event_attr__fprintf
perf tools: Make ordered_events reusable
perf record: Extract synthesize code to record__synthesize()
perf tools: Add perf_data_file__switch() helper
perf record: Turns auxtrace_snapshot_enable into 3 states
perf record: Introduce record__finish_output() to finish a perf.data
perf record: Add '--timestamp-filename' option to append timestamp to output filename
perf record: Split output into multiple files via '--switch-output'
perf record: Force enable --timestamp-filename when --switch-output is provided
perf record: Disable buildid cache options by default in switch output mode
perf record: Re-synthesize tracking events after output switching
perf record: Generate tracking events for process forked by perf
perf record: Ensure return non-zero rc when mmap fail
perf record: Prevent reading invalid data in record__mmap_read
perf tools: Add evlist channel helpers
perf tools: Automatically add new channel according to evlist
perf tools: Operate multiple channels
perf tools: Squash overwrite setting into channel
perf record: Don't read from and poll overwrite channel
perf record: Don't poll on overwrite channel
perf tools: Detect avalibility of write_backward
perf tools: Enable overwrite settings
perf tools: Set write_backward attribut bit for overwrite events
perf tools: Record fd into perf_mmap
perf tools: Add API to pause a channel
perf record: Toggle overwrite ring buffer for reading
perf record: Rename variable to make code clear
perf record: Read from backward ring buffer
perf record: Allow generate tracking events at the end of output
perf tools: Don't warn about out of order event if write_backward is used
include/linux/perf_event.h | 22 +-
include/uapi/linux/perf_event.h | 4 +-
kernel/events/core.c | 73 +++-
kernel/events/internal.h | 11 +
kernel/events/ring_buffer.c | 63 +++-
tools/perf/builtin-record.c | 598 ++++++++++++++++++++++++++-----
tools/perf/perf.h | 2 +
tools/perf/tests/bpf.c | 2 +-
tools/perf/util/bpf-loader.c | 718 ++++++++++++++++++++++++++++++++++++++
tools/perf/util/bpf-loader.h | 59 ++++
tools/perf/util/data-convert-bt.c | 118 ++++++-
tools/perf/util/data.c | 36 ++
tools/perf/util/data.h | 11 +-
tools/perf/util/dso.h | 1 +
tools/perf/util/evlist.c | 355 ++++++++++++++++---
tools/perf/util/evlist.h | 70 +++-
tools/perf/util/evsel.c | 23 ++
tools/perf/util/evsel.h | 11 +
tools/perf/util/map.c | 14 +
tools/perf/util/ordered-events.c | 5 +
tools/perf/util/parse-events.c | 136 +++++++-
tools/perf/util/parse-events.h | 19 +-
tools/perf/util/parse-events.l | 18 +-
tools/perf/util/parse-events.y | 95 ++++-
tools/perf/util/record.c | 11 +
tools/perf/util/session.c | 22 +-
tools/perf/util/symbol-elf.c | 25 +-
27 files changed, 2327 insertions(+), 195 deletions(-)
* [PATCH 01/48] perf tools: Record text offset in dso to calculate objdump address
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
@ 2016-02-22 9:10 ` Wang Nan
2016-02-22 9:10 ` [PATCH 02/48] perf tools: Adjust symbol for shared objects Wang Nan
` (46 subsequent siblings)
47 siblings, 0 replies; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:10 UTC
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
This patch stores the offset of the '.text' section in the dso and
uses it to recalculate the address passed to objdump.
In most cases executable code lives in the '.text' section, so the
adjustment made to a symbol in dso__load_sym() (using
sym.st_value -= shdr.sh_addr - shdr.sh_offset) is equivalent to
'sym.st_value -= dso->text_offset'. Adding text_offset back therefore
yields the objdump address from the symbol address (rip). However, this
does not hold for the kernel and kernel modules, since they can have
multiple executable sections with different offsets. Exclude the kernel
for this reason.
After this patch, even when dso->adjust_symbols is set to true for shared
objects, map__rip_2objdump() and map__objdump_2mem() return the correct
result, so the behavior of perf annotate does not change.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Cody P Schafer <dev@codyps.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kirill Smelkov <kirr@nexedi.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: pi3orama@163.com
---
tools/perf/util/dso.h | 1 +
tools/perf/util/map.c | 14 ++++++++++++++
tools/perf/util/symbol-elf.c | 12 ++++++------
3 files changed, 21 insertions(+), 6 deletions(-)
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index 45ec4d0..ef3dbc9 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -162,6 +162,7 @@ struct dso {
u8 loaded;
u8 rel;
u8 build_id[BUILD_ID_SIZE];
+ u64 text_offset;
const char *short_name;
const char *long_name;
u16 long_name_len;
diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index 171b6d1..02c3186 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -431,6 +431,13 @@ u64 map__rip_2objdump(struct map *map, u64 rip)
if (map->dso->rel)
return rip - map->pgoff;
+ /*
+ * kernel modules also have DSO_TYPE_USER in dso->kernel,
+ * but all kernel modules are ET_REL, so won't get here.
+ */
+ if (map->dso->kernel == DSO_TYPE_USER)
+ return rip + map->dso->text_offset;
+
return map->unmap_ip(map, rip) - map->reloc;
}
@@ -454,6 +461,13 @@ u64 map__objdump_2mem(struct map *map, u64 ip)
if (map->dso->rel)
return map->unmap_ip(map, ip + map->pgoff);
+ /*
+ * kernel modules also have DSO_TYPE_USER in dso->kernel,
+ * but all kernel modules are ET_REL, so won't get here.
+ */
+ if (map->dso->kernel == DSO_TYPE_USER)
+ return map->unmap_ip(map, ip - map->dso->text_offset);
+
return ip + map->reloc;
}
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index b1dd68f..bc229a7 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -793,6 +793,7 @@ int dso__load_sym(struct dso *dso, struct map *map,
uint32_t idx;
GElf_Ehdr ehdr;
GElf_Shdr shdr;
+ GElf_Shdr tshdr;
Elf_Data *syms, *opddata = NULL;
GElf_Sym sym;
Elf_Scn *sec, *sec_strndx;
@@ -832,6 +833,9 @@ int dso__load_sym(struct dso *dso, struct map *map,
sec = syms_ss->symtab;
shdr = syms_ss->symshdr;
+ if (elf_section_by_name(elf, &ehdr, &tshdr, ".text", NULL))
+ dso->text_offset = tshdr.sh_addr - tshdr.sh_offset;
+
if (runtime_ss->opdsec)
opddata = elf_rawdata(runtime_ss->opdsec, NULL);
@@ -880,12 +884,8 @@ int dso__load_sym(struct dso *dso, struct map *map,
* Handle any relocation of vdso necessary because older kernels
* attempted to prelink vdso to its virtual address.
*/
- if (dso__is_vdso(dso)) {
- GElf_Shdr tshdr;
-
- if (elf_section_by_name(elf, &ehdr, &tshdr, ".text", NULL))
- map->reloc = map->start - tshdr.sh_addr + tshdr.sh_offset;
- }
+ if (dso__is_vdso(dso))
+ map->reloc = map->start - dso->text_offset;
dso->adjust_symbols = runtime_ss->adjust_symbols || ref_reloc(kmap);
/*
--
1.8.3.4
* [PATCH 02/48] perf tools: Adjust symbol for shared objects
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
2016-02-22 9:10 ` [PATCH 01/48] perf tools: Record text offset in dso to calculate objdump address Wang Nan
@ 2016-02-22 9:10 ` Wang Nan
2016-02-22 9:10 ` [PATCH 03/48] perf bpf: Add API to set values to map entries in a bpf object Wang Nan
` (45 subsequent siblings)
47 siblings, 0 replies; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:10 UTC
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
He Kuang reported in [1] that perf fails to resolve the correct symbol on
the Android platform. The problem can be reproduced on a normal x86_64
platform. The steps to reproduce it are described in detail at the end of
this commit message.
The cause of this problem is the missing symbol adjustment for normal
shared objects. In most cases skipping the adjustment is harmless. However,
the result is wrong when the '.text' section has different 'address' and
'offset' values.
I checked all shared objects on my working platform: only wine dll objects and
debug objects (in .debug) have this problem. However, it is common on Android.
For example:
$ readelf -S ./libsurfaceflinger.so | grep \.text
[10] .text PROGBITS 0000000000029030 00012030
This patch enables symbol adjustment for dynamic objects so that symbol
addresses obtained from elfutils are adjusted correctly.
Now nearly all types of ELF file should adjust symbols, so make
ss->adjust_symbols default to true.
Steps to reproduce the problem:
$ cat ./Makefile
PWD := $(shell pwd)
LDFLAGS += "-Wl,-rpath=$(PWD)"
CFLAGS += -g
main: main.c libbuggy.so
libbuggy.so: buggy.c
gcc -g -shared -fPIC -Wl,-Ttext-segment=0x200000 $< -o $@
clean:
rm -rf main libbuggy.so *.o
$ cat ./buggy.c
int fib(int x)
{
return (x == 0) ? 1 : (x == 1) ? 1 : fib(x - 1) + fib(x - 2);
}
$ cat ./main.c
#include <stdio.h>
extern int fib(int x);
int main()
{
int i;
for (i = 0; i < 40; i++)
printf("%d\n", fib(i));
return 0;
}
$ make
$ perf record ./main
...
$ perf report --stdio
# Overhead Command Shared Object Symbol
# ........ ....... ................. ...............................
#
14.97% main libbuggy.so [.] 0x000000000000066c
8.68% main libbuggy.so [.] 0x00000000000006aa
8.52% main libbuggy.so [.] fib@plt
7.95% main libbuggy.so [.] 0x0000000000000664
5.94% main libbuggy.so [.] 0x00000000000006a9
5.35% main libbuggy.so [.] 0x0000000000000678
...
The correct result should be (after this patch):
# Overhead Command Shared Object Symbol
# ........ ....... ................. ...............................
#
91.47% main libbuggy.so [.] fib
8.52% main libbuggy.so [.] fib@plt
0.00% main [kernel.kallsyms] [k] kmem_cache_free
[1] http://lkml.kernel.org/g/1452567507-54013-1-git-send-email-hekuang@huawei.com
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Cody P Schafer <dev@codyps.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kirill Smelkov <kirr@nexedi.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: pi3orama@163.com
---
tools/perf/util/symbol-elf.c | 13 +++----------
1 file changed, 3 insertions(+), 10 deletions(-)
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index bc229a7..3f9d679 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -709,17 +709,10 @@ int symsrc__init(struct symsrc *ss, struct dso *dso, const char *name,
if (ss->opdshdr.sh_type != SHT_PROGBITS)
ss->opdsec = NULL;
- if (dso->kernel == DSO_TYPE_USER) {
- GElf_Shdr shdr;
- ss->adjust_symbols = (ehdr.e_type == ET_EXEC ||
- ehdr.e_type == ET_REL ||
- dso__is_vdso(dso) ||
- elf_section_by_name(elf, &ehdr, &shdr,
- ".gnu.prelink_undo",
- NULL) != NULL);
- } else {
+ if (dso->kernel == DSO_TYPE_USER)
+ ss->adjust_symbols = true;
+ else
ss->adjust_symbols = elf__needs_adjust_symbols(ehdr);
- }
ss->name = strdup(name);
if (!ss->name) {
--
1.8.3.4
* [PATCH 03/48] perf bpf: Add API to set values to map entries in a bpf object
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
2016-02-22 9:10 ` [PATCH 01/48] perf tools: Record text offset in dso to calculate objdump address Wang Nan
2016-02-22 9:10 ` [PATCH 02/48] perf tools: Adjust symbol for shared objects Wang Nan
@ 2016-02-22 9:10 ` Wang Nan
2016-02-25 5:39 ` [tip:perf/core] " tip-bot for Wang Nan
2016-02-22 9:10 ` [PATCH 04/48] perf tools: Enable BPF object configure syntax Wang Nan
` (44 subsequent siblings)
47 siblings, 1 reply; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:10 UTC
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
bpf__config_obj() is introduced as a core API to configure a BPF object after
loading. One configuration option for maps is introduced. After this
patch a BPF object can accept assignments like:
map:my_map.value=1234
(map.my_map.value looks pretty. However, there's a small but hard-to-fix
problem related to flex's greedy matching; please see [1]. ':' was chosen
to avoid it in a simpler way.)
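As an illustration of the key format, the sketch below splits a term key of the form "map:<mapname>.<config opt>" into its two parts. It is a standalone approximation of what bpf__obj_config_map() in the diff does with strdup() and strchr(); split_map_key() and dup_range() are hypothetical helpers and are not part of perf.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy len bytes of s into a new NUL-terminated string (NULL on failure). */
static char *dup_range(const char *s, size_t len)
{
	char *p = malloc(len + 1);

	if (p) {
		memcpy(p, s, len);
		p[len] = '\0';
	}
	return p;
}

/*
 * Split "map:<mapname>.<opt>" into newly allocated name and option strings.
 * Returns 0 on success; -1 if the prefix or the '.' is missing, the option
 * is empty, or allocation fails. The caller frees *name and *opt.
 */
static int split_map_key(const char *config, char **name, char **opt)
{
	static const char prefix[] = "map:";
	const size_t plen = sizeof(prefix) - 1;
	const char *dot;

	assert(config && name && opt);
	*name = *opt = NULL;

	if (strncmp(config, prefix, plen) != 0)
		return -1;
	config += plen;

	dot = strchr(config, '.');
	if (!dot || dot[1] == '\0')
		return -1;

	*name = dup_range(config, (size_t)(dot - config));
	*opt = dup_range(dot + 1, strlen(dot + 1));
	if (!*name || !*opt) {
		free(*name);
		free(*opt);
		*name = *opt = NULL;
		return -1;
	}
	return 0;
}
```

For "map:my_map.value" this yields the name "my_map" and the option "value"; keys such as "xmap:my_map.value" or "map:my_map" are rejected, which corresponds to the "Invalid object config option" class of errors in the patch.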
This patch is more complex than the work it does because it is designed
with extension in mind. In designing BPF map configuration, the
following things should be considered:
1. Array index selection: perf should allow the user to set different
values for different slots in an array, with syntax like:
map:my_map.value[0,3...6]=1234;
2. A map should be configurable by different config terms, each for a part
of it. For example, set each slot to the pid of a thread;
3. Type of value: integer is not the only valid value type. A perf
counter can also be put into a map after commit 35578d798400
("bpf: Implement function bpf_perf_event_read() that get the
selected hardware PMU counter");
4. For a hash table, it should be possible to use a string or other
value as a key;
5. It may not be possible to set up the map configuration
during parsing. A perf counter is an example.
Therefore, this patch does the following:
1. Instead of updating map elements during parsing, this patch stores
map config options in 'struct bpf_map_priv'. Following patches
will apply those configs at an appropriate time;
2. Link map operations in a list so that a map can have multiple config
terms attached and different parts can be configured separately;
3. Make 'struct bpf_map_priv' extensible so that the following patches
can add new types of keys and operations;
4. Use the bpf_obj_config__map_funcs array to support more map config options.
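The deferred-configuration idea in points 1 and 2 can be sketched as follows. This is a simplified model under stated assumptions, not the perf implementation: it uses a hand-rolled singly linked queue instead of the kernel-style list_head, a plain uint64_t array stands in for the BPF array map, and every name in it (map_conf, map_op, map_conf_add() and so on) is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum op_type { OP_SET_VALUE };	/* room for more operation types later */

struct map_op {
	struct map_op *next;
	enum op_type type;
	uint64_t value;
};

/* Per-map private data: a queue of operations recorded during parsing. */
struct map_conf {
	struct map_op *head;
	struct map_op **tail;
};

static void map_conf_init(struct map_conf *c)
{
	c->head = NULL;
	c->tail = &c->head;
}

/* Parsing stage: only remember the request, touch nothing yet. */
static int map_conf_add(struct map_conf *c, enum op_type type, uint64_t value)
{
	struct map_op *op = calloc(1, sizeof(*op));

	if (!op)
		return -1;
	op->type = type;
	op->value = value;
	*c->tail = op;
	c->tail = &op->next;
	return 0;
}

/* Later stage: replay the operations in order; each one covers all slots. */
static void map_conf_apply(const struct map_conf *c, uint64_t *slots, size_t n)
{
	const struct map_op *op;
	size_t i;

	assert(slots != NULL || n == 0);
	for (op = c->head; op; op = op->next)
		if (op->type == OP_SET_VALUE)
			for (i = 0; i < n; i++)
				slots[i] = op->value;
}

static void map_conf_purge(struct map_conf *c)
{
	struct map_op *op = c->head, *next;

	while (op) {
		next = op->next;
		free(op);
		op = next;
	}
	map_conf_init(c);
}
```

Keeping the operations in a list rather than writing to the map immediately is what lets several config terms target one map, and lets values that are unknown at parse time (a perf counter, for example) be resolved before the list is replayed.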
Since the patch changing the event parser to parse BPF object config is
relatively large, I've put it in another commit. The code in this patch can be
tested after applying the next patch.
[1] http://lkml.kernel.org/g/564ED621.4050500@huawei.com
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Cody P Schafer <dev@codyps.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kirill Smelkov <kirr@nexedi.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1454680939-24963-7-git-send-email-wangnan0@huawei.com
Signed-off-by: He Kuang <hekuang@huawei.com>
[ Changes "maps:my_map.value" to "map:my_map.value", improved error messages ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/bpf-loader.c | 276 +++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/bpf-loader.h | 38 ++++++
2 files changed, 314 insertions(+)
diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 0bdccf4..caeef9e 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -739,6 +739,261 @@ int bpf__foreach_tev(struct bpf_object *obj,
return 0;
}
+enum bpf_map_op_type {
+ BPF_MAP_OP_SET_VALUE,
+};
+
+enum bpf_map_key_type {
+ BPF_MAP_KEY_ALL,
+};
+
+struct bpf_map_op {
+ struct list_head list;
+ enum bpf_map_op_type op_type;
+ enum bpf_map_key_type key_type;
+ union {
+ u64 value;
+ } v;
+};
+
+struct bpf_map_priv {
+ struct list_head ops_list;
+};
+
+static void
+bpf_map_op__delete(struct bpf_map_op *op)
+{
+ if (!list_empty(&op->list))
+ list_del(&op->list);
+ free(op);
+}
+
+static void
+bpf_map_priv__purge(struct bpf_map_priv *priv)
+{
+ struct bpf_map_op *pos, *n;
+
+ list_for_each_entry_safe(pos, n, &priv->ops_list, list) {
+ list_del_init(&pos->list);
+ bpf_map_op__delete(pos);
+ }
+}
+
+static void
+bpf_map_priv__clear(struct bpf_map *map __maybe_unused,
+ void *_priv)
+{
+ struct bpf_map_priv *priv = _priv;
+
+ bpf_map_priv__purge(priv);
+ free(priv);
+}
+
+static struct bpf_map_op *
+bpf_map_op__new(void)
+{
+ struct bpf_map_op *op;
+
+ op = zalloc(sizeof(*op));
+ if (!op) {
+ pr_debug("Failed to alloc bpf_map_op\n");
+ return ERR_PTR(-ENOMEM);
+ }
+ INIT_LIST_HEAD(&op->list);
+
+ op->key_type = BPF_MAP_KEY_ALL;
+ return op;
+}
+
+static int
+bpf_map__add_op(struct bpf_map *map, struct bpf_map_op *op)
+{
+ struct bpf_map_priv *priv;
+ const char *map_name;
+ int err;
+
+ map_name = bpf_map__get_name(map);
+ err = bpf_map__get_private(map, (void **)&priv);
+ if (err) {
+ pr_debug("Failed to get private from map %s\n", map_name);
+ return err;
+ }
+
+ if (!priv) {
+ priv = zalloc(sizeof(*priv));
+ if (!priv) {
+ pr_debug("Not enough memory to alloc map private\n");
+ return -ENOMEM;
+ }
+ INIT_LIST_HEAD(&priv->ops_list);
+
+ if (bpf_map__set_private(map, priv, bpf_map_priv__clear)) {
+ free(priv);
+ return -BPF_LOADER_ERRNO__INTERNAL;
+ }
+ }
+
+ list_add_tail(&op->list, &priv->ops_list);
+ return 0;
+}
+
+static int
+__bpf_map__config_value(struct bpf_map *map,
+ struct parse_events_term *term)
+{
+ struct bpf_map_def def;
+ struct bpf_map_op *op;
+ const char *map_name;
+ int err;
+
+ map_name = bpf_map__get_name(map);
+
+ err = bpf_map__get_def(map, &def);
+ if (err) {
+ pr_debug("Unable to get map definition from '%s'\n",
+ map_name);
+ return -BPF_LOADER_ERRNO__INTERNAL;
+ }
+
+ if (def.type != BPF_MAP_TYPE_ARRAY) {
+ pr_debug("Map %s type is not BPF_MAP_TYPE_ARRAY\n",
+ map_name);
+ return -BPF_LOADER_ERRNO__OBJCONF_MAP_TYPE;
+ }
+ if (def.key_size < sizeof(unsigned int)) {
+ pr_debug("Map %s has incorrect key size\n", map_name);
+ return -BPF_LOADER_ERRNO__OBJCONF_MAP_KEYSIZE;
+ }
+ switch (def.value_size) {
+ case 1:
+ case 2:
+ case 4:
+ case 8:
+ break;
+ default:
+ pr_debug("Map %s has incorrect value size\n", map_name);
+ return -BPF_LOADER_ERRNO__OBJCONF_MAP_VALUESIZE;
+ }
+
+ op = bpf_map_op__new();
+ if (IS_ERR(op))
+ return PTR_ERR(op);
+ op->op_type = BPF_MAP_OP_SET_VALUE;
+ op->v.value = term->val.num;
+
+ err = bpf_map__add_op(map, op);
+ if (err)
+ bpf_map_op__delete(op);
+ return err;
+}
+
+static int
+bpf_map__config_value(struct bpf_map *map,
+ struct parse_events_term *term,
+ struct perf_evlist *evlist __maybe_unused)
+{
+ if (!term->err_val) {
+ pr_debug("Config value not set\n");
+ return -BPF_LOADER_ERRNO__OBJCONF_CONF;
+ }
+
+ if (term->type_val != PARSE_EVENTS__TERM_TYPE_NUM) {
+ pr_debug("ERROR: wrong value type\n");
+ return -BPF_LOADER_ERRNO__OBJCONF_MAP_VALUE;
+ }
+
+ return __bpf_map__config_value(map, term);
+}
+
+struct bpf_obj_config__map_func {
+ const char *config_opt;
+ int (*config_func)(struct bpf_map *, struct parse_events_term *,
+ struct perf_evlist *);
+};
+
+struct bpf_obj_config__map_func bpf_obj_config__map_funcs[] = {
+ {"value", bpf_map__config_value},
+};
+
+static int
+bpf__obj_config_map(struct bpf_object *obj,
+ struct parse_events_term *term,
+ struct perf_evlist *evlist,
+ int *key_scan_pos)
+{
+ /* key is "map:<mapname>.<config opt>" */
+ char *map_name = strdup(term->config + sizeof("map:") - 1);
+ struct bpf_map *map;
+ int err = -BPF_LOADER_ERRNO__OBJCONF_OPT;
+ char *map_opt;
+ size_t i;
+
+ if (!map_name)
+ return -ENOMEM;
+
+ map_opt = strchr(map_name, '.');
+ if (!map_opt) {
+ pr_debug("ERROR: Invalid map config: %s\n", map_name);
+ goto out;
+ }
+
+ *map_opt++ = '\0';
+ if (*map_opt == '\0') {
+ pr_debug("ERROR: Invalid map option: %s\n", term->config);
+ goto out;
+ }
+
+ map = bpf_object__get_map_by_name(obj, map_name);
+ if (!map) {
+ pr_debug("ERROR: Map %s doesn't exist\n", map_name);
+ err = -BPF_LOADER_ERRNO__OBJCONF_MAP_NOTEXIST;
+ goto out;
+ }
+
+ *key_scan_pos += map_opt - map_name;
+ for (i = 0; i < ARRAY_SIZE(bpf_obj_config__map_funcs); i++) {
+ struct bpf_obj_config__map_func *func =
+ &bpf_obj_config__map_funcs[i];
+
+ if (strcmp(map_opt, func->config_opt) == 0) {
+ err = func->config_func(map, term, evlist);
+ goto out;
+ }
+ }
+
+ pr_debug("ERROR: Invalid map config option '%s'\n", map_opt);
+ err = -BPF_LOADER_ERRNO__OBJCONF_MAP_OPT;
+out:
+ if (!err)
+ *key_scan_pos += strlen(map_opt);
+ free(map_name);
+ return err;
+}
+
+int bpf__config_obj(struct bpf_object *obj,
+ struct parse_events_term *term,
+ struct perf_evlist *evlist,
+ int *error_pos)
+{
+ int key_scan_pos = 0;
+ int err;
+
+ if (!obj || !term || !term->config)
+ return -EINVAL;
+
+ if (!prefixcmp(term->config, "map:")) {
+ key_scan_pos = sizeof("map:") - 1;
+ err = bpf__obj_config_map(obj, term, evlist, &key_scan_pos);
+ goto out;
+ }
+ err = -BPF_LOADER_ERRNO__OBJCONF_OPT;
+out:
+ if (error_pos)
+ *error_pos = key_scan_pos;
+ return err;
+
+}
+
#define ERRNO_OFFSET(e) ((e) - __BPF_LOADER_ERRNO__START)
#define ERRCODE_OFFSET(c) ERRNO_OFFSET(BPF_LOADER_ERRNO__##c)
#define NR_ERRNO (__BPF_LOADER_ERRNO__END - __BPF_LOADER_ERRNO__START)
@@ -753,6 +1008,14 @@ static const char *bpf_loader_strerror_table[NR_ERRNO] = {
[ERRCODE_OFFSET(PROLOGUE)] = "Failed to generate prologue",
[ERRCODE_OFFSET(PROLOGUE2BIG)] = "Prologue too big for program",
[ERRCODE_OFFSET(PROLOGUEOOB)] = "Offset out of bound for prologue",
+ [ERRCODE_OFFSET(OBJCONF_OPT)] = "Invalid object config option",
+ [ERRCODE_OFFSET(OBJCONF_CONF)] = "Config value not set (missing '=')",
+ [ERRCODE_OFFSET(OBJCONF_MAP_OPT)] = "Invalid object map config option",
+ [ERRCODE_OFFSET(OBJCONF_MAP_NOTEXIST)] = "Target map doesn't exist",
+ [ERRCODE_OFFSET(OBJCONF_MAP_VALUE)] = "Incorrect value type for map",
+ [ERRCODE_OFFSET(OBJCONF_MAP_TYPE)] = "Incorrect map type",
+ [ERRCODE_OFFSET(OBJCONF_MAP_KEYSIZE)] = "Incorrect map key size",
+ [ERRCODE_OFFSET(OBJCONF_MAP_VALUESIZE)] = "Incorrect map value size",
};
static int
@@ -872,3 +1135,16 @@ int bpf__strerror_load(struct bpf_object *obj,
bpf__strerror_end(buf, size);
return 0;
}
+
+int bpf__strerror_config_obj(struct bpf_object *obj __maybe_unused,
+ struct parse_events_term *term __maybe_unused,
+ struct perf_evlist *evlist __maybe_unused,
+ int *error_pos __maybe_unused, int err,
+ char *buf, size_t size)
+{
+ bpf__strerror_head(err, buf, size);
+ bpf__strerror_entry(BPF_LOADER_ERRNO__OBJCONF_MAP_TYPE,
+ "Can't use this config term with this map type");
+ bpf__strerror_end(buf, size);
+ return 0;
+}
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index 6fdc045..cc46a07 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -10,6 +10,7 @@
#include <string.h>
#include <bpf/libbpf.h>
#include "probe-event.h"
+#include "evlist.h"
#include "debug.h"
enum bpf_loader_errno {
@@ -24,10 +25,19 @@ enum bpf_loader_errno {
BPF_LOADER_ERRNO__PROLOGUE, /* Failed to generate prologue */
BPF_LOADER_ERRNO__PROLOGUE2BIG, /* Prologue too big for program */
BPF_LOADER_ERRNO__PROLOGUEOOB, /* Offset out of bound for prologue */
+ BPF_LOADER_ERRNO__OBJCONF_OPT, /* Invalid object config option */
+ BPF_LOADER_ERRNO__OBJCONF_CONF, /* Config value not set (missing '=') */
+ BPF_LOADER_ERRNO__OBJCONF_MAP_OPT, /* Invalid object map config option */
+ BPF_LOADER_ERRNO__OBJCONF_MAP_NOTEXIST, /* Target map doesn't exist */
+ BPF_LOADER_ERRNO__OBJCONF_MAP_VALUE, /* Incorrect value type for map */
+ BPF_LOADER_ERRNO__OBJCONF_MAP_TYPE, /* Incorrect map type */
+ BPF_LOADER_ERRNO__OBJCONF_MAP_KEYSIZE, /* Incorrect map key size */
+ BPF_LOADER_ERRNO__OBJCONF_MAP_VALUESIZE,/* Incorrect map value size */
__BPF_LOADER_ERRNO__END,
};
struct bpf_object;
+struct parse_events_term;
#define PERF_BPF_PROBE_GROUP "perf_bpf_probe"
typedef int (*bpf_prog_iter_callback_t)(struct probe_trace_event *tev,
@@ -53,6 +63,14 @@ int bpf__strerror_load(struct bpf_object *obj, int err,
char *buf, size_t size);
int bpf__foreach_tev(struct bpf_object *obj,
bpf_prog_iter_callback_t func, void *arg);
+
+int bpf__config_obj(struct bpf_object *obj, struct parse_events_term *term,
+ struct perf_evlist *evlist, int *error_pos);
+int bpf__strerror_config_obj(struct bpf_object *obj,
+ struct parse_events_term *term,
+ struct perf_evlist *evlist,
+ int *error_pos, int err, char *buf,
+ size_t size);
#else
static inline struct bpf_object *
bpf__prepare_load(const char *filename __maybe_unused,
@@ -84,6 +102,15 @@ bpf__foreach_tev(struct bpf_object *obj __maybe_unused,
}
static inline int
+bpf__config_obj(struct bpf_object *obj __maybe_unused,
+ struct parse_events_term *term __maybe_unused,
+ struct perf_evlist *evlist __maybe_unused,
+ int *error_pos __maybe_unused)
+{
+ return 0;
+}
+
+static inline int
__bpf_strerror(char *buf, size_t size)
{
if (!size)
@@ -118,5 +145,16 @@ static inline int bpf__strerror_load(struct bpf_object *obj __maybe_unused,
{
return __bpf_strerror(buf, size);
}
+
+static inline int
+bpf__strerror_config_obj(struct bpf_object *obj __maybe_unused,
+ struct parse_events_term *term __maybe_unused,
+ struct perf_evlist *evlist __maybe_unused,
+ int *error_pos __maybe_unused,
+ int err __maybe_unused,
+ char *buf, size_t size)
+{
+ return __bpf_strerror(buf, size);
+}
#endif
#endif
--
1.8.3.4
* [PATCH 04/48] perf tools: Enable BPF object configure syntax
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (2 preceding siblings ...)
2016-02-22 9:10 ` [PATCH 03/48] perf bpf: Add API to set values to map entries in a bpf object Wang Nan
@ 2016-02-22 9:10 ` Wang Nan
2016-02-25 5:39 ` [tip:perf/core] " tip-bot for Wang Nan
2016-02-22 9:10 ` [PATCH 05/48] perf record: Apply config to BPF objects before recording Wang Nan
` (43 subsequent siblings)
47 siblings, 1 reply; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:10 UTC
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
This patch adds the final step for BPF map configuration. A new syntax
is added to the parser so that the user can configure BPF objects through
config terms enclosed in '/' '/'.
After this patch, the following syntax is available:
# perf record -e ./test_bpf_map_1.c/map:channel.value=10/ ...
It takes effect after applying the following commits.
Test result:
# cat ./test_bpf_map_1.c
/************************ BEGIN **************************/
#include <uapi/linux/bpf.h>
#define SEC(NAME) __attribute__((section(NAME), used))
struct bpf_map_def {
unsigned int type;
unsigned int key_size;
unsigned int value_size;
unsigned int max_entries;
};
static void *(*map_lookup_elem)(struct bpf_map_def *, void *) =
(void *)BPF_FUNC_map_lookup_elem;
static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
(void *)BPF_FUNC_trace_printk;
struct bpf_map_def SEC("maps") channel = {
.type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(int),
.max_entries = 1,
};
SEC("func=sys_nanosleep")
int func(void *ctx)
{
int key = 0;
char fmt[] = "%d\n";
int *pval = map_lookup_elem(&channel, &key);
if (!pval)
return 0;
trace_printk(fmt, sizeof(fmt), *pval);
return 0;
}
char _license[] SEC("license") = "GPL";
int _version SEC("version") = LINUX_VERSION_CODE;
/************************* END ***************************/
- Normal case:
# ./perf record -e './test_bpf_map_1.c/map:channel.value=10/' usleep 10
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.012 MB perf.data ]
- Error case:
# ./perf record -e './test_bpf_map_1.c/map:channel.value/' usleep 10
event syntax error: '..ps:channel:value/'
\___ Config value not set (missing '=')
Hint: Valid config term:
map:[<arraymap>]:value=[value]
(add -v to see detail)
Run 'perf list' for a list of valid events
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-e, --event <event> event selector. use 'perf list' to list available events
# ./perf record -e './test_bpf_map_1.c/xmap:channel.value=10/' usleep 10
event syntax error: '..pf_map_1.c/xmap:channel.value=10/'
\___ Invalid object config option
[SNIP]
# ./perf record -e './test_bpf_map_1.c/map:xchannel.value=10/' usleep 10
event syntax error: '..p_1.c/map:xchannel.value=10/'
\___ Target map not exist
[SNIP]
# ./perf record -e './test_bpf_map_1.c/map:channel.xvalue=10/' usleep 10
event syntax error: '..ps:channel.xvalue=10/'
\___ Invalid object map config option
[SNIP]
# ./perf record -e './test_bpf_map_1.c/map:channel.value=x10/' usleep 10
event syntax error: '..nnel.value=x10/'
\___ Incorrect value type for map
[SNIP]
Change BPF_MAP_TYPE_ARRAY to '1' in test_bpf_map_1.c:
# ./perf record -e './test_bpf_map_1.c/map:channel.value=10/' usleep 10
event syntax error: '..ps:channel.value=10/'
\___ Can't use this config term to this type of map
Hint: Valid config term:
map:[<arraymap>].value=[value]
(add -v to see detail)
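The error cases above follow from the shape of the config term,
'map:<mapname>.<option>=<value>'. As a rough illustration only, the
sketch below splits such a term by hand and returns a distinct code for
each failure class. It is not how perf does it (perf parses the term in
parse-events.l/.y and validates it in bpf-loader.c); toy_term,
toy_parse_term() and the numeric return codes are hypothetical names
made up for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the three parts of a config term. */
struct toy_term {
	char map[32];              /* map name, e.g. "channel" */
	char opt[32];              /* option name, e.g. "value" */
	unsigned long long value;  /* numeric value after '=' */
};

/*
 * Split "map:<mapname>.<option>=<value>".
 * Returns 0 on success, or (hypothetical codes):
 *   -1  prefix is not "map:"        (cf. "Invalid object config option")
 *   -2  missing or oversized name   (malformed term)
 *   -3  no '=' present              (cf. "Config value not set")
 *   -4  value is not a number       (cf. "Incorrect value type for map")
 */
static int toy_parse_term(const char *s, struct toy_term *t)
{
	const char *dot, *eq;
	char *end;
	size_t n;

	assert(s != NULL && t != NULL);
	if (strncmp(s, "map:", 4) != 0)
		return -1;
	s += 4;

	dot = strchr(s, '.');
	if (!dot)
		return -2;
	eq = strchr(dot + 1, '=');
	if (!eq)
		return -3;

	/* Copy the map name (text between "map:" and '.'). */
	n = (size_t)(dot - s);
	if (n == 0 || n >= sizeof(t->map))
		return -2;
	memcpy(t->map, s, n);
	t->map[n] = '\0';

	/* Copy the option name (text between '.' and '='). */
	n = (size_t)(eq - dot - 1);
	if (n == 0 || n >= sizeof(t->opt))
		return -2;
	memcpy(t->opt, dot + 1, n);
	t->opt[n] = '\0';

	/* The value must be a complete decimal, octal or hex number. */
	if (eq[1] == '\0')
		return -4;
	t->value = strtoull(eq + 1, &end, 0);
	if (*end != '\0')
		return -4;
	return 0;
}
```

Checking that the named map exists and that the option is valid for its
type needs the loaded BPF object, so those checks are left out here.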
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
[for parser part]
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
tools/perf/util/parse-events.c | 55 +++++++++++++++++++++++++++++++++++++++---
tools/perf/util/parse-events.h | 3 ++-
tools/perf/util/parse-events.l | 2 +-
tools/perf/util/parse-events.y | 10 +++++---
4 files changed, 61 insertions(+), 9 deletions(-)
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index b0b3295..a5dd670 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -672,17 +672,63 @@ errout:
return err;
}
+static int
+parse_events_config_bpf(struct parse_events_evlist *data,
+ struct bpf_object *obj,
+ struct list_head *head_config)
+{
+ struct parse_events_term *term;
+ int error_pos;
+
+ if (!head_config || list_empty(head_config))
+ return 0;
+
+ list_for_each_entry(term, head_config, list) {
+ char errbuf[BUFSIZ];
+ int err;
+
+ if (term->type_term != PARSE_EVENTS__TERM_TYPE_USER) {
+ snprintf(errbuf, sizeof(errbuf),
+ "Invalid config term for BPF object");
+ errbuf[BUFSIZ - 1] = '\0';
+
+ data->error->idx = term->err_term;
+ data->error->str = strdup(errbuf);
+ return -EINVAL;
+ }
+
+ err = bpf__config_obj(obj, term, NULL, &error_pos);
+ if (err) {
+ bpf__strerror_config_obj(obj, term, NULL,
+ &error_pos, err, errbuf,
+ sizeof(errbuf));
+ data->error->help = strdup(
+"Hint:\tValid config term:\n"
+" \tmap:[<arraymap>].value=[value]\n"
+" \t(add -v to see detail)");
+ data->error->str = strdup(errbuf);
+ if (err == -BPF_LOADER_ERRNO__OBJCONF_MAP_VALUE)
+ data->error->idx = term->err_val;
+ else
+ data->error->idx = term->err_term + error_pos;
+ return err;
+ }
+ }
+ return 0;
+}
+
int parse_events_load_bpf(struct parse_events_evlist *data,
struct list_head *list,
char *bpf_file_name,
- bool source)
+ bool source,
+ struct list_head *head_config)
{
struct bpf_object *obj;
+ int err;
obj = bpf__prepare_load(bpf_file_name, source);
if (IS_ERR(obj)) {
char errbuf[BUFSIZ];
- int err;
err = PTR_ERR(obj);
@@ -700,7 +746,10 @@ int parse_events_load_bpf(struct parse_events_evlist *data,
return err;
}
- return parse_events_load_bpf_obj(data, list, obj);
+ err = parse_events_load_bpf_obj(data, list, obj);
+ if (err)
+ return err;
+ return parse_events_config_bpf(data, obj, head_config);
}
static int
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index d5eb2af..c48377a 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -129,7 +129,8 @@ int parse_events_add_tracepoint(struct list_head *list, int *idx,
int parse_events_load_bpf(struct parse_events_evlist *data,
struct list_head *list,
char *bpf_file_name,
- bool source);
+ bool source,
+ struct list_head *head_config);
/* Provide this function for perf test */
struct bpf_object;
int parse_events_load_bpf_obj(struct parse_events_evlist *data,
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 99486e6..0cc6b84 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -122,7 +122,7 @@ num_dec [0-9]+
num_hex 0x[a-fA-F0-9]+
num_raw_hex [a-fA-F0-9]+
name [a-zA-Z_*?][a-zA-Z0-9_*?.]*
-name_minus [a-zA-Z_*?][a-zA-Z0-9\-_*?.]*
+name_minus [a-zA-Z_*?][a-zA-Z0-9\-_*?.:]*
/* If you add a modifier you need to update check_modifier() */
modifier_event [ukhpPGHSDI]+
modifier_bp [rwx]{1,3}
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index 6a2d006..0e2d433 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -437,24 +437,26 @@ PE_RAW opt_event_config
}
event_bpf_file:
-PE_BPF_OBJECT
+PE_BPF_OBJECT opt_event_config
{
struct parse_events_evlist *data = _data;
struct parse_events_error *error = data->error;
struct list_head *list;
ALLOC_LIST(list);
- ABORT_ON(parse_events_load_bpf(data, list, $1, false));
+ ABORT_ON(parse_events_load_bpf(data, list, $1, false, $2));
+ parse_events_terms__delete($2);
$$ = list;
}
|
-PE_BPF_SOURCE
+PE_BPF_SOURCE opt_event_config
{
struct parse_events_evlist *data = _data;
struct list_head *list;
ALLOC_LIST(list);
- ABORT_ON(parse_events_load_bpf(data, list, $1, true));
+ ABORT_ON(parse_events_load_bpf(data, list, $1, true, $2));
+ parse_events_terms__delete($2);
$$ = list;
}
--
1.8.3.4
* [PATCH 05/48] perf record: Apply config to BPF objects before recording
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (3 preceding siblings ...)
2016-02-22 9:10 ` [PATCH 04/48] perf tools: Enable BPF object configure syntax Wang Nan
@ 2016-02-22 9:10 ` Wang Nan
2016-02-25 5:39 ` [tip:perf/core] " tip-bot for Wang Nan
2016-02-22 9:10 ` [PATCH 06/48] perf tools: Enable passing event to BPF object Wang Nan
` (42 subsequent siblings)
47 siblings, 1 reply; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:10 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
bpf__apply_obj_config() is introduced as the core API to apply object
config options to all BPF objects. This patch also does the real work
of setting values for BPF_MAP_TYPE_ARRAY maps by inserting the value
stored in the map's private field into the BPF map.
This patch is required because we are not always able to apply all
BPF config options during parsing. A later patch will put events
created by perf into BPF_MAP_TYPE_PERF_EVENT_ARRAY maps, and those
events do not exist until perf_evsel__open().
bpf_map_config_foreach_key() is introduced to iterate over each key
that needs to be configured. This function will be extended to support
more map types and different key settings.
In perf record, bpf__apply_obj_config() is called before recording
starts to apply all BPF config options.
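The value-setting step can be pictured outside perf with the following
minimal sketch. It is an illustration under stated assumptions, not the
patch's code: toy_map, toy_update_elem(), toy_set_value(), toy_set_all()
and toy_read_u32() are hypothetical stand-ins for a BPF array map and
for bpf_map_update_elem(). Only the idea is taken from the patch: the
64-bit config value is narrowed to the map's value_size and then written
to every key.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a BPF array map. */
struct toy_map {
	unsigned int value_size;   /* bytes per slot: 1, 2, 4 or 8 */
	unsigned int max_entries;  /* number of slots */
	unsigned char data[64];    /* backing storage for all slots */
};

/* Stand-in for bpf_map_update_elem(): copy one value into slot 'key'. */
static int toy_update_elem(struct toy_map *m, unsigned int key,
			   const void *val)
{
	if (key >= m->max_entries)
		return -1;
	memcpy(m->data + (size_t)key * m->value_size, val, m->value_size);
	return 0;
}

/* Narrow the 64-bit config value to the slot width, then store it. */
static int toy_set_value(struct toy_map *m, unsigned int key, uint64_t val)
{
	switch (m->value_size) {
	case 1: { uint8_t v = (uint8_t)val;  return toy_update_elem(m, key, &v); }
	case 2: { uint16_t v = (uint16_t)val; return toy_update_elem(m, key, &v); }
	case 4: { uint32_t v = (uint32_t)val; return toy_update_elem(m, key, &v); }
	case 8: return toy_update_elem(m, key, &val);
	default: return -2;        /* unsupported value size */
	}
}

/* "All keys" case: apply the same value to every slot of the map. */
static int toy_set_all(struct toy_map *m, uint64_t val)
{
	unsigned int i;
	int err;

	/* The toy storage must be large enough for every slot. */
	assert((size_t)m->value_size * m->max_entries <= sizeof(m->data));
	for (i = 0; i < m->max_entries; i++) {
		err = toy_set_value(m, i, val);
		if (err)
			return err;
	}
	return 0;
}

/* Read back a slot of a map whose value_size is 4. */
static uint32_t toy_read_u32(const struct toy_map *m, unsigned int key)
{
	uint32_t v;

	memcpy(&v, m->data + (size_t)key * m->value_size, sizeof(v));
	return v;
}
```

Narrowing before the copy matters because copying the first value_size
bytes of a 64-bit variable directly would pick the wrong bytes on a
big-endian host.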
Test result:
# cat ./test_bpf_map_1.c
/************************ BEGIN **************************/
#include <uapi/linux/bpf.h>
#define SEC(NAME) __attribute__((section(NAME), used))
struct bpf_map_def {
unsigned int type;
unsigned int key_size;
unsigned int value_size;
unsigned int max_entries;
};
static void *(*map_lookup_elem)(struct bpf_map_def *, void *) =
(void *)BPF_FUNC_map_lookup_elem;
static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
(void *)BPF_FUNC_trace_printk;
struct bpf_map_def SEC("maps") channel = {
.type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(int),
.max_entries = 1,
};
SEC("func=sys_nanosleep")
int func(void *ctx)
{
int key = 0;
char fmt[] = "%d\n";
int *pval = map_lookup_elem(&channel, &key);
if (!pval)
return 0;
trace_printk(fmt, sizeof(fmt), *pval);
return 0;
}
char _license[] SEC("license") = "GPL";
int _version SEC("version") = LINUX_VERSION_CODE;
/************************* END ***************************/
# echo "" > /sys/kernel/debug/tracing/trace
# ./perf record -e './test_bpf_map_1.c/map:channel.value=11/' usleep 10
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.012 MB perf.data ]
# cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
# entries-in-buffer/entries-written: 1/1 #P:8
[SNIP]
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
usleep-18593 [007] d... 2394714.395539: : 11
# ./perf record -e './test_bpf_map_1.c/map:channel.value=101/' usleep 10
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.012 MB perf.data ]
# cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
# entries-in-buffer/entries-written: 1/1 #P:8
[SNIP]
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
usleep-18593 [007] d... 2394714.395539: : 11
usleep-19000 [006] d... 2394831.057840: : 101
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
tools/perf/builtin-record.c | 11 +++
tools/perf/util/bpf-loader.c | 184 +++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/bpf-loader.h | 15 ++++
3 files changed, 210 insertions(+)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index cf3a28d..7d11162 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -32,6 +32,7 @@
#include "util/parse-branch-options.h"
#include "util/parse-regs-options.h"
#include "util/llvm-utils.h"
+#include "util/bpf-loader.h"
#include <unistd.h>
#include <sched.h>
@@ -536,6 +537,16 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
goto out_child;
}
+ err = bpf__apply_obj_config();
+ if (err) {
+ char errbuf[BUFSIZ];
+
+ bpf__strerror_apply_obj_config(err, errbuf, sizeof(errbuf));
+ pr_err("ERROR: Apply config to BPF failed: %s\n",
+ errbuf);
+ goto out_child;
+ }
+
/*
* Normally perf_session__new would do this, but it doesn't have the
* evlist.
diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index caeef9e..dbbd17c 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -7,6 +7,7 @@
#include <linux/bpf.h>
#include <bpf/libbpf.h>
+#include <bpf/bpf.h>
#include <linux/err.h>
#include <linux/string.h>
#include "perf.h"
@@ -994,6 +995,182 @@ out:
}
+typedef int (*map_config_func_t)(const char *name, int map_fd,
+ struct bpf_map_def *pdef,
+ struct bpf_map_op *op,
+ void *pkey, void *arg);
+
+static int
+foreach_key_array_all(map_config_func_t func,
+ void *arg, const char *name,
+ int map_fd, struct bpf_map_def *pdef,
+ struct bpf_map_op *op)
+{
+ unsigned int i;
+ int err;
+
+ for (i = 0; i < pdef->max_entries; i++) {
+ err = func(name, map_fd, pdef, op, &i, arg);
+ if (err) {
+ pr_debug("ERROR: failed to insert value to %s[%u]\n",
+ name, i);
+ return err;
+ }
+ }
+ return 0;
+}
+
+static int
+bpf_map_config_foreach_key(struct bpf_map *map,
+ map_config_func_t func,
+ void *arg)
+{
+ int err, map_fd;
+ const char *name;
+ struct bpf_map_op *op;
+ struct bpf_map_def def;
+ struct bpf_map_priv *priv;
+
+ name = bpf_map__get_name(map);
+
+ err = bpf_map__get_private(map, (void **)&priv);
+ if (err) {
+ pr_debug("ERROR: failed to get private from map %s\n", name);
+ return -BPF_LOADER_ERRNO__INTERNAL;
+ }
+ if (!priv || list_empty(&priv->ops_list)) {
+ pr_debug("INFO: nothing to config for map %s\n", name);
+ return 0;
+ }
+
+ err = bpf_map__get_def(map, &def);
+ if (err) {
+ pr_debug("ERROR: failed to get definition from map %s\n", name);
+ return -BPF_LOADER_ERRNO__INTERNAL;
+ }
+ map_fd = bpf_map__get_fd(map);
+ if (map_fd < 0) {
+ pr_debug("ERROR: failed to get fd from map %s\n", name);
+ return map_fd;
+ }
+
+ list_for_each_entry(op, &priv->ops_list, list) {
+ switch (def.type) {
+ case BPF_MAP_TYPE_ARRAY:
+ switch (op->key_type) {
+ case BPF_MAP_KEY_ALL:
+ err = foreach_key_array_all(func, arg, name,
+ map_fd, &def, op);
+ if (err)
+ return err;
+ break;
+ default:
+ pr_debug("ERROR: keytype for map '%s' invalid\n",
+ name);
+ return -BPF_LOADER_ERRNO__INTERNAL;
+ }
+ break;
+ default:
+ pr_debug("ERROR: type of '%s' incorrect\n", name);
+ return -BPF_LOADER_ERRNO__OBJCONF_MAP_TYPE;
+ }
+ }
+
+ return 0;
+}
+
+static int
+apply_config_value_for_key(int map_fd, void *pkey,
+ size_t val_size, u64 val)
+{
+ int err = 0;
+
+ switch (val_size) {
+ case 1: {
+ u8 _val = (u8)(val);
+ err = bpf_map_update_elem(map_fd, pkey, &_val, BPF_ANY);
+ break;
+ }
+ case 2: {
+ u16 _val = (u16)(val);
+ err = bpf_map_update_elem(map_fd, pkey, &_val, BPF_ANY);
+ break;
+ }
+ case 4: {
+ u32 _val = (u32)(val);
+ err = bpf_map_update_elem(map_fd, pkey, &_val, BPF_ANY);
+ break;
+ }
+ case 8: {
+ err = bpf_map_update_elem(map_fd, pkey, &val, BPF_ANY);
+ break;
+ }
+ default:
+ pr_debug("ERROR: invalid value size\n");
+ return -BPF_LOADER_ERRNO__OBJCONF_MAP_VALUESIZE;
+ }
+ if (err && errno)
+ err = -errno;
+ return err;
+}
+
+static int
+apply_obj_config_map_for_key(const char *name, int map_fd,
+ struct bpf_map_def *pdef __maybe_unused,
+ struct bpf_map_op *op,
+ void *pkey, void *arg __maybe_unused)
+{
+ int err;
+
+ switch (op->op_type) {
+ case BPF_MAP_OP_SET_VALUE:
+ err = apply_config_value_for_key(map_fd, pkey,
+ pdef->value_size,
+ op->v.value);
+ break;
+ default:
+ pr_debug("ERROR: unknown value type for '%s'\n", name);
+ err = -BPF_LOADER_ERRNO__INTERNAL;
+ }
+ return err;
+}
+
+static int
+apply_obj_config_map(struct bpf_map *map)
+{
+ return bpf_map_config_foreach_key(map,
+ apply_obj_config_map_for_key,
+ NULL);
+}
+
+static int
+apply_obj_config_object(struct bpf_object *obj)
+{
+ struct bpf_map *map;
+ int err;
+
+ bpf_map__for_each(map, obj) {
+ err = apply_obj_config_map(map);
+ if (err)
+ return err;
+ }
+ return 0;
+}
+
+int bpf__apply_obj_config(void)
+{
+ struct bpf_object *obj, *tmp;
+ int err;
+
+ bpf_object__for_each_safe(obj, tmp) {
+ err = apply_obj_config_object(obj);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
#define ERRNO_OFFSET(e) ((e) - __BPF_LOADER_ERRNO__START)
#define ERRCODE_OFFSET(c) ERRNO_OFFSET(BPF_LOADER_ERRNO__##c)
#define NR_ERRNO (__BPF_LOADER_ERRNO__END - __BPF_LOADER_ERRNO__START)
@@ -1148,3 +1325,10 @@ int bpf__strerror_config_obj(struct bpf_object *obj __maybe_unused,
bpf__strerror_end(buf, size);
return 0;
}
+
+int bpf__strerror_apply_obj_config(int err, char *buf, size_t size)
+{
+ bpf__strerror_head(err, buf, size);
+ bpf__strerror_end(buf, size);
+ return 0;
+}
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index cc46a07..5d3b931 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -71,6 +71,8 @@ int bpf__strerror_config_obj(struct bpf_object *obj,
struct perf_evlist *evlist,
int *error_pos, int err, char *buf,
size_t size);
+int bpf__apply_obj_config(void);
+int bpf__strerror_apply_obj_config(int err, char *buf, size_t size);
#else
static inline struct bpf_object *
bpf__prepare_load(const char *filename __maybe_unused,
@@ -111,6 +113,12 @@ bpf__config_obj(struct bpf_object *obj __maybe_unused,
}
static inline int
+bpf__apply_obj_config(void)
+{
+ return 0;
+}
+
+static inline int
__bpf_strerror(char *buf, size_t size)
{
if (!size)
@@ -156,5 +164,12 @@ bpf__strerror_config_obj(struct bpf_object *obj __maybe_unused,
{
return __bpf_strerror(buf, size);
}
+
+static inline int
+bpf__strerror_apply_obj_config(int err __maybe_unused,
+ char *buf, size_t size)
+{
+ return __bpf_strerror(buf, size);
+}
#endif
#endif
--
1.8.3.4
* [PATCH 06/48] perf tools: Enable passing event to BPF object
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (4 preceding siblings ...)
2016-02-22 9:10 ` [PATCH 05/48] perf record: Apply config to BPF objects before recording Wang Nan
@ 2016-02-22 9:10 ` Wang Nan
2016-02-25 5:40 ` [tip:perf/core] " tip-bot for Wang Nan
2016-02-22 9:10 ` [PATCH 07/48] perf tools: Support setting different slots in a BPF map separately Wang Nan
` (41 subsequent siblings)
47 siblings, 1 reply; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:10 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
A new syntax is added to the parser so that users can pass predefined
perf events into BPF objects.
After this patch, BPF programs for perf are finally able to utilize
bpf_perf_event_read() introduced in commit 35578d7984003097af2b1e3
(bpf: Implement function bpf_perf_event_read() that get the selected
hardware PMU conuter).
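Passing an event into a map takes two checks: the event named in the
config term must already be in the event list, and it must be of a kind
that can be stored in the map. The sketch below shows both checks on a
plain array. It is a simplified illustration, not perf code: toy_evsel,
toy_type, toy_find_by_str() and toy_can_put_in_map() are hypothetical
names, and the real checks operate on struct perf_evsel and
struct perf_event_attr.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced set of event kinds. */
enum toy_type { TOY_HARDWARE, TOY_SOFTWARE, TOY_TRACEPOINT, TOY_RAW };

/* Hypothetical, reduced event selector. */
struct toy_evsel {
	const char *name;      /* may be NULL for unnamed events */
	enum toy_type type;
	int is_bpf_output;     /* only meaningful for TOY_SOFTWARE */
	int inherit;           /* non-zero if inherit is enabled */
};

/* Return the first event whose name matches 'str' exactly, or NULL. */
static struct toy_evsel *
toy_find_by_str(struct toy_evsel *evs, size_t n, const char *str)
{
	size_t i;

	assert(str != NULL);
	for (i = 0; i < n; i++) {
		if (!evs[i].name)
			continue;      /* skip unnamed events */
		if (strcmp(str, evs[i].name) == 0)
			return &evs[i];
	}
	return NULL;
}

/* Only non-inherited raw, hardware and BPF output events qualify. */
static int toy_can_put_in_map(const struct toy_evsel *ev)
{
	if (ev->inherit)
		return 0;
	if (ev->type == TOY_RAW || ev->type == TOY_HARDWARE)
		return 1;
	return ev->type == TOY_SOFTWARE && ev->is_bpf_output;
}
```

Because the lookup is by name, the event has to be given on the command
line before the BPF object that refers to it, as in the test commands
below.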
Test result:
# cat ./test_bpf_map_2.c
/************************ BEGIN **************************/
#include <uapi/linux/bpf.h>
#define SEC(NAME) __attribute__((section(NAME), used))
struct bpf_map_def {
unsigned int type;
unsigned int key_size;
unsigned int value_size;
unsigned int max_entries;
};
static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
(void *)BPF_FUNC_trace_printk;
static int (*get_smp_processor_id)(void) =
(void *)BPF_FUNC_get_smp_processor_id;
static int (*perf_event_read)(struct bpf_map_def *, int) =
(void *)BPF_FUNC_perf_event_read;
struct bpf_map_def SEC("maps") pmu_map = {
.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(int),
.max_entries = __NR_CPUS__,
};
SEC("func_write=sys_write")
int func_write(void *ctx)
{
unsigned long long val;
char fmt[] = "sys_write: pmu=%llu\n";
val = perf_event_read(&pmu_map, get_smp_processor_id());
trace_printk(fmt, sizeof(fmt), val);
return 0;
}
SEC("func_write_return=sys_write%return")
int func_write_return(void *ctx)
{
unsigned long long val = 0;
char fmt[] = "sys_write_return: pmu=%llu\n";
val = perf_event_read(&pmu_map, get_smp_processor_id());
trace_printk(fmt, sizeof(fmt), val);
return 0;
}
char _license[] SEC("license") = "GPL";
int _version SEC("version") = LINUX_VERSION_CODE;
/************************* END ***************************/
Normal case:
# echo "" > /sys/kernel/debug/tracing/trace
# ./perf record -i -e cycles -e './test_bpf_map_2.c/map:pmu_map.event=cycles/' ls /
[SNIP]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.013 MB perf.data (7 samples) ]
# cat /sys/kernel/debug/tracing/trace | grep ls
ls-17066 [000] d... 938449.863301: : sys_write: pmu=1157327
ls-17066 [000] dN.. 938449.863342: : sys_write_return: pmu=1225218
ls-17066 [000] d... 938449.863349: : sys_write: pmu=1241922
ls-17066 [000] dN.. 938449.863369: : sys_write_return: pmu=1267445
Normal case (system wide):
# echo "" > /sys/kernel/debug/tracing/trace
# ./perf record -i -e cycles -e './test_bpf_map_2.c/map:pmu_map.event=cycles/' -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.811 MB perf.data (120 samples) ]
# cat /sys/kernel/debug/tracing/trace | grep -v '18446744073709551594' | grep -v perf | head -n 20
[SNIP]
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
gmain-30828 [002] d... 2740551.068992: : sys_write: pmu=84373
gmain-30828 [002] d... 2740551.068992: : sys_write_return: pmu=87696
gmain-30828 [002] d... 2740551.068996: : sys_write: pmu=100658
gmain-30828 [002] d... 2740551.068997: : sys_write_return: pmu=102572
Error case 1:
# ./perf record -e './test_bpf_map_2.c' ls /
[SNIP]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.014 MB perf.data ]
# cat /sys/kernel/debug/tracing/trace | grep ls
ls-17115 [007] d... 2724279.665625: : sys_write: pmu=18446744073709551614
ls-17115 [007] dN.. 2724279.665651: : sys_write_return: pmu=18446744073709551614
ls-17115 [007] d... 2724279.665658: : sys_write: pmu=18446744073709551614
ls-17115 [007] dN.. 2724279.665677: : sys_write_return: pmu=18446744073709551614
(18446744073709551614 is 0xfffffffffffffffe (-2))
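The large numbers in this log are small negative return values printed
with %llu. The sketch below converts them back; as_signed() and
looks_like_error() are hypothetical helpers written for this note. The
4095 bound borrows the kernel's MAX_ERRNO convention and is an
assumption here, as is reading -2 as an errno-style code (on Linux,
errno 2 is ENOENT and errno 22 is EINVAL).

```c
#include <assert.h>

/*
 * Reinterpret a value printed with %llu as a signed number.
 * Converting an out-of-range unsigned value to a signed type is
 * implementation-defined before C23; mainstream compilers wrap it
 * as two's complement, which is what this sketch relies on.
 */
static long long as_signed(unsigned long long raw)
{
	return (long long)raw;
}

/* Heuristic (assumption): small negatives are errno-style codes. */
static int looks_like_error(unsigned long long raw)
{
	long long v = as_signed(raw);

	return v < 0 && v >= -4095;
}
```

With this reading, 18446744073709551614 is -2, and 18446744073709551594
(the value filtered out with grep -v in the system-wide run) is -22.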
Error case 2:
# ./perf record -e cycles -e './test_bpf_map_2.c/map:pmu_map.event=evt/' -a
event syntax error: '..ps:pmu_map.event=evt/'
\___ Event not found for map setting
Hint: Valid config terms:
map:[<arraymap>].value=[value]
map:[<eventmap>].event=[event]
[SNIP]
Error case 3:
# ls /proc/2348/task/
2348 2505 2506 2507 2508
# ./perf record -i -e cycles -e './test_bpf_map_2.c/map:pmu_map.event=cycles/' -p 2348
ERROR: Apply config to BPF failed: Cannot set event to BPF map in multi-thread tracing
Error case 4:
# ./perf record -e cycles -e './test_bpf_map_2.c/map:pmu_map.event=cycles/' ls /
ERROR: Apply config to BPF failed: Doesn't support inherit event (Hint: use -i to turn off inherit)
Error case 5:
# ./perf record -i -e raw_syscalls:sys_enter -e './test_bpf_map_2.c/map:pmu_map.event=raw_syscalls:sys_enter/' ls
ERROR: Apply config to BPF failed: Can only put raw, hardware and BPF output event into a BPF map
Error case 6:
# ./perf record -i -e './test_bpf_map_2.c/map:pmu_map.event=123/' ls /
event syntax error: '.._map.event=123/'
\___ Incorrect value type for map
[SNIP]
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
tools/perf/util/bpf-loader.c | 163 +++++++++++++++++++++++++++++++++++++++--
tools/perf/util/bpf-loader.h | 5 ++
tools/perf/util/evlist.c | 16 ++++
tools/perf/util/evlist.h | 3 +
tools/perf/util/parse-events.c | 15 ++--
tools/perf/util/parse-events.h | 1 +
6 files changed, 190 insertions(+), 13 deletions(-)
diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index dbbd17c..deacb95 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -742,6 +742,7 @@ int bpf__foreach_tev(struct bpf_object *obj,
enum bpf_map_op_type {
BPF_MAP_OP_SET_VALUE,
+ BPF_MAP_OP_SET_EVSEL,
};
enum bpf_map_key_type {
@@ -754,6 +755,7 @@ struct bpf_map_op {
enum bpf_map_key_type key_type;
union {
u64 value;
+ struct perf_evsel *evsel;
} v;
};
@@ -838,6 +840,24 @@ bpf_map__add_op(struct bpf_map *map, struct bpf_map_op *op)
return 0;
}
+static struct bpf_map_op *
+bpf_map__add_newop(struct bpf_map *map)
+{
+ struct bpf_map_op *op;
+ int err;
+
+ op = bpf_map_op__new();
+ if (IS_ERR(op))
+ return op;
+
+ err = bpf_map__add_op(map, op);
+ if (err) {
+ bpf_map_op__delete(op);
+ return ERR_PTR(err);
+ }
+ return op;
+}
+
static int
__bpf_map__config_value(struct bpf_map *map,
struct parse_events_term *term)
@@ -876,16 +896,12 @@ __bpf_map__config_value(struct bpf_map *map,
return -BPF_LOADER_ERRNO__OBJCONF_MAP_VALUESIZE;
}
- op = bpf_map_op__new();
+ op = bpf_map__add_newop(map);
if (IS_ERR(op))
return PTR_ERR(op);
op->op_type = BPF_MAP_OP_SET_VALUE;
op->v.value = term->val.num;
-
- err = bpf_map__add_op(map, op);
- if (err)
- bpf_map_op__delete(op);
- return err;
+ return 0;
}
static int
@@ -899,13 +915,75 @@ bpf_map__config_value(struct bpf_map *map,
}
if (term->type_val != PARSE_EVENTS__TERM_TYPE_NUM) {
- pr_debug("ERROR: wrong value type\n");
+ pr_debug("ERROR: wrong value type for 'value'\n");
return -BPF_LOADER_ERRNO__OBJCONF_MAP_VALUE;
}
return __bpf_map__config_value(map, term);
}
+static int
+__bpf_map__config_event(struct bpf_map *map,
+ struct parse_events_term *term,
+ struct perf_evlist *evlist)
+{
+ struct perf_evsel *evsel;
+ struct bpf_map_def def;
+ struct bpf_map_op *op;
+ const char *map_name;
+ int err;
+
+ map_name = bpf_map__get_name(map);
+ evsel = perf_evlist__find_evsel_by_str(evlist, term->val.str);
+ if (!evsel) {
+ pr_debug("Event (for '%s') '%s' doesn't exist\n",
+ map_name, term->val.str);
+ return -BPF_LOADER_ERRNO__OBJCONF_MAP_NOEVT;
+ }
+
+ err = bpf_map__get_def(map, &def);
+ if (err) {
+ pr_debug("Unable to get map definition from '%s'\n",
+ map_name);
+ return err;
+ }
+
+ /*
+ * No need to check key_size and value_size:
+ * kernel has already checked them.
+ */
+ if (def.type != BPF_MAP_TYPE_PERF_EVENT_ARRAY) {
+ pr_debug("Map %s type is not BPF_MAP_TYPE_PERF_EVENT_ARRAY\n",
+ map_name);
+ return -BPF_LOADER_ERRNO__OBJCONF_MAP_TYPE;
+ }
+
+ op = bpf_map__add_newop(map);
+ if (IS_ERR(op))
+ return PTR_ERR(op);
+ op->op_type = BPF_MAP_OP_SET_EVSEL;
+ op->v.evsel = evsel;
+ return 0;
+}
+
+static int
+bpf_map__config_event(struct bpf_map *map,
+ struct parse_events_term *term,
+ struct perf_evlist *evlist)
+{
+ if (!term->err_val) {
+ pr_debug("Config value not set\n");
+ return -BPF_LOADER_ERRNO__OBJCONF_CONF;
+ }
+
+ if (term->type_val != PARSE_EVENTS__TERM_TYPE_STR) {
+ pr_debug("ERROR: wrong value type for 'event'\n");
+ return -BPF_LOADER_ERRNO__OBJCONF_MAP_VALUE;
+ }
+
+ return __bpf_map__config_event(map, term, evlist);
+}
+
struct bpf_obj_config__map_func {
const char *config_opt;
int (*config_func)(struct bpf_map *, struct parse_events_term *,
@@ -914,6 +992,7 @@ struct bpf_obj_config__map_func {
struct bpf_obj_config__map_func bpf_obj_config__map_funcs[] = {
{"value", bpf_map__config_value},
+ {"event", bpf_map__config_event},
};
static int
@@ -1057,6 +1136,7 @@ bpf_map_config_foreach_key(struct bpf_map *map,
list_for_each_entry(op, &priv->ops_list, list) {
switch (def.type) {
case BPF_MAP_TYPE_ARRAY:
+ case BPF_MAP_TYPE_PERF_EVENT_ARRAY:
switch (op->key_type) {
case BPF_MAP_KEY_ALL:
err = foreach_key_array_all(func, arg, name,
@@ -1115,6 +1195,60 @@ apply_config_value_for_key(int map_fd, void *pkey,
}
static int
+apply_config_evsel_for_key(const char *name, int map_fd, void *pkey,
+ struct perf_evsel *evsel)
+{
+ struct xyarray *xy = evsel->fd;
+ struct perf_event_attr *attr;
+ unsigned int key, events;
+ bool check_pass = false;
+ int *evt_fd;
+ int err;
+
+ if (!xy) {
+ pr_debug("ERROR: evsel not ready for map %s\n", name);
+ return -BPF_LOADER_ERRNO__INTERNAL;
+ }
+
+ if (xy->row_size / xy->entry_size != 1) {
+ pr_debug("ERROR: Dimension of target event is incorrect for map %s\n",
+ name);
+ return -BPF_LOADER_ERRNO__OBJCONF_MAP_EVTDIM;
+ }
+
+ attr = &evsel->attr;
+ if (attr->inherit) {
+ pr_debug("ERROR: Can't put inherit event into map %s\n", name);
+ return -BPF_LOADER_ERRNO__OBJCONF_MAP_EVTINH;
+ }
+
+ if (attr->type == PERF_TYPE_RAW)
+ check_pass = true;
+ if (attr->type == PERF_TYPE_HARDWARE)
+ check_pass = true;
+ if (attr->type == PERF_TYPE_SOFTWARE &&
+ attr->config == PERF_COUNT_SW_BPF_OUTPUT)
+ check_pass = true;
+ if (!check_pass) {
+ pr_debug("ERROR: Event type is wrong for map %s\n", name);
+ return -BPF_LOADER_ERRNO__OBJCONF_MAP_EVTTYPE;
+ }
+
+ events = xy->entries / (xy->row_size / xy->entry_size);
+ key = *((unsigned int *)pkey);
+ if (key >= events) {
+ pr_debug("ERROR: there is no event %d for map %s\n",
+ key, name);
+ return -BPF_LOADER_ERRNO__OBJCONF_MAP_MAPSIZE;
+ }
+ evt_fd = xyarray__entry(xy, key, 0);
+ err = bpf_map_update_elem(map_fd, pkey, evt_fd, BPF_ANY);
+ if (err && errno)
+ err = -errno;
+ return err;
+}
+
+static int
apply_obj_config_map_for_key(const char *name, int map_fd,
struct bpf_map_def *pdef __maybe_unused,
struct bpf_map_op *op,
@@ -1128,6 +1262,10 @@ apply_obj_config_map_for_key(const char *name, int map_fd,
pdef->value_size,
op->v.value);
break;
+ case BPF_MAP_OP_SET_EVSEL:
+ err = apply_config_evsel_for_key(name, map_fd, pkey,
+ op->v.evsel);
+ break;
default:
pr_debug("ERROR: unknown value type for '%s'\n", name);
err = -BPF_LOADER_ERRNO__INTERNAL;
@@ -1193,6 +1331,11 @@ static const char *bpf_loader_strerror_table[NR_ERRNO] = {
[ERRCODE_OFFSET(OBJCONF_MAP_TYPE)] = "Incorrect map type",
[ERRCODE_OFFSET(OBJCONF_MAP_KEYSIZE)] = "Incorrect map key size",
[ERRCODE_OFFSET(OBJCONF_MAP_VALUESIZE)] = "Incorrect map value size",
+ [ERRCODE_OFFSET(OBJCONF_MAP_NOEVT)] = "Event not found for map setting",
+ [ERRCODE_OFFSET(OBJCONF_MAP_MAPSIZE)] = "Invalid map size for event setting",
+ [ERRCODE_OFFSET(OBJCONF_MAP_EVTDIM)] = "Event dimension too large",
+ [ERRCODE_OFFSET(OBJCONF_MAP_EVTINH)] = "Doesn't support inherit event",
+ [ERRCODE_OFFSET(OBJCONF_MAP_EVTTYPE)] = "Wrong event type for map",
};
static int
@@ -1329,6 +1472,12 @@ int bpf__strerror_config_obj(struct bpf_object *obj __maybe_unused,
int bpf__strerror_apply_obj_config(int err, char *buf, size_t size)
{
bpf__strerror_head(err, buf, size);
+ bpf__strerror_entry(BPF_LOADER_ERRNO__OBJCONF_MAP_EVTDIM,
+ "Cannot set event to BPF map in multi-thread tracing");
+ bpf__strerror_entry(BPF_LOADER_ERRNO__OBJCONF_MAP_EVTINH,
+ "%s (Hint: use -i to turn off inherit)", emsg);
+ bpf__strerror_entry(BPF_LOADER_ERRNO__OBJCONF_MAP_EVTTYPE,
+ "Can only put raw, hardware and BPF output event into a BPF map");
bpf__strerror_end(buf, size);
return 0;
}
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index 5d3b931..7c7689f 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -33,6 +33,11 @@ enum bpf_loader_errno {
BPF_LOADER_ERRNO__OBJCONF_MAP_TYPE, /* Incorrect map type */
BPF_LOADER_ERRNO__OBJCONF_MAP_KEYSIZE, /* Incorrect map key size */
BPF_LOADER_ERRNO__OBJCONF_MAP_VALUESIZE,/* Incorrect map value size */
+ BPF_LOADER_ERRNO__OBJCONF_MAP_NOEVT, /* Event not found for map setting */
+ BPF_LOADER_ERRNO__OBJCONF_MAP_MAPSIZE, /* Invalid map size for event setting */
+ BPF_LOADER_ERRNO__OBJCONF_MAP_EVTDIM, /* Event dimension too large */
+ BPF_LOADER_ERRNO__OBJCONF_MAP_EVTINH, /* Doesn't support inherit event */
+ BPF_LOADER_ERRNO__OBJCONF_MAP_EVTTYPE, /* Wrong event type for map */
__BPF_LOADER_ERRNO__END,
};
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 0f57716..c42e196 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1741,3 +1741,19 @@ void perf_evlist__set_tracking_event(struct perf_evlist *evlist,
tracking_evsel->tracking = true;
}
+
+struct perf_evsel *
+perf_evlist__find_evsel_by_str(struct perf_evlist *evlist,
+ const char *str)
+{
+ struct perf_evsel *evsel;
+
+ evlist__for_each(evlist, evsel) {
+ if (!evsel->name)
+ continue;
+ if (strcmp(str, evsel->name) == 0)
+ return evsel;
+ }
+
+ return NULL;
+}
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 7c4d9a2..a0d1522 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -294,4 +294,7 @@ void perf_evlist__set_tracking_event(struct perf_evlist *evlist,
struct perf_evsel *tracking_evsel);
void perf_event_attr__set_max_precise_ip(struct perf_event_attr *attr);
+
+struct perf_evsel *
+perf_evlist__find_evsel_by_str(struct perf_evlist *evlist, const char *str);
#endif /* __PERF_EVLIST_H */
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index a5dd670..5909fd2 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -697,14 +697,16 @@ parse_events_config_bpf(struct parse_events_evlist *data,
return -EINVAL;
}
- err = bpf__config_obj(obj, term, NULL, &error_pos);
+ err = bpf__config_obj(obj, term, data->evlist, &error_pos);
if (err) {
- bpf__strerror_config_obj(obj, term, NULL,
+ bpf__strerror_config_obj(obj, term, data->evlist,
&error_pos, err, errbuf,
sizeof(errbuf));
data->error->help = strdup(
-"Hint:\tValid config term:\n"
+"Hint:\tValid config terms:\n"
" \tmap:[<arraymap>].value=[value]\n"
+" \tmap:[<eventmap>].event=[event]\n"
+"\n"
" \t(add -v to see detail)");
data->error->str = strdup(errbuf);
if (err == -BPF_LOADER_ERRNO__OBJCONF_MAP_VALUE)
@@ -1530,9 +1532,10 @@ int parse_events(struct perf_evlist *evlist, const char *str,
struct parse_events_error *err)
{
struct parse_events_evlist data = {
- .list = LIST_HEAD_INIT(data.list),
- .idx = evlist->nr_entries,
- .error = err,
+ .list = LIST_HEAD_INIT(data.list),
+ .idx = evlist->nr_entries,
+ .error = err,
+ .evlist = evlist,
};
int ret;
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index c48377a..e036969 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -99,6 +99,7 @@ struct parse_events_evlist {
int idx;
int nr_groups;
struct parse_events_error *error;
+ struct perf_evlist *evlist;
};
struct parse_events_terms {
--
1.8.3.4
* [PATCH 07/48] perf tools: Support setting different slots in a BPF map separately
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (5 preceding siblings ...)
2016-02-22 9:10 ` [PATCH 06/48] perf tools: Enable passing event to BPF object Wang Nan
@ 2016-02-22 9:10 ` Wang Nan
2016-02-25 5:40 ` [tip:perf/core] " tip-bot for Wang Nan
2016-02-22 9:10 ` [PATCH 08/48] perf tools: Enable indices setting syntax for BPF map Wang Nan
` (40 subsequent siblings)
47 siblings, 1 reply; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:10 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
This patch introduces basic facilities to support configuring different
slots in a BPF map one by one.
array.nr_ranges and array.ranges are introduced into 'struct
parse_events_term', where ranges is an array of index ranges (start,
length) which will be configured by this config term, and nr_ranges
is the size of that array. The array is passed to 'struct bpf_map_priv'.
To indicate the new type of configuration, BPF_MAP_KEY_RANGES is
added as a new key type. bpf_map_config_foreach_key() is extended to
iterate over those indices instead of all possible keys.
The code in this commit will be enabled by the following commit, which
enables the indices syntax for array configuration.
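The (start, length) representation described above can be sketched in isolation. This is only an illustration, not the code in this patch: struct index_range, ranges_fit(), for_each_index() and count_index() are hypothetical names, and the bounds check is written so that start + length cannot wrap around.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry of parse_events_array.ranges. */
struct index_range {
	unsigned int start;
	size_t length;
};

/* Return 0 if every range fits in a map with max_entries slots, -1 otherwise. */
static int ranges_fit(const struct index_range *r, size_t nr,
		      unsigned int max_entries)
{
	size_t i;

	assert(r != NULL);
	for (i = 0; i < nr; i++) {
		/* Reject empty ranges and ranges that run past the last slot. */
		if (r[i].length == 0 || r[i].start >= max_entries ||
		    r[i].length > (size_t)(max_entries - r[i].start))
			return -1;
	}
	return 0;
}

typedef int (*visit_fn)(unsigned int idx, void *arg);

/* Call fn once for each index covered by the ranges; stop on first error. */
static int for_each_index(const struct index_range *r, size_t nr,
			  visit_fn fn, void *arg)
{
	size_t i, j;

	assert(r != NULL && fn != NULL);
	for (i = 0; i < nr; i++) {
		for (j = 0; j < r[i].length; j++) {
			int err = fn(r[i].start + (unsigned int)j, arg);

			if (err)
				return err;
		}
	}
	return 0;
}

/* Example visitor: counts how many indices were visited. */
static int count_index(unsigned int idx, void *arg)
{
	(void)idx;
	++*(unsigned int *)arg;
	return 0;
}
```

With the two ranges {0, 1} and {3, 3}, which correspond to the index list [0,3...5], for_each_index() visits indices 0, 3, 4 and 5.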
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
tools/perf/util/bpf-loader.c | 128 ++++++++++++++++++++++++++++++++++++++---
tools/perf/util/bpf-loader.h | 1 +
tools/perf/util/parse-events.c | 7 +++
tools/perf/util/parse-events.h | 10 ++++
4 files changed, 137 insertions(+), 9 deletions(-)
diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index deacb95..44824e3 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -17,6 +17,7 @@
#include "llvm-utils.h"
#include "probe-event.h"
#include "probe-finder.h" // for MAX_PROBES
+#include "parse-events.h"
#include "llvm-utils.h"
#define DEFINE_PRINT_FN(name, level) \
@@ -747,6 +748,7 @@ enum bpf_map_op_type {
enum bpf_map_key_type {
BPF_MAP_KEY_ALL,
+ BPF_MAP_KEY_RANGES,
};
struct bpf_map_op {
@@ -754,6 +756,9 @@ struct bpf_map_op {
enum bpf_map_op_type op_type;
enum bpf_map_key_type key_type;
union {
+ struct parse_events_array array;
+ } k;
+ union {
u64 value;
struct perf_evsel *evsel;
} v;
@@ -768,6 +773,8 @@ bpf_map_op__delete(struct bpf_map_op *op)
{
if (!list_empty(&op->list))
list_del(&op->list);
+ if (op->key_type == BPF_MAP_KEY_RANGES)
+ parse_events__clear_array(&op->k.array);
free(op);
}
@@ -792,10 +799,33 @@ bpf_map_priv__clear(struct bpf_map *map __maybe_unused,
free(priv);
}
+static int
+bpf_map_op_setkey(struct bpf_map_op *op, struct parse_events_term *term)
+{
+ op->key_type = BPF_MAP_KEY_ALL;
+ if (!term)
+ return 0;
+
+ if (term->array.nr_ranges) {
+ size_t memsz = term->array.nr_ranges *
+ sizeof(op->k.array.ranges[0]);
+
+ op->k.array.ranges = memdup(term->array.ranges, memsz);
+ if (!op->k.array.ranges) {
+ pr_debug("Not enough memory to alloc indices for map\n");
+ return -ENOMEM;
+ }
+ op->key_type = BPF_MAP_KEY_RANGES;
+ op->k.array.nr_ranges = term->array.nr_ranges;
+ }
+ return 0;
+}
+
static struct bpf_map_op *
-bpf_map_op__new(void)
+bpf_map_op__new(struct parse_events_term *term)
{
struct bpf_map_op *op;
+ int err;
op = zalloc(sizeof(*op));
if (!op) {
@@ -804,7 +834,11 @@ bpf_map_op__new(void)
}
INIT_LIST_HEAD(&op->list);
- op->key_type = BPF_MAP_KEY_ALL;
+ err = bpf_map_op_setkey(op, term);
+ if (err) {
+ free(op);
+ return ERR_PTR(err);
+ }
return op;
}
@@ -841,12 +875,12 @@ bpf_map__add_op(struct bpf_map *map, struct bpf_map_op *op)
}
static struct bpf_map_op *
-bpf_map__add_newop(struct bpf_map *map)
+bpf_map__add_newop(struct bpf_map *map, struct parse_events_term *term)
{
struct bpf_map_op *op;
int err;
- op = bpf_map_op__new();
+ op = bpf_map_op__new(term);
if (IS_ERR(op))
return op;
@@ -896,7 +930,7 @@ __bpf_map__config_value(struct bpf_map *map,
return -BPF_LOADER_ERRNO__OBJCONF_MAP_VALUESIZE;
}
- op = bpf_map__add_newop(map);
+ op = bpf_map__add_newop(map, term);
if (IS_ERR(op))
return PTR_ERR(op);
op->op_type = BPF_MAP_OP_SET_VALUE;
@@ -958,7 +992,7 @@ __bpf_map__config_event(struct bpf_map *map,
return -BPF_LOADER_ERRNO__OBJCONF_MAP_TYPE;
}
- op = bpf_map__add_newop(map);
+ op = bpf_map__add_newop(map, term);
if (IS_ERR(op))
return PTR_ERR(op);
op->op_type = BPF_MAP_OP_SET_EVSEL;
@@ -996,6 +1030,44 @@ struct bpf_obj_config__map_func bpf_obj_config__map_funcs[] = {
};
static int
+config_map_indices_range_check(struct parse_events_term *term,
+ struct bpf_map *map,
+ const char *map_name)
+{
+ struct parse_events_array *array = &term->array;
+ struct bpf_map_def def;
+ unsigned int i;
+ int err;
+
+ if (!array->nr_ranges)
+ return 0;
+ if (!array->ranges) {
+ pr_debug("ERROR: map %s: array->nr_ranges is %d but range array is NULL\n",
+ map_name, (int)array->nr_ranges);
+ return -BPF_LOADER_ERRNO__INTERNAL;
+ }
+
+ err = bpf_map__get_def(map, &def);
+ if (err) {
+ pr_debug("ERROR: Unable to get map definition from '%s'\n",
+ map_name);
+ return -BPF_LOADER_ERRNO__INTERNAL;
+ }
+
+ for (i = 0; i < array->nr_ranges; i++) {
+ unsigned int start = array->ranges[i].start;
+ size_t length = array->ranges[i].length;
+ unsigned int idx = start + length - 1;
+
+ if (idx >= def.max_entries) {
+ pr_debug("ERROR: index %d too large\n", idx);
+ return -BPF_LOADER_ERRNO__OBJCONF_MAP_IDX2BIG;
+ }
+ }
+ return 0;
+}
+
+static int
bpf__obj_config_map(struct bpf_object *obj,
struct parse_events_term *term,
struct perf_evlist *evlist,
@@ -1030,7 +1102,12 @@ bpf__obj_config_map(struct bpf_object *obj,
goto out;
}
- *key_scan_pos += map_opt - map_name;
+ *key_scan_pos += strlen(map_opt);
+ err = config_map_indices_range_check(term, map, map_name);
+ if (err)
+ goto out;
+ *key_scan_pos -= strlen(map_opt);
+
for (i = 0; i < ARRAY_SIZE(bpf_obj_config__map_funcs); i++) {
struct bpf_obj_config__map_func *func =
&bpf_obj_config__map_funcs[i];
@@ -1100,6 +1177,33 @@ foreach_key_array_all(map_config_func_t func,
}
static int
+foreach_key_array_ranges(map_config_func_t func, void *arg,
+ const char *name, int map_fd,
+ struct bpf_map_def *pdef,
+ struct bpf_map_op *op)
+{
+ unsigned int i, j;
+ int err;
+
+ for (i = 0; i < op->k.array.nr_ranges; i++) {
+ unsigned int start = op->k.array.ranges[i].start;
+ size_t length = op->k.array.ranges[i].length;
+
+ for (j = 0; j < length; j++) {
+ unsigned int idx = start + j;
+
+ err = func(name, map_fd, pdef, op, &idx, arg);
+ if (err) {
+ pr_debug("ERROR: failed to insert value to %s[%u]\n",
+ name, idx);
+ return err;
+ }
+ }
+ }
+ return 0;
+}
+
+static int
bpf_map_config_foreach_key(struct bpf_map *map,
map_config_func_t func,
void *arg)
@@ -1141,14 +1245,19 @@ bpf_map_config_foreach_key(struct bpf_map *map,
case BPF_MAP_KEY_ALL:
err = foreach_key_array_all(func, arg, name,
map_fd, &def, op);
- if (err)
- return err;
+ break;
+ case BPF_MAP_KEY_RANGES:
+ err = foreach_key_array_ranges(func, arg, name,
+ map_fd, &def,
+ op);
break;
default:
pr_debug("ERROR: keytype for map '%s' invalid\n",
name);
return -BPF_LOADER_ERRNO__INTERNAL;
}
+ if (err)
+ return err;
break;
default:
pr_debug("ERROR: type of '%s' incorrect\n", name);
@@ -1336,6 +1445,7 @@ static const char *bpf_loader_strerror_table[NR_ERRNO] = {
[ERRCODE_OFFSET(OBJCONF_MAP_EVTDIM)] = "Event dimension too large",
[ERRCODE_OFFSET(OBJCONF_MAP_EVTINH)] = "Doesn't support inherit event",
[ERRCODE_OFFSET(OBJCONF_MAP_EVTTYPE)] = "Wrong event type for map",
+ [ERRCODE_OFFSET(OBJCONF_MAP_IDX2BIG)] = "Index too large",
};
static int
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index 7c7689f..be43119 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -38,6 +38,7 @@ enum bpf_loader_errno {
BPF_LOADER_ERRNO__OBJCONF_MAP_EVTDIM, /* Event dimension too large */
BPF_LOADER_ERRNO__OBJCONF_MAP_EVTINH, /* Doesn't support inherit event */
BPF_LOADER_ERRNO__OBJCONF_MAP_EVTTYPE, /* Wrong event type for map */
+ BPF_LOADER_ERRNO__OBJCONF_MAP_IDX2BIG, /* Index too large */
__BPF_LOADER_ERRNO__END,
};
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 5909fd2..697d350 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -2211,6 +2211,8 @@ void parse_events_terms__purge(struct list_head *terms)
struct parse_events_term *term, *h;
list_for_each_entry_safe(term, h, terms, list) {
+ if (term->array.nr_ranges)
+ free(term->array.ranges);
list_del_init(&term->list);
free(term);
}
@@ -2224,6 +2226,11 @@ void parse_events_terms__delete(struct list_head *terms)
free(terms);
}
+void parse_events__clear_array(struct parse_events_array *a)
+{
+ free(a->ranges);
+}
+
void parse_events_evlist_error(struct parse_events_evlist *data,
int idx, const char *str)
{
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index e036969..e445622 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -72,8 +72,17 @@ enum {
__PARSE_EVENTS__TERM_TYPE_NR,
};
+struct parse_events_array {
+ size_t nr_ranges;
+ struct {
+ unsigned int start;
+ size_t length;
+ } *ranges;
+};
+
struct parse_events_term {
char *config;
+ struct parse_events_array array;
union {
char *str;
u64 num;
@@ -120,6 +129,7 @@ int parse_events_term__clone(struct parse_events_term **new,
struct parse_events_term *term);
void parse_events_terms__delete(struct list_head *terms);
void parse_events_terms__purge(struct list_head *terms);
+void parse_events__clear_array(struct parse_events_array *a);
int parse_events__modifier_event(struct list_head *list, char *str, bool add);
int parse_events__modifier_group(struct list_head *list, char *event_mod);
int parse_events_name(struct list_head *list, char *name);
--
1.8.3.4
* [PATCH 08/48] perf tools: Enable indices setting syntax for BPF map
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (6 preceding siblings ...)
2016-02-22 9:10 ` [PATCH 07/48] perf tools: Support setting different slots in a BPF map separately Wang Nan
@ 2016-02-22 9:10 ` Wang Nan
2016-02-25 5:40 ` [tip:perf/core] " tip-bot for Wang Nan
2016-02-22 9:10 ` [PATCH 09/48] perf tools: Pass tracepoint options to BPF script Wang Nan
` (39 subsequent siblings)
47 siblings, 1 reply; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:10 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
This patch introduces a new syntax to the perf event parser:
# perf record -e './test_bpf_map_3.c/map:channel.value[0,1,2,3...5]=101/' usleep 2
By utilizing the basic facilities in bpf-loader.c which allow setting
different slots in a BPF map separately, the newly introduced syntax
allows perf to control specific elements in a BPF map.
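As a rough illustration of what the index list means, the following sketch turns the text between '[' and ']' into (start, length) pairs. It is a hand-written stand-in, not the flex/bison implementation added by this patch; parse_indices() and struct index_range are hypothetical names, and the special form [all] is not handled here.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical (start, length) pair, mirroring the ranges in the patch. */
struct index_range {
	unsigned int start;
	size_t length;
};

/* Parse one decimal or 0x-prefixed hex number; return -1 on error. */
static int parse_num(const char **ps, unsigned long *val)
{
	const char *s = *ps;
	char *end;
	int base = (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) ? 16 : 10;

	if (!isdigit((unsigned char)*s))
		return -1;
	*val = strtoul(s, &end, base);
	if (end == s || *val > UINT_MAX)
		return -1;
	*ps = end;
	return 0;
}

/*
 * Parse a list such as "0,1,2,3...5" into out[].
 * Returns the number of ranges written, or -1 on a syntax error,
 * a reversed range, or when more than max_out ranges are needed.
 */
static int parse_indices(const char *s, struct index_range *out, size_t max_out)
{
	size_t n = 0;

	assert(s != NULL && out != NULL);
	while (*s) {
		unsigned long first, last;

		if (n == max_out || parse_num(&s, &first))
			return -1;
		last = first;
		if (strncmp(s, "...", 3) == 0) {
			s += 3;
			if (parse_num(&s, &last) || last < first)
				return -1;
		}
		out[n].start = (unsigned int)first;
		out[n].length = (size_t)(last - first + 1);
		n++;
		if (*s == ',') {
			s++;
			if (!*s)	/* trailing comma */
				return -1;
		} else if (*s) {
			return -1;	/* unexpected character */
		}
	}
	return n ? (int)n : -1;
}
```

For "0,1,2,3...5" this yields four ranges, the last one being start 3 with length 3. A reversed range such as "5...3" is rejected, which matches the ABORT_ON($3 < $1) check in the grammar below.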
Test result:
# cat ./test_bpf_map_3.c
/************************ BEGIN **************************/
#include <uapi/linux/bpf.h>
#define SEC(NAME) __attribute__((section(NAME), used))
struct bpf_map_def {
unsigned int type;
unsigned int key_size;
unsigned int value_size;
unsigned int max_entries;
};
static void *(*map_lookup_elem)(struct bpf_map_def *, void *) =
(void *)BPF_FUNC_map_lookup_elem;
static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
(void *)BPF_FUNC_trace_printk;
struct bpf_map_def SEC("maps") channel = {
.type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(unsigned char),
.max_entries = 100,
};
SEC("func=hrtimer_nanosleep rqtp->tv_nsec")
int func(void *ctx, int err, long nsec)
{
char fmt[] = "%ld\n";
long usec = nsec * 0x10624dd3 >> 38; // nsec / 1000
int key = (int)usec;
unsigned char *pval = map_lookup_elem(&channel, &key);
if (!pval)
return 0;
trace_printk(fmt, sizeof(fmt), (unsigned char)*pval);
return 0;
}
char _license[] SEC("license") = "GPL";
int _version SEC("version") = LINUX_VERSION_CODE;
/************************* END ***************************/
Normal case:
# echo "" > /sys/kernel/debug/tracing/trace
# ./perf record -e './test_bpf_map_3.c/map:channel.value[0,1,2,3...5]=101/' usleep 2
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.012 MB perf.data ]
# cat /sys/kernel/debug/tracing/trace | grep usleep
usleep-405 [004] d... 2745423.547822: : 101
# ./perf record -e './test_bpf_map_3.c/map:channel.value[0...9,20...29]=102,map:channel.value[10...19]=103/' usleep 3
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.012 MB perf.data ]
# ./perf record -e './test_bpf_map_3.c/map:channel.value[0...9,20...29]=102,map:channel.value[10...19]=103/' usleep 15
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.012 MB perf.data ]
# cat /sys/kernel/debug/tracing/trace | grep usleep
usleep-405 [004] d... 2745423.547822: : 101
usleep-655 [006] d... 2745434.122814: : 102
usleep-904 [006] d... 2745439.916264: : 103
# ./perf record -e './test_bpf_map_3.c/map:channel.value[all]=104/' usleep 99
# cat /sys/kernel/debug/tracing/trace | grep usleep
usleep-405 [004] d... 2745423.547822: : 101
usleep-655 [006] d... 2745434.122814: : 102
usleep-904 [006] d... 2745439.916264: : 103
usleep-1537 [003] d... 2745538.053737: : 104
Error case:
# ./perf record -e './test_bpf_map_3.c/map:channel.value[10...1000]=104/' usleep 99
event syntax error: '..annel.value[10...1000]=104/'
\___ Index too large
Hint: Valid config terms:
map:[<arraymap>].value<indices>=[value]
map:[<eventmap>].event<indices>=[event]
where <indices> is something like [0,3...5] or [all]
(add -v to see detail)
Run 'perf list' for a list of valid events
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-e, --event <event> event selector. use 'perf list' to list available events
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
tools/perf/util/parse-events.c | 5 ++-
tools/perf/util/parse-events.l | 13 ++++++-
tools/perf/util/parse-events.y | 85 ++++++++++++++++++++++++++++++++++++++++++
3 files changed, 100 insertions(+), 3 deletions(-)
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 697d350..6e2f203 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -704,9 +704,10 @@ parse_events_config_bpf(struct parse_events_evlist *data,
sizeof(errbuf));
data->error->help = strdup(
"Hint:\tValid config terms:\n"
-" \tmap:[<arraymap>].value=[value]\n"
-" \tmap:[<eventmap>].event=[event]\n"
+" \tmap:[<arraymap>].value<indices>=[value]\n"
+" \tmap:[<eventmap>].event<indices>=[event]\n"
"\n"
+" \twhere <indices> is something like [0,3...5] or [all]\n"
" \t(add -v to see detail)");
data->error->str = strdup(errbuf);
if (err == -BPF_LOADER_ERRNO__OBJCONF_MAP_VALUE)
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 0cc6b84..fb85d03 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -9,8 +9,8 @@
%{
#include <errno.h>
#include "../perf.h"
-#include "parse-events-bison.h"
#include "parse-events.h"
+#include "parse-events-bison.h"
char *parse_events_get_text(yyscan_t yyscanner);
YYSTYPE *parse_events_get_lval(yyscan_t yyscanner);
@@ -111,6 +111,7 @@ do { \
%x mem
%s config
%x event
+%x array
group [^,{}/]*[{][^}]*[}][^,{}/]*
event_pmu [^,{}/]+[/][^/]*[/][^,{}/]*
@@ -176,6 +177,14 @@ modifier_bp [rwx]{1,3}
}
+<array>{
+"]" { BEGIN(config); return ']'; }
+{num_dec} { return value(yyscanner, 10); }
+{num_hex} { return value(yyscanner, 16); }
+, { return ','; }
+"\.\.\." { return PE_ARRAY_RANGE; }
+}
+
<config>{
/*
* Please update config_term_names when new static term is added.
@@ -195,6 +204,8 @@ no-inherit { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_NOINHERIT); }
, { return ','; }
"/" { BEGIN(INITIAL); return '/'; }
{name_minus} { return str(yyscanner, PE_NAME); }
+\[all\] { return PE_ARRAY_ALL; }
+"[" { BEGIN(array); return '['; }
}
<mem>{
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index 0e2d433..d1fbcab 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -48,6 +48,7 @@ static inc_group_count(struct list_head *list,
%token PE_PREFIX_MEM PE_PREFIX_RAW PE_PREFIX_GROUP
%token PE_ERROR
%token PE_PMU_EVENT_PRE PE_PMU_EVENT_SUF PE_KERNEL_PMU_EVENT
+%token PE_ARRAY_ALL PE_ARRAY_RANGE
%type <num> PE_VALUE
%type <num> PE_VALUE_SYM_HW
%type <num> PE_VALUE_SYM_SW
@@ -83,6 +84,9 @@ static inc_group_count(struct list_head *list,
%type <head> group_def
%type <head> group
%type <head> groups
+%type <array> array
+%type <array> array_term
+%type <array> array_terms
%union
{
@@ -94,6 +98,7 @@ static inc_group_count(struct list_head *list,
char *sys;
char *event;
} tracepoint_name;
+ struct parse_events_array array;
}
%%
@@ -572,6 +577,86 @@ PE_TERM
ABORT_ON(parse_events_term__num(&term, (int)$1, NULL, 1, &@1, NULL));
$$ = term;
}
+|
+PE_NAME array '=' PE_NAME
+{
+ struct parse_events_term *term;
+ int i;
+
+ ABORT_ON(parse_events_term__str(&term, PARSE_EVENTS__TERM_TYPE_USER,
+ $1, $4, &@1, &@4));
+
+ term->array = $2;
+ $$ = term;
+}
+|
+PE_NAME array '=' PE_VALUE
+{
+ struct parse_events_term *term;
+
+ ABORT_ON(parse_events_term__num(&term, PARSE_EVENTS__TERM_TYPE_USER,
+ $1, $4, &@1, &@4));
+ term->array = $2;
+ $$ = term;
+}
+
+array:
+'[' array_terms ']'
+{
+ $$ = $2;
+}
+|
+PE_ARRAY_ALL
+{
+ $$.nr_ranges = 0;
+ $$.ranges = NULL;
+}
+
+array_terms:
+array_terms ',' array_term
+{
+ struct parse_events_array new_array;
+
+ new_array.nr_ranges = $1.nr_ranges + $3.nr_ranges;
+ new_array.ranges = malloc(sizeof(new_array.ranges[0]) *
+ new_array.nr_ranges);
+ ABORT_ON(!new_array.ranges);
+ memcpy(&new_array.ranges[0], $1.ranges,
+ $1.nr_ranges * sizeof(new_array.ranges[0]));
+ memcpy(&new_array.ranges[$1.nr_ranges], $3.ranges,
+ $3.nr_ranges * sizeof(new_array.ranges[0]));
+ free($1.ranges);
+ free($3.ranges);
+ $$ = new_array;
+}
+|
+array_term
+
+array_term:
+PE_VALUE
+{
+ struct parse_events_array array;
+
+ array.nr_ranges = 1;
+ array.ranges = malloc(sizeof(array.ranges[0]));
+ ABORT_ON(!array.ranges);
+ array.ranges[0].start = $1;
+ array.ranges[0].length = 1;
+ $$ = array;
+}
+|
+PE_VALUE PE_ARRAY_RANGE PE_VALUE
+{
+ struct parse_events_array array;
+
+ ABORT_ON($3 < $1);
+ array.nr_ranges = 1;
+ array.ranges = malloc(sizeof(array.ranges[0]));
+ ABORT_ON(!array.ranges);
+ array.ranges[0].start = $1;
+ array.ranges[0].length = $3 - $1 + 1;
+ $$ = array;
+}
sep_dc: ':' |
--
1.8.3.4
* [PATCH 09/48] perf tools: Pass tracepoint options to BPF script
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (7 preceding siblings ...)
2016-02-22 9:10 ` [PATCH 08/48] perf tools: Enable indices setting syntax for BPF map Wang Nan
@ 2016-02-22 9:10 ` Wang Nan
2016-02-25 5:41 ` [tip:perf/core] perf tools: Apply tracepoint event definition " tip-bot for Wang Nan
2016-02-22 9:10 ` [PATCH 10/48] perf tools: Introduce bpf-output event Wang Nan
` (38 subsequent siblings)
47 siblings, 1 reply; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:10 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
Users can pass options to tracepoints defined in the BPF script.
For example:
# perf record -e ./test.c/no-inherit/ bash
# dd if=/dev/zero of=/dev/null count=10000
# exit
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.022 MB perf.data (139 samples) ]
(no-inherit works: only the sys_read calls issued by bash are captured,
and the 10000 or more sys_read calls issued by dd are skipped.)
test.c:
#define SEC(NAME) __attribute__((section(NAME), used))
SEC("func=sys_read")
int bpf_func__sys_read(void *ctx)
{
return 1;
}
char _license[] SEC("license") = "GPL";
int _version SEC("version") = LINUX_VERSION_CODE;
no-inherit is applied to the kprobe event defined in test.c.
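To apply event options while still configuring the BPF object, the patch separates the two kinds of config terms before use. The idea can be sketched with a plain array instead of a kernel-style list. Everything here is a simplified assumption: struct term, its hardcoded flag and split_terms() are hypothetical, and the flag stands in for what parse_events__is_hardcoded_term() decides in the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified config term. */
struct term {
	const char *text;
	bool hardcoded;	/* true for event options such as no-inherit */
};

/*
 * Keep event terms (hardcoded) at the front of terms[], in order, and
 * copy the remaining object terms to obj_terms[], in order.
 * obj_terms must have room for nr entries.
 * Returns the number of event terms kept; nr minus that value is the
 * number of object terms moved.
 */
static size_t split_terms(struct term *terms, size_t nr,
			  struct term *obj_terms)
{
	size_t i, kept = 0, moved = 0;

	assert(terms != NULL && obj_terms != NULL);
	for (i = 0; i < nr; i++) {
		if (terms[i].hardcoded)
			terms[kept++] = terms[i];
		else
			obj_terms[moved++] = terms[i];
	}
	return kept;
}
```

For the terms call-graph=fp, map:array.value[0]=1 and no-inherit, the first and last stay with the event and the middle one goes to the object list. The real code moves list nodes rather than copying, and splices the object terms back onto the original list before returning.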
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
tools/perf/tests/bpf.c | 2 +-
tools/perf/util/parse-events.c | 56 +++++++++++++++++++++++++++++++++++++-----
tools/perf/util/parse-events.h | 3 ++-
3 files changed, 53 insertions(+), 8 deletions(-)
diff --git a/tools/perf/tests/bpf.c b/tools/perf/tests/bpf.c
index 4aed5cb..199501c 100644
--- a/tools/perf/tests/bpf.c
+++ b/tools/perf/tests/bpf.c
@@ -112,7 +112,7 @@ static int do_test(struct bpf_object *obj, int (*func)(void),
parse_evlist.error = &parse_error;
INIT_LIST_HEAD(&parse_evlist.list);
- err = parse_events_load_bpf_obj(&parse_evlist, &parse_evlist.list, obj);
+ err = parse_events_load_bpf_obj(&parse_evlist, &parse_evlist.list, obj, NULL);
if (err || list_empty(&parse_evlist.list)) {
pr_debug("Failed to add events selected by BPF\n");
return TEST_FAIL;
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 6e2f203..4c19d5e 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -581,6 +581,7 @@ static int add_tracepoint_multi_sys(struct list_head *list, int *idx,
struct __add_bpf_event_param {
struct parse_events_evlist *data;
struct list_head *list;
+ struct list_head *head_config;
};
static int add_bpf_event(struct probe_trace_event *tev, int fd,
@@ -597,7 +598,8 @@ static int add_bpf_event(struct probe_trace_event *tev, int fd,
tev->group, tev->event, fd);
err = parse_events_add_tracepoint(&new_evsels, &evlist->idx, tev->group,
- tev->event, evlist->error, NULL);
+ tev->event, evlist->error,
+ param->head_config);
if (err) {
struct perf_evsel *evsel, *tmp;
@@ -622,11 +624,12 @@ static int add_bpf_event(struct probe_trace_event *tev, int fd,
int parse_events_load_bpf_obj(struct parse_events_evlist *data,
struct list_head *list,
- struct bpf_object *obj)
+ struct bpf_object *obj,
+ struct list_head *head_config)
{
int err;
char errbuf[BUFSIZ];
- struct __add_bpf_event_param param = {data, list};
+ struct __add_bpf_event_param param = {data, list, head_config};
static bool registered_unprobe_atexit = false;
if (IS_ERR(obj) || !obj) {
@@ -720,14 +723,47 @@ parse_events_config_bpf(struct parse_events_evlist *data,
return 0;
}
+/*
+ * Split config terms:
+ * perf record -e bpf.c/call-graph=fp,map:array.value[0]=1/ ...
+ * 'call-graph=fp' is 'evt config', should be applied to each
+ * event in bpf.c.
+ * 'map:array.value[0]=1' is 'obj config', should be processed
+ * with parse_events_config_bpf.
+ *
+ * Move object config terms from the first list to obj_head_config.
+ */
+static void
+split_bpf_config_terms(struct list_head *evt_head_config,
+ struct list_head *obj_head_config)
+{
+ struct parse_events_term *term, *temp;
+
+ /*
+ * Currently, all possible user config terms
+ * belong to the bpf object. parse_events__is_hardcoded_term()
+ * happens to be a good flag.
+ *
+ * See parse_events_config_bpf() and
+ * config_term_tracepoint().
+ */
+ list_for_each_entry_safe(term, temp, evt_head_config, list)
+ if (!parse_events__is_hardcoded_term(term))
+ list_move_tail(&term->list, obj_head_config);
+}
+
int parse_events_load_bpf(struct parse_events_evlist *data,
struct list_head *list,
char *bpf_file_name,
bool source,
struct list_head *head_config)
{
- struct bpf_object *obj;
int err;
+ struct bpf_object *obj;
+ LIST_HEAD(obj_head_config);
+
+ if (head_config)
+ split_bpf_config_terms(head_config, &obj_head_config);
obj = bpf__prepare_load(bpf_file_name, source);
if (IS_ERR(obj)) {
@@ -749,10 +785,18 @@ int parse_events_load_bpf(struct parse_events_evlist *data,
return err;
}
- err = parse_events_load_bpf_obj(data, list, obj);
+ err = parse_events_load_bpf_obj(data, list, obj, head_config);
if (err)
return err;
- return parse_events_config_bpf(data, obj, head_config);
+ err = parse_events_config_bpf(data, obj, &obj_head_config);
+
+ /*
+ * Caller doesn't know anything about obj_head_config,
+ * so combine them together again before returning.
+ */
+ if (head_config)
+ list_splice_tail(&obj_head_config, head_config);
+ return err;
}
static int
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index e445622..67e4930 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -146,7 +146,8 @@ int parse_events_load_bpf(struct parse_events_evlist *data,
struct bpf_object;
int parse_events_load_bpf_obj(struct parse_events_evlist *data,
struct list_head *list,
- struct bpf_object *obj);
+ struct bpf_object *obj,
+ struct list_head *head_config);
int parse_events_add_numeric(struct parse_events_evlist *data,
struct list_head *list,
u32 type, u64 config,
--
1.8.3.4
* [PATCH 10/48] perf tools: Introduce bpf-output event
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (8 preceding siblings ...)
2016-02-22 9:10 ` [PATCH 09/48] perf tools: Pass tracepoint options to BPF script Wang Nan
@ 2016-02-22 9:10 ` Wang Nan
2016-02-23 17:45 ` Arnaldo Carvalho de Melo
2016-02-25 5:41 ` [tip:perf/core] " tip-bot for Wang Nan
2016-02-22 9:10 ` [PATCH 11/48] perf data: Support converting data from bpf_perf_event_output() Wang Nan
` (37 subsequent siblings)
47 siblings, 2 replies; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:10 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
Commit a43eec304259a6c637f4014a6d4767159b6a3aa3 (bpf: introduce
bpf_perf_event_output() helper) adds a helper that lets a BPF program
output data to the perf ring buffer through a new type of perf event,
PERF_COUNT_SW_BPF_OUTPUT. This patch enables perf to create perf
events of that type. Now perf users can use the following cmdline to
receive output data from BPF programs:
# ./perf record -a -e bpf-output/no-inherit,name=evt/ \
-e ./test_bpf_output.c/map:channel.event=evt/ ls /
# ./perf script
perf 1560 [004] 347747.086295: evt: ffffffff811fd201 sys_write ...
perf 1560 [004] 347747.086300: evt: ffffffff811fd201 sys_write ...
perf 1560 [004] 347747.086315: evt: ffffffff811fd201 sys_write ...
...
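For readers who want to see what a bpf-output event amounts to at the attribute level, here is a minimal sketch. It only fills in a struct perf_event_attr and does not open the event. init_bpf_output_attr() and attr_is_bpf_output() are hypothetical helper names; the constants come from <linux/perf_event.h>, and PERF_COUNT_SW_BPF_OUTPUT is assumed to be available (kernel headers that include the commit referenced above).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <linux/perf_event.h>

/* Fill attr the way a bpf-output event is described in this patch. */
static void init_bpf_output_attr(struct perf_event_attr *attr)
{
	assert(attr != NULL);
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_SOFTWARE;
	attr->config = PERF_COUNT_SW_BPF_OUTPUT;
	/* Raw payload is needed to carry the data written by the BPF program. */
	attr->sample_type = PERF_SAMPLE_RAW;
	/* Emit a sample for every output call. */
	attr->sample_period = 1;
}

/* Same test as perf_evsel__is_bpf_output(), applied to a bare attr. */
static bool attr_is_bpf_output(const struct perf_event_attr *attr)
{
	return attr->type == PERF_TYPE_SOFTWARE &&
	       attr->config == PERF_COUNT_SW_BPF_OUTPUT;
}
```

The sample_type and sample_period settings mirror what the patch adds to perf_evsel__new_idx(); a real tool would set further fields before passing the attribute to perf_event_open().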
Test result:
# cat ./test_bpf_output.c
/************************ BEGIN **************************/
#include <uapi/linux/bpf.h>
struct bpf_map_def {
unsigned int type;
unsigned int key_size;
unsigned int value_size;
unsigned int max_entries;
};
#define SEC(NAME) __attribute__((section(NAME), used))
static u64 (*ktime_get_ns)(void) =
(void *)BPF_FUNC_ktime_get_ns;
static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
(void *)BPF_FUNC_trace_printk;
static int (*get_smp_processor_id)(void) =
(void *)BPF_FUNC_get_smp_processor_id;
static int (*perf_event_output)(void *, struct bpf_map_def *, int, void *, unsigned long) =
(void *)BPF_FUNC_perf_event_output;
struct bpf_map_def SEC("maps") channel = {
.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(u32),
.max_entries = __NR_CPUS__,
};
SEC("func_write=sys_write")
int func_write(void *ctx)
{
struct {
u64 ktime;
int cpuid;
} __attribute__((packed)) output_data;
char error_data[] = "Error: failed to output: %d\n";
output_data.cpuid = get_smp_processor_id();
output_data.ktime = ktime_get_ns();
int err = perf_event_output(ctx, &channel, get_smp_processor_id(),
&output_data, sizeof(output_data));
if (err)
trace_printk(error_data, sizeof(error_data), err);
return 0;
}
char _license[] SEC("license") = "GPL";
int _version SEC("version") = LINUX_VERSION_CODE;
/************************ END ***************************/
# ./perf record -a -e bpf-output/no-inherit,name=evt/ \
-e ./test_bpf_output.c/map:channel.event=evt/ ls /
# ./perf script | grep ls
ls 2242 [003] 347851.557563: evt: ffffffff811fd201 sys_write ...
ls 2242 [003] 347851.557571: evt: ffffffff811fd201 sys_write ...
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
tools/perf/util/bpf-loader.c | 5 ++---
tools/perf/util/evsel.c | 5 +++++
tools/perf/util/evsel.h | 8 ++++++++
tools/perf/util/parse-events.l | 1 +
4 files changed, 16 insertions(+), 3 deletions(-)
diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 44824e3..0967ce6 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -1331,13 +1331,12 @@ apply_config_evsel_for_key(const char *name, int map_fd, void *pkey,
return -BPF_LOADER_ERRNO__OBJCONF_MAP_EVTINH;
}
+ if (perf_evsel__is_bpf_output(evsel))
+ check_pass = true;
if (attr->type == PERF_TYPE_RAW)
check_pass = true;
if (attr->type == PERF_TYPE_HARDWARE)
check_pass = true;
- if (attr->type == PERF_TYPE_SOFTWARE &&
- attr->config == PERF_COUNT_SW_BPF_OUTPUT)
- check_pass = true;
if (!check_pass) {
pr_debug("ERROR: Event type is wrong for map %s\n", name);
return -BPF_LOADER_ERRNO__OBJCONF_MAP_EVTTYPE;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 6ae20d0..0902fe4 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -225,6 +225,11 @@ struct perf_evsel *perf_evsel__new_idx(struct perf_event_attr *attr, int idx)
if (evsel != NULL)
perf_evsel__init(evsel, attr, idx);
+ if (perf_evsel__is_bpf_output(evsel)) {
+ evsel->attr.sample_type |= PERF_SAMPLE_RAW;
+ evsel->attr.sample_period = 1;
+ }
+
return evsel;
}
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 8e75434..efad78f 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -364,6 +364,14 @@ static inline bool perf_evsel__is_function_event(struct perf_evsel *evsel)
#undef FUNCTION_EVENT
}
+static inline bool perf_evsel__is_bpf_output(struct perf_evsel *evsel)
+{
+ struct perf_event_attr *attr = &evsel->attr;
+
+ return (attr->config == PERF_COUNT_SW_BPF_OUTPUT) &&
+ (attr->type == PERF_TYPE_SOFTWARE);
+}
+
struct perf_attr_details {
bool freq;
bool verbose;
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index fb85d03..1477fbc 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -248,6 +248,7 @@ cpu-migrations|migrations { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COU
alignment-faults { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_ALIGNMENT_FAULTS); }
emulation-faults { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_EMULATION_FAULTS); }
dummy { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_DUMMY); }
+bpf-output { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_BPF_OUTPUT); }
/*
* We have to handle the kernel PMU event cycles-ct/cycles-t/mem-loads/mem-stores separately.
--
1.8.3.4
* [PATCH 11/48] perf data: Support converting data from bpf_perf_event_output()
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (9 preceding siblings ...)
2016-02-22 9:10 ` [PATCH 10/48] perf tools: Introduce bpf-output event Wang Nan
@ 2016-02-22 9:10 ` Wang Nan
2016-02-23 16:14 ` Arnaldo Carvalho de Melo
` (2 more replies)
2016-02-22 9:10 ` [PATCH 12/48] perf data: Explicitly set byte order for integer types Wang Nan
` (36 subsequent siblings)
47 siblings, 3 replies; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:10 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
bpf_perf_event_output() outputs data through sample->raw_data. This
patch adds support for converting that data into CTF. A Python script
can then be used to process the data output by BPF programs.
Test result:
# cat ./test_bpf_output_2.c
/************************ BEGIN **************************/
#include <uapi/linux/bpf.h>
struct bpf_map_def {
unsigned int type;
unsigned int key_size;
unsigned int value_size;
unsigned int max_entries;
};
#define SEC(NAME) __attribute__((section(NAME), used))
static u64 (*ktime_get_ns)(void) =
(void *)BPF_FUNC_ktime_get_ns;
static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
(void *)BPF_FUNC_trace_printk;
static int (*get_smp_processor_id)(void) =
(void *)BPF_FUNC_get_smp_processor_id;
static int (*perf_event_output)(void *, struct bpf_map_def *, int, void *, unsigned long) =
(void *)BPF_FUNC_perf_event_output;
struct bpf_map_def SEC("maps") channel = {
.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(u32),
.max_entries = __NR_CPUS__,
};
static inline int __attribute__((always_inline))
func(void *ctx, int type)
{
struct {
u64 ktime;
int type;
} __attribute__((packed)) output_data;
char error_data[] = "Error: failed to output\n";
int err;
output_data.type = type;
output_data.ktime = ktime_get_ns();
err = perf_event_output(ctx, &channel, get_smp_processor_id(),
&output_data, sizeof(output_data));
if (err)
trace_printk(error_data, sizeof(error_data));
return 0;
}
SEC("func_begin=sys_nanosleep")
int func_begin(void *ctx) {return func(ctx, 1);}
SEC("func_end=sys_nanosleep%return")
int func_end(void *ctx) { return func(ctx, 2);}
char _license[] SEC("license") = "GPL";
int _version SEC("version") = LINUX_VERSION_CODE;
/************************* END ***************************/
# ./perf record -e bpf-output/no-inherit,name=evt/ \
-e ./test_bpf_output_2.c/map:channel.event=evt/ \
usleep 100000
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.012 MB perf.data (2 samples) ]
# ./perf script
usleep 14942 92503.198504: evt: ffffffff810e0ba1 sys_nanosleep (/lib/modules/4.3.0....
usleep 14942 92503.298562: evt: ffffffff810585e9 kretprobe_trampoline_holder (/lib....
# ./perf data convert --to-ctf ./out.ctf
[ perf data convert: Converted 'perf.data' into CTF data './out.ctf' ]
[ perf data convert: Converted and wrote 0.000 MB (2 samples) ]
# babeltrace ./out.ctf
[01:41:43.198504134] (+?.?????????) evt: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF810E0BA1, perf_tid = 14942, perf_pid = 14942, perf_id = 1044, raw_len = 3, raw_data = [ [0] = 0x32C0C07B, [1] = 0x5421, [2] = 0x1 ] }
[01:41:43.298562257] (+0.100058123) evt: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF810585E9, perf_tid = 14942, perf_pid = 14942, perf_id = 1044, raw_len = 3, raw_data = [ [0] = 0x38B77FAA, [1] = 0x5421, [2] = 0x2 ] }
# cat ./test_bpf_output_2.py
from babeltrace import TraceCollection
tc = TraceCollection()
tc.add_trace('./out.ctf', 'ctf')
d = {1:[], 2:[]}
for event in tc.events:
if not event.name.startswith('evt'):
continue
raw_data = event['raw_data']
(time, type) = ((raw_data[0] + (raw_data[1] << 32)), raw_data[2])
d[type].append(time)
print(list(map(lambda i: d[2][i] - d[1][i], range(len(d[1])))));
# python3 ./test_bpf_output_2.py
[100056879]
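The Python script above rebuilds each 12-byte record from the three u32
words stored in 'raw_data'. The same decoding can be sketched in C. This
is only an illustration, not part of the patch: struct bpf_sample and
decode_raw_data() are hypothetical names, and the sketch assumes the low
word of ktime comes first, as the Python script does.

```c
#include <stdint.h>

/* Hypothetical mirror of the packed struct emitted by the BPF program. */
struct bpf_sample {
	uint64_t ktime;	/* output_data.ktime, in nanoseconds */
	uint32_t type;	/* output_data.type: 1 = entry, 2 = return */
};

/*
 * Rebuild one sample from the three u32 words of 'raw_data'.
 * Assumption: raw[0] holds the low 32 bits of ktime, raw[1] the high 32 bits.
 */
static struct bpf_sample decode_raw_data(const uint32_t raw[3])
{
	struct bpf_sample s;

	s.ktime = (uint64_t)raw[0] | ((uint64_t)raw[1] << 32);
	s.type = raw[2];
	return s;
}
```

Feeding it the two samples from the babeltrace output above gives a
ktime difference of 100056879 ns, the value printed by the script.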
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
tools/perf/util/data-convert-bt.c | 112 +++++++++++++++++++++++++++++++++++++-
1 file changed, 111 insertions(+), 1 deletion(-)
diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c
index b722e57..70f462d 100644
--- a/tools/perf/util/data-convert-bt.c
+++ b/tools/perf/util/data-convert-bt.c
@@ -352,6 +352,84 @@ static int add_tracepoint_values(struct ctf_writer *cw,
return ret;
}
+static int
+add_bpf_output_values(struct bt_ctf_event_class *event_class,
+ struct bt_ctf_event *event,
+ struct perf_sample *sample)
+{
+ struct bt_ctf_field_type *len_type, *seq_type;
+ struct bt_ctf_field *len_field, *seq_field;
+ unsigned int raw_size = sample->raw_size;
+ unsigned int nr_elements = raw_size / sizeof(u32);
+ unsigned int i;
+ int ret;
+
+ if (nr_elements * sizeof(u32) != raw_size)
+ pr_warning("Incorrect raw_size (%u) in bpf output event, skip %lu bytes\n",
+ raw_size, raw_size - nr_elements * sizeof(u32));
+
+ len_type = bt_ctf_event_class_get_field_by_name(event_class, "raw_len");
+ len_field = bt_ctf_field_create(len_type);
+ if (!len_field) {
+ pr_err("failed to create 'raw_len' for bpf output event\n");
+ ret = -1;
+ goto put_len_type;
+ }
+
+ ret = bt_ctf_field_unsigned_integer_set_value(len_field, nr_elements);
+ if (ret) {
+ pr_err("failed to set field value for raw_len\n");
+ goto put_len_field;
+ }
+ ret = bt_ctf_event_set_payload(event, "raw_len", len_field);
+ if (ret) {
+ pr_err("failed to set payload to raw_len\n");
+ goto put_len_field;
+ }
+
+ seq_type = bt_ctf_event_class_get_field_by_name(event_class, "raw_data");
+ seq_field = bt_ctf_field_create(seq_type);
+ if (!seq_field) {
+ pr_err("failed to create 'raw_data' for bpf output event\n");
+ ret = -1;
+ goto put_seq_type;
+ }
+
+ ret = bt_ctf_field_sequence_set_length(seq_field, len_field);
+ if (ret) {
+ pr_err("failed to set length of 'raw_data'\n");
+ goto put_seq_field;
+ }
+
+ for (i = 0; i < nr_elements; i++) {
+ struct bt_ctf_field *elem_field =
+ bt_ctf_field_sequence_get_field(seq_field, i);
+
+ ret = bt_ctf_field_unsigned_integer_set_value(elem_field,
+ ((u32 *)(sample->raw_data))[i]);
+
+ bt_ctf_field_put(elem_field);
+ if (ret) {
+ pr_err("failed to set raw_data[%d]\n", i);
+ goto put_seq_field;
+ }
+ }
+
+ ret = bt_ctf_event_set_payload(event, "raw_data", seq_field);
+ if (ret)
+ pr_err("failed to set payload for raw_data\n");
+
+put_seq_field:
+ bt_ctf_field_put(seq_field);
+put_seq_type:
+ bt_ctf_field_type_put(seq_type);
+put_len_field:
+ bt_ctf_field_put(len_field);
+put_len_type:
+ bt_ctf_field_type_put(len_type);
+ return ret;
+}
+
static int add_generic_values(struct ctf_writer *cw,
struct bt_ctf_event *event,
struct perf_evsel *evsel,
@@ -597,6 +675,12 @@ static int process_sample_event(struct perf_tool *tool,
return -1;
}
+ if (perf_evsel__is_bpf_output(evsel)) {
+ ret = add_bpf_output_values(event_class, event, sample);
+ if (ret)
+ return -1;
+ }
+
cs = ctf_stream(cw, get_sample_cpu(cw, sample, evsel));
if (cs) {
if (is_flush_needed(cs))
@@ -744,6 +828,25 @@ static int add_tracepoint_types(struct ctf_writer *cw,
return ret;
}
+static int add_bpf_output_types(struct ctf_writer *cw,
+ struct bt_ctf_event_class *class)
+{
+ struct bt_ctf_field_type *len_type = cw->data.u32;
+ struct bt_ctf_field_type *seq_base_type = cw->data.u32_hex;
+ struct bt_ctf_field_type *seq_type;
+ int ret;
+
+ ret = bt_ctf_event_class_add_field(class, len_type, "raw_len");
+ if (ret)
+ return ret;
+
+ seq_type = bt_ctf_field_type_sequence_create(seq_base_type, "raw_len");
+ if (!seq_type)
+ return -1;
+
+ return bt_ctf_event_class_add_field(class, seq_type, "raw_data");
+}
+
static int add_generic_types(struct ctf_writer *cw, struct perf_evsel *evsel,
struct bt_ctf_event_class *event_class)
{
@@ -755,7 +858,8 @@ static int add_generic_types(struct ctf_writer *cw, struct perf_evsel *evsel,
* ctf event header
* PERF_SAMPLE_READ - TODO
* PERF_SAMPLE_CALLCHAIN - TODO
- * PERF_SAMPLE_RAW - tracepoint fields are handled separately
+ * PERF_SAMPLE_RAW - tracepoint fields and BPF output
+ * are handled separately
* PERF_SAMPLE_BRANCH_STACK - TODO
* PERF_SAMPLE_REGS_USER - TODO
* PERF_SAMPLE_STACK_USER - TODO
@@ -824,6 +928,12 @@ static int add_event(struct ctf_writer *cw, struct perf_evsel *evsel)
goto err;
}
+ if (perf_evsel__is_bpf_output(evsel)) {
+ ret = add_bpf_output_types(cw, event_class);
+ if (ret)
+ goto err;
+ }
+
ret = bt_ctf_stream_class_add_event_class(cw->stream_class, event_class);
if (ret) {
pr("Failed to add event class into stream.\n");
--
1.8.3.4
* [PATCH 12/48] perf data: Explicitly set byte order for integer types
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (10 preceding siblings ...)
2016-02-22 9:10 ` [PATCH 11/48] perf data: Support converting data from bpf_perf_event_output() Wang Nan
@ 2016-02-22 9:10 ` Wang Nan
2016-02-22 9:10 ` [PATCH 13/48] perf core: Introduce new ioctl options to pause and resume ring buffer Wang Nan
` (35 subsequent siblings)
47 siblings, 0 replies; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:10 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
After babeltrace commit 5cec03e402aa ("ir: copy variants and
sequences when setting a field path"), 'perf data convert' produces
incorrect results if there is BPF output data. For example:
# perf data convert --to-ctf ./out.ctf
# babeltrace ./out.ctf
[10:44:31.186045346] (+?.?????????) evt: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF810E7DD1, perf_tid = 23819, perf_pid = 23819, perf_id = 518, raw_len = 3, raw_data = [ [0] = 0xC028E32F, [1] = 0x815D0100, [2] = 0x1000000 ] }
[10:44:31.286101003] (+0.100055657) evt: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF8105B609, perf_tid = 23819, perf_pid = 23819, perf_id = 518, raw_len = 3, raw_data = [ [0] = 0x35D9F1EB, [1] = 0x15D81, [2] = 0x2 ] }
The expected result of the first sample should be:
raw_data = [ [0] = 0x2FE328C0, [1] = 0x15D81, [2] = 0x1 ] }
however, 'perf data convert' outputs big-endian values to the resulting
CTF file.
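The relation between the wrong and the expected words can be checked
with a plain 32-bit byte swap. The sketch below is an illustration only,
not part of the patch, and swap32() is a hypothetical helper name.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit word. */
static uint32_t swap32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) |
	       ((v & 0x0000ff00u) << 8)  |
	       ((v & 0x00ff0000u) >> 8)  |
	       ((v & 0xff000000u) >> 24);
}
```

Applied to the three words of the first sample (0xC028E32F, 0x815D0100,
0x1000000) it yields 0x2FE328C0, 0x15D81 and 0x1, which are the expected
values listed above.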
The reason is an internal change (or a bug?) in babeltrace.
Before this patch, at the first add_bpf_output_values() call, the byte
order of all integer types is unset (it is 0, neither 1234 (le) nor
4321 (be)). It would be fixed by:
perf_evlist__deliver_sample
-> process_sample_event
-> ctf_stream
...
->bt_ctf_trace_add_stream_class
->bt_ctf_field_type_structure_set_byte_order
->bt_ctf_field_type_integer_set_byte_order
while creating the stream.
However, the babeltrace commit mentioned above duplicates the types in
a sequence to prevent potential conflicts, and links the newly allocated
type into the 'raw_data' sequence, in the following call stack:
perf_evlist__deliver_sample
-> process_sample_event
-> ctf_stream
...
-> bt_ctf_trace_add_stream_class
-> bt_ctf_stream_class_resolve_types
...
-> bt_ctf_field_type_sequence_copy
->bt_ctf_field_type_integer_copy
This happens before the byte order is set, so only the newly allocated
type is initialized. The byte order of the original type, which perf
uses to create the first raw_data, is still unset.
Byte order in CTF output is not related to byte order in perf.data.
Setting it to anything other than BT_CTF_BYTE_ORDER_NATIVE solves this
problem (only BT_CTF_BYTE_ORDER_NATIVE needs to be fixed up). To minimize
behavior changes, set the byte order according to compile-time options.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
tools/perf/util/data-convert-bt.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c
index 70f462d..3f723f4a 100644
--- a/tools/perf/util/data-convert-bt.c
+++ b/tools/perf/util/data-convert-bt.c
@@ -1080,6 +1080,12 @@ static struct bt_ctf_field_type *create_int_type(int size, bool sign, bool hex)
bt_ctf_field_type_integer_set_base(type, BT_CTF_INTEGER_BASE_HEXADECIMAL))
goto err;
+#if __BYTE_ORDER == __BIG_ENDIAN
+ bt_ctf_field_type_set_byte_order(type, BT_CTF_BYTE_ORDER_BIG_ENDIAN);
+#else
+ bt_ctf_field_type_set_byte_order(type, BT_CTF_BYTE_ORDER_LITTLE_ENDIAN);
+#endif
+
pr2("Created type: INTEGER %d-bit %ssigned %s\n",
size, sign ? "un" : "", hex ? "hex" : "");
return type;
--
1.8.3.4
* [PATCH 13/48] perf core: Introduce new ioctl options to pause and resume ring buffer
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (11 preceding siblings ...)
2016-02-22 9:10 ` [PATCH 12/48] perf data: Explicitly set byte order for integer types Wang Nan
@ 2016-02-22 9:10 ` Wang Nan
2016-02-22 9:10 ` [PATCH 14/48] perf core: Set event's default overflow_handler Wang Nan
` (34 subsequent siblings)
47 siblings, 0 replies; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:10 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
Add a new ioctl() to pause/resume ring-buffer output.
In some situations we want to read from the ring buffer only when we
can ensure that nothing writes to it during reading. Without this patch
we have to turn off all events attached to the ring buffer to achieve
this.
This patch is for supporting the overwrite ring buffer. Following
commits will introduce new methods that support reading from an
overwrite ring buffer. Before reading, the caller must ensure the ring
buffer is frozen, or the reading is unreliable.
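The pause semantics can be sketched as a small user-space model. This is
not kernel code: struct rb_model and the rb_model_*() helpers are
hypothetical stand-ins that only copy the rules of rb_toggle_paused()
and the new check at the start of perf_output_begin().

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the kernel's struct ring_buffer. */
struct rb_model {
	int nr_pages;	/* number of data pages */
	int paused;	/* non-zero: writers must not touch the buffer */
	long lost;	/* records dropped while paused */
};

/* Same rule as rb_toggle_paused(): a buffer without pages stays paused. */
static void rb_model_toggle_paused(struct rb_model *rb, bool pause)
{
	rb->paused = (!pause && rb->nr_pages) ? 0 : 1;
}

/* Writer-side check modelled on the start of perf_output_begin(). */
static int rb_model_output_begin(struct rb_model *rb)
{
	if (rb->paused) {
		if (rb->nr_pages)
			rb->lost++;	/* count the dropped record */
		return -ENOSPC;
	}
	return 0;	/* the real function would reserve space here */
}
```

In this model a reader pauses the buffer, reads it while no writer can
succeed, and resumes it afterwards. Records arriving in between are
counted as lost rather than written.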
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
include/uapi/linux/perf_event.h | 1 +
kernel/events/core.c | 13 +++++++++++++
kernel/events/internal.h | 11 +++++++++++
kernel/events/ring_buffer.c | 7 ++++++-
4 files changed, 31 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 1afe962..a3c1903 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -401,6 +401,7 @@ struct perf_event_attr {
#define PERF_EVENT_IOC_SET_FILTER _IOW('$', 6, char *)
#define PERF_EVENT_IOC_ID _IOR('$', 7, __u64 *)
#define PERF_EVENT_IOC_SET_BPF _IOW('$', 8, __u32)
+#define PERF_EVENT_IOC_PAUSE_OUTPUT _IOW('$', 9, __u32)
enum perf_event_ioc_flags {
PERF_IOC_FLAG_GROUP = 1U << 0,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 94c47e3..a7075ae 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4231,6 +4231,19 @@ static long _perf_ioctl(struct perf_event *event, unsigned int cmd, unsigned lon
case PERF_EVENT_IOC_SET_BPF:
return perf_event_set_bpf_prog(event, arg);
+ case PERF_EVENT_IOC_PAUSE_OUTPUT: {
+ struct ring_buffer *rb;
+
+ rcu_read_lock();
+ rb = rcu_dereference(event->rb);
+ if (!rb) {
+ rcu_read_unlock();
+ return -EINVAL;
+ }
+ rb_toggle_paused(rb, !!arg);
+ rcu_read_unlock();
+ return 0;
+ }
default:
return -ENOTTY;
}
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 2bbad9c..6a93d1b 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -18,6 +18,7 @@ struct ring_buffer {
#endif
int nr_pages; /* nr of data pages */
int overwrite; /* can overwrite itself */
+ int paused; /* can write into ring buffer */
atomic_t poll; /* POLL_ for wakeups */
@@ -65,6 +66,16 @@ static inline void rb_free_rcu(struct rcu_head *rcu_head)
rb_free(rb);
}
+static inline void
+rb_toggle_paused(struct ring_buffer *rb,
+ bool pause)
+{
+ if (!pause && rb->nr_pages)
+ rb->paused = 0;
+ else
+ rb->paused = 1;
+}
+
extern struct ring_buffer *
rb_alloc(int nr_pages, long watermark, int cpu, int flags);
extern void perf_event_wakeup(struct perf_event *event);
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 1faad2c..22e1a47 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -125,8 +125,11 @@ int perf_output_begin(struct perf_output_handle *handle,
if (unlikely(!rb))
goto out;
- if (unlikely(!rb->nr_pages))
+ if (unlikely(rb->paused)) {
+ if (rb->nr_pages)
+ local_inc(&rb->lost);
goto out;
+ }
handle->rb = rb;
handle->event = event;
@@ -244,6 +247,8 @@ ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
INIT_LIST_HEAD(&rb->event_list);
spin_lock_init(&rb->event_lock);
init_irq_work(&rb->irq_work, rb_irq_work);
+
+ rb->paused = rb->nr_pages ? 0 : 1;
}
static void ring_buffer_put_async(struct ring_buffer *rb)
--
1.8.3.4
* [PATCH 14/48] perf core: Set event's default overflow_handler
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (12 preceding siblings ...)
2016-02-22 9:10 ` [PATCH 13/48] perf core: Introduce new ioctl options to pause and resume ring buffer Wang Nan
@ 2016-02-22 9:10 ` Wang Nan
2016-02-22 9:10 ` [PATCH 15/48] perf core: Prepare writing into ring buffer from end Wang Nan
` (33 subsequent siblings)
47 siblings, 0 replies; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:10 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
Set a default event->overflow_handler in perf_event_alloc() so that
__perf_event_overflow() no longer needs to check event->overflow_handler.
Following commits can give a different default overflow_handler.
No extra overhead is introduced into the hot path, because the original
code still needs to read this handler from memory. A conditional branch
is avoided, so we actually remove some instructions.
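The idea can be sketched outside the kernel as follows. All names here
(struct event_model, default_output(), custom_output() and the
event_model_*() helpers) are hypothetical and only model the pattern:
choose the handler once at allocation time, then call it
unconditionally on the hot path.

```c
#include <stddef.h>

struct event_model;
typedef void (*overflow_handler_t)(struct event_model *ev);

/* Hypothetical stand-in for struct perf_event. */
struct event_model {
	overflow_handler_t overflow_handler;
	int default_calls;	/* how often the default handler ran */
	int custom_calls;	/* how often a caller-supplied handler ran */
};

static void default_output(struct event_model *ev) { ev->default_calls++; }
static void custom_output(struct event_model *ev)  { ev->custom_calls++; }

/* Setup path: fall back to the default handler when none is given. */
static void event_model_init(struct event_model *ev, overflow_handler_t handler)
{
	ev->overflow_handler = handler ? handler : default_output;
	ev->default_calls = 0;
	ev->custom_calls = 0;
}

/* Hot path: one indirect call, no NULL check. */
static void event_model_overflow(struct event_model *ev)
{
	ev->overflow_handler(ev);
}
```

The branch moves from every overflow to the one-time setup, which is the
trade the commit message describes.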
Initial idea comes from Peter at [1].
[1] http://lkml.kernel.org/r/20130708121557.GA17211@twins.programming.kicks-ass.net
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
kernel/events/core.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index a7075ae..ae34061 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6392,10 +6392,7 @@ static int __perf_event_overflow(struct perf_event *event,
irq_work_queue(&event->pending);
}
- if (event->overflow_handler)
- event->overflow_handler(event, data, regs);
- else
- perf_event_output(event, data, regs);
+ event->overflow_handler(event, data, regs);
if (*perf_event_fasync(event) && event->pending_kill) {
event->pending_wakeup = 1;
@@ -7868,8 +7865,13 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
context = parent_event->overflow_handler_context;
}
- event->overflow_handler = overflow_handler;
- event->overflow_handler_context = context;
+ if (overflow_handler) {
+ event->overflow_handler = overflow_handler;
+ event->overflow_handler_context = context;
+ } else {
+ event->overflow_handler = perf_event_output;
+ event->overflow_handler_context = NULL;
+ }
perf_event__state_init(event);
--
1.8.3.4
* [PATCH 15/48] perf core: Prepare writing into ring buffer from end
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (13 preceding siblings ...)
2016-02-22 9:10 ` [PATCH 14/48] perf core: Set event's default overflow_handler Wang Nan
@ 2016-02-22 9:10 ` Wang Nan
2016-02-22 9:10 ` [PATCH 16/48] perf core: Add backward attribute to perf event Wang Nan
` (32 subsequent siblings)
47 siblings, 0 replies; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:10 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
Convert perf_output_begin() to __perf_output_begin() and make the latter
function able to write records from the end of the ring buffer.
Following commits will utilize the 'backward' flag.
This patch doesn't introduce any extra performance overhead since we
use always_inline.
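The head arithmetic for both directions can be sketched in user space.
The CIRC_CNT()/CIRC_SPACE() macros are reproduced from the kernel's
include/linux/circ_buf.h; has_space() follows ring_buffer_has_space()
from this patch, while reserve() is a hypothetical helper that leaves
out the cmpxchg loop and assumes data_size is a power of two.

```c
#include <stdbool.h>

/* Same definitions as in the kernel's include/linux/circ_buf.h. */
#define CIRC_CNT(head, tail, size)   (((head) - (tail)) & ((size) - 1))
#define CIRC_SPACE(head, tail, size) CIRC_CNT((tail), ((head) + 1), (size))

/* Follows ring_buffer_has_space(): head and tail swap roles when backward. */
static bool has_space(unsigned long head, unsigned long tail,
		      unsigned long data_size, unsigned int size, bool backward)
{
	if (!backward)
		return CIRC_SPACE(head, tail, data_size) >= size;
	return CIRC_SPACE(tail, head, data_size) >= size;
}

/*
 * Hypothetical helper: move *head by one record and return the byte offset
 * inside the buffer at which that record starts.
 */
static unsigned long reserve(unsigned long *head, unsigned long data_size,
			     unsigned int size, bool backward)
{
	unsigned long start;

	if (!backward) {
		start = *head;		/* record begins at the old head */
		*head += size;
	} else {
		*head -= size;		/* head moves toward lower offsets */
		start = *head;		/* record begins at the new head */
	}
	return start & (data_size - 1);
}
```

With a 4096-byte buffer and head starting at 0, two backward
reservations of 64 bytes land at offsets 4032 and 3968, so the newest
record always sits at the head position.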
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
kernel/events/ring_buffer.c | 42 ++++++++++++++++++++++++++++++++++++------
1 file changed, 36 insertions(+), 6 deletions(-)
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 22e1a47..37c11c6 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -102,8 +102,21 @@ out:
preempt_enable();
}
-int perf_output_begin(struct perf_output_handle *handle,
- struct perf_event *event, unsigned int size)
+static bool __always_inline
+ring_buffer_has_space(unsigned long head, unsigned long tail,
+ unsigned long data_size, unsigned int size,
+ bool backward)
+{
+ if (!backward)
+ return CIRC_SPACE(head, tail, data_size) >= size;
+ else
+ return CIRC_SPACE(tail, head, data_size) >= size;
+}
+
+static int __always_inline
+__perf_output_begin(struct perf_output_handle *handle,
+ struct perf_event *event, unsigned int size,
+ bool backward)
{
struct ring_buffer *rb;
unsigned long tail, offset, head;
@@ -146,9 +159,12 @@ int perf_output_begin(struct perf_output_handle *handle,
do {
tail = READ_ONCE(rb->user_page->data_tail);
offset = head = local_read(&rb->head);
- if (!rb->overwrite &&
- unlikely(CIRC_SPACE(head, tail, perf_data_size(rb)) < size))
- goto fail;
+ if (!rb->overwrite) {
+ if (unlikely(!ring_buffer_has_space(head, tail,
+ perf_data_size(rb),
+ size, backward)))
+ goto fail;
+ }
/*
* The above forms a control dependency barrier separating the
@@ -162,9 +178,17 @@ int perf_output_begin(struct perf_output_handle *handle,
* See perf_output_put_handle().
*/
- head += size;
+ if (!backward)
+ head += size;
+ else
+ head -= size;
} while (local_cmpxchg(&rb->head, offset, head) != offset);
+ if (backward) {
+ offset = head;
+ head = (u64)(-head);
+ }
+
/*
* We rely on the implied barrier() by local_cmpxchg() to ensure
* none of the data stores below can be lifted up by the compiler.
@@ -206,6 +230,12 @@ out:
return -ENOSPC;
}
+int perf_output_begin(struct perf_output_handle *handle,
+ struct perf_event *event, unsigned int size)
+{
+ return __perf_output_begin(handle, event, size, false);
+}
+
unsigned int perf_output_copy(struct perf_output_handle *handle,
const void *buf, unsigned int len)
{
--
1.8.3.4
* [PATCH 16/48] perf core: Add backward attribute to perf event
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (14 preceding siblings ...)
2016-02-22 9:10 ` [PATCH 15/48] perf core: Prepare writing into ring buffer from end Wang Nan
@ 2016-02-22 9:10 ` Wang Nan
2016-02-24 13:08 ` Jiri Olsa
2016-02-22 9:10 ` [PATCH 17/48] perf core: Reduce perf event output overhead by new overflow handler Wang Nan
` (31 subsequent siblings)
47 siblings, 1 reply; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:10 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
In perf_event_attr a new bit 'write_backward' is appended to indicate
that this event should write the ring buffer from its end to its
beginning. In perf_output_begin(), prepare the ring buffer according to
this bit.
This patch introduces a small overhead into perf_output_begin():
an extra memory read and a conditional branch. A further patch can remove
this overhead by using a custom output handler.
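The rule that two events may share a ring buffer only when they write
in the same direction can be sketched as below. struct attr_model and
set_output_model() are hypothetical; returning -EINVAL on a mismatch is
an assumption of this sketch, since the patch itself only jumps to the
existing 'out' label.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical reduction of perf_event_attr to the one bit of interest. */
struct attr_model {
	unsigned int write_backward : 1;
};

static bool is_write_backward(const struct attr_model *attr)
{
	return !!attr->write_backward;
}

/* Modelled on the new check in perf_event_set_output(). */
static int set_output_model(const struct attr_model *event,
			    const struct attr_model *output_event)
{
	if (is_write_backward(output_event) != is_write_backward(event))
		return -EINVAL;	/* directions differ: refuse to share */
	return 0;
}
```

Mixing is rejected because a reader of a shared buffer could not tell
which end of the buffer the newest record is at.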
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
include/linux/perf_event.h | 5 +++++
include/uapi/linux/perf_event.h | 3 ++-
kernel/events/core.c | 7 +++++++
kernel/events/ring_buffer.c | 2 ++
4 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index b35a61a..0ce1015 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1029,6 +1029,11 @@ static inline bool has_aux(struct perf_event *event)
return event->pmu->setup_aux;
}
+static inline bool is_write_backward(struct perf_event *event)
+{
+ return !!event->attr.write_backward;
+}
+
extern int perf_output_begin(struct perf_output_handle *handle,
struct perf_event *event, unsigned int size);
extern void perf_output_end(struct perf_output_handle *handle);
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index a3c1903..43fc8d2 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -340,7 +340,8 @@ struct perf_event_attr {
comm_exec : 1, /* flag comm events that are due to an exec */
use_clockid : 1, /* use @clockid for time fields */
context_switch : 1, /* context switch data */
- __reserved_1 : 37;
+ write_backward : 1, /* Write ring buffer from end to beginning */
+ __reserved_1 : 36;
union {
__u32 wakeup_events; /* wakeup every n events */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index ae34061..9353154 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8101,6 +8101,13 @@ perf_event_set_output(struct perf_event *event, struct perf_event *output_event)
goto out;
/*
+ * Either writing ring buffer from beginning or from end.
+ * Mixing is not allowed.
+ */
+ if (is_write_backward(output_event) != is_write_backward(event))
+ goto out;
+
+ /*
* If both events generate aux data, they must be on the same PMU
*/
if (has_aux(event) && has_aux(output_event) &&
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 37c11c6..80b1fa7 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -233,6 +233,8 @@ out:
int perf_output_begin(struct perf_output_handle *handle,
struct perf_event *event, unsigned int size)
{
+ if (unlikely(is_write_backward(event)))
+ return __perf_output_begin(handle, event, size, true);
return __perf_output_begin(handle, event, size, false);
}
--
1.8.3.4
* [PATCH 17/48] perf core: Reduce perf event output overhead by new overflow handler
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (15 preceding siblings ...)
2016-02-22 9:10 ` [PATCH 16/48] perf core: Add backward attribute to perf event Wang Nan
@ 2016-02-22 9:10 ` Wang Nan
2016-02-22 9:10 ` [PATCH 18/48] perf tools: Only validate is_pos for tracking evsels Wang Nan
` (30 subsequent siblings)
47 siblings, 0 replies; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:10 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
By creating onward and backward specific overflow handlers and setting
them according to the event's backward setting, normal sampling events
no longer need to check the backward setting of an event.
This is the last patch of the backward writing patchset. After this
patch, no extra overhead is introduced to the fast path of sampling
output.
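The structure used here, one always-inline body specialised through a
constant function pointer plus two thin wrappers, can be sketched in
user space. Everything below (struct rec, begin_*(), output_*(),
pick_handler()) is hypothetical and only models the pattern. The
always_inline attribute is a GCC/Clang extension.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the ring buffer state. */
struct rec {
	long head;
};

static int begin_onward(struct rec *r, unsigned int size)
{
	r->head += (long)size;
	return 0;
}

static int begin_backward(struct rec *r, unsigned int size)
{
	r->head -= (long)size;
	return 0;
}

/* Shared body: with a constant callback each wrapper can be specialised. */
static inline __attribute__((always_inline)) int
output_common(struct rec *r, unsigned int size,
	      int (*begin)(struct rec *, unsigned int))
{
	if (begin(r, size))
		return -1;
	/* a real implementation would copy the sample here */
	return 0;
}

static int output_onward(struct rec *r, unsigned int size)
{
	return output_common(r, size, begin_onward);
}

static int output_backward(struct rec *r, unsigned int size)
{
	return output_common(r, size, begin_backward);
}

typedef int (*output_fn)(struct rec *, unsigned int);

/* Setup path: pick the handler once, according to the direction. */
static output_fn pick_handler(bool write_backward)
{
	return write_backward ? output_backward : output_onward;
}
```

Because the direction is decided when the handler is picked, the
per-sample path contains no test of the backward setting.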
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
include/linux/perf_event.h | 17 +++++++++++++++--
kernel/events/core.c | 41 ++++++++++++++++++++++++++++++++++++-----
kernel/events/ring_buffer.c | 12 ++++++++++++
3 files changed, 63 insertions(+), 7 deletions(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 0ce1015..e466cc6 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -827,9 +827,15 @@ extern int perf_event_overflow(struct perf_event *event,
struct perf_sample_data *data,
struct pt_regs *regs);
+extern void perf_event_output_onward(struct perf_event *event,
+ struct perf_sample_data *data,
+ struct pt_regs *regs);
+extern void perf_event_output_backward(struct perf_event *event,
+ struct perf_sample_data *data,
+ struct pt_regs *regs);
extern void perf_event_output(struct perf_event *event,
- struct perf_sample_data *data,
- struct pt_regs *regs);
+ struct perf_sample_data *data,
+ struct pt_regs *regs);
extern void
perf_event_header__init_id(struct perf_event_header *header,
@@ -1036,6 +1042,13 @@ static inline bool is_write_backward(struct perf_event *event)
extern int perf_output_begin(struct perf_output_handle *handle,
struct perf_event *event, unsigned int size);
+extern int perf_output_begin_onward(struct perf_output_handle *handle,
+ struct perf_event *event,
+ unsigned int size);
+extern int perf_output_begin_backward(struct perf_output_handle *handle,
+ struct perf_event *event,
+ unsigned int size);
+
extern void perf_output_end(struct perf_output_handle *handle);
extern unsigned int perf_output_copy(struct perf_output_handle *handle,
const void *buf, unsigned int len);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 9353154..ce70f54 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5531,9 +5531,13 @@ void perf_prepare_sample(struct perf_event_header *header,
}
}
-void perf_event_output(struct perf_event *event,
- struct perf_sample_data *data,
- struct pt_regs *regs)
+static void __always_inline
+__perf_event_output(struct perf_event *event,
+ struct perf_sample_data *data,
+ struct pt_regs *regs,
+ int (*output_begin)(struct perf_output_handle *,
+ struct perf_event *,
+ unsigned int))
{
struct perf_output_handle handle;
struct perf_event_header header;
@@ -5543,7 +5547,7 @@ void perf_event_output(struct perf_event *event,
perf_prepare_sample(&header, data, event, regs);
- if (perf_output_begin(&handle, event, header.size))
+ if (output_begin(&handle, event, header.size))
goto exit;
perf_output_sample(&handle, &header, data, event);
@@ -5554,6 +5558,30 @@ exit:
rcu_read_unlock();
}
+void
+perf_event_output_onward(struct perf_event *event,
+ struct perf_sample_data *data,
+ struct pt_regs *regs)
+{
+ __perf_event_output(event, data, regs, perf_output_begin_onward);
+}
+
+void
+perf_event_output_backward(struct perf_event *event,
+ struct perf_sample_data *data,
+ struct pt_regs *regs)
+{
+ __perf_event_output(event, data, regs, perf_output_begin_backward);
+}
+
+void
+perf_event_output(struct perf_event *event,
+ struct perf_sample_data *data,
+ struct pt_regs *regs)
+{
+ __perf_event_output(event, data, regs, perf_output_begin);
+}
+
/*
* read event_id
*/
@@ -7868,8 +7896,11 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
if (overflow_handler) {
event->overflow_handler = overflow_handler;
event->overflow_handler_context = context;
+ } else if (is_write_backward(event)){
+ event->overflow_handler = perf_event_output_backward;
+ event->overflow_handler_context = NULL;
} else {
- event->overflow_handler = perf_event_output;
+ event->overflow_handler = perf_event_output_onward;
event->overflow_handler_context = NULL;
}
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 80b1fa7..7e30e012 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -230,6 +230,18 @@ out:
return -ENOSPC;
}
+int perf_output_begin_onward(struct perf_output_handle *handle,
+ struct perf_event *event, unsigned int size)
+{
+ return __perf_output_begin(handle, event, size, false);
+}
+
+int perf_output_begin_backward(struct perf_output_handle *handle,
+ struct perf_event *event, unsigned int size)
+{
+ return __perf_output_begin(handle, event, size, true);
+}
+
int perf_output_begin(struct perf_output_handle *handle,
struct perf_event *event, unsigned int size)
{
--
1.8.3.4
* [PATCH 18/48] perf tools: Only validate is_pos for tracking evsels
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (16 preceding siblings ...)
2016-02-22 9:10 ` [PATCH 17/48] perf core: Reduce perf event output overhead by new overflow handler Wang Nan
@ 2016-02-22 9:10 ` Wang Nan
2016-02-24 14:21 ` Jiri Olsa
2016-02-22 9:10 ` [PATCH 19/48] perf tools: Print write_backward value in perf_event_attr__fprintf Wang Nan
` (29 subsequent siblings)
47 siblings, 1 reply; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:10 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
is_pos is only useful for tracking events (fork, mmap, exit, ...).
Perf collects those events through the evsel with 'tracking' set.
Therefore, there's no need to validate every evsel's is_pos against
evlist->is_pos.
This patch is required once perf supports PERF_SAMPLE_TAILSIZE.
Since there is an extra u64 at the end of samples from this type of evsel,
is_pos for an evsel with PERF_SAMPLE_TAILSIZE set differs from that of
other evsels.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
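Note for readers (illustrative only, not part of the patch): the relaxed
check can be pictured as below. struct toy_evsel and
toy_valid_sample_type() are hypothetical stand-ins for the real perf
structures and carry only the three fields the check looks at.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_evsel {
	int id_pos;    /* must match the list-wide value for every evsel */
	int is_pos;    /* only meaningful for the tracking evsel */
	bool tracking;
};

/* Return true when all evsels agree with the list-wide positions. */
bool toy_valid_sample_type(const struct toy_evsel *evsels, size_t n,
			   int list_id_pos, int list_is_pos)
{
	for (size_t i = 0; i < n; i++) {
		if (evsels[i].id_pos != list_id_pos)
			return false;
		/* is_pos is compared only where it is actually used. */
		if (evsels[i].tracking && evsels[i].is_pos != list_is_pos)
			return false;
	}
	return true;
}
```

With this shape, a non-tracking evsel whose is_pos differs no longer
invalidates the whole list, while a mismatching tracking evsel still does.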
tools/perf/util/evlist.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index c42e196..fef465a 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1274,8 +1274,15 @@ bool perf_evlist__valid_sample_type(struct perf_evlist *evlist)
return false;
evlist__for_each(evlist, pos) {
- if (pos->id_pos != evlist->id_pos ||
- pos->is_pos != evlist->is_pos)
+ if (pos->id_pos != evlist->id_pos)
+ return false;
+ /*
+ * Only tracking events need is_pos. Those events are
+ * collected through the evsel with evsel->tracking set.
+ * For other evsels is_pos is unused, so skip
+ * validating it.
+ */
+ if (pos->tracking && pos->is_pos != evlist->is_pos)
return false;
}
--
1.8.3.4
* [PATCH 19/48] perf tools: Print write_backward value in perf_event_attr__fprintf
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (17 preceding siblings ...)
2016-02-22 9:10 ` [PATCH 18/48] perf tools: Only validate is_pos for tracking evsels Wang Nan
@ 2016-02-22 9:10 ` Wang Nan
2016-02-22 9:10 ` [PATCH 20/48] perf tools: Make ordered_events reusable Wang Nan
` (28 subsequent siblings)
47 siblings, 0 replies; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:10 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
Print the write_backward setting when printing a perf evsel.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
tools/perf/util/evsel.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 0902fe4..510afa4 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1299,6 +1299,7 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
PRINT_ATTRf(comm_exec, p_unsigned);
PRINT_ATTRf(use_clockid, p_unsigned);
PRINT_ATTRf(context_switch, p_unsigned);
+ PRINT_ATTRf(write_backward, p_unsigned);
PRINT_ATTRn("{ wakeup_events, wakeup_watermark }", wakeup_events, p_unsigned);
PRINT_ATTRf(bp_type, p_unsigned);
--
1.8.3.4
* [PATCH 20/48] perf tools: Make ordered_events reusable
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (18 preceding siblings ...)
2016-02-22 9:10 ` [PATCH 19/48] perf tools: Print write_backward value in perf_event_attr__fprintf Wang Nan
@ 2016-02-22 9:10 ` Wang Nan
2016-02-24 14:18 ` Jiri Olsa
2016-02-22 9:10 ` [PATCH 21/48] perf record: Extract synthesize code to record__synthesize() Wang Nan
` (27 subsequent siblings)
47 siblings, 1 reply; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:10 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
ordered_events__free() leaves the linked lists and timestamps uncleared,
so an ordered_events cannot be reused after ordered_events__free(). This
is inconvenient once 'perf record' supports generating multiple perf.data
outputs and processes build-ids for each of them.
Call ordered_events__init() in ordered_events__free() so ordered_events
can be reused.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
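Note for readers (illustrative only, not part of the patch): the pattern
is "free everything, then re-initialise while keeping the callback". The
toy below uses a simplified singly linked list. struct toy_ordered_events
and the toy_oe_* helpers are invented for this sketch and are much smaller
than the real ordered_events code.

```c
#include <stdlib.h>
#include <string.h>

struct toy_node {
	struct toy_node *next;
};

typedef int (*toy_deliver_t)(void *event);

struct toy_ordered_events {
	unsigned long long last_flush; /* stale timestamps would break reuse */
	unsigned int nr_events;
	struct toy_node *to_free;      /* allocations owned by this object */
	toy_deliver_t deliver;
};

/* Trivial callback so that callers have something to register. */
int toy_deliver_noop(void *event)
{
	(void)event;
	return 0;
}

void toy_oe_init(struct toy_ordered_events *oe, toy_deliver_t deliver)
{
	oe->last_flush = 0;
	oe->nr_events = 0;
	oe->to_free = NULL;
	oe->deliver = deliver;
}

int toy_oe_queue(struct toy_ordered_events *oe, unsigned long long timestamp)
{
	struct toy_node *node = malloc(sizeof(*node));

	if (!node)
		return -1;
	node->next = oe->to_free;
	oe->to_free = node;
	oe->nr_events++;
	oe->last_flush = timestamp;
	return 0;
}

void toy_oe_free(struct toy_ordered_events *oe)
{
	toy_deliver_t old_deliver = oe->deliver; /* survives the memset */

	while (oe->to_free) {
		struct toy_node *node = oe->to_free;

		oe->to_free = node->next;
		free(node);
	}
	/* Clear all state, then restore a usable, empty object. */
	memset(oe, 0, sizeof(*oe));
	toy_oe_init(oe, old_deliver);
}
```

After toy_oe_free() the object can be queued into again without a
separate init call, which is the property the patch wants.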
tools/perf/util/ordered-events.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/tools/perf/util/ordered-events.c b/tools/perf/util/ordered-events.c
index b1b9e23..70c0dc8 100644
--- a/tools/perf/util/ordered-events.c
+++ b/tools/perf/util/ordered-events.c
@@ -299,6 +299,8 @@ void ordered_events__init(struct ordered_events *oe, ordered_events__deliver_t d
void ordered_events__free(struct ordered_events *oe)
{
+ ordered_events__deliver_t old_deliver = oe->deliver;
+
while (!list_empty(&oe->to_free)) {
struct ordered_event *event;
@@ -307,4 +309,7 @@ void ordered_events__free(struct ordered_events *oe)
free_dup_event(oe, event->event);
free(event);
}
+
+ memset(oe, '\0', sizeof(*oe));
+ ordered_events__init(oe, old_deliver);
}
--
1.8.3.4
* [PATCH 21/48] perf record: Extract synthesize code to record__synthesize()
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (19 preceding siblings ...)
2016-02-22 9:10 ` [PATCH 20/48] perf tools: Make ordered_events reusable Wang Nan
@ 2016-02-22 9:10 ` Wang Nan
2016-02-24 14:29 ` Jiri Olsa
2016-02-22 9:10 ` [PATCH 22/48] perf tools: Add perf_data_file__switch() helper Wang Nan
` (26 subsequent siblings)
47 siblings, 1 reply; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:10 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
Create record__synthesize(). It can be used to create tracking events
for each perf.data once perf supports splitting its output into multiple
files.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
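Note for readers (illustrative only, not part of the patch): because the
new function may run once per output file, the non-fatal warnings are
guarded by static flags so they are printed only once. A minimal sketch
of that warn-once pattern follows. toy_synthesize() and its helpers are
hypothetical and only model a single failing step.

```c
#include <stdbool.h>
#include <stdio.h>

static int toy_warnings_printed;

/* Stand-in for one synthesize step that may fail (hypothetical). */
static int toy_synthesize_kernel_mmap(bool fail)
{
	return fail ? -1 : 0;
}

/* May be called once per output file; warns at most once overall. */
int toy_synthesize(bool kernel_mmap_fails)
{
	static bool warned_kmaps;
	int err = toy_synthesize_kernel_mmap(kernel_mmap_fails);

	if (err < 0 && !warned_kmaps) {
		warned_kmaps = true;
		toy_warnings_printed++;
		fprintf(stderr, "Couldn't record kernel reference relocation symbol\n");
	}
	/* The failure is treated as non-fatal in this sketch. */
	return 0;
}

int toy_warning_count(void)
{
	return toy_warnings_printed;
}
```

Calling toy_synthesize(true) repeatedly prints the message on the first
call only.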
tools/perf/builtin-record.c | 132 +++++++++++++++++++++++++-------------------
1 file changed, 76 insertions(+), 56 deletions(-)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 7d11162..4633c0a 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -485,6 +485,81 @@ static void workload_exec_failed_signal(int signo __maybe_unused,
static void snapshot_sig_handler(int sig);
+static int record__synthesize(struct record *rec)
+{
+ struct perf_session *session = rec->session;
+ struct machine *machine = &session->machines.host;
+ struct perf_data_file *file = &rec->file;
+ struct record_opts *opts = &rec->opts;
+ struct perf_tool *tool = &rec->tool;
+ int fd = perf_data_file__fd(file);
+ int err = 0;
+ static bool warned_kmaps = false, warned_modules = false;
+
+ if (file->is_pipe) {
+ err = perf_event__synthesize_attrs(tool, session,
+ process_synthesized_event);
+ if (err < 0) {
+ pr_err("Couldn't synthesize attrs.\n");
+ goto out;
+ }
+
+ if (have_tracepoints(&rec->evlist->entries)) {
+ /*
+ * FIXME err <= 0 here actually means that
+ * there were no tracepoints so its not really
+ * an error, just that we don't need to
+ * synthesize anything. We really have to
+ * return this more properly and also
+ * propagate errors that now are calling die()
+ */
+ err = perf_event__synthesize_tracing_data(tool, fd, rec->evlist,
+ process_synthesized_event);
+ if (err <= 0) {
+ pr_err("Couldn't record tracing data.\n");
+ goto out;
+ }
+ rec->bytes_written += err;
+ }
+ }
+
+ if (rec->opts.full_auxtrace) {
+ err = perf_event__synthesize_auxtrace_info(rec->itr, tool,
+ session, process_synthesized_event);
+ if (err)
+ goto out;
+ }
+
+ err = perf_event__synthesize_kernel_mmap(tool, process_synthesized_event,
+ machine);
+ if (err < 0 && !warned_kmaps) {
+ warned_kmaps = true;
+ pr_err("Couldn't record kernel reference relocation symbol\n"
+ "Symbol resolution may be skewed if relocation was used (e.g. kexec).\n"
+ "Check /proc/kallsyms permission or run as root.\n");
+ }
+
+ err = perf_event__synthesize_modules(tool, process_synthesized_event,
+ machine);
+ if (err < 0 && !warned_modules) {
+ warned_modules = true;
+ pr_err("Couldn't record kernel module information.\n"
+ "Symbol resolution may be skewed if relocation was used (e.g. kexec).\n"
+ "Check /proc/modules permission or run as root.\n");
+ }
+
+ if (perf_guest) {
+ machines__process_guests(&session->machines,
+ perf_event__synthesize_guest_os, tool);
+ }
+
+ err = __machine__synthesize_threads(machine, tool, &opts->target, rec->evlist->threads,
+ process_synthesized_event, opts->sample_address,
+ opts->proc_map_timeout);
+out:
+ return err;
+}
+
static int __cmd_record(struct record *rec, int argc, const char **argv)
{
int err;
@@ -579,63 +654,8 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
machine = &session->machines.host;
- if (file->is_pipe) {
- err = perf_event__synthesize_attrs(tool, session,
- process_synthesized_event);
- if (err < 0) {
- pr_err("Couldn't synthesize attrs.\n");
- goto out_child;
- }
-
- if (have_tracepoints(&rec->evlist->entries)) {
- /*
- * FIXME err <= 0 here actually means that
- * there were no tracepoints so its not really
- * an error, just that we don't need to
- * synthesize anything. We really have to
- * return this more properly and also
- * propagate errors that now are calling die()
- */
- err = perf_event__synthesize_tracing_data(tool, fd, rec->evlist,
- process_synthesized_event);
- if (err <= 0) {
- pr_err("Couldn't record tracing data.\n");
- goto out_child;
- }
- rec->bytes_written += err;
- }
- }
-
- if (rec->opts.full_auxtrace) {
- err = perf_event__synthesize_auxtrace_info(rec->itr, tool,
- session, process_synthesized_event);
- if (err)
- goto out_delete_session;
- }
-
- err = perf_event__synthesize_kernel_mmap(tool, process_synthesized_event,
- machine);
- if (err < 0)
- pr_err("Couldn't record kernel reference relocation symbol\n"
- "Symbol resolution may be skewed if relocation was used (e.g. kexec).\n"
- "Check /proc/kallsyms permission or run as root.\n");
-
- err = perf_event__synthesize_modules(tool, process_synthesized_event,
- machine);
+ err = record__synthesize(rec);
if (err < 0)
- pr_err("Couldn't record kernel module information.\n"
- "Symbol resolution may be skewed if relocation was used (e.g. kexec).\n"
- "Check /proc/modules permission or run as root.\n");
-
- if (perf_guest) {
- machines__process_guests(&session->machines,
- perf_event__synthesize_guest_os, tool);
- }
-
- err = __machine__synthesize_threads(machine, tool, &opts->target, rec->evlist->threads,
- process_synthesized_event, opts->sample_address,
- opts->proc_map_timeout);
- if (err != 0)
goto out_child;
if (rec->realtime_prio) {
--
1.8.3.4
* [PATCH 22/48] perf tools: Add perf_data_file__switch() helper
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (20 preceding siblings ...)
2016-02-22 9:10 ` [PATCH 21/48] perf record: Extract synthesize code to record__synthesize() Wang Nan
@ 2016-02-22 9:10 ` Wang Nan
2016-02-24 14:34 ` Jiri Olsa
2016-02-22 9:10 ` [PATCH 23/48] perf record: Turns auxtrace_snapshot_enable into 3 states Wang Nan
` (25 subsequent siblings)
47 siblings, 1 reply; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:10 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
perf_data_file__switch() renames the current output file and, unless
at_exit is set, closes it and opens a new one to continue recording. It
will be used by perf record to split output into multiple perf.data files.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
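Note for readers (illustrative only, not part of the patch): the
rename-then-reopen sequence can be tried in userspace with plain POSIX
calls. The sketch below is an approximation under stated assumptions:
struct toy_data_file and toy_data_file_switch() are invented names, the
open flags are a guess rather than what perf_data_file__open() uses, and
unlike the patch the toy checks the return value of rename().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

struct toy_data_file {
	const char *path;
	int fd;
};

/*
 * Rename <path> to <path>.<postfix>. Unless at_exit is set, also close
 * the old fd, create a fresh <path> and seek to 'pos' (room for a header).
 * Returns the fd to keep writing to, or a negative errno value.
 */
int toy_data_file_switch(struct toy_data_file *file, const char *postfix,
			 off_t pos, bool at_exit)
{
	char new_path[4096];
	int n = snprintf(new_path, sizeof(new_path), "%s.%s",
			 file->path, postfix);

	if (n < 0 || (size_t)n >= sizeof(new_path))
		return -ENAMETOOLONG;

	if (rename(file->path, new_path) < 0)
		return -errno;

	if (!at_exit) {
		close(file->fd);
		file->fd = open(file->path, O_CREAT | O_WRONLY | O_TRUNC, 0600);
		if (file->fd < 0)
			return -errno;
		if (lseek(file->fd, pos, SEEK_SET) == (off_t)-1)
			return -errno;
	}
	return file->fd;
}
```

The already written data stays reachable under the new name because
rename() does not invalidate the open descriptor; only the directory
entry changes.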
tools/perf/util/data.c | 36 ++++++++++++++++++++++++++++++++++++
tools/perf/util/data.h | 11 ++++++++++-
2 files changed, 46 insertions(+), 1 deletion(-)
diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
index 1921942..bfded6a 100644
--- a/tools/perf/util/data.c
+++ b/tools/perf/util/data.c
@@ -136,3 +136,39 @@ ssize_t perf_data_file__write(struct perf_data_file *file,
{
return writen(file->fd, buf, size);
}
+
+int perf_data_file__switch(struct perf_data_file *file,
+ const char *postfix,
+ size_t pos, bool at_exit)
+{
+ char *new_filepath;
+ int ret;
+
+ if (check_pipe(file))
+ return -EINVAL;
+ if (perf_data_file__is_read(file))
+ return -EINVAL;
+
+ if (asprintf(&new_filepath, "%s.%s", file->path, postfix) < 0)
+ return -ENOMEM;
+
+ rename(file->path, new_filepath);
+
+ if (!at_exit) {
+ close(file->fd);
+ ret = perf_data_file__open(file);
+ if (ret < 0)
+ goto out;
+
+ if (lseek(file->fd, pos, SEEK_SET) == (off_t)-1) {
+ ret = -errno;
+ pr_debug("Failed to lseek to %zu: %s\n",
+ pos, strerror(errno));
+ goto out;
+ }
+ }
+ ret = file->fd;
+out:
+ free(new_filepath);
+ return ret;
+}
diff --git a/tools/perf/util/data.h b/tools/perf/util/data.h
index 2b15d0c..ae510ce 100644
--- a/tools/perf/util/data.h
+++ b/tools/perf/util/data.h
@@ -46,5 +46,14 @@ int perf_data_file__open(struct perf_data_file *file);
void perf_data_file__close(struct perf_data_file *file);
ssize_t perf_data_file__write(struct perf_data_file *file,
void *buf, size_t size);
-
+/*
+ * If at_exit is set, only rename the current perf.data to
+ * perf.data.<postfix> and continue writing to the original file.
+ * Set at_exit when flushing the last output.
+ *
+ * Returns the fd of the output file, or a negative error code.
+ */
+int perf_data_file__switch(struct perf_data_file *file,
+ const char *postfix,
+ size_t pos, bool at_exit);
#endif /* __PERF_DATA_H */
--
1.8.3.4
* [PATCH 23/48] perf record: Turns auxtrace_snapshot_enable into 3 states
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (21 preceding siblings ...)
2016-02-22 9:10 ` [PATCH 22/48] perf tools: Add perf_data_file__switch() helper Wang Nan
@ 2016-02-22 9:10 ` Wang Nan
2016-02-24 14:43 ` Jiri Olsa
2016-02-22 9:10 ` [PATCH 24/48] perf record: Introduce record__finish_output() to finish a perf.data Wang Nan
` (24 subsequent siblings)
47 siblings, 1 reply; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:10 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
auxtrace_snapshot_enabled has only two states (0/1). Turn it into a
three-state enum so the SIGUSR2 handler can safely do other work without
triggering an auxtrace snapshot.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
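Note for readers (illustrative only, not part of the patch): the third
state, OFF, means "snapshot mode was never requested", and enable/disable
become no-ops in that state. The sketch below models the transitions in
userspace. All toy_* names are invented, and toy_sigusr2() is a plain
function here rather than an installed signal handler.

```c
#include <signal.h>
#include <stdbool.h>

enum toy_snapshot_state {
	TOY_SNAPSHOT_OFF      = -1, /* snapshot mode not requested at all */
	TOY_SNAPSHOT_DISABLED =  0, /* requested, but not armed right now */
	TOY_SNAPSHOT_ENABLED  =  1, /* armed: next SIGUSR2 takes a snapshot */
};

static volatile sig_atomic_t toy_state = TOY_SNAPSHOT_OFF;
static int toy_snapshots_taken;

void toy_snapshot_on(void)
{
	toy_state = TOY_SNAPSHOT_DISABLED;
}

void toy_snapshot_enable(void)
{
	if (toy_state != TOY_SNAPSHOT_OFF)
		toy_state = TOY_SNAPSHOT_ENABLED;
}

void toy_snapshot_disable(void)
{
	if (toy_state != TOY_SNAPSHOT_OFF)
		toy_state = TOY_SNAPSHOT_DISABLED;
}

bool toy_snapshot_is_enabled(void)
{
	return toy_state == TOY_SNAPSHOT_ENABLED;
}

/* What a SIGUSR2 handler could do with the three states. */
void toy_sigusr2(int sig)
{
	(void)sig;
	if (toy_snapshot_is_enabled()) {
		toy_snapshot_disable();
		toy_snapshots_taken++;
	}
	/* Other SIGUSR2 work can run here without touching the snapshot. */
}

int toy_snapshots(void)
{
	return toy_snapshots_taken;
}
```

While the state is OFF, the main loop can call toy_snapshot_enable() as
often as it likes and a SIGUSR2 still never starts a snapshot.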
tools/perf/builtin-record.c | 59 +++++++++++++++++++++++++++++++++++++--------
1 file changed, 49 insertions(+), 10 deletions(-)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 4633c0a..8caace3 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -123,7 +123,43 @@ out:
static volatile int done;
static volatile int signr = -1;
static volatile int child_finished;
-static volatile int auxtrace_snapshot_enabled;
+
+static volatile enum {
+ AUXTRACE_SNAPSHOT_OFF = -1,
+ AUXTRACE_SNAPSHOT_DISABLED = 0,
+ AUXTRACE_SNAPSHOT_ENABLED = 1,
+} auxtrace_snapshot_state = AUXTRACE_SNAPSHOT_OFF;
+
+static inline void
+auxtrace_snapshot_on(void)
+{
+ auxtrace_snapshot_state = AUXTRACE_SNAPSHOT_DISABLED;
+}
+
+static inline void
+auxtrace_snapshot_enable(void)
+{
+ if (auxtrace_snapshot_state == AUXTRACE_SNAPSHOT_OFF)
+ return;
+ auxtrace_snapshot_state = AUXTRACE_SNAPSHOT_ENABLED;
+}
+
+static inline void
+auxtrace_snapshot_disable(void)
+{
+ if (auxtrace_snapshot_state == AUXTRACE_SNAPSHOT_OFF)
+ return;
+ auxtrace_snapshot_state = AUXTRACE_SNAPSHOT_DISABLED;
+}
+
+static inline bool
+auxtrace_snapshot_is_enabled(void)
+{
+ if (auxtrace_snapshot_state == AUXTRACE_SNAPSHOT_OFF)
+ return false;
+ return auxtrace_snapshot_state == AUXTRACE_SNAPSHOT_ENABLED;
+}
+
static volatile int auxtrace_snapshot_err;
static volatile int auxtrace_record__snapshot_started;
@@ -247,7 +283,7 @@ static void record__read_auxtrace_snapshot(struct record *rec)
} else {
auxtrace_snapshot_err = auxtrace_record__snapshot_finish(rec->itr);
if (!auxtrace_snapshot_err)
- auxtrace_snapshot_enabled = 1;
+ auxtrace_snapshot_enable();
}
}
@@ -580,10 +616,13 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
signal(SIGCHLD, sig_handler);
signal(SIGINT, sig_handler);
signal(SIGTERM, sig_handler);
- if (rec->opts.auxtrace_snapshot_mode)
+
+ if (rec->opts.auxtrace_snapshot_mode) {
signal(SIGUSR2, snapshot_sig_handler);
- else
+ auxtrace_snapshot_on();
+ } else {
signal(SIGUSR2, SIG_IGN);
+ }
session = perf_session__new(file, false, tool);
if (session == NULL) {
@@ -709,12 +748,12 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
perf_evlist__enable(rec->evlist);
}
- auxtrace_snapshot_enabled = 1;
+ auxtrace_snapshot_enable();
for (;;) {
unsigned long long hits = rec->samples;
if (record__mmap_read_all(rec) < 0) {
- auxtrace_snapshot_enabled = 0;
+ auxtrace_snapshot_disable();
err = -1;
goto out_child;
}
@@ -752,12 +791,12 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
* disable events in this case.
*/
if (done && !disabled && !target__none(&opts->target)) {
- auxtrace_snapshot_enabled = 0;
+ auxtrace_snapshot_disable();
perf_evlist__disable(rec->evlist);
disabled = true;
}
}
- auxtrace_snapshot_enabled = 0;
+ auxtrace_snapshot_disable();
if (forks && workload_exec_errno) {
char msg[STRERR_BUFSIZE];
@@ -1325,9 +1364,9 @@ out_symbol_exit:
static void snapshot_sig_handler(int sig __maybe_unused)
{
- if (!auxtrace_snapshot_enabled)
+ if (!auxtrace_snapshot_is_enabled())
return;
- auxtrace_snapshot_enabled = 0;
+ auxtrace_snapshot_disable();
auxtrace_snapshot_err = auxtrace_record__snapshot_start(record.itr);
auxtrace_record__snapshot_started = 1;
}
--
1.8.3.4
* [PATCH 24/48] perf record: Introduce record__finish_output() to finish a perf.data
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (22 preceding siblings ...)
2016-02-22 9:10 ` [PATCH 23/48] perf record: Turns auxtrace_snapshot_enable into 3 states Wang Nan
@ 2016-02-22 9:10 ` Wang Nan
2016-02-22 9:10 ` [PATCH 25/48] perf record: Add '--timestamp-filename' option to append timestamp to output filename Wang Nan
` (23 subsequent siblings)
47 siblings, 0 replies; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:10 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
Move the code for finalizing 'perf.data' into record__finish_output(). It
will be used by the following commits to split output into multiple files.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
tools/perf/builtin-record.c | 37 +++++++++++++++++++++++++------------
1 file changed, 25 insertions(+), 12 deletions(-)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 8caace3..2411c37 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -503,6 +503,29 @@ static void record__init_features(struct record *rec)
perf_header__clear_feat(&session->header, HEADER_STAT);
}
+static void
+record__finish_output(struct record *rec)
+{
+ struct perf_data_file *file = &rec->file;
+ int fd = perf_data_file__fd(file);
+
+ if (file->is_pipe)
+ return;
+
+ rec->session->header.data_size += rec->bytes_written;
+ file->size = lseek(perf_data_file__fd(file), 0, SEEK_CUR);
+
+ if (!rec->no_buildid) {
+ process_buildids(rec);
+
+ if (rec->buildid_all)
+ dsos__hit_all(rec->session);
+ }
+ perf_session__write_header(rec->session, rec->evlist, fd, true);
+
+ return;
+}
+
static volatile int workload_exec_errno;
/*
@@ -830,18 +853,8 @@ out_child:
/* this will be recalculated during process_buildids() */
rec->samples = 0;
- if (!err && !file->is_pipe) {
- rec->session->header.data_size += rec->bytes_written;
- file->size = lseek(perf_data_file__fd(file), 0, SEEK_CUR);
-
- if (!rec->no_buildid) {
- process_buildids(rec);
-
- if (rec->buildid_all)
- dsos__hit_all(rec->session);
- }
- perf_session__write_header(rec->session, rec->evlist, fd, true);
- }
+ if (!err)
+ record__finish_output(rec);
if (!err && !quiet) {
char samples[128];
--
1.8.3.4
* [PATCH 25/48] perf record: Add '--timestamp-filename' option to append timestamp to output filename
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (23 preceding siblings ...)
2016-02-22 9:10 ` [PATCH 24/48] perf record: Introduce record__finish_output() to finish a perf.data Wang Nan
@ 2016-02-22 9:10 ` Wang Nan
2016-02-22 9:10 ` [PATCH 26/48] perf record: Split output into multiple files via '--switch-output' Wang Nan
` (22 subsequent siblings)
47 siblings, 0 replies; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:10 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
This option appends the current timestamp to the output file name. For example:
# perf record -a --timestamp-filename
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Dump perf.data.2015122622265847 ]
[ perf record: Captured and wrote 0.742 MB perf.data (90 samples) ]
# ls
perf.data.201512262226584
Once 'perf record' supports generating multiple output files, the
timestamp will be useful to identify each of them.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
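Note for readers (illustrative only, not part of the patch): the suffix
in the example above is 16 characters long, the same size as the
"InvalidTimestamp" placeholder in record__switch_output(). One way to
produce such a string is sketched below, assuming the layout is
YYYYMMDDHHMMSS followed by two digits of hundredths of a second. That
layout is an inference from the example, toy_format_timestamp() is a
hypothetical helper rather than fetch_current_timestamp() itself, and the
toy formats in UTC so that its output is reproducible.

```c
#include <stdio.h>
#include <time.h>

/*
 * Format 'sec' plus 'usec' as YYYYMMDDHHMMSScc (cc = hundredths of a
 * second). Returns 0 on success and -1 if 'buf' is too small or the
 * time cannot be converted.
 */
int toy_format_timestamp(char *buf, size_t sz, time_t sec, long usec)
{
	char dt[32];
	struct tm *tm = gmtime(&sec);
	int n;

	if (!tm)
		return -1;
	if (strftime(dt, sizeof(dt), "%Y%m%d%H%M%S", tm) == 0)
		return -1;

	n = snprintf(buf, sz, "%s%02ld", dt, usec / 10000);
	if (n < 0 || (size_t)n >= sz)
		return -1;
	return 0;
}
```

The caller needs a buffer of at least 17 bytes: 16 characters plus the
terminating NUL.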
tools/perf/builtin-record.c | 47 +++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 45 insertions(+), 2 deletions(-)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 2411c37..e5714b6 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -54,6 +54,7 @@ struct record {
bool no_buildid_cache;
bool no_buildid_cache_set;
bool buildid_all;
+ bool timestamp_filename;
unsigned long long samples;
};
@@ -526,6 +527,37 @@ record__finish_output(struct record *rec)
return;
}
+static int
+record__switch_output(struct record *rec, bool at_exit)
+{
+ struct perf_data_file *file = &rec->file;
+ int fd, err;
+
+ /* Same Size: "2015122520103046"*/
+ char timestamp[] = "InvalidTimestamp";
+
+ rec->samples = 0;
+ record__finish_output(rec);
+ err = fetch_current_timestamp(timestamp, sizeof(timestamp));
+ if (err) {
+ pr_err("Failed to get current timestamp\n");
+ return -EINVAL;
+ }
+
+ fd = perf_data_file__switch(file, timestamp,
+ rec->session->header.data_offset,
+ at_exit);
+ if (fd >= 0 && !at_exit) {
+ rec->bytes_written = 0;
+ rec->session->header.data_size = 0;
+ }
+
+ if (!quiet)
+ fprintf(stderr, "[ perf record: Dump %s.%s ]\n",
+ file->path, timestamp);
+ return fd;
+}
+
static volatile int workload_exec_errno;
/*
@@ -853,8 +885,17 @@ out_child:
/* this will be recalculated during process_buildids() */
rec->samples = 0;
- if (!err)
- record__finish_output(rec);
+ if (!err) {
+ if (!rec->timestamp_filename) {
+ record__finish_output(rec);
+ } else {
+ fd = record__switch_output(rec, true);
+ if (fd < 0) {
+ status = fd;
+ goto out_delete_session;
+ }
+ }
+ }
if (!err && !quiet) {
char samples[128];
@@ -1237,6 +1278,8 @@ struct option __record_options[] = {
"file", "vmlinux pathname"),
OPT_BOOLEAN(0, "buildid-all", &record.buildid_all,
"Record build-id of all DSOs regardless of hits"),
+ OPT_BOOLEAN(0, "timestamp-filename", &record.timestamp_filename,
+ "append timestamp to output filename"),
OPT_END()
};
--
1.8.3.4
* [PATCH 26/48] perf record: Split output into multiple files via '--switch-output'
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (24 preceding siblings ...)
2016-02-22 9:10 ` [PATCH 25/48] perf record: Add '--timestamp-filename' option to append timestamp to output filename Wang Nan
@ 2016-02-22 9:10 ` Wang Nan
2016-02-22 9:10 ` [PATCH 27/48] perf record: Force enable --timestamp-filename when --switch-output is provided Wang Nan
` (21 subsequent siblings)
47 siblings, 0 replies; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:10 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
Allow 'perf record' to split its output into multiple files.
For example:
# ~/perf record -a --timestamp-filename --switch-output &
[1] 10763
# kill -s SIGUSR2 10763
[ perf record: dump data: Woken up 1 times ]
# [ perf record: Dump perf.data.2015122622314468 ]
# kill -s SIGUSR2 10763
[ perf record: dump data: Woken up 1 times ]
# [ perf record: Dump perf.data.2015122622314762 ]
# kill -s SIGUSR2 10763
[ perf record: dump data: Woken up 1 times ]
#[ perf record: Dump perf.data.2015122622315171 ]
# fg
perf record -a --timestamp-filename --switch-output
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Dump perf.data.2015122622315513 ]
[ perf record: Captured and wrote 0.014 MB perf.data (296 samples) ]
# ls -l
total 920
-rw------- 1 root root 797692 Dec 26 22:31 perf.data.2015122622314468
-rw------- 1 root root 59960 Dec 26 22:31 perf.data.2015122622314762
-rw------- 1 root root 59912 Dec 26 22:31 perf.data.2015122622315171
-rw------- 1 root root 19220 Dec 26 22:31 perf.data.2015122622315513
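The flow shown above (a signal only sets a flag, and the main loop does the actual file switch) can be sketched in isolation roughly as follows. This is a minimal stand-alone illustration, not the perf code: switch_requested, on_sigusr2(), install_switch_handler(), take_switch_request() and demo_loop() are hypothetical names, and the loop raises SIGUSR2 at itself instead of waiting for 'kill'.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Set asynchronously by the handler, consumed by the main loop. */
static volatile sig_atomic_t switch_requested;

/* Handler does nothing but record the request (async-signal-safe). */
static void on_sigusr2(int sig)
{
	(void)sig;
	switch_requested = 1;
}

/* Install the SIGUSR2 handler; returns 0 on success, -1 on failure. */
static int install_switch_handler(void)
{
	return signal(SIGUSR2, on_sigusr2) == SIG_ERR ? -1 : 0;
}

/* Returns true and clears the flag if a switch was requested. */
static bool take_switch_request(void)
{
	if (!switch_requested)
		return false;
	switch_requested = 0;
	return true;
}

/*
 * Toy main loop: sends SIGUSR2 to itself at iteration signal_at
 * (pass a negative value for "never") and counts how many output
 * switches it would have performed.
 */
static int demo_loop(int iterations, int signal_at)
{
	int switches = 0;

	assert(iterations >= 0);
	for (int i = 0; i < iterations; i++) {
		if (i == signal_at)
			raise(SIGUSR2);
		if (take_switch_request())
			switches++;	/* a real tool would open the next file here */
	}
	return switches;
}
```

Keeping the handler down to a single flag store leaves all file I/O in normal (non-signal) context, which appears to be the same division of labour the patch uses between snapshot_sig_handler() and __cmd_record().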
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
tools/perf/builtin-record.c | 34 ++++++++++++++++++++++++++++------
1 file changed, 28 insertions(+), 6 deletions(-)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index e5714b6..15f9576 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -55,6 +55,7 @@ struct record {
bool no_buildid_cache_set;
bool buildid_all;
bool timestamp_filename;
+ bool switch_output;
unsigned long long samples;
};
@@ -163,6 +164,7 @@ auxtrace_snapshot_is_enabled(void)
static volatile int auxtrace_snapshot_err;
static volatile int auxtrace_record__snapshot_started;
+static volatile int switch_output_started;
static void sig_handler(int sig)
{
@@ -672,7 +674,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
signal(SIGINT, sig_handler);
signal(SIGTERM, sig_handler);
- if (rec->opts.auxtrace_snapshot_mode) {
+ if (rec->opts.auxtrace_snapshot_mode || rec->switch_output) {
signal(SIGUSR2, snapshot_sig_handler);
auxtrace_snapshot_on();
} else {
@@ -824,9 +826,25 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
}
}
+ if (switch_output_started) {
+ switch_output_started = 0;
+
+ if (!quiet)
+ fprintf(stderr, "[ perf record: dump data: Woken up %ld times ]\n",
+ waking);
+ waking = 0;
+ fd = record__switch_output(rec, false);
+ if (fd < 0) {
+ pr_err("Failed to switch to new file\n");
+ err = fd;
+ goto out_child;
+ }
+ }
+
if (hits == rec->samples) {
if (done || draining)
break;
+
err = perf_evlist__poll(rec->evlist, -1);
/*
* Propagate error, only if there's any. Ignore positive
@@ -1280,6 +1298,8 @@ struct option __record_options[] = {
"Record build-id of all DSOs regardless of hits"),
OPT_BOOLEAN(0, "timestamp-filename", &record.timestamp_filename,
"append timestamp to output filename"),
+ OPT_BOOLEAN(0, "switch-output", &record.switch_output,
+ "Switch output when receive SIGUSR2"),
OPT_END()
};
@@ -1420,9 +1440,11 @@ out_symbol_exit:
static void snapshot_sig_handler(int sig __maybe_unused)
{
- if (!auxtrace_snapshot_is_enabled())
- return;
- auxtrace_snapshot_disable();
- auxtrace_snapshot_err = auxtrace_record__snapshot_start(record.itr);
- auxtrace_record__snapshot_started = 1;
+ if (auxtrace_snapshot_is_enabled()) {
+ auxtrace_snapshot_disable();
+ auxtrace_snapshot_err = auxtrace_record__snapshot_start(record.itr);
+ auxtrace_record__snapshot_started = 1;
+ }
+
+ switch_output_started = 1;
}
--
1.8.3.4
* [PATCH 27/48] perf record: Force enable --timestamp-filename when --switch-output is provided
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (25 preceding siblings ...)
2016-02-22 9:10 ` [PATCH 26/48] perf record: Split output into multiple files via '--switch-output' Wang Nan
@ 2016-02-22 9:10 ` Wang Nan
2016-02-22 9:10 ` [PATCH 28/48] perf record: Disable buildid cache options by default in switch output mode Wang Nan
` (20 subsequent siblings)
47 siblings, 0 replies; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:10 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
Without this patch, the last output file doesn't get a timestamp appended
unless --timestamp-filename is explicitly provided. For example:
# perf record -a --switch-output &
[1] 11224
# kill -s SIGUSR2 11224
[ perf record: dump data: Woken up 1 times ]
# [ perf record: Dump perf.data.2015122622372823 ]
# fg
perf record -a --switch-output
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.027 MB perf.data (540 samples) ]
# ls -l
total 836
-rw------- 1 root root 33256 Dec 26 22:37 perf.data <---- *Odd*
-rw------- 1 root root 817156 Dec 26 22:37 perf.data.2015122622372823
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
tools/perf/builtin-record.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 15f9576..8987ce8 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1355,6 +1355,9 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
return -EINVAL;
}
+ if (rec->switch_output)
+ rec->timestamp_filename = true;
+
if (!rec->itr) {
rec->itr = auxtrace_record__init(rec->evlist, &err);
if (err)
--
1.8.3.4
* [PATCH 28/48] perf record: Disable buildid cache options by default in switch output mode
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (26 preceding siblings ...)
2016-02-22 9:10 ` [PATCH 27/48] perf record: Force enable --timestamp-filename when --switch-output is provided Wang Nan
@ 2016-02-22 9:10 ` Wang Nan
2016-02-22 9:10 ` [PATCH 29/48] perf record: Re-synthesize tracking events after output switching Wang Nan
` (19 subsequent siblings)
47 siblings, 0 replies; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:10 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
The cost of buildid cache processing is high: perf has to read all events
in the output perf.data, open the ELF files to read their buildids, and
then copy those files into the ~/.debug directory. In switch output mode
this stops perf from receiving perf events for too long.
Enable no-buildid and no-buildid-cache by default if --switch-output
is provided. The user can still pass --no-no-buildid to explicitly
enable buildid in this case.
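The resulting decision can be written as a small pure function, which makes the truth table easier to check. This is only a sketch of the logic described here: struct buildid_opts and should_disable_buildid_cache() are hypothetical names, and the fields merely mirror the no_buildid / no_buildid_set style flags in struct record.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant fields of struct record. */
struct buildid_opts {
	bool no_buildid;		/* value of --no-buildid */
	bool no_buildid_set;		/* option appeared on the command line */
	bool no_buildid_cache;		/* value of --no-buildid-cache */
	bool no_buildid_cache_set;	/* option appeared on the command line */
	bool switch_output;		/* --switch-output given */
};

/* Returns true when buildid cache processing should be disabled. */
static bool should_disable_buildid_cache(const struct buildid_opts *o)
{
	assert(o != NULL);

	/* Explicit request to disable always wins. */
	if (o->no_buildid_cache || o->no_buildid)
		return true;

	/* Outside switch-output mode the default stays "enabled". */
	if (!o->switch_output)
		return false;

	/* In switch-output mode, keep buildid only if explicitly re-enabled. */
	if (o->no_buildid_set && !o->no_buildid)
		return false;
	if (o->no_buildid_cache_set && !o->no_buildid_cache)
		return false;

	return true;
}
```

For instance, with only switch_output set the function returns true, while additionally setting no_buildid_set (modelling --no-no-buildid) makes it return false.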
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
tools/perf/builtin-record.c | 30 +++++++++++++++++++++++++++++-
1 file changed, 29 insertions(+), 1 deletion(-)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 8987ce8..2839715 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1383,8 +1383,36 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
"If some relocation was applied (e.g. kexec) symbols may be misresolved\n"
"even with a suitable vmlinux or kallsyms file.\n\n");
- if (rec->no_buildid_cache || rec->no_buildid)
+ if (rec->no_buildid_cache || rec->no_buildid) {
disable_buildid_cache();
+ } else if (rec->switch_output) {
+ /*
+ * In 'perf record --switch-output', disable buildid
+ * generation by default to reduce data file switching
+ * overhead. Still generate buildid if they are required
+ * explicitly using
+ *
+ * perf record --switch-output --no-no-buildid \
+ * --no-no-buildid-cache
+ *
+ * Following code equals to:
+ *
+ * if ((rec->no_buildid || !rec->no_buildid_set) &&
+ * (rec->no_buildid_cache || !rec->no_buildid_cache_set))
+ * disable_buildid_cache();
+ */
+ bool disable = true;
+
+ if (rec->no_buildid_set && !rec->no_buildid)
+ disable = false;
+ if (rec->no_buildid_cache_set && !rec->no_buildid_cache)
+ disable = false;
+ if (disable) {
+ rec->no_buildid = true;
+ rec->no_buildid_cache = true;
+ disable_buildid_cache();
+ }
+ }
if (rec->evlist->nr_entries == 0 &&
perf_evlist__add_default(rec->evlist) < 0) {
--
1.8.3.4
* [PATCH 29/48] perf record: Re-synthesize tracking events after output switching
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (27 preceding siblings ...)
2016-02-22 9:10 ` [PATCH 28/48] perf record: Disable buildid cache options by default in switch output mode Wang Nan
@ 2016-02-22 9:10 ` Wang Nan
2016-02-24 14:57 ` Jiri Olsa
2016-02-22 9:10 ` [PATCH 30/48] perf record: Generate tracking events for process forked by perf Wang Nan
` (18 subsequent siblings)
47 siblings, 1 reply; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:10 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
Tracking events describe the kernel and threads. They are generated by
reading /proc/kallsyms, /proc/*/maps and /proc/*/task/* during
initialization of 'perf record', serialized into event sequences and put
at the head of 'perf.data'. When the output is switched, each output
file should contain those events.
This patch calls record__synthesize() during output switching, so the
event sequences described above are collected again.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
tools/perf/builtin-record.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 2839715..3a11102 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -529,6 +529,8 @@ record__finish_output(struct record *rec)
return;
}
+static int record__synthesize(struct record *rec);
+
static int
record__switch_output(struct record *rec, bool at_exit)
{
@@ -557,6 +559,15 @@ record__switch_output(struct record *rec, bool at_exit)
if (!quiet)
fprintf(stderr, "[ perf record: Dump %s.%s ]\n",
file->path, timestamp);
+
+ /* Reinit machine */
+ if (!at_exit) {
+ machines__exit(&rec->session->machines);
+ machines__init(&rec->session->machines);
+ perf_session__create_kernel_maps(rec->session);
+ perf_session__set_id_hdr_size(rec->session);
+ record__synthesize(rec);
+ }
return fd;
}
--
1.8.3.4
* [PATCH 30/48] perf record: Generate tracking events for process forked by perf
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (28 preceding siblings ...)
2016-02-22 9:10 ` [PATCH 29/48] perf record: Re-synthesize tracking events after output switching Wang Nan
@ 2016-02-22 9:10 ` Wang Nan
2016-02-24 15:01 ` Jiri Olsa
2016-02-22 9:10 ` [PATCH 31/48] perf record: Ensure return non-zero rc when mmap fail Wang Nan
` (17 subsequent siblings)
47 siblings, 1 reply; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:10 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
With 'perf record --switch-output' without -a, record__synthesize() in
record__switch_output() won't generate tracking events because there's
no thread_map in evlist. As a result, the newly created perf.data doesn't
contain map and comm information.
This patch creates a fake thread_map and directly calls
perf_event__synthesize_thread_map() for those events.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
tools/perf/builtin-record.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 3a11102..7d4d8bf 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -567,6 +567,23 @@ record__switch_output(struct record *rec, bool at_exit)
perf_session__create_kernel_maps(rec->session);
perf_session__set_id_hdr_size(rec->session);
record__synthesize(rec);
+
+ if (target__none(&rec->opts.target)) {
+ struct {
+ struct thread_map map;
+ struct thread_map_data map_data;
+ } thread_map;
+
+ thread_map.map.nr = 1;
+ thread_map.map.map[0].pid = rec->evlist->workload.pid;
+ thread_map.map.map[0].comm = NULL;
+ perf_event__synthesize_thread_map(&rec->tool,
+ &thread_map.map,
+ process_synthesized_event,
+ &rec->session->machines.host,
+ rec->opts.sample_address,
+ rec->opts.proc_map_timeout);
+ }
}
return fd;
}
--
1.8.3.4
* [PATCH 31/48] perf record: Ensure return non-zero rc when mmap fail
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (29 preceding siblings ...)
2016-02-22 9:10 ` [PATCH 30/48] perf record: Generate tracking events for process forked by perf Wang Nan
@ 2016-02-22 9:10 ` Wang Nan
2016-02-22 9:10 ` [PATCH 32/48] perf record: Prevent reading invalid data in record__mmap_read Wang Nan
` (16 subsequent siblings)
47 siblings, 0 replies; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:10 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
perf_evlist__mmap_ex() can fail without setting errno (for example,
when a condition check fails; in that case every syscall has succeeded).
If this happens, record__open() incorrectly returns 0. Forcing rc to a
non-zero value is a quick way to avoid this problem; otherwise we would
have to follow every possible code path in perf_evlist__mmap_ex() to
make sure at least one system call sets errno before it returns an error.
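The fallback itself is a one-liner. As a stand-alone sketch (neg_errno_or() is a hypothetical helper name, not something this patch adds):

```c
#include <assert.h>
#include <errno.h>

/*
 * Return -errno if errno is set, otherwise -fallback, so that a
 * failure path never reports success by accident.
 */
static int neg_errno_or(int fallback)
{
	assert(fallback > 0);
	return errno ? -errno : -fallback;
}
```

With errno cleared, neg_errno_or(EINVAL) yields -EINVAL; with errno set to ENOMEM it yields -ENOMEM, matching the two branches added to record__open().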
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
tools/perf/builtin-record.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 7d4d8bf..310e290 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -362,7 +362,10 @@ try_again:
} else {
pr_err("failed to mmap with %d (%s)\n", errno,
strerror_r(errno, msg, sizeof(msg)));
- rc = -errno;
+ if (errno)
+ rc = -errno;
+ else
+ rc = -EINVAL;
}
goto out;
}
--
1.8.3.4
* [PATCH 32/48] perf record: Prevent reading invalid data in record__mmap_read
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (30 preceding siblings ...)
2016-02-22 9:10 ` [PATCH 31/48] perf record: Ensure return non-zero rc when mmap fail Wang Nan
@ 2016-02-22 9:10 ` Wang Nan
2016-02-22 9:11 ` [PATCH 33/48] perf tools: Add evlist channel helpers Wang Nan
` (15 subsequent siblings)
47 siblings, 0 replies; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:10 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
When record__mmap_read() is asked to read more data than the size of the
ring buffer, drop that data to avoid accessing invalid memory.
This can happen when reading from an overwritable ring buffer, which
should be avoided. However, check for it anyway for robustness.
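The bound being checked can be illustrated on its own. In the sketch below, span_exceeds_buffer() is a hypothetical name; it assumes, as the perf ring buffer does, that the buffer size is a power of two and that mask is size - 1. The positions old and head are free-running 64-bit counters, so the subtraction is done in unsigned arithmetic and stays correct across wraparound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when the unread span [old, head) is larger than the
 * ring buffer itself, i.e. the reader has been lapped and the data
 * between old and head can no longer be trusted.
 */
static bool span_exceeds_buffer(uint64_t old, uint64_t head, uint64_t mask)
{
	/* Buffer size (mask + 1) must be a power of two. */
	assert(((mask + 1) & mask) == 0);

	return head - old > mask + 1;
}
```

A span of exactly one buffer size is still accepted; only a strictly larger span is dropped, which matches the 'size > mask + 1' condition in the hunk below.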
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
tools/perf/builtin-record.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 310e290..3a7de24 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -37,6 +37,7 @@
#include <unistd.h>
#include <sched.h>
#include <sys/mman.h>
+#include <asm/bug.h>
struct record {
@@ -95,6 +96,13 @@ static int record__mmap_read(struct record *rec, int idx)
rec->samples++;
size = head - old;
+ if (size > (unsigned long)(md->mask) + 1) {
+ WARN_ONCE(1, "failed to keep up with mmap data. (warn only once)\n");
+
+ md->prev = head;
+ perf_evlist__mmap_consume(rec->evlist, idx);
+ return 0;
+ }
if ((old & md->mask) + size != (head & md->mask)) {
buf = &data[old & md->mask];
--
1.8.3.4
* [PATCH 33/48] perf tools: Add evlist channel helpers
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (31 preceding siblings ...)
2016-02-22 9:10 ` [PATCH 32/48] perf record: Prevent reading invalid data in record__mmap_read Wang Nan
@ 2016-02-22 9:11 ` Wang Nan
2016-02-22 9:11 ` [PATCH 34/48] perf tools: Automatically add new channel according to evlist Wang Nan
` (14 subsequent siblings)
47 siblings, 0 replies; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:11 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
In this commit several helpers are introduced to support the concept
of channels. Channels hold different groups of evsels which are configured
differently. They will be used for overwritable evsels, which allow perf
to record some events continuously while capturing snapshots of other
events when something happens. Tracking events (mmap, mmap2, fork, exit ...)
are another kind of event worth putting into a separate channel.
Channels are represented by an array of channel flags. Each channel
contains evlist->nr_mmaps mmaps. Channels are configured before
perf_evlist__mmap_ex(). During that function, nr_mmaps mmaps for each
channel are allocated together as one big array.
perf_evlist__channel_idx() converts between an index in the big array and
a (channel number, index) pair. For API functions which accept idx, _ex()
versions are introduced that select an mmap from a given channel.
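The index arithmetic can be tried out without an evlist. The following is a simplified sketch modelled on perf_evlist__channel_idx(): channel_idx() and NR_CHANNELS are stand-in names, nr_mmaps is passed directly instead of being read from the evlist, and NR_CHANNELS is set to 2 purely for illustration (this patch defines PERF_EVLIST__NR_CHANNELS as 1).

```c
#include <assert.h>
#include <errno.h>

#define NR_CHANNELS 2	/* illustrative value only */

/*
 * On entry, either (*p_channel >= 0, *p_idx = index inside that
 * channel) or (*p_channel < 0, *p_idx = index in the flat array).
 * On success, *p_channel holds the channel number and *p_idx the
 * index in the flat array of NR_CHANNELS * nr_mmaps entries.
 */
static int channel_idx(int nr_mmaps, int *p_channel, int *p_idx)
{
	int channel = *p_channel;
	int idx = *p_idx;

	assert(nr_mmaps > 0);

	if (idx < 0)
		return -EINVAL;

	/* Negative channel: caller passed a flat index, split it up. */
	if (channel < 0) {
		channel = idx / nr_mmaps;
		idx = idx % nr_mmaps;
	}
	if (channel >= NR_CHANNELS || idx >= nr_mmaps)
		return -E2BIG;

	*p_channel = channel;
	*p_idx = nr_mmaps * channel + idx;
	return 0;
}
```

With nr_mmaps = 4, channel 1 and idx 2 map to flat index 6; passing channel -1 with idx 6 recovers channel 1 and leaves the flat index at 6.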
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
tools/perf/builtin-record.c | 6 ++
tools/perf/util/evlist.c | 132 ++++++++++++++++++++++++++++++++++++++++++--
tools/perf/util/evlist.h | 58 +++++++++++++++++++
3 files changed, 190 insertions(+), 6 deletions(-)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 3a7de24..24c776c 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -356,6 +356,12 @@ try_again:
goto out;
}
+ perf_evlist__channel_reset(evlist);
+ rc = perf_evlist__channel_add(evlist, 0, true);
+ if (rc < 0)
+ goto out;
+ rc = 0;
+
if (perf_evlist__mmap_ex(evlist, opts->mmap_pages, false,
opts->auxtrace_mmap_pages,
opts->auxtrace_snapshot_mode) < 0) {
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index fef465a..a6b52fc 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -679,14 +679,51 @@ static struct perf_evsel *perf_evlist__event2evsel(struct perf_evlist *evlist,
return NULL;
}
-union perf_event *perf_evlist__mmap_read(struct perf_evlist *evlist, int idx)
+int perf_evlist__channel_idx(struct perf_evlist *evlist,
+ int *p_channel, int *p_idx)
+{
+ int channel = *p_channel;
+ int _idx = *p_idx;
+
+ if (_idx < 0)
+ return -EINVAL;
+ /*
+ * Negative channel means caller explicitly use real index.
+ */
+ if (channel < 0) {
+ channel = perf_evlist__idx_channel(evlist, _idx);
+ _idx = _idx % evlist->nr_mmaps;
+ }
+ if (channel < 0)
+ return channel;
+ if (channel >= PERF_EVLIST__NR_CHANNELS)
+ return -E2BIG;
+ if (_idx >= evlist->nr_mmaps)
+ return -E2BIG;
+
+ *p_channel = channel;
+ *p_idx = evlist->nr_mmaps * channel + _idx;
+ return 0;
+}
+
+union perf_event *perf_evlist__mmap_read_ex(struct perf_evlist *evlist,
+ int channel, int idx)
{
+ int err = perf_evlist__channel_idx(evlist, &channel, &idx);
struct perf_mmap *md = &evlist->mmap[idx];
u64 head;
- u64 old = md->prev;
- unsigned char *data = md->base + page_size;
+ u64 old;
+ unsigned char *data;
union perf_event *event = NULL;
+ if (err || !perf_evlist__channel_is_enabled(evlist, channel)) {
+ pr_err("ERROR: invalid mmap index: channel %d, idx: %d\n",
+ channel, idx);
+ return NULL;
+ }
+ old = md->prev;
+ data = md->base + page_size;
+
/*
* Check if event was unmapped due to a POLLHUP/POLLERR.
*/
@@ -748,6 +785,11 @@ union perf_event *perf_evlist__mmap_read(struct perf_evlist *evlist, int idx)
return event;
}
+union perf_event *perf_evlist__mmap_read(struct perf_evlist *evlist, int idx)
+{
+ return perf_evlist__mmap_read_ex(evlist, -1, idx);
+}
+
static bool perf_mmap__empty(struct perf_mmap *md)
{
return perf_mmap__read_head(md) == md->prev && !md->auxtrace_mmap.base;
@@ -766,10 +808,18 @@ static void perf_evlist__mmap_put(struct perf_evlist *evlist, int idx)
__perf_evlist__munmap(evlist, idx);
}
-void perf_evlist__mmap_consume(struct perf_evlist *evlist, int idx)
+void perf_evlist__mmap_consume_ex(struct perf_evlist *evlist,
+ int channel, int idx)
{
+ int err = perf_evlist__channel_idx(evlist, &channel, &idx);
struct perf_mmap *md = &evlist->mmap[idx];
+ if (err || !perf_evlist__channel_is_enabled(evlist, channel)) {
+ pr_err("ERROR: invalid mmap index: channel %d, idx: %d\n",
+ channel, idx);
+ return;
+ }
+
if (!evlist->overwrite) {
u64 old = md->prev;
@@ -780,6 +830,11 @@ void perf_evlist__mmap_consume(struct perf_evlist *evlist, int idx)
perf_evlist__mmap_put(evlist, idx);
}
+void perf_evlist__mmap_consume(struct perf_evlist *evlist, int idx)
+{
+ perf_evlist__mmap_consume_ex(evlist, -1, idx);
+}
+
int __weak auxtrace_mmap__mmap(struct auxtrace_mmap *mm __maybe_unused,
struct auxtrace_mmap_params *mp __maybe_unused,
void *userpg __maybe_unused,
@@ -825,7 +880,7 @@ void perf_evlist__munmap(struct perf_evlist *evlist)
if (evlist->mmap == NULL)
return;
- for (i = 0; i < evlist->nr_mmaps; i++)
+ for (i = 0; i < perf_evlist__mmap_nr(evlist); i++)
__perf_evlist__munmap(evlist, i);
zfree(&evlist->mmap);
@@ -833,10 +888,17 @@ void perf_evlist__munmap(struct perf_evlist *evlist)
static int perf_evlist__alloc_mmap(struct perf_evlist *evlist)
{
+ int total_mmaps;
+
evlist->nr_mmaps = cpu_map__nr(evlist->cpus);
if (cpu_map__empty(evlist->cpus))
evlist->nr_mmaps = thread_map__nr(evlist->threads);
- evlist->mmap = zalloc(evlist->nr_mmaps * sizeof(struct perf_mmap));
+
+ total_mmaps = perf_evlist__mmap_nr(evlist);
+ if (!total_mmaps)
+ return -EINVAL;
+
+ evlist->mmap = zalloc(total_mmaps * sizeof(struct perf_mmap));
return evlist->mmap != NULL ? 0 : -ENOMEM;
}
@@ -1137,6 +1199,12 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages,
bool overwrite)
{
+ int err;
+
+ perf_evlist__channel_reset(evlist);
+ err = perf_evlist__channel_add(evlist, 0, true);
+ if (err < 0)
+ return err;
return perf_evlist__mmap_ex(evlist, pages, overwrite, 0, false);
}
@@ -1764,3 +1832,55 @@ perf_evlist__find_evsel_by_str(struct perf_evlist *evlist,
return NULL;
}
+
+int perf_evlist__channel_nr(struct perf_evlist *evlist)
+{
+ int i;
+
+ for (i = PERF_EVLIST__NR_CHANNELS - 1; i >= 0; i--) {
+ unsigned long flags = evlist->channel_flags[i];
+
+ if (flags & PERF_EVLIST__CHANNEL_ENABLED)
+ return i + 1;
+ }
+ return 0;
+}
+
+int perf_evlist__mmap_nr(struct perf_evlist *evlist)
+{
+ return evlist->nr_mmaps * perf_evlist__channel_nr(evlist);
+}
+
+void perf_evlist__channel_reset(struct perf_evlist *evlist)
+{
+ int i;
+
+ BUG_ON(evlist->mmap);
+
+ for (i = 0; i < PERF_EVLIST__NR_CHANNELS; i++)
+ evlist->channel_flags[i] = 0;
+}
+
+int perf_evlist__channel_add(struct perf_evlist *evlist,
+ unsigned long flag,
+ bool is_default)
+{
+ int n = perf_evlist__channel_nr(evlist);
+ unsigned long *flags = evlist->channel_flags;
+
+ BUG_ON(evlist->mmap);
+
+ if (n >= PERF_EVLIST__NR_CHANNELS) {
+ pr_debug("ERROR: too many channels. Increase PERF_EVLIST__NR_CHANNELS\n");
+ return -ENOSPC;
+ }
+
+ if (is_default) {
+ memmove(&flags[1], &flags[0],
+ sizeof(evlist->channel_flags) -
+ sizeof(evlist->channel_flags[0]));
+ n = 0;
+ }
+ flags[n] = flag | PERF_EVLIST__CHANNEL_ENABLED;
+ return n;
+}
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index a0d1522..1812652 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -20,6 +20,11 @@ struct record_opts;
#define PERF_EVLIST__HLIST_BITS 8
#define PERF_EVLIST__HLIST_SIZE (1 << PERF_EVLIST__HLIST_BITS)
+#define PERF_EVLIST__NR_CHANNELS 1
+enum perf_evlist_mmap_flag {
+ PERF_EVLIST__CHANNEL_ENABLED = 1,
+};
+
/**
* struct perf_mmap - perf's ring buffer mmap details
*
@@ -52,6 +57,7 @@ struct perf_evlist {
pid_t pid;
} workload;
struct fdarray pollfd;
+ unsigned long channel_flags[PERF_EVLIST__NR_CHANNELS];
struct perf_mmap *mmap;
struct thread_map *threads;
struct cpu_map *cpus;
@@ -116,9 +122,61 @@ struct perf_evsel *perf_evlist__id2evsel_strict(struct perf_evlist *evlist,
struct perf_sample_id *perf_evlist__id2sid(struct perf_evlist *evlist, u64 id);
+union perf_event *perf_evlist__mmap_read_ex(struct perf_evlist *evlist,
+ int channel, int idx);
union perf_event *perf_evlist__mmap_read(struct perf_evlist *evlist, int idx);
+void perf_evlist__mmap_consume_ex(struct perf_evlist *evlist,
+ int channel, int idx);
void perf_evlist__mmap_consume(struct perf_evlist *evlist, int idx);
+int perf_evlist__mmap_nr(struct perf_evlist *evlist);
+
+int perf_evlist__channel_nr(struct perf_evlist *evlist);
+void perf_evlist__channel_reset(struct perf_evlist *evlist);
+int perf_evlist__channel_add(struct perf_evlist *evlist,
+ unsigned long flag,
+ bool is_default);
+
+static inline bool
+__perf_evlist__channel_check(struct perf_evlist *evlist, int channel,
+ enum perf_evlist_mmap_flag bits)
+{
+ if (channel >= PERF_EVLIST__NR_CHANNELS)
+ return false;
+
+ return (evlist->channel_flags[channel] & bits) ? true : false;
+}
+#define perf_evlist__channel_check(e, c, b) \
+ __perf_evlist__channel_check(e, c, PERF_EVLIST__CHANNEL_##b)
+
+static inline bool
+perf_evlist__channel_is_enabled(struct perf_evlist *evlist, int channel)
+{
+ return perf_evlist__channel_check(evlist, channel, ENABLED);
+}
+
+static inline int
+perf_evlist__idx_channel(struct perf_evlist *evlist, int idx)
+{
+ int channel = idx / evlist->nr_mmaps;
+
+ if (channel >= PERF_EVLIST__NR_CHANNELS)
+ return -E2BIG;
+ return channel;
+}
+
+int perf_evlist__channel_idx(struct perf_evlist *evlist,
+ int *p_channel, int *p_idx);
+
+static inline struct perf_mmap *
+perf_evlist__get_mmap(struct perf_evlist *evlist,
+ int channel, int idx)
+{
+ if (perf_evlist__channel_idx(evlist, &channel, &idx))
+ return NULL;
+
+ return &evlist->mmap[idx];
+}
int perf_evlist__open(struct perf_evlist *evlist);
void perf_evlist__close(struct perf_evlist *evlist);
--
1.8.3.4
* [PATCH 34/48] perf tools: Automatically add new channel according to evlist
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (32 preceding siblings ...)
2016-02-22 9:11 ` [PATCH 33/48] perf tools: Add evlist channel helpers Wang Nan
@ 2016-02-22 9:11 ` Wang Nan
2016-02-22 9:11 ` [PATCH 35/48] perf tools: Operate multiple channels Wang Nan
` (13 subsequent siblings)
47 siblings, 0 replies; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:11 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
perf_evlist__channel_find() can be used to find a proper channel based
on the properties of an evsel. If the channel doesn't exist, it can create
a new one for it. After this patch there's no need to create the default
channel explicitly.
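The find-or-add behaviour can be sketched independently of the evlist. Everything below is a simplified model: struct channels, channel_nr() and channel_find() are stand-in names, NR_CHANNELS is 4 only so that more than one channel fits, and the flag value 2UL used in the usage note is a made-up configuration bit.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CHANNELS	4	/* illustrative value only */
#define CHANNEL_ENABLED	1UL

struct channels {
	unsigned long flags[NR_CHANNELS];
};

/* Number of channels in use: index of the last enabled slot, plus one. */
static int channel_nr(const struct channels *c)
{
	assert(c != NULL);
	for (int i = NR_CHANNELS - 1; i >= 0; i--)
		if (c->flags[i] & CHANNEL_ENABLED)
			return i + 1;
	return 0;
}

/*
 * Look up the channel whose flags match exactly. If none exists and
 * add_new is set, append a new channel. Returns the channel number
 * or a negative errno value.
 */
static int channel_find(struct channels *c, unsigned long flag, bool add_new)
{
	int n = channel_nr(c);

	flag |= CHANNEL_ENABLED;
	for (int i = 0; i < n; i++)
		if (c->flags[i] == flag)
			return i;
	if (!add_new)
		return -ENOENT;
	if (n >= NR_CHANNELS)
		return -ENOSPC;
	c->flags[n] = flag;
	return n;
}
```

Starting from an all-zero struct channels, channel_find(&c, 0, true) creates channel 0, a second identical call finds it again, and channel_find(&c, 2UL, true) appends channel 1.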
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
tools/perf/builtin-record.c | 5 -----
tools/perf/util/evlist.c | 47 ++++++++++++++++++++++++++++++++++++++++-----
2 files changed, 42 insertions(+), 10 deletions(-)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 24c776c..cf8f67a 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -357,11 +357,6 @@ try_again:
}
perf_evlist__channel_reset(evlist);
- rc = perf_evlist__channel_add(evlist, 0, true);
- if (rc < 0)
- goto out;
- rc = 0;
-
if (perf_evlist__mmap_ex(evlist, opts->mmap_pages, false,
opts->auxtrace_mmap_pages,
opts->auxtrace_snapshot_mode) < 0) {
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index a6b52fc..d94f2c6 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -943,6 +943,43 @@ static int __perf_evlist__mmap(struct perf_evlist *evlist, int idx,
return 0;
}
+static unsigned long
+perf_evlist__channel_for_evsel(struct perf_evsel *evsel __maybe_unused)
+{
+ return 0;
+}
+
+static int
+perf_evlist__channel_find(struct perf_evlist *evlist,
+ struct perf_evsel *evsel,
+ bool add_new)
+{
+ unsigned long flag = perf_evlist__channel_for_evsel(evsel);
+ int i;
+
+ flag |= PERF_EVLIST__CHANNEL_ENABLED;
+ for (i = 0; i < perf_evlist__channel_nr(evlist); i++)
+ if (evlist->channel_flags[i] == flag)
+ return i;
+ if (add_new)
+ return perf_evlist__channel_add(evlist, flag, false);
+ return -ENOENT;
+}
+
+static int
+perf_evlist__channel_complete(struct perf_evlist *evlist)
+{
+ struct perf_evsel *evsel;
+ int err;
+
+ evlist__for_each(evlist, evsel) {
+ err = perf_evlist__channel_find(evlist, evsel, true);
+ if (err < 0)
+ return err;
+ }
+ return 0;
+}
+
static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
struct mmap_params *mp, int cpu,
int thread, int *output)
@@ -1162,6 +1199,7 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
bool overwrite, unsigned int auxtrace_pages,
bool auxtrace_overwrite)
{
+ int err;
struct perf_evsel *evsel;
const struct cpu_map *cpus = evlist->cpus;
const struct thread_map *threads = evlist->threads;
@@ -1169,6 +1207,10 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
.prot = PROT_READ | (overwrite ? 0 : PROT_WRITE),
};
+ err = perf_evlist__channel_complete(evlist);
+ if (err)
+ return err;
+
if (evlist->mmap == NULL && perf_evlist__alloc_mmap(evlist) < 0)
return -ENOMEM;
@@ -1199,12 +1241,7 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages,
bool overwrite)
{
- int err;
-
perf_evlist__channel_reset(evlist);
- err = perf_evlist__channel_add(evlist, 0, true);
- if (err < 0)
- return err;
return perf_evlist__mmap_ex(evlist, pages, overwrite, 0, false);
}
--
1.8.3.4
* [PATCH 35/48] perf tools: Operate multiple channels
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (33 preceding siblings ...)
2016-02-22 9:11 ` [PATCH 34/48] perf tools: Automatically add new channel according to evlist Wang Nan
@ 2016-02-22 9:11 ` Wang Nan
2016-02-22 9:11 ` [PATCH 36/48] perf tools: Squash overwrite setting into channel Wang Nan
` (12 subsequent siblings)
47 siblings, 0 replies; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:11 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
Before this patch perf operates on only the first channel. Make perf
mmap and read from multiple channels.
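Illustrative note, not part of the patch: the per-channel bookkeeping that replaces the single 'output' fd can be modelled as below. The toy_* names and the enum are assumptions for this sketch. The rule itself follows the diff: the first fd seen in a channel owns the ring buffer, and later fds in the same channel are redirected to it.

```c
#include <string.h>

#define TOY_NR_CHANNELS 2	/* mirrors PERF_EVLIST__NR_CHANNELS after this patch */

enum toy_action {
	TOY_MMAP,		/* fd becomes the owner: mmap() its ring buffer */
	TOY_SET_OUTPUT,		/* fd is redirected to the owner's ring buffer */
};

/* Mark every channel as having no owner yet (-1 is all 0xff bytes). */
static void toy_outputs_init(int outputs[TOY_NR_CHANNELS])
{
	memset(outputs, -1, sizeof(int) * TOY_NR_CHANNELS);
}

/* Decide what to do with fd, which belongs to the given channel. */
static enum toy_action toy_attach(int outputs[TOY_NR_CHANNELS], int channel,
				  int fd)
{
	if (outputs[channel] == -1) {
		outputs[channel] = fd;
		return TOY_MMAP;
	}
	return TOY_SET_OUTPUT;
}
```

In the real code the TOY_SET_OUTPUT case corresponds to the PERF_EVENT_IOC_SET_OUTPUT ioctl, and the outputs array is reset for every cpu (or thread) iteration.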
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
tools/perf/builtin-record.c | 3 ++-
tools/perf/util/evlist.c | 55 ++++++++++++++++++++++++++++++++++-----------
tools/perf/util/evlist.h | 2 +-
3 files changed, 45 insertions(+), 15 deletions(-)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index cf8f67a..a472950 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -466,8 +466,9 @@ static int record__mmap_read_all(struct record *rec)
u64 bytes_written = rec->bytes_written;
int i;
int rc = 0;
+ int total_mmaps = perf_evlist__mmap_nr(rec->evlist);
- for (i = 0; i < rec->evlist->nr_mmaps; i++) {
+ for (i = 0; i < total_mmaps; i++) {
struct auxtrace_mmap *mm = &rec->evlist->mmap[i].auxtrace_mmap;
if (rec->evlist->mmap[i].base) {
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index d94f2c6..16f061c 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -873,6 +873,21 @@ static void __perf_evlist__munmap(struct perf_evlist *evlist, int idx)
auxtrace_mmap__munmap(&evlist->mmap[idx].auxtrace_mmap);
}
+static void
+__perf_evlist__munmap_channels(struct perf_evlist *evlist, int _idx)
+{
+ int _ch;
+
+ for (_ch = 0; _ch < perf_evlist__channel_nr(evlist); _ch++) {
+ int err, idx = _idx, ch = _ch;
+
+ err = perf_evlist__channel_idx(evlist, &ch, &idx);
+ if (err < 0)
+ continue;
+ __perf_evlist__munmap(evlist, idx);
+ }
+}
+
void perf_evlist__munmap(struct perf_evlist *evlist)
{
int i;
@@ -980,26 +995,38 @@ perf_evlist__channel_complete(struct perf_evlist *evlist)
return 0;
}
-static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
+static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int _idx,
struct mmap_params *mp, int cpu,
- int thread, int *output)
+ int thread, int *outputs)
{
struct perf_evsel *evsel;
evlist__for_each(evlist, evsel) {
- int fd;
+ int fd, channel, idx, err;
+
+ channel = perf_evlist__channel_find(evlist, evsel, false);
+ if (channel < 0) {
+ pr_err("ERROR: unable to find suitable channel for %s\n",
+ evsel->name);
+ return -1;
+ }
+
+ idx = _idx;
+ err = perf_evlist__channel_idx(evlist, &channel, &idx);
+ if (err < 0)
+ return err;
if (evsel->system_wide && thread)
continue;
fd = FD(evsel, cpu, thread);
- if (*output == -1) {
- *output = fd;
- if (__perf_evlist__mmap(evlist, idx, mp, *output) < 0)
+ if (outputs[channel] == -1) {
+ outputs[channel] = fd;
+ if (__perf_evlist__mmap(evlist, idx, mp, outputs[channel]) < 0)
return -1;
} else {
- if (ioctl(fd, PERF_EVENT_IOC_SET_OUTPUT, *output) != 0)
+ if (ioctl(fd, PERF_EVENT_IOC_SET_OUTPUT, outputs[channel]) != 0)
return -1;
perf_evlist__mmap_get(evlist, idx);
@@ -1039,14 +1066,15 @@ static int perf_evlist__mmap_per_cpu(struct perf_evlist *evlist,
pr_debug2("perf event ring buffer mmapped per cpu\n");
for (cpu = 0; cpu < nr_cpus; cpu++) {
- int output = -1;
+ int outputs[PERF_EVLIST__NR_CHANNELS];
+ memset(outputs, -1, sizeof(outputs));
auxtrace_mmap_params__set_idx(&mp->auxtrace_mp, evlist, cpu,
true);
for (thread = 0; thread < nr_threads; thread++) {
if (perf_evlist__mmap_per_evsel(evlist, cpu, mp, cpu,
- thread, &output))
+ thread, outputs))
goto out_unmap;
}
}
@@ -1055,7 +1083,7 @@ static int perf_evlist__mmap_per_cpu(struct perf_evlist *evlist,
out_unmap:
for (cpu = 0; cpu < nr_cpus; cpu++)
- __perf_evlist__munmap(evlist, cpu);
+ __perf_evlist__munmap_channels(evlist, cpu);
return -1;
}
@@ -1067,13 +1095,14 @@ static int perf_evlist__mmap_per_thread(struct perf_evlist *evlist,
pr_debug2("perf event ring buffer mmapped per thread\n");
for (thread = 0; thread < nr_threads; thread++) {
- int output = -1;
+ int outputs[PERF_EVLIST__NR_CHANNELS];
+ memset(outputs, -1, sizeof(outputs));
auxtrace_mmap_params__set_idx(&mp->auxtrace_mp, evlist, thread,
false);
if (perf_evlist__mmap_per_evsel(evlist, thread, mp, 0, thread,
- &output))
+ outputs))
goto out_unmap;
}
@@ -1081,7 +1110,7 @@ static int perf_evlist__mmap_per_thread(struct perf_evlist *evlist,
out_unmap:
for (thread = 0; thread < nr_threads; thread++)
- __perf_evlist__munmap(evlist, thread);
+ __perf_evlist__munmap_channels(evlist, thread);
return -1;
}
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 1812652..b652587 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -20,7 +20,7 @@ struct record_opts;
#define PERF_EVLIST__HLIST_BITS 8
#define PERF_EVLIST__HLIST_SIZE (1 << PERF_EVLIST__HLIST_BITS)
-#define PERF_EVLIST__NR_CHANNELS 1
+#define PERF_EVLIST__NR_CHANNELS 2
enum perf_evlist_mmap_flag {
PERF_EVLIST__CHANNEL_ENABLED = 1,
};
--
1.8.3.4
* [PATCH 36/48] perf tools: Squash overwrite setting into channel
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (34 preceding siblings ...)
2016-02-22 9:11 ` [PATCH 35/48] perf tools: Operate multiple channels Wang Nan
@ 2016-02-22 9:11 ` Wang Nan
2016-02-22 9:11 ` [PATCH 37/48] perf record: Don't read from and poll overwrite channel Wang Nan
` (11 subsequent siblings)
47 siblings, 0 replies; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:11 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
Make 'overwrite' a per-channel configuration rather than an evlist
global option. With this setting an evlist can have two channels: a
normal channel and an overwritable channel.
perf_evlist__channel_for_evsel() ensures that events with the
'overwrite' configuration are placed in the overwritable channel.
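Illustrative note, not part of the patch: the chain from an evsel's 'overwrite' setting to the mmap protection can be sketched as two small functions. The toy_* names are assumptions. The flag values mirror the enum in evlist.h, and the protection rule mirrors the __perf_evlist__mmap() hunk below.

```c
#include <stdbool.h>
#include <sys/mman.h>

#define TOY_CHANNEL_ENABLED	1UL	/* mirrors PERF_EVLIST__CHANNEL_ENABLED */
#define TOY_CHANNEL_RDONLY	2UL	/* mirrors PERF_EVLIST__CHANNEL_RDONLY */

/* Channel flags wanted by an event, derived from its overwrite setting. */
static unsigned long toy_channel_flags(bool overwrite)
{
	unsigned long flag = TOY_CHANNEL_ENABLED;

	if (overwrite)
		flag |= TOY_CHANNEL_RDONLY;
	return flag;
}

/* mmap() protection for a channel: read-only channels drop PROT_WRITE. */
static int toy_channel_prot(unsigned long flags)
{
	int prot = PROT_READ;

	if (!(flags & TOY_CHANNEL_RDONLY))
		prot |= PROT_WRITE;
	return prot;
}
```

Mapping the buffer without PROT_WRITE means user space cannot update the tail pointer, which is why the kernel is free to overwrite old records in such a buffer.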
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
tools/perf/builtin-record.c | 2 +-
tools/perf/util/evlist.c | 42 +++++++++++++++++++++++++++---------------
tools/perf/util/evlist.h | 5 ++---
tools/perf/util/evsel.h | 1 +
4 files changed, 31 insertions(+), 19 deletions(-)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index a472950..f5bc5bf 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -357,7 +357,7 @@ try_again:
}
perf_evlist__channel_reset(evlist);
- if (perf_evlist__mmap_ex(evlist, opts->mmap_pages, false,
+ if (perf_evlist__mmap_ex(evlist, opts->mmap_pages,
opts->auxtrace_mmap_pages,
opts->auxtrace_snapshot_mode) < 0) {
if (errno == EPERM) {
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 16f061c..9175c83 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -731,7 +731,7 @@ union perf_event *perf_evlist__mmap_read_ex(struct perf_evlist *evlist,
return NULL;
head = perf_mmap__read_head(md);
- if (evlist->overwrite) {
+ if (perf_evlist__channel_check(evlist, channel, RDONLY)) {
/*
* If we're further behind than half the buffer, there's a chance
* the writer will bite our tail and mess up the samples under us.
@@ -820,7 +820,7 @@ void perf_evlist__mmap_consume_ex(struct perf_evlist *evlist,
return;
}
- if (!evlist->overwrite) {
+ if (!perf_evlist__channel_check(evlist, channel, RDONLY)) {
u64 old = md->prev;
perf_mmap__write_tail(md, old);
@@ -918,7 +918,6 @@ static int perf_evlist__alloc_mmap(struct perf_evlist *evlist)
}
struct mmap_params {
- int prot;
int mask;
struct auxtrace_mmap_params auxtrace_mp;
};
@@ -926,6 +925,15 @@ struct mmap_params {
static int __perf_evlist__mmap(struct perf_evlist *evlist, int idx,
struct mmap_params *mp, int fd)
{
+ int channel = perf_evlist__idx_channel(evlist, idx);
+ int prot = PROT_READ;
+
+ if (channel < 0)
+ return -1;
+
+ if (!perf_evlist__channel_check(evlist, channel, RDONLY))
+ prot |= PROT_WRITE;
+
/*
* The last one will be done at perf_evlist__mmap_consume(), so that we
* make sure we don't prevent tools from consuming every last event in
@@ -942,7 +950,7 @@ static int __perf_evlist__mmap(struct perf_evlist *evlist, int idx,
atomic_set(&evlist->mmap[idx].refcnt, 2);
evlist->mmap[idx].prev = 0;
evlist->mmap[idx].mask = mp->mask;
- evlist->mmap[idx].base = mmap(NULL, evlist->mmap_len, mp->prot,
+ evlist->mmap[idx].base = mmap(NULL, evlist->mmap_len, prot,
MAP_SHARED, fd, 0);
if (evlist->mmap[idx].base == MAP_FAILED) {
pr_debug2("failed to mmap perf event ring buffer, error %d\n",
@@ -959,9 +967,13 @@ static int __perf_evlist__mmap(struct perf_evlist *evlist, int idx,
}
static unsigned long
-perf_evlist__channel_for_evsel(struct perf_evsel *evsel __maybe_unused)
+perf_evlist__channel_for_evsel(struct perf_evsel *evsel)
{
- return 0;
+ unsigned long flag = 0;
+
+ if (evsel->overwrite)
+ flag |= PERF_EVLIST__CHANNEL_RDONLY;
+ return flag;
}
static int
@@ -1211,11 +1223,10 @@ int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
* perf_evlist__mmap_ex - Create mmaps to receive events.
* @evlist: list of events
* @pages: map length in pages
- * @overwrite: overwrite older events?
* @auxtrace_pages - auxtrace map length in pages
* @auxtrace_overwrite - overwrite older auxtrace data?
*
- * If @overwrite is %false the user needs to signal event consumption using
+ * For writable channel, the user needs to signal event consumption using
* perf_mmap__write_tail(). Using perf_evlist__mmap_read() does this
* automatically.
*
@@ -1225,16 +1236,13 @@ int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
* Return: %0 on success, negative error code otherwise.
*/
int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
- bool overwrite, unsigned int auxtrace_pages,
- bool auxtrace_overwrite)
+ unsigned int auxtrace_pages, bool auxtrace_overwrite)
{
int err;
struct perf_evsel *evsel;
const struct cpu_map *cpus = evlist->cpus;
const struct thread_map *threads = evlist->threads;
- struct mmap_params mp = {
- .prot = PROT_READ | (overwrite ? 0 : PROT_WRITE),
- };
+ struct mmap_params mp;
err = perf_evlist__channel_complete(evlist);
if (err)
@@ -1246,7 +1254,6 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
if (evlist->pollfd.entries == NULL && perf_evlist__alloc_pollfd(evlist) < 0)
return -ENOMEM;
- evlist->overwrite = overwrite;
evlist->mmap_len = perf_evlist__mmap_size(pages);
pr_debug("mmap size %zuB\n", evlist->mmap_len);
mp.mask = evlist->mmap_len - page_size - 1;
@@ -1270,8 +1277,13 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages,
bool overwrite)
{
+ struct perf_evsel *evsel;
+
perf_evlist__channel_reset(evlist);
- return perf_evlist__mmap_ex(evlist, pages, overwrite, 0, false);
+ evlist__for_each(evlist, evsel)
+ evsel->overwrite = overwrite;
+
+ return perf_evlist__mmap_ex(evlist, pages, 0, false);
}
int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target)
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index b652587..21a8b85 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -23,6 +23,7 @@ struct record_opts;
#define PERF_EVLIST__NR_CHANNELS 2
enum perf_evlist_mmap_flag {
PERF_EVLIST__CHANNEL_ENABLED = 1,
+ PERF_EVLIST__CHANNEL_RDONLY = 2,
};
/**
@@ -45,7 +46,6 @@ struct perf_evlist {
int nr_entries;
int nr_groups;
int nr_mmaps;
- bool overwrite;
bool enabled;
bool has_user_cpus;
size_t mmap_len;
@@ -203,8 +203,7 @@ int perf_evlist__parse_mmap_pages(const struct option *opt,
int unset);
int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
- bool overwrite, unsigned int auxtrace_pages,
- bool auxtrace_overwrite);
+ unsigned int auxtrace_pages, bool auxtrace_overwrite);
int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages,
bool overwrite);
void perf_evlist__munmap(struct perf_evlist *evlist);
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index efad78f..03c70e5 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -114,6 +114,7 @@ struct perf_evsel {
bool tracking;
bool per_pkg;
bool precise_max;
+ bool overwrite;
/* parse modifier helper */
int exclude_GH;
int nr_members;
--
1.8.3.4
* [PATCH 37/48] perf record: Don't read from and poll overwrite channel
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (35 preceding siblings ...)
2016-02-22 9:11 ` [PATCH 36/48] perf tools: Squash overwrite setting into channel Wang Nan
@ 2016-02-22 9:11 ` Wang Nan
2016-02-22 9:11 ` [PATCH 38/48] perf record: Don't poll on " Wang Nan
` (10 subsequent siblings)
47 siblings, 0 replies; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:11 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
Reading from an overwritable ring buffer is unreliable. Introduce
record__mmap_should_read() and prevent 'perf record' from reading
overwrite ring buffers. The rule in record__mmap_should_read() will
change once perf supports reading from a backward-writing ring buffer.
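Illustrative note, not part of the patch: the read filter can be modelled on a flat array of mmap slots. struct toy_mmap and the toy_* helpers are assumptions for this sketch; in particular, storing the channel flags directly in the slot stands in for the perf_evlist__channel_idx() and perf_evlist__channel_check() lookups used in the diff.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_CHANNEL_RDONLY 2UL	/* mirrors PERF_EVLIST__CHANNEL_RDONLY */

struct toy_mmap {
	void *base;		/* NULL when this slot was never mapped */
	unsigned long flags;	/* flags of the channel that owns the slot */
};

/* A slot is read only if it is mapped and its channel is writable. */
static bool toy_should_read(const struct toy_mmap *map)
{
	if (!map->base)
		return false;
	if (map->flags & TOY_CHANNEL_RDONLY)
		return false;
	return true;
}

/* Count the slots that a read-all pass would actually visit. */
static size_t toy_count_readable(const struct toy_mmap *maps, size_t nr)
{
	size_t i, n = 0;

	for (i = 0; i < nr; i++)
		if (toy_should_read(&maps[i]))
			n++;
	return n;
}
```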
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
tools/perf/builtin-record.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index f5bc5bf..b27b3ff 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -461,6 +461,19 @@ static struct perf_event_header finished_round_event = {
.type = PERF_RECORD_FINISHED_ROUND,
};
+static bool record__mmap_should_read(struct record *rec, int idx)
+{
+ int channel = -1;
+
+ if (!rec->evlist->mmap[idx].base)
+ return false;
+ if (perf_evlist__channel_idx(rec->evlist, &channel, &idx))
+ return false;
+ if (perf_evlist__channel_check(rec->evlist, channel, RDONLY))
+ return false;
+ return true;
+}
+
static int record__mmap_read_all(struct record *rec)
{
u64 bytes_written = rec->bytes_written;
@@ -471,7 +484,7 @@ static int record__mmap_read_all(struct record *rec)
for (i = 0; i < total_mmaps; i++) {
struct auxtrace_mmap *mm = &rec->evlist->mmap[i].auxtrace_mmap;
- if (rec->evlist->mmap[i].base) {
+ if (record__mmap_should_read(rec, i)) {
if (record__mmap_read(rec, i) != 0) {
rc = -1;
goto out;
--
1.8.3.4
* [PATCH 38/48] perf record: Don't poll on overwrite channel
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (36 preceding siblings ...)
2016-02-22 9:11 ` [PATCH 37/48] perf record: Don't read from and poll overwrite channel Wang Nan
@ 2016-02-22 9:11 ` Wang Nan
2016-02-22 9:11 ` [PATCH 39/48] perf tools: Detect avalibility of write_backward Wang Nan
` (9 subsequent siblings)
47 siblings, 0 replies; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:11 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
There is no need to receive events from an overwrite ring buffer.
Instead, perf should let such events run in the background until
something happens. This patch ignores poll events from overwrite ring
buffers except POLLERR and POLLHUP.
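Illustrative note, not part of the patch: the requested poll mask for an fd can be sketched as below. toy_poll_events() is a hypothetical helper; it condenses the 'revent' selection in perf_evlist__mmap_per_evsel() and the mask built in __perf_evlist__add_pollfd().

```c
#include <poll.h>
#include <stdbool.h>

/*
 * Events to request for an fd. An fd on an overwrite (read-only) channel
 * does not ask for POLLIN, so incoming data never wakes the poller.
 */
static short toy_poll_events(bool overwrite_channel)
{
	short revent = overwrite_channel ? 0 : POLLIN;

	return (short)(revent | POLLERR | POLLHUP);
}
```

poll() reports POLLERR and POLLHUP in revents whether or not they were requested, so the fd stays useful for noticing that the monitored task has exited.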
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
tools/perf/util/evlist.c | 23 +++++++++++++++++++----
1 file changed, 19 insertions(+), 4 deletions(-)
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 9175c83..7cf0435 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -461,9 +461,9 @@ int perf_evlist__alloc_pollfd(struct perf_evlist *evlist)
return 0;
}
-static int __perf_evlist__add_pollfd(struct perf_evlist *evlist, int fd, int idx)
+static int __perf_evlist__add_pollfd(struct perf_evlist *evlist, int fd, int idx, short revent)
{
- int pos = fdarray__add(&evlist->pollfd, fd, POLLIN | POLLERR | POLLHUP);
+ int pos = fdarray__add(&evlist->pollfd, fd, revent | POLLERR | POLLHUP);
/*
* Save the idx so that when we filter out fds POLLHUP'ed we can
* close the associated evlist->mmap[] entry.
@@ -479,7 +479,7 @@ static int __perf_evlist__add_pollfd(struct perf_evlist *evlist, int fd, int idx
int perf_evlist__add_pollfd(struct perf_evlist *evlist, int fd)
{
- return __perf_evlist__add_pollfd(evlist, fd, -1);
+ return __perf_evlist__add_pollfd(evlist, fd, -1, POLLIN);
}
static void perf_evlist__munmap_filtered(struct fdarray *fda, int fd)
@@ -1007,6 +1007,18 @@ perf_evlist__channel_complete(struct perf_evlist *evlist)
return 0;
}
+static bool
+perf_evlist__should_poll(struct perf_evlist *evlist,
+ struct perf_evsel *evsel,
+ int channel)
+{
+ if (evsel->system_wide)
+ return false;
+ if (perf_evlist__channel_check(evlist, channel, RDONLY))
+ return false;
+ return true;
+}
+
static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int _idx,
struct mmap_params *mp, int cpu,
int thread, int *outputs)
@@ -1015,6 +1027,7 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int _idx,
evlist__for_each(evlist, evsel) {
int fd, channel, idx, err;
+ short revent = POLLIN;
channel = perf_evlist__channel_find(evlist, evsel, false);
if (channel < 0) {
@@ -1044,6 +1057,8 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int _idx,
perf_evlist__mmap_get(evlist, idx);
}
+ if (!perf_evlist__should_poll(evlist, evsel, channel))
+ revent = 0;
/*
* The system_wide flag causes a selected event to be opened
* always without a pid. Consequently it will never get a
@@ -1052,7 +1067,7 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int _idx,
* Therefore don't add it for polling.
*/
if (!evsel->system_wide &&
- __perf_evlist__add_pollfd(evlist, fd, idx) < 0) {
+ __perf_evlist__add_pollfd(evlist, fd, idx, revent) < 0) {
perf_evlist__mmap_put(evlist, idx);
return -1;
}
--
1.8.3.4
* [PATCH 39/48] perf tools: Detect avalibility of write_backward
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (37 preceding siblings ...)
2016-02-22 9:11 ` [PATCH 38/48] perf record: Don't poll on " Wang Nan
@ 2016-02-22 9:11 ` Wang Nan
2016-02-22 9:11 ` [PATCH 40/48] perf tools: Enable overwrite settings Wang Nan
` (8 subsequent siblings)
47 siblings, 0 replies; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:11 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
Detect availability of write_backward and save the result into
record_opts. With write_backward, the start pointer of a ring
buffer mapped read-only can be found reliably.
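Illustrative note, not part of the patch: the toy model below shows why backward writing makes the start pointer reliable. It is a sketch under simplifying assumptions, not the kernel's layout: records carry a one-byte length header instead of struct perf_event_header, the buffer is 16 bytes, and all toy_* names are hypothetical. The writer moves 'head' down before each record, so 'head' always points at the header of the newest record and a reader can walk forward from there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_RING_SIZE 16u			/* must be a power of two */
#define TOY_RING_MASK (TOY_RING_SIZE - 1)

struct toy_ring {
	uint64_t head;		/* start of the newest record; only decreases */
	uint8_t data[TOY_RING_SIZE];
};

/* Write one record backward: [len][payload], where len includes the header. */
static void toy_ring_write(struct toy_ring *r, const char *payload)
{
	size_t len = strlen(payload) + 1;
	size_t i;

	/* Payload must be non-empty and the record must fit in the ring. */
	assert(len >= 2 && len <= TOY_RING_SIZE);

	r->head -= len;
	r->data[r->head & TOY_RING_MASK] = (uint8_t)len;
	for (i = 1; i < len; i++)
		r->data[(r->head + i) & TOY_RING_MASK] = (uint8_t)payload[i - 1];
}

/*
 * Walk forward from head, newest record first, and store the first payload
 * byte of every intact record in out[]. Returns the number of records found.
 */
static size_t toy_ring_read(const struct toy_ring *r, char *out, size_t max)
{
	size_t off = 0, n = 0;

	while (off < TOY_RING_SIZE && n < max) {
		uint8_t len = r->data[(r->head + off) & TOY_RING_MASK];

		/* Unwritten space, or a record whose tail was overwritten. */
		if (len == 0 || off + len > TOY_RING_SIZE)
			break;
		out[n++] = (char)r->data[(r->head + off + 1) & TOY_RING_MASK];
		off += len;
	}
	return n;
}
```

In this model the oldest record loses its tail first while its header stays intact, so the 'off + len > TOY_RING_SIZE' test is enough to stop at a damaged record. With forward writing on a read-only mapping, the reader has no equivalent anchor for the oldest intact record.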
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
tools/perf/perf.h | 1 +
tools/perf/util/record.c | 11 +++++++++++
2 files changed, 12 insertions(+)
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 5381a01..198345e 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -73,6 +73,7 @@ struct record_opts {
bool sample_transaction;
unsigned initial_delay;
bool use_clockid;
+ bool has_write_backward;
clockid_t clockid;
unsigned int proc_map_timeout;
};
diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c
index 0467367..d01f155 100644
--- a/tools/perf/util/record.c
+++ b/tools/perf/util/record.c
@@ -85,6 +85,11 @@ static void perf_probe_comm_exec(struct perf_evsel *evsel)
evsel->attr.comm_exec = 1;
}
+static void perf_probe_write_backward(struct perf_evsel *evsel)
+{
+ evsel->attr.write_backward = 1;
+}
+
static void perf_probe_context_switch(struct perf_evsel *evsel)
{
evsel->attr.context_switch = 1;
@@ -105,6 +110,11 @@ bool perf_can_record_switch_events(void)
return perf_probe_api(perf_probe_context_switch);
}
+static bool perf_can_write_backward(void)
+{
+ return perf_probe_api(perf_probe_write_backward);
+}
+
bool perf_can_record_cpu_wide(void)
{
struct perf_event_attr attr = {
@@ -235,6 +245,7 @@ static int record_opts__config_freq(struct record_opts *opts)
int record_opts__config(struct record_opts *opts)
{
+ opts->has_write_backward = perf_can_write_backward();
return record_opts__config_freq(opts);
}
--
1.8.3.4
* [PATCH 40/48] perf tools: Enable overwrite settings
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (38 preceding siblings ...)
2016-02-22 9:11 ` [PATCH 39/48] perf tools: Detect avalibility of write_backward Wang Nan
@ 2016-02-22 9:11 ` Wang Nan
2016-02-22 9:11 ` [PATCH 41/48] perf tools: Set write_backward attribut bit for overwrite events Wang Nan
` (7 subsequent siblings)
47 siblings, 0 replies; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:11 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
This patch adds the following option and config terms:
# perf record --overwrite ...
Globally set the following events to overwrite;
# perf record --event cycles/overwrite/ ...
# perf record --event cycles/no-overwrite/ ...
Set specific events to overwrite or no-overwrite.
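Illustrative note, not part of the patch: how the global option and the per-event terms combine can be sketched as one function. The toy_* names are assumptions. The sketch also assumes that per-event terms are applied after the global default, as the order of the two evsel.c hunks suggests, and that a term written without a value carries num = 1.

```c
#include <stdbool.h>

enum toy_term {
	TOY_TERM_NONE,		/* event has no overwrite-related term */
	TOY_TERM_OVERWRITE,	/* e.g. cycles/overwrite/ */
	TOY_TERM_NOOVERWRITE,	/* e.g. cycles/no-overwrite/ */
};

/*
 * Effective overwrite setting for one event. 'num' is the numeric value
 * attached to the term; it is ignored for TOY_TERM_NONE.
 */
static bool toy_resolve_overwrite(bool global_overwrite, enum toy_term term,
				  unsigned long num)
{
	switch (term) {
	case TOY_TERM_OVERWRITE:
		return num != 0;
	case TOY_TERM_NOOVERWRITE:
		return num == 0;
	default:
		return global_overwrite;
	}
}
```

Under these assumptions, 'perf record --overwrite -e cycles/no-overwrite/' would leave cycles on the normal channel while every other event goes to the overwritable one.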
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
tools/perf/builtin-record.c | 1 +
tools/perf/perf.h | 1 +
tools/perf/util/evsel.c | 4 ++++
tools/perf/util/evsel.h | 2 ++
tools/perf/util/parse-events.c | 14 ++++++++++++++
tools/perf/util/parse-events.h | 2 ++
tools/perf/util/parse-events.l | 2 ++
7 files changed, 26 insertions(+)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index b27b3ff..d3f0435 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1271,6 +1271,7 @@ struct option __record_options[] = {
OPT_BOOLEAN_SET('i', "no-inherit", &record.opts.no_inherit,
&record.opts.no_inherit_set,
"child tasks do not inherit counters"),
+ OPT_BOOLEAN(0, "overwrite", &record.opts.overwrite, "use overwrite mode"),
OPT_UINTEGER('F', "freq", &record.opts.user_freq, "profile at this frequency"),
OPT_CALLBACK('m', "mmap-pages", &record.opts, "pages[,pages]",
"number of mmap data pages and AUX area tracing mmap pages",
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 198345e..7a65a92 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -60,6 +60,7 @@ struct record_opts {
bool record_switch_events;
bool all_kernel;
bool all_user;
+ bool overwrite;
unsigned int freq;
unsigned int mmap_pages;
unsigned int auxtrace_mmap_pages;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 510afa4..10dfdd1 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -670,6 +670,9 @@ static void apply_config_terms(struct perf_evsel *evsel,
*/
attr->inherit = term->val.inherit ? 1 : 0;
break;
+ case PERF_EVSEL__CONFIG_TERM_OVERWRITE:
+ evsel->overwrite = term->val.overwrite ? 1 : 0;
+ break;
default:
break;
}
@@ -745,6 +748,7 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts)
attr->sample_id_all = perf_missing_features.sample_id_all ? 0 : 1;
attr->inherit = !opts->no_inherit;
+ evsel->overwrite = opts->overwrite;
perf_evsel__set_sample_bit(evsel, IP);
perf_evsel__set_sample_bit(evsel, TID);
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 03c70e5..aa976f9 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -44,6 +44,7 @@ enum {
PERF_EVSEL__CONFIG_TERM_CALLGRAPH,
PERF_EVSEL__CONFIG_TERM_STACK_USER,
PERF_EVSEL__CONFIG_TERM_INHERIT,
+ PERF_EVSEL__CONFIG_TERM_OVERWRITE,
PERF_EVSEL__CONFIG_TERM_MAX,
};
@@ -57,6 +58,7 @@ struct perf_evsel_config_term {
char *callgraph;
u64 stack_user;
bool inherit;
+ bool overwrite;
} val;
};
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 4c19d5e..707e514 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -992,6 +992,12 @@ do { \
case PARSE_EVENTS__TERM_TYPE_NOINHERIT:
CHECK_TYPE_VAL(NUM);
break;
+ case PARSE_EVENTS__TERM_TYPE_OVERWRITE:
+ CHECK_TYPE_VAL(NUM);
+ break;
+ case PARSE_EVENTS__TERM_TYPE_NOOVERWRITE:
+ CHECK_TYPE_VAL(NUM);
+ break;
case PARSE_EVENTS__TERM_TYPE_NAME:
CHECK_TYPE_VAL(STR);
break;
@@ -1040,6 +1046,8 @@ static int config_term_tracepoint(struct perf_event_attr *attr,
case PARSE_EVENTS__TERM_TYPE_STACKSIZE:
case PARSE_EVENTS__TERM_TYPE_INHERIT:
case PARSE_EVENTS__TERM_TYPE_NOINHERIT:
+ case PARSE_EVENTS__TERM_TYPE_OVERWRITE:
+ case PARSE_EVENTS__TERM_TYPE_NOOVERWRITE:
return config_term_common(attr, term, err);
default:
if (err) {
@@ -1109,6 +1117,12 @@ do { \
case PARSE_EVENTS__TERM_TYPE_NOINHERIT:
ADD_CONFIG_TERM(INHERIT, inherit, term->val.num ? 0 : 1);
break;
+ case PARSE_EVENTS__TERM_TYPE_OVERWRITE:
+ ADD_CONFIG_TERM(OVERWRITE, overwrite, term->val.num ? 1 : 0);
+ break;
+ case PARSE_EVENTS__TERM_TYPE_NOOVERWRITE:
+ ADD_CONFIG_TERM(OVERWRITE, overwrite, term->val.num ? 0 : 1);
+ break;
default:
break;
}
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index 67e4930..c7e6e51 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -69,6 +69,8 @@ enum {
PARSE_EVENTS__TERM_TYPE_STACKSIZE,
PARSE_EVENTS__TERM_TYPE_NOINHERIT,
PARSE_EVENTS__TERM_TYPE_INHERIT,
+ PARSE_EVENTS__TERM_TYPE_NOOVERWRITE,
+ PARSE_EVENTS__TERM_TYPE_OVERWRITE,
__PARSE_EVENTS__TERM_TYPE_NR,
};
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 1477fbc..cc4c426 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -201,6 +201,8 @@ call-graph { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_CALLGRAPH); }
stack-size { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_STACKSIZE); }
inherit { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_INHERIT); }
no-inherit { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_NOINHERIT); }
+overwrite { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_OVERWRITE); }
+no-overwrite { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_NOOVERWRITE); }
, { return ','; }
"/" { BEGIN(INITIAL); return '/'; }
{name_minus} { return str(yyscanner, PE_NAME); }
--
1.8.3.4
* [PATCH 41/48] perf tools: Set write_backward attribut bit for overwrite events
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (39 preceding siblings ...)
2016-02-22 9:11 ` [PATCH 40/48] perf tools: Enable overwrite settings Wang Nan
@ 2016-02-22 9:11 ` Wang Nan
2016-02-22 9:11 ` [PATCH 42/48] perf tools: Record fd into perf_mmap Wang Nan
` (6 subsequent siblings)
47 siblings, 0 replies; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:11 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
The write_backward attribute makes the kernel fill the ring buffer from
its end toward its beginning, which makes reading from an overwrite
ring buffer possible.
This patch sets this attribute if evsel->overwrite is selected
explicitly by the user.
Overwrite and write_backward are still controlled separately for legacy
read-only mmap users (most of them are in perf/tests).
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
tools/perf/builtin-record.c | 7 +++++++
tools/perf/util/evlist.c | 2 ++
tools/perf/util/evlist.h | 1 +
tools/perf/util/evsel.c | 13 +++++++++++++
4 files changed, 23 insertions(+)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index d3f0435..888a8e8 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -332,6 +332,13 @@ static int record__open(struct record *rec)
perf_evlist__config(evlist, opts);
evlist__for_each(evlist, pos) {
+ if (pos->overwrite) {
+ if (!pos->attr.write_backward) {
+ ui__warning("Unable to read from overwrite ring buffer\n\n");
+ rc = -ENOSYS;
+ goto out;
+ }
+ }
try_again:
if (perf_evsel__open(pos, pos->cpus, pos->threads) < 0) {
if (perf_evsel__fallback(pos, errno, msg, sizeof(msg))) {
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 7cf0435..36dd305 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -973,6 +973,8 @@ perf_evlist__channel_for_evsel(struct perf_evsel *evsel)
if (evsel->overwrite)
flag |= PERF_EVLIST__CHANNEL_RDONLY;
+ if (evsel->attr.write_backward)
+ flag |= PERF_EVLIST__CHANNEL_BACKWARD;
return flag;
}
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 21a8b85..321224c 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -24,6 +24,7 @@ struct record_opts;
enum perf_evlist_mmap_flag {
PERF_EVLIST__CHANNEL_ENABLED = 1,
PERF_EVLIST__CHANNEL_RDONLY = 2,
+ PERF_EVLIST__CHANNEL_BACKWARD = 4,
};
/**
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 10dfdd1..0bbd5ef 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -678,6 +678,19 @@ static void apply_config_terms(struct perf_evsel *evsel,
}
}
+ /*
+ * Set backward after config term processing because it is
+ * possible to set overwrite globally, without config
+ * terms.
+ */
+ if (evsel->overwrite) {
+ if (opts->has_write_backward)
+ attr->write_backward = 1;
+ else
+ pr_err("Reading from overwrite event %s is not supported\n",
+ evsel->name);
+ }
+
/* User explicitly set per-event callgraph, clear the old setting and reset. */
if ((callgraph_buf != NULL) || (dump_size > 0)) {
--
1.8.3.4
* [PATCH 42/48] perf tools: Record fd into perf_mmap
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (40 preceding siblings ...)
2016-02-22 9:11 ` [PATCH 41/48] perf tools: Set write_backward attribute bit for overwrite events Wang Nan
@ 2016-02-22 9:11 ` Wang Nan
2016-02-22 9:11 ` [PATCH 43/48] perf tools: Add API to pause a channel Wang Nan
` (5 subsequent siblings)
47 siblings, 0 replies; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:11 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
Add an fd field to perf_mmap so perf can look up the fd that backs a
given mmap. This will be used to toggle overwrite ring buffers.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
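A note on the -1 sentinel: zalloc() leaves every field at 0, and 0 is a
valid file descriptor, so the patch has to mark unmapped slots
explicitly. The following is only a minimal sketch of that bookkeeping;
struct demo_mmap and the demo_* helpers are hypothetical stand-ins, not
the perf code itself.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct perf_mmap; only the fields used here. */
struct demo_mmap {
	void *base;
	int fd;
};

/* Allocate n zeroed slots and mark each one as "not mapped" with fd = -1. */
static struct demo_mmap *demo_alloc_mmaps(int n)
{
	struct demo_mmap *maps;
	int i;

	assert(n > 0);
	maps = calloc((size_t)n, sizeof(*maps));
	if (!maps)
		return NULL;
	for (i = 0; i < n; i++)
		maps[i].fd = -1;
	return maps;
}

/* Remember fd for slot idx; refuse if the slot already holds a valid fd. */
static int demo_record_fd(struct demo_mmap *maps, int idx, int fd)
{
	assert(maps != NULL && idx >= 0);
	if (maps[idx].fd >= 0)
		return -1;	/* already mapped */
	maps[idx].fd = fd;
	return 0;
}
```

With this layout a second attempt to map the same index fails instead of
silently overwriting the stored fd.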
tools/perf/util/evlist.c | 15 +++++++++++++--
tools/perf/util/evlist.h | 1 +
2 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 36dd305..bd2393a 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -868,6 +868,7 @@ static void __perf_evlist__munmap(struct perf_evlist *evlist, int idx)
if (evlist->mmap[idx].base != NULL) {
munmap(evlist->mmap[idx].base, evlist->mmap_len);
evlist->mmap[idx].base = NULL;
+ evlist->mmap[idx].fd = -1;
atomic_set(&evlist->mmap[idx].refcnt, 0);
}
auxtrace_mmap__munmap(&evlist->mmap[idx].auxtrace_mmap);
@@ -903,7 +904,7 @@ void perf_evlist__munmap(struct perf_evlist *evlist)
static int perf_evlist__alloc_mmap(struct perf_evlist *evlist)
{
- int total_mmaps;
+ int total_mmaps, i;
evlist->nr_mmaps = cpu_map__nr(evlist->cpus);
if (cpu_map__empty(evlist->cpus))
@@ -914,7 +915,12 @@ static int perf_evlist__alloc_mmap(struct perf_evlist *evlist)
return -EINVAL;
evlist->mmap = zalloc(total_mmaps * sizeof(struct perf_mmap));
- return evlist->mmap != NULL ? 0 : -ENOMEM;
+ if (!evlist->mmap)
+ return -ENOMEM;
+
+ for (i = 0; i < total_mmaps; i++)
+ evlist->mmap[i].fd = -1;
+ return 0;
}
struct mmap_params {
@@ -934,6 +940,10 @@ static int __perf_evlist__mmap(struct perf_evlist *evlist, int idx,
if (!perf_evlist__channel_check(evlist, channel, RDONLY))
prot |= PROT_WRITE;
+ if (evlist->mmap[idx].fd >= 0) {
+ pr_err("idx %d already mapped\n", idx);
+ return -1;
+ }
/*
* The last one will be done at perf_evlist__mmap_consume(), so that we
* make sure we don't prevent tools from consuming every last event in
@@ -958,6 +968,7 @@ static int __perf_evlist__mmap(struct perf_evlist *evlist, int idx,
evlist->mmap[idx].base = NULL;
return -1;
}
+ evlist->mmap[idx].fd = fd;
if (auxtrace_mmap__mmap(&evlist->mmap[idx].auxtrace_mmap,
&mp->auxtrace_mp, evlist->mmap[idx].base, fd))
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 321224c..bc6d787 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -35,6 +35,7 @@ enum perf_evlist_mmap_flag {
struct perf_mmap {
void *base;
int mask;
+ int fd;
atomic_t refcnt;
u64 prev;
struct auxtrace_mmap auxtrace_mmap;
--
1.8.3.4
* [PATCH 43/48] perf tools: Add API to pause a channel
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (41 preceding siblings ...)
2016-02-22 9:11 ` [PATCH 42/48] perf tools: Record fd into perf_mmap Wang Nan
@ 2016-02-22 9:11 ` Wang Nan
2016-02-22 9:11 ` [PATCH 44/48] perf record: Toggle overwrite ring buffer for reading Wang Nan
` (4 subsequent siblings)
47 siblings, 0 replies; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:11 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
Introduce perf_evlist__channel_toggle_paused() to pause or resume a
channel in an evlist. It uses the PERF_EVENT_IOC_PAUSE_OUTPUT ioctl.
Following commits use perf_evlist__channel_toggle_paused() to ensure
that output to an overwrite ring buffer is paused before reading it.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
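For reviewers, the slot arithmetic can be sketched on its own: mmaps are
laid out channel by channel, so channel c owns slots
[c * nr_mmaps, (c + 1) * nr_mmaps). The sketch below is an illustration
under assumptions, not the patch code: the demo_* names are
hypothetical, and a callback stands in for the real
ioctl(fd, PERF_EVENT_IOC_PAUSE_OUTPUT, ...) call so the logic can be
exercised without a perf fd.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback standing in for the pause-output ioctl. */
typedef int (*demo_pause_fn)(int fd, int pause);

/* Fake backend for experiments: counts how many fds are currently paused. */
static int demo_paused_count;

static int demo_fake_pause(int fd, int pause)
{
	(void)fd;
	demo_paused_count += pause ? 1 : -1;
	return 0;
}

/*
 * Pause or resume every mapped fd that belongs to one channel.
 * fds holds nr_channels * nr_mmaps entries; -1 means "slot not mapped".
 */
static int demo_channel_toggle_paused(const int *fds, int nr_channels,
				      int nr_mmaps, int channel, int pause,
				      demo_pause_fn fn)
{
	int i;

	assert(fds != NULL && fn != NULL);
	if (channel < 0 || channel >= nr_channels)
		return -E2BIG;
	for (i = 0; i < nr_mmaps; i++) {
		int fd = fds[channel * nr_mmaps + i];
		int err;

		if (fd < 0)
			continue;	/* nothing mapped in this slot */
		err = fn(fd, pause ? 1 : 0);
		if (err)
			return err;
	}
	return 0;
}
```

Unmapped slots are skipped rather than treated as errors, which matches
the "fd < 0, continue" check in the patch.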
tools/perf/util/evlist.c | 28 ++++++++++++++++++++++++++++
tools/perf/util/evlist.h | 2 ++
2 files changed, 30 insertions(+)
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index bd2393a..06c79c8 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -706,6 +706,34 @@ int perf_evlist__channel_idx(struct perf_evlist *evlist,
return 0;
}
+int perf_evlist__channel_toggle_paused(struct perf_evlist *evlist,
+ int channel, bool pause)
+{
+ int i;
+
+ if (channel >= perf_evlist__channel_nr(evlist))
+ return -E2BIG;
+ if (!evlist->mmap)
+ return -EFAULT;
+ for (i = 0; i < evlist->nr_mmaps; i++) {
+ int n = channel * evlist->nr_mmaps + i;
+ int fd = evlist->mmap[n].fd;
+ int err;
+
+ if (fd < 0)
+ continue;
+ err = ioctl(fd, PERF_EVENT_IOC_PAUSE_OUTPUT,
+ pause ? 1 : 0);
+ if (err) {
+ err = (errno == 0 ? -EINVAL : -errno);
+ pr_err("Unable to pause output on %d: %s\n",
+ fd, strerror(-err));
+ return err;
+ }
+ }
+ return 0;
+}
+
union perf_event *perf_evlist__mmap_read_ex(struct perf_evlist *evlist,
int channel, int idx)
{
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index bc6d787..c1831a9 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -180,6 +180,8 @@ perf_evlist__get_mmap(struct perf_evlist *evlist,
return &evlist->mmap[idx];
}
+int perf_evlist__channel_toggle_paused(struct perf_evlist *evlist,
+ int channel, bool pause);
int perf_evlist__open(struct perf_evlist *evlist);
void perf_evlist__close(struct perf_evlist *evlist);
--
1.8.3.4
* [PATCH 44/48] perf record: Toggle overwrite ring buffer for reading
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (42 preceding siblings ...)
2016-02-22 9:11 ` [PATCH 43/48] perf tools: Add API to pause a channel Wang Nan
@ 2016-02-22 9:11 ` Wang Nan
2016-02-22 9:11 ` [PATCH 45/48] perf record: Rename variable to make code clear Wang Nan
` (3 subsequent siblings)
47 siblings, 0 replies; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:11 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
Reading from an overwrite ring buffer is unreliable while events are
still being written to it, so perf_evlist__channel_toggle_paused()
should be called before reading from it.
Toggle the overwrite event state (rec->overwrite_evt_state) after
receiving 'done' or a switch-output request.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
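The state handling in record__toggle_overwrite_evsels() can be read as a
small transition function. The sketch below mirrors the switch statement
in this patch but uses hypothetical demo_* names and returns the action
instead of calling into the evlist, so treat it as an illustration only.

```c
#include <assert.h>
#include <stddef.h>

enum demo_state { DEMO_RUNNING, DEMO_DATA_PENDING, DEMO_EMPTY };
enum demo_action { DEMO_NONE, DEMO_PAUSE, DEMO_RESUME };

/*
 * Move *cur toward 'next' and report what must be done to the overwrite
 * channels. An EMPTY buffer never goes back to DATA_PENDING: it has
 * already been drained and stays paused until it is set RUNNING again.
 */
static enum demo_action demo_transition(enum demo_state *cur,
					enum demo_state next)
{
	enum demo_action action = DEMO_NONE;

	assert(cur != NULL);
	switch (*cur) {
	case DEMO_RUNNING:
		if (next != DEMO_RUNNING)
			action = DEMO_PAUSE;
		break;
	case DEMO_DATA_PENDING:
		if (next == DEMO_RUNNING)
			action = DEMO_RESUME;
		break;
	case DEMO_EMPTY:
		if (next == DEMO_RUNNING)
			action = DEMO_RESUME;
		if (next == DEMO_DATA_PENDING)
			next = DEMO_EMPTY;
		break;
	}
	*cur = next;
	return action;
}
```

The only transitions that touch the kernel are RUNNING to anything else
(pause) and anything else back to RUNNING (resume).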
tools/perf/builtin-record.c | 79 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 79 insertions(+)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 888a8e8..e39475b 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -39,6 +39,11 @@
#include <sys/mman.h>
#include <asm/bug.h>
+enum overwrite_evt_state {
+ OVERWRITE_EVT_RUNNING,
+ OVERWRITE_EVT_DATA_PENDING,
+ OVERWRITE_EVT_EMPTY,
+};
struct record {
struct perf_tool tool;
@@ -57,6 +62,7 @@ struct record {
bool buildid_all;
bool timestamp_filename;
bool switch_output;
+ enum overwrite_evt_state overwrite_evt_state;
unsigned long long samples;
};
@@ -388,6 +394,7 @@ try_again:
session->evlist = evlist;
perf_session__set_id_hdr_size(session);
+ rec->overwrite_evt_state = OVERWRITE_EVT_RUNNING;
out:
return rc;
}
@@ -468,6 +475,52 @@ static struct perf_event_header finished_round_event = {
.type = PERF_RECORD_FINISHED_ROUND,
};
+static void
+record__toggle_overwrite_evsels(struct record *rec,
+ enum overwrite_evt_state state)
+{
+ struct perf_evlist *evlist = rec->evlist;
+ enum overwrite_evt_state old_state = rec->overwrite_evt_state;
+ enum action {
+ NONE,
+ PAUSE,
+ RESUME,
+ } action = NONE;
+ int ch, nr_channels;
+
+ switch (old_state) {
+ case OVERWRITE_EVT_RUNNING:
+ if (state != OVERWRITE_EVT_RUNNING)
+ action = PAUSE;
+ break;
+ case OVERWRITE_EVT_DATA_PENDING:
+ if (state == OVERWRITE_EVT_RUNNING)
+ action = RESUME;
+ break;
+ case OVERWRITE_EVT_EMPTY:
+ if (state == OVERWRITE_EVT_RUNNING)
+ action = RESUME;
+ if (state == OVERWRITE_EVT_DATA_PENDING)
+ state = OVERWRITE_EVT_EMPTY;
+ break;
+ default:
+ WARN_ONCE(1, "Shouldn't get there\n");
+ }
+
+ rec->overwrite_evt_state = state;
+
+ if (action == NONE)
+ return;
+
+ nr_channels = perf_evlist__channel_nr(evlist);
+ for (ch = 0; ch < nr_channels; ch++) {
+ if (!perf_evlist__channel_check(evlist, ch, RDONLY))
+ continue;
+ perf_evlist__channel_toggle_paused(evlist, ch,
+ action == PAUSE);
+ }
+}
+
static bool record__mmap_should_read(struct record *rec, int idx)
{
int channel = -1;
@@ -512,6 +565,8 @@ static int record__mmap_read_all(struct record *rec)
if (bytes_written != rec->bytes_written)
rc = record__write(rec, &finished_round_event, sizeof(finished_round_event));
+ if (rec->overwrite_evt_state == OVERWRITE_EVT_DATA_PENDING)
+ record__toggle_overwrite_evsels(rec, OVERWRITE_EVT_EMPTY);
out:
return rc;
}
@@ -870,6 +925,17 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
for (;;) {
unsigned long long hits = rec->samples;
+ /*
+ * rec->overwrite_evt_state may be OVERWRITE_EVT_EMPTY
+ * here: when done == true and hits != rec->samples
+ * after the previous read.
+ *
+ * record__toggle_overwrite_evsels() ensures we never
+ * convert OVERWRITE_EVT_EMPTY to OVERWRITE_EVT_DATA_PENDING.
+ */
+ if (switch_output_started || done || draining)
+ record__toggle_overwrite_evsels(rec, OVERWRITE_EVT_DATA_PENDING);
+
if (record__mmap_read_all(rec) < 0) {
auxtrace_snapshot_disable();
err = -1;
@@ -888,7 +954,20 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
}
if (switch_output_started) {
+ /*
+ * SIGUSR2 was raised during or after record__mmap_read_all().
+ * Continue so that the ring buffers are read again.
+ */
+ if (rec->overwrite_evt_state == OVERWRITE_EVT_RUNNING)
+ continue;
+
switch_output_started = 0;
+ /*
+ * Reenable events in overwrite ring buffer after
+ * record__mmap_read_all(): we should have collected
+ * data from it.
+ */
+ record__toggle_overwrite_evsels(rec, OVERWRITE_EVT_RUNNING);
if (!quiet)
fprintf(stderr, "[ perf record: dump data: Woken up %ld times ]\n",
--
1.8.3.4
* [PATCH 45/48] perf record: Rename variable to make code clear
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (43 preceding siblings ...)
2016-02-22 9:11 ` [PATCH 44/48] perf record: Toggle overwrite ring buffer for reading Wang Nan
@ 2016-02-22 9:11 ` Wang Nan
2016-02-22 9:11 ` [PATCH 46/48] perf record: Read from backward ring buffer Wang Nan
` (2 subsequent siblings)
47 siblings, 0 replies; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:11 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
record__mmap_read() writes data from the ring buffer into perf.data.
'head' is maintained by the kernel and points to the last written
record. 'old' is maintained by perf and points to the record read in
the previous round. record__mmap_read() saves the data between 'old'
and 'head' to perf.data. These variable names are not easy to read. In
addition, when dealing with a backward-writing ring buffer, the
md->prev pointer should point to 'head' instead of the last byte perf
read.
Add 'start' and 'end' variables to make the code clearer, and set
md->prev to 'head' instead of the advanced 'old' pointer. This patch
doesn't change behavior since:
buf = &data[old & md->mask];
size = head - old;
old += size; <--- Here, old == head
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
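As a reading aid, the two-step copy in record__mmap_read() can be
sketched in isolation. This is a minimal model under assumptions:
demo_ring_copy() is a hypothetical helper that copies into a flat buffer
instead of calling record__write(), and the ring size is assumed to be a
power of two (mask + 1), as it is for perf mmaps.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy the bytes in [start, end) out of a power-of-two ring buffer.
 * start and end are free-running positions; '& mask' maps them to
 * offsets. Returns the number of bytes copied, or 0 if the range is
 * empty or larger than the ring.
 */
static size_t demo_ring_copy(const unsigned char *data, uint64_t mask,
			     uint64_t start, uint64_t end,
			     unsigned char *out)
{
	uint64_t size = end - start;
	size_t copied = 0;

	assert(data != NULL && out != NULL);
	if (size == 0 || size > mask + 1)
		return 0;

	/* The range crosses the end of the ring: copy the tail part first. */
	if ((start & mask) + size != (end & mask)) {
		size_t first = (size_t)(mask + 1 - (start & mask));

		memcpy(out, &data[start & mask], first);
		copied += first;
		start += first;
	}

	/* Copy what is left, now contiguous from the (possibly wrapped) start. */
	memcpy(out + copied, &data[start & mask], (size_t)(end - start));
	copied += (size_t)(end - start);
	return copied;
}
```

After the second copy 'start' has caught up with 'end', which is why
storing 'head' in md->prev is equivalent to storing the advanced 'old'
in the forward case.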
tools/perf/builtin-record.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index e39475b..b1f37f0 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -91,17 +91,18 @@ static int record__mmap_read(struct record *rec, int idx)
struct perf_mmap *md = &rec->evlist->mmap[idx];
u64 head = perf_mmap__read_head(md);
u64 old = md->prev;
+ u64 end = head, start = old;
unsigned char *data = md->base + page_size;
unsigned long size;
void *buf;
int rc = 0;
- if (old == head)
+ if (start == end)
return 0;
rec->samples++;
- size = head - old;
+ size = end - start;
if (size > (unsigned long)(md->mask) + 1) {
WARN_ONCE(1, "failed to keep up with mmap data. (warn only once)\n");
@@ -110,10 +111,10 @@ static int record__mmap_read(struct record *rec, int idx)
return 0;
}
- if ((old & md->mask) + size != (head & md->mask)) {
- buf = &data[old & md->mask];
- size = md->mask + 1 - (old & md->mask);
- old += size;
+ if ((start & md->mask) + size != (end & md->mask)) {
+ buf = &data[start & md->mask];
+ size = md->mask + 1 - (start & md->mask);
+ start += size;
if (record__write(rec, buf, size) < 0) {
rc = -1;
@@ -121,16 +122,16 @@ static int record__mmap_read(struct record *rec, int idx)
}
}
- buf = &data[old & md->mask];
- size = head - old;
- old += size;
+ buf = &data[start & md->mask];
+ size = end - start;
+ start += size;
if (record__write(rec, buf, size) < 0) {
rc = -1;
goto out;
}
- md->prev = old;
+ md->prev = head;
perf_evlist__mmap_consume(rec->evlist, idx);
out:
return rc;
--
1.8.3.4
* [PATCH 46/48] perf record: Read from backward ring buffer
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (44 preceding siblings ...)
2016-02-22 9:11 ` [PATCH 45/48] perf record: Rename variable to make code clear Wang Nan
@ 2016-02-22 9:11 ` Wang Nan
2016-02-22 9:11 ` [PATCH 47/48] perf record: Allow generate tracking events at the end of output Wang Nan
2016-02-22 9:11 ` [PATCH 48/48] perf tools: Don't warn about out of order event if write_backward is used Wang Nan
47 siblings, 0 replies; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:11 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
Introduce rb_find_range() to find the start and end positions of the
valid data in a backward ring buffer.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
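The walk done by backward_rb_find_range() may be easier to follow in a
standalone form. The sketch below is an approximation under stated
assumptions: struct demo_header is a hypothetical 8-byte header that
mimics the layout of struct perf_event_header, records are multiples of
8 bytes, and never-written space is zero-filled so that a size of 0
marks the end of valid data.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 8-byte record header; 'size' covers header plus payload. */
struct demo_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

/*
 * The newest record sits at 'head'; walking forward visits older
 * records. Stop at a zero size (space never written) or once a full
 * ring has been covered. If the last step overshoots the ring, that
 * record is partly overwritten and is dropped from the range.
 */
static int demo_backward_find_range(const unsigned char *buf, uint64_t mask,
				    uint64_t head, uint64_t *start,
				    uint64_t *end)
{
	uint64_t ring_size = mask + 1;
	uint64_t pos = head;
	uint16_t rec_size = 0;

	assert(buf != NULL && start != NULL && end != NULL);
	*start = head;
	for (;;) {
		struct demo_header hdr;

		if (pos - head >= ring_size) {
			if (pos - head > ring_size)
				pos -= rec_size;
			*end = pos;
			return 0;
		}
		memcpy(&hdr, buf + (pos & mask), sizeof(hdr));
		rec_size = hdr.size;
		if (rec_size == 0) {
			*end = pos;
			return 0;
		}
		pos += rec_size;
	}
}
```

Because 'head' moves downward as the kernel writes, the positions are
unsigned values that wrap below zero; only differences such as
end - start are meaningful.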
tools/perf/builtin-record.c | 69 +++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 67 insertions(+), 2 deletions(-)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index b1f37f0..82b49ce 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -86,6 +86,61 @@ static int process_synthesized_event(struct perf_tool *tool,
return record__write(rec, event, event->header.size);
}
+static int
+backward_rb_find_range(void *buf, int mask, u64 head, u64 *start, u64 *end)
+{
+ struct perf_event_header *pheader;
+ u64 evt_head = head;
+ int size = mask + 1;
+
+ pr_debug2("backward_rb_find_range: buf=%p, head=%"PRIx64"\n", buf, head);
+ pheader = (struct perf_event_header *)(buf + (head & mask));
+ *start = head;
+ while (true) {
+ if (evt_head - head >= (unsigned int)size) {
+ pr_debug("Finished reading backward ring buffer: rewind\n");
+ if (evt_head - head > (unsigned int)size)
+ evt_head -= pheader->size;
+ *end = evt_head;
+ return 0;
+ }
+
+ pheader = (struct perf_event_header *)(buf + (evt_head & mask));
+
+ if (pheader->size == 0) {
+ pr_debug("Finished reading backward ring buffer: get start\n");
+ *end = evt_head;
+ return 0;
+ }
+
+ evt_head += pheader->size;
+ pr_debug3("move evt_head: %"PRIx64"\n", evt_head);
+ }
+ WARN_ONCE(1, "Shouldn't get here\n");
+ return -1;
+}
+
+static int
+rb_find_range(struct perf_evlist *evlist, int idx,
+ void *data, int mask, u64 head, u64 old,
+ u64 *start, u64 *end)
+{
+ int channel;
+
+ channel = perf_evlist__idx_channel(evlist, idx);
+ if (!perf_evlist__channel_check(evlist, channel, RDONLY)) {
+ *start = old;
+ *end = head;
+ return 0;
+ }
+
+ if (perf_evlist__channel_check(evlist, channel, BACKWARD))
+ return backward_rb_find_range(data, mask, head, start, end);
+
+ WARN_ONCE(1, "Unable to find start position from a read-only ring buffer\n");
+ return -1;
+}
+
static int record__mmap_read(struct record *rec, int idx)
{
struct perf_mmap *md = &rec->evlist->mmap[idx];
@@ -97,6 +152,10 @@ static int record__mmap_read(struct record *rec, int idx)
void *buf;
int rc = 0;
+ if (rb_find_range(rec->evlist, idx, data, md->mask, head,
+ old, &start, &end))
+ return -1;
+
if (start == end)
return 0;
@@ -530,8 +589,14 @@ static bool record__mmap_should_read(struct record *rec, int idx)
return false;
if (perf_evlist__channel_idx(rec->evlist, &channel, &idx))
return false;
- if (perf_evlist__channel_check(rec->evlist, channel, RDONLY))
- return false;
+ if (perf_evlist__channel_check(rec->evlist, channel, RDONLY)) {
+ if (rec->overwrite_evt_state != OVERWRITE_EVT_DATA_PENDING)
+ return false;
+ if (perf_evlist__channel_check(rec->evlist, channel, BACKWARD))
+ return true;
+ else
+ return false;
+ }
return true;
}
--
1.8.3.4
* [PATCH 47/48] perf record: Allow generate tracking events at the end of output
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (45 preceding siblings ...)
2016-02-22 9:11 ` [PATCH 46/48] perf record: Read from backward ring buffer Wang Nan
@ 2016-02-22 9:11 ` Wang Nan
2016-02-22 9:11 ` [PATCH 48/48] perf tools: Don't warn about out of order event if write_backward is used Wang Nan
47 siblings, 0 replies; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:11 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
Before this patch, tracking events are generated from information in
/proc before any samples are recorded. With the introduction of
overwrite evsels in perf record, this becomes inconvenient: 'perf
record' can now run as a daemon for several hours and capture only the
last snapshot when it receives SIGUSR2. The tracking events generated
at the head of the output 'perf.data' are then too old, while most of
the tracking events emitted while 'perf record' was running have been
dropped.
This patch generates tracking events at the end of the output. The
resulting event series better reflects the state of the system when
SIGUSR2 is received.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
tools/perf/builtin-record.c | 62 +++++++++++++++++++++++++++++++--------------
1 file changed, 43 insertions(+), 19 deletions(-)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 82b49ce..81e2c3c 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -63,6 +63,7 @@ struct record {
bool timestamp_filename;
bool switch_output;
enum overwrite_evt_state overwrite_evt_state;
+ bool tail_tracking;
unsigned long long samples;
};
@@ -685,6 +686,26 @@ record__finish_output(struct record *rec)
static int record__synthesize(struct record *rec);
+static void record__synthesize_target(struct record *rec)
+{
+ if (target__none(&rec->opts.target)) {
+ struct {
+ struct thread_map map;
+ struct thread_map_data map_data;
+ } thread_map;
+
+ thread_map.map.nr = 1;
+ thread_map.map.map[0].pid = rec->evlist->workload.pid;
+ thread_map.map.map[0].comm = NULL;
+ perf_event__synthesize_thread_map(&rec->tool,
+ &thread_map.map,
+ process_synthesized_event,
+ &rec->session->machines.host,
+ rec->opts.sample_address,
+ rec->opts.proc_map_timeout);
+ }
+}
+
static int
record__switch_output(struct record *rec, bool at_exit)
{
@@ -694,6 +715,11 @@ record__switch_output(struct record *rec, bool at_exit)
/* Same Size: "2015122520103046"*/
char timestamp[] = "InvalidTimestamp";
+ if (rec->tail_tracking) {
+ record__synthesize(rec);
+ record__synthesize_target(rec);
+ }
+
rec->samples = 0;
record__finish_output(rec);
err = fetch_current_timestamp(timestamp, sizeof(timestamp));
@@ -720,23 +746,10 @@ record__switch_output(struct record *rec, bool at_exit)
machines__init(&rec->session->machines);
perf_session__create_kernel_maps(rec->session);
perf_session__set_id_hdr_size(rec->session);
- record__synthesize(rec);
- if (target__none(&rec->opts.target)) {
- struct {
- struct thread_map map;
- struct thread_map_data map_data;
- } thread_map;
-
- thread_map.map.nr = 1;
- thread_map.map.map[0].pid = rec->evlist->workload.pid;
- thread_map.map.map[0].comm = NULL;
- perf_event__synthesize_thread_map(&rec->tool,
- &thread_map.map,
- process_synthesized_event,
- &rec->session->machines.host,
- rec->opts.sample_address,
- rec->opts.proc_map_timeout);
+ if (!rec->tail_tracking) {
+ record__synthesize(rec);
+ record__synthesize_target(rec);
}
}
return fd;
@@ -932,9 +945,11 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
machine = &session->machines.host;
- err = record__synthesize(rec);
- if (err < 0)
- goto out_child;
+ if (!rec->tail_tracking) {
+ err = record__synthesize(rec);
+ if (err < 0)
+ goto out_child;
+ }
if (rec->realtime_prio) {
struct sched_param param;
@@ -1075,6 +1090,13 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
disabled = true;
}
}
+
+ if (rec->tail_tracking) {
+ err = record__synthesize(rec);
+ if (err < 0)
+ goto out_child;
+ }
+
auxtrace_snapshot_disable();
if (forks && workload_exec_errno) {
@@ -1507,6 +1529,8 @@ struct option __record_options[] = {
"append timestamp to output filename"),
OPT_BOOLEAN(0, "switch-output", &record.switch_output,
"Switch output when receive SIGUSR2"),
+ OPT_BOOLEAN(0, "tail-tracking", &record.tail_tracking,
+ "Generate tracking events at the end of output"),
OPT_END()
};
--
1.8.3.4
* [PATCH 48/48] perf tools: Don't warn about out of order event if write_backward is used
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
` (46 preceding siblings ...)
2016-02-22 9:11 ` [PATCH 47/48] perf record: Allow generate tracking events at the end of output Wang Nan
@ 2016-02-22 9:11 ` Wang Nan
47 siblings, 0 replies; 76+ messages in thread
From: Wang Nan @ 2016-02-22 9:11 UTC (permalink / raw)
To: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg
Cc: Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
Wang Nan, linux-kernel
If the write_backward attribute is set, records are written into the
kernel ring buffer from end to beginning, but read from beginning to
end. To avoid the 'XX out of order events recorded' warning message
(timestamps of records are in reverse order when using write_backward),
suppress the warning if write_backward is selected by at least one
event.
Result:
Before this patch:
# perf record -m 1 -e raw_syscalls:sys_exit/overwrite/ \
-e raw_syscalls:sys_enter \
dd if=/dev/zero of=/dev/null count=300
300+0 records in
300+0 records out
153600 bytes (154 kB) copied, 0.000601617 s, 255 MB/s
[ perf record: Woken up 5 times to write data ]
Warning:
40 out of order events recorded.
[ perf record: Captured and wrote 0.096 MB perf.data (696 samples) ]
After this patch:
# perf record -m 1 -e raw_syscalls:sys_exit/overwrite/ \
-e raw_syscalls:sys_enter \
dd if=/dev/zero of=/dev/null count=300
300+0 records in
300+0 records out
153600 bytes (154 kB) copied, 0.000644873 s, 238 MB/s
[ perf record: Woken up 5 times to write data ]
[ perf record: Captured and wrote 0.096 MB perf.data (696 samples) ]
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
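The decision made by perf_session__warn_order() boils down to one
predicate. The sketch below uses a hypothetical array of per-event
write_backward flags in place of walking the evlist, so it only
illustrates the rule and is not the session code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Warn about unordered events only when no event was recorded with
 * write_backward; a single backward event makes reverse timestamps
 * expected for the whole session.
 */
static bool demo_should_warn_order(const bool *write_backward, size_t n,
				   unsigned int nr_unordered)
{
	size_t i;

	assert(write_backward != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (write_backward[i])
			return false;
	}
	return nr_unordered != 0;
}
```

Note that the check is per session, not per event: unordered events from
a forward ring buffer are also silenced once any event uses
write_backward.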
tools/perf/util/session.c | 22 +++++++++++++++++++---
1 file changed, 19 insertions(+), 3 deletions(-)
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 40b7a0d..132c6ab 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1516,10 +1516,27 @@ int perf_session__register_idle_thread(struct perf_session *session)
return err;
}
+static void
+perf_session__warn_order(const struct perf_session *session)
+{
+ const struct ordered_events *oe = &session->ordered_events;
+ struct perf_evsel *evsel;
+ bool should_warn = true;
+
+ evlist__for_each(session->evlist, evsel) {
+ if (evsel->attr.write_backward)
+ should_warn = false;
+ }
+
+ if (!should_warn)
+ return;
+ if (oe->nr_unordered_events != 0)
+ ui__warning("%u out of order events recorded.\n", oe->nr_unordered_events);
+}
+
static void perf_session__warn_about_errors(const struct perf_session *session)
{
const struct events_stats *stats = &session->evlist->stats;
- const struct ordered_events *oe = &session->ordered_events;
if (session->tool->lost == perf_event__process_lost &&
stats->nr_events[PERF_RECORD_LOST] != 0) {
@@ -1576,8 +1593,7 @@ static void perf_session__warn_about_errors(const struct perf_session *session)
stats->nr_unprocessable_samples);
}
- if (oe->nr_unordered_events != 0)
- ui__warning("%u out of order events recorded.\n", oe->nr_unordered_events);
+ perf_session__warn_order(session);
events_stats__auxtrace_error_warn(stats);
--
1.8.3.4
* Re: [PATCH 11/48] perf data: Support converting data from bpf_perf_event_output()
2016-02-22 9:10 ` [PATCH 11/48] perf data: Support converting data from bpf_perf_event_output() Wang Nan
@ 2016-02-23 16:14 ` Arnaldo Carvalho de Melo
2016-02-23 17:23 ` Jiri Olsa
2016-02-23 19:22 ` Jiri Olsa
2 siblings, 0 replies; 76+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-02-23 16:14 UTC (permalink / raw)
To: Wang Nan
Cc: Alexei Starovoitov, Arnaldo Carvalho de Melo, Brendan Gregg,
Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
linux-kernel
On Mon, Feb 22, 2016 at 09:10:38AM +0000, Wang Nan wrote:
> bpf_perf_event_output() outputs data through sample->raw_data. This
> patch adds support to convert those data into CTF. A python script
> then can be used to process output data from BPF programs.
>
> Test result:
Trying to test this I get:
[acme@jouet linux]$ make O=/tmp/build/perf LIBBABELTRACE=1 -C tools/perf install-bin 2>&1 | grep babel
config/Makefile:663: No libbabeltrace found, disables 'perf data' CTF format support, please install libbabeltrace-dev[el]/libbabeltrace-ctf-dev
[acme@jouet linux]$ rpm -q libbabeltrace-devel
libbabeltrace-devel-1.2.4-2.fc23.x86_64
[acme@jouet linux]$
Guess we better improve this message... Anyway, trying to find libbabeltrace's
git tree to see if with its master branch this works...
- Arnaldo
* Re: [PATCH 11/48] perf data: Support converting data from bpf_perf_event_output()
2016-02-22 9:10 ` [PATCH 11/48] perf data: Support converting data from bpf_perf_event_output() Wang Nan
2016-02-23 16:14 ` Arnaldo Carvalho de Melo
@ 2016-02-23 17:23 ` Jiri Olsa
2016-02-23 17:24 ` Jiri Olsa
2016-02-23 19:22 ` Jiri Olsa
2 siblings, 1 reply; 76+ messages in thread
From: Jiri Olsa @ 2016-02-23 17:23 UTC (permalink / raw)
To: Wang Nan
Cc: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg, Adrian Hunter,
Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
linux-kernel
On Mon, Feb 22, 2016 at 09:10:38AM +0000, Wang Nan wrote:
SNIP
> ---
> tools/perf/util/data-convert-bt.c | 112 +++++++++++++++++++++++++++++++++++++-
> 1 file changed, 111 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c
> index b722e57..70f462d 100644
> --- a/tools/perf/util/data-convert-bt.c
> +++ b/tools/perf/util/data-convert-bt.c
> @@ -352,6 +352,84 @@ static int add_tracepoint_values(struct ctf_writer *cw,
> return ret;
> }
>
> +static int
> +add_bpf_output_values(struct bt_ctf_event_class *event_class,
> + struct bt_ctf_event *event,
> + struct perf_sample *sample)
> +{
> + struct bt_ctf_field_type *len_type, *seq_type;
> + struct bt_ctf_field *len_field, *seq_field;
> + unsigned int raw_size = sample->raw_size;
> + unsigned int nr_elements = raw_size / sizeof(u32);
> + unsigned int i;
> + int ret;
> +
> + if (nr_elements * sizeof(u32) != raw_size)
could this be IS_ALIGNED(raw_size, u32)
jirka
* Re: [PATCH 11/48] perf data: Support converting data from bpf_perf_event_output()
2016-02-23 17:23 ` Jiri Olsa
@ 2016-02-23 17:24 ` Jiri Olsa
0 siblings, 0 replies; 76+ messages in thread
From: Jiri Olsa @ 2016-02-23 17:24 UTC (permalink / raw)
To: Wang Nan
Cc: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg, Adrian Hunter,
Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
linux-kernel
On Tue, Feb 23, 2016 at 06:23:46PM +0100, Jiri Olsa wrote:
> On Mon, Feb 22, 2016 at 09:10:38AM +0000, Wang Nan wrote:
>
> SNIP
>
> > ---
> > tools/perf/util/data-convert-bt.c | 112 +++++++++++++++++++++++++++++++++++++-
> > 1 file changed, 111 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c
> > index b722e57..70f462d 100644
> > --- a/tools/perf/util/data-convert-bt.c
> > +++ b/tools/perf/util/data-convert-bt.c
> > @@ -352,6 +352,84 @@ static int add_tracepoint_values(struct ctf_writer *cw,
> > return ret;
> > }
> >
> > +static int
> > +add_bpf_output_values(struct bt_ctf_event_class *event_class,
> > + struct bt_ctf_event *event,
> > + struct perf_sample *sample)
> > +{
> > + struct bt_ctf_field_type *len_type, *seq_type;
> > + struct bt_ctf_field *len_field, *seq_field;
> > + unsigned int raw_size = sample->raw_size;
> > + unsigned int nr_elements = raw_size / sizeof(u32);
> > + unsigned int i;
> > + int ret;
> > +
> > + if (nr_elements * sizeof(u32) != raw_size)
>
> could this be IS_ALIGNED(raw_size, u32)
nah, we don't have it.. never mind ;-)
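For what it's worth, the open-coded check and an alignment-mask test give the same answer. A minimal userspace sketch, assuming a locally defined RAW_IS_ALIGNED() helper (hypothetical; it is not something tools/perf provides here) and a power-of-two element size:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when x is a multiple of a (a must be a power of two). */
#define RAW_IS_ALIGNED(x, a) (((x) & ((a) - 1)) == 0)

/* The check as open-coded in the patch: divide, multiply back, compare. */
static bool raw_size_ok_divide(unsigned int raw_size)
{
	unsigned int nr_elements = raw_size / sizeof(uint32_t);

	return nr_elements * sizeof(uint32_t) == raw_size;
}

/* The same condition written as a mask test. */
static bool raw_size_ok_mask(unsigned int raw_size)
{
	return RAW_IS_ALIGNED(raw_size, (unsigned int)sizeof(uint32_t));
}

/* Self-check: both forms agree for every size in a small range. */
static void check_forms_agree(void)
{
	unsigned int size;

	for (size = 0; size < 64; size++)
		assert(raw_size_ok_divide(size) == raw_size_ok_mask(size));
}
```

Note the helper takes an alignment value such as sizeof(u32), not a type name.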
jirka
* Re: [PATCH 10/48] perf tools: Introduce bpf-output event
2016-02-22 9:10 ` [PATCH 10/48] perf tools: Introduce bpf-output event Wang Nan
@ 2016-02-23 17:45 ` Arnaldo Carvalho de Melo
2016-02-24 1:58 ` Wangnan (F)
2016-02-25 5:41 ` [tip:perf/core] " tip-bot for Wang Nan
1 sibling, 1 reply; 76+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-02-23 17:45 UTC (permalink / raw)
To: Wang Nan
Cc: Alexei Starovoitov, Arnaldo Carvalho de Melo, Brendan Gregg,
Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
linux-kernel
On Mon, Feb 22, 2016 at 09:10:37AM +0000, Wang Nan wrote:
> Commit a43eec304259a6c637f4014a6d4767159b6a3aa3 (bpf: introduce
> bpf_perf_event_output() helper) add a helper to enable BPF program
> output data to perf ring buffer through a new type of perf event
> PERF_COUNT_SW_BPF_OUTPUT. This patch enable perf to create perf
> event of that type. Now perf user can use following cmdline to
> receive output data from BPF programs:
>
> # ./perf record -a -e bpf-output/no-inherit,name=evt/ \
> -e ./test_bpf_output.c/map:channel.event=evt/ ls /
> # ./perf script
> perf 1560 [004] 347747.086295: evt: ffffffff811fd201 sys_write ...
> perf 1560 [004] 347747.086300: evt: ffffffff811fd201 sys_write ...
> perf 1560 [004] 347747.086315: evt: ffffffff811fd201 sys_write ...
> ...
>
> Test result:
> # cat ./test_bpf_output.c
> /************************ BEGIN **************************/
> #include <uapi/linux/bpf.h>
> struct bpf_map_def {
> unsigned int type;
> unsigned int key_size;
> unsigned int value_size;
> unsigned int max_entries;
> };
>
> #define SEC(NAME) __attribute__((section(NAME), used))
> static u64 (*ktime_get_ns)(void) =
> (void *)BPF_FUNC_ktime_get_ns;
> static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
> (void *)BPF_FUNC_trace_printk;
> static int (*get_smp_processor_id)(void) =
> (void *)BPF_FUNC_get_smp_processor_id;
> static int (*perf_event_output)(void *, struct bpf_map_def *, int, void *, unsigned long) =
> (void *)BPF_FUNC_perf_event_output;
>
> struct bpf_map_def SEC("maps") channel = {
> .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
> .key_size = sizeof(int),
> .value_size = sizeof(u32),
> .max_entries = __NR_CPUS__,
> };
>
> SEC("func_write=sys_write")
> int func_write(void *ctx)
> {
> struct {
> u64 ktime;
> int cpuid;
> } __attribute__((packed)) output_data;
> char error_data[] = "Error: failed to output: %d\n";
>
> output_data.cpuid = get_smp_processor_id();
> output_data.ktime = ktime_get_ns();
> int err = perf_event_output(ctx, &channel, get_smp_processor_id(),
> &output_data, sizeof(output_data));
> if (err)
> trace_printk(error_data, sizeof(error_data), err);
> return 0;
> }
> char _license[] SEC("license") = "GPL";
> int _version SEC("version") = LINUX_VERSION_CODE;
> /************************ END ***************************/
>
> # ./perf record -a -e bpf-output/no-inherit,name=evt/ \
> -e ./test_bpf_output.c/map:channel.event=evt/ ls /
> # ./perf script | grep ls
> ls 2242 [003] 347851.557563: evt: ffffffff811fd201 sys_write ...
> ls 2242 [003] 347851.557571: evt: ffffffff811fd201 sys_write ...
So, there is something strange here:
if (unlikely(event->oncpu != smp_processor_id()))
return -EOPNOTSUPP;
This is where I am hitting, with:
[acme@jouet linux]$ uname -r
4.5.0-rc4
int err = perf_event_output(ctx, &channel, get_smp_processor_id(),
&output_data, sizeof(output_data));
if (err)
trace_printk(error_data, sizeof(error_data), err);
And then:
[root@jouet bpf]# tail /sys/kernel/debug/tracing/trace
perf-13040 [003] d... 12062.807729: : Error: failed to output: -95
perf-13040 [003] d... 12062.807731: : Error: failed to output: -95
perf-13040 [003] d... 12062.807732: : Error: failed to output: -95
perf-13040 [003] d... 12062.807735: : Error: failed to output: -95
perf-13040 [003] d... 12062.807737: : Error: failed to output: -95
perf-13040 [003] d... 12062.807744: : Error: failed to output: -95
gnome-terminal--3091 [001] d... 12062.807773: : Error: failed to output: -95
gnome-terminal--3091 [001] d... 12062.807784: : Error: failed to output: -95
gmain-2830 [002] d... 12062.811791: : Error: failed to output: -95
gmain-2830 [002] d... 12062.811810: : Error: failed to output: -95
[root@jouet bpf]#
Ideas? AFK for a while, will continue investigating.
This already was submitted to Ingo, BTW.
I used, as in the changeset comment tests:
perf record -a -e bpf-output/no-inherit,name=evt/ -e ./test_bpf_output.c/map:channel.event=evt/ ls /
And perf script told me:
[root@jouet bpf]# perf script | tail
perf 13040 [003] 12062.708337: evt: ffffffff81234eb1 sys_write (/lib/modules/4.5.0-rc4/build/vmlinux)
perf 13040 [003] 12062.708339: evt: ffffffff81234eb1 sys_write (/lib/modules/4.5.0-rc4/build/vmlinux)
perf 13040 [003] 12062.708340: evt: ffffffff81234eb1 sys_write (/lib/modules/4.5.0-rc4/build/vmlinux)
perf 13040 [003] 12062.708341: evt: ffffffff81234eb1 sys_write (/lib/modules/4.5.0-rc4/build/vmlinux)
perf 13040 [003] 12062.708343: evt: ffffffff81234eb1 sys_write (/lib/modules/4.5.0-rc4/build/vmlinux)
perf 13040 [003] 12062.708344: evt: ffffffff81234eb1 sys_write (/lib/modules/4.5.0-rc4/build/vmlinux)
perf 13040 [003] 12062.708346: evt: ffffffff81234eb1 sys_write (/lib/modules/4.5.0-rc4/build/vmlinux)
perf 13040 [003] 12062.708347: evt: ffffffff81234eb1 sys_write (/lib/modules/4.5.0-rc4/build/vmlinux)
perf 13040 [003] 12062.708348: evt: ffffffff81234eb1 sys_write (/lib/modules/4.5.0-rc4/build/vmlinux)
perf 13040 [003] 12062.708350: evt: ffffffff81234eb1 sys_write (/lib/modules/4.5.0-rc4/build/vmlinux)
[root@jouet bpf]#
Wonder where that /lib/modules/4.5.0-rc4/build/vmlinux came from...
[root@jouet bpf]# perf script | cut -d'(' -f2 | sort | uniq -c
1141 /lib/modules/4.5.0-rc4/build/vmlinux)
- Arnaldo
* Re: [PATCH 11/48] perf data: Support converting data from bpf_perf_event_output()
2016-02-22 9:10 ` [PATCH 11/48] perf data: Support converting data from bpf_perf_event_output() Wang Nan
2016-02-23 16:14 ` Arnaldo Carvalho de Melo
2016-02-23 17:23 ` Jiri Olsa
@ 2016-02-23 19:22 ` Jiri Olsa
2 siblings, 0 replies; 76+ messages in thread
From: Jiri Olsa @ 2016-02-23 19:22 UTC (permalink / raw)
To: Wang Nan
Cc: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg, Adrian Hunter,
Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
linux-kernel
On Mon, Feb 22, 2016 at 09:10:38AM +0000, Wang Nan wrote:
SNIP
> usleep 14942 92503.298562: evt: ffffffff810585e9 kretprobe_trampoline_holder (/lib....
>
> # ./perf data convert --to-ctf ./out.ctf
> [ perf data convert: Converted 'perf.data' into CTF data './out.ctf' ]
> [ perf data convert: Converted and wrote 0.000 MB (2 samples) ]
>
> # babeltrace ./out.ctf
> [01:41:43.198504134] (+?.?????????) evt: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF810E0BA1, perf_tid = 14942, perf_pid = 14942, perf_id = 1044, raw_len = 3, raw_data = [ [0] = 0x32C0C07B, [1] = 0x5421, [2] = 0x1 ] }
> [01:41:43.298562257] (+0.100058123) evt: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF810585E9, perf_tid = 14942, perf_pid = 14942, perf_id = 1044, raw_len = 3, raw_data = [ [0] = 0x38B77FAA, [1] = 0x5421, [2] = 0x2 ] }
>
> # cat ./test_bpf_output_2.py
> from babeltrace import TraceCollection
> tc = TraceCollection()
> tc.add_trace('./out.ctf', 'ctf')
> d = {1:[], 2:[]}
> for event in tc.events:
> if not event.name.startswith('evt'):
> continue
> raw_data = event['raw_data']
> (time, type) = ((raw_data[0] + (raw_data[1] << 32)), raw_data[2])
> d[type].append(time)
> print(list(map(lambda i: d[2][i] - d[1][i], range(len(d[1])))));
>
> # python3 ./test_bpf_output_2.py
> [100056879]
>
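The script above rebuilds a 64-bit timestamp from two 32-bit words and subtracts the type-1 time from the type-2 time. The same arithmetic as a small C sketch, using the raw_data values from the babeltrace output above (join_u32() and example_delta() are hypothetical helper names, and the low-word-first order is assumed from the script):

```c
#include <assert.h>
#include <stdint.h>

/* Join two 32-bit words (low word first) into one 64-bit value. */
static uint64_t join_u32(uint32_t lo, uint32_t hi)
{
	return (uint64_t)lo | ((uint64_t)hi << 32);
}

/* Difference between the type-2 and type-1 timestamps shown above. */
static uint64_t example_delta(void)
{
	uint64_t t1 = join_u32(0x32C0C07B, 0x5421); /* raw_data[2] == 0x1 */
	uint64_t t2 = join_u32(0x38B77FAA, 0x5421); /* raw_data[2] == 0x2 */

	assert(t2 >= t1);
	return t2 - t1;
}
```

example_delta() evaluates to 100056879, which matches the value printed by the script.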
Acked-by: Jiri Olsa <jolsa@kernel.org>
looks good to me.. also note I just compiled, haven't tried
the example above.. too far in bpf land for me ATM ;-)
thanks,
jirka
* Re: [PATCH 10/48] perf tools: Introduce bpf-output event
2016-02-23 17:45 ` Arnaldo Carvalho de Melo
@ 2016-02-24 1:58 ` Wangnan (F)
2016-02-24 2:04 ` Wangnan (F)
2016-02-24 13:36 ` Arnaldo Carvalho de Melo
0 siblings, 2 replies; 76+ messages in thread
From: Wangnan (F) @ 2016-02-24 1:58 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Alexei Starovoitov, Arnaldo Carvalho de Melo, Brendan Gregg,
Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
linux-kernel
On 2016/2/24 1:45, Arnaldo Carvalho de Melo wrote:
> On Mon, Feb 22, 2016 at 09:10:37AM +0000, Wang Nan wrote:
>> Commit a43eec304259a6c637f4014a6d4767159b6a3aa3 (bpf: introduce
>> bpf_perf_event_output() helper) add a helper to enable BPF program
>> output data to perf ring buffer through a new type of perf event
>> PERF_COUNT_SW_BPF_OUTPUT. This patch enable perf to create perf
>> event of that type. Now perf user can use following cmdline to
>> receive output data from BPF programs:
>>
>> # ./perf record -a -e bpf-output/no-inherit,name=evt/ \
>> -e ./test_bpf_output.c/map:channel.event=evt/ ls /
>> # ./perf script
>> perf 1560 [004] 347747.086295: evt: ffffffff811fd201 sys_write ...
>> perf 1560 [004] 347747.086300: evt: ffffffff811fd201 sys_write ...
>> perf 1560 [004] 347747.086315: evt: ffffffff811fd201 sys_write ...
>> ...
>>
>> Test result:
>> # cat ./test_bpf_output.c
>> /************************ BEGIN **************************/
>> #include <uapi/linux/bpf.h>
>> struct bpf_map_def {
>> unsigned int type;
>> unsigned int key_size;
>> unsigned int value_size;
>> unsigned int max_entries;
>> };
>>
>> #define SEC(NAME) __attribute__((section(NAME), used))
>> static u64 (*ktime_get_ns)(void) =
>> (void *)BPF_FUNC_ktime_get_ns;
>> static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
>> (void *)BPF_FUNC_trace_printk;
>> static int (*get_smp_processor_id)(void) =
>> (void *)BPF_FUNC_get_smp_processor_id;
>> static int (*perf_event_output)(void *, struct bpf_map_def *, int, void *, unsigned long) =
>> (void *)BPF_FUNC_perf_event_output;
>>
>> struct bpf_map_def SEC("maps") channel = {
>> .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
>> .key_size = sizeof(int),
>> .value_size = sizeof(u32),
>> .max_entries = __NR_CPUS__,
>> };
>>
>> SEC("func_write=sys_write")
>> int func_write(void *ctx)
>> {
>> struct {
>> u64 ktime;
>> int cpuid;
>> } __attribute__((packed)) output_data;
>> char error_data[] = "Error: failed to output: %d\n";
>>
>> output_data.cpuid = get_smp_processor_id();
>> output_data.ktime = ktime_get_ns();
>> int err = perf_event_output(ctx, &channel, get_smp_processor_id(),
>> &output_data, sizeof(output_data));
>> if (err)
>> trace_printk(error_data, sizeof(error_data), err);
>> return 0;
>> }
>> char _license[] SEC("license") = "GPL";
>> int _version SEC("version") = LINUX_VERSION_CODE;
>> /************************ END ***************************/
>>
>> # ./perf record -a -e bpf-output/no-inherit,name=evt/ \
>> -e ./test_bpf_output.c/map:channel.event=evt/ ls /
>> # ./perf script | grep ls
>> ls 2242 [003] 347851.557563: evt: ffffffff811fd201 sys_write ...
>> ls 2242 [003] 347851.557571: evt: ffffffff811fd201 sys_write ...
> So, there is something strange here:
>
> if (unlikely(event->oncpu != smp_processor_id()))
> return -EOPNOTSUPP;
>
> This is where I am hitting, with:
>
> [acme@jouet linux]$ uname -r
> 4.5.0-rc4
>
> int err = perf_event_output(ctx, &channel, get_smp_processor_id(),
> &output_data, sizeof(output_data));
> if (err)
> trace_printk(error_data, sizeof(error_data), err);
>
> And then:
>
> [root@jouet bpf]# tail /sys/kernel/debug/tracing/trace
> perf-13040 [003] d... 12062.807729: : Error: failed to output: -95
> perf-13040 [003] d... 12062.807731: : Error: failed to output: -95
> perf-13040 [003] d... 12062.807732: : Error: failed to output: -95
> perf-13040 [003] d... 12062.807735: : Error: failed to output: -95
> perf-13040 [003] d... 12062.807737: : Error: failed to output: -95
> perf-13040 [003] d... 12062.807744: : Error: failed to output: -95
> gnome-terminal--3091 [001] d... 12062.807773: : Error: failed to output: -95
> gnome-terminal--3091 [001] d... 12062.807784: : Error: failed to output: -95
> gmain-2830 [002] d... 12062.811791: : Error: failed to output: -95
> gmain-2830 [002] d... 12062.811810: : Error: failed to output: -95
> [root@jouet bpf]#
>
> Ideas? AFK for a while, will continue investigating.
I also noticed this output, but didn't dig into it because all the events
I care about are okay. I'll look into this today.
> This already was submitted to Ingo, BTW.
>
> I used, as in the changeset comment tests:
>
> perf record -a -e bpf-output/no-inherit,name=evt/ -e ./test_bpf_output.c/map:channel.event=evt/ ls /
>
> And perf script told me:
>
> [root@jouet bpf]# perf script | tail
> perf 13040 [003] 12062.708337: evt: ffffffff81234eb1 sys_write (/lib/modules/4.5.0-rc4/build/vmlinux)
> perf 13040 [003] 12062.708339: evt: ffffffff81234eb1 sys_write (/lib/modules/4.5.0-rc4/build/vmlinux)
> perf 13040 [003] 12062.708340: evt: ffffffff81234eb1 sys_write (/lib/modules/4.5.0-rc4/build/vmlinux)
> perf 13040 [003] 12062.708341: evt: ffffffff81234eb1 sys_write (/lib/modules/4.5.0-rc4/build/vmlinux)
> perf 13040 [003] 12062.708343: evt: ffffffff81234eb1 sys_write (/lib/modules/4.5.0-rc4/build/vmlinux)
> perf 13040 [003] 12062.708344: evt: ffffffff81234eb1 sys_write (/lib/modules/4.5.0-rc4/build/vmlinux)
> perf 13040 [003] 12062.708346: evt: ffffffff81234eb1 sys_write (/lib/modules/4.5.0-rc4/build/vmlinux)
> perf 13040 [003] 12062.708347: evt: ffffffff81234eb1 sys_write (/lib/modules/4.5.0-rc4/build/vmlinux)
> perf 13040 [003] 12062.708348: evt: ffffffff81234eb1 sys_write (/lib/modules/4.5.0-rc4/build/vmlinux)
> perf 13040 [003] 12062.708350: evt: ffffffff81234eb1 sys_write (/lib/modules/4.5.0-rc4/build/vmlinux)
> [root@jouet bpf]#
>
> Wonder where that /lib/modules/4.5.0-rc4/build/vmlinux came from...
>
> [root@jouet bpf]# perf script | cut -d'(' -f2 | sort | uniq -c
> 1141 /lib/modules/4.5.0-rc4/build/vmlinux)
It's a standard directory where perf searches for vmlinux, isn't it?
tools/perf/util/symbol.c:
static const char * const vmlinux_paths_upd[] = {
"/boot/vmlinux-%s",
"/usr/lib/debug/boot/vmlinux-%s",
"/lib/modules/%s/build/vmlinux",
"/usr/lib/debug/lib/modules/%s/vmlinux",
"/usr/lib/debug/boot/vmlinux-%s.debug"
};
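Each "%s" is filled in with the running kernel release. A standalone sketch of that expansion (this is not the actual perf code; expand_vmlinux_path() is a made-up helper for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Patterns copied from vmlinux_paths_upd[]; "%s" stands for the kernel release. */
static const char * const vmlinux_patterns[] = {
	"/boot/vmlinux-%s",
	"/usr/lib/debug/boot/vmlinux-%s",
	"/lib/modules/%s/build/vmlinux",
	"/usr/lib/debug/lib/modules/%s/vmlinux",
	"/usr/lib/debug/boot/vmlinux-%s.debug"
};

/*
 * Expand pattern 'idx' for 'release' into buf.
 * Returns 0 on success, -1 on a bad index, an empty release or truncation.
 */
static int expand_vmlinux_path(size_t idx, const char *release,
			       char *buf, size_t len)
{
	size_t nr = sizeof(vmlinux_patterns) / sizeof(vmlinux_patterns[0]);
	int n;

	assert(release != NULL && buf != NULL);
	if (idx >= nr || strlen(release) == 0)
		return -1;

	n = snprintf(buf, len, vmlinux_patterns[idx], release);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With release "4.5.0-rc4", index 2 expands to /lib/modules/4.5.0-rc4/build/vmlinux, the path seen in your perf script output.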
So what's your problem?
Thank you.
* Re: [PATCH 10/48] perf tools: Introduce bpf-output event
2016-02-24 1:58 ` Wangnan (F)
@ 2016-02-24 2:04 ` Wangnan (F)
2016-02-24 4:03 ` Wangnan (F)
2016-02-24 13:36 ` Arnaldo Carvalho de Melo
1 sibling, 1 reply; 76+ messages in thread
From: Wangnan (F) @ 2016-02-24 2:04 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Alexei Starovoitov, Arnaldo Carvalho de Melo, Brendan Gregg,
Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
linux-kernel
On 2016/2/24 9:58, Wangnan (F) wrote:
>
>
> On 2016/2/24 1:45, Arnaldo Carvalho de Melo wrote:
>> On Mon, Feb 22, 2016 at 09:10:37AM +0000, Wang Nan wrote:
>>> Commit a43eec304259a6c637f4014a6d4767159b6a3aa3 (bpf: introduce
>>> bpf_perf_event_output() helper) add a helper to enable BPF program
>>> output data to perf ring buffer through a new type of perf event
>>> PERF_COUNT_SW_BPF_OUTPUT. This patch enable perf to create perf
>>> event of that type. Now perf user can use following cmdline to
>>> receive output data from BPF programs:
>>>
>>> # ./perf record -a -e bpf-output/no-inherit,name=evt/ \
>>> -e ./test_bpf_output.c/map:channel.event=evt/ ls /
>>> # ./perf script
>>> perf 1560 [004] 347747.086295:
>>> evt: ffffffff811fd201 sys_write ...
>>> perf 1560 [004] 347747.086300:
>>> evt: ffffffff811fd201 sys_write ...
>>> perf 1560 [004] 347747.086315:
>>> evt: ffffffff811fd201 sys_write ...
>>> ...
>>>
>>> Test result:
>>> # cat ./test_bpf_output.c
>>> /************************ BEGIN **************************/
>>> #include <uapi/linux/bpf.h>
>>> struct bpf_map_def {
>>> unsigned int type;
>>> unsigned int key_size;
>>> unsigned int value_size;
>>> unsigned int max_entries;
>>> };
>>>
>>> #define SEC(NAME) __attribute__((section(NAME), used))
>>> static u64 (*ktime_get_ns)(void) =
>>> (void *)BPF_FUNC_ktime_get_ns;
>>> static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
>>> (void *)BPF_FUNC_trace_printk;
>>> static int (*get_smp_processor_id)(void) =
>>> (void *)BPF_FUNC_get_smp_processor_id;
>>> static int (*perf_event_output)(void *, struct bpf_map_def *, int,
>>> void *, unsigned long) =
>>> (void *)BPF_FUNC_perf_event_output;
>>>
>>> struct bpf_map_def SEC("maps") channel = {
>>> .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
>>> .key_size = sizeof(int),
>>> .value_size = sizeof(u32),
>>> .max_entries = __NR_CPUS__,
>>> };
>>>
>>> SEC("func_write=sys_write")
>>> int func_write(void *ctx)
>>> {
>>> struct {
>>> u64 ktime;
>>> int cpuid;
>>> } __attribute__((packed)) output_data;
>>> char error_data[] = "Error: failed to output: %d\n";
>>>
>>> output_data.cpuid = get_smp_processor_id();
>>> output_data.ktime = ktime_get_ns();
>>> int err = perf_event_output(ctx, &channel,
>>> get_smp_processor_id(),
>>> &output_data, sizeof(output_data));
>>> if (err)
>>> trace_printk(error_data, sizeof(error_data), err);
>>> return 0;
>>> }
>>> char _license[] SEC("license") = "GPL";
>>> int _version SEC("version") = LINUX_VERSION_CODE;
>>> /************************ END ***************************/
>>>
>>> # ./perf record -a -e bpf-output/no-inherit,name=evt/ \
>>> -e ./test_bpf_output.c/map:channel.event=evt/ ls /
>>> # ./perf script | grep ls
>>> ls 2242 [003] 347851.557563: evt: ffffffff811fd201
>>> sys_write ...
>>> ls 2242 [003] 347851.557571: evt: ffffffff811fd201
>>> sys_write ...
>> So, there is something strange here:
>>
>> if (unlikely(event->oncpu != smp_processor_id()))
>> return -EOPNOTSUPP;
>>
>
All failures have 'event->oncpu == -1' here. I guess we should suppress the
warning in this case. But why does event->oncpu become -1?
Thank you.
* Re: [PATCH 10/48] perf tools: Introduce bpf-output event
2016-02-24 2:04 ` Wangnan (F)
@ 2016-02-24 4:03 ` Wangnan (F)
2016-02-24 5:03 ` Wangnan (F)
0 siblings, 1 reply; 76+ messages in thread
From: Wangnan (F) @ 2016-02-24 4:03 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Peter Zijlstra
Cc: Alexei Starovoitov, Arnaldo Carvalho de Melo, Brendan Gregg,
Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, pi3orama, linux-kernel
On 2016/2/24 10:04, Wangnan (F) wrote:
>
>
> On 2016/2/24 9:58, Wangnan (F) wrote:
>>
>>
>> On 2016/2/24 1:45, Arnaldo Carvalho de Melo wrote:
>>> On Mon, Feb 22, 2016 at 09:10:37AM +0000, Wang Nan wrote:
>>>> Commit a43eec304259a6c637f4014a6d4767159b6a3aa3 (bpf: introduce
>>>> bpf_perf_event_output() helper) add a helper to enable BPF program
>>>> output data to perf ring buffer through a new type of perf event
>>>> PERF_COUNT_SW_BPF_OUTPUT. This patch enable perf to create perf
>>>> event of that type. Now perf user can use following cmdline to
>>>> receive output data from BPF programs:
>>>>
>>>> # ./perf record -a -e bpf-output/no-inherit,name=evt/ \
>>>> -e ./test_bpf_output.c/map:channel.event=evt/
>>>> ls /
>>>> # ./perf script
>>>> perf 1560 [004] 347747.086295:
>>>> evt: ffffffff811fd201 sys_write ...
>>>> perf 1560 [004] 347747.086300:
>>>> evt: ffffffff811fd201 sys_write ...
>>>> perf 1560 [004] 347747.086315:
>>>> evt: ffffffff811fd201 sys_write ...
>>>> ...
>>>>
>>>> Test result:
>>>> # cat ./test_bpf_output.c
>>>> /************************ BEGIN **************************/
>>>> #include <uapi/linux/bpf.h>
>>>> struct bpf_map_def {
>>>> unsigned int type;
>>>> unsigned int key_size;
>>>> unsigned int value_size;
>>>> unsigned int max_entries;
>>>> };
>>>>
>>>> #define SEC(NAME) __attribute__((section(NAME), used))
>>>> static u64 (*ktime_get_ns)(void) =
>>>> (void *)BPF_FUNC_ktime_get_ns;
>>>> static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
>>>> (void *)BPF_FUNC_trace_printk;
>>>> static int (*get_smp_processor_id)(void) =
>>>> (void *)BPF_FUNC_get_smp_processor_id;
>>>> static int (*perf_event_output)(void *, struct bpf_map_def *,
>>>> int, void *, unsigned long) =
>>>> (void *)BPF_FUNC_perf_event_output;
>>>>
>>>> struct bpf_map_def SEC("maps") channel = {
>>>> .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
>>>> .key_size = sizeof(int),
>>>> .value_size = sizeof(u32),
>>>> .max_entries = __NR_CPUS__,
>>>> };
>>>>
>>>> SEC("func_write=sys_write")
>>>> int func_write(void *ctx)
>>>> {
>>>> struct {
>>>> u64 ktime;
>>>> int cpuid;
>>>> } __attribute__((packed)) output_data;
>>>> char error_data[] = "Error: failed to output: %d\n";
>>>>
>>>> output_data.cpuid = get_smp_processor_id();
>>>> output_data.ktime = ktime_get_ns();
>>>> int err = perf_event_output(ctx, &channel,
>>>> get_smp_processor_id(),
>>>> &output_data, sizeof(output_data));
>>>> if (err)
>>>> trace_printk(error_data, sizeof(error_data), err);
>>>> return 0;
>>>> }
>>>> char _license[] SEC("license") = "GPL";
>>>> int _version SEC("version") = LINUX_VERSION_CODE;
>>>> /************************ END ***************************/
>>>>
>>>> # ./perf record -a -e bpf-output/no-inherit,name=evt/ \
>>>> -e ./test_bpf_output.c/map:channel.event=evt/
>>>> ls /
>>>> # ./perf script | grep ls
>>>> ls 2242 [003] 347851.557563: evt: ffffffff811fd201
>>>> sys_write ...
>>>> ls 2242 [003] 347851.557571: evt: ffffffff811fd201
>>>> sys_write ...
>>> So, there is something strange here:
>>>
>>> if (unlikely(event->oncpu != smp_processor_id()))
>>> return -EOPNOTSUPP;
>>>
>>
>
> All failures have 'event->oncpu == -1' here. I guess we should
> suppress warning in
> this case. But why event->oncpu becomes -1?
>
For this specific test it is not surprising to see these error messages.
In this test we create the bpf-output channel on the 'ls' process only, but
the BPF script is triggered on all processes (BPF triggering is not related
to perf event scheduling). Trying to output data through the 'ls'-specific
bpf-output channel should fail if this 'sys_write' is not issued by 'ls' or
its children. So this is correct behavior.
However, I also see them with a system-wide channel:
# echo "" > /sys/kernel/debug/tracing/trace
# ./perf record -a -e bpf-output/no-inherit,name=evt/ \
-e ./test_bpf_output.c/map:channel.event=evt/ -a
^C[ perf record: Woken up 0 times to write data ]
[ perf record: Captured and wrote 17.534 MB perf.data (264326 samples) ]
# cat /sys/kernel/debug/tracing/trace | tail
rs:main Q:Reg-582 [000] d..2 4858.711225: : Error: failed to output: -95
rs:main Q:Reg-582 [000] d..2 4858.711241: : Error: failed to output: -95
gmain-1858 [003] d..2 4858.711436: : Error: failed to output: -95
gmain-1858 [003] d..2 4858.711441: : Error: failed to output: -95
gmain-1858 [003] d..2 4858.711473: : Error: failed to output: -95
rs:main Q:Reg-582 [002] d..2 4858.712215: : Error: failed to output: -95
rs:main Q:Reg-582 [002] d..2 4858.712224: : Error: failed to output: -95
gmain-1858 [003] d..2 4858.712230: : Error: failed to output: -95
rs:main Q:Reg-582 [002] d..2 4858.712235: : Error: failed to output: -95
rs:main Q:Reg-582 [002] d..2 4858.712239: : Error: failed to output: -95
System-wide events can also be scheduled in and out. If the bpf-output
events are scheduled out, trying to output data through them causes the
above failure. I don't think it is a problem.
Peter, could you please give some information? In which cases would a
system-wide bpf-output channel be scheduled out?
* Re: [PATCH 10/48] perf tools: Introduce bpf-output event
2016-02-24 4:03 ` Wangnan (F)
@ 2016-02-24 5:03 ` Wangnan (F)
0 siblings, 0 replies; 76+ messages in thread
From: Wangnan (F) @ 2016-02-24 5:03 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Peter Zijlstra
Cc: Alexei Starovoitov, Arnaldo Carvalho de Melo, Brendan Gregg,
Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, pi3orama, linux-kernel
On 2016/2/24 12:03, Wangnan (F) wrote:
>
>
> On 2016/2/24 10:04, Wangnan (F) wrote:
>>
>>
>> On 2016/2/24 9:58, Wangnan (F) wrote:
>>>
[SNIP]
>>> So, there is something strange here:
>>>>
>>>> if (unlikely(event->oncpu != smp_processor_id()))
>>>> return -EOPNOTSUPP;
>>>>
>>>
>>
>> All failures have 'event->oncpu == -1' here. I guess we should suppress the
>> warning in this case. But why does event->oncpu become -1?
>>
>
> For this specific test it is not surprising to see these error messages.
> In this test we create the bpf-output channel on the 'ls' process only, but
> the BPF script is triggered on all processes (BPF triggering is not related
> to perf event scheduling). Trying to output data through the 'ls'-specific
> bpf-output channel should fail if this 'sys_write' is not issued by 'ls' or
> its children. So this is correct behavior.
>
> However, I also see them with a system-wide channel:
>
> # echo "" > /sys/kernel/debug/tracing/trace
> # ./perf record -a -e bpf-output/no-inherit,name=evt/ \
> -e ./test_bpf_output.c/map:channel.event=evt/ -a
> ^C[ perf record: Woken up 0 times to write data ]
> [ perf record: Captured and wrote 17.534 MB perf.data (264326 samples) ]
> # cat /sys/kernel/debug/tracing/trace | tail
> rs:main Q:Reg-582 [000] d..2 4858.711225: : Error: failed to output: -95
> rs:main Q:Reg-582 [000] d..2 4858.711241: : Error: failed to output: -95
> gmain-1858 [003] d..2 4858.711436: : Error: failed to output: -95
> gmain-1858 [003] d..2 4858.711441: : Error: failed to output: -95
> gmain-1858 [003] d..2 4858.711473: : Error: failed to output: -95
> rs:main Q:Reg-582 [002] d..2 4858.712215: : Error: failed to output: -95
> rs:main Q:Reg-582 [002] d..2 4858.712224: : Error: failed to output: -95
> gmain-1858 [003] d..2 4858.712230: : Error: failed to output: -95
> rs:main Q:Reg-582 [002] d..2 4858.712235: : Error: failed to output: -95
> rs:main Q:Reg-582 [002] d..2 4858.712239: : Error: failed to output: -95
>
> System-wide events can also be scheduled in and out. If the bpf-output
> events are scheduled out, trying to output data through them causes the
> above failure. I don't think it is a problem.
>
> Peter, could you please give some information? In which cases would a
> system-wide bpf-output channel be scheduled out?
>
Sorry, my brain is not working well today. Actually this is an easy question:
all EOPNOTSUPP results are generated before PERF_EVENT_IOC_ENABLE or after
PERF_EVENT_IOC_DISABLE. You saw so many failure messages because the probe
is on sys_write, and perf itself calls it.
So you can simply ignore these messages.
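As a rough userspace model of that check (struct model_event and model_output() are invented names; the only parts taken from the kernel are the oncpu comparison and the -EOPNOTSUPP return quoted above):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for the one perf_event field the check looks at. */
struct model_event {
	int oncpu;	/* CPU the event is active on, -1 when it is not active */
};

/* Mirror of the quoted check: output only works on the CPU the event is active on. */
static int model_output(const struct model_event *event, int this_cpu)
{
	assert(event != NULL);
	if (event->oncpu != this_cpu)
		return -EOPNOTSUPP;
	return 0;
}
```

While the event is not enabled, oncpu is -1 and never equals the current CPU, so every call returns -EOPNOTSUPP, which is the -95 printed in the trace on x86 Linux.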
Thank you.
* Re: [PATCH 16/48] perf core: Add backward attribute to perf event
2016-02-22 9:10 ` [PATCH 16/48] perf core: Add backward attribute to perf event Wang Nan
@ 2016-02-24 13:08 ` Jiri Olsa
2016-02-24 13:21 ` Jiri Olsa
0 siblings, 1 reply; 76+ messages in thread
From: Jiri Olsa @ 2016-02-24 13:08 UTC (permalink / raw)
To: Wang Nan
Cc: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg, Adrian Hunter,
Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
linux-kernel
On Mon, Feb 22, 2016 at 09:10:43AM +0000, Wang Nan wrote:
SNIP
> + if (is_write_backward(output_event) != is_write_backward(event))
> + goto out;
> +
> + /*
> * If both events generate aux data, they must be on the same PMU
> */
> if (has_aux(event) && has_aux(output_event) &&
> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> index 37c11c6..80b1fa7 100644
> --- a/kernel/events/ring_buffer.c
> +++ b/kernel/events/ring_buffer.c
> @@ -233,6 +233,8 @@ out:
> int perf_output_begin(struct perf_output_handle *handle,
> struct perf_event *event, unsigned int size)
> {
> + if (unlikely(is_write_backward(event)))
> + return __perf_output_begin(handle, event, size, true);
> return __perf_output_begin(handle, event, size, false);
could this be just:
return __perf_output_begin(handle, event, size,
is_write_backward(event))
also not sure if it's worth having __perf_output_begin
if the only difference from perf_output_begin is the 'backward'
argument, which could be figured out from the event argument
anyway
jirka
^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH 16/48] perf core: Add backward attribute to perf event
2016-02-24 13:08 ` Jiri Olsa
@ 2016-02-24 13:21 ` Jiri Olsa
0 siblings, 0 replies; 76+ messages in thread
From: Jiri Olsa @ 2016-02-24 13:21 UTC (permalink / raw)
To: Wang Nan
Cc: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg, Adrian Hunter,
Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
linux-kernel
On Wed, Feb 24, 2016 at 02:08:50PM +0100, Jiri Olsa wrote:
> On Mon, Feb 22, 2016 at 09:10:43AM +0000, Wang Nan wrote:
>
> SNIP
>
> > + if (is_write_backward(output_event) != is_write_backward(event))
> > + goto out;
> > +
> > + /*
> > * If both events generate aux data, they must be on the same PMU
> > */
> > if (has_aux(event) && has_aux(output_event) &&
> > diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> > index 37c11c6..80b1fa7 100644
> > --- a/kernel/events/ring_buffer.c
> > +++ b/kernel/events/ring_buffer.c
> > @@ -233,6 +233,8 @@ out:
> > int perf_output_begin(struct perf_output_handle *handle,
> > struct perf_event *event, unsigned int size)
> > {
> > + if (unlikely(is_write_backward(event)))
> > + return __perf_output_begin(handle, event, size, true);
> > return __perf_output_begin(handle, event, size, false);
>
> could this be just:
> return __perf_output_begin(handle, event, size,
> is_write_backward(event))
>
> also not sure if it's worth having __perf_output_begin
> if the only difference from perf_output_begin is the 'backward'
> argument, which could be figured out from the event argument
> anyway
nevermind my second comment, just saw it being used also in next patches ;-)
jirka
^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH 10/48] perf tools: Introduce bpf-output event
2016-02-24 1:58 ` Wangnan (F)
2016-02-24 2:04 ` Wangnan (F)
@ 2016-02-24 13:36 ` Arnaldo Carvalho de Melo
1 sibling, 0 replies; 76+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-02-24 13:36 UTC (permalink / raw)
To: Wangnan (F)
Cc: Alexei Starovoitov, Arnaldo Carvalho de Melo, Brendan Gregg,
Adrian Hunter, Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
linux-kernel
Em Wed, Feb 24, 2016 at 09:58:34AM +0800, Wangnan (F) escreveu:
>
>
> On 2016/2/24 1:45, Arnaldo Carvalho de Melo wrote:
> >Em Mon, Feb 22, 2016 at 09:10:37AM +0000, Wang Nan escreveu:
> >>Commit a43eec304259a6c637f4014a6d4767159b6a3aa3 (bpf: introduce
> >>bpf_perf_event_output() helper) adds a helper that lets a BPF program
> >>output data to the perf ring buffer through a new type of perf event,
> >>PERF_COUNT_SW_BPF_OUTPUT. This patch enables perf to create a perf
> >>event of that type. Now a perf user can use the following cmdline to
> >>receive output data from BPF programs:
> >>
> >> # ./perf record -a -e bpf-output/no-inherit,name=evt/ \
> >> -e ./test_bpf_output.c/map:channel.event=evt/ ls /
> >> # ./perf script
> >> perf 1560 [004] 347747.086295: evt: ffffffff811fd201 sys_write ...
> >> perf 1560 [004] 347747.086300: evt: ffffffff811fd201 sys_write ...
> >> perf 1560 [004] 347747.086315: evt: ffffffff811fd201 sys_write ...
> >> ...
> >>
> >>Test result:
> >> # cat ./test_bpf_output.c
> >> /************************ BEGIN **************************/
> >> #include <uapi/linux/bpf.h>
> >> struct bpf_map_def {
> >> unsigned int type;
> >> unsigned int key_size;
> >> unsigned int value_size;
> >> unsigned int max_entries;
> >> };
> >>
> >> #define SEC(NAME) __attribute__((section(NAME), used))
> >> static u64 (*ktime_get_ns)(void) =
> >> (void *)BPF_FUNC_ktime_get_ns;
> >> static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
> >> (void *)BPF_FUNC_trace_printk;
> >> static int (*get_smp_processor_id)(void) =
> >> (void *)BPF_FUNC_get_smp_processor_id;
> >> static int (*perf_event_output)(void *, struct bpf_map_def *, int, void *, unsigned long) =
> >> (void *)BPF_FUNC_perf_event_output;
> >>
> >> struct bpf_map_def SEC("maps") channel = {
> >> .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
> >> .key_size = sizeof(int),
> >> .value_size = sizeof(u32),
> >> .max_entries = __NR_CPUS__,
> >> };
> >>
> >> SEC("func_write=sys_write")
> >> int func_write(void *ctx)
> >> {
> >> struct {
> >> u64 ktime;
> >> int cpuid;
> >> } __attribute__((packed)) output_data;
> >> char error_data[] = "Error: failed to output: %d\n";
> >>
> >> output_data.cpuid = get_smp_processor_id();
> >> output_data.ktime = ktime_get_ns();
> >> int err = perf_event_output(ctx, &channel, get_smp_processor_id(),
> >> &output_data, sizeof(output_data));
> >> if (err)
> >> trace_printk(error_data, sizeof(error_data), err);
> >> return 0;
> >> }
> >> char _license[] SEC("license") = "GPL";
> >> int _version SEC("version") = LINUX_VERSION_CODE;
> >> /************************ END ***************************/
> >>
> >> # ./perf record -a -e bpf-output/no-inherit,name=evt/ \
> >> -e ./test_bpf_output.c/map:channel.event=evt/ ls /
> >> # ./perf script | grep ls
> >> ls 2242 [003] 347851.557563: evt: ffffffff811fd201 sys_write ...
> >> ls 2242 [003] 347851.557571: evt: ffffffff811fd201 sys_write ...
> >So, there is something strange here:
> >
> > if (unlikely(event->oncpu != smp_processor_id()))
> > return -EOPNOTSUPP;
> >
> >This is where I am hitting, with:
> >
> >[acme@jouet linux]$ uname -r
> >4.5.0-rc4
> >
> > int err = perf_event_output(ctx, &channel, get_smp_processor_id(),
> > &output_data, sizeof(output_data));
> > if (err)
> > trace_printk(error_data, sizeof(error_data), err);
> >
> >And then:
> >
> >[root@jouet bpf]# tail /sys/kernel/debug/tracing/trace
> > perf-13040 [003] d... 12062.807729: : Error: failed to output: -95
> > perf-13040 [003] d... 12062.807731: : Error: failed to output: -95
> > perf-13040 [003] d... 12062.807732: : Error: failed to output: -95
> > perf-13040 [003] d... 12062.807735: : Error: failed to output: -95
> > perf-13040 [003] d... 12062.807737: : Error: failed to output: -95
> > perf-13040 [003] d... 12062.807744: : Error: failed to output: -95
> > gnome-terminal--3091 [001] d... 12062.807773: : Error: failed to output: -95
> > gnome-terminal--3091 [001] d... 12062.807784: : Error: failed to output: -95
> > gmain-2830 [002] d... 12062.811791: : Error: failed to output: -95
> > gmain-2830 [002] d... 12062.811810: : Error: failed to output: -95
> >[root@jouet bpf]#
> >
> >Ideas? AFK for a while, will continue investigating.
>
> I also noticed this output, but didn't dig into it because all the events
> I care about are okay. I'll look into this today.
>
> >This already was submitted to Ingo, BTW.
> >
> >I used, as in the changeset comment tests:
> >
> >perf record -a -e bpf-output/no-inherit,name=evt/ -e ./test_bpf_output.c/map:channel.event=evt/ ls /
> >
> >And perf script told me:
> >
> >[root@jouet bpf]# perf script | tail
> > perf 13040 [003] 12062.708337: evt: ffffffff81234eb1 sys_write (/lib/modules/4.5.0-rc4/build/vmlinux)
> > perf 13040 [003] 12062.708339: evt: ffffffff81234eb1 sys_write (/lib/modules/4.5.0-rc4/build/vmlinux)
> > perf 13040 [003] 12062.708340: evt: ffffffff81234eb1 sys_write (/lib/modules/4.5.0-rc4/build/vmlinux)
> > perf 13040 [003] 12062.708341: evt: ffffffff81234eb1 sys_write (/lib/modules/4.5.0-rc4/build/vmlinux)
> > perf 13040 [003] 12062.708343: evt: ffffffff81234eb1 sys_write (/lib/modules/4.5.0-rc4/build/vmlinux)
> > perf 13040 [003] 12062.708344: evt: ffffffff81234eb1 sys_write (/lib/modules/4.5.0-rc4/build/vmlinux)
> > perf 13040 [003] 12062.708346: evt: ffffffff81234eb1 sys_write (/lib/modules/4.5.0-rc4/build/vmlinux)
> > perf 13040 [003] 12062.708347: evt: ffffffff81234eb1 sys_write (/lib/modules/4.5.0-rc4/build/vmlinux)
> > perf 13040 [003] 12062.708348: evt: ffffffff81234eb1 sys_write (/lib/modules/4.5.0-rc4/build/vmlinux)
> > perf 13040 [003] 12062.708350: evt: ffffffff81234eb1 sys_write (/lib/modules/4.5.0-rc4/build/vmlinux)
> >[root@jouet bpf]#
> >
> >Wonder where that /lib/modules/4.5.0-rc4/build/vmlinux came from...
> >
> >[root@jouet bpf]# perf script | cut -d'(' -f2 | sort | uniq -c
> > 1141 /lib/modules/4.5.0-rc4/build/vmlinux)
>
> It's a standard directory where perf searches for vmlinux, isn't it?
Nah, that was me being confused by 'perf script's output, it looked like
what was enclosed in () right after the sys_write was a parameter for
that function (sys_write), when in fact it is the DSO where sys_write is
in, duh.
- Arnaldo
> tools/perf/util/symbol.c:
>
> static const char * const vmlinux_paths_upd[] = {
> "/boot/vmlinux-%s",
> "/usr/lib/debug/boot/vmlinux-%s",
> "/lib/modules/%s/build/vmlinux",
> "/usr/lib/debug/lib/modules/%s/vmlinux",
> "/usr/lib/debug/boot/vmlinux-%s.debug"
> };
>
> So what's your problem?
>
> Thank you.
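The quoted vmlinux_paths_upd[] entries are printf-style templates whose %s is filled with the kernel release, which is how /lib/modules/4.5.0-rc4/build/vmlinux shows up as the DSO. A minimal sketch of that expansion, assuming the release string comes from uname(2); the function names `expand_vmlinux_path` and `print_vmlinux_candidates` are hypothetical and this is not the code in symbol.c:

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Templates copied from the vmlinux_paths_upd[] array quoted above. */
static const char * const vmlinux_templates[] = {
	"/boot/vmlinux-%s",
	"/usr/lib/debug/boot/vmlinux-%s",
	"/lib/modules/%s/build/vmlinux",
	"/usr/lib/debug/lib/modules/%s/vmlinux",
	"/usr/lib/debug/boot/vmlinux-%s.debug",
};

#define NR_TEMPLATES (sizeof(vmlinux_templates) / sizeof(vmlinux_templates[0]))

/*
 * Expand template idx with the given kernel release string.
 * Returns 0 on success, -1 on a bad index or if buf is too small.
 */
static int expand_vmlinux_path(size_t idx, const char *release,
			       char *buf, size_t len)
{
	int n;

	if (idx >= NR_TEMPLATES)
		return -1;
	n = snprintf(buf, len, vmlinux_templates[idx], release);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Print every candidate path for the running kernel (as in uname -r). */
static int print_vmlinux_candidates(void)
{
	struct utsname uts;
	char path[4096];
	size_t i;

	if (uname(&uts) < 0)
		return -1;
	for (i = 0; i < NR_TEMPLATES; i++)
		if (expand_vmlinux_path(i, uts.release, path, sizeof(path)) == 0)
			puts(path);
	return 0;
}
```

Calling expand_vmlinux_path(2, "4.5.0-rc4", buf, sizeof(buf)) fills buf with the path seen in the perf script output above.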
^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH 20/48] perf tools: Make ordered_events reusable
2016-02-22 9:10 ` [PATCH 20/48] perf tools: Make ordered_events reusable Wang Nan
@ 2016-02-24 14:18 ` Jiri Olsa
0 siblings, 0 replies; 76+ messages in thread
From: Jiri Olsa @ 2016-02-24 14:18 UTC (permalink / raw)
To: Wang Nan
Cc: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg, Adrian Hunter,
Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
linux-kernel
On Mon, Feb 22, 2016 at 09:10:47AM +0000, Wang Nan wrote:
> ordered_events__free() leaves linked lists and timestamps uncleared,
> so the structure cannot be reused after ordered_events__free(). This is
> inconvenient once 'perf record' supports generating multiple perf.data
> outputs and processing build-ids for each of them.
>
> Call ordered_events__init() in ordered_events__free() so ordered_events
> can be reused.
>
> Signed-off-by: Wang Nan <wangnan0@huawei.com>
> Signed-off-by: He Kuang <hekuang@huawei.com>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Jiri Olsa <jolsa@kernel.org>
> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Zefan Li <lizefan@huawei.com>
> Cc: pi3orama@163.com
> ---
> tools/perf/util/ordered-events.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/tools/perf/util/ordered-events.c b/tools/perf/util/ordered-events.c
> index b1b9e23..70c0dc8 100644
> --- a/tools/perf/util/ordered-events.c
> +++ b/tools/perf/util/ordered-events.c
> @@ -299,6 +299,8 @@ void ordered_events__init(struct ordered_events *oe, ordered_events__deliver_t d
>
> void ordered_events__free(struct ordered_events *oe)
> {
> + ordered_events__deliver_t old_deliver = oe->deliver;
> +
> while (!list_empty(&oe->to_free)) {
> struct ordered_event *event;
>
> @@ -307,4 +309,7 @@ void ordered_events__free(struct ordered_events *oe)
> free_dup_event(oe, event->event);
> free(event);
> }
> +
> + memset(oe, '\0', sizeof(*oe));
> + ordered_events__init(oe, old_deliver);
> }
I think it'd be better to put that memset call into ordered_events__init
and introduce ordered_events__reinit that calls ordered_events__free
and ordered_events__init with oe->deliver as you do above
that way it'll be apparent when you reuse the ordered_events struct
thanks,
jirka
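The split suggested above can be sketched as follows. This is only an illustration under simplified assumptions: the struct, its fields, and the `oe_*` names are stand-ins invented for the example, not the real ordered_events code.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for the perf types; field names are illustrative. */
struct ordered_events;
typedef int (*deliver_fn)(struct ordered_events *oe, void *event);

struct node {
	struct node *next;
};

struct ordered_events {
	struct node *to_free;		/* allocations owned by this object */
	unsigned long long last_flush;	/* example of state that goes stale */
	deliver_fn deliver;
};

/* Example callback so the sketch has something to preserve. */
static int nop_deliver(struct ordered_events *oe, void *event)
{
	(void)oe;
	(void)event;
	return 0;
}

/* init clears everything, so it is also safe on a previously used object. */
static void oe_init(struct ordered_events *oe, deliver_fn deliver)
{
	memset(oe, 0, sizeof(*oe));
	oe->deliver = deliver;
}

/* free only releases memory; it does not reset the remaining fields. */
static void oe_free(struct ordered_events *oe)
{
	while (oe->to_free) {
		struct node *n = oe->to_free;

		oe->to_free = n->next;
		free(n);
	}
}

/* reinit makes the reuse explicit at the call site. */
static void oe_reinit(struct ordered_events *oe)
{
	deliver_fn old_deliver = oe->deliver;

	oe_free(oe);
	oe_init(oe, old_deliver);
}
```

With this shape, a caller that wants to reuse the object calls oe_reinit(), while a caller that is done with it calls oe_free() and nothing is silently re-initialized.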
^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH 18/48] perf tools: Only validate is_pos for tracking evsels
2016-02-22 9:10 ` [PATCH 18/48] perf tools: Only validate is_pos for tracking evsels Wang Nan
@ 2016-02-24 14:21 ` Jiri Olsa
0 siblings, 0 replies; 76+ messages in thread
From: Jiri Olsa @ 2016-02-24 14:21 UTC (permalink / raw)
To: Wang Nan
Cc: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg, Adrian Hunter,
Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
linux-kernel
On Mon, Feb 22, 2016 at 09:10:45AM +0000, Wang Nan wrote:
> is_pos is only useful for tracking events (fork, mmap, exit, ...).
> Perf collects those events through the evsel with 'tracking' set.
> Therefore, there's no need to validate every is_pos against
> evlist->is_pos.
>
> This patch is required once perf supports PERF_SAMPLE_TAILSIZE.
> Since there is an extra u64 at the end of this type of evsel, is_pos
> for an evsel with PERF_SAMPLE_TAILSIZE set is different from other
> evsels.
>
> Signed-off-by: Wang Nan <wangnan0@huawei.com>
> Signed-off-by: He Kuang <hekuang@huawei.com>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Jiri Olsa <jolsa@kernel.org>
> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Zefan Li <lizefan@huawei.com>
> Cc: pi3orama@163.com
> ---
> tools/perf/util/evlist.c | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index c42e196..fef465a 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -1274,8 +1274,15 @@ bool perf_evlist__valid_sample_type(struct perf_evlist *evlist)
> return false;
>
> evlist__for_each(evlist, pos) {
> - if (pos->id_pos != evlist->id_pos ||
> - pos->is_pos != evlist->is_pos)
> + if (pos->id_pos != evlist->id_pos)
> + return false;
> + /*
> + * Only tracking events needs is_pos. Those events are
> + * collected if evsel->tracking is selected.
> + * For other evsel, is_pos is useless for other evsels,
typo in the comment above ^^^ 'other evsel' is used twice
> + * so skip validating them.
> + */
> + if (pos->tracking && pos->is_pos != evlist->is_pos)
> return false;
> }
>
> --
> 1.8.3.4
>
^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH 21/48] perf record: Extract synthesize code to record__synthesize()
2016-02-22 9:10 ` [PATCH 21/48] perf record: Extract synthesize code to record__synthesize() Wang Nan
@ 2016-02-24 14:29 ` Jiri Olsa
0 siblings, 0 replies; 76+ messages in thread
From: Jiri Olsa @ 2016-02-24 14:29 UTC (permalink / raw)
To: Wang Nan
Cc: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg, Adrian Hunter,
Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
linux-kernel
On Mon, Feb 22, 2016 at 09:10:48AM +0000, Wang Nan wrote:
SNIP
> + err = perf_event__synthesize_auxtrace_info(rec->itr, tool,
> + session, process_synthesized_event);
> + if (err)
> + goto out;
> + }
> +
> + err = perf_event__synthesize_kernel_mmap(tool, process_synthesized_event,
> + machine);
> + if (err < 0 && !warned_kmaps) {
> + warned_kmaps = true;
> + pr_err("Couldn't record kernel reference relocation symbol\n"
> + "Symbol resolution may be skewed if relocation was used (e.g. kexec).\n"
> + "Check /proc/kallsyms permission or run as root.\n");
> + }
> +
> + err = perf_event__synthesize_modules(tool, process_synthesized_event,
> + machine);
> + if (err < 0 && !warned_modules) {
> + warned_modules = true;
could you please move the warn-just-once logic
into a separate patch, so this one is a pure move?
also we have the WARN_ONCE macro in perf
thanks,
jirka
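For reference, the warn-once idea can be sketched without the hand-written warned_kmaps / warned_modules flags. The macro below is an illustrative stand-in named `WARN_ONCE_SKETCH`; it is not perf's WARN_ONCE, whose exact definition may differ, and it relies on GNU statement expressions (GCC or Clang). The `nr_warnings` counter and `check_kernel_mmap` exist only for the demonstration.

```c
#include <stdbool.h>
#include <stdio.h>

static int nr_warnings;	/* counts messages actually emitted, demo only */

/*
 * Evaluates to the condition and prints the message only the first
 * time the condition is true at this particular call site.
 */
#define WARN_ONCE_SKETCH(cond, msg)			\
({							\
	static bool warned__;				\
	bool cond__ = !!(cond);				\
							\
	if (cond__ && !warned__) {			\
		warned__ = true;			\
		nr_warnings++;				\
		fputs(msg, stderr);			\
	}						\
	cond__;						\
})

/* Stand-in for one synthesize call site: err < 0 means failure. */
static bool check_kernel_mmap(int err)
{
	return WARN_ONCE_SKETCH(err < 0,
		"Couldn't record kernel reference relocation symbol\n");
}
```

Because the flag is a static local inside the macro expansion, each call site gets its own flag, so no per-warning variable has to be declared by hand.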
^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH 22/48] perf tools: Add perf_data_file__switch() helper
2016-02-22 9:10 ` [PATCH 22/48] perf tools: Add perf_data_file__switch() helper Wang Nan
@ 2016-02-24 14:34 ` Jiri Olsa
0 siblings, 0 replies; 76+ messages in thread
From: Jiri Olsa @ 2016-02-24 14:34 UTC (permalink / raw)
To: Wang Nan
Cc: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg, Adrian Hunter,
Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
linux-kernel
On Mon, Feb 22, 2016 at 09:10:49AM +0000, Wang Nan wrote:
> perf_data_file__switch() closes the current output file, renames it, then
> opens a new one to continue recording. It will be used by perf record
> to split output into multiple perf.data files.
>
> Signed-off-by: Wang Nan <wangnan0@huawei.com>
> Signed-off-by: He Kuang <hekuang@huawei.com>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Jiri Olsa <jolsa@kernel.org>
> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Zefan Li <lizefan@huawei.com>
> Cc: pi3orama@163.com
> ---
> tools/perf/util/data.c | 36 ++++++++++++++++++++++++++++++++++++
> tools/perf/util/data.h | 11 ++++++++++-
> 2 files changed, 46 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
> index 1921942..bfded6a 100644
> --- a/tools/perf/util/data.c
> +++ b/tools/perf/util/data.c
> @@ -136,3 +136,39 @@ ssize_t perf_data_file__write(struct perf_data_file *file,
> {
> return writen(file->fd, buf, size);
> }
> +
> +int perf_data_file__switch(struct perf_data_file *file,
> + const char *postfix,
> + size_t pos, bool at_exit)
> +{
> + char *new_filepath;
> + int ret;
> +
> + if (check_pipe(file))
> + return -EINVAL;
> + if (perf_data_file__is_read(file))
> + return -EINVAL;
> +
> + if (asprintf(&new_filepath, "%s.%s", file->path, postfix) < 0)
> + return -ENOMEM;
> +
> + rename(file->path, new_filepath);
should we check for rename's return value?
jirka
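A minimal sketch of what checking the return value could look like, assuming the caller wants a negative errno on failure. `rename_with_postfix` is a hypothetical helper, not the patch's function, and it uses malloc() plus snprintf() instead of asprintf() only to keep the example portable.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Rename path to "<path>.<postfix>" and report failure to the caller.
 * Returns 0 on success or a negative errno value.
 */
static int rename_with_postfix(const char *path, const char *postfix)
{
	size_t len = strlen(path) + strlen(postfix) + 2; /* '.' and NUL */
	char *new_path = malloc(len);
	int ret = 0;

	if (!new_path)
		return -ENOMEM;
	snprintf(new_path, len, "%s.%s", path, postfix);

	/* rename() returns -1 and sets errno on failure. */
	if (rename(path, new_path) < 0)
		ret = -errno;

	free(new_path);
	return ret;
}
```

Whether a failed rename should abort the switch or only be reported is a policy decision for the patch author; the sketch only shows that the error is available to act on.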
^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH 23/48] perf record: Turns auxtrace_snapshot_enable into 3 states
2016-02-22 9:10 ` [PATCH 23/48] perf record: Turns auxtrace_snapshot_enable into 3 states Wang Nan
@ 2016-02-24 14:43 ` Jiri Olsa
0 siblings, 0 replies; 76+ messages in thread
From: Jiri Olsa @ 2016-02-24 14:43 UTC (permalink / raw)
To: Wang Nan
Cc: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg, Adrian Hunter,
Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
linux-kernel
On Mon, Feb 22, 2016 at 09:10:50AM +0000, Wang Nan wrote:
> auxtrace_snapshot_enable has only two states (0/1). Turn it into a
> three-state enum so the SIGUSR2 handler can safely do other work without
> triggering an auxtrace snapshot.
>
> Signed-off-by: Wang Nan <wangnan0@huawei.com>
> Signed-off-by: He Kuang <hekuang@huawei.com>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Jiri Olsa <jolsa@kernel.org>
> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Zefan Li <lizefan@huawei.com>
> Cc: pi3orama@163.com
Acked-by: Jiri Olsa <jolsa@kernel.org>
thanks,
jirka
^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH 29/48] perf record: Re-synthesize tracking events after output switching
2016-02-22 9:10 ` [PATCH 29/48] perf record: Re-synthesize tracking events after output switching Wang Nan
@ 2016-02-24 14:57 ` Jiri Olsa
0 siblings, 0 replies; 76+ messages in thread
From: Jiri Olsa @ 2016-02-24 14:57 UTC (permalink / raw)
To: Wang Nan
Cc: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg, Adrian Hunter,
Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
linux-kernel
On Mon, Feb 22, 2016 at 09:10:56AM +0000, Wang Nan wrote:
> Tracking events describe kernel and threads. They are generated by
> reading /proc/kallsyms, /proc/*/maps and /proc/*/task/* during
> initialization of 'perf record', serialized into event sequences and put
> at the head of 'perf.data'. In case of output switching, each output
> file should contain those events.
>
> This patch calls record__synthesize() during output switching, so the
> event sequences described above can be collected again.
>
> Signed-off-by: Wang Nan <wangnan0@huawei.com>
> Signed-off-by: He Kuang <hekuang@huawei.com>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Jiri Olsa <jolsa@kernel.org>
> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Zefan Li <lizefan@huawei.com>
> Cc: pi3orama@163.com
> ---
> tools/perf/builtin-record.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 2839715..3a11102 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -529,6 +529,8 @@ record__finish_output(struct record *rec)
> return;
> }
>
> +static int record__synthesize(struct record *rec);
> +
> static int
> record__switch_output(struct record *rec, bool at_exit)
> {
> @@ -557,6 +559,15 @@ record__switch_output(struct record *rec, bool at_exit)
> if (!quiet)
> fprintf(stderr, "[ perf record: Dump %s.%s ]\n",
> file->path, timestamp);
> +
> + /* Reinit machine */
> + if (!at_exit) {
> + machines__exit(&rec->session->machines);
> + machines__init(&rec->session->machines);
> + perf_session__create_kernel_maps(rec->session);
> + perf_session__set_id_hdr_size(rec->session);
hum, what's the reason to reinit the machines data? it's still the same, no?
I'd think that only the record__synthesize call is needed here
also I think we should introduce some perf_session helper
for that.. like perf_session__init or such
thanks,
jirka
> + record__synthesize(rec);
> + }
> return fd;
> }
>
> --
> 1.8.3.4
>
^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH 30/48] perf record: Generate tracking events for process forked by perf
2016-02-22 9:10 ` [PATCH 30/48] perf record: Generate tracking events for process forked by perf Wang Nan
@ 2016-02-24 15:01 ` Jiri Olsa
0 siblings, 0 replies; 76+ messages in thread
From: Jiri Olsa @ 2016-02-24 15:01 UTC (permalink / raw)
To: Wang Nan
Cc: Alexei Starovoitov, Arnaldo Carvalho de Melo,
Arnaldo Carvalho de Melo, Brendan Gregg, Adrian Hunter,
Cody P Schafer, David S. Miller, He Kuang,
Jérémie Galarneau, Jiri Olsa, Kirill Smelkov, Li Zefan,
Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, pi3orama,
linux-kernel
On Mon, Feb 22, 2016 at 09:10:57AM +0000, Wang Nan wrote:
> With 'perf record --switch-output' without -a, record__synthesize() in
> record__switch_output() won't generate tracking events because there's
> no thread_map in evlist. As a result, the newly created perf.data doesn't
> contain map and comm information.
>
> This patch creates a fake thread_map and directly calls
> perf_event__synthesize_thread_map() for those events.
>
> Signed-off-by: Wang Nan <wangnan0@huawei.com>
> Signed-off-by: He Kuang <hekuang@huawei.com>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Jiri Olsa <jolsa@kernel.org>
> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Zefan Li <lizefan@huawei.com>
> Cc: pi3orama@163.com
> ---
> tools/perf/builtin-record.c | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 3a11102..7d4d8bf 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -567,6 +567,23 @@ record__switch_output(struct record *rec, bool at_exit)
> perf_session__create_kernel_maps(rec->session);
> perf_session__set_id_hdr_size(rec->session);
> record__synthesize(rec);
> +
could you please add a comment here, taken from the changelog?
> + if (target__none(&rec->opts.target)) {
also this would be better in separate function:
> + struct {
> + struct thread_map map;
> + struct thread_map_data map_data;
> + } thread_map;
> +
> + thread_map.map.nr = 1;
> + thread_map.map.map[0].pid = rec->evlist->workload.pid;
> + thread_map.map.map[0].comm = NULL;
> + perf_event__synthesize_thread_map(&rec->tool,
> + &thread_map.map,
> + process_synthesized_event,
> + &rec->session->machines.host,
> + rec->opts.sample_address,
> + rec->opts.proc_map_timeout);
> + }
> }
> return fd;
> }
> --
> 1.8.3.4
>
^ permalink raw reply [flat|nested] 76+ messages in thread
* [tip:perf/core] perf bpf: Add API to set values to map entries in a bpf object
2016-02-22 9:10 ` [PATCH 03/48] perf bpf: Add API to set values to map entries in a bpf object Wang Nan
@ 2016-02-25 5:39 ` tip-bot for Wang Nan
0 siblings, 0 replies; 76+ messages in thread
From: tip-bot for Wang Nan @ 2016-02-25 5:39 UTC (permalink / raw)
To: linux-tip-commits
Cc: peterz, jeremie.galarneau, brendan.d.gregg, hekuang, acme, tglx,
dev, linux-kernel, lizefan, jolsa, mingo, kirr, adrian.hunter,
ast, hpa, namhyung, masami.hiramatsu.pt, wangnan0
Commit-ID: 066dacbf2a32defb4de23ea4c1af9e77578b5ac2
Gitweb: http://git.kernel.org/tip/066dacbf2a32defb4de23ea4c1af9e77578b5ac2
Author: Wang Nan <wangnan0@huawei.com>
AuthorDate: Mon, 22 Feb 2016 09:10:30 +0000
Committer: Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 22 Feb 2016 12:17:48 -0300
perf bpf: Add API to set values to map entries in a bpf object
bpf__config_obj() is introduced as a core API to configure a BPF object
after loading. One configuration option for maps is introduced. After this
patch a BPF object can accept assignments like:
map:my_map.value=1234
(map.my_map.value looks pretty. However, there's a small but hard to fix
problem related to flex's greedy matching. Please see [1]. ':' is chosen
to avoid it in a simpler way.)
This patch is more complex than the work it does because it is designed
with extension in mind. In designing BPF map configuration, the
following things should be considered:
1. Array indices selection: perf should allow the user to set different
values for different slots in an array, with syntax like:
map:my_map.value[0,3...6]=1234;
2. A map should be set by different config terms, each for a part
of it. For example, set each slot to the pid of a thread;
3. Type of value: integer is not the only valid value type. A perf
counter can also be put into a map after commit 35578d798400
("bpf: Implement function bpf_perf_event_read() that get the
selected hardware PMU counter")
4. For a hash table, it should be possible to use a string or other
value as a key;
5. It is possible that a map configuration cannot be set up
during parsing. A perf counter is an example.
Therefore, this patch does the following:
1. Instead of updating map element during parsing, this patch stores
map config options in 'struct bpf_map_priv'. Following patches
will apply those configs at an appropriate time;
2. Link map operations in a list so a map can have multiple config
terms attached, so different parts can be configured separately;
3. Make 'struct bpf_map_priv' extensible so that the following patches
can add new types of keys and operations;
4. Use bpf_obj_config__map_funcs array to support more map config options.
Since the patch changing the event parser to parse BPF object config is
relatively large, I've put it in another commit. Code in this patch can be
tested after applying the next patch.
[1] http://lkml.kernel.org/g/564ED621.4050500@huawei.com
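As an illustration of the "map:<mapname>.<config opt>" key format described above, here is a standalone sketch of splitting such a key into its map name and option. `split_map_term` and `dup_range` are hypothetical names, and the sketch is stricter than the patch in one respect: it also rejects an empty map name. The "=1234" part is not handled here, since the value reaches the loader separately from the key.

```c
#include <stdlib.h>
#include <string.h>

/* Copy n bytes of s into a new NUL-terminated string, or return NULL. */
static char *dup_range(const char *s, size_t n)
{
	char *p = malloc(n + 1);

	if (p) {
		memcpy(p, s, n);
		p[n] = '\0';
	}
	return p;
}

/*
 * Split "map:<mapname>.<config opt>" into its two parts.
 * On success returns 0 and stores malloc'ed strings in *name and *opt,
 * which the caller frees. Returns -1 on malformed input or allocation
 * failure.
 */
static int split_map_term(const char *config, char **name, char **opt)
{
	static const char prefix[] = "map:";
	const char *body, *dot;

	if (strncmp(config, prefix, sizeof(prefix) - 1) != 0)
		return -1;
	body = config + sizeof(prefix) - 1;

	/* The first '.' separates the map name from the option. */
	dot = strchr(body, '.');
	if (!dot || dot == body || dot[1] == '\0')
		return -1;

	*name = dup_range(body, (size_t)(dot - body));
	*opt = dup_range(dot + 1, strlen(dot + 1));
	if (!*name || !*opt) {
		free(*name);
		free(*opt);
		return -1;
	}
	return 0;
}
```

For the key "map:my_map.value" this yields the name "my_map" and the option "value", which is the pair looked up in bpf_obj_config__map_funcs in the diff below.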
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Cody P Schafer <dev@codyps.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kirill Smelkov <kirr@nexedi.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1456132275-98875-4-git-send-email-wangnan0@huawei.com
Signed-off-by: He Kuang <hekuang@huawei.com>
[ Changes "maps:my_map.value" to "map:my_map.value", improved error messages ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/bpf-loader.c | 276 +++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/bpf-loader.h | 38 ++++++
2 files changed, 314 insertions(+)
diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 0bdccf4..caeef9e 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -739,6 +739,261 @@ int bpf__foreach_tev(struct bpf_object *obj,
return 0;
}
+enum bpf_map_op_type {
+ BPF_MAP_OP_SET_VALUE,
+};
+
+enum bpf_map_key_type {
+ BPF_MAP_KEY_ALL,
+};
+
+struct bpf_map_op {
+ struct list_head list;
+ enum bpf_map_op_type op_type;
+ enum bpf_map_key_type key_type;
+ union {
+ u64 value;
+ } v;
+};
+
+struct bpf_map_priv {
+ struct list_head ops_list;
+};
+
+static void
+bpf_map_op__delete(struct bpf_map_op *op)
+{
+ if (!list_empty(&op->list))
+ list_del(&op->list);
+ free(op);
+}
+
+static void
+bpf_map_priv__purge(struct bpf_map_priv *priv)
+{
+ struct bpf_map_op *pos, *n;
+
+ list_for_each_entry_safe(pos, n, &priv->ops_list, list) {
+ list_del_init(&pos->list);
+ bpf_map_op__delete(pos);
+ }
+}
+
+static void
+bpf_map_priv__clear(struct bpf_map *map __maybe_unused,
+ void *_priv)
+{
+ struct bpf_map_priv *priv = _priv;
+
+ bpf_map_priv__purge(priv);
+ free(priv);
+}
+
+static struct bpf_map_op *
+bpf_map_op__new(void)
+{
+ struct bpf_map_op *op;
+
+ op = zalloc(sizeof(*op));
+ if (!op) {
+ pr_debug("Failed to alloc bpf_map_op\n");
+ return ERR_PTR(-ENOMEM);
+ }
+ INIT_LIST_HEAD(&op->list);
+
+ op->key_type = BPF_MAP_KEY_ALL;
+ return op;
+}
+
+static int
+bpf_map__add_op(struct bpf_map *map, struct bpf_map_op *op)
+{
+ struct bpf_map_priv *priv;
+ const char *map_name;
+ int err;
+
+ map_name = bpf_map__get_name(map);
+ err = bpf_map__get_private(map, (void **)&priv);
+ if (err) {
+ pr_debug("Failed to get private from map %s\n", map_name);
+ return err;
+ }
+
+ if (!priv) {
+ priv = zalloc(sizeof(*priv));
+ if (!priv) {
+ pr_debug("No enough memory to alloc map private\n");
+ return -ENOMEM;
+ }
+ INIT_LIST_HEAD(&priv->ops_list);
+
+ if (bpf_map__set_private(map, priv, bpf_map_priv__clear)) {
+ free(priv);
+ return -BPF_LOADER_ERRNO__INTERNAL;
+ }
+ }
+
+ list_add_tail(&op->list, &priv->ops_list);
+ return 0;
+}
+
+static int
+__bpf_map__config_value(struct bpf_map *map,
+ struct parse_events_term *term)
+{
+ struct bpf_map_def def;
+ struct bpf_map_op *op;
+ const char *map_name;
+ int err;
+
+ map_name = bpf_map__get_name(map);
+
+ err = bpf_map__get_def(map, &def);
+ if (err) {
+ pr_debug("Unable to get map definition from '%s'\n",
+ map_name);
+ return -BPF_LOADER_ERRNO__INTERNAL;
+ }
+
+ if (def.type != BPF_MAP_TYPE_ARRAY) {
+ pr_debug("Map %s type is not BPF_MAP_TYPE_ARRAY\n",
+ map_name);
+ return -BPF_LOADER_ERRNO__OBJCONF_MAP_TYPE;
+ }
+ if (def.key_size < sizeof(unsigned int)) {
+ pr_debug("Map %s has incorrect key size\n", map_name);
+ return -BPF_LOADER_ERRNO__OBJCONF_MAP_KEYSIZE;
+ }
+ switch (def.value_size) {
+ case 1:
+ case 2:
+ case 4:
+ case 8:
+ break;
+ default:
+ pr_debug("Map %s has incorrect value size\n", map_name);
+ return -BPF_LOADER_ERRNO__OBJCONF_MAP_VALUESIZE;
+ }
+
+ op = bpf_map_op__new();
+ if (IS_ERR(op))
+ return PTR_ERR(op);
+ op->op_type = BPF_MAP_OP_SET_VALUE;
+ op->v.value = term->val.num;
+
+ err = bpf_map__add_op(map, op);
+ if (err)
+ bpf_map_op__delete(op);
+ return err;
+}
+
+static int
+bpf_map__config_value(struct bpf_map *map,
+ struct parse_events_term *term,
+ struct perf_evlist *evlist __maybe_unused)
+{
+ if (!term->err_val) {
+ pr_debug("Config value not set\n");
+ return -BPF_LOADER_ERRNO__OBJCONF_CONF;
+ }
+
+ if (term->type_val != PARSE_EVENTS__TERM_TYPE_NUM) {
+ pr_debug("ERROR: wrong value type\n");
+ return -BPF_LOADER_ERRNO__OBJCONF_MAP_VALUE;
+ }
+
+ return __bpf_map__config_value(map, term);
+}
+
+struct bpf_obj_config__map_func {
+ const char *config_opt;
+ int (*config_func)(struct bpf_map *, struct parse_events_term *,
+ struct perf_evlist *);
+};
+
+struct bpf_obj_config__map_func bpf_obj_config__map_funcs[] = {
+ {"value", bpf_map__config_value},
+};
+
+static int
+bpf__obj_config_map(struct bpf_object *obj,
+ struct parse_events_term *term,
+ struct perf_evlist *evlist,
+ int *key_scan_pos)
+{
+ /* key is "map:<mapname>.<config opt>" */
+ char *map_name = strdup(term->config + sizeof("map:") - 1);
+ struct bpf_map *map;
+ int err = -BPF_LOADER_ERRNO__OBJCONF_OPT;
+ char *map_opt;
+ size_t i;
+
+ if (!map_name)
+ return -ENOMEM;
+
+ map_opt = strchr(map_name, '.');
+ if (!map_opt) {
+ pr_debug("ERROR: Invalid map config: %s\n", map_name);
+ goto out;
+ }
+
+ *map_opt++ = '\0';
+ if (*map_opt == '\0') {
+ pr_debug("ERROR: Invalid map option: %s\n", term->config);
+ goto out;
+ }
+
+ map = bpf_object__get_map_by_name(obj, map_name);
+ if (!map) {
+ pr_debug("ERROR: Map %s doesn't exist\n", map_name);
+ err = -BPF_LOADER_ERRNO__OBJCONF_MAP_NOTEXIST;
+ goto out;
+ }
+
+ *key_scan_pos += map_opt - map_name;
+ for (i = 0; i < ARRAY_SIZE(bpf_obj_config__map_funcs); i++) {
+ struct bpf_obj_config__map_func *func =
+ &bpf_obj_config__map_funcs[i];
+
+ if (strcmp(map_opt, func->config_opt) == 0) {
+ err = func->config_func(map, term, evlist);
+ goto out;
+ }
+ }
+
+ pr_debug("ERROR: Invalid map config option '%s'\n", map_opt);
+ err = -BPF_LOADER_ERRNO__OBJCONF_MAP_OPT;
+out:
+ if (!err)
+ *key_scan_pos += strlen(map_opt);
+ free(map_name);
+ return err;
+}
+
+int bpf__config_obj(struct bpf_object *obj,
+ struct parse_events_term *term,
+ struct perf_evlist *evlist,
+ int *error_pos)
+{
+ int key_scan_pos = 0;
+ int err;
+
+ if (!obj || !term || !term->config)
+ return -EINVAL;
+
+ if (!prefixcmp(term->config, "map:")) {
+ key_scan_pos = sizeof("map:") - 1;
+ err = bpf__obj_config_map(obj, term, evlist, &key_scan_pos);
+ goto out;
+ }
+ err = -BPF_LOADER_ERRNO__OBJCONF_OPT;
+out:
+ if (error_pos)
+ *error_pos = key_scan_pos;
+ return err;
+
+}
+
#define ERRNO_OFFSET(e) ((e) - __BPF_LOADER_ERRNO__START)
#define ERRCODE_OFFSET(c) ERRNO_OFFSET(BPF_LOADER_ERRNO__##c)
#define NR_ERRNO (__BPF_LOADER_ERRNO__END - __BPF_LOADER_ERRNO__START)
@@ -753,6 +1008,14 @@ static const char *bpf_loader_strerror_table[NR_ERRNO] = {
[ERRCODE_OFFSET(PROLOGUE)] = "Failed to generate prologue",
[ERRCODE_OFFSET(PROLOGUE2BIG)] = "Prologue too big for program",
[ERRCODE_OFFSET(PROLOGUEOOB)] = "Offset out of bound for prologue",
+ [ERRCODE_OFFSET(OBJCONF_OPT)] = "Invalid object config option",
+ [ERRCODE_OFFSET(OBJCONF_CONF)] = "Config value not set (missing '=')",
+ [ERRCODE_OFFSET(OBJCONF_MAP_OPT)] = "Invalid object map config option",
+ [ERRCODE_OFFSET(OBJCONF_MAP_NOTEXIST)] = "Target map doesn't exist",
+ [ERRCODE_OFFSET(OBJCONF_MAP_VALUE)] = "Incorrect value type for map",
+ [ERRCODE_OFFSET(OBJCONF_MAP_TYPE)] = "Incorrect map type",
+ [ERRCODE_OFFSET(OBJCONF_MAP_KEYSIZE)] = "Incorrect map key size",
+ [ERRCODE_OFFSET(OBJCONF_MAP_VALUESIZE)] = "Incorrect map value size",
};
static int
@@ -872,3 +1135,16 @@ int bpf__strerror_load(struct bpf_object *obj,
bpf__strerror_end(buf, size);
return 0;
}
+
+int bpf__strerror_config_obj(struct bpf_object *obj __maybe_unused,
+ struct parse_events_term *term __maybe_unused,
+ struct perf_evlist *evlist __maybe_unused,
+ int *error_pos __maybe_unused, int err,
+ char *buf, size_t size)
+{
+ bpf__strerror_head(err, buf, size);
+ bpf__strerror_entry(BPF_LOADER_ERRNO__OBJCONF_MAP_TYPE,
+ "Can't use this config term with this map type");
+ bpf__strerror_end(buf, size);
+ return 0;
+}
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index 6fdc045..cc46a07 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -10,6 +10,7 @@
#include <string.h>
#include <bpf/libbpf.h>
#include "probe-event.h"
+#include "evlist.h"
#include "debug.h"
enum bpf_loader_errno {
@@ -24,10 +25,19 @@ enum bpf_loader_errno {
BPF_LOADER_ERRNO__PROLOGUE, /* Failed to generate prologue */
BPF_LOADER_ERRNO__PROLOGUE2BIG, /* Prologue too big for program */
BPF_LOADER_ERRNO__PROLOGUEOOB, /* Offset out of bound for prologue */
+ BPF_LOADER_ERRNO__OBJCONF_OPT, /* Invalid object config option */
+ BPF_LOADER_ERRNO__OBJCONF_CONF, /* Config value not set (missing '=') */
+ BPF_LOADER_ERRNO__OBJCONF_MAP_OPT, /* Invalid object map config option */
+ BPF_LOADER_ERRNO__OBJCONF_MAP_NOTEXIST, /* Target map doesn't exist */
+ BPF_LOADER_ERRNO__OBJCONF_MAP_VALUE, /* Incorrect value type for map */
+ BPF_LOADER_ERRNO__OBJCONF_MAP_TYPE, /* Incorrect map type */
+ BPF_LOADER_ERRNO__OBJCONF_MAP_KEYSIZE, /* Incorrect map key size */
+ BPF_LOADER_ERRNO__OBJCONF_MAP_VALUESIZE,/* Incorrect map value size */
__BPF_LOADER_ERRNO__END,
};
struct bpf_object;
+struct parse_events_term;
#define PERF_BPF_PROBE_GROUP "perf_bpf_probe"
typedef int (*bpf_prog_iter_callback_t)(struct probe_trace_event *tev,
@@ -53,6 +63,14 @@ int bpf__strerror_load(struct bpf_object *obj, int err,
char *buf, size_t size);
int bpf__foreach_tev(struct bpf_object *obj,
bpf_prog_iter_callback_t func, void *arg);
+
+int bpf__config_obj(struct bpf_object *obj, struct parse_events_term *term,
+ struct perf_evlist *evlist, int *error_pos);
+int bpf__strerror_config_obj(struct bpf_object *obj,
+ struct parse_events_term *term,
+ struct perf_evlist *evlist,
+ int *error_pos, int err, char *buf,
+ size_t size);
#else
static inline struct bpf_object *
bpf__prepare_load(const char *filename __maybe_unused,
@@ -84,6 +102,15 @@ bpf__foreach_tev(struct bpf_object *obj __maybe_unused,
}
static inline int
+bpf__config_obj(struct bpf_object *obj __maybe_unused,
+ struct parse_events_term *term __maybe_unused,
+ struct perf_evlist *evlist __maybe_unused,
+ int *error_pos __maybe_unused)
+{
+ return 0;
+}
+
+static inline int
__bpf_strerror(char *buf, size_t size)
{
if (!size)
@@ -118,5 +145,16 @@ static inline int bpf__strerror_load(struct bpf_object *obj __maybe_unused,
{
return __bpf_strerror(buf, size);
}
+
+static inline int
+bpf__strerror_config_obj(struct bpf_object *obj __maybe_unused,
+ struct parse_events_term *term __maybe_unused,
+ struct perf_evlist *evlist __maybe_unused,
+ int *error_pos __maybe_unused,
+ int err __maybe_unused,
+ char *buf, size_t size)
+{
+ return __bpf_strerror(buf, size);
+}
#endif
#endif
* [tip:perf/core] perf tools: Enable BPF object configure syntax
2016-02-22 9:10 ` [PATCH 04/48] perf tools: Enable BPF object configure syntax Wang Nan
@ 2016-02-25 5:39 ` tip-bot for Wang Nan
0 siblings, 0 replies; 76+ messages in thread
From: tip-bot for Wang Nan @ 2016-02-25 5:39 UTC (permalink / raw)
To: linux-tip-commits
Cc: ast, linux-kernel, brendan.d.gregg, dev, namhyung,
jeremie.galarneau, hekuang, mingo, acme, lizefan, wangnan0, tglx,
kirr, adrian.hunter, masami.hiramatsu.pt, hpa, jolsa, peterz
Commit-ID: a34f3be70cdf986850552e62b9f22d659bfbcef3
Gitweb: http://git.kernel.org/tip/a34f3be70cdf986850552e62b9f22d659bfbcef3
Author: Wang Nan <wangnan0@huawei.com>
AuthorDate: Mon, 22 Feb 2016 09:10:31 +0000
Committer: Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 22 Feb 2016 12:20:35 -0300
perf tools: Enable BPF object configure syntax
This patch adds the final step for BPF map configuration. A new syntax
is added to the parser so that the user can configure BPF objects through
config terms enclosed in '/' '/'.
After this patch, the following syntax is available:
# perf record -e ./test_bpf_map_1.c/map:channel.value=10/ ...
It takes effect only after the following commits are applied.
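As a rough illustration of how a config key of the form
map:<mapname>.<option> can be split into a map name and an option, here
is a standalone sketch. It is not the code from this patch: struct
map_key_parts and split_map_key() are hypothetical names used only for
this example, and the value after '=' is assumed to be handled
separately by the event parser.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the two halves of a "map:<mapname>.<option>" key. */
struct map_key_parts {
	char *map_name;  /* heap copy of the map name; caller frees it */
	const char *opt; /* option name; points into map_name's buffer */
};

/*
 * Split "map:<mapname>.<option>" into its name and option.
 * Returns 0 on success, -1 on a malformed key or allocation failure.
 */
static int split_map_key(const char *key, struct map_key_parts *out)
{
	static const char prefix[] = "map:";
	const size_t prefix_len = sizeof(prefix) - 1;
	size_t len;
	char *name, *dot;

	assert(key != NULL && out != NULL);

	/* The key must start with the "map:" prefix. */
	if (strncmp(key, prefix, prefix_len) != 0)
		return -1;

	/* Work on a private copy so the caller's string is not modified. */
	len = strlen(key + prefix_len);
	name = malloc(len + 1);
	if (!name)
		return -1;
	memcpy(name, key + prefix_len, len + 1);

	/* A '.' followed by at least one character must separate the halves. */
	dot = strchr(name, '.');
	if (!dot || dot[1] == '\0') {
		free(name);
		return -1;
	}

	*dot = '\0';
	out->map_name = name;
	out->opt = dot + 1;
	return 0;
}
```

For the key "map:channel.value" this yields the map name "channel" and
the option "value". Because opt points into the buffer owned by
map_name, it has to be read before map_name is freed.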
Test result:
# cat ./test_bpf_map_1.c
/************************ BEGIN **************************/
#include <uapi/linux/bpf.h>
#define SEC(NAME) __attribute__((section(NAME), used))
struct bpf_map_def {
unsigned int type;
unsigned int key_size;
unsigned int value_size;
unsigned int max_entries;
};
static void *(*map_lookup_elem)(struct bpf_map_def *, void *) =
(void *)BPF_FUNC_map_lookup_elem;
static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
(void *)BPF_FUNC_trace_printk;
struct bpf_map_def SEC("maps") channel = {
.type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(int),
.max_entries = 1,
};
SEC("func=sys_nanosleep")
int func(void *ctx)
{
int key = 0;
char fmt[] = "%d\n";
int *pval = map_lookup_elem(&channel, &key);
if (!pval)
return 0;
trace_printk(fmt, sizeof(fmt), *pval);
return 0;
}
char _license[] SEC("license") = "GPL";
int _version SEC("version") = LINUX_VERSION_CODE;
/************************* END ***************************/
- Normal case:
# ./perf record -e './test_bpf_map_1.c/map:channel.value=10/' usleep 10
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.012 MB perf.data ]
- Error case:
# ./perf record -e './test_bpf_map_1.c/map:channel.value/' usleep 10
event syntax error: '..ps:channel:value/'
\___ Config value not set (missing '=')
Hint: Valid config term:
map:[<arraymap>].value=[value]
(add -v to see detail)
Run 'perf list' for a list of valid events
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-e, --event <event> event selector. use 'perf list' to list available events
# ./perf record -e './test_bpf_map_1.c/xmap:channel.value=10/' usleep 10
event syntax error: '..pf_map_1.c/xmap:channel.value=10/'
\___ Invalid object config option
[SNIP]
# ./perf record -e './test_bpf_map_1.c/map:xchannel.value=10/' usleep 10
event syntax error: '..p_1.c/map:xchannel.value=10/'
\___ Target map not exist
[SNIP]
# ./perf record -e './test_bpf_map_1.c/map:channel.xvalue=10/' usleep 10
event syntax error: '..ps:channel.xvalue=10/'
\___ Invalid object map config option
[SNIP]
# ./perf record -e './test_bpf_map_1.c/map:channel.value=x10/' usleep 10
event syntax error: '..nnel.value=x10/'
\___ Incorrect value type for map
[SNIP]
Change BPF_MAP_TYPE_ARRAY to '1' in test_bpf_map_1.c:
# ./perf record -e './test_bpf_map_1.c/map:channel.value=10/' usleep 10
event syntax error: '..ps:channel.value=10/'
\___ Can't use this config term to this type of map
Hint: Valid config term:
map:[<arraymap>].value=[value]
(add -v to see detail)
Signed-off-by: Wang Nan <wangnan0@huawei.com>
[for parser part]
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Cody P Schafer <dev@codyps.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
Cc: Kirill Smelkov <kirr@nexedi.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1456132275-98875-5-git-send-email-wangnan0@huawei.com
Signed-off-by: He Kuang <hekuang@huawei.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/parse-events.c | 55 +++++++++++++++++++++++++++++++++++++++---
tools/perf/util/parse-events.h | 3 ++-
tools/perf/util/parse-events.l | 2 +-
tools/perf/util/parse-events.y | 10 +++++---
4 files changed, 61 insertions(+), 9 deletions(-)
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index b0b3295..a5dd670 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -672,17 +672,63 @@ errout:
return err;
}
+static int
+parse_events_config_bpf(struct parse_events_evlist *data,
+ struct bpf_object *obj,
+ struct list_head *head_config)
+{
+ struct parse_events_term *term;
+ int error_pos;
+
+ if (!head_config || list_empty(head_config))
+ return 0;
+
+ list_for_each_entry(term, head_config, list) {
+ char errbuf[BUFSIZ];
+ int err;
+
+ if (term->type_term != PARSE_EVENTS__TERM_TYPE_USER) {
+ snprintf(errbuf, sizeof(errbuf),
+ "Invalid config term for BPF object");
+ errbuf[BUFSIZ - 1] = '\0';
+
+ data->error->idx = term->err_term;
+ data->error->str = strdup(errbuf);
+ return -EINVAL;
+ }
+
+ err = bpf__config_obj(obj, term, NULL, &error_pos);
+ if (err) {
+ bpf__strerror_config_obj(obj, term, NULL,
+ &error_pos, err, errbuf,
+ sizeof(errbuf));
+ data->error->help = strdup(
+"Hint:\tValid config term:\n"
+" \tmap:[<arraymap>].value=[value]\n"
+" \t(add -v to see detail)");
+ data->error->str = strdup(errbuf);
+ if (err == -BPF_LOADER_ERRNO__OBJCONF_MAP_VALUE)
+ data->error->idx = term->err_val;
+ else
+ data->error->idx = term->err_term + error_pos;
+ return err;
+ }
+ }
+ return 0;
+}
+
int parse_events_load_bpf(struct parse_events_evlist *data,
struct list_head *list,
char *bpf_file_name,
- bool source)
+ bool source,
+ struct list_head *head_config)
{
struct bpf_object *obj;
+ int err;
obj = bpf__prepare_load(bpf_file_name, source);
if (IS_ERR(obj)) {
char errbuf[BUFSIZ];
- int err;
err = PTR_ERR(obj);
@@ -700,7 +746,10 @@ int parse_events_load_bpf(struct parse_events_evlist *data,
return err;
}
- return parse_events_load_bpf_obj(data, list, obj);
+ err = parse_events_load_bpf_obj(data, list, obj);
+ if (err)
+ return err;
+ return parse_events_config_bpf(data, obj, head_config);
}
static int
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index d5eb2af..c48377a 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -129,7 +129,8 @@ int parse_events_add_tracepoint(struct list_head *list, int *idx,
int parse_events_load_bpf(struct parse_events_evlist *data,
struct list_head *list,
char *bpf_file_name,
- bool source);
+ bool source,
+ struct list_head *head_config);
/* Provide this function for perf test */
struct bpf_object;
int parse_events_load_bpf_obj(struct parse_events_evlist *data,
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 99486e6..0cc6b84 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -122,7 +122,7 @@ num_dec [0-9]+
num_hex 0x[a-fA-F0-9]+
num_raw_hex [a-fA-F0-9]+
name [a-zA-Z_*?][a-zA-Z0-9_*?.]*
-name_minus [a-zA-Z_*?][a-zA-Z0-9\-_*?.]*
+name_minus [a-zA-Z_*?][a-zA-Z0-9\-_*?.:]*
/* If you add a modifier you need to update check_modifier() */
modifier_event [ukhpPGHSDI]+
modifier_bp [rwx]{1,3}
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index 6a2d006..0e2d433 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -437,24 +437,26 @@ PE_RAW opt_event_config
}
event_bpf_file:
-PE_BPF_OBJECT
+PE_BPF_OBJECT opt_event_config
{
struct parse_events_evlist *data = _data;
struct parse_events_error *error = data->error;
struct list_head *list;
ALLOC_LIST(list);
- ABORT_ON(parse_events_load_bpf(data, list, $1, false));
+ ABORT_ON(parse_events_load_bpf(data, list, $1, false, $2));
+ parse_events_terms__delete($2);
$$ = list;
}
|
-PE_BPF_SOURCE
+PE_BPF_SOURCE opt_event_config
{
struct parse_events_evlist *data = _data;
struct list_head *list;
ALLOC_LIST(list);
- ABORT_ON(parse_events_load_bpf(data, list, $1, true));
+ ABORT_ON(parse_events_load_bpf(data, list, $1, true, $2));
+ parse_events_terms__delete($2);
$$ = list;
}
* [tip:perf/core] perf record: Apply config to BPF objects before recording
2016-02-22 9:10 ` [PATCH 05/48] perf record: Apply config to BPF objects before recording Wang Nan
@ 2016-02-25 5:39 ` tip-bot for Wang Nan
0 siblings, 0 replies; 76+ messages in thread
From: tip-bot for Wang Nan @ 2016-02-25 5:39 UTC (permalink / raw)
To: linux-tip-commits
Cc: peterz, wangnan0, adrian.hunter, masami.hiramatsu.pt,
jeremie.galarneau, brendan.d.gregg, hekuang, jolsa, linux-kernel,
acme, tglx, kirr, namhyung, hpa, ast, lizefan, dev, mingo
Commit-ID: 8690a2a773703e4ad2a07a7f3912ea6b131307cc
Gitweb: http://git.kernel.org/tip/8690a2a773703e4ad2a07a7f3912ea6b131307cc
Author: Wang Nan <wangnan0@huawei.com>
AuthorDate: Mon, 22 Feb 2016 09:10:32 +0000
Committer: Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 22 Feb 2016 12:28:02 -0300
perf record: Apply config to BPF objects before recording
bpf__apply_obj_config() is introduced as the core API to apply object
config options to all BPF objects. This patch also does the real work
of setting values for BPF_MAP_TYPE_ARRAY maps by inserting the value
stored in the map's private field into the BPF map.
This patch is required because we are not always able to set all BPF
config during parsing. A further patch will set events created by perf in
BPF_MAP_TYPE_PERF_EVENT_ARRAY maps, and those events do not exist until
perf_evsel__open().
bpf_map_config_foreach_key() is introduced to iterate over each key that
needs to be configured. This function will be extended to support more map
types and different key settings.
In perf record, before recording starts, call bpf__apply_obj_config() to
apply all BPF config options.
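The idea of writing one configured value into every slot of an array
map, narrowed to the map's value size, can be sketched in plain
userspace C. This is only an illustration under stated assumptions:
struct fake_array_map, fake_map_set() and fake_map_set_all() are
hypothetical stand-ins that use an in-memory buffer and do not call
into the kernel or libbpf.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory stand-in for an array map; not a libbpf type. */
struct fake_array_map {
	unsigned int value_size;  /* 1, 2, 4 or 8 bytes per slot */
	unsigned int max_entries; /* number of slots */
	unsigned char *slots;     /* max_entries * value_size bytes */
};

/* Store val into slot idx, narrowed to the map's value size. */
static int fake_map_set(struct fake_array_map *m, unsigned int idx,
			uint64_t val)
{
	unsigned char *dst;

	assert(m != NULL && m->slots != NULL);
	if (idx >= m->max_entries)
		return -1;
	dst = m->slots + (size_t)idx * m->value_size;

	switch (m->value_size) {
	case 1: {
		uint8_t v = (uint8_t)val;
		memcpy(dst, &v, sizeof(v));
		break;
	}
	case 2: {
		uint16_t v = (uint16_t)val;
		memcpy(dst, &v, sizeof(v));
		break;
	}
	case 4: {
		uint32_t v = (uint32_t)val;
		memcpy(dst, &v, sizeof(v));
		break;
	}
	case 8:
		memcpy(dst, &val, sizeof(val));
		break;
	default:
		return -1; /* unsupported value size */
	}
	return 0;
}

/* Apply the same value to every slot, stopping at the first failure. */
static int fake_map_set_all(struct fake_array_map *m, uint64_t val)
{
	unsigned int i;

	for (i = 0; i < m->max_entries; i++) {
		int err = fake_map_set(m, i, val);

		if (err)
			return err;
	}
	return 0;
}
```

Narrowing through a correctly sized temporary keeps the low-order bits
of the value on both little-endian and big-endian hosts, which copying
the first bytes of a 64-bit variable would not.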
Test result:
# cat ./test_bpf_map_1.c
/************************ BEGIN **************************/
#include <uapi/linux/bpf.h>
#define SEC(NAME) __attribute__((section(NAME), used))
struct bpf_map_def {
unsigned int type;
unsigned int key_size;
unsigned int value_size;
unsigned int max_entries;
};
static void *(*map_lookup_elem)(struct bpf_map_def *, void *) =
(void *)BPF_FUNC_map_lookup_elem;
static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
(void *)BPF_FUNC_trace_printk;
struct bpf_map_def SEC("maps") channel = {
.type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(int),
.max_entries = 1,
};
SEC("func=sys_nanosleep")
int func(void *ctx)
{
int key = 0;
char fmt[] = "%d\n";
int *pval = map_lookup_elem(&channel, &key);
if (!pval)
return 0;
trace_printk(fmt, sizeof(fmt), *pval);
return 0;
}
char _license[] SEC("license") = "GPL";
int _version SEC("version") = LINUX_VERSION_CODE;
/************************* END ***************************/
# echo "" > /sys/kernel/debug/tracing/trace
# ./perf record -e './test_bpf_map_1.c/map:channel.value=11/' usleep 10
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.012 MB perf.data ]
# cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
# entries-in-buffer/entries-written: 1/1 #P:8
[SNIP]
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
usleep-18593 [007] d... 2394714.395539: : 11
# ./perf record -e './test_bpf_map_1.c/map:channel.value=101/' usleep 10
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.012 MB perf.data ]
# cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
# entries-in-buffer/entries-written: 1/1 #P:8
[SNIP]
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
usleep-18593 [007] d... 2394714.395539: : 11
usleep-19000 [006] d... 2394831.057840: : 101
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Cody P Schafer <dev@codyps.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kirill Smelkov <kirr@nexedi.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1456132275-98875-6-git-send-email-wangnan0@huawei.com
Signed-off-by: He Kuang <hekuang@huawei.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/builtin-record.c | 11 +++
tools/perf/util/bpf-loader.c | 184 +++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/bpf-loader.h | 15 ++++
3 files changed, 210 insertions(+)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index cf3a28d..7d11162 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -32,6 +32,7 @@
#include "util/parse-branch-options.h"
#include "util/parse-regs-options.h"
#include "util/llvm-utils.h"
+#include "util/bpf-loader.h"
#include <unistd.h>
#include <sched.h>
@@ -536,6 +537,16 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
goto out_child;
}
+ err = bpf__apply_obj_config();
+ if (err) {
+ char errbuf[BUFSIZ];
+
+ bpf__strerror_apply_obj_config(err, errbuf, sizeof(errbuf));
+ pr_err("ERROR: Apply config to BPF failed: %s\n",
+ errbuf);
+ goto out_child;
+ }
+
/*
* Normally perf_session__new would do this, but it doesn't have the
* evlist.
diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index caeef9e..dbbd17c 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -7,6 +7,7 @@
#include <linux/bpf.h>
#include <bpf/libbpf.h>
+#include <bpf/bpf.h>
#include <linux/err.h>
#include <linux/string.h>
#include "perf.h"
@@ -994,6 +995,182 @@ out:
}
+typedef int (*map_config_func_t)(const char *name, int map_fd,
+ struct bpf_map_def *pdef,
+ struct bpf_map_op *op,
+ void *pkey, void *arg);
+
+static int
+foreach_key_array_all(map_config_func_t func,
+ void *arg, const char *name,
+ int map_fd, struct bpf_map_def *pdef,
+ struct bpf_map_op *op)
+{
+ unsigned int i;
+ int err;
+
+ for (i = 0; i < pdef->max_entries; i++) {
+ err = func(name, map_fd, pdef, op, &i, arg);
+ if (err) {
+ pr_debug("ERROR: failed to insert value to %s[%u]\n",
+ name, i);
+ return err;
+ }
+ }
+ return 0;
+}
+
+static int
+bpf_map_config_foreach_key(struct bpf_map *map,
+ map_config_func_t func,
+ void *arg)
+{
+ int err, map_fd;
+ const char *name;
+ struct bpf_map_op *op;
+ struct bpf_map_def def;
+ struct bpf_map_priv *priv;
+
+ name = bpf_map__get_name(map);
+
+ err = bpf_map__get_private(map, (void **)&priv);
+ if (err) {
+ pr_debug("ERROR: failed to get private from map %s\n", name);
+ return -BPF_LOADER_ERRNO__INTERNAL;
+ }
+ if (!priv || list_empty(&priv->ops_list)) {
+ pr_debug("INFO: nothing to config for map %s\n", name);
+ return 0;
+ }
+
+ err = bpf_map__get_def(map, &def);
+ if (err) {
+ pr_debug("ERROR: failed to get definition from map %s\n", name);
+ return -BPF_LOADER_ERRNO__INTERNAL;
+ }
+ map_fd = bpf_map__get_fd(map);
+ if (map_fd < 0) {
+ pr_debug("ERROR: failed to get fd from map %s\n", name);
+ return map_fd;
+ }
+
+ list_for_each_entry(op, &priv->ops_list, list) {
+ switch (def.type) {
+ case BPF_MAP_TYPE_ARRAY:
+ switch (op->key_type) {
+ case BPF_MAP_KEY_ALL:
+ err = foreach_key_array_all(func, arg, name,
+ map_fd, &def, op);
+ if (err)
+ return err;
+ break;
+ default:
+ pr_debug("ERROR: keytype for map '%s' invalid\n",
+ name);
+ return -BPF_LOADER_ERRNO__INTERNAL;
+ }
+ break;
+ default:
+ pr_debug("ERROR: type of '%s' incorrect\n", name);
+ return -BPF_LOADER_ERRNO__OBJCONF_MAP_TYPE;
+ }
+ }
+
+ return 0;
+}
+
+static int
+apply_config_value_for_key(int map_fd, void *pkey,
+ size_t val_size, u64 val)
+{
+ int err = 0;
+
+ switch (val_size) {
+ case 1: {
+ u8 _val = (u8)(val);
+ err = bpf_map_update_elem(map_fd, pkey, &_val, BPF_ANY);
+ break;
+ }
+ case 2: {
+ u16 _val = (u16)(val);
+ err = bpf_map_update_elem(map_fd, pkey, &_val, BPF_ANY);
+ break;
+ }
+ case 4: {
+ u32 _val = (u32)(val);
+ err = bpf_map_update_elem(map_fd, pkey, &_val, BPF_ANY);
+ break;
+ }
+ case 8: {
+ err = bpf_map_update_elem(map_fd, pkey, &val, BPF_ANY);
+ break;
+ }
+ default:
+ pr_debug("ERROR: invalid value size\n");
+ return -BPF_LOADER_ERRNO__OBJCONF_MAP_VALUESIZE;
+ }
+ if (err && errno)
+ err = -errno;
+ return err;
+}
+
+static int
+apply_obj_config_map_for_key(const char *name, int map_fd,
+ struct bpf_map_def *pdef __maybe_unused,
+ struct bpf_map_op *op,
+ void *pkey, void *arg __maybe_unused)
+{
+ int err;
+
+ switch (op->op_type) {
+ case BPF_MAP_OP_SET_VALUE:
+ err = apply_config_value_for_key(map_fd, pkey,
+ pdef->value_size,
+ op->v.value);
+ break;
+ default:
+ pr_debug("ERROR: unknown value type for '%s'\n", name);
+ err = -BPF_LOADER_ERRNO__INTERNAL;
+ }
+ return err;
+}
+
+static int
+apply_obj_config_map(struct bpf_map *map)
+{
+ return bpf_map_config_foreach_key(map,
+ apply_obj_config_map_for_key,
+ NULL);
+}
+
+static int
+apply_obj_config_object(struct bpf_object *obj)
+{
+ struct bpf_map *map;
+ int err;
+
+ bpf_map__for_each(map, obj) {
+ err = apply_obj_config_map(map);
+ if (err)
+ return err;
+ }
+ return 0;
+}
+
+int bpf__apply_obj_config(void)
+{
+ struct bpf_object *obj, *tmp;
+ int err;
+
+ bpf_object__for_each_safe(obj, tmp) {
+ err = apply_obj_config_object(obj);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
#define ERRNO_OFFSET(e) ((e) - __BPF_LOADER_ERRNO__START)
#define ERRCODE_OFFSET(c) ERRNO_OFFSET(BPF_LOADER_ERRNO__##c)
#define NR_ERRNO (__BPF_LOADER_ERRNO__END - __BPF_LOADER_ERRNO__START)
@@ -1148,3 +1325,10 @@ int bpf__strerror_config_obj(struct bpf_object *obj __maybe_unused,
bpf__strerror_end(buf, size);
return 0;
}
+
+int bpf__strerror_apply_obj_config(int err, char *buf, size_t size)
+{
+ bpf__strerror_head(err, buf, size);
+ bpf__strerror_end(buf, size);
+ return 0;
+}
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index cc46a07..5d3b931 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -71,6 +71,8 @@ int bpf__strerror_config_obj(struct bpf_object *obj,
struct perf_evlist *evlist,
int *error_pos, int err, char *buf,
size_t size);
+int bpf__apply_obj_config(void);
+int bpf__strerror_apply_obj_config(int err, char *buf, size_t size);
#else
static inline struct bpf_object *
bpf__prepare_load(const char *filename __maybe_unused,
@@ -111,6 +113,12 @@ bpf__config_obj(struct bpf_object *obj __maybe_unused,
}
static inline int
+bpf__apply_obj_config(void)
+{
+ return 0;
+}
+
+static inline int
__bpf_strerror(char *buf, size_t size)
{
if (!size)
@@ -156,5 +164,12 @@ bpf__strerror_config_obj(struct bpf_object *obj __maybe_unused,
{
return __bpf_strerror(buf, size);
}
+
+static inline int
+bpf__strerror_apply_obj_config(int err __maybe_unused,
+ char *buf, size_t size)
+{
+ return __bpf_strerror(buf, size);
+}
#endif
#endif
* [tip:perf/core] perf tools: Enable passing event to BPF object
2016-02-22 9:10 ` [PATCH 06/48] perf tools: Enable passing event to BPF object Wang Nan
@ 2016-02-25 5:40 ` tip-bot for Wang Nan
0 siblings, 0 replies; 76+ messages in thread
From: tip-bot for Wang Nan @ 2016-02-25 5:40 UTC (permalink / raw)
To: linux-tip-commits
Cc: adrian.hunter, jolsa, hekuang, namhyung, peterz, wangnan0, kirr,
masami.hiramatsu.pt, tglx, acme, ast, brendan.d.gregg, lizefan,
dev, linux-kernel, mingo, hpa, jeremie.galarneau
Commit-ID: 7630b3e28dd827fffad13cc0aada14b00ec524d9
Gitweb: http://git.kernel.org/tip/7630b3e28dd827fffad13cc0aada14b00ec524d9
Author: Wang Nan <wangnan0@huawei.com>
AuthorDate: Mon, 22 Feb 2016 09:10:33 +0000
Committer: Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 22 Feb 2016 12:30:50 -0300
perf tools: Enable passing event to BPF object
A new syntax is added to the parser so that the user can access
predefined perf events in BPF objects.
After this patch, BPF programs for perf are finally able to utilize
bpf_perf_event_read() introduced in commit 35578d798400 ("bpf: Implement
function bpf_perf_event_read() that get the selected hardware PMU
counter").
Test result:
# cat test_bpf_map_2.c
/************************ BEGIN **************************/
#include <uapi/linux/bpf.h>
#define SEC(NAME) __attribute__((section(NAME), used))
struct bpf_map_def {
unsigned int type;
unsigned int key_size;
unsigned int value_size;
unsigned int max_entries;
};
static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
(void *)BPF_FUNC_trace_printk;
static int (*get_smp_processor_id)(void) =
(void *)BPF_FUNC_get_smp_processor_id;
static int (*perf_event_read)(struct bpf_map_def *, int) =
(void *)BPF_FUNC_perf_event_read;
struct bpf_map_def SEC("maps") pmu_map = {
.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(int),
.max_entries = __NR_CPUS__,
};
SEC("func_write=sys_write")
int func_write(void *ctx)
{
unsigned long long val;
char fmt[] = "sys_write: pmu=%llu\n";
val = perf_event_read(&pmu_map, get_smp_processor_id());
trace_printk(fmt, sizeof(fmt), val);
return 0;
}
SEC("func_write_return=sys_write%return")
int func_write_return(void *ctx)
{
unsigned long long val = 0;
char fmt[] = "sys_write_return: pmu=%llu\n";
val = perf_event_read(&pmu_map, get_smp_processor_id());
trace_printk(fmt, sizeof(fmt), val);
return 0;
}
char _license[] SEC("license") = "GPL";
int _version SEC("version") = LINUX_VERSION_CODE;
/************************* END ***************************/
Normal case:
# echo "" > /sys/kernel/debug/tracing/trace
# perf record -i -e cycles -e './test_bpf_map_2.c/map:pmu_map.event=cycles/' ls /
[SNIP]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.013 MB perf.data (7 samples) ]
# cat /sys/kernel/debug/tracing/trace | grep ls
ls-17066 [000] d... 938449.863301: : sys_write: pmu=1157327
ls-17066 [000] dN.. 938449.863342: : sys_write_return: pmu=1225218
ls-17066 [000] d... 938449.863349: : sys_write: pmu=1241922
ls-17066 [000] dN.. 938449.863369: : sys_write_return: pmu=1267445
Normal case (system wide):
# echo "" > /sys/kernel/debug/tracing/trace
# perf record -i -e cycles -e './test_bpf_map_2.c/map:pmu_map.event=cycles/' -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.811 MB perf.data (120 samples) ]
# cat /sys/kernel/debug/tracing/trace | grep -v '18446744073709551594' | grep -v perf | head -n 20
[SNIP]
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
gmain-30828 [002] d... 2740551.068992: : sys_write: pmu=84373
gmain-30828 [002] d... 2740551.068992: : sys_write_return: pmu=87696
gmain-30828 [002] d... 2740551.068996: : sys_write: pmu=100658
gmain-30828 [002] d... 2740551.068997: : sys_write_return: pmu=102572
Error case 1:
# perf record -e './test_bpf_map_2.c' ls /
[SNIP]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.014 MB perf.data ]
# cat /sys/kernel/debug/tracing/trace | grep ls
ls-17115 [007] d... 2724279.665625: : sys_write: pmu=18446744073709551614
ls-17115 [007] dN.. 2724279.665651: : sys_write_return: pmu=18446744073709551614
ls-17115 [007] d... 2724279.665658: : sys_write: pmu=18446744073709551614
ls-17115 [007] dN.. 2724279.665677: : sys_write_return: pmu=18446744073709551614
(18446744073709551614 is 0xfffffffffffffffe (-2))
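The large numbers in the trace are negative return values printed with
"%llu". A small sketch of that conversion follows; as_unsigned64() and
format_pmu() are hypothetical helpers written for this example. Note
that 2 and 22 are the values of ENOENT and EINVAL on Linux; that the
helper means exactly those error codes is an assumption here, not
something this message states.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Reinterpret a signed 64-bit return value as unsigned (modulo 2^64). */
static uint64_t as_unsigned64(int64_t ret)
{
	return (uint64_t)ret;
}

/*
 * Format a return value the way the test program's "%llu" does.
 * Returns the number of characters snprintf() would write.
 */
static int format_pmu(char *buf, size_t size, int64_t ret)
{
	assert(buf != NULL);
	return snprintf(buf, size, "pmu=%llu",
			(unsigned long long)as_unsigned64(ret));
}
```

With this conversion, -2 becomes 18446744073709551614 (2^64 - 2), and
-22 becomes 18446744073709551594 (2^64 - 22), which is the value
filtered out with grep -v in the system-wide case.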
Error case 2:
# perf record -e cycles -e './test_bpf_map_2.c/map:pmu_map.event=evt/' -a
event syntax error: '..ps:pmu_map.event=evt/'
\___ Event not found for map setting
Hint: Valid config terms:
map:[<arraymap>].value=[value]
map:[<eventmap>].event=[event]
[SNIP]
Error case 3:
# ls /proc/2348/task/
2348 2505 2506 2507 2508
# perf record -i -e cycles -e './test_bpf_map_2.c/map:pmu_map.event=cycles/' -p 2348
ERROR: Apply config to BPF failed: Cannot set event to BPF map in multi-thread tracing
Error case 4:
# perf record -e cycles -e './test_bpf_map_2.c/map:pmu_map.event=cycles/' ls /
ERROR: Apply config to BPF failed: Doesn't support inherit event (Hint: use -i to turn off inherit)
Error case 5:
# perf record -i -e raw_syscalls:sys_enter -e './test_bpf_map_2.c/map:pmu_map.event=raw_syscalls:sys_enter/' ls
ERROR: Apply config to BPF failed: Can only put raw, hardware and BPF output event into a BPF map
Error case 6:
# perf record -i -e './test_bpf_map_2.c/map:pmu_map.event=123/' ls /
event syntax error: '.._map.event=123/'
\___ Incorrect value type for map
[SNIP]
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Cody P Schafer <dev@codyps.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
Cc: Kirill Smelkov <kirr@nexedi.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1456132275-98875-7-git-send-email-wangnan0@huawei.com
Signed-off-by: He Kuang <hekuang@huawei.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/bpf-loader.c | 163 +++++++++++++++++++++++++++++++++++++++--
tools/perf/util/bpf-loader.h | 5 ++
tools/perf/util/evlist.c | 16 ++++
tools/perf/util/evlist.h | 3 +
tools/perf/util/parse-events.c | 15 ++--
tools/perf/util/parse-events.h | 1 +
6 files changed, 190 insertions(+), 13 deletions(-)
diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index dbbd17c..deacb95 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -742,6 +742,7 @@ int bpf__foreach_tev(struct bpf_object *obj,
enum bpf_map_op_type {
BPF_MAP_OP_SET_VALUE,
+ BPF_MAP_OP_SET_EVSEL,
};
enum bpf_map_key_type {
@@ -754,6 +755,7 @@ struct bpf_map_op {
enum bpf_map_key_type key_type;
union {
u64 value;
+ struct perf_evsel *evsel;
} v;
};
@@ -838,6 +840,24 @@ bpf_map__add_op(struct bpf_map *map, struct bpf_map_op *op)
return 0;
}
+static struct bpf_map_op *
+bpf_map__add_newop(struct bpf_map *map)
+{
+ struct bpf_map_op *op;
+ int err;
+
+ op = bpf_map_op__new();
+ if (IS_ERR(op))
+ return op;
+
+ err = bpf_map__add_op(map, op);
+ if (err) {
+ bpf_map_op__delete(op);
+ return ERR_PTR(err);
+ }
+ return op;
+}
+
static int
__bpf_map__config_value(struct bpf_map *map,
struct parse_events_term *term)
@@ -876,16 +896,12 @@ __bpf_map__config_value(struct bpf_map *map,
return -BPF_LOADER_ERRNO__OBJCONF_MAP_VALUESIZE;
}
- op = bpf_map_op__new();
+ op = bpf_map__add_newop(map);
if (IS_ERR(op))
return PTR_ERR(op);
op->op_type = BPF_MAP_OP_SET_VALUE;
op->v.value = term->val.num;
-
- err = bpf_map__add_op(map, op);
- if (err)
- bpf_map_op__delete(op);
- return err;
+ return 0;
}
static int
@@ -899,13 +915,75 @@ bpf_map__config_value(struct bpf_map *map,
}
if (term->type_val != PARSE_EVENTS__TERM_TYPE_NUM) {
- pr_debug("ERROR: wrong value type\n");
+ pr_debug("ERROR: wrong value type for 'value'\n");
return -BPF_LOADER_ERRNO__OBJCONF_MAP_VALUE;
}
return __bpf_map__config_value(map, term);
}
+static int
+__bpf_map__config_event(struct bpf_map *map,
+ struct parse_events_term *term,
+ struct perf_evlist *evlist)
+{
+ struct perf_evsel *evsel;
+ struct bpf_map_def def;
+ struct bpf_map_op *op;
+ const char *map_name;
+ int err;
+
+ map_name = bpf_map__get_name(map);
+ evsel = perf_evlist__find_evsel_by_str(evlist, term->val.str);
+ if (!evsel) {
+ pr_debug("Event (for '%s') '%s' doesn't exist\n",
+ map_name, term->val.str);
+ return -BPF_LOADER_ERRNO__OBJCONF_MAP_NOEVT;
+ }
+
+ err = bpf_map__get_def(map, &def);
+ if (err) {
+ pr_debug("Unable to get map definition from '%s'\n",
+ map_name);
+ return err;
+ }
+
+ /*
+ * No need to check key_size and value_size:
+ * kernel has already checked them.
+ */
+ if (def.type != BPF_MAP_TYPE_PERF_EVENT_ARRAY) {
+ pr_debug("Map %s type is not BPF_MAP_TYPE_PERF_EVENT_ARRAY\n",
+ map_name);
+ return -BPF_LOADER_ERRNO__OBJCONF_MAP_TYPE;
+ }
+
+ op = bpf_map__add_newop(map);
+ if (IS_ERR(op))
+ return PTR_ERR(op);
+ op->op_type = BPF_MAP_OP_SET_EVSEL;
+ op->v.evsel = evsel;
+ return 0;
+}
+
+static int
+bpf_map__config_event(struct bpf_map *map,
+ struct parse_events_term *term,
+ struct perf_evlist *evlist)
+{
+ if (!term->err_val) {
+ pr_debug("Config value not set\n");
+ return -BPF_LOADER_ERRNO__OBJCONF_CONF;
+ }
+
+ if (term->type_val != PARSE_EVENTS__TERM_TYPE_STR) {
+ pr_debug("ERROR: wrong value type for 'event'\n");
+ return -BPF_LOADER_ERRNO__OBJCONF_MAP_VALUE;
+ }
+
+ return __bpf_map__config_event(map, term, evlist);
+}
+
struct bpf_obj_config__map_func {
const char *config_opt;
int (*config_func)(struct bpf_map *, struct parse_events_term *,
@@ -914,6 +992,7 @@ struct bpf_obj_config__map_func {
struct bpf_obj_config__map_func bpf_obj_config__map_funcs[] = {
{"value", bpf_map__config_value},
+ {"event", bpf_map__config_event},
};
static int
@@ -1057,6 +1136,7 @@ bpf_map_config_foreach_key(struct bpf_map *map,
list_for_each_entry(op, &priv->ops_list, list) {
switch (def.type) {
case BPF_MAP_TYPE_ARRAY:
+ case BPF_MAP_TYPE_PERF_EVENT_ARRAY:
switch (op->key_type) {
case BPF_MAP_KEY_ALL:
err = foreach_key_array_all(func, arg, name,
@@ -1115,6 +1195,60 @@ apply_config_value_for_key(int map_fd, void *pkey,
}
static int
+apply_config_evsel_for_key(const char *name, int map_fd, void *pkey,
+ struct perf_evsel *evsel)
+{
+ struct xyarray *xy = evsel->fd;
+ struct perf_event_attr *attr;
+ unsigned int key, events;
+ bool check_pass = false;
+ int *evt_fd;
+ int err;
+
+ if (!xy) {
+ pr_debug("ERROR: evsel not ready for map %s\n", name);
+ return -BPF_LOADER_ERRNO__INTERNAL;
+ }
+
+ if (xy->row_size / xy->entry_size != 1) {
+ pr_debug("ERROR: Dimension of target event is incorrect for map %s\n",
+ name);
+ return -BPF_LOADER_ERRNO__OBJCONF_MAP_EVTDIM;
+ }
+
+ attr = &evsel->attr;
+ if (attr->inherit) {
+ pr_debug("ERROR: Can't put inherit event into map %s\n", name);
+ return -BPF_LOADER_ERRNO__OBJCONF_MAP_EVTINH;
+ }
+
+ if (attr->type == PERF_TYPE_RAW)
+ check_pass = true;
+ if (attr->type == PERF_TYPE_HARDWARE)
+ check_pass = true;
+ if (attr->type == PERF_TYPE_SOFTWARE &&
+ attr->config == PERF_COUNT_SW_BPF_OUTPUT)
+ check_pass = true;
+ if (!check_pass) {
+ pr_debug("ERROR: Event type is wrong for map %s\n", name);
+ return -BPF_LOADER_ERRNO__OBJCONF_MAP_EVTTYPE;
+ }
+
+ events = xy->entries / (xy->row_size / xy->entry_size);
+ key = *((unsigned int *)pkey);
+ if (key >= events) {
+ pr_debug("ERROR: there is no event %d for map %s\n",
+ key, name);
+ return -BPF_LOADER_ERRNO__OBJCONF_MAP_MAPSIZE;
+ }
+ evt_fd = xyarray__entry(xy, key, 0);
+ err = bpf_map_update_elem(map_fd, pkey, evt_fd, BPF_ANY);
+ if (err && errno)
+ err = -errno;
+ return err;
+}
+
+static int
apply_obj_config_map_for_key(const char *name, int map_fd,
struct bpf_map_def *pdef __maybe_unused,
struct bpf_map_op *op,
@@ -1128,6 +1262,10 @@ apply_obj_config_map_for_key(const char *name, int map_fd,
pdef->value_size,
op->v.value);
break;
+ case BPF_MAP_OP_SET_EVSEL:
+ err = apply_config_evsel_for_key(name, map_fd, pkey,
+ op->v.evsel);
+ break;
default:
pr_debug("ERROR: unknown value type for '%s'\n", name);
err = -BPF_LOADER_ERRNO__INTERNAL;
@@ -1193,6 +1331,11 @@ static const char *bpf_loader_strerror_table[NR_ERRNO] = {
[ERRCODE_OFFSET(OBJCONF_MAP_TYPE)] = "Incorrect map type",
[ERRCODE_OFFSET(OBJCONF_MAP_KEYSIZE)] = "Incorrect map key size",
[ERRCODE_OFFSET(OBJCONF_MAP_VALUESIZE)] = "Incorrect map value size",
+ [ERRCODE_OFFSET(OBJCONF_MAP_NOEVT)] = "Event not found for map setting",
+ [ERRCODE_OFFSET(OBJCONF_MAP_MAPSIZE)] = "Invalid map size for event setting",
+ [ERRCODE_OFFSET(OBJCONF_MAP_EVTDIM)] = "Event dimension too large",
+ [ERRCODE_OFFSET(OBJCONF_MAP_EVTINH)] = "Doesn't support inherit event",
+ [ERRCODE_OFFSET(OBJCONF_MAP_EVTTYPE)] = "Wrong event type for map",
};
static int
@@ -1329,6 +1472,12 @@ int bpf__strerror_config_obj(struct bpf_object *obj __maybe_unused,
int bpf__strerror_apply_obj_config(int err, char *buf, size_t size)
{
bpf__strerror_head(err, buf, size);
+ bpf__strerror_entry(BPF_LOADER_ERRNO__OBJCONF_MAP_EVTDIM,
+ "Cannot set event to BPF map in multi-thread tracing");
+ bpf__strerror_entry(BPF_LOADER_ERRNO__OBJCONF_MAP_EVTINH,
+ "%s (Hint: use -i to turn off inherit)", emsg);
+ bpf__strerror_entry(BPF_LOADER_ERRNO__OBJCONF_MAP_EVTTYPE,
+ "Can only put raw, hardware and BPF output event into a BPF map");
bpf__strerror_end(buf, size);
return 0;
}
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index 5d3b931..7c7689f 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -33,6 +33,11 @@ enum bpf_loader_errno {
BPF_LOADER_ERRNO__OBJCONF_MAP_TYPE, /* Incorrect map type */
BPF_LOADER_ERRNO__OBJCONF_MAP_KEYSIZE, /* Incorrect map key size */
BPF_LOADER_ERRNO__OBJCONF_MAP_VALUESIZE,/* Incorrect map value size */
+ BPF_LOADER_ERRNO__OBJCONF_MAP_NOEVT, /* Event not found for map setting */
+ BPF_LOADER_ERRNO__OBJCONF_MAP_MAPSIZE, /* Invalid map size for event setting */
+ BPF_LOADER_ERRNO__OBJCONF_MAP_EVTDIM, /* Event dimension too large */
+ BPF_LOADER_ERRNO__OBJCONF_MAP_EVTINH, /* Doesn't support inherit event */
+ BPF_LOADER_ERRNO__OBJCONF_MAP_EVTTYPE, /* Wrong event type for map */
__BPF_LOADER_ERRNO__END,
};
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 0f57716..c42e196 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1741,3 +1741,19 @@ void perf_evlist__set_tracking_event(struct perf_evlist *evlist,
tracking_evsel->tracking = true;
}
+
+struct perf_evsel *
+perf_evlist__find_evsel_by_str(struct perf_evlist *evlist,
+ const char *str)
+{
+ struct perf_evsel *evsel;
+
+ evlist__for_each(evlist, evsel) {
+ if (!evsel->name)
+ continue;
+ if (strcmp(str, evsel->name) == 0)
+ return evsel;
+ }
+
+ return NULL;
+}
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 7c4d9a2..a0d1522 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -294,4 +294,7 @@ void perf_evlist__set_tracking_event(struct perf_evlist *evlist,
struct perf_evsel *tracking_evsel);
void perf_event_attr__set_max_precise_ip(struct perf_event_attr *attr);
+
+struct perf_evsel *
+perf_evlist__find_evsel_by_str(struct perf_evlist *evlist, const char *str);
#endif /* __PERF_EVLIST_H */
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index a5dd670..5909fd2 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -697,14 +697,16 @@ parse_events_config_bpf(struct parse_events_evlist *data,
return -EINVAL;
}
- err = bpf__config_obj(obj, term, NULL, &error_pos);
+ err = bpf__config_obj(obj, term, data->evlist, &error_pos);
if (err) {
- bpf__strerror_config_obj(obj, term, NULL,
+ bpf__strerror_config_obj(obj, term, data->evlist,
&error_pos, err, errbuf,
sizeof(errbuf));
data->error->help = strdup(
-"Hint:\tValid config term:\n"
+"Hint:\tValid config terms:\n"
" \tmap:[<arraymap>].value=[value]\n"
+" \tmap:[<eventmap>].event=[event]\n"
+"\n"
" \t(add -v to see detail)");
data->error->str = strdup(errbuf);
if (err == -BPF_LOADER_ERRNO__OBJCONF_MAP_VALUE)
@@ -1530,9 +1532,10 @@ int parse_events(struct perf_evlist *evlist, const char *str,
struct parse_events_error *err)
{
struct parse_events_evlist data = {
- .list = LIST_HEAD_INIT(data.list),
- .idx = evlist->nr_entries,
- .error = err,
+ .list = LIST_HEAD_INIT(data.list),
+ .idx = evlist->nr_entries,
+ .error = err,
+ .evlist = evlist,
};
int ret;
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index c48377a..e036969 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -99,6 +99,7 @@ struct parse_events_evlist {
int idx;
int nr_groups;
struct parse_events_error *error;
+ struct perf_evlist *evlist;
};
struct parse_events_terms {
* [tip:perf/core] perf tools: Support setting different slots in a BPF map separately
2016-02-22 9:10 ` [PATCH 07/48] perf tools: Support setting different slots in a BPF map separately Wang Nan
@ 2016-02-25 5:40 ` tip-bot for Wang Nan
0 siblings, 0 replies; 76+ messages in thread
From: tip-bot for Wang Nan @ 2016-02-25 5:40 UTC (permalink / raw)
To: linux-tip-commits
Cc: kirr, acme, brendan.d.gregg, ast, hpa, tglx, dev, peterz,
wangnan0, hekuang, namhyung, linux-kernel, lizefan,
masami.hiramatsu.pt, jolsa, mingo, adrian.hunter,
jeremie.galarneau
Commit-ID: 2d055bf253c0d606c5de3fe7749e3188080780ad
Gitweb: http://git.kernel.org/tip/2d055bf253c0d606c5de3fe7749e3188080780ad
Author: Wang Nan <wangnan0@huawei.com>
AuthorDate: Mon, 22 Feb 2016 09:10:34 +0000
Committer: Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 22 Feb 2016 12:48:50 -0300
perf tools: Support setting different slots in a BPF map separately
This patch introduces basic facilities to support configuring different
slots in a BPF map one by one.
array.nr_ranges and array.ranges are introduced into 'struct
parse_events_term', where ranges is an array of index ranges (start,
length) to be configured by this config term, and nr_ranges is the
number of entries in that array. The array is passed to 'struct
bpf_map_priv'. To indicate the new type of configuration,
BPF_MAP_KEY_RANGES is added as a new key type.
bpf_map_config_foreach_key() is extended to iterate over those indices
instead of all possible keys.
The code in this commit will be enabled by the following commit, which
enables the indices syntax for array configuration.
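The (start, length) representation described above can be sketched in standalone C as follows. This is a simplified illustration, not the code from the patch: struct index_range, struct fake_map and all function names are hypothetical stand-ins, and a plain array takes the place of the BPF map. Rejecting empty ranges is a choice made in this sketch only.

```c
#include <stddef.h>

/* Illustrative stand-in for one (start, length) pair. */
struct index_range {
	unsigned int start;
	size_t length;
};

/* Return 0 if every range fits in a map with max_entries slots, else -1. */
int ranges_check(const struct index_range *ranges, size_t nr_ranges,
		 unsigned int max_entries)
{
	size_t i;

	for (i = 0; i < nr_ranges; i++) {
		/* Last index covered by this range. */
		size_t last = (size_t)ranges[i].start + ranges[i].length - 1;

		/* Empty ranges are rejected in this sketch. */
		if (ranges[i].length == 0 || last >= max_entries)
			return -1;
	}
	return 0;
}

/* Call fn(idx, arg) for each index of each range; stop on first error. */
int ranges_foreach(const struct index_range *ranges, size_t nr_ranges,
		   int (*fn)(unsigned int idx, void *arg), void *arg)
{
	size_t i, j;

	for (i = 0; i < nr_ranges; i++) {
		for (j = 0; j < ranges[i].length; j++) {
			int err = fn(ranges[i].start + (unsigned int)j, arg);

			if (err)
				return err;
		}
	}
	return 0;
}

/* A plain array standing in for a 100-slot array map. */
struct fake_map {
	unsigned char slots[100];
	unsigned char value;
};

/* Example callback: store m->value into slot idx. */
int fake_map_set(unsigned int idx, void *arg)
{
	struct fake_map *m = arg;

	m->slots[idx] = m->value;
	return 0;
}
```

Checking the ranges against the map size before iterating means an out-of-range index is reported before any slot has been written.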
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Cody P Schafer <dev@codyps.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kirill Smelkov <kirr@nexedi.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1456132275-98875-8-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/bpf-loader.c | 128 ++++++++++++++++++++++++++++++++++++++---
tools/perf/util/bpf-loader.h | 1 +
tools/perf/util/parse-events.c | 7 +++
tools/perf/util/parse-events.h | 10 ++++
4 files changed, 137 insertions(+), 9 deletions(-)
diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index deacb95..44824e3 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -17,6 +17,7 @@
#include "llvm-utils.h"
#include "probe-event.h"
#include "probe-finder.h" // for MAX_PROBES
+#include "parse-events.h"
#include "llvm-utils.h"
#define DEFINE_PRINT_FN(name, level) \
@@ -747,6 +748,7 @@ enum bpf_map_op_type {
enum bpf_map_key_type {
BPF_MAP_KEY_ALL,
+ BPF_MAP_KEY_RANGES,
};
struct bpf_map_op {
@@ -754,6 +756,9 @@ struct bpf_map_op {
enum bpf_map_op_type op_type;
enum bpf_map_key_type key_type;
union {
+ struct parse_events_array array;
+ } k;
+ union {
u64 value;
struct perf_evsel *evsel;
} v;
@@ -768,6 +773,8 @@ bpf_map_op__delete(struct bpf_map_op *op)
{
if (!list_empty(&op->list))
list_del(&op->list);
+ if (op->key_type == BPF_MAP_KEY_RANGES)
+ parse_events__clear_array(&op->k.array);
free(op);
}
@@ -792,10 +799,33 @@ bpf_map_priv__clear(struct bpf_map *map __maybe_unused,
free(priv);
}
+static int
+bpf_map_op_setkey(struct bpf_map_op *op, struct parse_events_term *term)
+{
+ op->key_type = BPF_MAP_KEY_ALL;
+ if (!term)
+ return 0;
+
+ if (term->array.nr_ranges) {
+ size_t memsz = term->array.nr_ranges *
+ sizeof(op->k.array.ranges[0]);
+
+ op->k.array.ranges = memdup(term->array.ranges, memsz);
+ if (!op->k.array.ranges) {
+ pr_debug("No enough memory to alloc indices for map\n");
+ return -ENOMEM;
+ }
+ op->key_type = BPF_MAP_KEY_RANGES;
+ op->k.array.nr_ranges = term->array.nr_ranges;
+ }
+ return 0;
+}
+
static struct bpf_map_op *
-bpf_map_op__new(void)
+bpf_map_op__new(struct parse_events_term *term)
{
struct bpf_map_op *op;
+ int err;
op = zalloc(sizeof(*op));
if (!op) {
@@ -804,7 +834,11 @@ bpf_map_op__new(void)
}
INIT_LIST_HEAD(&op->list);
- op->key_type = BPF_MAP_KEY_ALL;
+ err = bpf_map_op_setkey(op, term);
+ if (err) {
+ free(op);
+ return ERR_PTR(err);
+ }
return op;
}
@@ -841,12 +875,12 @@ bpf_map__add_op(struct bpf_map *map, struct bpf_map_op *op)
}
static struct bpf_map_op *
-bpf_map__add_newop(struct bpf_map *map)
+bpf_map__add_newop(struct bpf_map *map, struct parse_events_term *term)
{
struct bpf_map_op *op;
int err;
- op = bpf_map_op__new();
+ op = bpf_map_op__new(term);
if (IS_ERR(op))
return op;
@@ -896,7 +930,7 @@ __bpf_map__config_value(struct bpf_map *map,
return -BPF_LOADER_ERRNO__OBJCONF_MAP_VALUESIZE;
}
- op = bpf_map__add_newop(map);
+ op = bpf_map__add_newop(map, term);
if (IS_ERR(op))
return PTR_ERR(op);
op->op_type = BPF_MAP_OP_SET_VALUE;
@@ -958,7 +992,7 @@ __bpf_map__config_event(struct bpf_map *map,
return -BPF_LOADER_ERRNO__OBJCONF_MAP_TYPE;
}
- op = bpf_map__add_newop(map);
+ op = bpf_map__add_newop(map, term);
if (IS_ERR(op))
return PTR_ERR(op);
op->op_type = BPF_MAP_OP_SET_EVSEL;
@@ -996,6 +1030,44 @@ struct bpf_obj_config__map_func bpf_obj_config__map_funcs[] = {
};
static int
+config_map_indices_range_check(struct parse_events_term *term,
+ struct bpf_map *map,
+ const char *map_name)
+{
+ struct parse_events_array *array = &term->array;
+ struct bpf_map_def def;
+ unsigned int i;
+ int err;
+
+ if (!array->nr_ranges)
+ return 0;
+ if (!array->ranges) {
+ pr_debug("ERROR: map %s: array->nr_ranges is %d but range array is NULL\n",
+ map_name, (int)array->nr_ranges);
+ return -BPF_LOADER_ERRNO__INTERNAL;
+ }
+
+ err = bpf_map__get_def(map, &def);
+ if (err) {
+ pr_debug("ERROR: Unable to get map definition from '%s'\n",
+ map_name);
+ return -BPF_LOADER_ERRNO__INTERNAL;
+ }
+
+ for (i = 0; i < array->nr_ranges; i++) {
+ unsigned int start = array->ranges[i].start;
+ size_t length = array->ranges[i].length;
+ unsigned int idx = start + length - 1;
+
+ if (idx >= def.max_entries) {
+ pr_debug("ERROR: index %d too large\n", idx);
+ return -BPF_LOADER_ERRNO__OBJCONF_MAP_IDX2BIG;
+ }
+ }
+ return 0;
+}
+
+static int
bpf__obj_config_map(struct bpf_object *obj,
struct parse_events_term *term,
struct perf_evlist *evlist,
@@ -1030,7 +1102,12 @@ bpf__obj_config_map(struct bpf_object *obj,
goto out;
}
- *key_scan_pos += map_opt - map_name;
+ *key_scan_pos += strlen(map_opt);
+ err = config_map_indices_range_check(term, map, map_name);
+ if (err)
+ goto out;
+ *key_scan_pos -= strlen(map_opt);
+
for (i = 0; i < ARRAY_SIZE(bpf_obj_config__map_funcs); i++) {
struct bpf_obj_config__map_func *func =
&bpf_obj_config__map_funcs[i];
@@ -1100,6 +1177,33 @@ foreach_key_array_all(map_config_func_t func,
}
static int
+foreach_key_array_ranges(map_config_func_t func, void *arg,
+ const char *name, int map_fd,
+ struct bpf_map_def *pdef,
+ struct bpf_map_op *op)
+{
+ unsigned int i, j;
+ int err;
+
+ for (i = 0; i < op->k.array.nr_ranges; i++) {
+ unsigned int start = op->k.array.ranges[i].start;
+ size_t length = op->k.array.ranges[i].length;
+
+ for (j = 0; j < length; j++) {
+ unsigned int idx = start + j;
+
+ err = func(name, map_fd, pdef, op, &idx, arg);
+ if (err) {
+ pr_debug("ERROR: failed to insert value to %s[%u]\n",
+ name, idx);
+ return err;
+ }
+ }
+ }
+ return 0;
+}
+
+static int
bpf_map_config_foreach_key(struct bpf_map *map,
map_config_func_t func,
void *arg)
@@ -1141,14 +1245,19 @@ bpf_map_config_foreach_key(struct bpf_map *map,
case BPF_MAP_KEY_ALL:
err = foreach_key_array_all(func, arg, name,
map_fd, &def, op);
- if (err)
- return err;
+ break;
+ case BPF_MAP_KEY_RANGES:
+ err = foreach_key_array_ranges(func, arg, name,
+ map_fd, &def,
+ op);
break;
default:
pr_debug("ERROR: keytype for map '%s' invalid\n",
name);
return -BPF_LOADER_ERRNO__INTERNAL;
}
+ if (err)
+ return err;
break;
default:
pr_debug("ERROR: type of '%s' incorrect\n", name);
@@ -1336,6 +1445,7 @@ static const char *bpf_loader_strerror_table[NR_ERRNO] = {
[ERRCODE_OFFSET(OBJCONF_MAP_EVTDIM)] = "Event dimension too large",
[ERRCODE_OFFSET(OBJCONF_MAP_EVTINH)] = "Doesn't support inherit event",
[ERRCODE_OFFSET(OBJCONF_MAP_EVTTYPE)] = "Wrong event type for map",
+ [ERRCODE_OFFSET(OBJCONF_MAP_IDX2BIG)] = "Index too large",
};
static int
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index 7c7689f..be43119 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -38,6 +38,7 @@ enum bpf_loader_errno {
BPF_LOADER_ERRNO__OBJCONF_MAP_EVTDIM, /* Event dimension too large */
BPF_LOADER_ERRNO__OBJCONF_MAP_EVTINH, /* Doesn't support inherit event */
BPF_LOADER_ERRNO__OBJCONF_MAP_EVTTYPE, /* Wrong event type for map */
+ BPF_LOADER_ERRNO__OBJCONF_MAP_IDX2BIG, /* Index too large */
__BPF_LOADER_ERRNO__END,
};
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 5909fd2..697d350 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -2211,6 +2211,8 @@ void parse_events_terms__purge(struct list_head *terms)
struct parse_events_term *term, *h;
list_for_each_entry_safe(term, h, terms, list) {
+ if (term->array.nr_ranges)
+ free(term->array.ranges);
list_del_init(&term->list);
free(term);
}
@@ -2224,6 +2226,11 @@ void parse_events_terms__delete(struct list_head *terms)
free(terms);
}
+void parse_events__clear_array(struct parse_events_array *a)
+{
+ free(a->ranges);
+}
+
void parse_events_evlist_error(struct parse_events_evlist *data,
int idx, const char *str)
{
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index e036969..e445622 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -72,8 +72,17 @@ enum {
__PARSE_EVENTS__TERM_TYPE_NR,
};
+struct parse_events_array {
+ size_t nr_ranges;
+ struct {
+ unsigned int start;
+ size_t length;
+ } *ranges;
+};
+
struct parse_events_term {
char *config;
+ struct parse_events_array array;
union {
char *str;
u64 num;
@@ -120,6 +129,7 @@ int parse_events_term__clone(struct parse_events_term **new,
struct parse_events_term *term);
void parse_events_terms__delete(struct list_head *terms);
void parse_events_terms__purge(struct list_head *terms);
+void parse_events__clear_array(struct parse_events_array *a);
int parse_events__modifier_event(struct list_head *list, char *str, bool add);
int parse_events__modifier_group(struct list_head *list, char *event_mod);
int parse_events_name(struct list_head *list, char *name);
* [tip:perf/core] perf tools: Enable indices setting syntax for BPF map
2016-02-22 9:10 ` [PATCH 08/48] perf tools: Enable indices setting syntax for BPF map Wang Nan
@ 2016-02-25 5:40 ` tip-bot for Wang Nan
0 siblings, 0 replies; 76+ messages in thread
From: tip-bot for Wang Nan @ 2016-02-25 5:40 UTC (permalink / raw)
To: linux-tip-commits
Cc: hpa, jeremie.galarneau, namhyung, kirr, brendan.d.gregg, peterz,
tglx, wangnan0, jolsa, ast, adrian.hunter, lizefan, linux-kernel,
dev, mingo, hekuang, acme, masami.hiramatsu.pt
Commit-ID: e571e029bdbf59f485fe67740b7a4ef421e1d55d
Gitweb: http://git.kernel.org/tip/e571e029bdbf59f485fe67740b7a4ef421e1d55d
Author: Wang Nan <wangnan0@huawei.com>
AuthorDate: Mon, 22 Feb 2016 09:10:35 +0000
Committer: Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 22 Feb 2016 12:59:49 -0300
perf tools: Enable indices setting syntax for BPF map
This patch introduces a new syntax to the perf event parser:
# perf record -e './test_bpf_map_3.c/map:channel.value[0,1,2,3...5]=101/' usleep 2
By utilizing the basic facilities in bpf-loader.c which allow setting
different slots in a BPF map separately, the newly introduced syntax
allows perf to control specific elements in a BPF map.
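To make the index syntax concrete, here is a hand-written C sketch that turns the text between '[' and ']' into (start, length) pairs. The patch itself does this in the flex/bison grammar; parse_indices() and struct index_range are hypothetical names used only for illustration, and the sketch handles decimal numbers only (no hex, no "[all]").

```c
#include <stdlib.h>

/* Illustrative stand-in for one (start, length) pair. */
struct index_range {
	unsigned int start;
	size_t length;
};

/*
 * Parse a spec such as "0,1,2,3...5" into at most max ranges.
 * Returns the number of ranges parsed, or -1 on malformed input
 * or when the output array is too small.
 */
int parse_indices(const char *spec, struct index_range *out, size_t max)
{
	const char *p = spec;
	size_t n = 0;

	if (!*p)
		return -1;	/* at least one term is required */

	while (*p) {
		unsigned long first, last;
		char *end;

		if (*p < '0' || *p > '9' || n == max)
			return -1;
		first = strtoul(p, &end, 10);
		last = first;
		p = end;

		/* Optional "...N" suffix turns a single index into a range. */
		if (p[0] == '.' && p[1] == '.' && p[2] == '.') {
			p += 3;
			if (*p < '0' || *p > '9')
				return -1;
			last = strtoul(p, &end, 10);
			p = end;
			if (last < first)
				return -1;
		}

		out[n].start = (unsigned int)first;
		out[n].length = last - first + 1;
		n++;

		if (*p == ',') {
			p++;
			if (!*p)
				return -1;	/* trailing comma */
		} else if (*p) {
			return -1;	/* unexpected character */
		}
	}
	return (int)n;
}
```

For the spec "0,1,2,3...5" from the example above, this yields four ranges: three single indices and one range starting at 3 with length 3.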
Test result:
# cat ./test_bpf_map_3.c
/************************ BEGIN **************************/
#include <uapi/linux/bpf.h>
#define SEC(NAME) __attribute__((section(NAME), used))
struct bpf_map_def {
unsigned int type;
unsigned int key_size;
unsigned int value_size;
unsigned int max_entries;
};
static void *(*map_lookup_elem)(struct bpf_map_def *, void *) =
(void *)BPF_FUNC_map_lookup_elem;
static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
(void *)BPF_FUNC_trace_printk;
struct bpf_map_def SEC("maps") channel = {
.type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(unsigned char),
.max_entries = 100,
};
SEC("func=hrtimer_nanosleep rqtp->tv_nsec")
int func(void *ctx, int err, long nsec)
{
char fmt[] = "%ld\n";
long usec = nsec * 0x10624dd3 >> 38; // nsec / 1000
int key = (int)usec;
unsigned char *pval = map_lookup_elem(&channel, &key);
if (!pval)
return 0;
trace_printk(fmt, sizeof(fmt), (unsigned char)*pval);
return 0;
}
char _license[] SEC("license") = "GPL";
int _version SEC("version") = LINUX_VERSION_CODE;
/************************* END ***************************/
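The expression "nsec * 0x10624dd3 >> 38" in the listing above divides by 1000 using a multiply and a shift. The standalone sketch below isolates that step; nsec_to_usec() is a hypothetical name, and unsigned 64-bit arithmetic is assumed so that the product cannot overflow for the nanosecond values involved.

```c
#include <stdint.h>

/* Divide by 1000 using the multiply-and-shift constant from the listing. */
uint64_t nsec_to_usec(uint64_t nsec)
{
	/* 0x10624dd3 is 2^38 / 1000 rounded up; intended for nsec < 2^32. */
	return (nsec * UINT64_C(0x10624dd3)) >> 38;
}
```

With this conversion, "usleep 2" reaches the probe with tv_nsec = 2000, which gives key 2, so the program prints the value stored in slot 2 of the channel map.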
Normal case:
# echo "" > /sys/kernel/debug/tracing/trace
# ./perf record -e './test_bpf_map_3.c/map:channel.value[0,1,2,3...5]=101/' usleep 2
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.012 MB perf.data ]
# cat /sys/kernel/debug/tracing/trace | grep usleep
usleep-405 [004] d... 2745423.547822: : 101
# ./perf record -e './test_bpf_map_3.c/map:channel.value[0...9,20...29]=102,map:channel.value[10...19]=103/' usleep 3
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.012 MB perf.data ]
# ./perf record -e './test_bpf_map_3.c/map:channel.value[0...9,20...29]=102,map:channel.value[10...19]=103/' usleep 15
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.012 MB perf.data ]
# cat /sys/kernel/debug/tracing/trace | grep usleep
usleep-405 [004] d... 2745423.547822: : 101
usleep-655 [006] d... 2745434.122814: : 102
usleep-904 [006] d... 2745439.916264: : 103
# ./perf record -e './test_bpf_map_3.c/map:channel.value[all]=104/' usleep 99
# cat /sys/kernel/debug/tracing/trace | grep usleep
usleep-405 [004] d... 2745423.547822: : 101
usleep-655 [006] d... 2745434.122814: : 102
usleep-904 [006] d... 2745439.916264: : 103
usleep-1537 [003] d... 2745538.053737: : 104
Error case:
# ./perf record -e './test_bpf_map_3.c/map:channel.value[10...1000]=104/' usleep 99
event syntax error: '..annel.value[10...1000]=104/'
\___ Index too large
Hint: Valid config terms:
map:[<arraymap>].value<indices>=[value]
map:[<eventmap>].event<indices>=[event]
where <indices> is something like [0,3...5] or [all]
(add -v to see detail)
Run 'perf list' for a list of valid events
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-e, --event <event> event selector. use 'perf list' to list available events
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Cody P Schafer <dev@codyps.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
Cc: Kirill Smelkov <kirr@nexedi.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1456132275-98875-9-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/parse-events.c | 5 ++-
tools/perf/util/parse-events.l | 13 ++++++-
tools/perf/util/parse-events.y | 85 ++++++++++++++++++++++++++++++++++++++++++
3 files changed, 100 insertions(+), 3 deletions(-)
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 697d350..6e2f203 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -704,9 +704,10 @@ parse_events_config_bpf(struct parse_events_evlist *data,
sizeof(errbuf));
data->error->help = strdup(
"Hint:\tValid config terms:\n"
-" \tmap:[<arraymap>].value=[value]\n"
-" \tmap:[<eventmap>].event=[event]\n"
+" \tmap:[<arraymap>].value<indices>=[value]\n"
+" \tmap:[<eventmap>].event<indices>=[event]\n"
"\n"
+" \twhere <indices> is something like [0,3...5] or [all]\n"
" \t(add -v to see detail)");
data->error->str = strdup(errbuf);
if (err == -BPF_LOADER_ERRNO__OBJCONF_MAP_VALUE)
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 0cc6b84..fb85d03 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -9,8 +9,8 @@
%{
#include <errno.h>
#include "../perf.h"
-#include "parse-events-bison.h"
#include "parse-events.h"
+#include "parse-events-bison.h"
char *parse_events_get_text(yyscan_t yyscanner);
YYSTYPE *parse_events_get_lval(yyscan_t yyscanner);
@@ -111,6 +111,7 @@ do { \
%x mem
%s config
%x event
+%x array
group [^,{}/]*[{][^}]*[}][^,{}/]*
event_pmu [^,{}/]+[/][^/]*[/][^,{}/]*
@@ -176,6 +177,14 @@ modifier_bp [rwx]{1,3}
}
+<array>{
+"]" { BEGIN(config); return ']'; }
+{num_dec} { return value(yyscanner, 10); }
+{num_hex} { return value(yyscanner, 16); }
+, { return ','; }
+"\.\.\." { return PE_ARRAY_RANGE; }
+}
+
<config>{
/*
* Please update config_term_names when new static term is added.
@@ -195,6 +204,8 @@ no-inherit { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_NOINHERIT); }
, { return ','; }
"/" { BEGIN(INITIAL); return '/'; }
{name_minus} { return str(yyscanner, PE_NAME); }
+\[all\] { return PE_ARRAY_ALL; }
+"[" { BEGIN(array); return '['; }
}
<mem>{
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index 0e2d433..d1fbcab 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -48,6 +48,7 @@ static inc_group_count(struct list_head *list,
%token PE_PREFIX_MEM PE_PREFIX_RAW PE_PREFIX_GROUP
%token PE_ERROR
%token PE_PMU_EVENT_PRE PE_PMU_EVENT_SUF PE_KERNEL_PMU_EVENT
+%token PE_ARRAY_ALL PE_ARRAY_RANGE
%type <num> PE_VALUE
%type <num> PE_VALUE_SYM_HW
%type <num> PE_VALUE_SYM_SW
@@ -83,6 +84,9 @@ static inc_group_count(struct list_head *list,
%type <head> group_def
%type <head> group
%type <head> groups
+%type <array> array
+%type <array> array_term
+%type <array> array_terms
%union
{
@@ -94,6 +98,7 @@ static inc_group_count(struct list_head *list,
char *sys;
char *event;
} tracepoint_name;
+ struct parse_events_array array;
}
%%
@@ -572,6 +577,86 @@ PE_TERM
ABORT_ON(parse_events_term__num(&term, (int)$1, NULL, 1, &@1, NULL));
$$ = term;
}
+|
+PE_NAME array '=' PE_NAME
+{
+ struct parse_events_term *term;
+ int i;
+
+ ABORT_ON(parse_events_term__str(&term, PARSE_EVENTS__TERM_TYPE_USER,
+ $1, $4, &@1, &@4));
+
+ term->array = $2;
+ $$ = term;
+}
+|
+PE_NAME array '=' PE_VALUE
+{
+ struct parse_events_term *term;
+
+ ABORT_ON(parse_events_term__num(&term, PARSE_EVENTS__TERM_TYPE_USER,
+ $1, $4, &@1, &@4));
+ term->array = $2;
+ $$ = term;
+}
+
+array:
+'[' array_terms ']'
+{
+ $$ = $2;
+}
+|
+PE_ARRAY_ALL
+{
+ $$.nr_ranges = 0;
+ $$.ranges = NULL;
+}
+
+array_terms:
+array_terms ',' array_term
+{
+ struct parse_events_array new_array;
+
+ new_array.nr_ranges = $1.nr_ranges + $3.nr_ranges;
+ new_array.ranges = malloc(sizeof(new_array.ranges[0]) *
+ new_array.nr_ranges);
+ ABORT_ON(!new_array.ranges);
+ memcpy(&new_array.ranges[0], $1.ranges,
+ $1.nr_ranges * sizeof(new_array.ranges[0]));
+ memcpy(&new_array.ranges[$1.nr_ranges], $3.ranges,
+ $3.nr_ranges * sizeof(new_array.ranges[0]));
+ free($1.ranges);
+ free($3.ranges);
+ $$ = new_array;
+}
+|
+array_term
+
+array_term:
+PE_VALUE
+{
+ struct parse_events_array array;
+
+ array.nr_ranges = 1;
+ array.ranges = malloc(sizeof(array.ranges[0]));
+ ABORT_ON(!array.ranges);
+ array.ranges[0].start = $1;
+ array.ranges[0].length = 1;
+ $$ = array;
+}
+|
+PE_VALUE PE_ARRAY_RANGE PE_VALUE
+{
+ struct parse_events_array array;
+
+ ABORT_ON($3 < $1);
+ array.nr_ranges = 1;
+ array.ranges = malloc(sizeof(array.ranges[0]));
+ ABORT_ON(!array.ranges);
+ array.ranges[0].start = $1;
+ array.ranges[0].length = $3 - $1 + 1;
+ $$ = array;
+}
sep_dc: ':' |
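
The array rules above can be read as two small operations: one term builds a single range, and a comma-separated list concatenates ranges into a fresh allocation. The following is a minimal standalone sketch of that logic, not the perf code itself. The names `idx_range`, `idx_array`, `idx_array_term` and `idx_array_concat` are hypothetical stand-ins for `struct parse_events_array` and the grammar actions, and error handling returns -1 where the grammar uses ABORT_ON().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for struct parse_events_array and its ranges. */
struct idx_range {
	unsigned long start;
	unsigned long length;
};

struct idx_array {
	size_t nr_ranges;
	struct idx_range *ranges;
};

/* One term: a single index (first == last) or an inclusive range. */
int idx_array_term(struct idx_array *out, unsigned long first,
		   unsigned long last)
{
	if (last < first)
		return -1;	/* mirrors ABORT_ON($3 < $1) */
	out->ranges = malloc(sizeof(out->ranges[0]));
	if (!out->ranges)
		return -1;
	out->nr_ranges = 1;
	out->ranges[0].start = first;
	out->ranges[0].length = last - first + 1;
	return 0;
}

/*
 * "terms ',' term": copy both lists into a new allocation.
 * On success the inputs are freed and emptied; on failure they are untouched.
 */
int idx_array_concat(struct idx_array *out, struct idx_array *a,
		     struct idx_array *b)
{
	struct idx_array merged;

	merged.nr_ranges = a->nr_ranges + b->nr_ranges;
	merged.ranges = malloc(sizeof(merged.ranges[0]) * merged.nr_ranges);
	if (!merged.ranges)
		return -1;
	memcpy(&merged.ranges[0], a->ranges,
	       a->nr_ranges * sizeof(merged.ranges[0]));
	memcpy(&merged.ranges[a->nr_ranges], b->ranges,
	       b->nr_ranges * sizeof(merged.ranges[0]));
	free(a->ranges);
	free(b->ranges);
	a->ranges = b->ranges = NULL;
	a->nr_ranges = b->nr_ranges = 0;
	*out = merged;
	return 0;
}

/* Builds a list made of index 0 followed by the range 2 through 4. */
void idx_array_example(void)
{
	struct idx_array a, b, all;

	if (idx_array_term(&a, 0, 0) || idx_array_term(&b, 2, 4))
		abort();
	if (idx_array_concat(&all, &a, &b))
		abort();
	assert(all.nr_ranges == 2);
	assert(all.ranges[0].start == 0 && all.ranges[0].length == 1);
	assert(all.ranges[1].start == 2 && all.ranges[1].length == 3);
	free(all.ranges);
}
```

Storing (start, length) pairs rather than expanded indices keeps the list small even for wide ranges. The consumer of the array is then expected to walk each range itself.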
* [tip:perf/core] perf tools: Apply tracepoint event definition options to BPF script
2016-02-22 9:10 ` [PATCH 09/48] perf tools: Pass tracepoint options to BPF script Wang Nan
@ 2016-02-25 5:41 ` tip-bot for Wang Nan
0 siblings, 0 replies; 76+ messages in thread
From: tip-bot for Wang Nan @ 2016-02-25 5:41 UTC (permalink / raw)
To: linux-tip-commits
Cc: brendan.d.gregg, jolsa, wangnan0, mingo, namhyung,
jeremie.galarneau, kirr, peterz, masami.hiramatsu.pt, ast,
linux-kernel, dev, hpa, acme, lizefan, hekuang, tglx,
adrian.hunter
Commit-ID: 95088a591e197610bd03f4059f5fdbe9e376425b
Gitweb: http://git.kernel.org/tip/95088a591e197610bd03f4059f5fdbe9e376425b
Author: Wang Nan <wangnan0@huawei.com>
AuthorDate: Mon, 22 Feb 2016 09:10:36 +0000
Committer: Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 22 Feb 2016 13:02:44 -0300
perf tools: Apply tracepoint event definition options to BPF script
Users can pass options to tracepoints defined in the BPF script. For
example:
# perf record -e ./test.c/no-inherit/ bash
# dd if=/dev/zero of=/dev/null count=10000
# exit
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.022 MB perf.data (139 samples) ]
(no-inherit works: only the sys_read calls issued by bash are captured;
the 10000 or more sys_read calls issued by dd are skipped.)
test.c:
#define SEC(NAME) __attribute__((section(NAME), used))
SEC("func=sys_read")
int bpf_func__sys_read(void *ctx)
{
	return 1;
}
char _license[] SEC("license") = "GPL";
int _version SEC("version") = LINUX_VERSION_CODE;
no-inherit is applied to the kprobe event defined in test.c.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Cody P Schafer <dev@codyps.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kirill Smelkov <kirr@nexedi.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1456132275-98875-10-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/tests/bpf.c | 2 +-
tools/perf/util/parse-events.c | 56 +++++++++++++++++++++++++++++++++++++-----
tools/perf/util/parse-events.h | 3 ++-
3 files changed, 53 insertions(+), 8 deletions(-)
diff --git a/tools/perf/tests/bpf.c b/tools/perf/tests/bpf.c
index 4aed5cb..199501c 100644
--- a/tools/perf/tests/bpf.c
+++ b/tools/perf/tests/bpf.c
@@ -112,7 +112,7 @@ static int do_test(struct bpf_object *obj, int (*func)(void),
parse_evlist.error = &parse_error;
INIT_LIST_HEAD(&parse_evlist.list);
- err = parse_events_load_bpf_obj(&parse_evlist, &parse_evlist.list, obj);
+ err = parse_events_load_bpf_obj(&parse_evlist, &parse_evlist.list, obj, NULL);
if (err || list_empty(&parse_evlist.list)) {
pr_debug("Failed to add events selected by BPF\n");
return TEST_FAIL;
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 6e2f203..4c19d5e 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -581,6 +581,7 @@ static int add_tracepoint_multi_sys(struct list_head *list, int *idx,
struct __add_bpf_event_param {
struct parse_events_evlist *data;
struct list_head *list;
+ struct list_head *head_config;
};
static int add_bpf_event(struct probe_trace_event *tev, int fd,
@@ -597,7 +598,8 @@ static int add_bpf_event(struct probe_trace_event *tev, int fd,
tev->group, tev->event, fd);
err = parse_events_add_tracepoint(&new_evsels, &evlist->idx, tev->group,
- tev->event, evlist->error, NULL);
+ tev->event, evlist->error,
+ param->head_config);
if (err) {
struct perf_evsel *evsel, *tmp;
@@ -622,11 +624,12 @@ static int add_bpf_event(struct probe_trace_event *tev, int fd,
int parse_events_load_bpf_obj(struct parse_events_evlist *data,
struct list_head *list,
- struct bpf_object *obj)
+ struct bpf_object *obj,
+ struct list_head *head_config)
{
int err;
char errbuf[BUFSIZ];
- struct __add_bpf_event_param param = {data, list};
+ struct __add_bpf_event_param param = {data, list, head_config};
static bool registered_unprobe_atexit = false;
if (IS_ERR(obj) || !obj) {
@@ -720,14 +723,47 @@ parse_events_config_bpf(struct parse_events_evlist *data,
return 0;
}
+/*
+ * Split config terms:
+ * perf record -e bpf.c/call-graph=fp,map:array.value[0]=1/ ...
+ * 'call-graph=fp' is 'evt config', should be applied to each
+ * event in bpf.c.
+ * 'map:array.value[0]=1' is 'obj config', should be processed
+ * with parse_events_config_bpf.
+ *
+ * Move object config terms from the first list to obj_head_config.
+ */
+static void
+split_bpf_config_terms(struct list_head *evt_head_config,
+ struct list_head *obj_head_config)
+{
+ struct parse_events_term *term, *temp;
+
+ /*
+ * Currently, all possible user config terms
+ * belong to bpf object. parse_events__is_hardcoded_term()
+ * happens to be a good flag.
+ *
+ * See parse_events_config_bpf() and
+ * config_term_tracepoint().
+ */
+ list_for_each_entry_safe(term, temp, evt_head_config, list)
+ if (!parse_events__is_hardcoded_term(term))
+ list_move_tail(&term->list, obj_head_config);
+}
+
int parse_events_load_bpf(struct parse_events_evlist *data,
struct list_head *list,
char *bpf_file_name,
bool source,
struct list_head *head_config)
{
- struct bpf_object *obj;
int err;
+ struct bpf_object *obj;
+ LIST_HEAD(obj_head_config);
+
+ if (head_config)
+ split_bpf_config_terms(head_config, &obj_head_config);
obj = bpf__prepare_load(bpf_file_name, source);
if (IS_ERR(obj)) {
@@ -749,10 +785,18 @@ int parse_events_load_bpf(struct parse_events_evlist *data,
return err;
}
- err = parse_events_load_bpf_obj(data, list, obj);
+ err = parse_events_load_bpf_obj(data, list, obj, head_config);
if (err)
return err;
- return parse_events_config_bpf(data, obj, head_config);
+ err = parse_events_config_bpf(data, obj, &obj_head_config);
+
+ /*
+ * Caller doesn't know anything about obj_head_config,
+ * so combine them together again before returning.
+ */
+ if (head_config)
+ list_splice_tail(&obj_head_config, head_config);
+ return err;
}
static int
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index e445622..67e4930 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -146,7 +146,8 @@ int parse_events_load_bpf(struct parse_events_evlist *data,
struct bpf_object;
int parse_events_load_bpf_obj(struct parse_events_evlist *data,
struct list_head *list,
- struct bpf_object *obj);
+ struct bpf_object *obj,
+ struct list_head *head_config);
int parse_events_add_numeric(struct parse_events_evlist *data,
struct list_head *list,
u32 type, u64 config,
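
The split performed by split_bpf_config_terms() and the later list_splice_tail() can be illustrated outside the kernel tree. The sketch below is an assumption-laden stand-in, not the perf implementation: it uses a plain singly linked list instead of `struct list_head`, and the `struct term` type, its `hardcoded` flag, `split_terms()` and `splice_terms_tail()` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct parse_events_term. */
struct term {
	const char *text;
	bool hardcoded;		/* true for event options such as call-graph=fp */
	struct term *next;
};

/*
 * Move every non-hardcoded term from *evt_head to the tail of *obj_head,
 * preserving relative order in both lists.
 */
void split_terms(struct term **evt_head, struct term **obj_head)
{
	struct term **link = evt_head;
	struct term **obj_tail = obj_head;

	while (*obj_tail)
		obj_tail = &(*obj_tail)->next;

	while (*link) {
		struct term *t = *link;

		if (!t->hardcoded) {
			*link = t->next;	/* unlink from the event list */
			t->next = NULL;
			*obj_tail = t;		/* append to the object list */
			obj_tail = &t->next;
		} else {
			link = &t->next;
		}
	}
}

/* Append the object list to the tail of the event list and empty it. */
void splice_terms_tail(struct term **evt_head, struct term **obj_head)
{
	struct term **tail = evt_head;

	while (*tail)
		tail = &(*tail)->next;
	*tail = *obj_head;
	*obj_head = NULL;
}

/* Mirrors the example in the comment: call-graph=fp,map:array.value[0]=1 */
void split_terms_example(void)
{
	struct term obj_term = { "map:array.value[0]=1", false, NULL };
	struct term evt_term = { "call-graph=fp", true, &obj_term };
	struct term *evt = &evt_term, *obj = NULL;

	split_terms(&evt, &obj);
	assert(evt == &evt_term && evt_term.next == NULL);
	assert(obj == &obj_term && obj_term.next == NULL);

	/* The caller only knows one list, so rejoin before returning. */
	splice_terms_tail(&evt, &obj);
	assert(evt == &evt_term && evt_term.next == &obj_term && obj == NULL);
}
```

Note that rejoining at the tail means object terms may end up after event terms even if the user wrote them first. The patch accepts the same reordering by using list_splice_tail().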
* [tip:perf/core] perf tools: Introduce bpf-output event
2016-02-22 9:10 ` [PATCH 10/48] perf tools: Introduce bpf-output event Wang Nan
2016-02-23 17:45 ` Arnaldo Carvalho de Melo
@ 2016-02-25 5:41 ` tip-bot for Wang Nan
1 sibling, 0 replies; 76+ messages in thread
From: tip-bot for Wang Nan @ 2016-02-25 5:41 UTC (permalink / raw)
To: linux-tip-commits
Cc: mingo, brendan.d.gregg, masami.hiramatsu.pt, kirr, ast, jolsa,
hekuang, linux-kernel, lizefan, peterz, dev, wangnan0, acme, tglx,
namhyung, hpa, adrian.hunter, jeremie.galarneau
Commit-ID: 03e0a7df3efd959e40cd7ff40b1fabddc234ec5a
Gitweb: http://git.kernel.org/tip/03e0a7df3efd959e40cd7ff40b1fabddc234ec5a
Author: Wang Nan <wangnan0@huawei.com>
AuthorDate: Mon, 22 Feb 2016 09:10:37 +0000
Committer: Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 22 Feb 2016 14:37:21 -0300
perf tools: Introduce bpf-output event
Commit a43eec304259 ("bpf: introduce bpf_perf_event_output() helper")
adds a helper to enable a BPF program to output data to a perf ring
buffer through a new type of perf event, PERF_COUNT_SW_BPF_OUTPUT. This
patch enables perf to create events of that type. Now a perf user can
use the following cmdline to receive output data from BPF programs:
# perf record -a -e bpf-output/no-inherit,name=evt/ \
-e ./test_bpf_output.c/map:channel.event=evt/ ls /
# perf script
perf 1560 [004] 347747.086295: evt: ffffffff811fd201 sys_write ...
perf 1560 [004] 347747.086300: evt: ffffffff811fd201 sys_write ...
perf 1560 [004] 347747.086315: evt: ffffffff811fd201 sys_write ...
...
Test result:
# cat test_bpf_output.c
/************************ BEGIN **************************/
#include <uapi/linux/bpf.h>
struct bpf_map_def {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
};
#define SEC(NAME) __attribute__((section(NAME), used))
static u64 (*ktime_get_ns)(void) =
	(void *)BPF_FUNC_ktime_get_ns;
static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
	(void *)BPF_FUNC_trace_printk;
static int (*get_smp_processor_id)(void) =
	(void *)BPF_FUNC_get_smp_processor_id;
static int (*perf_event_output)(void *, struct bpf_map_def *, int, void *, unsigned long) =
	(void *)BPF_FUNC_perf_event_output;
struct bpf_map_def SEC("maps") channel = {
	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
	.key_size = sizeof(int),
	.value_size = sizeof(u32),
	.max_entries = __NR_CPUS__,
};
SEC("func_write=sys_write")
int func_write(void *ctx)
{
	struct {
		u64 ktime;
		int cpuid;
	} __attribute__((packed)) output_data;
	char error_data[] = "Error: failed to output: %d\n";
	output_data.cpuid = get_smp_processor_id();
	output_data.ktime = ktime_get_ns();
	int err = perf_event_output(ctx, &channel, get_smp_processor_id(),
				    &output_data, sizeof(output_data));
	if (err)
		trace_printk(error_data, sizeof(error_data), err);
	return 0;
}
char _license[] SEC("license") = "GPL";
int _version SEC("version") = LINUX_VERSION_CODE;
/************************ END ***************************/
# perf record -a -e bpf-output/no-inherit,name=evt/ \
-e ./test_bpf_output.c/map:channel.event=evt/ ls /
# perf script | grep ls
ls 2242 [003] 347851.557563: evt: ffffffff811fd201 sys_write ...
ls 2242 [003] 347851.557571: evt: ffffffff811fd201 sys_write ...
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Cody P Schafer <dev@codyps.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kirill Smelkov <kirr@nexedi.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1456132275-98875-11-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/bpf-loader.c | 5 ++---
tools/perf/util/evsel.c | 5 +++++
tools/perf/util/evsel.h | 8 ++++++++
tools/perf/util/parse-events.l | 1 +
4 files changed, 16 insertions(+), 3 deletions(-)
diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 44824e3..0967ce6 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -1331,13 +1331,12 @@ apply_config_evsel_for_key(const char *name, int map_fd, void *pkey,
return -BPF_LOADER_ERRNO__OBJCONF_MAP_EVTINH;
}
+ if (perf_evsel__is_bpf_output(evsel))
+ check_pass = true;
if (attr->type == PERF_TYPE_RAW)
check_pass = true;
if (attr->type == PERF_TYPE_HARDWARE)
check_pass = true;
- if (attr->type == PERF_TYPE_SOFTWARE &&
- attr->config == PERF_COUNT_SW_BPF_OUTPUT)
- check_pass = true;
if (!check_pass) {
pr_debug("ERROR: Event type is wrong for map %s\n", name);
return -BPF_LOADER_ERRNO__OBJCONF_MAP_EVTTYPE;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 6ae20d0..0902fe4 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -225,6 +225,11 @@ struct perf_evsel *perf_evsel__new_idx(struct perf_event_attr *attr, int idx)
if (evsel != NULL)
perf_evsel__init(evsel, attr, idx);
+ if (perf_evsel__is_bpf_output(evsel)) {
+ evsel->attr.sample_type |= PERF_SAMPLE_RAW;
+ evsel->attr.sample_period = 1;
+ }
+
return evsel;
}
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 8e75434..efad78f 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -364,6 +364,14 @@ static inline bool perf_evsel__is_function_event(struct perf_evsel *evsel)
#undef FUNCTION_EVENT
}
+static inline bool perf_evsel__is_bpf_output(struct perf_evsel *evsel)
+{
+ struct perf_event_attr *attr = &evsel->attr;
+
+ return (attr->config == PERF_COUNT_SW_BPF_OUTPUT) &&
+ (attr->type == PERF_TYPE_SOFTWARE);
+}
+
struct perf_attr_details {
bool freq;
bool verbose;
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index fb85d03..1477fbc 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -248,6 +248,7 @@ cpu-migrations|migrations { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COU
alignment-faults { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_ALIGNMENT_FAULTS); }
emulation-faults { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_EMULATION_FAULTS); }
dummy { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_DUMMY); }
+bpf-output { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_BPF_OUTPUT); }
/*
* We have to handle the kernel PMU event cycles-ct/cycles-t/mem-loads/mem-stores separately.
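
The checks this patch adds can be summarized in a standalone sketch: an event counts as bpf-output when it is a software event with the BPF output config, such events get raw samples with a period of 1, and the map-event type check accepts them alongside raw and hardware events. Everything below is illustrative rather than perf code. `struct sketch_attr` is a cut-down stand-in for `struct perf_event_attr`, the `SK_*` constants are local copies that are assumed to mirror the values in <linux/perf_event.h>, and the NULL guard is an addition of this sketch (the helper in the patch dereferences its argument unconditionally).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins, assumed to mirror <linux/perf_event.h>. */
enum { SK_TYPE_HARDWARE = 0, SK_TYPE_SOFTWARE = 1, SK_TYPE_RAW = 4 };
enum { SK_COUNT_SW_DUMMY = 9, SK_COUNT_SW_BPF_OUTPUT = 10 };
#define SK_SAMPLE_RAW (1ULL << 10)

/* Hypothetical cut-down version of struct perf_event_attr. */
struct sketch_attr {
	uint32_t type;
	uint64_t config;
	uint64_t sample_type;
	uint64_t sample_period;
};

/* Same predicate as perf_evsel__is_bpf_output(), plus a NULL guard. */
bool sketch_is_bpf_output(const struct sketch_attr *attr)
{
	return attr != NULL &&
	       attr->type == SK_TYPE_SOFTWARE &&
	       attr->config == SK_COUNT_SW_BPF_OUTPUT;
}

/* Defaults applied when a bpf-output event is created. */
void sketch_apply_bpf_output_defaults(struct sketch_attr *attr)
{
	if (!sketch_is_bpf_output(attr))
		return;
	attr->sample_type |= SK_SAMPLE_RAW;	/* payload arrives as raw data */
	attr->sample_period = 1;		/* sample every output call */
}

/* Type check used before an event is stored into a BPF map slot. */
bool sketch_event_ok_for_map(const struct sketch_attr *attr)
{
	if (attr == NULL)
		return false;
	return sketch_is_bpf_output(attr) ||
	       attr->type == SK_TYPE_RAW ||
	       attr->type == SK_TYPE_HARDWARE;
}
```

For example, an attr with type SK_TYPE_SOFTWARE and config SK_COUNT_SW_BPF_OUTPUT passes both predicates and ends up with the raw sample bit set, while a software dummy event is rejected by the map check.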
Thread overview: 76+ messages
2016-02-22 9:10 [PATCH 00/48] perf tools: Bugfix, BPF improvements and overwrite ring buffer support Wang Nan
2016-02-22 9:10 ` [PATCH 01/48] perf tools: Record text offset in dso to calculate objdump address Wang Nan
2016-02-22 9:10 ` [PATCH 02/48] perf tools: Adjust symbol for shared objects Wang Nan
2016-02-22 9:10 ` [PATCH 03/48] perf bpf: Add API to set values to map entries in a bpf object Wang Nan
2016-02-25 5:39 ` [tip:perf/core] " tip-bot for Wang Nan
2016-02-22 9:10 ` [PATCH 04/48] perf tools: Enable BPF object configure syntax Wang Nan
2016-02-25 5:39 ` [tip:perf/core] " tip-bot for Wang Nan
2016-02-22 9:10 ` [PATCH 05/48] perf record: Apply config to BPF objects before recording Wang Nan
2016-02-25 5:39 ` [tip:perf/core] " tip-bot for Wang Nan
2016-02-22 9:10 ` [PATCH 06/48] perf tools: Enable passing event to BPF object Wang Nan
2016-02-25 5:40 ` [tip:perf/core] " tip-bot for Wang Nan
2016-02-22 9:10 ` [PATCH 07/48] perf tools: Support setting different slots in a BPF map separately Wang Nan
2016-02-25 5:40 ` [tip:perf/core] " tip-bot for Wang Nan
2016-02-22 9:10 ` [PATCH 08/48] perf tools: Enable indices setting syntax for BPF map Wang Nan
2016-02-25 5:40 ` [tip:perf/core] " tip-bot for Wang Nan
2016-02-22 9:10 ` [PATCH 09/48] perf tools: Pass tracepoint options to BPF script Wang Nan
2016-02-25 5:41 ` [tip:perf/core] perf tools: Apply tracepoint event definition " tip-bot for Wang Nan
2016-02-22 9:10 ` [PATCH 10/48] perf tools: Introduce bpf-output event Wang Nan
2016-02-23 17:45 ` Arnaldo Carvalho de Melo
2016-02-24 1:58 ` Wangnan (F)
2016-02-24 2:04 ` Wangnan (F)
2016-02-24 4:03 ` Wangnan (F)
2016-02-24 5:03 ` Wangnan (F)
2016-02-24 13:36 ` Arnaldo Carvalho de Melo
2016-02-25 5:41 ` [tip:perf/core] " tip-bot for Wang Nan
2016-02-22 9:10 ` [PATCH 11/48] perf data: Support converting data from bpf_perf_event_output() Wang Nan
2016-02-23 16:14 ` Arnaldo Carvalho de Melo
2016-02-23 17:23 ` Jiri Olsa
2016-02-23 17:24 ` Jiri Olsa
2016-02-23 19:22 ` Jiri Olsa
2016-02-22 9:10 ` [PATCH 12/48] perf data: Explicitly set byte order for integer types Wang Nan
2016-02-22 9:10 ` [PATCH 13/48] perf core: Introduce new ioctl options to pause and resume ring buffer Wang Nan
2016-02-22 9:10 ` [PATCH 14/48] perf core: Set event's default overflow_handler Wang Nan
2016-02-22 9:10 ` [PATCH 15/48] perf core: Prepare writing into ring buffer from end Wang Nan
2016-02-22 9:10 ` [PATCH 16/48] perf core: Add backward attribute to perf event Wang Nan
2016-02-24 13:08 ` Jiri Olsa
2016-02-24 13:21 ` Jiri Olsa
2016-02-22 9:10 ` [PATCH 17/48] perf core: Reduce perf event output overhead by new overflow handler Wang Nan
2016-02-22 9:10 ` [PATCH 18/48] perf tools: Only validate is_pos for tracking evsels Wang Nan
2016-02-24 14:21 ` Jiri Olsa
2016-02-22 9:10 ` [PATCH 19/48] perf tools: Print write_backward value in perf_event_attr__fprintf Wang Nan
2016-02-22 9:10 ` [PATCH 20/48] perf tools: Make ordered_events reusable Wang Nan
2016-02-24 14:18 ` Jiri Olsa
2016-02-22 9:10 ` [PATCH 21/48] perf record: Extract synthesize code to record__synthesize() Wang Nan
2016-02-24 14:29 ` Jiri Olsa
2016-02-22 9:10 ` [PATCH 22/48] perf tools: Add perf_data_file__switch() helper Wang Nan
2016-02-24 14:34 ` Jiri Olsa
2016-02-22 9:10 ` [PATCH 23/48] perf record: Turns auxtrace_snapshot_enable into 3 states Wang Nan
2016-02-24 14:43 ` Jiri Olsa
2016-02-22 9:10 ` [PATCH 24/48] perf record: Introduce record__finish_output() to finish a perf.data Wang Nan
2016-02-22 9:10 ` [PATCH 25/48] perf record: Add '--timestamp-filename' option to append timestamp to output filename Wang Nan
2016-02-22 9:10 ` [PATCH 26/48] perf record: Split output into multiple files via '--switch-output' Wang Nan
2016-02-22 9:10 ` [PATCH 27/48] perf record: Force enable --timestamp-filename when --switch-output is provided Wang Nan
2016-02-22 9:10 ` [PATCH 28/48] perf record: Disable buildid cache options by default in switch output mode Wang Nan
2016-02-22 9:10 ` [PATCH 29/48] perf record: Re-synthesize tracking events after output switching Wang Nan
2016-02-24 14:57 ` Jiri Olsa
2016-02-22 9:10 ` [PATCH 30/48] perf record: Generate tracking events for process forked by perf Wang Nan
2016-02-24 15:01 ` Jiri Olsa
2016-02-22 9:10 ` [PATCH 31/48] perf record: Ensure return non-zero rc when mmap fail Wang Nan
2016-02-22 9:10 ` [PATCH 32/48] perf record: Prevent reading invalid data in record__mmap_read Wang Nan
2016-02-22 9:11 ` [PATCH 33/48] perf tools: Add evlist channel helpers Wang Nan
2016-02-22 9:11 ` [PATCH 34/48] perf tools: Automatically add new channel according to evlist Wang Nan
2016-02-22 9:11 ` [PATCH 35/48] perf tools: Operate multiple channels Wang Nan
2016-02-22 9:11 ` [PATCH 36/48] perf tools: Squash overwrite setting into channel Wang Nan
2016-02-22 9:11 ` [PATCH 37/48] perf record: Don't read from and poll overwrite channel Wang Nan
2016-02-22 9:11 ` [PATCH 38/48] perf record: Don't poll on " Wang Nan
2016-02-22 9:11 ` [PATCH 39/48] perf tools: Detect avalibility of write_backward Wang Nan
2016-02-22 9:11 ` [PATCH 40/48] perf tools: Enable overwrite settings Wang Nan
2016-02-22 9:11 ` [PATCH 41/48] perf tools: Set write_backward attribut bit for overwrite events Wang Nan
2016-02-22 9:11 ` [PATCH 42/48] perf tools: Record fd into perf_mmap Wang Nan
2016-02-22 9:11 ` [PATCH 43/48] perf tools: Add API to pause a channel Wang Nan
2016-02-22 9:11 ` [PATCH 44/48] perf record: Toggle overwrite ring buffer for reading Wang Nan
2016-02-22 9:11 ` [PATCH 45/48] perf record: Rename variable to make code clear Wang Nan
2016-02-22 9:11 ` [PATCH 46/48] perf record: Read from backward ring buffer Wang Nan
2016-02-22 9:11 ` [PATCH 47/48] perf record: Allow generate tracking events at the end of output Wang Nan
2016-02-22 9:11 ` [PATCH 48/48] perf tools: Don't warn about out of order event if write_backward is used Wang Nan