* [PATCH 1/8] perf tests attr: Fix task term values
2017-10-03 12:55 [GIT PULL 0/8] perf/core improvements and fixes Arnaldo Carvalho de Melo
@ 2017-10-03 12:55 ` Arnaldo Carvalho de Melo
2017-10-03 12:55 ` [PATCH 2/8] perf test attr: Fix python error on empty result Arnaldo Carvalho de Melo
` (7 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-03 12:55 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, linux-perf-users, Jiri Olsa, David Ahern,
Namhyung Kim, Peter Zijlstra, Thomas-Mich Richter,
Arnaldo Carvalho de Melo
From: Jiri Olsa <jolsa@kernel.org>
The perf_event_attr::task is 1 by default for first (tracking) event in
the session. Setting task=1 as default and adding task=0 for cases that
need it.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20170703145030.12903-16-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/tests/attr/base-record | 2 +-
tools/perf/tests/attr/test-record-group | 1 +
tools/perf/tests/attr/test-record-group-sampling | 2 +-
tools/perf/tests/attr/test-record-group1 | 1 +
4 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/tools/perf/tests/attr/base-record b/tools/perf/tests/attr/base-record
index 31e0b1da830b..37940665f736 100644
--- a/tools/perf/tests/attr/base-record
+++ b/tools/perf/tests/attr/base-record
@@ -23,7 +23,7 @@ comm=1
freq=1
inherit_stat=0
enable_on_exec=1
-task=0
+task=1
watermark=0
precise_ip=0|1|2|3
mmap_data=0
diff --git a/tools/perf/tests/attr/test-record-group b/tools/perf/tests/attr/test-record-group
index 6e7961f6f7a5..618ba1c17474 100644
--- a/tools/perf/tests/attr/test-record-group
+++ b/tools/perf/tests/attr/test-record-group
@@ -17,5 +17,6 @@ sample_type=327
read_format=4
mmap=0
comm=0
+task=0
enable_on_exec=0
disabled=0
diff --git a/tools/perf/tests/attr/test-record-group-sampling b/tools/perf/tests/attr/test-record-group-sampling
index ef59afd6d635..f906b793196f 100644
--- a/tools/perf/tests/attr/test-record-group-sampling
+++ b/tools/perf/tests/attr/test-record-group-sampling
@@ -23,7 +23,7 @@ sample_type=343
# PERF_FORMAT_ID | PERF_FORMAT_GROUP
read_format=12
-
+task=0
mmap=0
comm=0
enable_on_exec=0
diff --git a/tools/perf/tests/attr/test-record-group1 b/tools/perf/tests/attr/test-record-group1
index 87a222d014d8..48e8bd12fe46 100644
--- a/tools/perf/tests/attr/test-record-group1
+++ b/tools/perf/tests/attr/test-record-group1
@@ -18,5 +18,6 @@ sample_type=327
read_format=4
mmap=0
comm=0
+task=0
enable_on_exec=0
disabled=0
--
2.13.6
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 2/8] perf test attr: Fix python error on empty result
2017-10-03 12:55 [GIT PULL 0/8] perf/core improvements and fixes Arnaldo Carvalho de Melo
2017-10-03 12:55 ` [PATCH 1/8] perf tests attr: Fix task term values Arnaldo Carvalho de Melo
@ 2017-10-03 12:55 ` Arnaldo Carvalho de Melo
2017-10-03 12:55 ` [PATCH 3/8] perf test attr: Fix ignored test case result Arnaldo Carvalho de Melo
` (6 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-03 12:55 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, linux-perf-users, Thomas Richter, Heiko Carstens,
Hendrik Brueckner, Martin Schwidefsky, Arnaldo Carvalho de Melo
From: Thomas Richter <tmricht@linux.vnet.ibm.com>
Commit d78ada4a767 ("perf tests attr: Do not store failed events") does
not create an event file in the /tmp directory when the
perf_open_event() system call failed.
This can lead to a situation where not /tmp/event-xx-yy-zz result file
exists at all (for example on a s390x virtual machine environment) where
no CPUMF hardware is available.
The following command then fails with a python call back chain instead
of printing failure:
[root@s8360046 perf]# /usr/bin/python2 ./tests/attr.py -d ./tests/attr/ \
-p ./perf -v -ttest-stat-basic
running './tests/attr//test-stat-basic'
Traceback (most recent call last):
File "./tests/attr.py", line 379, in <module>
main()
File "./tests/attr.py", line 370, in main
run_tests(options)
File "./tests/attr.py", line 311, in run_tests
Test(f, options).run()
File "./tests/attr.py", line 300, in run
self.compare(self.expect, self.result)
File "./tests/attr.py", line 248, in compare
exp_event.diff(res_event)
UnboundLocalError: local variable 'res_event' referenced before assignment
[root@s8360046 perf]#
This patch catches this pitfall and prints an error message instead:
[root@s8360047 perf]# /usr/bin/python2 ./tests/attr.py -d ./tests/attr/ \
-p ./perf -vvv -ttest-stat-basic
running './tests/attr//test-stat-basic'
loading expected events
Event event:base-stat
fd = 1
group_fd = -1
flags = 0|8
[....]
sample_regs_user = 0
sample_stack_user = 0
'PERF_TEST_ATTR=/tmp/tmpJbMQMP ./perf stat -o /tmp/tmpJbMQMP/perf.data -e cycles kill >/dev/null 2>&1' ret '1', expected '1'
loading result events
compare
matching [event:base-stat]
match: [event:base-stat] matches []
res_event is empty
FAILED './tests/attr//test-stat-basic' - match failure
[root@s8360047 perf]#
Signed-off-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
LPU-Reference: 20170913081209.39570-1-tmricht@linux.vnet.ibm.com
Link: http://lkml.kernel.org/n/tip-04d63nn7svfgxdhi60gq2mlm@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/tests/attr.py | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/tools/perf/tests/attr.py b/tools/perf/tests/attr.py
index 6bb50e82a3e3..a13cd780148e 100644
--- a/tools/perf/tests/attr.py
+++ b/tools/perf/tests/attr.py
@@ -237,6 +237,7 @@ class Test(object):
# events in result. Fail if there's not any.
for exp_name, exp_event in expect.items():
exp_list = []
+ res_event = {}
log.debug(" matching [%s]" % exp_name)
for res_name, res_event in result.items():
log.debug(" to [%s]" % res_name)
@@ -253,7 +254,10 @@ class Test(object):
if exp_event.optional():
log.debug(" %s does not match, but is optional" % exp_name)
else:
- exp_event.diff(res_event)
+ if not res_event:
+ log.debug(" res_event is empty");
+ else:
+ exp_event.diff(res_event)
raise Fail(self, 'match failure');
match[exp_name] = exp_list
--
2.13.6
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 3/8] perf test attr: Fix ignored test case result
2017-10-03 12:55 [GIT PULL 0/8] perf/core improvements and fixes Arnaldo Carvalho de Melo
2017-10-03 12:55 ` [PATCH 1/8] perf tests attr: Fix task term values Arnaldo Carvalho de Melo
2017-10-03 12:55 ` [PATCH 2/8] perf test attr: Fix python error on empty result Arnaldo Carvalho de Melo
@ 2017-10-03 12:55 ` Arnaldo Carvalho de Melo
2017-10-03 12:55 ` [PATCH 4/8] perf tools: Lock to protect namespaces and comm list Arnaldo Carvalho de Melo
` (5 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-03 12:55 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, linux-perf-users, Thomas Richter, Heiko Carstens,
Hendrik Brueckner, Martin Schwidefsky, Arnaldo Carvalho de Melo
From: Thomas Richter <tmricht@linux.vnet.ibm.com>
Command perf test -v 16 (Setup struct perf_event_attr test) always
reports success even if the test case fails. It works correctly if you
also specify -F (for don't fork).
root@s35lp76 perf]# ./perf test -v 16
15: Setup struct perf_event_attr :
--- start ---
running './tests/attr/test-record-no-delay'
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB /tmp/tmp4E1h7R/perf.data
(1 samples) ]
expected task=0, got 1
expected precise_ip=0, got 3
expected wakeup_events=1, got 0
FAILED './tests/attr/test-record-no-delay' - match failure
test child finished with 0
---- end ----
Setup struct perf_event_attr: Ok
The reason for the wrong error reporting is the return value of the
system() library call. It is called in run_dir() file tests/attr.c and
returns the exit status, in above case 0xff00.
This value is given as parameter to the exit() function which can only
handle values 0-0xff.
The child process terminates with exit value of 0 and the parent does
not detect any error.
This patch corrects the error reporting and prints the correct test
result.
Signed-off-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
LPU-Reference: 20170913081209.39570-2-tmricht@linux.vnet.ibm.com
Link: http://lkml.kernel.org/n/tip-rdube6rfcjsr1nzue72c7lqn@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/tests/attr.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/tests/attr.c b/tools/perf/tests/attr.c
index c9aafed7da15..25ede4472465 100644
--- a/tools/perf/tests/attr.c
+++ b/tools/perf/tests/attr.c
@@ -166,7 +166,7 @@ static int run_dir(const char *d, const char *perf)
snprintf(cmd, 3*PATH_MAX, PYTHON " %s/attr.py -d %s/attr/ -p %s %.*s",
d, d, perf, vcnt, v);
- return system(cmd);
+ return system(cmd) ? TEST_FAIL : TEST_OK;
}
int test__attr(struct test *test __maybe_unused, int subtest __maybe_unused)
--
2.13.6
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 4/8] perf tools: Lock to protect namespaces and comm list
2017-10-03 12:55 [GIT PULL 0/8] perf/core improvements and fixes Arnaldo Carvalho de Melo
` (2 preceding siblings ...)
2017-10-03 12:55 ` [PATCH 3/8] perf test attr: Fix ignored test case result Arnaldo Carvalho de Melo
@ 2017-10-03 12:55 ` Arnaldo Carvalho de Melo
2017-10-03 12:55 ` [PATCH 5/8] perf tools: Lock to protect comm_str rb tree Arnaldo Carvalho de Melo
` (4 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-03 12:55 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, linux-perf-users, Kan Liang, Adrian Hunter,
Alexei Starovoitov, Andi Kleen, He Kuang, Lukasz Odzioba,
Namhyung Kim, Peter Zijlstra, Wang Nan, Arnaldo Carvalho de Melo
From: Kan Liang <kan.liang@intel.com>
Add two locks to protect namespaces_list and comm_list.
The lock is only needed for multithreaded code, so using mutex wrappers
provided by perf tool.
Not all the comm_list/namespaces_list accessing are protected, e.g.
thread__exec_comm. Because the multithread code for perf top event
synthesizing does not touch them. They don't need a lock.
Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1506696477-146932-2-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/thread.c | 53 +++++++++++++++++++++++++++++++++++++++++++-----
tools/perf/util/thread.h | 3 +++
2 files changed, 51 insertions(+), 5 deletions(-)
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index c09bdb509d82..bf73117b4822 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -45,6 +45,8 @@ struct thread *thread__new(pid_t pid, pid_t tid)
thread->cpu = -1;
INIT_LIST_HEAD(&thread->namespaces_list);
INIT_LIST_HEAD(&thread->comm_list);
+ init_rwsem(&thread->namespaces_lock);
+ init_rwsem(&thread->comm_lock);
comm_str = malloc(32);
if (!comm_str)
@@ -83,18 +85,26 @@ void thread__delete(struct thread *thread)
map_groups__put(thread->mg);
thread->mg = NULL;
}
+ down_write(&thread->namespaces_lock);
list_for_each_entry_safe(namespaces, tmp_namespaces,
&thread->namespaces_list, list) {
list_del(&namespaces->list);
namespaces__free(namespaces);
}
+ up_write(&thread->namespaces_lock);
+
+ down_write(&thread->comm_lock);
list_for_each_entry_safe(comm, tmp_comm, &thread->comm_list, list) {
list_del(&comm->list);
comm__free(comm);
}
+ up_write(&thread->comm_lock);
+
unwind__finish_access(thread);
nsinfo__zput(thread->nsinfo);
+ exit_rwsem(&thread->namespaces_lock);
+ exit_rwsem(&thread->comm_lock);
free(thread);
}
@@ -125,8 +135,8 @@ struct namespaces *thread__namespaces(const struct thread *thread)
return list_first_entry(&thread->namespaces_list, struct namespaces, list);
}
-int thread__set_namespaces(struct thread *thread, u64 timestamp,
- struct namespaces_event *event)
+static int __thread__set_namespaces(struct thread *thread, u64 timestamp,
+ struct namespaces_event *event)
{
struct namespaces *new, *curr = thread__namespaces(thread);
@@ -149,6 +159,17 @@ int thread__set_namespaces(struct thread *thread, u64 timestamp,
return 0;
}
+int thread__set_namespaces(struct thread *thread, u64 timestamp,
+ struct namespaces_event *event)
+{
+ int ret;
+
+ down_write(&thread->namespaces_lock);
+ ret = __thread__set_namespaces(thread, timestamp, event);
+ up_write(&thread->namespaces_lock);
+ return ret;
+}
+
struct comm *thread__comm(const struct thread *thread)
{
if (list_empty(&thread->comm_list))
@@ -170,8 +191,8 @@ struct comm *thread__exec_comm(const struct thread *thread)
return last;
}
-int __thread__set_comm(struct thread *thread, const char *str, u64 timestamp,
- bool exec)
+static int ____thread__set_comm(struct thread *thread, const char *str,
+ u64 timestamp, bool exec)
{
struct comm *new, *curr = thread__comm(thread);
@@ -195,6 +216,17 @@ int __thread__set_comm(struct thread *thread, const char *str, u64 timestamp,
return 0;
}
+int __thread__set_comm(struct thread *thread, const char *str, u64 timestamp,
+ bool exec)
+{
+ int ret;
+
+ down_write(&thread->comm_lock);
+ ret = ____thread__set_comm(thread, str, timestamp, exec);
+ up_write(&thread->comm_lock);
+ return ret;
+}
+
int thread__set_comm_from_proc(struct thread *thread)
{
char path[64];
@@ -212,7 +244,7 @@ int thread__set_comm_from_proc(struct thread *thread)
return err;
}
-const char *thread__comm_str(const struct thread *thread)
+static const char *__thread__comm_str(const struct thread *thread)
{
const struct comm *comm = thread__comm(thread);
@@ -222,6 +254,17 @@ const char *thread__comm_str(const struct thread *thread)
return comm__str(comm);
}
+const char *thread__comm_str(const struct thread *thread)
+{
+ const char *str;
+
+ down_read((struct rw_semaphore *)&thread->comm_lock);
+ str = __thread__comm_str(thread);
+ up_read((struct rw_semaphore *)&thread->comm_lock);
+
+ return str;
+}
+
/* CHECKME: it should probably better return the max comm len from its comm list */
int thread__comm_len(struct thread *thread)
{
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index cb1a5dd5c2b9..10555d6a0b86 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -9,6 +9,7 @@
#include "symbol.h"
#include <strlist.h>
#include <intlist.h>
+#include "rwsem.h"
struct thread_stack;
struct unwind_libunwind_ops;
@@ -29,7 +30,9 @@ struct thread {
int comm_len;
bool dead; /* if set thread has exited */
struct list_head namespaces_list;
+ struct rw_semaphore namespaces_lock;
struct list_head comm_list;
+ struct rw_semaphore comm_lock;
u64 db_id;
void *priv;
--
2.13.6
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 5/8] perf tools: Lock to protect comm_str rb tree
2017-10-03 12:55 [GIT PULL 0/8] perf/core improvements and fixes Arnaldo Carvalho de Melo
` (3 preceding siblings ...)
2017-10-03 12:55 ` [PATCH 4/8] perf tools: Lock to protect namespaces and comm list Arnaldo Carvalho de Melo
@ 2017-10-03 12:55 ` Arnaldo Carvalho de Melo
2017-10-03 12:55 ` [PATCH 6/8] perf top: Implement multithreading for perf_event__synthesize_threads Arnaldo Carvalho de Melo
` (3 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-03 12:55 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, linux-perf-users, Kan Liang, Adrian Hunter,
Alexei Starovoitov, Andi Kleen, He Kuang, Lukasz Odzioba,
Namhyung Kim, Peter Zijlstra, Wang Nan, Arnaldo Carvalho de Melo
From: Kan Liang <kan.liang@intel.com>
Add comm_str_lock to protect comm_str rb tree.
The lock is only needed for multithreaded code, so using mutex wrappers
provided by perf tool.
Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1506696477-146932-3-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/comm.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/tools/perf/util/comm.c b/tools/perf/util/comm.c
index 7bc981b6bf29..756a9c14efbb 100644
--- a/tools/perf/util/comm.c
+++ b/tools/perf/util/comm.c
@@ -5,6 +5,7 @@
#include <stdio.h>
#include <string.h>
#include <linux/refcount.h>
+#include "rwsem.h"
struct comm_str {
char *str;
@@ -14,6 +15,7 @@ struct comm_str {
/* Should perhaps be moved to struct machine */
static struct rb_root comm_str_root;
+static struct rw_semaphore comm_str_lock = {.lock = PTHREAD_RWLOCK_INITIALIZER,};
static struct comm_str *comm_str__get(struct comm_str *cs)
{
@@ -25,7 +27,9 @@ static struct comm_str *comm_str__get(struct comm_str *cs)
static void comm_str__put(struct comm_str *cs)
{
if (cs && refcount_dec_and_test(&cs->refcnt)) {
+ down_write(&comm_str_lock);
rb_erase(&cs->rb_node, &comm_str_root);
+ up_write(&comm_str_lock);
zfree(&cs->str);
free(cs);
}
@@ -50,7 +54,8 @@ static struct comm_str *comm_str__alloc(const char *str)
return cs;
}
-static struct comm_str *comm_str__findnew(const char *str, struct rb_root *root)
+static
+struct comm_str *__comm_str__findnew(const char *str, struct rb_root *root)
{
struct rb_node **p = &root->rb_node;
struct rb_node *parent = NULL;
@@ -81,6 +86,17 @@ static struct comm_str *comm_str__findnew(const char *str, struct rb_root *root)
return new;
}
+static struct comm_str *comm_str__findnew(const char *str, struct rb_root *root)
+{
+ struct comm_str *cs;
+
+ down_write(&comm_str_lock);
+ cs = __comm_str__findnew(str, root);
+ up_write(&comm_str_lock);
+
+ return cs;
+}
+
struct comm *comm__new(const char *str, u64 timestamp, bool exec)
{
struct comm *comm = zalloc(sizeof(*comm));
--
2.13.6
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 6/8] perf top: Implement multithreading for perf_event__synthesize_threads
2017-10-03 12:55 [GIT PULL 0/8] perf/core improvements and fixes Arnaldo Carvalho de Melo
` (4 preceding siblings ...)
2017-10-03 12:55 ` [PATCH 5/8] perf tools: Lock to protect comm_str rb tree Arnaldo Carvalho de Melo
@ 2017-10-03 12:55 ` Arnaldo Carvalho de Melo
2017-10-03 17:37 ` Ingo Molnar
2017-10-03 12:55 ` [PATCH 7/8] perf top: Add option to set the number of thread for event synthesize Arnaldo Carvalho de Melo
` (2 subsequent siblings)
8 siblings, 1 reply; 11+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-03 12:55 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, linux-perf-users, Kan Liang, Adrian Hunter,
Alexei Starovoitov, Andi Kleen, He Kuang, Lukasz Odzioba,
Namhyung Kim, Peter Zijlstra, Wang Nan, Arnaldo Carvalho de Melo
From: Kan Liang <kan.liang@intel.com>
The proc files which is sorted with alphabetical order are evenly
assigned to several synthesize threads to be processed in parallel.
For 'perf top', the threads number hard code to online CPU number. The
following patch will introduce an option to set it.
For other perf tools, the thread number is 1. Because the process
function is not ready for multithreading, e.g.
process_synthesized_event.
This patch series only support event synthesize multithreading for 'perf
top'. For other tools, it can be done separately later.
With multithread applied, the total processing time can get up to 1.56x
speedup on Knights Mill for 'perf top'.
For specific single event processing, the processing time could increase
because of the lock contention. So proc_map_timeout may need to be
increased. Otherwise some proc maps will be truncated.
Based on my test, increasing the proc_map_timeout has small impact
on the total processing time. The total processing time still get 1.49x
speedup on Knights Mill after increasing the proc_map_timeout.
The patch itself doesn't increase the proc_map_timeout.
Doesn't need to implement multithreading for per task monitoring,
perf_event__synthesize_thread_map. It doesn't have performance issue.
Committer testing:
# getconf _NPROCESSORS_ONLN
4
# perf trace --no-inherit -e clone -o /tmp/output perf top
# tail -4 /tmp/bla
0.124 ( 0.041 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3eb3a8f30, parent_tidptr: 0x7fc3eb3a99d0, child_tidptr: 0x7fc3eb3a99d0, tls: 0x7fc3eb3a9700) = 9548 (perf)
0.246 ( 0.023 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3eaba7f30, parent_tidptr: 0x7fc3eaba89d0, child_tidptr: 0x7fc3eaba89d0, tls: 0x7fc3eaba8700) = 9549 (perf)
0.286 ( 0.019 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3ea3a6f30, parent_tidptr: 0x7fc3ea3a79d0, child_tidptr: 0x7fc3ea3a79d0, tls: 0x7fc3ea3a7700) = 9550 (perf)
246.540 ( 0.047 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3ea3a6f30, parent_tidptr: 0x7fc3ea3a79d0, child_tidptr: 0x7fc3ea3a79d0, tls: 0x7fc3ea3a7700) = 9551 (perf)
#
Signed-off-by: Kan Liang <kan.liang@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1506696477-146932-4-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/builtin-kvm.c | 3 +-
tools/perf/builtin-record.c | 2 +-
tools/perf/builtin-top.c | 8 +-
tools/perf/builtin-trace.c | 2 +-
tools/perf/tests/mmap-thread-lookup.c | 2 +-
tools/perf/util/event.c | 160 +++++++++++++++++++++++++++-------
tools/perf/util/event.h | 3 +-
tools/perf/util/machine.c | 8 +-
tools/perf/util/machine.h | 9 +-
9 files changed, 155 insertions(+), 42 deletions(-)
diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index c747a1af49fe..721f4f91291a 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -1441,7 +1441,8 @@ static int kvm_events_live(struct perf_kvm_stat *kvm,
perf_session__set_id_hdr_size(kvm->session);
ordered_events__set_copy_on_queue(&kvm->session->ordered_events, true);
machine__synthesize_threads(&kvm->session->machines.host, &kvm->opts.target,
- kvm->evlist->threads, false, kvm->opts.proc_map_timeout);
+ kvm->evlist->threads, false,
+ kvm->opts.proc_map_timeout, 1);
err = kvm_live_open_events(kvm);
if (err)
goto out;
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 9b379f3a3d99..234fdf4734f6 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -863,7 +863,7 @@ static int record__synthesize(struct record *rec, bool tail)
err = __machine__synthesize_threads(machine, tool, &opts->target, rec->evlist->threads,
process_synthesized_event, opts->sample_address,
- opts->proc_map_timeout);
+ opts->proc_map_timeout, 1);
out:
return err;
}
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index ee954bde7e3e..bc31b93cc1d8 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -958,8 +958,14 @@ static int __cmd_top(struct perf_top *top)
if (perf_session__register_idle_thread(top->session) < 0)
goto out_delete;
+ perf_set_multithreaded();
+
machine__synthesize_threads(&top->session->machines.host, &opts->target,
- top->evlist->threads, false, opts->proc_map_timeout);
+ top->evlist->threads, false,
+ opts->proc_map_timeout,
+ (unsigned int)sysconf(_SC_NPROCESSORS_ONLN));
+
+ perf_set_singlethreaded();
if (perf_hpp_list.socket) {
ret = perf_env__read_cpu_topology_map(&perf_env);
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 967bd351b58d..afef6fe46c45 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -1131,7 +1131,7 @@ static int trace__symbols_init(struct trace *trace, struct perf_evlist *evlist)
err = __machine__synthesize_threads(trace->host, &trace->tool, &trace->opts.target,
evlist->threads, trace__tool_process, false,
- trace->opts.proc_map_timeout);
+ trace->opts.proc_map_timeout, 1);
if (err)
symbol__exit();
diff --git a/tools/perf/tests/mmap-thread-lookup.c b/tools/perf/tests/mmap-thread-lookup.c
index f94a4196e7c9..2a0068afe3bf 100644
--- a/tools/perf/tests/mmap-thread-lookup.c
+++ b/tools/perf/tests/mmap-thread-lookup.c
@@ -131,7 +131,7 @@ static int synth_all(struct machine *machine)
{
return perf_event__synthesize_threads(NULL,
perf_event__process,
- machine, 0, 500);
+ machine, 0, 500, 1);
}
static int synth_process(struct machine *machine)
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 10366b87d0b5..0e678dd6bdbe 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -678,23 +678,21 @@ int perf_event__synthesize_thread_map(struct perf_tool *tool,
return err;
}
-int perf_event__synthesize_threads(struct perf_tool *tool,
- perf_event__handler_t process,
- struct machine *machine,
- bool mmap_data,
- unsigned int proc_map_timeout)
+static int __perf_event__synthesize_threads(struct perf_tool *tool,
+ perf_event__handler_t process,
+ struct machine *machine,
+ bool mmap_data,
+ unsigned int proc_map_timeout,
+ struct dirent **dirent,
+ int start,
+ int num)
{
union perf_event *comm_event, *mmap_event, *fork_event;
union perf_event *namespaces_event;
- char proc_path[PATH_MAX];
- struct dirent **dirent;
int err = -1;
char *end;
pid_t pid;
- int n, i;
-
- if (machine__is_default_guest(machine))
- return 0;
+ int i;
comm_event = malloc(sizeof(comm_event->comm) + machine->id_hdr_size);
if (comm_event == NULL)
@@ -714,34 +712,25 @@ int perf_event__synthesize_threads(struct perf_tool *tool,
if (namespaces_event == NULL)
goto out_free_fork;
- snprintf(proc_path, sizeof(proc_path), "%s/proc", machine->root_dir);
- n = scandir(proc_path, &dirent, 0, alphasort);
-
- if (n < 0)
- goto out_free_namespaces;
-
- for (i = 0; i < n; i++) {
+ for (i = start; i < start + num; i++) {
if (!isdigit(dirent[i]->d_name[0]))
continue;
pid = (pid_t)strtol(dirent[i]->d_name, &end, 10);
/* only interested in proper numerical dirents */
- if (!*end) {
- /*
- * We may race with exiting thread, so don't stop just because
- * one thread couldn't be synthesized.
- */
- __event__synthesize_thread(comm_event, mmap_event, fork_event,
- namespaces_event, pid, 1, process,
- tool, machine, mmap_data,
- proc_map_timeout);
- }
- free(dirent[i]);
+ if (*end)
+ continue;
+ /*
+ * We may race with exiting thread, so don't stop just because
+ * one thread couldn't be synthesized.
+ */
+ __event__synthesize_thread(comm_event, mmap_event, fork_event,
+ namespaces_event, pid, 1, process,
+ tool, machine, mmap_data,
+ proc_map_timeout);
}
- free(dirent);
err = 0;
-out_free_namespaces:
free(namespaces_event);
out_free_fork:
free(fork_event);
@@ -753,6 +742,115 @@ int perf_event__synthesize_threads(struct perf_tool *tool,
return err;
}
+struct synthesize_threads_arg {
+ struct perf_tool *tool;
+ perf_event__handler_t process;
+ struct machine *machine;
+ bool mmap_data;
+ unsigned int proc_map_timeout;
+ struct dirent **dirent;
+ int num;
+ int start;
+};
+
+static void *synthesize_threads_worker(void *arg)
+{
+ struct synthesize_threads_arg *args = arg;
+
+ __perf_event__synthesize_threads(args->tool, args->process,
+ args->machine, args->mmap_data,
+ args->proc_map_timeout, args->dirent,
+ args->start, args->num);
+ return NULL;
+}
+
+int perf_event__synthesize_threads(struct perf_tool *tool,
+ perf_event__handler_t process,
+ struct machine *machine,
+ bool mmap_data,
+ unsigned int proc_map_timeout,
+ unsigned int nr_threads_synthesize)
+{
+ struct synthesize_threads_arg *args = NULL;
+ pthread_t *synthesize_threads = NULL;
+ char proc_path[PATH_MAX];
+ struct dirent **dirent;
+ int num_per_thread;
+ int m, n, i, j;
+ int thread_nr;
+ int base = 0;
+ int err = -1;
+
+
+ if (machine__is_default_guest(machine))
+ return 0;
+
+ snprintf(proc_path, sizeof(proc_path), "%s/proc", machine->root_dir);
+ n = scandir(proc_path, &dirent, 0, alphasort);
+ if (n < 0)
+ return err;
+
+ thread_nr = nr_threads_synthesize;
+
+ if (thread_nr <= 1) {
+ err = __perf_event__synthesize_threads(tool, process,
+ machine, mmap_data,
+ proc_map_timeout,
+ dirent, base, n);
+ goto free_dirent;
+ }
+ if (thread_nr > n)
+ thread_nr = n;
+
+ synthesize_threads = calloc(sizeof(pthread_t), thread_nr);
+ if (synthesize_threads == NULL)
+ goto free_dirent;
+
+ args = calloc(sizeof(*args), thread_nr);
+ if (args == NULL)
+ goto free_threads;
+
+ num_per_thread = n / thread_nr;
+ m = n % thread_nr;
+ for (i = 0; i < thread_nr; i++) {
+ args[i].tool = tool;
+ args[i].process = process;
+ args[i].machine = machine;
+ args[i].mmap_data = mmap_data;
+ args[i].proc_map_timeout = proc_map_timeout;
+ args[i].dirent = dirent;
+ }
+ for (i = 0; i < m; i++) {
+ args[i].num = num_per_thread + 1;
+ args[i].start = i * args[i].num;
+ }
+ if (i != 0)
+ base = args[i-1].start + args[i-1].num;
+ for (j = i; j < thread_nr; j++) {
+ args[j].num = num_per_thread;
+ args[j].start = base + (j - i) * args[i].num;
+ }
+
+ for (i = 0; i < thread_nr; i++) {
+ if (pthread_create(&synthesize_threads[i], NULL,
+ synthesize_threads_worker, &args[i]))
+ goto out_join;
+ }
+ err = 0;
+out_join:
+ for (i = 0; i < thread_nr; i++)
+ pthread_join(synthesize_threads[i], NULL);
+ free(args);
+free_threads:
+ free(synthesize_threads);
+free_dirent:
+ for (i = 0; i < n; i++)
+ free(dirent[i]);
+ free(dirent);
+
+ return err;
+}
+
struct process_symbol_args {
const char *name;
u64 start;
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index ee7bcc898d35..d6cbb0a0d919 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -680,7 +680,8 @@ int perf_event__synthesize_cpu_map(struct perf_tool *tool,
int perf_event__synthesize_threads(struct perf_tool *tool,
perf_event__handler_t process,
struct machine *machine, bool mmap_data,
- unsigned int proc_map_timeout);
+ unsigned int proc_map_timeout,
+ unsigned int nr_threads_synthesize);
int perf_event__synthesize_kernel_mmap(struct perf_tool *tool,
perf_event__handler_t process,
struct machine *machine);
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 585b4a3d64a4..7c3aa479201a 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2218,12 +2218,16 @@ int machines__for_each_thread(struct machines *machines,
int __machine__synthesize_threads(struct machine *machine, struct perf_tool *tool,
struct target *target, struct thread_map *threads,
perf_event__handler_t process, bool data_mmap,
- unsigned int proc_map_timeout)
+ unsigned int proc_map_timeout,
+ unsigned int nr_threads_synthesize)
{
if (target__has_task(target))
return perf_event__synthesize_thread_map(tool, threads, process, machine, data_mmap, proc_map_timeout);
else if (target__has_cpu(target))
- return perf_event__synthesize_threads(tool, process, machine, data_mmap, proc_map_timeout);
+ return perf_event__synthesize_threads(tool, process,
+ machine, data_mmap,
+ proc_map_timeout,
+ nr_threads_synthesize);
/* command specified */
return 0;
}
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index b1cd516f2025..c6a299ea506c 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -257,15 +257,18 @@ int machines__for_each_thread(struct machines *machines,
int __machine__synthesize_threads(struct machine *machine, struct perf_tool *tool,
struct target *target, struct thread_map *threads,
perf_event__handler_t process, bool data_mmap,
- unsigned int proc_map_timeout);
+ unsigned int proc_map_timeout,
+ unsigned int nr_threads_synthesize);
static inline
int machine__synthesize_threads(struct machine *machine, struct target *target,
struct thread_map *threads, bool data_mmap,
- unsigned int proc_map_timeout)
+ unsigned int proc_map_timeout,
+ unsigned int nr_threads_synthesize)
{
return __machine__synthesize_threads(machine, NULL, target, threads,
perf_event__process, data_mmap,
- proc_map_timeout);
+ proc_map_timeout,
+ nr_threads_synthesize);
}
pid_t machine__get_current_tid(struct machine *machine, int cpu);
--
2.13.6
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 6/8] perf top: Implement multithreading for perf_event__synthesize_threads
2017-10-03 12:55 ` [PATCH 6/8] perf top: Implement multithreading for perf_event__synthesize_threads Arnaldo Carvalho de Melo
@ 2017-10-03 17:37 ` Ingo Molnar
0 siblings, 0 replies; 11+ messages in thread
From: Ingo Molnar @ 2017-10-03 17:37 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: linux-kernel, linux-perf-users, Kan Liang, Adrian Hunter,
Alexei Starovoitov, Andi Kleen, He Kuang, Lukasz Odzioba,
Namhyung Kim, Peter Zijlstra, Wang Nan, Arnaldo Carvalho de Melo
* Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> From: Kan Liang <kan.liang@intel.com>
>
> The proc files which is sorted with alphabetical order are evenly
> assigned to several synthesize threads to be processed in parallel.
>
> For 'perf top', the threads number hard code to online CPU number. The
> following patch will introduce an option to set it.
>
> For other perf tools, the thread number is 1. Because the process
> function is not ready for multithreading, e.g.
> process_synthesized_event.
>
> This patch series only support event synthesize multithreading for 'perf
> top'. For other tools, it can be done separately later.
Just to give some quick feedback: this is really nice stuff!
Is anyone working on multi-threading 'perf record' (and the recording portion of
'perf top' perhaps)?
Especially with complex, high-frequency profiling there's alot of SMP overhead
coming from a single recording thread. If there was a single thread per CPU, and
it truly only recorded the events from its own CPU, things would become a lot more
scalable.
For example, if we measure the current overhead of perf record of a (limited)
parallel kernel build:
triton:~/tip> perf stat --no-inherit --pre "make clean >/dev/null 2>&1" perf record -F 10000 make -j kernel
...
[ perf record: Captured and wrote 5.124 MB perf.data (108400 samples) ]
Performance counter stats for 'perf record -F 10000 make -j kernel':
183.582587 task-clock (msec) # 0.039 CPUs utilized
2,496 context-switches # 0.014 M/sec
157 cpu-migrations # 0.855 K/sec
6,649 page-faults # 0.036 M/sec
817,478,151 cycles # 4.453 GHz
416,641,913 stalled-cycles-frontend # 50.97% frontend cycles idle
1,018,336,301 instructions # 1.25 insn per cycle
# 0.41 stalled cycles per insn
217,255,137 branches # 1183.419 M/sec
2,970,118 branch-misses # 1.37% of all branches
4.710378510 seconds time elapsed
That's 1018336301 just to record 108400 samples, i.e. every sample takes 9,300
instructions to _record_. That's insanely high overhead from what is in essence a
tracing utility.
Even if I add "-B -N" to disable buildid generation (which is the worst offender),
it's still very high overhead:
[ perf record: Captured and wrote 5.585 MB perf.data ]
Performance counter stats for 'perf record -B -N -F 10000 make -j kernel':
45.625321 task-clock (msec) # 0.009 CPUs utilized
2,950 context-switches # 0.065 M/sec
204 cpu-migrations # 0.004 M/sec
1,992 page-faults # 0.044 M/sec
193,127,853 cycles # 4.233 GHz
117,098,418 stalled-cycles-frontend # 60.63% frontend cycles idle
197,899,633 instructions # 1.02 insn per cycle
# 0.59 stalled cycles per insn
41,221,863 branches # 903.487 M/sec
502,158 branch-misses # 1.22% of all branches
4.858962925 seconds time elapsed
... that's still 1,800+ instructions per event!
As a comparison, ftrace has a tracing overhead of less than 100 instructions per
event.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 7/8] perf top: Add option to set the number of thread for event synthesize
2017-10-03 12:55 [GIT PULL 0/8] perf/core improvements and fixes Arnaldo Carvalho de Melo
` (5 preceding siblings ...)
2017-10-03 12:55 ` [PATCH 6/8] perf top: Implement multithreading for perf_event__synthesize_threads Arnaldo Carvalho de Melo
@ 2017-10-03 12:55 ` Arnaldo Carvalho de Melo
2017-10-03 12:55 ` [PATCH 8/8] perf tests attr: Fix group stat tests Arnaldo Carvalho de Melo
2017-10-03 16:38 ` [GIT PULL 0/8] perf/core improvements and fixes Ingo Molnar
8 siblings, 0 replies; 11+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-03 12:55 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, linux-perf-users, Kan Liang, Adrian Hunter,
Alexei Starovoitov, Andi Kleen, He Kuang, Lukasz Odzioba,
Namhyung Kim, Peter Zijlstra, Wang Nan, Arnaldo Carvalho de Melo
From: Kan Liang <kan.liang@intel.com>
Using UINT_MAX to indicate the default thread#, which is the max number
of online CPU.
Committer testing:
# perf trace --no-inherit -e clone -o /tmp/output perf top --num-thread-synthesize 9
# cat /tmp/output
? ( ? ): ... [continued]: clone()) = 26651 (perf)
0.059 ( 0.010 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5bfac44f30, parent_tidptr: 0x7f5bfac459d0, child_tidptr: 0x7f5bfac459d0, tls: 0x7f5bfac45700) = 26652 (perf)
0.116 ( 0.014 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5bfa443f30, parent_tidptr: 0x7f5bfa4449d0, child_tidptr: 0x7f5bfa4449d0, tls: 0x7f5bfa444700) = 26653 (perf)
0.141 ( 0.009 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5bf9c42f30, parent_tidptr: 0x7f5bf9c439d0, child_tidptr: 0x7f5bf9c439d0, tls: 0x7f5bf9c43700) = 26654 (perf)
0.160 ( 0.012 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5bf9441f30, parent_tidptr: 0x7f5bf94429d0, child_tidptr: 0x7f5bf94429d0, tls: 0x7f5bf9442700) = 26655 (perf)
0.232 ( 0.013 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5bf8c40f30, parent_tidptr: 0x7f5bf8c419d0, child_tidptr: 0x7f5bf8c419d0, tls: 0x7f5bf8c41700) = 26656 (perf)
0.393 ( 0.011 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5be3ffef30, parent_tidptr: 0x7f5be3fff9d0, child_tidptr: 0x7f5be3fff9d0, tls: 0x7f5be3fff700) = 26657 (perf)
0.802 ( 0.012 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5be37fdf30, parent_tidptr: 0x7f5be37fe9d0, child_tidptr: 0x7f5be37fe9d0, tls: 0x7f5be37fe700) = 26658 (perf)
1.411 ( 0.022 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5be2ffcf30, parent_tidptr: 0x7f5be2ffd9d0, child_tidptr: 0x7f5be2ffd9d0, tls: 0x7f5be2ffd700) = 26659 (perf)
246.422 ( 0.042 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5be2ffcf30, parent_tidptr: 0x7f5be2ffd9d0, child_tidptr: 0x7f5be2ffd9d0, tls: 0x7f5be2ffd700) = 26660 (perf)
#
Signed-off-by: Kan Liang <kan.liang@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1506696477-146932-5-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/Documentation/perf-top.txt | 3 +++
tools/perf/builtin-top.c | 11 ++++++++---
tools/perf/util/event.c | 5 ++++-
tools/perf/util/top.h | 1 +
4 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt
index d864ea6fd367..4353262bc462 100644
--- a/tools/perf/Documentation/perf-top.txt
+++ b/tools/perf/Documentation/perf-top.txt
@@ -240,6 +240,9 @@ Default is to monitor all CPUS.
--force::
Don't do ownership validation.
+--num-thread-synthesize::
+ The number of threads to run when synthesizing events for existing processes.
+ By default, the number of threads equals to the number of online CPUs.
INTERACTIVE PROMPTING KEYS
--------------------------
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index bc31b93cc1d8..477a8699f0b5 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -958,14 +958,16 @@ static int __cmd_top(struct perf_top *top)
if (perf_session__register_idle_thread(top->session) < 0)
goto out_delete;
- perf_set_multithreaded();
+ if (top->nr_threads_synthesize > 1)
+ perf_set_multithreaded();
machine__synthesize_threads(&top->session->machines.host, &opts->target,
top->evlist->threads, false,
opts->proc_map_timeout,
- (unsigned int)sysconf(_SC_NPROCESSORS_ONLN));
+ top->nr_threads_synthesize);
- perf_set_singlethreaded();
+ if (top->nr_threads_synthesize > 1)
+ perf_set_singlethreaded();
if (perf_hpp_list.socket) {
ret = perf_env__read_cpu_topology_map(&perf_env);
@@ -1118,6 +1120,7 @@ int cmd_top(int argc, const char **argv)
},
.max_stack = sysctl_perf_event_max_stack,
.sym_pcnt_filter = 5,
+ .nr_threads_synthesize = UINT_MAX,
};
struct record_opts *opts = &top.record_opts;
struct target *target = &opts->target;
@@ -1227,6 +1230,8 @@ int cmd_top(int argc, const char **argv)
OPT_BOOLEAN(0, "hierarchy", &symbol_conf.report_hierarchy,
"Show entries in a hierarchy"),
OPT_BOOLEAN(0, "force", &symbol_conf.force, "don't complain, do it"),
+ OPT_UINTEGER(0, "num-thread-synthesize", &top.nr_threads_synthesize,
+ "number of thread to run event synthesize"),
OPT_END()
};
const char * const top_usage[] = {
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 0e678dd6bdbe..47eff4767edb 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -790,7 +790,10 @@ int perf_event__synthesize_threads(struct perf_tool *tool,
if (n < 0)
return err;
- thread_nr = nr_threads_synthesize;
+ if (nr_threads_synthesize == UINT_MAX)
+ thread_nr = sysconf(_SC_NPROCESSORS_ONLN);
+ else
+ thread_nr = nr_threads_synthesize;
if (thread_nr <= 1) {
err = __perf_event__synthesize_threads(tool, process,
diff --git a/tools/perf/util/top.h b/tools/perf/util/top.h
index 9bdfb78a9a35..f4296e1e3bb8 100644
--- a/tools/perf/util/top.h
+++ b/tools/perf/util/top.h
@@ -37,6 +37,7 @@ struct perf_top {
int sym_pcnt_filter;
const char *sym_filter;
float min_percent;
+ unsigned int nr_threads_synthesize;
};
#define CONSOLE_CLEAR "^[[H^[[2J"
--
2.13.6
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 8/8] perf tests attr: Fix group stat tests
2017-10-03 12:55 [GIT PULL 0/8] perf/core improvements and fixes Arnaldo Carvalho de Melo
` (6 preceding siblings ...)
2017-10-03 12:55 ` [PATCH 7/8] perf top: Add option to set the number of thread for event synthesize Arnaldo Carvalho de Melo
@ 2017-10-03 12:55 ` Arnaldo Carvalho de Melo
2017-10-03 16:38 ` [GIT PULL 0/8] perf/core improvements and fixes Ingo Molnar
8 siblings, 0 replies; 11+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-03 12:55 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, linux-perf-users, Jiri Olsa, Jiri Olsa,
Heiko Carstens, Hendrik Brueckner, Martin Schwidefsky,
Thomas-Mich Richter, Arnaldo Carvalho de Melo
From: Jiri Olsa <jolsa@redhat.com>
We started to use group read whenever it's possible:
82bf311e15d2 perf stat: Use group read for event groups
That breaks some of attr tests, this change adds the new possible
read_format value.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
LPU-Reference: 20170928160633.GA26973@krava
Link: http://lkml.kernel.org/n/tip-1ko2zc4nph93d8lfwjyk9ivz@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/tests/attr/test-stat-group | 2 ++
tools/perf/tests/attr/test-stat-group1 | 2 ++
2 files changed, 4 insertions(+)
diff --git a/tools/perf/tests/attr/test-stat-group b/tools/perf/tests/attr/test-stat-group
index fdc1596a8862..e15d6946e9b3 100644
--- a/tools/perf/tests/attr/test-stat-group
+++ b/tools/perf/tests/attr/test-stat-group
@@ -6,6 +6,7 @@ ret = 1
[event-1:base-stat]
fd=1
group_fd=-1
+read_format=3|15
[event-2:base-stat]
fd=2
@@ -13,3 +14,4 @@ group_fd=1
config=1
disabled=0
enable_on_exec=0
+read_format=3|15
diff --git a/tools/perf/tests/attr/test-stat-group1 b/tools/perf/tests/attr/test-stat-group1
index 2a1f86e4a904..1746751123dc 100644
--- a/tools/perf/tests/attr/test-stat-group1
+++ b/tools/perf/tests/attr/test-stat-group1
@@ -6,6 +6,7 @@ ret = 1
[event-1:base-stat]
fd=1
group_fd=-1
+read_format=3|15
[event-2:base-stat]
fd=2
@@ -13,3 +14,4 @@ group_fd=1
config=1
disabled=0
enable_on_exec=0
+read_format=3|15
--
2.13.6
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [GIT PULL 0/8] perf/core improvements and fixes
2017-10-03 12:55 [GIT PULL 0/8] perf/core improvements and fixes Arnaldo Carvalho de Melo
` (7 preceding siblings ...)
2017-10-03 12:55 ` [PATCH 8/8] perf tests attr: Fix group stat tests Arnaldo Carvalho de Melo
@ 2017-10-03 16:38 ` Ingo Molnar
8 siblings, 0 replies; 11+ messages in thread
From: Ingo Molnar @ 2017-10-03 16:38 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: linux-kernel, linux-perf-users, Adrian Hunter, Alexei Starovoitov,
Andi Kleen, David Ahern, Heiko Carstens, He Kuang,
Hendrik Brueckner, Jiri Olsa, Kan Liang, Lukasz Odzioba,
Martin Schwidefsky, Namhyung Kim, Peter Zijlstra,
Thomas-Mich Richter, Wang Nan, Arnaldo Carvalho de Melo
* Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> Hi Ingo,
>
> I pulled tip/perf/urgent to pick up fixes, please consider
> pulling, I've been away for a while, so I'll be harvesting outstanding
> patches in the next few days, as well as trying and reviewing more
> complex patchkits,
>
> - Arnaldo
>
> Test results at the end of this message, as usual.
>
> The following changes since commit c976a7d6db215481261b63a89a408cb265a9812b:
>
> Merge remote-tracking branch 'tip/perf/urgent' into perf/core, to pick up fixes (2017-10-02 13:58:12 -0300)
>
> are available in the git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-4.15-20171003
>
> for you to fetch changes up to f6a9820d572bd8384d982357cbad214b3a6c04bb:
>
> perf tests attr: Fix group stat tests (2017-10-03 09:41:45 -0300)
>
> ----------------------------------------------------------------
> perf/core improvements and fixes:
>
> - Multithread the synthesizing of PERF_RECORD_ events for pre-existing
> threads in 'perf top', speeding up that phase, greatly improving the
> user experience in systems such as Intel's Knights Mill (Kan Liang)
>
> - 'perf test' fixes for the perf_event_attr test case (Jiri Olsa, Thomas Richter)
>
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
>
> ----------------------------------------------------------------
> Jiri Olsa (2):
> perf tests attr: Fix task term values
> perf tests attr: Fix group stat tests
>
> Kan Liang (4):
> perf tools: Lock to protect namespaces and comm list
> perf tools: Lock to protect comm_str rb tree
> perf top: Implement multithreading for perf_event__synthesize_threads
> perf top: Add option to set the number of thread for event synthesize
>
> Thomas Richter (2):
> perf test attr: Fix python error on empty result
> perf test attr: Fix ignored test case result
>
> tools/perf/Documentation/perf-top.txt | 3 +
> tools/perf/builtin-kvm.c | 3 +-
> tools/perf/builtin-record.c | 2 +-
> tools/perf/builtin-top.c | 13 +-
> tools/perf/builtin-trace.c | 2 +-
> tools/perf/tests/attr.c | 2 +-
> tools/perf/tests/attr.py | 6 +-
> tools/perf/tests/attr/base-record | 2 +-
> tools/perf/tests/attr/test-record-group | 1 +
> tools/perf/tests/attr/test-record-group-sampling | 2 +-
> tools/perf/tests/attr/test-record-group1 | 1 +
> tools/perf/tests/attr/test-stat-group | 2 +
> tools/perf/tests/attr/test-stat-group1 | 2 +
> tools/perf/tests/mmap-thread-lookup.c | 2 +-
> tools/perf/util/comm.c | 18 ++-
> tools/perf/util/event.c | 163 ++++++++++++++++++-----
> tools/perf/util/event.h | 3 +-
> tools/perf/util/machine.c | 8 +-
> tools/perf/util/machine.h | 9 +-
> tools/perf/util/thread.c | 53 +++++++-
> tools/perf/util/thread.h | 3 +
> tools/perf/util/top.h | 1 +
> 22 files changed, 249 insertions(+), 52 deletions(-)
Pulled, thanks a lot Arnaldo!
Ingo
^ permalink raw reply [flat|nested] 11+ messages in thread