* [PATCH v3 0/4] perf trace: Add --summary-mode option
@ 2025-02-05 20:54 Namhyung Kim
2025-02-05 20:54 ` [PATCH v3 1/4] perf trace: Allocate syscall stats only if summary is on Namhyung Kim
` (5 more replies)
0 siblings, 6 replies; 13+ messages in thread
From: Namhyung Kim @ 2025-02-05 20:54 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users, Howard Chu
Hello,
I've realized that perf trace shows system call summary at the end for
each thread. But sometimes users want to see the global whole system
summary or statistics instead.
So I've added --summary-mode option like below:
$ sudo ./perf trace -as --summary-mode=total sleep 1
Summary of events:
total, 21580 events
syscall calls errors total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- ------ -------- --------- --------- --------- ------
epoll_wait 1305 0 14716.712 0.000 11.277 551.529 8.87%
futex 1256 89 13331.197 0.000 10.614 733.722 15.49%
poll 669 0 6806.618 0.000 10.174 459.316 11.77%
ppoll 220 0 3968.797 0.000 18.040 516.775 25.35%
clock_nanosleep 1 0 1000.027 1000.027 1000.027 1000.027 0.00%
epoll_pwait 21 0 592.783 0.000 28.228 522.293 88.29%
nanosleep 16 0 60.515 0.000 3.782 10.123 33.33%
ioctl 510 0 4.284 0.001 0.008 0.182 8.84%
recvmsg 1434 775 3.497 0.001 0.002 0.174 6.37%
write 1393 0 2.854 0.001 0.002 0.017 1.79%
read 1063 100 2.236 0.000 0.002 0.083 5.11%
...
Also it changes internal data structure to hash table to track
statistics of syscalls. And removes the rb_resort code.
v3 changes)
* Add Acked-by from Howard
* Update commit messages (Ian)
v2 changes)
* Rebase to current perf-tools-next
* Fix some style issues (Howard)
* Rename to --summary-mode (Howard)
Thanks,
Namhyung
Namhyung Kim (4):
perf trace: Allocate syscall stats only if summary is on
perf trace: Convert syscall_stats to hashmap
perf tools: Get rid of now-unused rb_resort.h
perf trace: Add --summary-mode option
tools/perf/Documentation/perf-trace.txt | 4 +
tools/perf/builtin-trace.c | 247 +++++++++++++++++++-----
tools/perf/util/rb_resort.h | 146 --------------
3 files changed, 203 insertions(+), 194 deletions(-)
delete mode 100644 tools/perf/util/rb_resort.h
--
2.48.1.502.g6dc24dfdaf-goog
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v3 1/4] perf trace: Allocate syscall stats only if summary is on
2025-02-05 20:54 [PATCH v3 0/4] perf trace: Add --summary-mode option Namhyung Kim
@ 2025-02-05 20:54 ` Namhyung Kim
2025-02-12 20:50 ` Arnaldo Carvalho de Melo
2025-02-05 20:54 ` [PATCH v3 2/4] perf trace: Convert syscall_stats to hashmap Namhyung Kim
` (4 subsequent siblings)
5 siblings, 1 reply; 13+ messages in thread
From: Namhyung Kim @ 2025-02-05 20:54 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users, Howard Chu
The syscall stats are used only when summary is requested. Let's avoid
unnecessary operations. While at it, let's pass 'trace' pointer
directly instead of passing 'output' file pointer and 'summary' option
in the 'trace' separately.
Acked-by: Howard Chu <howardchu95@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
tools/perf/builtin-trace.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index ac97632f13dc8f7c..7e0324a2e9182088 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -1522,13 +1522,14 @@ struct thread_trace {
struct intlist *syscall_stats;
};
-static struct thread_trace *thread_trace__new(void)
+static struct thread_trace *thread_trace__new(struct trace *trace)
{
struct thread_trace *ttrace = zalloc(sizeof(struct thread_trace));
if (ttrace) {
ttrace->files.max = -1;
- ttrace->syscall_stats = intlist__new(NULL);
+ if (trace->summary)
+ ttrace->syscall_stats = intlist__new(NULL);
}
return ttrace;
@@ -1550,7 +1551,7 @@ static void thread_trace__delete(void *pttrace)
free(ttrace);
}
-static struct thread_trace *thread__trace(struct thread *thread, FILE *fp)
+static struct thread_trace *thread__trace(struct thread *thread, struct trace *trace)
{
struct thread_trace *ttrace;
@@ -1558,7 +1559,7 @@ static struct thread_trace *thread__trace(struct thread *thread, FILE *fp)
goto fail;
if (thread__priv(thread) == NULL)
- thread__set_priv(thread, thread_trace__new());
+ thread__set_priv(thread, thread_trace__new(trace));
if (thread__priv(thread) == NULL)
goto fail;
@@ -1568,7 +1569,7 @@ static struct thread_trace *thread__trace(struct thread *thread, FILE *fp)
return ttrace;
fail:
- color_fprintf(fp, PERF_COLOR_RED,
+ color_fprintf(trace->output, PERF_COLOR_RED,
"WARNING: not enough memory, dropping samples!\n");
return NULL;
}
@@ -2622,7 +2623,7 @@ static int trace__sys_enter(struct trace *trace, struct evsel *evsel,
return -1;
thread = machine__findnew_thread(trace->host, sample->pid, sample->tid);
- ttrace = thread__trace(thread, trace->output);
+ ttrace = thread__trace(thread, trace);
if (ttrace == NULL)
goto out_put;
@@ -2699,7 +2700,7 @@ static int trace__fprintf_sys_enter(struct trace *trace, struct evsel *evsel,
return -1;
thread = machine__findnew_thread(trace->host, sample->pid, sample->tid);
- ttrace = thread__trace(thread, trace->output);
+ ttrace = thread__trace(thread, trace);
/*
* We need to get ttrace just to make sure it is there when syscall__scnprintf_args()
* and the rest of the beautifiers accessing it via struct syscall_arg touches it.
@@ -2771,7 +2772,7 @@ static int trace__sys_exit(struct trace *trace, struct evsel *evsel,
return -1;
thread = machine__findnew_thread(trace->host, sample->pid, sample->tid);
- ttrace = thread__trace(thread, trace->output);
+ ttrace = thread__trace(thread, trace);
if (ttrace == NULL)
goto out_put;
@@ -2960,7 +2961,7 @@ static int trace__sched_stat_runtime(struct trace *trace, struct evsel *evsel,
struct thread *thread = machine__findnew_thread(trace->host,
sample->pid,
sample->tid);
- struct thread_trace *ttrace = thread__trace(thread, trace->output);
+ struct thread_trace *ttrace = thread__trace(thread, trace);
if (ttrace == NULL)
goto out_dump;
@@ -3214,7 +3215,7 @@ static int trace__pgfault(struct trace *trace,
}
}
- ttrace = thread__trace(thread, trace->output);
+ ttrace = thread__trace(thread, trace);
if (ttrace == NULL)
goto out_put;
--
2.48.1.502.g6dc24dfdaf-goog
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v3 2/4] perf trace: Convert syscall_stats to hashmap
2025-02-05 20:54 [PATCH v3 0/4] perf trace: Add --summary-mode option Namhyung Kim
2025-02-05 20:54 ` [PATCH v3 1/4] perf trace: Allocate syscall stats only if summary is on Namhyung Kim
@ 2025-02-05 20:54 ` Namhyung Kim
2025-02-12 20:54 ` Arnaldo Carvalho de Melo
2025-02-05 20:54 ` [PATCH v3 3/4] perf tools: Get rid of now-unused rb_resort.h Namhyung Kim
` (3 subsequent siblings)
5 siblings, 1 reply; 13+ messages in thread
From: Namhyung Kim @ 2025-02-05 20:54 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users, Howard Chu
It was using a RBtree-based int-list as a hash and a custom resort
logic for that. As we have hashmap, let's convert to it and add a
custom sort function for the hashmap entries using an array. It
should be faster and more light-weighted. It's also to prepare
supporting system-wide syscall stats.
No functional changes intended.
Acked-by: Howard Chu <howardchu95@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
tools/perf/builtin-trace.c | 117 ++++++++++++++++++++++++++++---------
1 file changed, 88 insertions(+), 29 deletions(-)
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 7e0324a2e9182088..5e37f05737b75a14 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -39,6 +39,7 @@
#include "util/synthetic-events.h"
#include "util/evlist.h"
#include "util/evswitch.h"
+#include "util/hashmap.h"
#include "util/mmap.h"
#include <subcmd/pager.h>
#include <subcmd/exec-cmd.h>
@@ -63,7 +64,6 @@
#include "print_binary.h"
#include "string2.h"
#include "syscalltbl.h"
-#include "rb_resort.h"
#include "../perf.h"
#include "trace_augment.h"
@@ -1519,17 +1519,50 @@ struct thread_trace {
struct file *table;
} files;
- struct intlist *syscall_stats;
+ struct hashmap *syscall_stats;
};
+static size_t syscall_id_hash(long key, void *ctx __maybe_unused)
+{
+ return key;
+}
+
+static bool syscall_id_equal(long key1, long key2, void *ctx __maybe_unused)
+{
+ return key1 == key2;
+}
+
+static struct hashmap *alloc_syscall_stats(void)
+{
+ return hashmap__new(syscall_id_hash, syscall_id_equal, NULL);
+}
+
+static void delete_syscall_stats(struct hashmap *syscall_stats)
+{
+ struct hashmap_entry *pos;
+ size_t bkt;
+
+ if (syscall_stats == NULL)
+ return;
+
+ hashmap__for_each_entry(syscall_stats, pos, bkt)
+ free(pos->pvalue);
+ hashmap__free(syscall_stats);
+}
+
static struct thread_trace *thread_trace__new(struct trace *trace)
{
struct thread_trace *ttrace = zalloc(sizeof(struct thread_trace));
if (ttrace) {
ttrace->files.max = -1;
- if (trace->summary)
- ttrace->syscall_stats = intlist__new(NULL);
+ if (trace->summary) {
+ ttrace->syscall_stats = alloc_syscall_stats();
+ if (IS_ERR(ttrace->syscall_stats)) {
+ free(ttrace);
+ ttrace = NULL;
+ }
+ }
}
return ttrace;
@@ -1544,7 +1577,7 @@ static void thread_trace__delete(void *pttrace)
if (!ttrace)
return;
- intlist__delete(ttrace->syscall_stats);
+ delete_syscall_stats(ttrace->syscall_stats);
ttrace->syscall_stats = NULL;
thread_trace__free_files(ttrace);
zfree(&ttrace->entry_str);
@@ -2463,22 +2496,19 @@ struct syscall_stats {
static void thread__update_stats(struct thread *thread, struct thread_trace *ttrace,
int id, struct perf_sample *sample, long err, bool errno_summary)
{
- struct int_node *inode;
- struct syscall_stats *stats;
+ struct syscall_stats *stats = NULL;
u64 duration = 0;
- inode = intlist__findnew(ttrace->syscall_stats, id);
- if (inode == NULL)
- return;
-
- stats = inode->priv;
- if (stats == NULL) {
+ if (!hashmap__find(ttrace->syscall_stats, id, &stats)) {
stats = zalloc(sizeof(*stats));
if (stats == NULL)
return;
init_stats(&stats->stats);
- inode->priv = stats;
+ if (hashmap__add(ttrace->syscall_stats, id, stats) < 0) {
+ free(stats);
+ return;
+ }
}
if (ttrace->entry_time && sample->time > ttrace->entry_time)
@@ -4617,18 +4647,45 @@ static size_t trace__fprintf_threads_header(FILE *fp)
return printed;
}
-DEFINE_RESORT_RB(syscall_stats, a->msecs > b->msecs,
+struct syscall_entry {
struct syscall_stats *stats;
double msecs;
int syscall;
-)
+};
+
+static int entry_cmp(const void *e1, const void *e2)
+{
+ const struct syscall_entry *entry1 = e1;
+ const struct syscall_entry *entry2 = e2;
+
+ return entry1->msecs > entry2->msecs ? -1 : 1;
+}
+
+static struct syscall_entry *thread__sort_stats(struct thread_trace *ttrace)
{
- struct int_node *source = rb_entry(nd, struct int_node, rb_node);
- struct syscall_stats *stats = source->priv;
+ struct syscall_entry *entry;
+ struct hashmap_entry *pos;
+ unsigned bkt, i, nr;
+
+ nr = ttrace->syscall_stats->sz;
+ entry = malloc(nr * sizeof(*entry));
+ if (entry == NULL)
+ return NULL;
+
+ i = 0;
+ hashmap__for_each_entry(ttrace->syscall_stats, pos, bkt) {
+ struct syscall_stats *ss = pos->pvalue;
+ struct stats *st = &ss->stats;
- entry->syscall = source->i;
- entry->stats = stats;
- entry->msecs = stats ? (u64)stats->stats.n * (avg_stats(&stats->stats) / NSEC_PER_MSEC) : 0;
+ entry[i].stats = ss;
+ entry[i].msecs = (u64)st->n * (avg_stats(st) / NSEC_PER_MSEC);
+ entry[i].syscall = pos->key;
+ i++;
+ }
+ assert(i == nr);
+
+ qsort(entry, nr, sizeof(*entry), entry_cmp);
+ return entry;
}
static size_t thread__dump_stats(struct thread_trace *ttrace,
@@ -4636,10 +4693,10 @@ static size_t thread__dump_stats(struct thread_trace *ttrace,
{
size_t printed = 0;
struct syscall *sc;
- struct rb_node *nd;
- DECLARE_RESORT_RB_INTLIST(syscall_stats, ttrace->syscall_stats);
+ struct syscall_entry *entries;
- if (syscall_stats == NULL)
+ entries = thread__sort_stats(ttrace);
+ if (entries == NULL)
return 0;
printed += fprintf(fp, "\n");
@@ -4648,8 +4705,10 @@ static size_t thread__dump_stats(struct thread_trace *ttrace,
printed += fprintf(fp, " (msec) (msec) (msec) (msec) (%%)\n");
printed += fprintf(fp, " --------------- -------- ------ -------- --------- --------- --------- ------\n");
- resort_rb__for_each_entry(nd, syscall_stats) {
- struct syscall_stats *stats = syscall_stats_entry->stats;
+ for (size_t i = 0; i < ttrace->syscall_stats->sz; i++) {
+ struct syscall_entry *entry = &entries[i];
+ struct syscall_stats *stats = entry->stats;
+
if (stats) {
double min = (double)(stats->stats.min) / NSEC_PER_MSEC;
double max = (double)(stats->stats.max) / NSEC_PER_MSEC;
@@ -4660,10 +4719,10 @@ static size_t thread__dump_stats(struct thread_trace *ttrace,
pct = avg ? 100.0 * stddev_stats(&stats->stats) / avg : 0.0;
avg /= NSEC_PER_MSEC;
- sc = &trace->syscalls.table[syscall_stats_entry->syscall];
+ sc = &trace->syscalls.table[entry->syscall];
printed += fprintf(fp, " %-15s", sc->name);
printed += fprintf(fp, " %8" PRIu64 " %6" PRIu64 " %9.3f %9.3f %9.3f",
- n, stats->nr_failures, syscall_stats_entry->msecs, min, avg);
+ n, stats->nr_failures, entry->msecs, min, avg);
printed += fprintf(fp, " %9.3f %9.2f%%\n", max, pct);
if (trace->errno_summary && stats->nr_failures) {
@@ -4677,7 +4736,7 @@ static size_t thread__dump_stats(struct thread_trace *ttrace,
}
}
- resort_rb__delete(syscall_stats);
+ free(entries);
printed += fprintf(fp, "\n\n");
return printed;
--
2.48.1.502.g6dc24dfdaf-goog
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v3 3/4] perf tools: Get rid of now-unused rb_resort.h
2025-02-05 20:54 [PATCH v3 0/4] perf trace: Add --summary-mode option Namhyung Kim
2025-02-05 20:54 ` [PATCH v3 1/4] perf trace: Allocate syscall stats only if summary is on Namhyung Kim
2025-02-05 20:54 ` [PATCH v3 2/4] perf trace: Convert syscall_stats to hashmap Namhyung Kim
@ 2025-02-05 20:54 ` Namhyung Kim
2025-02-05 20:54 ` [PATCH v3 4/4] perf trace: Add --summary-mode option Namhyung Kim
` (2 subsequent siblings)
5 siblings, 0 replies; 13+ messages in thread
From: Namhyung Kim @ 2025-02-05 20:54 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users, Howard Chu
It was only used in perf trace and it switched to use hashmap instead.
Let's delete the code.
Acked-by: Howard Chu <howardchu95@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
tools/perf/util/rb_resort.h | 146 ------------------------------------
1 file changed, 146 deletions(-)
delete mode 100644 tools/perf/util/rb_resort.h
diff --git a/tools/perf/util/rb_resort.h b/tools/perf/util/rb_resort.h
deleted file mode 100644
index d927a0d250528505..0000000000000000
--- a/tools/perf/util/rb_resort.h
+++ /dev/null
@@ -1,146 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _PERF_RESORT_RB_H_
-#define _PERF_RESORT_RB_H_
-/*
- * Template for creating a class to resort an existing rb_tree according to
- * a new sort criteria, that must be present in the entries of the source
- * rb_tree.
- *
- * (c) 2016 Arnaldo Carvalho de Melo <acme@redhat.com>
- *
- * Quick example, resorting threads by its shortname:
- *
- * First define the prefix (threads) to be used for the functions and data
- * structures created, and provide an expression for the sorting, then the
- * fields to be present in each of the entries in the new, sorted, rb_tree.
- *
- * The body of the init function should collect the fields, maybe
- * pre-calculating them from multiple entries in the original 'entry' from
- * the rb_tree used as a source for the entries to be sorted:
-
-DEFINE_RB_RESORT_RB(threads, strcmp(a->thread->shortname,
- b->thread->shortname) < 0,
- struct thread *thread;
-)
-{
- entry->thread = rb_entry(nd, struct thread, rb_node);
-}
-
- * After this it is just a matter of instantiating it and iterating it,
- * for a few data structures with existing rb_trees, such as 'struct machine',
- * helpers are available to get the rb_root and the nr_entries:
-
- DECLARE_RESORT_RB_MACHINE_THREADS(threads, machine_ptr);
-
- * This will instantiate the new rb_tree and a cursor for it, that can be used as:
-
- struct rb_node *nd;
-
- resort_rb__for_each_entry(nd, threads) {
- struct thread *t = threads_entry;
- printf("%s: %d\n", t->shortname, t->tid);
- }
-
- * Then delete it:
-
- resort_rb__delete(threads);
-
- * The name of the data structures and functions will have a _sorted suffix
- * right before the method names, i.e. will look like:
- *
- * struct threads_sorted_entry {}
- * threads_sorted__insert()
- */
-
-#define DEFINE_RESORT_RB(__name, __comp, ...) \
-struct __name##_sorted_entry { \
- struct rb_node rb_node; \
- __VA_ARGS__ \
-}; \
-static void __name##_sorted__init_entry(struct rb_node *nd, \
- struct __name##_sorted_entry *entry); \
- \
-static int __name##_sorted__cmp(struct rb_node *nda, struct rb_node *ndb) \
-{ \
- struct __name##_sorted_entry *a, *b; \
- a = rb_entry(nda, struct __name##_sorted_entry, rb_node); \
- b = rb_entry(ndb, struct __name##_sorted_entry, rb_node); \
- return __comp; \
-} \
- \
-struct __name##_sorted { \
- struct rb_root entries; \
- struct __name##_sorted_entry nd[0]; \
-}; \
- \
-static void __name##_sorted__insert(struct __name##_sorted *sorted, \
- struct rb_node *sorted_nd) \
-{ \
- struct rb_node **p = &sorted->entries.rb_node, *parent = NULL; \
- while (*p != NULL) { \
- parent = *p; \
- if (__name##_sorted__cmp(sorted_nd, parent)) \
- p = &(*p)->rb_left; \
- else \
- p = &(*p)->rb_right; \
- } \
- rb_link_node(sorted_nd, parent, p); \
- rb_insert_color(sorted_nd, &sorted->entries); \
-} \
- \
-static void __name##_sorted__sort(struct __name##_sorted *sorted, \
- struct rb_root *entries) \
-{ \
- struct rb_node *nd; \
- unsigned int i = 0; \
- for (nd = rb_first(entries); nd; nd = rb_next(nd)) { \
- struct __name##_sorted_entry *snd = &sorted->nd[i++]; \
- __name##_sorted__init_entry(nd, snd); \
- __name##_sorted__insert(sorted, &snd->rb_node); \
- } \
-} \
- \
-static struct __name##_sorted *__name##_sorted__new(struct rb_root *entries, \
- int nr_entries) \
-{ \
- struct __name##_sorted *sorted; \
- sorted = malloc(sizeof(*sorted) + sizeof(sorted->nd[0]) * nr_entries); \
- if (sorted) { \
- sorted->entries = RB_ROOT; \
- __name##_sorted__sort(sorted, entries); \
- } \
- return sorted; \
-} \
- \
-static void __name##_sorted__delete(struct __name##_sorted *sorted) \
-{ \
- free(sorted); \
-} \
- \
-static void __name##_sorted__init_entry(struct rb_node *nd, \
- struct __name##_sorted_entry *entry)
-
-#define DECLARE_RESORT_RB(__name) \
-struct __name##_sorted_entry *__name##_entry; \
-struct __name##_sorted *__name = __name##_sorted__new
-
-#define resort_rb__for_each_entry(__nd, __name) \
- for (__nd = rb_first(&__name->entries); \
- __name##_entry = rb_entry(__nd, struct __name##_sorted_entry, \
- rb_node), __nd; \
- __nd = rb_next(__nd))
-
-#define resort_rb__delete(__name) \
- __name##_sorted__delete(__name), __name = NULL
-
-/*
- * Helpers for other classes that contains both an rbtree and the
- * number of entries in it:
- */
-
-/* For 'struct intlist' */
-#define DECLARE_RESORT_RB_INTLIST(__name, __ilist) \
- DECLARE_RESORT_RB(__name)(&__ilist->rblist.entries.rb_root, \
- __ilist->rblist.nr_entries)
-
-#endif /* _PERF_RESORT_RB_H_ */
--
2.48.1.502.g6dc24dfdaf-goog
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v3 4/4] perf trace: Add --summary-mode option
2025-02-05 20:54 [PATCH v3 0/4] perf trace: Add --summary-mode option Namhyung Kim
` (2 preceding siblings ...)
2025-02-05 20:54 ` [PATCH v3 3/4] perf tools: Get rid of now-unused rb_resort.h Namhyung Kim
@ 2025-02-05 20:54 ` Namhyung Kim
2025-02-08 5:25 ` [PATCH v3 0/4] " Howard Chu
2025-02-13 17:21 ` Namhyung Kim
5 siblings, 0 replies; 13+ messages in thread
From: Namhyung Kim @ 2025-02-05 20:54 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users, Howard Chu
The --summary-mode option will select how to show the syscall summary at
the end. By default, it'll show the summary for each thread and it's
the same as if --summary-mode=thread is passed.
The other option is to show total summary, which is --summary-mode=total.
I'd like to have this instead of a separate option like --total-summary
because we may want to add a new summary mode (by cgroup) later.
$ sudo ./perf trace -as --summary-mode=total sleep 1
Summary of events:
total, 21580 events
syscall calls errors total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- ------ -------- --------- --------- --------- ------
epoll_wait 1305 0 14716.712 0.000 11.277 551.529 8.87%
futex 1256 89 13331.197 0.000 10.614 733.722 15.49%
poll 669 0 6806.618 0.000 10.174 459.316 11.77%
ppoll 220 0 3968.797 0.000 18.040 516.775 25.35%
clock_nanosleep 1 0 1000.027 1000.027 1000.027 1000.027 0.00%
epoll_pwait 21 0 592.783 0.000 28.228 522.293 88.29%
nanosleep 16 0 60.515 0.000 3.782 10.123 33.33%
ioctl 510 0 4.284 0.001 0.008 0.182 8.84%
recvmsg 1434 775 3.497 0.001 0.002 0.174 6.37%
write 1393 0 2.854 0.001 0.002 0.017 1.79%
read 1063 100 2.236 0.000 0.002 0.083 5.11%
...
Acked-by: Howard Chu <howardchu95@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
tools/perf/Documentation/perf-trace.txt | 4 +
tools/perf/builtin-trace.c | 129 ++++++++++++++++++++----
2 files changed, 114 insertions(+), 19 deletions(-)
diff --git a/tools/perf/Documentation/perf-trace.txt b/tools/perf/Documentation/perf-trace.txt
index fb3d2af33844c06b..887dc37773d0f4d6 100644
--- a/tools/perf/Documentation/perf-trace.txt
+++ b/tools/perf/Documentation/perf-trace.txt
@@ -150,6 +150,10 @@ the thread executes on the designated CPUs. Default is to monitor all CPUs.
To be used with -s or -S, to show stats for the errnos experienced by
syscalls, using only this option will trigger --summary.
+--summary-mode=mode::
+ To be used with -s or -S, to select how to show summary. By default it'll
+ show the syscall summary by thread. Possible values are: thread, total.
+
--tool_stats::
Show tool stats such as number of times fd->pathname was discovered thru
hooking the open syscall return + vfs_getname or via reading /proc/pid/fd, etc.
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 5e37f05737b75a14..8411dc5e67198303 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -139,6 +139,12 @@ struct syscall_fmt {
bool hexret;
};
+enum summary_mode {
+ SUMMARY__NONE = 0,
+ SUMMARY__BY_TOTAL,
+ SUMMARY__BY_THREAD,
+};
+
struct trace {
struct perf_tool tool;
struct syscalltbl *sctbl;
@@ -177,14 +183,17 @@ struct trace {
pid_t *entries;
struct bpf_map *map;
} filter_pids;
+ struct hashmap *syscall_stats;
double duration_filter;
double runtime_ms;
+ unsigned long pfmaj, pfmin;
struct {
u64 vfs_getname,
proc_getname;
} stats;
unsigned int max_stack;
unsigned int min_stack;
+ enum summary_mode summary_mode;
int raw_augmented_syscalls_args_size;
bool raw_augmented_syscalls;
bool fd_path_disabled;
@@ -2494,18 +2503,23 @@ struct syscall_stats {
};
static void thread__update_stats(struct thread *thread, struct thread_trace *ttrace,
- int id, struct perf_sample *sample, long err, bool errno_summary)
+ int id, struct perf_sample *sample, long err,
+ struct trace *trace)
{
+ struct hashmap *syscall_stats = ttrace->syscall_stats;
struct syscall_stats *stats = NULL;
u64 duration = 0;
- if (!hashmap__find(ttrace->syscall_stats, id, &stats)) {
+ if (trace->summary_mode == SUMMARY__BY_TOTAL)
+ syscall_stats = trace->syscall_stats;
+
+ if (!hashmap__find(syscall_stats, id, &stats)) {
stats = zalloc(sizeof(*stats));
if (stats == NULL)
return;
init_stats(&stats->stats);
- if (hashmap__add(ttrace->syscall_stats, id, stats) < 0) {
+ if (hashmap__add(syscall_stats, id, stats) < 0) {
free(stats);
return;
}
@@ -2519,7 +2533,7 @@ static void thread__update_stats(struct thread *thread, struct thread_trace *ttr
if (err < 0) {
++stats->nr_failures;
- if (!errno_summary)
+ if (!trace->errno_summary)
return;
err = -err;
@@ -2811,7 +2825,7 @@ static int trace__sys_exit(struct trace *trace, struct evsel *evsel,
ret = perf_evsel__sc_tp_uint(evsel, ret, sample);
if (trace->summary)
- thread__update_stats(thread, ttrace, id, sample, ret, trace->errno_summary);
+ thread__update_stats(thread, ttrace, id, sample, ret, trace);
if (!trace->fd_path_disabled && sc->is_open && ret >= 0 && ttrace->filename.pending_open) {
trace__set_fd_pathname(thread, ret, ttrace->filename.name);
@@ -3249,10 +3263,13 @@ static int trace__pgfault(struct trace *trace,
if (ttrace == NULL)
goto out_put;
- if (evsel->core.attr.config == PERF_COUNT_SW_PAGE_FAULTS_MAJ)
+ if (evsel->core.attr.config == PERF_COUNT_SW_PAGE_FAULTS_MAJ) {
ttrace->pfmaj++;
- else
+ trace->pfmaj++;
+ } else {
ttrace->pfmin++;
+ trace->pfmin++;
+ }
if (trace->summary_only)
goto out;
@@ -3411,6 +3428,7 @@ static int trace__record(struct trace *trace, int argc, const char **argv)
}
static size_t trace__fprintf_thread_summary(struct trace *trace, FILE *fp);
+static size_t trace__fprintf_total_summary(struct trace *trace, FILE *fp);
static bool evlist__add_vfs_getname(struct evlist *evlist)
{
@@ -4323,6 +4341,12 @@ static int trace__run(struct trace *trace, int argc, const char **argv)
goto out_delete_evlist;
}
+ if (trace->summary_mode == SUMMARY__BY_TOTAL) {
+ trace->syscall_stats = alloc_syscall_stats();
+ if (trace->syscall_stats == NULL)
+ goto out_delete_evlist;
+ }
+
evlist__config(evlist, &trace->opts, &callchain_param);
if (forks) {
@@ -4483,8 +4507,12 @@ static int trace__run(struct trace *trace, int argc, const char **argv)
ordered_events__flush(&trace->oe.data, OE_FLUSH__FINAL);
if (!err) {
- if (trace->summary)
- trace__fprintf_thread_summary(trace, trace->output);
+ if (trace->summary) {
+ if (trace->summary_mode == SUMMARY__BY_TOTAL)
+ trace__fprintf_total_summary(trace, trace->output);
+ else
+ trace__fprintf_thread_summary(trace, trace->output);
+ }
if (trace->show_tool_stats) {
fprintf(trace->output, "Stats:\n "
@@ -4496,6 +4524,7 @@ static int trace__run(struct trace *trace, int argc, const char **argv)
}
out_delete_evlist:
+ delete_syscall_stats(trace->syscall_stats);
trace__symbols__exit(trace);
evlist__free_syscall_tp_fields(evlist);
evlist__delete(evlist);
@@ -4623,6 +4652,12 @@ static int trace__replay(struct trace *trace)
evsel->handler = trace__pgfault;
}
+ if (trace->summary_mode == SUMMARY__BY_TOTAL) {
+ trace->syscall_stats = alloc_syscall_stats();
+ if (trace->syscall_stats == NULL)
+ goto out;
+ }
+
setup_pager();
err = perf_session__process_events(session);
@@ -4633,12 +4668,13 @@ static int trace__replay(struct trace *trace)
trace__fprintf_thread_summary(trace, trace->output);
out:
+ delete_syscall_stats(trace->syscall_stats);
perf_session__delete(session);
return err;
}
-static size_t trace__fprintf_threads_header(FILE *fp)
+static size_t trace__fprintf_summary_header(FILE *fp)
{
size_t printed;
@@ -4661,19 +4697,19 @@ static int entry_cmp(const void *e1, const void *e2)
return entry1->msecs > entry2->msecs ? -1 : 1;
}
-static struct syscall_entry *thread__sort_stats(struct thread_trace *ttrace)
+static struct syscall_entry *syscall__sort_stats(struct hashmap *syscall_stats)
{
struct syscall_entry *entry;
struct hashmap_entry *pos;
unsigned bkt, i, nr;
- nr = ttrace->syscall_stats->sz;
+ nr = syscall_stats->sz;
entry = malloc(nr * sizeof(*entry));
if (entry == NULL)
return NULL;
i = 0;
- hashmap__for_each_entry(ttrace->syscall_stats, pos, bkt) {
+ hashmap__for_each_entry(syscall_stats, pos, bkt) {
struct syscall_stats *ss = pos->pvalue;
struct stats *st = &ss->stats;
@@ -4688,14 +4724,14 @@ static struct syscall_entry *thread__sort_stats(struct thread_trace *ttrace)
return entry;
}
-static size_t thread__dump_stats(struct thread_trace *ttrace,
- struct trace *trace, FILE *fp)
+static size_t syscall__dump_stats(struct trace *trace, FILE *fp,
+ struct hashmap *syscall_stats)
{
size_t printed = 0;
struct syscall *sc;
struct syscall_entry *entries;
- entries = thread__sort_stats(ttrace);
+ entries = syscall__sort_stats(syscall_stats);
if (entries == NULL)
return 0;
@@ -4705,7 +4741,7 @@ static size_t thread__dump_stats(struct thread_trace *ttrace,
printed += fprintf(fp, " (msec) (msec) (msec) (msec) (%%)\n");
printed += fprintf(fp, " --------------- -------- ------ -------- --------- --------- --------- ------\n");
- for (size_t i = 0; i < ttrace->syscall_stats->sz; i++) {
+ for (size_t i = 0; i < syscall_stats->sz; i++) {
struct syscall_entry *entry = &entries[i];
struct syscall_stats *stats = entry->stats;
@@ -4742,6 +4778,17 @@ static size_t thread__dump_stats(struct thread_trace *ttrace,
return printed;
}
+static size_t thread__dump_stats(struct thread_trace *ttrace,
+ struct trace *trace, FILE *fp)
+{
+ return syscall__dump_stats(trace, fp, ttrace->syscall_stats);
+}
+
+static size_t system__dump_stats(struct trace *trace, FILE *fp)
+{
+ return syscall__dump_stats(trace, fp, trace->syscall_stats);
+}
+
static size_t trace__fprintf_thread(FILE *fp, struct thread *thread, struct trace *trace)
{
size_t printed = 0;
@@ -4795,7 +4842,7 @@ static int trace_nr_events_cmp(void *priv __maybe_unused,
static size_t trace__fprintf_thread_summary(struct trace *trace, FILE *fp)
{
- size_t printed = trace__fprintf_threads_header(fp);
+ size_t printed = trace__fprintf_summary_header(fp);
LIST_HEAD(threads);
if (machine__thread_list(trace->host, &threads) == 0) {
@@ -4810,6 +4857,27 @@ static size_t trace__fprintf_thread_summary(struct trace *trace, FILE *fp)
return printed;
}
+static size_t trace__fprintf_total_summary(struct trace *trace, FILE *fp)
+{
+ size_t printed = trace__fprintf_summary_header(fp);
+
+ printed += fprintf(fp, " total, ");
+ printed += fprintf(fp, "%lu events", trace->nr_events);
+
+ if (trace->pfmaj)
+ printed += fprintf(fp, ", %lu majfaults", trace->pfmaj);
+ if (trace->pfmin)
+ printed += fprintf(fp, ", %lu minfaults", trace->pfmin);
+ if (trace->sched)
+ printed += fprintf(fp, ", %.3f msec\n", trace->runtime_ms);
+ else if (fputc('\n', fp) != EOF)
+ ++printed;
+
+ printed += system__dump_stats(trace, fp);
+
+ return printed;
+}
+
static int trace__set_duration(const struct option *opt, const char *str,
int unset __maybe_unused)
{
@@ -5081,6 +5149,23 @@ static int trace__parse_cgroups(const struct option *opt, const char *str, int u
return 0;
}
+static int trace__parse_summary_mode(const struct option *opt, const char *str,
+ int unset __maybe_unused)
+{
+ struct trace *trace = opt->value;
+
+ if (!strcmp(str, "thread")) {
+ trace->summary_mode = SUMMARY__BY_THREAD;
+ } else if (!strcmp(str, "total")) {
+ trace->summary_mode = SUMMARY__BY_TOTAL;
+ } else {
+ pr_err("Unknown summary mode: %s\n", str);
+ return -1;
+ }
+
+ return 0;
+}
+
static int trace__config(const char *var, const char *value, void *arg)
{
struct trace *trace = arg;
@@ -5228,6 +5313,9 @@ int cmd_trace(int argc, const char **argv)
"Show all syscalls and summary with statistics"),
OPT_BOOLEAN(0, "errno-summary", &trace.errno_summary,
"Show errno stats per syscall, use with -s or -S"),
+ OPT_CALLBACK(0, "summary-mode", &trace, "mode",
+ "How to show summary: select thread (default) or total",
+ trace__parse_summary_mode),
OPT_CALLBACK_DEFAULT('F', "pf", &trace.trace_pgfaults, "all|maj|min",
"Trace pagefaults", parse_pagefaults, "maj"),
OPT_BOOLEAN(0, "syscalls", &trace.trace_syscalls, "Trace syscalls"),
@@ -5524,8 +5612,11 @@ int cmd_trace(int argc, const char **argv)
trace.summary = trace.summary_only;
/* Keep exited threads, otherwise information might be lost for summary */
- if (trace.summary)
+ if (trace.summary) {
symbol_conf.keep_exited_threads = true;
+ if (trace.summary_mode == SUMMARY__NONE)
+ trace.summary_mode = SUMMARY__BY_THREAD;
+ }
if (output_name != NULL) {
err = trace__open_output(&trace, output_name);
--
2.48.1.502.g6dc24dfdaf-goog
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v3 0/4] perf trace: Add --summary-mode option
2025-02-05 20:54 [PATCH v3 0/4] perf trace: Add --summary-mode option Namhyung Kim
` (3 preceding siblings ...)
2025-02-05 20:54 ` [PATCH v3 4/4] perf trace: Add --summary-mode option Namhyung Kim
@ 2025-02-08 5:25 ` Howard Chu
2025-02-12 18:49 ` Namhyung Kim
2025-02-13 17:21 ` Namhyung Kim
5 siblings, 1 reply; 13+ messages in thread
From: Howard Chu @ 2025-02-08 5:25 UTC (permalink / raw)
To: Namhyung Kim
Cc: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang, Jiri Olsa,
Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users
Hello Namhyung,
On Wed, Feb 5, 2025 at 12:54 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hello,
>
> I've realized that perf trace shows system call summary at the end for
> each thread. But sometimes users want to see the global whole system
> summary or statistics instead.
>
> So I've added --summary-mode option like below:
>
> $ sudo ./perf trace -as --summary-mode=total sleep 1
>
> Summary of events:
>
> total, 21580 events
>
> syscall calls errors total min avg max stddev
> (msec) (msec) (msec) (msec) (%)
> --------------- -------- ------ -------- --------- --------- --------- ------
> epoll_wait 1305 0 14716.712 0.000 11.277 551.529 8.87%
> futex 1256 89 13331.197 0.000 10.614 733.722 15.49%
> poll 669 0 6806.618 0.000 10.174 459.316 11.77%
> ppoll 220 0 3968.797 0.000 18.040 516.775 25.35%
> clock_nanosleep 1 0 1000.027 1000.027 1000.027 1000.027 0.00%
> epoll_pwait 21 0 592.783 0.000 28.228 522.293 88.29%
> nanosleep 16 0 60.515 0.000 3.782 10.123 33.33%
> ioctl 510 0 4.284 0.001 0.008 0.182 8.84%
> recvmsg 1434 775 3.497 0.001 0.002 0.174 6.37%
> write 1393 0 2.854 0.001 0.002 0.017 1.79%
> read 1063 100 2.236 0.000 0.002 0.083 5.11%
> ...
>
> Also it changes internal data structure to hash table to track
> statistics of syscalls. And removes the rb_resort code.
>
> v3 changes)
> * Add Acked-by from Howard
> * Update commit messages (Ian)
Sorry for the delay.
Reviewed-by: Howard Chu <howardchu95@gmail.com>
Thanks,
Howard
>
> v2 changes)
> * Rebase to current perf-tools-next
> * Fix some style issues (Howard)
> * Rename to --summary-mode (Howard)
>
> Thanks,
> Namhyung
>
>
> Namhyung Kim (4):
> perf trace: Allocate syscall stats only if summary is on
> perf trace: Convert syscall_stats to hashmap
> perf tools: Get rid of now-unused rb_resort.h
> perf trace: Add --summary-mode option
>
> tools/perf/Documentation/perf-trace.txt | 4 +
> tools/perf/builtin-trace.c | 247 +++++++++++++++++++-----
> tools/perf/util/rb_resort.h | 146 --------------
> 3 files changed, 203 insertions(+), 194 deletions(-)
> delete mode 100644 tools/perf/util/rb_resort.h
>
> --
> 2.48.1.502.g6dc24dfdaf-goog
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 0/4] perf trace: Add --summary-mode option
2025-02-08 5:25 ` [PATCH v3 0/4] " Howard Chu
@ 2025-02-12 18:49 ` Namhyung Kim
2025-02-12 20:59 ` Arnaldo Carvalho de Melo
0 siblings, 1 reply; 13+ messages in thread
From: Namhyung Kim @ 2025-02-12 18:49 UTC (permalink / raw)
To: Howard Chu, Arnaldo Carvalho de Melo, Ian Rogers
Cc: Kan Liang, Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar,
LKML, linux-perf-users
On Fri, Feb 07, 2025 at 09:25:02PM -0800, Howard Chu wrote:
> Hello Namhyung,
>
> On Wed, Feb 5, 2025 at 12:54 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Hello,
> >
> > I've realized that perf trace shows system call summary at the end for
> > each thread. But sometimes users want to see the global whole system
> > summary or statistics instead.
> >
> > So I've added --summary-mode option like below:
> >
> > $ sudo ./perf trace -as --summary-mode=total sleep 1
> >
> > Summary of events:
> >
> > total, 21580 events
> >
> > syscall calls errors total min avg max stddev
> > (msec) (msec) (msec) (msec) (%)
> > --------------- -------- ------ -------- --------- --------- --------- ------
> > epoll_wait 1305 0 14716.712 0.000 11.277 551.529 8.87%
> > futex 1256 89 13331.197 0.000 10.614 733.722 15.49%
> > poll 669 0 6806.618 0.000 10.174 459.316 11.77%
> > ppoll 220 0 3968.797 0.000 18.040 516.775 25.35%
> > clock_nanosleep 1 0 1000.027 1000.027 1000.027 1000.027 0.00%
> > epoll_pwait 21 0 592.783 0.000 28.228 522.293 88.29%
> > nanosleep 16 0 60.515 0.000 3.782 10.123 33.33%
> > ioctl 510 0 4.284 0.001 0.008 0.182 8.84%
> > recvmsg 1434 775 3.497 0.001 0.002 0.174 6.37%
> > write 1393 0 2.854 0.001 0.002 0.017 1.79%
> > read 1063 100 2.236 0.000 0.002 0.083 5.11%
> > ...
> >
> > Also it changes internal data structure to hash table to track
> > statistics of syscalls. And removes the rb_resort code.
> >
> > v3 changes)
> > * Add Acked-by from Howard
> > * Update commit messages (Ian)
>
> Sorry for the delay.
>
> Reviewed-by: Howard Chu <howardchu95@gmail.com>
Thanks for your review!
Arnaldo and Ian, are you ok with this?
Thanks,
Namhyung
> >
> > v2 changes)
> > * Rebase to current perf-tools-next
> > * Fix some style issues (Howard)
> > * Rename to --summary-mode (Howard)
> >
> > Thanks,
> > Namhyung
> >
> >
> > Namhyung Kim (4):
> > perf trace: Allocate syscall stats only if summary is on
> > perf trace: Convert syscall_stats to hashmap
> > perf tools: Get rid of now-unused rb_resort.h
> > perf trace: Add --summary-mode option
> >
> > tools/perf/Documentation/perf-trace.txt | 4 +
> > tools/perf/builtin-trace.c | 247 +++++++++++++++++++-----
> > tools/perf/util/rb_resort.h | 146 --------------
> > 3 files changed, 203 insertions(+), 194 deletions(-)
> > delete mode 100644 tools/perf/util/rb_resort.h
> >
> > --
> > 2.48.1.502.g6dc24dfdaf-goog
> >
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 1/4] perf trace: Allocate syscall stats only if summary is on
2025-02-05 20:54 ` [PATCH v3 1/4] perf trace: Allocate syscall stats only if summary is on Namhyung Kim
@ 2025-02-12 20:50 ` Arnaldo Carvalho de Melo
2025-02-13 3:39 ` Namhyung Kim
0 siblings, 1 reply; 13+ messages in thread
From: Arnaldo Carvalho de Melo @ 2025-02-12 20:50 UTC (permalink / raw)
To: Namhyung Kim
Cc: Ian Rogers, Kan Liang, Jiri Olsa, Adrian Hunter, Peter Zijlstra,
Ingo Molnar, LKML, linux-perf-users, Howard Chu, Wander Costa
On Wed, Feb 05, 2025 at 12:54:40PM -0800, Namhyung Kim wrote:
> The syscall stats are used only when summary is requested. Let's avoid
> unnecessary operations. While at it, let's pass 'trace' pointer
> directly instead of passing 'output' file pointer and 'summary' option
> in the 'trace' separately.
root@number:~# perf probe -x ~/bin/perf intlist__new
Added new event:
probe_perf:intlist_new (on intlist__new in /home/acme/bin/perf)
You can now use it in all perf tools, such as:
perf record -e probe_perf:intlist_new -aR sleep 1
root@number:~# perf trace -e probe_perf:intlist_new perf trace -e *nanosleep sleep 1
0.000 (1000.184 ms): sleep/574971 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7ffda05c1be0) = 0
root@number:~# perf trace -e probe_perf:intlist_new perf trace -e *nanosleep --summary sleep 1
0.000 :574984/574984 probe_perf:intlist_new(__probe_ip: 6861748)
Summary of events:
sleep (574985), 2 events, 25.0%
syscall calls errors total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- ------ -------- --------- --------- --------- ------
clock_nanosleep 1 0 1000.205 1000.205 1000.205 1000.205 0.00%
root@number:~#
I'm trying to convince a colleague to have this short streamlined to
something like:
# perf trace -e ~/bin/perf/intlist_new()/ perf trace -e *nanosleep sleep 1
Or some other event syntax that allows us to specify compactly and
unambiguously that we want to put a probe if one isn't there yet and
inside the () to specify which of the arguments we want collected, etc,
to save the 'perf probe' step, adding the probe and removing it if it
isn't there or reusing a pre-existing, compatible one.
Anyway, for the patch tested:
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
- Arnaldo
> Acked-by: Howard Chu <howardchu95@gmail.com>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
> tools/perf/builtin-trace.c | 21 +++++++++++----------
> 1 file changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
> index ac97632f13dc8f7c..7e0324a2e9182088 100644
> --- a/tools/perf/builtin-trace.c
> +++ b/tools/perf/builtin-trace.c
> @@ -1522,13 +1522,14 @@ struct thread_trace {
> struct intlist *syscall_stats;
> };
>
> -static struct thread_trace *thread_trace__new(void)
> +static struct thread_trace *thread_trace__new(struct trace *trace)
> {
> struct thread_trace *ttrace = zalloc(sizeof(struct thread_trace));
>
> if (ttrace) {
> ttrace->files.max = -1;
> - ttrace->syscall_stats = intlist__new(NULL);
> + if (trace->summary)
> + ttrace->syscall_stats = intlist__new(NULL);
> }
>
> return ttrace;
> @@ -1550,7 +1551,7 @@ static void thread_trace__delete(void *pttrace)
> free(ttrace);
> }
>
> -static struct thread_trace *thread__trace(struct thread *thread, FILE *fp)
> +static struct thread_trace *thread__trace(struct thread *thread, struct trace *trace)
> {
> struct thread_trace *ttrace;
>
> @@ -1558,7 +1559,7 @@ static struct thread_trace *thread__trace(struct thread *thread, FILE *fp)
> goto fail;
>
> if (thread__priv(thread) == NULL)
> - thread__set_priv(thread, thread_trace__new());
> + thread__set_priv(thread, thread_trace__new(trace));
>
> if (thread__priv(thread) == NULL)
> goto fail;
> @@ -1568,7 +1569,7 @@ static struct thread_trace *thread__trace(struct thread *thread, FILE *fp)
>
> return ttrace;
> fail:
> - color_fprintf(fp, PERF_COLOR_RED,
> + color_fprintf(trace->output, PERF_COLOR_RED,
> "WARNING: not enough memory, dropping samples!\n");
> return NULL;
> }
> @@ -2622,7 +2623,7 @@ static int trace__sys_enter(struct trace *trace, struct evsel *evsel,
> return -1;
>
> thread = machine__findnew_thread(trace->host, sample->pid, sample->tid);
> - ttrace = thread__trace(thread, trace->output);
> + ttrace = thread__trace(thread, trace);
> if (ttrace == NULL)
> goto out_put;
>
> @@ -2699,7 +2700,7 @@ static int trace__fprintf_sys_enter(struct trace *trace, struct evsel *evsel,
> return -1;
>
> thread = machine__findnew_thread(trace->host, sample->pid, sample->tid);
> - ttrace = thread__trace(thread, trace->output);
> + ttrace = thread__trace(thread, trace);
> /*
> * We need to get ttrace just to make sure it is there when syscall__scnprintf_args()
> * and the rest of the beautifiers accessing it via struct syscall_arg touches it.
> @@ -2771,7 +2772,7 @@ static int trace__sys_exit(struct trace *trace, struct evsel *evsel,
> return -1;
>
> thread = machine__findnew_thread(trace->host, sample->pid, sample->tid);
> - ttrace = thread__trace(thread, trace->output);
> + ttrace = thread__trace(thread, trace);
> if (ttrace == NULL)
> goto out_put;
>
> @@ -2960,7 +2961,7 @@ static int trace__sched_stat_runtime(struct trace *trace, struct evsel *evsel,
> struct thread *thread = machine__findnew_thread(trace->host,
> sample->pid,
> sample->tid);
> - struct thread_trace *ttrace = thread__trace(thread, trace->output);
> + struct thread_trace *ttrace = thread__trace(thread, trace);
>
> if (ttrace == NULL)
> goto out_dump;
> @@ -3214,7 +3215,7 @@ static int trace__pgfault(struct trace *trace,
> }
> }
>
> - ttrace = thread__trace(thread, trace->output);
> + ttrace = thread__trace(thread, trace);
> if (ttrace == NULL)
> goto out_put;
>
> --
> 2.48.1.502.g6dc24dfdaf-goog
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 2/4] perf trace: Convert syscall_stats to hashmap
2025-02-05 20:54 ` [PATCH v3 2/4] perf trace: Convert syscall_stats to hashmap Namhyung Kim
@ 2025-02-12 20:54 ` Arnaldo Carvalho de Melo
0 siblings, 0 replies; 13+ messages in thread
From: Arnaldo Carvalho de Melo @ 2025-02-12 20:54 UTC (permalink / raw)
To: Namhyung Kim
Cc: Ian Rogers, Kan Liang, Jiri Olsa, Adrian Hunter, Peter Zijlstra,
Ingo Molnar, LKML, linux-perf-users, Howard Chu
On Wed, Feb 05, 2025 at 12:54:41PM -0800, Namhyung Kim wrote:
> It was using a RBtree-based int-list as a hash and a custom resort
> logic for that. As we have hashmap, let's convert to it and add a
> custom sort function for the hashmap entries using an array. It
> should be faster and more light-weighted. It's also to prepare
> supporting system-wide syscall stats.
>
> No functional changes intended.
>
> Acked-by: Howard Chu <howardchu95@gmail.com>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
> tools/perf/builtin-trace.c | 117 ++++++++++++++++++++++++++++---------
> 1 file changed, 88 insertions(+), 29 deletions(-)
>
> diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
> index 7e0324a2e9182088..5e37f05737b75a14 100644
> --- a/tools/perf/builtin-trace.c
> +++ b/tools/perf/builtin-trace.c
> @@ -39,6 +39,7 @@
> #include "util/synthetic-events.h"
> #include "util/evlist.h"
> #include "util/evswitch.h"
> +#include "util/hashmap.h"
> #include "util/mmap.h"
> #include <subcmd/pager.h>
> #include <subcmd/exec-cmd.h>
> @@ -63,7 +64,6 @@
> #include "print_binary.h"
> #include "string2.h"
> #include "syscalltbl.h"
> -#include "rb_resort.h"
> #include "../perf.h"
> #include "trace_augment.h"
>
> @@ -1519,17 +1519,50 @@ struct thread_trace {
> struct file *table;
> } files;
>
> - struct intlist *syscall_stats;
> + struct hashmap *syscall_stats;
> };
>
> +static size_t syscall_id_hash(long key, void *ctx __maybe_unused)
> +{
> + return key;
> +}
> +
> +static bool syscall_id_equal(long key1, long key2, void *ctx __maybe_unused)
> +{
> + return key1 == key2;
> +}
> +
> +static struct hashmap *alloc_syscall_stats(void)
> +{
> + return hashmap__new(syscall_id_hash, syscall_id_equal, NULL);
> +}
> +
> +static void delete_syscall_stats(struct hashmap *syscall_stats)
> +{
> + struct hashmap_entry *pos;
> + size_t bkt;
> +
> + if (syscall_stats == NULL)
> + return;
> +
> + hashmap__for_each_entry(syscall_stats, pos, bkt)
> + free(pos->pvalue);
nit:
zfree(&pos->pvalue);
> + hashmap__free(syscall_stats);
> +}
> +
> static struct thread_trace *thread_trace__new(struct trace *trace)
> {
> struct thread_trace *ttrace = zalloc(sizeof(struct thread_trace));
>
> if (ttrace) {
> ttrace->files.max = -1;
> - if (trace->summary)
> - ttrace->syscall_stats = intlist__new(NULL);
> + if (trace->summary) {
> + ttrace->syscall_stats = alloc_syscall_stats();
> + if (IS_ERR(ttrace->syscall_stats)) {
> + free(ttrace);
> + ttrace = NULL;
nit: zfree(&ttrace);
> + }
> + }
> }
>
> return ttrace;
> @@ -1544,7 +1577,7 @@ static void thread_trace__delete(void *pttrace)
> if (!ttrace)
> return;
>
> - intlist__delete(ttrace->syscall_stats);
> + delete_syscall_stats(ttrace->syscall_stats);
> ttrace->syscall_stats = NULL;
> thread_trace__free_files(ttrace);
> zfree(&ttrace->entry_str);
> @@ -2463,22 +2496,19 @@ struct syscall_stats {
> static void thread__update_stats(struct thread *thread, struct thread_trace *ttrace,
> int id, struct perf_sample *sample, long err, bool errno_summary)
> {
> - struct int_node *inode;
> - struct syscall_stats *stats;
> + struct syscall_stats *stats = NULL;
> u64 duration = 0;
>
> - inode = intlist__findnew(ttrace->syscall_stats, id);
> - if (inode == NULL)
> - return;
> -
> - stats = inode->priv;
> - if (stats == NULL) {
> + if (!hashmap__find(ttrace->syscall_stats, id, &stats)) {
> stats = zalloc(sizeof(*stats));
> if (stats == NULL)
> return;
>
> init_stats(&stats->stats);
> - inode->priv = stats;
> + if (hashmap__add(ttrace->syscall_stats, id, stats) < 0) {
> + free(stats);
> + return;
> + }
> }
>
> if (ttrace->entry_time && sample->time > ttrace->entry_time)
> @@ -4617,18 +4647,45 @@ static size_t trace__fprintf_threads_header(FILE *fp)
> return printed;
> }
>
> -DEFINE_RESORT_RB(syscall_stats, a->msecs > b->msecs,
> +struct syscall_entry {
> struct syscall_stats *stats;
> double msecs;
> int syscall;
> -)
> +};
> +
> +static int entry_cmp(const void *e1, const void *e2)
> +{
> + const struct syscall_entry *entry1 = e1;
> + const struct syscall_entry *entry2 = e2;
> +
> + return entry1->msecs > entry2->msecs ? -1 : 1;
> +}
> +
> +static struct syscall_entry *thread__sort_stats(struct thread_trace *ttrace)
> {
> - struct int_node *source = rb_entry(nd, struct int_node, rb_node);
> - struct syscall_stats *stats = source->priv;
> + struct syscall_entry *entry;
> + struct hashmap_entry *pos;
> + unsigned bkt, i, nr;
> +
> + nr = ttrace->syscall_stats->sz;
> + entry = malloc(nr * sizeof(*entry));
> + if (entry == NULL)
> + return NULL;
> +
> + i = 0;
> + hashmap__for_each_entry(ttrace->syscall_stats, pos, bkt) {
> + struct syscall_stats *ss = pos->pvalue;
> + struct stats *st = &ss->stats;
>
> - entry->syscall = source->i;
> - entry->stats = stats;
> - entry->msecs = stats ? (u64)stats->stats.n * (avg_stats(&stats->stats) / NSEC_PER_MSEC) : 0;
> + entry[i].stats = ss;
> + entry[i].msecs = (u64)st->n * (avg_stats(st) / NSEC_PER_MSEC);
> + entry[i].syscall = pos->key;
> + i++;
> + }
> + assert(i == nr);
> +
> + qsort(entry, nr, sizeof(*entry), entry_cmp);
> + return entry;
> }
>
> static size_t thread__dump_stats(struct thread_trace *ttrace,
> @@ -4636,10 +4693,10 @@ static size_t thread__dump_stats(struct thread_trace *ttrace,
> {
> size_t printed = 0;
> struct syscall *sc;
> - struct rb_node *nd;
> - DECLARE_RESORT_RB_INTLIST(syscall_stats, ttrace->syscall_stats);
> + struct syscall_entry *entries;
>
> - if (syscall_stats == NULL)
> + entries = thread__sort_stats(ttrace);
> + if (entries == NULL)
> return 0;
>
> printed += fprintf(fp, "\n");
> @@ -4648,8 +4705,10 @@ static size_t thread__dump_stats(struct thread_trace *ttrace,
> printed += fprintf(fp, " (msec) (msec) (msec) (msec) (%%)\n");
> printed += fprintf(fp, " --------------- -------- ------ -------- --------- --------- --------- ------\n");
>
> - resort_rb__for_each_entry(nd, syscall_stats) {
> - struct syscall_stats *stats = syscall_stats_entry->stats;
> + for (size_t i = 0; i < ttrace->syscall_stats->sz; i++) {
> + struct syscall_entry *entry = &entries[i];
> + struct syscall_stats *stats = entry->stats;
> +
> if (stats) {
> double min = (double)(stats->stats.min) / NSEC_PER_MSEC;
> double max = (double)(stats->stats.max) / NSEC_PER_MSEC;
> @@ -4660,10 +4719,10 @@ static size_t thread__dump_stats(struct thread_trace *ttrace,
> pct = avg ? 100.0 * stddev_stats(&stats->stats) / avg : 0.0;
> avg /= NSEC_PER_MSEC;
>
> - sc = &trace->syscalls.table[syscall_stats_entry->syscall];
> + sc = &trace->syscalls.table[entry->syscall];
> printed += fprintf(fp, " %-15s", sc->name);
> printed += fprintf(fp, " %8" PRIu64 " %6" PRIu64 " %9.3f %9.3f %9.3f",
> - n, stats->nr_failures, syscall_stats_entry->msecs, min, avg);
> + n, stats->nr_failures, entry->msecs, min, avg);
> printed += fprintf(fp, " %9.3f %9.2f%%\n", max, pct);
>
> if (trace->errno_summary && stats->nr_failures) {
> @@ -4677,7 +4736,7 @@ static size_t thread__dump_stats(struct thread_trace *ttrace,
> }
> }
>
> - resort_rb__delete(syscall_stats);
> + free(entries);
> printed += fprintf(fp, "\n\n");
>
> return printed;
> --
> 2.48.1.502.g6dc24dfdaf-goog
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 0/4] perf trace: Add --summary-mode option
2025-02-12 18:49 ` Namhyung Kim
@ 2025-02-12 20:59 ` Arnaldo Carvalho de Melo
2025-02-12 22:10 ` Namhyung Kim
0 siblings, 1 reply; 13+ messages in thread
From: Arnaldo Carvalho de Melo @ 2025-02-12 20:59 UTC (permalink / raw)
To: Namhyung Kim
Cc: Howard Chu, Ian Rogers, Kan Liang, Jiri Olsa, Adrian Hunter,
Peter Zijlstra, Ingo Molnar, LKML, linux-perf-users
On Wed, Feb 12, 2025 at 10:49:44AM -0800, Namhyung Kim wrote:
> On Fri, Feb 07, 2025 at 09:25:02PM -0800, Howard Chu wrote:
> > Hello Namhyung,
> >
> > On Wed, Feb 5, 2025 at 12:54 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > Hello,
> > >
> > > I've realized that perf trace shows system call summary at the end for
> > > each thread. But sometimes users want to see the global whole system
> > > summary or statistics instead.
> > >
> > > So I've added --summary-mode option like below:
> > >
> > > $ sudo ./perf trace -as --summary-mode=total sleep 1
> > >
> > > Summary of events:
> > >
> > > total, 21580 events
> > >
> > > syscall calls errors total min avg max stddev
> > > (msec) (msec) (msec) (msec) (%)
> > > --------------- -------- ------ -------- --------- --------- --------- ------
> > > epoll_wait 1305 0 14716.712 0.000 11.277 551.529 8.87%
> > > futex 1256 89 13331.197 0.000 10.614 733.722 15.49%
> > > poll 669 0 6806.618 0.000 10.174 459.316 11.77%
> > > ppoll 220 0 3968.797 0.000 18.040 516.775 25.35%
> > > clock_nanosleep 1 0 1000.027 1000.027 1000.027 1000.027 0.00%
> > > epoll_pwait 21 0 592.783 0.000 28.228 522.293 88.29%
> > > nanosleep 16 0 60.515 0.000 3.782 10.123 33.33%
> > > ioctl 510 0 4.284 0.001 0.008 0.182 8.84%
> > > recvmsg 1434 775 3.497 0.001 0.002 0.174 6.37%
> > > write 1393 0 2.854 0.001 0.002 0.017 1.79%
> > > read 1063 100 2.236 0.000 0.002 0.083 5.11%
> > > ...
> > >
> > > Also it changes internal data structure to hash table to track
> > > statistics of syscalls. And removes the rb_resort code.
> > >
> > > v3 changes)
> > > * Add Acked-by from Howard
> > > * Update commit messages (Ian)
> >
> > Sorry for the delay.
> >
> > Reviewed-by: Howard Chu <howardchu95@gmail.com>
>
> Thanks for your review!
>
> Arnaldo and Ian, are you ok with this?
I looked at it quickly, some minor nits, apart from that, for the
series:
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
- Arnaldo
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 0/4] perf trace: Add --summary-mode option
2025-02-12 20:59 ` Arnaldo Carvalho de Melo
@ 2025-02-12 22:10 ` Namhyung Kim
0 siblings, 0 replies; 13+ messages in thread
From: Namhyung Kim @ 2025-02-12 22:10 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Howard Chu, Ian Rogers, Kan Liang, Jiri Olsa, Adrian Hunter,
Peter Zijlstra, Ingo Molnar, LKML, linux-perf-users
On Wed, Feb 12, 2025 at 09:59:26PM +0100, Arnaldo Carvalho de Melo wrote:
> On Wed, Feb 12, 2025 at 10:49:44AM -0800, Namhyung Kim wrote:
> > On Fri, Feb 07, 2025 at 09:25:02PM -0800, Howard Chu wrote:
> > > Hello Namhyung,
> > >
> > > On Wed, Feb 5, 2025 at 12:54 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > >
> > > > Hello,
> > > >
> > > > I've realized that perf trace shows system call summary at the end for
> > > > each thread. But sometimes users want to see the global whole system
> > > > summary or statistics instead.
> > > >
> > > > So I've added --summary-mode option like below:
> > > >
> > > > $ sudo ./perf trace -as --summary-mode=total sleep 1
> > > >
> > > > Summary of events:
> > > >
> > > > total, 21580 events
> > > >
> > > > syscall calls errors total min avg max stddev
> > > > (msec) (msec) (msec) (msec) (%)
> > > > --------------- -------- ------ -------- --------- --------- --------- ------
> > > > epoll_wait 1305 0 14716.712 0.000 11.277 551.529 8.87%
> > > > futex 1256 89 13331.197 0.000 10.614 733.722 15.49%
> > > > poll 669 0 6806.618 0.000 10.174 459.316 11.77%
> > > > ppoll 220 0 3968.797 0.000 18.040 516.775 25.35%
> > > > clock_nanosleep 1 0 1000.027 1000.027 1000.027 1000.027 0.00%
> > > > epoll_pwait 21 0 592.783 0.000 28.228 522.293 88.29%
> > > > nanosleep 16 0 60.515 0.000 3.782 10.123 33.33%
> > > > ioctl 510 0 4.284 0.001 0.008 0.182 8.84%
> > > > recvmsg 1434 775 3.497 0.001 0.002 0.174 6.37%
> > > > write 1393 0 2.854 0.001 0.002 0.017 1.79%
> > > > read 1063 100 2.236 0.000 0.002 0.083 5.11%
> > > > ...
> > > >
> > > > Also it changes internal data structure to hash table to track
> > > > statistics of syscalls. And removes the rb_resort code.
> > > >
> > > > v3 changes)
> > > > * Add Acked-by from Howard
> > > > * Update commit messages (Ian)
> > >
> > > Sorry for the delay.
> > >
> > > Reviewed-by: Howard Chu <howardchu95@gmail.com>
> >
> > Thanks for your review!
> >
> > Arnaldo and Ian, are you ok with this?
>
> I looked at it quickly, some minor nits, apart from that, for the
> series:
>
> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Thanks, I'll update them.
Namhyung
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 1/4] perf trace: Allocate syscall stats only if summary is on
2025-02-12 20:50 ` Arnaldo Carvalho de Melo
@ 2025-02-13 3:39 ` Namhyung Kim
0 siblings, 0 replies; 13+ messages in thread
From: Namhyung Kim @ 2025-02-13 3:39 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Ian Rogers, Kan Liang, Jiri Olsa, Adrian Hunter, Peter Zijlstra,
Ingo Molnar, LKML, linux-perf-users, Howard Chu, Wander Costa
On Wed, Feb 12, 2025 at 09:50:32PM +0100, Arnaldo Carvalho de Melo wrote:
> On Wed, Feb 05, 2025 at 12:54:40PM -0800, Namhyung Kim wrote:
> > The syscall stats are used only when summary is requested. Let's avoid
> > unnecessary operations. While at it, let's pass 'trace' pointer
> > directly instead of passing 'output' file pointer and 'summary' option
> > in the 'trace' separately.
>
> root@number:~# perf probe -x ~/bin/perf intlist__new
> Added new event:
> probe_perf:intlist_new (on intlist__new in /home/acme/bin/perf)
>
> You can now use it in all perf tools, such as:
>
> perf record -e probe_perf:intlist_new -aR sleep 1
>
> root@number:~# perf trace -e probe_perf:intlist_new perf trace -e *nanosleep sleep 1
> 0.000 (1000.184 ms): sleep/574971 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7ffda05c1be0) = 0
> root@number:~# perf trace -e probe_perf:intlist_new perf trace -e *nanosleep --summary sleep 1
> 0.000 :574984/574984 probe_perf:intlist_new(__probe_ip: 6861748)
>
> Summary of events:
>
> sleep (574985), 2 events, 25.0%
>
> syscall calls errors total min avg max stddev
> (msec) (msec) (msec) (msec) (%)
> --------------- -------- ------ -------- --------- --------- --------- ------
> clock_nanosleep 1 0 1000.205 1000.205 1000.205 1000.205 0.00%
>
> root@number:~#
>
> I'm trying to convince a colleague to have this short streamlined to
> something like:
>
> # perf trace -e ~/bin/perf/intlist_new()/ perf trace -e *nanosleep sleep 1
>
> Or some other event syntax that allows us to specify compactly and
> unambiguously that we want to put a probe if one isn't there yet and
> inside the () to specify which of the arguments we want collected, etc,
> to save the 'perf probe' step, adding the probe and removing it if it
> isn't there or reusing a pre-existing, compatible one.
Yep, probably we need a syntax without '/' since it'd be confused by
the path separators.
>
> Anyway, for the patch tested:
>
> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Thanks,
Namhyung
>
> - Arnaldo
>
> > Acked-by: Howard Chu <howardchu95@gmail.com>
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> > tools/perf/builtin-trace.c | 21 +++++++++++----------
> > 1 file changed, 11 insertions(+), 10 deletions(-)
> >
> > diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
> > index ac97632f13dc8f7c..7e0324a2e9182088 100644
> > --- a/tools/perf/builtin-trace.c
> > +++ b/tools/perf/builtin-trace.c
> > @@ -1522,13 +1522,14 @@ struct thread_trace {
> > struct intlist *syscall_stats;
> > };
> >
> > -static struct thread_trace *thread_trace__new(void)
> > +static struct thread_trace *thread_trace__new(struct trace *trace)
> > {
> > struct thread_trace *ttrace = zalloc(sizeof(struct thread_trace));
> >
> > if (ttrace) {
> > ttrace->files.max = -1;
> > - ttrace->syscall_stats = intlist__new(NULL);
> > + if (trace->summary)
> > + ttrace->syscall_stats = intlist__new(NULL);
> > }
> >
> > return ttrace;
> > @@ -1550,7 +1551,7 @@ static void thread_trace__delete(void *pttrace)
> > free(ttrace);
> > }
> >
> > -static struct thread_trace *thread__trace(struct thread *thread, FILE *fp)
> > +static struct thread_trace *thread__trace(struct thread *thread, struct trace *trace)
> > {
> > struct thread_trace *ttrace;
> >
> > @@ -1558,7 +1559,7 @@ static struct thread_trace *thread__trace(struct thread *thread, FILE *fp)
> > goto fail;
> >
> > if (thread__priv(thread) == NULL)
> > - thread__set_priv(thread, thread_trace__new());
> > + thread__set_priv(thread, thread_trace__new(trace));
> >
> > if (thread__priv(thread) == NULL)
> > goto fail;
> > @@ -1568,7 +1569,7 @@ static struct thread_trace *thread__trace(struct thread *thread, FILE *fp)
> >
> > return ttrace;
> > fail:
> > - color_fprintf(fp, PERF_COLOR_RED,
> > + color_fprintf(trace->output, PERF_COLOR_RED,
> > "WARNING: not enough memory, dropping samples!\n");
> > return NULL;
> > }
> > @@ -2622,7 +2623,7 @@ static int trace__sys_enter(struct trace *trace, struct evsel *evsel,
> > return -1;
> >
> > thread = machine__findnew_thread(trace->host, sample->pid, sample->tid);
> > - ttrace = thread__trace(thread, trace->output);
> > + ttrace = thread__trace(thread, trace);
> > if (ttrace == NULL)
> > goto out_put;
> >
> > @@ -2699,7 +2700,7 @@ static int trace__fprintf_sys_enter(struct trace *trace, struct evsel *evsel,
> > return -1;
> >
> > thread = machine__findnew_thread(trace->host, sample->pid, sample->tid);
> > - ttrace = thread__trace(thread, trace->output);
> > + ttrace = thread__trace(thread, trace);
> > /*
> > * We need to get ttrace just to make sure it is there when syscall__scnprintf_args()
> > * and the rest of the beautifiers accessing it via struct syscall_arg touches it.
> > @@ -2771,7 +2772,7 @@ static int trace__sys_exit(struct trace *trace, struct evsel *evsel,
> > return -1;
> >
> > thread = machine__findnew_thread(trace->host, sample->pid, sample->tid);
> > - ttrace = thread__trace(thread, trace->output);
> > + ttrace = thread__trace(thread, trace);
> > if (ttrace == NULL)
> > goto out_put;
> >
> > @@ -2960,7 +2961,7 @@ static int trace__sched_stat_runtime(struct trace *trace, struct evsel *evsel,
> > struct thread *thread = machine__findnew_thread(trace->host,
> > sample->pid,
> > sample->tid);
> > - struct thread_trace *ttrace = thread__trace(thread, trace->output);
> > + struct thread_trace *ttrace = thread__trace(thread, trace);
> >
> > if (ttrace == NULL)
> > goto out_dump;
> > @@ -3214,7 +3215,7 @@ static int trace__pgfault(struct trace *trace,
> > }
> > }
> >
> > - ttrace = thread__trace(thread, trace->output);
> > + ttrace = thread__trace(thread, trace);
> > if (ttrace == NULL)
> > goto out_put;
> >
> > --
> > 2.48.1.502.g6dc24dfdaf-goog
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 0/4] perf trace: Add --summary-mode option
2025-02-05 20:54 [PATCH v3 0/4] perf trace: Add --summary-mode option Namhyung Kim
` (4 preceding siblings ...)
2025-02-08 5:25 ` [PATCH v3 0/4] " Howard Chu
@ 2025-02-13 17:21 ` Namhyung Kim
5 siblings, 0 replies; 13+ messages in thread
From: Namhyung Kim @ 2025-02-13 17:21 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang, Namhyung Kim
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users, Howard Chu
On Wed, 05 Feb 2025 12:54:39 -0800, Namhyung Kim wrote:
> I've realized that perf trace shows system call summary at the end for
> each thread. But sometimes users want to see the global whole system
> summary or statistics instead.
>
> So I've added --summary-mode option like below:
>
> $ sudo ./perf trace -as --summary-mode=total sleep 1
>
> [...]
Applied to perf-tools-next, thanks!
Best regards,
Namhyung
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2025-02-13 17:21 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-05 20:54 [PATCH v3 0/4] perf trace: Add --summary-mode option Namhyung Kim
2025-02-05 20:54 ` [PATCH v3 1/4] perf trace: Allocate syscall stats only if summary is on Namhyung Kim
2025-02-12 20:50 ` Arnaldo Carvalho de Melo
2025-02-13 3:39 ` Namhyung Kim
2025-02-05 20:54 ` [PATCH v3 2/4] perf trace: Convert syscall_stats to hashmap Namhyung Kim
2025-02-12 20:54 ` Arnaldo Carvalho de Melo
2025-02-05 20:54 ` [PATCH v3 3/4] perf tools: Get rid of now-unused rb_resort.h Namhyung Kim
2025-02-05 20:54 ` [PATCH v3 4/4] perf trace: Add --summary-mode option Namhyung Kim
2025-02-08 5:25 ` [PATCH v3 0/4] " Howard Chu
2025-02-12 18:49 ` Namhyung Kim
2025-02-12 20:59 ` Arnaldo Carvalho de Melo
2025-02-12 22:10 ` Namhyung Kim
2025-02-13 17:21 ` Namhyung Kim
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).