From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Dmitry Vyukov <dvyukov@google.com>
Cc: Namhyung Kim <namhyung@kernel.org>,
irogers@google.com, linux-perf-users@vger.kernel.org,
linux-kernel@vger.kernel.org, eranian@google.com,
Ingo Molnar <mingo@kernel.org>,
Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH v2] perf report: Add wall-clock and parallelism profiling
Date: Tue, 14 Jan 2025 14:52:59 -0300 [thread overview]
Message-ID: <Z4akewi7UPXpagce@x1> (raw)
In-Reply-To: <CACT4Y+b6nbqNHt1MwT_KfWjNrMA_1qvMD=Gkk9gfZNXLL79xdQ@mail.gmail.com>
On Tue, Jan 14, 2025 at 05:07:14PM +0100, Dmitry Vyukov wrote:
> On Tue, 14 Jan 2025 at 16:57, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > On Tue, Jan 14, 2025 at 09:26:57AM +0100, Dmitry Vyukov wrote:
> > > On Tue, 14 Jan 2025 at 02:51, Namhyung Kim <namhyung@kernel.org> wrote:
> > > > I understand your point but I think it has some limitation so maybe it's
> > > > better to put in a separate mode with special flags.
> > > I hate all options.
> > > The 3rd option is to add a _mandary_ flag to perf record to ask for
> > > profiling mode. Then the old "perf record" will say "nope, you need to
> > > choose, here are your options and explanation of the options".
> > Well, we have a line for tips about perf usage at the botton, for
> > instance, when I ran 'perf report' to check it it showed:
> > Tip: Add -I to perf record to sample register values, which will be visible in perf report sample context.
> > We could add a line with:
> > Tip: Use '-s wallclock' to get wallclock/latency profiling
> Thanks for the review!
> This is an option too.
> I am not very strong about the exact option we choose, I understand
> what I proposed has downsides too.
> I am ready to go with a solution we will agree on here.
> > And when using it we could have another tip:
> >
> > Tip: Use 'perf config report.mode=wallclock' to change the default
> >
> > ⬢ [acme@toolbox pahole]$ perf config report.mode=wallclock
> > ⬢ [acme@toolbox pahole]$ perf config report.mode
> > report.mode=wallclock
> > ⬢ [acme@toolbox pahole]$
> >
> > Or something along these lines.
> > Or we could have a popup that would appear with a "Important notice",
> > explaining the wallclock mode and asking the user to opt-in to having
> > both, just one or keep the old mode, that would be a quick, temporary
> > "annoyance" for people used with the current mode while bringing this
> > cool new mode to the attention of users.
> Is there an example of doing something similar in the code?
Sure, for the config part look at "report.children", that is processed
in tools/perf/builtin-report.c report__config() function:
if (!strcmp(var, "report.children")) {
symbol_conf.cumulate_callchain = perf_config_bool(var, value);
return 0;
}
To test, capture with callchains:
root@number:~# perf record -g -a -e cpu_core/cycles/P sleep 5
[ perf record: Woken up 97 times to write data ]
[ perf record: Captured and wrote 28.084 MB perf.data (194988 samples) ]
root@number:~#
root@number:~# perf evlist
cpu_core/cycles/P
dummy:u
root@number:~#
The default ends up being:
root@number:~# perf report
Samples: 194K of event 'cpu_core/cycles/P', Event count (approx.): 242759929168
Children Self Command Shared Object Symbol
+ 9.25% 0.00% cc1 [unknown] [.] 0000000000000000
+ 7.23% 0.05% cc1 [kernel.kallsyms] [k] entry_SYSCALL_64
+ 7.09% 0.02% cc1 [kernel.kallsyms] [k] do_syscall_64
+ 4.43% 0.02% cc1 [kernel.kallsyms] [k] asm_exc_page_fault
+ 3.89% 0.06% cc1 libc.so.6 [.] __GI___getcwd
The other mode:
root@number:~# perf report --no-children
Samples: 194K of event 'cpu_core/cycles/P', Event count (approx.): 242759929168
Overhead Command Shared Object Symbol
+ 2.43% cc1 cc1 [.] ht_lookup_with_hash(ht*, unsigned char const*, unsigned long, unsigned int, ht_lookup_option)
+ 1.62% cc1 cc1 [.] _cpp_lex_direct
+ 1.34% cc1 cc1 [.] iterative_hash
+ 1.09% cc1 cc1 [.] bitmap_set_bit(bitmap_head*, int)
+ 0.98% cc1 libc.so.6 [.] sysmalloc
+ 0.97% cc1 libc.so.6 [.] _int_malloc
So people not liking the default do:
root@number:~# perf config report.children=no
root@number:~# perf config report.children
report.children=no
root@number:~#
To avoid having to use --no-children.
See tools/perf/Documentation/perf-config.txt for more of these config
options.
We don't have the popup that sets the ~/.perfconfig variable, but I
think it is easily possible using ui_browser__dialog_yesno()... ok, look
at the patch below, with it:
root@x1:~# rm -f ~/.perfconfig
root@x1:~# perf config annotate.disassemblers=objdump,llvm
root@x1:~# cat ~/.perfconfig
# this file is auto-generated.
[annotate]
disassemblers = objdump,llvm
root@x1:~# perf report
root@x1:~# cat ~/.perfconfig
# this file is auto-generated.
[annotate]
disassemblers = objdump,llvm
[report]
wallclock = yes
root@x1:~#
So with it when the user doesn't have that "report.wallclock=yes" (or
"no") set, i.e. everybody, the question will be made, once answered it
will not be made again.
> > Another idea, more general, is to be able to press some hotkey that
> > would toggle available sort keys, like we have, for instance, 'k' to
> > toggle the 'Shared object' column in the default 'perf report/top'
> > TUI while filtering out non-kernel samples.
> >
> > But my initial feeling is that you're right and we want this wallclock
> > column on by default, its just a matter of finding a convenient/easy way
> > for users to opt-in.
> >
> > Being able to quickly toggle ordering by the Overhead/Wallclock columns
> > also looks useful.
>
> I've tried to do something similar, e.g. filtering by parallelism
> level on-the-fly, thus this patch:
> https://lore.kernel.org/linux-perf-users/CACT4Y+b8X7PnUwQtuU2QXSqumNDbN8pWDm8EX+wnvgNAKbW0xw@mail.gmail.com/T/#t
> But it turned out the reload functionality is more broken than working.
>
> I agree it would be nice to have these usability improvements.
Right, and I'm not talking about those as a requirement for your patch
to get accepted.
> But this is also a large change, so I would prefer to merge the basic
> functionality, and then we can do some improvements on top. This is
> also the first time I am touching perf code, there are too many new
> things to learn for me.
I'm happy you made the effort and will try to help you as much as I can.
I agree that, with the patch below, that I'll try to split and merge
before travelling, we'll have a good starting point.
Here is the patch, I'll split the perf_config__set_variable into a
separate patch and merge, so that you can continue from there. The
"<LONGER EXPLANATION>" part should tell the user that 'perf config
report.workload=yes|no' can be done later, to switch the mode, if one
doesn't want to continue with it as default.
Thanks,
- Arnaldo
diff --git a/tools/perf/builtin-config.c b/tools/perf/builtin-config.c
index 2e8363778935e8d5..45b5312fbe8370e8 100644
--- a/tools/perf/builtin-config.c
+++ b/tools/perf/builtin-config.c
@@ -154,6 +154,44 @@ static int parse_config_arg(char *arg, char **var, char **value)
return 0;
}
+int perf_config__set_variable(const char *var, const char *value)
+{
+ char path[PATH_MAX];
+ char *user_config = mkpath(path, sizeof(path), "%s/.perfconfig", getenv("HOME"));
+ const char *config_filename;
+ struct perf_config_set *set;
+ int ret = -1;
+
+ if (use_system_config)
+ config_exclusive_filename = perf_etc_perfconfig();
+ else if (use_user_config)
+ config_exclusive_filename = user_config;
+
+ if (!config_exclusive_filename)
+ config_filename = user_config;
+ else
+ config_filename = config_exclusive_filename;
+
+ set = perf_config_set__new();
+ if (!set)
+ goto out_err;
+
+ if (perf_config_set__collect(set, config_filename, var, value) < 0) {
+ pr_err("Failed to add '%s=%s'\n", var, value);
+ goto out_err;
+ }
+
+ if (set_config(set, config_filename) < 0) {
+ pr_err("Failed to set the configs on %s\n", config_filename);
+ goto out_err;
+ }
+
+ ret = 0;
+out_err:
+ perf_config_set__delete(set);
+ return ret;
+}
+
int cmd_config(int argc, const char **argv)
{
int i, ret = -1;
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index f5fbd670d619a32a..ccd6eef8ece6e238 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -50,6 +50,7 @@
#include "util/units.h"
#include "util/util.h" // perf_tip()
#include "ui/ui.h"
+#include "ui/util.h"
#include "ui/progress.h"
#include "util/block-info.h"
@@ -99,6 +100,8 @@ struct report {
bool disable_order;
bool skip_empty;
bool data_type;
+ bool wallclock_config_set;
+ bool use_wallclock;
int max_stack;
struct perf_read_values show_threads_values;
const char *pretty_printing_style;
@@ -151,6 +154,11 @@ static int report__config(const char *var, const char *value, void *cb)
}
return 0;
}
+ if (!strcmp(var, "report.wallclock")) {
+ rep->use_wallclock = perf_config_bool(var, value);
+ rep->wallclock_config_set = true;
+ return 0;
+ }
if (!strcmp(var, "report.skip-empty")) {
rep->skip_empty = perf_config_bool(var, value);
@@ -1089,6 +1097,15 @@ static int __cmd_report(struct report *rep)
report__warn_kptr_restrict(rep);
+ if (!rep->wallclock_config_set) {
+ if (ui__dialog_yesno("Do you want to use wallclock/latency mode?\n"
+ "<LONGER EXPLANATION>\n")) {
+ rep->use_wallclock = true;
+ // do other prep to add the "wallclock" key to the sort order, etc
+ perf_config__set_variable("report.wallclock", "yes");
+ }
+ }
+
evlist__for_each_entry(session->evlist, pos)
rep->nr_entries += evsel__hists(pos)->nr_entries;
diff --git a/tools/perf/util/config.h b/tools/perf/util/config.h
index 9971313d61c1e5c6..a727c95cb1195e06 100644
--- a/tools/perf/util/config.h
+++ b/tools/perf/util/config.h
@@ -50,6 +50,7 @@ int perf_config_set__collect(struct perf_config_set *set, const char *file_name,
const char *var, const char *value);
void perf_config__exit(void);
void perf_config__refresh(void);
+int perf_config__set_variable(const char *var, const char *value);
/**
* perf_config_sections__for_each - iterate thru all the sections
next prev parent reply other threads:[~2025-01-14 17:53 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-08 8:24 [PATCH] tools/perf: Add wall-clock and parallelism profiling Dmitry Vyukov
2025-01-08 8:34 ` Dmitry Vyukov
2025-01-13 12:25 ` Dmitry Vyukov
2025-01-13 13:40 ` [PATCH v2] perf report: " Dmitry Vyukov
2025-01-14 1:51 ` Namhyung Kim
2025-01-14 8:26 ` Dmitry Vyukov
2025-01-14 15:56 ` Arnaldo Carvalho de Melo
2025-01-14 16:07 ` Dmitry Vyukov
2025-01-14 17:52 ` Arnaldo Carvalho de Melo [this message]
2025-01-14 18:16 ` Arnaldo Carvalho de Melo
2025-01-19 10:22 ` Dmitry Vyukov
2025-01-15 0:30 ` Namhyung Kim
2025-01-19 10:50 ` Dmitry Vyukov
2025-01-24 10:46 ` Dmitry Vyukov
2025-01-24 17:56 ` Namhyung Kim
2025-01-15 5:59 ` Ian Rogers
2025-01-15 7:11 ` Dmitry Vyukov
2025-01-16 18:55 ` Namhyung Kim
2025-01-19 11:08 ` Off-CPU sampling (was perf report: Add wall-clock and parallelism profiling) Dmitry Vyukov
2025-01-23 23:34 ` Namhyung Kim
2025-01-24 19:02 ` [PATCH v2] perf report: Add wall-clock and parallelism profiling Namhyung Kim
2025-01-27 10:01 ` Dmitry Vyukov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z4akewi7UPXpagce@x1 \
--to=acme@kernel.org \
--cc=dvyukov@google.com \
--cc=eranian@google.com \
--cc=irogers@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=namhyung@kernel.org \
--cc=peterz@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.