From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 40D5B1FAC5D; Tue, 14 Jan 2025 17:53:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736877183; cv=none; b=i7MhAfZJU+NHob/pYg5v0DQEc2bkLtp94UldWBtZCno9TmDgVkqCag0XX5jFSqCVDa3H9uyUdC7Dm6Lx9BSgXCGpcYXk5s8GLR54K1cn2YUO2v+/Q962RZCzosAy7tMZZ0Ku5cxO0w00kNnCh95RyIhOwpr4d4tRfqT2nL2pRzc= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736877183; c=relaxed/simple; bh=UmyDMCIBlDbXeJ4AwTy+f4YgJo3t/KvwozdKaaE/URc=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=uk0diIdvUegFcfRH6ZcriEpsEPCn2tJlqIEVbwIYQESTbwVGBOOsvFZqMUH1VvYsR01cQ7mJcd/AfjUlhh6Eg1GvQE1zc3z4/YyZtuF3EfaeyphpTN0EXL2gGnMjq2nk5O2l+BiHIo0amwdGo1Kl0trKLjR1L04dThU+hcQkzwQ= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=WTKDh3gC; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="WTKDh3gC" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 22F31C4CEED; Tue, 14 Jan 2025 17:53:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1736877182; bh=UmyDMCIBlDbXeJ4AwTy+f4YgJo3t/KvwozdKaaE/URc=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=WTKDh3gC0GpaRMDMm1qsbZ+hJJ7X0K1IiTNs3GAXj4CUue38X7j0An3TKVgLHEIi9 sslCsbM7vh5RlNkhLnN2GSYMK014OJQdxXbyDBIgu3EWyjtrOa95Ocovw074dv/RsM jmVqpA3xMFb9JqLd1/ITk0YMIGYyugzJ/D6tE8+IoeRCihAbRIUtmboHjxspHDepOM +kVVCGHLSq5gmoqijUNqVKIdyXSQmmx9gJdCW8YBwNQMW4Bty8j0CQn8aNJKAoboeV D6IdW07+eBQBBxjbJV+r627rEjiRmF0KfzBZSXGR6BpSmesZ7WL8TbUNhh6DeJAdNk xq8JfqnM3BNBQ== Date: Tue, 14 Jan 2025 14:52:59 -0300 From: Arnaldo Carvalho de Melo To: Dmitry Vyukov Cc: Namhyung Kim , irogers@google.com, linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org, eranian@google.com, Ingo Molnar , Peter Zijlstra Subject: Re: [PATCH v2] perf report: Add wall-clock and parallelism profiling Message-ID: References: <20250113134022.2545894-1-dvyukov@google.com> Precedence: bulk X-Mailing-List: linux-perf-users@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: On Tue, Jan 14, 2025 at 05:07:14PM +0100, Dmitry Vyukov wrote: > On Tue, 14 Jan 2025 at 16:57, Arnaldo Carvalho de Melo wrote: > > On Tue, Jan 14, 2025 at 09:26:57AM +0100, Dmitry Vyukov wrote: > > > On Tue, 14 Jan 2025 at 02:51, Namhyung Kim wrote: > > > > I understand your point but I think it has some limitation so maybe it's > > > > better to put in a separate mode with special flags. > > > I hate all options. > > > The 3rd option is to add a _mandary_ flag to perf record to ask for > > > profiling mode. Then the old "perf record" will say "nope, you need to > > > choose, here are your options and explanation of the options". > > Well, we have a line for tips about perf usage at the botton, for > > instance, when I ran 'perf report' to check it it showed: > > Tip: Add -I to perf record to sample register values, which will be visible in perf report sample context. > > We could add a line with: > > Tip: Use '-s wallclock' to get wallclock/latency profiling > Thanks for the review! > This is an option too. > I am not very strong about the exact option we choose, I understand > what I proposed has downsides too. > I am ready to go with a solution we will agree on here. > > And when using it we could have another tip: > > > > Tip: Use 'perf config report.mode=wallclock' to change the default > > > > ⬢ [acme@toolbox pahole]$ perf config report.mode=wallclock > > ⬢ [acme@toolbox pahole]$ perf config report.mode > > report.mode=wallclock > > ⬢ [acme@toolbox pahole]$ > > > > Or something along these lines. > > Or we could have a popup that would appear with a "Important notice", > > explaining the wallclock mode and asking the user to opt-in to having > > both, just one or keep the old mode, that would be a quick, temporary > > "annoyance" for people used with the current mode while bringing this > > cool new mode to the attention of users. > Is there an example of doing something similar in the code? Sure, for the config part look at "report.children", that is processed in tools/perf/builtin-report.c report__config() function: if (!strcmp(var, "report.children")) { symbol_conf.cumulate_callchain = perf_config_bool(var, value); return 0; } To test, capture with callchains: root@number:~# perf record -g -a -e cpu_core/cycles/P sleep 5 [ perf record: Woken up 97 times to write data ] [ perf record: Captured and wrote 28.084 MB perf.data (194988 samples) ] root@number:~# root@number:~# perf evlist cpu_core/cycles/P dummy:u root@number:~# The default ends up being: root@number:~# perf report Samples: 194K of event 'cpu_core/cycles/P', Event count (approx.): 242759929168 Children Self Command Shared Object Symbol + 9.25% 0.00% cc1 [unknown] [.] 0000000000000000 + 7.23% 0.05% cc1 [kernel.kallsyms] [k] entry_SYSCALL_64 + 7.09% 0.02% cc1 [kernel.kallsyms] [k] do_syscall_64 + 4.43% 0.02% cc1 [kernel.kallsyms] [k] asm_exc_page_fault + 3.89% 0.06% cc1 libc.so.6 [.] __GI___getcwd The other mode: root@number:~# perf report --no-children Samples: 194K of event 'cpu_core/cycles/P', Event count (approx.): 242759929168 Overhead Command Shared Object Symbol + 2.43% cc1 cc1 [.] ht_lookup_with_hash(ht*, unsigned char const*, unsigned long, unsigned int, ht_lookup_option) + 1.62% cc1 cc1 [.] _cpp_lex_direct + 1.34% cc1 cc1 [.] iterative_hash + 1.09% cc1 cc1 [.] bitmap_set_bit(bitmap_head*, int) + 0.98% cc1 libc.so.6 [.] sysmalloc + 0.97% cc1 libc.so.6 [.] _int_malloc So people not liking the default do: root@number:~# perf config report.children=no root@number:~# perf config report.children report.children=no root@number:~# To avoid having to use --no-children. See tools/perf/Documentation/perf-config.txt for more of these config options. We don't have the popup that sets the ~/.perfconfig variable, but I think it is easily possible using ui_browser__dialog_yesno()... ok, look at the patch below, with it: root@x1:~# rm -f ~/.perfconfig root@x1:~# perf config annotate.disassemblers=objdump,llvm root@x1:~# cat ~/.perfconfig # this file is auto-generated. [annotate] disassemblers = objdump,llvm root@x1:~# perf report root@x1:~# cat ~/.perfconfig # this file is auto-generated. [annotate] disassemblers = objdump,llvm [report] wallclock = yes root@x1:~# So with it when the user doesn't have that "report.wallclock=yes" (or "no") set, i.e. everybody, the question will be made, once answered it will not be made again. > > Another idea, more general, is to be able to press some hotkey that > > would toggle available sort keys, like we have, for instance, 'k' to > > toggle the 'Shared object' column in the default 'perf report/top' > > TUI while filtering out non-kernel samples. > > > > But my initial feeling is that you're right and we want this wallclock > > column on by default, its just a matter of finding a convenient/easy way > > for users to opt-in. > > > > Being able to quickly toggle ordering by the Overhead/Wallclock columns > > also looks useful. > > I've tried to do something similar, e.g. filtering by parallelism > level on-the-fly, thus this patch: > https://lore.kernel.org/linux-perf-users/CACT4Y+b8X7PnUwQtuU2QXSqumNDbN8pWDm8EX+wnvgNAKbW0xw@mail.gmail.com/T/#t > But it turned out the reload functionality is more broken than working. > > I agree it would be nice to have these usability improvements. Right, and I'm not talking about those as a requirement for your patch to get accepted. > But this is also a large change, so I would prefer to merge the basic > functionality, and then we can do some improvements on top. This is > also the first time I am touching perf code, there are too many new > things to learn for me. I'm happy you made the effort and will try to help you as much as I can. I agree that, with the patch below, that I'll try to split and merge before travelling, we'll have a good starting point. Here is the patch, I'll split the perf_config__set_variable into a separate patch and merge, so that you can continue from there. The "" part should tell the user that 'perf config report.workload=yes|no' can be done later, to switch the mode, if one doesn't want to continue with it as default. Thanks, - Arnaldo diff --git a/tools/perf/builtin-config.c b/tools/perf/builtin-config.c index 2e8363778935e8d5..45b5312fbe8370e8 100644 --- a/tools/perf/builtin-config.c +++ b/tools/perf/builtin-config.c @@ -154,6 +154,44 @@ static int parse_config_arg(char *arg, char **var, char **value) return 0; } +int perf_config__set_variable(const char *var, const char *value) +{ + char path[PATH_MAX]; + char *user_config = mkpath(path, sizeof(path), "%s/.perfconfig", getenv("HOME")); + const char *config_filename; + struct perf_config_set *set; + int ret = -1; + + if (use_system_config) + config_exclusive_filename = perf_etc_perfconfig(); + else if (use_user_config) + config_exclusive_filename = user_config; + + if (!config_exclusive_filename) + config_filename = user_config; + else + config_filename = config_exclusive_filename; + + set = perf_config_set__new(); + if (!set) + goto out_err; + + if (perf_config_set__collect(set, config_filename, var, value) < 0) { + pr_err("Failed to add '%s=%s'\n", var, value); + goto out_err; + } + + if (set_config(set, config_filename) < 0) { + pr_err("Failed to set the configs on %s\n", config_filename); + goto out_err; + } + + ret = 0; +out_err: + perf_config_set__delete(set); + return ret; +} + int cmd_config(int argc, const char **argv) { int i, ret = -1; diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index f5fbd670d619a32a..ccd6eef8ece6e238 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -50,6 +50,7 @@ #include "util/units.h" #include "util/util.h" // perf_tip() #include "ui/ui.h" +#include "ui/util.h" #include "ui/progress.h" #include "util/block-info.h" @@ -99,6 +100,8 @@ struct report { bool disable_order; bool skip_empty; bool data_type; + bool wallclock_config_set; + bool use_wallclock; int max_stack; struct perf_read_values show_threads_values; const char *pretty_printing_style; @@ -151,6 +154,11 @@ static int report__config(const char *var, const char *value, void *cb) } return 0; } + if (!strcmp(var, "report.wallclock")) { + rep->use_wallclock = perf_config_bool(var, value); + rep->wallclock_config_set = true; + return 0; + } if (!strcmp(var, "report.skip-empty")) { rep->skip_empty = perf_config_bool(var, value); @@ -1089,6 +1097,15 @@ static int __cmd_report(struct report *rep) report__warn_kptr_restrict(rep); + if (!rep->wallclock_config_set) { + if (ui__dialog_yesno("Do you want to use wallclock/latency mode?\n" + "\n")) { + rep->use_wallclock = true; + // do other prep to add the "wallclock" key to the sort order, etc + perf_config__set_variable("report.wallclock", "yes"); + } + } + evlist__for_each_entry(session->evlist, pos) rep->nr_entries += evsel__hists(pos)->nr_entries; diff --git a/tools/perf/util/config.h b/tools/perf/util/config.h index 9971313d61c1e5c6..a727c95cb1195e06 100644 --- a/tools/perf/util/config.h +++ b/tools/perf/util/config.h @@ -50,6 +50,7 @@ int perf_config_set__collect(struct perf_config_set *set, const char *file_name, const char *var, const char *value); void perf_config__exit(void); void perf_config__refresh(void); +int perf_config__set_variable(const char *var, const char *value); /** * perf_config_sections__for_each - iterate thru all the sections