From: Andi Kleen <ak@linux.intel.com>
To: Dmitry Vyukov <dvyukov@google.com>
Cc: namhyung@kernel.org, irogers@google.com,
linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
Arnaldo Carvalho de Melo <acme@kernel.org>
Subject: Re: [PATCH v5 0/8] perf report: Add latency and parallelism profiling
Date: Thu, 06 Feb 2025 10:30:37 -0800
Message-ID: <87ldujkjsi.fsf@linux.intel.com>
In-Reply-To: <cover.1738772628.git.dvyukov@google.com> (Dmitry Vyukov's message of "Wed, 5 Feb 2025 17:27:39 +0100")
Dmitry Vyukov <dvyukov@google.com> writes:
> There are two notions of time: wall-clock time and CPU time.
> For a single-threaded program, or a program running on a single-core
> machine, these notions are the same. However, for a multi-threaded/
> multi-process program running on a multi-core machine, these notions are
> significantly different. Each second of wall-clock time we have
> number-of-cores seconds of CPU time.
I'm curious how this interacts with the time sort key and --time-quantum.
I assume it just works, but it might be worth checking.
The time sort key was intended to address some of these issues too.
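To make the quoted distinction between wall-clock time and CPU time concrete, here is a minimal arithmetic sketch. The workload and its numbers are made up for illustration, and `phase_shares` is a hypothetical helper, not anything from perf.

```python
# Hypothetical workload: (phase name, wall-clock seconds, CPUs kept busy).
# The numbers are invented purely to illustrate the arithmetic.
PHASES = [
    ("serial setup", 4.0, 1),
    ("parallel work", 1.0, 8),
]


def phase_shares(phases):
    """Return {name: (cpu_share, wall_share)} for each phase."""
    wall_total = sum(wall for _, wall, _ in phases)
    # CPU time of a phase = wall-clock time * number of busy CPUs.
    cpu_total = sum(wall * cpus for _, wall, cpus in phases)
    return {
        name: (wall * cpus / cpu_total, wall / wall_total)
        for name, wall, cpus in phases
    }


if __name__ == "__main__":
    for name, (cpu_share, wall_share) in phase_shares(PHASES).items():
        print(f"{name:14s} CPU {cpu_share:6.1%}  wall-clock {wall_share:6.1%}")
```

In this made-up case the serial phase accounts for one third of the CPU time but four fifths of the wall-clock time, so a CPU profile would understate it for latency work.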
> Optimizing CPU overhead is useful to improve 'throughput', while
> optimizing wall-clock overhead is useful to improve 'latency'.
> These profiles are complementary and are not interchangeable.
> Examples of where latency profile is needed:
> - optimizing build latency
> - optimizing server request latency
> - optimizing ML training/inference latency
> - optimizing running time of any command line program
>
> CPU profile is useless for these use cases at best (if a user understands
> the difference), or misleading at worst (if a user tries to use a wrong
> profile for a job).
I would agree in the general case, but not if the time sort key
is chosen with a suitable quantum. You can then see how the parallelism
changes over time, which is often a good enough proxy.
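The idea of using time buckets as a proxy can be sketched roughly as follows. This is only an illustration, not how perf implements the time sort key: it assumes samples are available as (timestamp, thread id) pairs and uses the number of distinct threads seen per quantum as a stand-in for parallelism. The function name and sample data are hypothetical.

```python
from collections import defaultdict


def parallelism_by_quantum(samples, quantum_ns):
    """Group samples into fixed-size time buckets.

    samples: iterable of (timestamp_ns, tid) pairs (assumed input format).
    Returns {bucket_start_ns: number of distinct tids seen in that bucket}.
    """
    tids = defaultdict(set)
    for ts, tid in samples:
        # Round the timestamp down to the start of its quantum.
        tids[ts - ts % quantum_ns].add(tid)
    return {start: len(seen) for start, seen in sorted(tids.items())}


if __name__ == "__main__":
    # Made-up samples: one thread, then three threads, then one again.
    samples = [(5, 1), (20, 1), (110, 1), (120, 2), (130, 3), (250, 2)]
    print(parallelism_by_quantum(samples, 100))  # {0: 1, 100: 3, 200: 1}
```

A smaller quantum gives finer resolution but fewer samples per bucket, so the count becomes noisier.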
> We still default to the CPU profile, so it's up to users to learn
> about the second profiling mode and use it when appropriate.
You should add it to tips.txt then.
> .../callchain-overhead-calculation.txt | 5 +-
> .../cpu-and-latency-overheads.txt | 85 ++++++++++++++
> tools/perf/Documentation/perf-record.txt | 4 +
> tools/perf/Documentation/perf-report.txt | 54 ++++++---
> tools/perf/Documentation/tips.txt | 3 +
> tools/perf/builtin-record.c | 20 ++++
> tools/perf/builtin-report.c | 39 +++++++
> tools/perf/ui/browsers/hists.c | 27 +++--
> tools/perf/ui/hist.c | 104 ++++++++++++------
> tools/perf/util/addr_location.c | 1 +
> tools/perf/util/addr_location.h | 7 +-
> tools/perf/util/event.c | 11 ++
> tools/perf/util/events_stats.h | 2 +
> tools/perf/util/hist.c | 90 ++++++++++++---
> tools/perf/util/hist.h | 32 +++++-
> tools/perf/util/machine.c | 7 ++
> tools/perf/util/machine.h | 6 +
> tools/perf/util/sample.h | 2 +-
> tools/perf/util/session.c | 12 ++
> tools/perf/util/session.h | 1 +
> tools/perf/util/sort.c | 69 +++++++++++-
> tools/perf/util/sort.h | 3 +-
> tools/perf/util/symbol.c | 34 ++++++
> tools/perf/util/symbol_conf.h | 8 +-
We traditionally didn't do it, but in general the test coverage
of perf report is too low, so I would recommend adding a simple
test case to the perf test scripts.
-Andi