From: Andi Kleen <ak@linux.intel.com>
To: Dmitry Vyukov <dvyukov@google.com>
Cc: namhyung@kernel.org, irogers@google.com,
linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
Arnaldo Carvalho de Melo <acme@kernel.org>
Subject: Re: [PATCH v5 0/8] perf report: Add latency and parallelism profiling
Date: Thu, 06 Feb 2025 10:30:37 -0800
Message-ID: <87ldujkjsi.fsf@linux.intel.com>
In-Reply-To: <cover.1738772628.git.dvyukov@google.com> (Dmitry Vyukov's message of "Wed, 5 Feb 2025 17:27:39 +0100")
Dmitry Vyukov <dvyukov@google.com> writes:
> There are two notions of time: wall-clock time and CPU time.
> For a single-threaded program, or a program running on a single-core
> machine, these notions are the same. However, for a multi-threaded/
> multi-process program running on a multi-core machine, these notions are
> significantly different. Each second of wall-clock time we have
> number-of-cores seconds of CPU time.
I'm curious how this interacts with the time sort key and --time-quantum.
I assume it just works, but it might be worth checking.
The time sort key was intended to address some of these issues too.
> Optimizing CPU overhead is useful to improve 'throughput', while
> optimizing wall-clock overhead is useful to improve 'latency'.
> These profiles are complementary and are not interchangeable.
> Examples of where latency profile is needed:
> - optimizing build latency
> - optimizing server request latency
> - optimizing ML training/inference latency
> - optimizing running time of any command line program
>
> CPU profile is useless for these use cases at best (if a user understands
> the difference), or misleading at worst (if a user tries to use a wrong
> profile for a job).
I would agree in the general case, but not if the time sort key
is chosen with a suitable quantum. Then you can see how the parallelism
changes over time, which is often a good enough proxy.
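To make the proxy concrete, here is a rough sketch in Python. It is only
an illustration under stated assumptions: the sample tuples are made up
(not perf output), the function names are hypothetical, and parallelism is
approximated as the number of distinct threads sampled in each time
quantum, which is not necessarily how the patch series computes it. It
buckets samples by quantum, much as a time sort key would group them, and
then compares a plain per-symbol sample share (CPU) with a share scaled by
1/parallelism (a wall-clock-style weighting).

```python
from collections import defaultdict

QUANTUM_NS = 100_000_000  # 100 ms buckets; an arbitrary choice for this sketch

# Hypothetical samples: (timestamp_ns, thread_id, symbol). Not real perf data.
SAMPLES = [
    (10_000_000, 1, "compile"),
    (20_000_000, 2, "compile"),
    (30_000_000, 3, "compile"),
    (40_000_000, 4, "compile"),
    (150_000_000, 1, "link"),
]


def parallelism_by_quantum(samples, quantum_ns):
    """Map each time bucket to the number of distinct threads sampled in it."""
    threads = defaultdict(set)
    for ts_ns, tid, _sym in samples:
        threads[ts_ns // quantum_ns].add(tid)
    return {bucket: len(tids) for bucket, tids in sorted(threads.items())}


def overheads(samples, quantum_ns):
    """Return {symbol: (cpu_share, latency_share)}.

    CPU share counts every sample as 1. The latency share (an assumed
    weighting for this sketch) counts a sample as 1/parallelism of its bucket.
    """
    par = parallelism_by_quantum(samples, quantum_ns)
    cpu = defaultdict(float)
    lat = defaultdict(float)
    for ts_ns, _tid, sym in samples:
        cpu[sym] += 1.0
        lat[sym] += 1.0 / par[ts_ns // quantum_ns]
    cpu_total = sum(cpu.values())
    lat_total = sum(lat.values())
    return {s: (cpu[s] / cpu_total, lat[s] / lat_total) for s in cpu}


if __name__ == "__main__":
    print(parallelism_by_quantum(SAMPLES, QUANTUM_NS))
    for sym, (cpu_share, lat_share) in overheads(SAMPLES, QUANTUM_NS).items():
        print(f"{sym}: cpu={cpu_share:.0%} latency={lat_share:.0%}")
```

With this made-up data the parallel phase gets 80% of the samples but only
half of the wall-clock-weighted share, which is the kind of difference a
per-quantum view makes visible.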
> We still default to the CPU profile, so it's up to users to learn
> about the second profiling mode and use it when appropriate.
You should add it to tips.txt then.
>  .../callchain-overhead-calculation.txt   |   5 +-
>  .../cpu-and-latency-overheads.txt        |  85 ++++++++++++++
>  tools/perf/Documentation/perf-record.txt |   4 +
>  tools/perf/Documentation/perf-report.txt |  54 ++++++---
>  tools/perf/Documentation/tips.txt        |   3 +
>  tools/perf/builtin-record.c              |  20 ++++
>  tools/perf/builtin-report.c              |  39 +++++++
>  tools/perf/ui/browsers/hists.c           |  27 +++--
>  tools/perf/ui/hist.c                     | 104 ++++++++++++------
>  tools/perf/util/addr_location.c          |   1 +
>  tools/perf/util/addr_location.h          |   7 +-
>  tools/perf/util/event.c                  |  11 ++
>  tools/perf/util/events_stats.h           |   2 +
>  tools/perf/util/hist.c                   |  90 ++++++++++++---
>  tools/perf/util/hist.h                   |  32 +++++-
>  tools/perf/util/machine.c                |   7 ++
>  tools/perf/util/machine.h                |   6 +
>  tools/perf/util/sample.h                 |   2 +-
>  tools/perf/util/session.c                |  12 ++
>  tools/perf/util/session.h                |   1 +
>  tools/perf/util/sort.c                   |  69 +++++++++++-
>  tools/perf/util/sort.h                   |   3 +-
>  tools/perf/util/symbol.c                 |  34 ++++++
>  tools/perf/util/symbol_conf.h            |   8 +-
We traditionally didn't do it, but in general the test coverage
of perf report is too low, so I would recommend adding a simple
test case to the perf test scripts.
-Andi