From: Namhyung Kim <namhyung@kernel.org>
To: Dmitry Vyukov <dvyukov@google.com>
Cc: irogers@google.com, linux-perf-users@vger.kernel.org,
linux-kernel@vger.kernel.org, eranian@google.com
Subject: Re: [PATCH v2] perf report: Add wall-clock and parallelism profiling
Date: Mon, 13 Jan 2025 17:51:35 -0800
Message-ID: <Z4XDJyvjiie3howF@google.com>
In-Reply-To: <20250113134022.2545894-1-dvyukov@google.com>

Hello,

On Mon, Jan 13, 2025 at 02:40:06PM +0100, Dmitry Vyukov wrote:
> There are two notions of time: wall-clock time and CPU time.
> For a single-threaded program, or a program running on a single-core
> machine, these notions are the same. However, for a multi-threaded/
> multi-process program running on a multi-core machine, these notions are
> significantly different. Each second of wall-clock time we have
> number-of-cores seconds of CPU time.
>
> Currently perf only allows profiling CPU time. Perf (and all other
> existing profilers, to the best of my knowledge) does not allow
> profiling wall-clock time.
>
> Optimizing CPU overhead is useful to improve 'throughput', while
> optimizing wall-clock overhead is useful to improve 'latency'.
> These profiles are complementary and are not interchangeable.
> Examples of where wall-clock profile is needed:
> - optimizing build latency
> - optimizing server request latency
> - optimizing ML training/inference latency
> - optimizing running time of any command line program
>
> CPU profile is useless for these use cases at best (if a user understands
> the difference), or misleading at worst (if a user tries to use a wrong
> profile for a job).
>
> This patch adds wall-clock and parallelization profiling.
> See the added documentation and flags descriptions for details.
>
> Brief outline of the implementation:
> - add context switch collection during record
> - calculate number of threads running on CPUs (parallelism level)
> during report
> - divide each sample weight by the parallelism level
> This effectively models that we were taking 1 sample per unit of
> wall-clock time.
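
To make the outline quoted above concrete, here is a minimal sketch of the
weighting step in Python. It is not the code from the patch: the event
tuples, the function name wall_clock_weights and the symbol names are made
up for illustration, and it assumes a single time-ordered stream of
context-switch and sample records for one target process.

```python
from collections import defaultdict


def wall_clock_weights(events):
    """Toy model: events are time-ordered (kind, tid, symbol) tuples.

    kind is "in" (thread switched in), "out" (switched out) or "sample".
    Returns (cpu, wall): per-symbol CPU weight and wall-clock weight.
    """
    running = set()            # threads currently on a CPU
    cpu = defaultdict(float)
    wall = defaultdict(float)
    for kind, tid, symbol in events:
        if kind == "in":
            running.add(tid)
        elif kind == "out":
            running.discard(tid)
        else:
            # A sampled thread is on a CPU by definition, even if its
            # switch-in record was never seen.
            running.add(tid)
            parallelism = len(running)
            cpu[symbol] += 1.0
            # Divide the sample weight by the parallelism level.
            wall[symbol] += 1.0 / parallelism
    return dict(cpu), dict(wall)


# Hypothetical build: serial parse, 4-way parallel compile, serial link.
events = [
    ("in", 1, None),
    ("sample", 1, "parse"),
    ("in", 2, None), ("in", 3, None), ("in", 4, None),
    ("sample", 1, "compile"), ("sample", 2, "compile"),
    ("sample", 3, "compile"), ("sample", 4, "compile"),
    ("out", 2, None), ("out", 3, None), ("out", 4, None),
    ("sample", 1, "link"),
]
cpu, wall = wall_clock_weights(events)
```

In this toy trace "compile" gets 4 of the 6 CPU samples, but its wall-clock
weight equals that of "parse" and "link", because its four samples were
taken at parallelism level 4.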

Thanks for working on this, very interesting!

But I guess this implementation depends on the cpu-cycles event and a single
target process. Do you think it would work for system-wide profiling?
How do you define wall-clock overhead if the event counts something
different (like the number of L3 cache misses)?

Also, I'm not sure about the impact of context switch events, which could
generate so many records that some of them end up being lost. In that
case the parallelism tracking would break.

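
A toy illustration of the lost-record concern, in Python. The event format
and the function name parallelism_at_samples are hypothetical and this is
not how perf processes records; it only shows what happens to a running-
thread count when one switch-out record goes missing.

```python
def parallelism_at_samples(events):
    """Return the parallelism level seen at each sample.

    events are time-ordered (kind, tid) tuples with kind "in", "out"
    or "sample" (an assumed, simplified record format).
    """
    running = set()
    levels = []
    for kind, tid in events:
        if kind == "in":
            running.add(tid)
        elif kind == "out":
            running.discard(tid)
        else:
            running.add(tid)   # the sampled thread must be on a CPU
            levels.append(len(running))
    return levels


complete = [
    ("in", 1), ("in", 2),
    ("sample", 1),
    ("out", 2),
    ("sample", 1),
    ("sample", 1),
]
# Same trace with the switch-out record of thread 2 lost.
lossy = [e for e in complete if e != ("out", 2)]

levels_complete = parallelism_at_samples(complete)
levels_lossy = parallelism_at_samples(lossy)
```

With the record lost, thread 2 is counted as running for the rest of the
trace, so every later sample of thread 1 is attributed parallelism 2
instead of 1 and would get half the wall-clock weight it should.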
>
> The feature is added on an equal footing with the existing CPU profiling
> rather than a separate mode enabled with special flags. The reasoning is
> that users may not understand the problem and the meaning of numbers they
> are seeing in the first place, so won't even realize that they may need
> to be looking for some different profiling mode. When they are presented
> with 2 sets of different numbers, they should start asking questions.

I understand your point, but I think it has some limitations, so maybe it's
better to put it in a separate mode with special flags.

Thanks,
Namhyung