Re: [PATCH v2] perf report: Add wall-clock and parallelism profiling

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Namhyung Kim <namhyung@kernel.org>
To: Dmitry Vyukov <dvyukov@google.com>
Cc: irogers@google.com, linux-perf-users@vger.kernel.org,
	linux-kernel@vger.kernel.org, eranian@google.com
Subject: Re: [PATCH v2] perf report: Add wall-clock and parallelism profiling
Date: Mon, 13 Jan 2025 17:51:35 -0800	[thread overview]
Message-ID: <Z4XDJyvjiie3howF@google.com> (raw)
In-Reply-To: <20250113134022.2545894-1-dvyukov@google.com>

Hello,

On Mon, Jan 13, 2025 at 02:40:06PM +0100, Dmitry Vyukov wrote:
> There are two notions of time: wall-clock time and CPU time.
> For a single-threaded program, or a program running on a single-core
> machine, these notions are the same. However, for a multi-threaded/
> multi-process program running on a multi-core machine, these notions are
> significantly different. Each second of wall-clock time we have
> number-of-cores seconds of CPU time.
> 
> Currently perf only allows to profile CPU time. Perf (and all other
> existing profilers to the be best of my knowledge) does not allow to
> profile wall-clock time.
> 
> Optimizing CPU overhead is useful to improve 'throughput', while
> optimizing wall-clock overhead is useful to improve 'latency'.
> These profiles are complementary and are not interchangeable.
> Examples of where wall-clock profile is needed:
>  - optimzing build latency
>  - optimizing server request latency
>  - optimizing ML training/inference latency
>  - optimizing running time of any command line program
> 
> CPU profile is useless for these use cases at best (if a user understands
> the difference), or misleading at worst (if a user tries to use a wrong
> profile for a job).
> 
> This patch adds wall-clock and parallelization profiling.
> See the added documentation and flags descriptions for details.
> 
> Brief outline of the implementation:
>  - add context switch collection during record
>  - calculate number of threads running on CPUs (parallelism level)
>    during report
>  - divide each sample weight by the parallelism level
> This effectively models that we were taking 1 sample per unit of
> wall-clock time.

Thanks for working on this, very interesting!

But I guess this implementation depends on cpu-cycles event and single
target process.  Do you think if it'd work for system-wide profiling?
How do you define wall-clock overhead if the event counts something
different (like the number of L3 cache-misses)?

Also I'm not sure about the impact of context switch events which could
generate a lot of records that may end up with losing some of them.  And
in that case the parallelism tracking would break.

> 
> The feature is added on an equal footing with the existing CPU profiling
> rather than a separate mode enabled with special flags. The reasoning is
> that users may not understand the problem and the meaning of numbers they
> are seeing in the first place, so won't even realize that they may need
> to be looking for some different profiling mode. When they are presented
> with 2 sets of different numbers, they should start asking questions.

I understand your point but I think it has some limitation so maybe it's
better to put in a separate mode with special flags.

Thanks,
Namhyung

next prev parent reply	other threads:[~2025-01-14  1:51 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-08  8:24 [PATCH] tools/perf: Add wall-clock and parallelism profiling Dmitry Vyukov
2025-01-08  8:34 ` Dmitry Vyukov
2025-01-13 12:25   ` Dmitry Vyukov
2025-01-13 13:40     ` [PATCH v2] perf report: " Dmitry Vyukov
2025-01-14  1:51       ` Namhyung Kim [this message]
2025-01-14  8:26         ` Dmitry Vyukov
2025-01-14 15:56           ` Arnaldo Carvalho de Melo
2025-01-14 16:07             ` Dmitry Vyukov
2025-01-14 17:52               ` Arnaldo Carvalho de Melo
2025-01-14 18:16                 ` Arnaldo Carvalho de Melo
2025-01-19 10:22                 ` Dmitry Vyukov
2025-01-15  0:30           ` Namhyung Kim
2025-01-19 10:50             ` Dmitry Vyukov
2025-01-24 10:46               ` Dmitry Vyukov
2025-01-24 17:56                 ` Namhyung Kim
2025-01-15  5:59           ` Ian Rogers
2025-01-15  7:11             ` Dmitry Vyukov
2025-01-16 18:55               ` Namhyung Kim
2025-01-19 11:08                 ` Off-CPU sampling (was perf report: Add wall-clock and parallelism profiling) Dmitry Vyukov
2025-01-23 23:34                   ` Namhyung Kim
2025-01-24 19:02       ` [PATCH v2] perf report: Add wall-clock and parallelism profiling Namhyung Kim
2025-01-27 10:01         ` Dmitry Vyukov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z4XDJyvjiie3howF@google.com \
    --to=namhyung@kernel.org \
    --cc=dvyukov@google.com \
    --cc=eranian@google.com \
    --cc=irogers@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.