Re: [PATCH v4 0/8] perf report: Add latency and parallelism profiling

linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Namhyung Kim <namhyung@kernel.org>
To: Dmitry Vyukov <dvyukov@google.com>
Cc: irogers@google.com, acme@kernel.org,
	linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 0/8] perf report: Add latency and parallelism profiling
Date: Mon, 3 Feb 2025 20:20:58 -0800	[thread overview]
Message-ID: <Z6GVqh6detN8rbKV@google.com> (raw)
In-Reply-To: <cover.1738592865.git.dvyukov@google.com>

Hello,

On Mon, Feb 03, 2025 at 03:30:35PM +0100, Dmitry Vyukov wrote:
> There are two notions of time: wall-clock time and CPU time.
> For a single-threaded program, or a program running on a single-core
> machine, these notions are the same. However, for a multi-threaded/
> multi-process program running on a multi-core machine, these notions are
> significantly different. Each second of wall-clock time we have
> number-of-cores seconds of CPU time.
> 
> Currently perf only allows to profile CPU time. Perf (and all other
> existing profilers to the be best of my knowledge) does not allow to
> profile wall-clock time.
> 
> Optimizing CPU overhead is useful to improve 'throughput', while
> optimizing wall-clock overhead is useful to improve 'latency'.
> These profiles are complementary and are not interchangeable.
> Examples of where latency profile is needed:
>  - optimzing build latency
>  - optimizing server request latency
>  - optimizing ML training/inference latency
>  - optimizing running time of any command line program
> 
> CPU profile is useless for these use cases at best (if a user understands
> the difference), or misleading at worst (if a user tries to use a wrong
> profile for a job).
> 
> This series add latency and parallelization profiling.
> See the added documentation and flags descriptions for details.
> 
> Brief outline of the implementation:
>  - add context switch collection during record
>  - calculate number of threads running on CPUs (parallelism level)
>    during report
>  - divide each sample weight by the parallelism level
> This effectively models that we were taking 1 sample per unit of
> wall-clock time.
> 
> We still default to the CPU profile, so it's up to users to learn
> about the second profiling mode and use it when appropriate.
> 
> Changes in v4:
>  - added "Shrink struct hist_entry size" commit
>  - rebased to perf-tools-next HEAD
> 
> Changes in v3:
>  - rebase and split into patches
>  - rename 'wallclock' to 'latency' everywhere
>  - don't enable latency profiling by default,
>    instead add record/report --latency flag
> 
> Dmitry Vyukov (8):
>   perf report: Add machine parallelism
>   perf report: Add parallelism sort key
>   perf report: Switch filtered from u8 to u16
>   perf report: Add parallelism filter
>   perf report: Add latency output field
>   perf report: Add --latency flag
>   perf report: Add latency and parallelism profiling documentation
>   perf hist: Shrink struct hist_entry size

I found that stdio output is broken that latency is too high.  TUI is fine.

  $ ./perf report --latency -s dso --stdio
  ...
  #
  #  Latency  Overhead  Shared Object       
  # ........  ........  ....................
  #
     57669.15%    57.67%  [kernel.kallsyms]   
     42330.85%    42.33%  ld-linux-x86-64.so.2

Thanks,
Namhyung

> 
>  .../callchain-overhead-calculation.txt        |  5 +-
>  .../cpu-and-latency-overheads.txt             | 85 ++++++++++++++++++
>  tools/perf/Documentation/perf-report.txt      | 49 ++++++----
>  tools/perf/Documentation/tips.txt             |  3 +
>  tools/perf/builtin-record.c                   | 20 +++++
>  tools/perf/builtin-report.c                   | 39 ++++++++
>  tools/perf/ui/browsers/hists.c                | 27 +++---
>  tools/perf/ui/hist.c                          | 64 +++++++++----
>  tools/perf/util/addr_location.c               |  1 +
>  tools/perf/util/addr_location.h               |  7 +-
>  tools/perf/util/event.c                       | 11 +++
>  tools/perf/util/events_stats.h                |  2 +
>  tools/perf/util/hist.c                        | 90 +++++++++++++++----
>  tools/perf/util/hist.h                        | 32 +++++--
>  tools/perf/util/machine.c                     |  7 ++
>  tools/perf/util/machine.h                     |  6 ++
>  tools/perf/util/sample.h                      |  2 +-
>  tools/perf/util/session.c                     | 12 +++
>  tools/perf/util/session.h                     |  1 +
>  tools/perf/util/sort.c                        | 69 ++++++++++++--
>  tools/perf/util/sort.h                        |  3 +-
>  tools/perf/util/symbol.c                      | 34 +++++++
>  tools/perf/util/symbol_conf.h                 |  8 +-
>  23 files changed, 502 insertions(+), 75 deletions(-)
>  create mode 100644 tools/perf/Documentation/cpu-and-latency-overheads.txt
> 
> 
> base-commit: 8ce0d2da14d3fb62844dd0e95982c194326b1a5f
> -- 
> 2.48.1.362.g079036d154-goog
>

next prev parent reply	other threads:[~2025-02-04  4:21 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-03 14:30 [PATCH v4 0/8] perf report: Add latency and parallelism profiling Dmitry Vyukov
2025-02-03 14:30 ` [PATCH v3 1/8] perf report: Add machine parallelism Dmitry Vyukov
2025-02-03 14:30 ` [PATCH v3 2/8] perf report: Add parallelism sort key Dmitry Vyukov
2025-02-03 14:30 ` [PATCH v3 3/8] perf report: Switch filtered from u8 to u16 Dmitry Vyukov
2025-02-03 14:30 ` [PATCH v3 4/8] perf report: Add parallelism filter Dmitry Vyukov
2025-02-03 14:30 ` [PATCH v3 5/8] perf report: Add latency output field Dmitry Vyukov
2025-02-03 14:30 ` [PATCH v3 6/8] perf report: Add --latency flag Dmitry Vyukov
2025-02-03 14:30 ` [PATCH v3 7/8] perf report: Add latency and parallelism profiling documentation Dmitry Vyukov
2025-02-04  4:27   ` Namhyung Kim
2025-02-05 16:46     ` Dmitry Vyukov
2025-02-03 14:30 ` [PATCH v3 8/8] perf hist: Shrink struct hist_entry size Dmitry Vyukov
2025-02-04  4:20 ` Namhyung Kim [this message]
2025-02-05 16:37   ` [PATCH v4 0/8] perf report: Add latency and parallelism profiling Dmitry Vyukov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z6GVqh6detN8rbKV@google.com \
    --to=namhyung@kernel.org \
    --cc=acme@kernel.org \
    --cc=dvyukov@google.com \
    --cc=irogers@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).