From: Dmitry Vyukov <dvyukov@google.com>
To: namhyung@kernel.org, irogers@google.com
Cc: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
Dmitry Vyukov <dvyukov@google.com>,
Arnaldo Carvalho de Melo <acme@kernel.org>
Subject: [PATCH v3 0/7] perf report: Add latency and parallelism profiling
Date: Mon, 27 Jan 2025 10:58:47 +0100
Message-ID: <cover.1737971364.git.dvyukov@google.com>
There are two notions of time: wall-clock time and CPU time.
For a single-threaded program, or a program running on a single-core
machine, these notions are the same. However, for a multi-threaded/
multi-process program running on a multi-core machine, these notions are
significantly different: for each second of wall-clock time we have
number-of-cores seconds of CPU time.
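For example, a program that keeps 8 cores busy for 10 seconds of
wall-clock time consumes 80 seconds of CPU time.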
Currently perf only allows profiling CPU time. Perf (and, to the best
of my knowledge, all other existing profilers) does not allow profiling
wall-clock time.
Optimizing CPU overhead is useful to improve 'throughput', while
optimizing wall-clock overhead is useful to improve 'latency'.
These profiles are complementary and are not interchangeable.
Examples of where a latency profile is needed:
- optimizing build latency
- optimizing server request latency
- optimizing ML training/inference latency
- optimizing running time of any command line program
A CPU profile is at best useless for these use cases (if the user
understands the difference), and at worst misleading (if the user tries
to use the wrong profile for the job).
This series adds latency and parallelism profiling.
See the added documentation and flag descriptions for details.
Brief outline of the implementation:
- add context switch collection during record
- calculate number of threads running on CPUs (parallelism level)
during report
- divide each sample weight by the parallelism level
This effectively models taking 1 sample per unit of wall-clock time.
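
For illustration, here is a minimal sketch of this model in C (the
names, types and event handling are hypothetical simplifications, not
the actual perf code):

  #include <stdint.h>

  /* Number of threads currently running on CPUs (the parallelism
   * level), maintained from the context-switch records collected
   * during perf record. */
  static int parallelism;

  static void on_context_switch(int pid, int sched_in)
  {
          if (pid == 0)   /* ignore the idle task */
                  return;
          parallelism += sched_in ? 1 : -1;
  }

  /* Dividing each sample's period by the parallelism level at the
   * time of the sample makes the sum of the weights approximate
   * wall-clock time rather than CPU time. */
  static uint64_t latency_weight(uint64_t period)
  {
          return period / (parallelism >= 1 ? parallelism : 1);
  }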
We still default to the CPU profile, so it's up to users to learn
about the second profiling mode and use it when appropriate.
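
For example, with this series applied, a latency profile of a parallel
build could be collected and viewed with the flags added by this series
(the workload is just an example):

  perf record --latency -- make -j8
  perf report --latency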
Changes in v3:
- rebase and split into patches
- rename 'wallclock' to 'latency' everywhere
- don't enable latency profiling by default,
instead add record/report --latency flag
Dmitry Vyukov (7):
perf report: Add machine parallelism
perf report: Add parallelism sort key
perf report: Switch filtered from u8 to u16
perf report: Add parallelism filter
perf report: Add latency output field
perf report: Add --latency flag
perf report: Add latency and parallelism profiling documentation
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: linux-perf-users@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
.../callchain-overhead-calculation.txt | 5 +-
.../cpu-and-latency-overheads.txt | 85 ++++++++++++++++++
tools/perf/Documentation/perf-report.txt | 49 ++++++----
tools/perf/Documentation/tips.txt | 3 +
tools/perf/builtin-record.c | 20 +++++
tools/perf/builtin-report.c | 39 ++++++++
tools/perf/ui/browsers/hists.c | 27 +++---
tools/perf/ui/hist.c | 64 +++++++++----
tools/perf/util/addr_location.c | 1 +
tools/perf/util/addr_location.h | 7 +-
tools/perf/util/event.c | 11 +++
tools/perf/util/events_stats.h | 2 +
tools/perf/util/hist.c | 90 +++++++++++++++----
tools/perf/util/hist.h | 26 +++++-
tools/perf/util/machine.c | 7 ++
tools/perf/util/machine.h | 6 ++
tools/perf/util/session.c | 12 +++
tools/perf/util/session.h | 1 +
tools/perf/util/sort.c | 69 ++++++++++++--
tools/perf/util/sort.h | 3 +-
tools/perf/util/symbol.c | 34 +++++++
tools/perf/util/symbol_conf.h | 8 +-
22 files changed, 498 insertions(+), 71 deletions(-)
create mode 100644 tools/perf/Documentation/cpu-and-latency-overheads.txt
base-commit: 91b7747dc70d64b5ec56ffe493310f207e7ffc99
--
2.48.1.262.g85cc9f2d1e-goog