public inbox for linux-perf-users@vger.kernel.org
* Re: [PATCH v4 00/11] perf sched: Introduce stats tool
       [not found] <20250826051039.2626894-1-swapnil.sapkal@amd.com>
@ 2025-08-28  4:43 ` Sapkal, Swapnil
  2025-12-09 21:03   ` Ian Rogers
  0 siblings, 1 reply; 7+ messages in thread
From: Sapkal, Swapnil @ 2025-08-28  4:43 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, irogers, james.clark
  Cc: ravi.bangoria, swapnil.sapkal, yu.c.chen, mark.rutland,
	alexander.shishkin, jolsa, rostedt, vincent.guittot,
	adrian.hunter, kan.liang, gautham.shenoy, kprateek.nayak,
	juri.lelli, yangjihong, void, tj, sshegde, linux-kernel,
	linux-perf-users, santosh.shukla, sandipan.das,
	Cristian Prundeanu

Hello all,

I missed adding the perf folks to the list. Adding them here. Sorry about that.

--
Thanks and Regards,
Swapnil

On 8/26/2025 10:40 AM, Swapnil Sapkal wrote:
> Apologies for the long delay; I was sidetracked by some other items and
> was not able to focus on this.
> 
> MOTIVATION
> ----------
> 
> Existing `perf sched` is quite exhaustive and provides a lot of insight
> into scheduler behavior, but it quickly becomes impractical for long
> running or scheduler-intensive workloads. For example, `perf sched record`
> has ~7.77% overhead on hackbench (with 25 groups each running 700K loops
> on a 2-socket 128-core 256-thread 3rd Generation EPYC server), and it
> generates a huge 56G perf.data file which perf takes ~137 mins to prepare
> and write to disk [1].
> 
> Unlike `perf sched record`, which hooks onto a set of scheduler tracepoints
> and generates samples on each tracepoint hit, `perf sched stats record` takes
> a snapshot of the /proc/schedstat file before and after the workload, i.e.
> there is almost zero interference with the workload run. Also, it takes very
> little time to parse /proc/schedstat, convert it into perf samples and
> save those samples into the perf.data file. The resulting perf.data file is
> much smaller, so overall `perf sched stats record` is much more lightweight
> compared to `perf sched record`.
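[Editor's note] The snapshot-diff idea above can be sketched in a few lines. This is a hypothetical illustration, not the code from this series; the sample snapshots and the nine-field v15 cpu-line layout are assumptions:

```python
# Sketch of the snapshot-diff approach: parse the "cpuN" lines of two
# /proc/schedstat snapshots and subtract the counters. Real schedstat
# output also has "domainN" lines, which this sketch ignores.

def parse_cpu_lines(snapshot: str) -> dict:
    """Map each 'cpuN' line of a schedstat snapshot to its counters."""
    stats = {}
    for line in snapshot.splitlines():
        if line.startswith("cpu"):
            name, *values = line.split()
            stats[name] = [int(v) for v in values]
    return stats

def diff_snapshots(before: str, after: str) -> dict:
    """Per-CPU counter deltas between the two snapshots."""
    b, a = parse_cpu_lines(before), parse_cpu_lines(after)
    return {cpu: [x - y for x, y in zip(vals, b[cpu])]
            for cpu, vals in a.items()}

# Made-up snapshots with nine cpu fields (v15-style layout):
before = "version 15\ntimestamp 4300000000\ncpu0 0 0 100 40 60 1 500 200 80\n"
after = "version 15\ntimestamp 4300024537\ncpu0 0 0 300 90 160 3 1500 700 180\n"
print(diff_snapshots(before, after))
# {'cpu0': [0, 0, 200, 50, 100, 2, 1000, 500, 100]}
```

Because only two snapshots are taken, the cost is independent of workload length, which is where the overhead win over tracepoint sampling comes from.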
> 
> We, internally at AMD, have been using this (a variant of it, known as
> "sched-scoreboard"[2]) and found it very useful for analysing the impact
> of scheduler code changes[3][4]. Prateek used v2[5] of this patch
> series to report his analysis[6][7].
> 
> Please note that this is not a replacement for perf sched record/report.
> The intended users of the new tool are scheduler developers, not regular
> users.
> 
> USAGE
> -----
> 
>    # perf sched stats record
>    # perf sched stats report
>    # perf sched stats diff
> 
> Note: Although the `perf sched stats` tool supports the workload profiling
> syntax (i.e. -- <workload> ), the recorded profile is still systemwide
> since /proc/schedstat is a systemwide file.
> 
> HOW TO INTERPRET THE REPORT
> ---------------------------
> 
> The `perf sched stats report` starts with a description of the columns
> present in the report. These column names are given before the cpu and
> domain stats to improve the readability of the report.
> 
>    ----------------------------------------------------------------------------------------------------
>    DESC                    -> Description of the field
>    COUNT                   -> Value of the field
>    PCT_CHANGE              -> Percent change with corresponding base value
>    AVG_JIFFIES             -> Avg time in jiffies between two consecutive occurrences of the event
>    ----------------------------------------------------------------------------------------------------
> 
> Next is the total profiling time in terms of jiffies:
> 
>    ----------------------------------------------------------------------------------------------------
>    Time elapsed (in jiffies)                                   :       24537
>    ----------------------------------------------------------------------------------------------------
> 
> Next are the CPU scheduling statistics. These are simple diffs of the
> /proc/schedstat CPU lines along with descriptions. The report also
> prints the % relative to a base stat.
> 
> In the example below, schedule() left CPU0 idle 36.58% of the time.
> 0.45% of the total try_to_wake_up() calls were to wake up the local
> CPU. And the total wait time of tasks on CPU0 is 48.70% of the total
> runtime of tasks on the same CPU.
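[Editor's note] The three percentages can be recomputed from the raw counters in the table below; `pct` is a hypothetical helper sketching the relation, not the tool's code:

```python
def pct(part: int, base: int) -> float:
    """Percent of `part` relative to its base counter."""
    return round(100.0 * part / base, 2) if base else 0.0

# Counters taken from the CPU 0 example below:
sched_count, sched_goidle = 402267, 147161
ttwu_count, ttwu_local = 236309, 1062
rq_cpu_time, run_delay = 7083791148, 3449973971

print(pct(sched_goidle, sched_count))  # 36.58 -> schedule() went idle
print(pct(ttwu_local, ttwu_count))     # 0.45  -> local wakeups
print(pct(run_delay, rq_cpu_time))     # 48.7  -> wait time vs run time
```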
> 
>    ----------------------------------------------------------------------------------------------------
>    CPU 0
>    ----------------------------------------------------------------------------------------------------
>    DESC                                                                     COUNT   PCT_CHANGE
>    ----------------------------------------------------------------------------------------------------
>    yld_count                                                        :           0
>    array_exp                                                        :           0
>    sched_count                                                      :      402267
>    sched_goidle                                                     :      147161  (    36.58% )
>    ttwu_count                                                       :      236309
>    ttwu_local                                                       :        1062  (     0.45% )
>    rq_cpu_time                                                      :  7083791148
>    run_delay                                                        :  3449973971  (    48.70% )
>    pcount                                                           :      255035
>    ----------------------------------------------------------------------------------------------------
> 
> Next are the load balancing statistics. For each of the sched domains
> (e.g. `SMT`, `MC`, `DIE`...), the scheduler computes statistics under
> the following three categories:
> 
>    1) Idle Load Balance: Load balancing performed on behalf of a long
>                          idling CPU by some other CPU.
>    2) Busy Load Balance: Load balancing performed when the CPU was busy.
>    3) New Idle Balance : Load balancing performed when a CPU just became
>                          idle.
> 
> Under each of these three categories, the sched stats report provides
> different load balancing statistics. Along with the direct stats, the
> report also contains derived metrics prefixed with *. Example:
> 
>    ----------------------------------------------------------------------------------------------------
>    CPU 0, DOMAIN SMT CPUS 0,64
>    ----------------------------------------------------------------------------------------------------
>    DESC                                                                     COUNT    AVG_JIFFIES
>    ----------------------------------------- <Category busy> ------------------------------------------
>    busy_lb_count                                                    :         136  $       17.08 $
>    busy_lb_balanced                                                 :         131  $       17.73 $
>    busy_lb_failed                                                   :           0  $        0.00 $
>    busy_lb_imbalance_load                                           :          58
>    busy_lb_imbalance_util                                           :           0
>    busy_lb_imbalance_task                                           :           0
>    busy_lb_imbalance_misfit                                         :           0
>    busy_lb_gained                                                   :           7
>    busy_lb_hot_gained                                               :           0
>    busy_lb_nobusyq                                                  :           2  $     1161.50 $
>    busy_lb_nobusyg                                                  :         129  $       18.01 $
>    *busy_lb_success_count                                           :           5
>    *busy_lb_avg_pulled                                              :        1.40
>    ----------------------------------------- <Category idle> ------------------------------------------
>    idle_lb_count                                                    :         449  $        5.17 $
>    idle_lb_balanced                                                 :         382  $        6.08 $
>    idle_lb_failed                                                   :           3  $      774.33 $
>    idle_lb_imbalance_load                                           :           0
>    idle_lb_imbalance_util                                           :           0
>    idle_lb_imbalance_task                                           :          71
>    idle_lb_imbalance_misfit                                         :           0
>    idle_lb_gained                                                   :          67
>    idle_lb_hot_gained                                               :           0
>    idle_lb_nobusyq                                                  :           0  $        0.00 $
>    idle_lb_nobusyg                                                  :         382  $        6.08 $
>    *idle_lb_success_count                                           :          64
>    *idle_lb_avg_pulled                                              :        1.05
>    ---------------------------------------- <Category newidle> ----------------------------------------
>    newidle_lb_count                                                 :       30471  $        0.08 $
>    newidle_lb_balanced                                              :       28490  $        0.08 $
>    newidle_lb_failed                                                :         633  $        3.67 $
>    newidle_lb_imbalance_load                                        :           0
>    newidle_lb_imbalance_util                                        :           0
>    newidle_lb_imbalance_task                                        :        2040
>    newidle_lb_imbalance_misfit                                      :           0
>    newidle_lb_gained                                                :        1348
>    newidle_lb_hot_gained                                            :           0
>    newidle_lb_nobusyq                                               :           6  $      387.17 $
>    newidle_lb_nobusyg                                               :       26634  $        0.09 $
>    *newidle_lb_success_count                                        :        1348
>    *newidle_lb_avg_pulled                                           :        1.00
>    ----------------------------------------------------------------------------------------------------
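[Editor's note] The *-prefixed derived metrics appear to follow from the direct stats. The relations below are inferred from the example values in the table above (an assumption, not taken from the tool's source):

```python
def derived_lb_metrics(count: int, balanced: int, failed: int, gained: int):
    """Inferred relations: a 'successful' balance is one that was neither
    already balanced nor failed; avg_pulled is tasks gained per success."""
    success = count - balanced - failed
    avg_pulled = round(gained / success, 2) if success else 0.0
    return success, avg_pulled

print(derived_lb_metrics(136, 131, 0, 7))           # busy:    (5, 1.4)
print(derived_lb_metrics(449, 382, 3, 67))          # idle:    (64, 1.05)
print(derived_lb_metrics(30471, 28490, 633, 1348))  # newidle: (1348, 1.0)
```

All three category rows in the example table are consistent with these two formulas.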
> 
> Consider the following line:
> 
> newidle_lb_balanced                                              :       28490  $        0.08 $
> 
> While profiling was active, the load balancer ran 28490 times on the
> newly idle CPU 0 and found the load already balanced. The value
> enclosed in $ is the average number of jiffies between two consecutive
> such events, i.e. the elapsed jiffies divided by the event count.
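[Editor's note] The AVG_JIFFIES value between the $ signs is, per the column description, elapsed jiffies divided by the event count. A sketch with made-up numbers:

```python
def avg_jiffies(elapsed_jiffies: int, count: int) -> float:
    """Average jiffies between two consecutive occurrences of an event."""
    return round(elapsed_jiffies / count, 2) if count else 0.0

# Hypothetical run: 2000 jiffies elapsed, 250 occurrences of an event.
print(avg_jiffies(2000, 250))  # 8.0
```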
> 
> Next are the active_load_balance() stats. alb did not trigger while
> profiling was active, hence all the values are 0.
> 
> 
>    --------------------------------- <Category active_load_balance()> ---------------------------------
>    alb_count                                                        :           0
>    alb_failed                                                       :           0
>    alb_pushed                                                       :           0
>    ----------------------------------------------------------------------------------------------------
> 
> Next are the sched_balance_exec() and sched_balance_fork() stats. They
> are not used, but we kept them from the RFC for legacy purposes. Unless
> opposed, we plan to remove them in the next revision.
> 
> Next are the wakeup statistics. The report shows task-wakeup stats for
> every domain. Example:
> 
>    ------------------------------------------ <Wakeup Info> -------------------------------------------
>    ttwu_wake_remote                                                 :        1590
>    ttwu_move_affine                                                 :          84
>    ttwu_move_balance                                                :           0
>    ----------------------------------------------------------------------------------------------------
> 
> The same set of stats is reported for each CPU and each domain level.
> 
> HOW TO INTERPRET THE DIFF
> -------------------------
> 
> The `perf sched stats diff` also starts by explaining the columns
> present in the diff. It then shows the diff of the elapsed time in
> jiffies. The order of the values depends on the order of the input data
> files. Example:
> 
>    ----------------------------------------------------------------------------------------------------
>    Time elapsed (in jiffies)                                        :        2763,       2763
>    ----------------------------------------------------------------------------------------------------
> 
> Below is a sample representing the difference in the cpu and domain stats
> of two runs. The values enclosed in `|...|` in the last column show the
> percent change between the two. The second and third columns show the
> side-by-side representations of the corresponding fields from `perf sched
> stats report`.
> 
>    ----------------------------------------------------------------------------------------------------
>    CPU <ALL CPUS SUMMARY>
>    ----------------------------------------------------------------------------------------------------
>    DESC                                                                    COUNT1      COUNT2   PCT_CHANG>
>    ----------------------------------------------------------------------------------------------------
>    yld_count                                                        :           0,          0  |     0.00>
>    array_exp                                                        :           0,          0  |     0.00>
>    sched_count                                                      :      528533,     412573  |   -21.94>
>    sched_goidle                                                     :      193426,     146082  |   -24.48>
>    ttwu_count                                                       :      313134,     385975  |    23.26>
>    ttwu_local                                                       :        1126,       1282  |    13.85>
>    rq_cpu_time                                                      :  8257200244, 8301250047  |     0.53>
>    run_delay                                                        :  4728347053, 3997100703  |   -15.47>
>    pcount                                                           :      335031,     266396  |   -20.49>
>    ----------------------------------------------------------------------------------------------------
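[Editor's note] The PCT_CHANGE column can be recomputed as the percent change of the second run relative to the first; `pct_change` below is a hypothetical helper, not the tool's code:

```python
def pct_change(count1: int, count2: int) -> float:
    """Percent change of count2 relative to count1, as the diff prints it."""
    return round(100.0 * (count2 - count1) / count1, 2) if count1 else 0.0

# Values from the summary table above:
print(pct_change(528533, 412573))  # sched_count: -21.94
print(pct_change(313134, 385975))  # ttwu_count:   23.26
```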
> 
> Below is a sample of the domain stats diff:
> 
>    ----------------------------------------------------------------------------------------------------
>    CPU <ALL CPUS SUMMARY>, DOMAIN SMT
>    ----------------------------------------------------------------------------------------------------
>    DESC                                                                    COUNT1      COUNT2   PCT_CHANG>
>    ----------------------------------------- <Category busy> ------------------------------------------
>    busy_lb_count                                                    :         122,         80  |   -34.43>
>    busy_lb_balanced                                                 :         115,         76  |   -33.91>
>    busy_lb_failed                                                   :           1,          3  |   200.00>
>    busy_lb_imbalance_load                                           :          35,         49  |    40.00>
>    busy_lb_imbalance_util                                           :           0,          0  |     0.00>
>    busy_lb_imbalance_task                                           :           0,          0  |     0.00>
>    busy_lb_imbalance_misfit                                         :           0,          0  |     0.00>
>    busy_lb_gained                                                   :           7,          2  |   -71.43>
>    busy_lb_hot_gained                                               :           0,          0  |     0.00>
>    busy_lb_nobusyq                                                  :           0,          0  |     0.00>
>    busy_lb_nobusyg                                                  :         115,         76  |   -33.91>
>    *busy_lb_success_count                                           :           6,          1  |   -83.33>
>    *busy_lb_avg_pulled                                              :        1.17,       2.00  |    71.43>
>    ----------------------------------------- <Category idle> ------------------------------------------
>    idle_lb_count                                                    :         568,        620  |     9.15>
>    idle_lb_balanced                                                 :         462,        449  |    -2.81>
>    idle_lb_failed                                                   :          11,         21  |    90.91>
>    idle_lb_imbalance_load                                           :           0,          0  |     0.00>
>    idle_lb_imbalance_util                                           :           0,          0  |     0.00>
>    idle_lb_imbalance_task                                           :         115,        189  |    64.35>
>    idle_lb_imbalance_misfit                                         :           0,          0  |     0.00>
>    idle_lb_gained                                                   :         103,        169  |    64.08>
>    idle_lb_hot_gained                                               :           0,          0  |     0.00>
>    idle_lb_nobusyq                                                  :           0,          0  |     0.00>
>    idle_lb_nobusyg                                                  :         462,        449  |    -2.81>
>    *idle_lb_success_count                                           :          95,        150  |    57.89>
>    *idle_lb_avg_pulled                                              :        1.08,       1.13  |     3.92>
>    ---------------------------------------- <Category newidle> ----------------------------------------
>    newidle_lb_count                                                 :       16961,       3155  |   -81.40>
>    newidle_lb_balanced                                              :       15646,       2556  |   -83.66>
>    newidle_lb_failed                                                :         397,        142  |   -64.23>
>    newidle_lb_imbalance_load                                        :           0,          0  |     0.00>
>    newidle_lb_imbalance_util                                        :           0,          0  |     0.00>
>    newidle_lb_imbalance_task                                        :        1376,        655  |   -52.40>
>    newidle_lb_imbalance_misfit                                      :           0,          0  |     0.00>
>    newidle_lb_gained                                                :         917,        457  |   -50.16>
>    newidle_lb_hot_gained                                            :           0,          0  |     0.00>
>    newidle_lb_nobusyq                                               :           3,          1  |   -66.67>
>    newidle_lb_nobusyg                                               :       14480,       2103  |   -85.48>
>    *newidle_lb_success_count                                        :         918,        457  |   -50.22>
>    *newidle_lb_avg_pulled                                           :        1.00,       1.00  |     0.11>
>    --------------------------------- <Category active_load_balance()> ---------------------------------
>    alb_count                                                        :           0,          1  |     0.00>
>    alb_failed                                                       :           0,          0  |     0.00>
>    alb_pushed                                                       :           0,          1  |     0.00>
>    --------------------------------- <Category sched_balance_exec()> ----------------------------------
>    sbe_count                                                        :           0,          0  |     0.00>
>    sbe_balanced                                                     :           0,          0  |     0.00>
>    sbe_pushed                                                       :           0,          0  |     0.00>
>    --------------------------------- <Category sched_balance_fork()> ----------------------------------
>    sbf_count                                                        :           0,          0  |     0.00>
>    sbf_balanced                                                     :           0,          0  |     0.00>
>    sbf_pushed                                                       :           0,          0  |     0.00>
>    ------------------------------------------ <Wakeup Info> -------------------------------------------
>    ttwu_wake_remote                                                 :        2031,       2914  |    43.48>
>    ttwu_move_affine                                                 :          73,        124  |    69.86>
>    ttwu_move_balance                                                :           0,          0  |     0.00>
>    ----------------------------------------------------------------------------------------------------
> 
> v3: https://lore.kernel.org/all/20250311120230.61774-1-swapnil.sapkal@amd.com/
> v3->v4:
>   - All the review comments from v3 are addressed [Namhyung Kim].
>   - Print short names instead of field descriptions in the report [Peter Zijlstra]
>   - Fix the double free issue [Cristian Prundeanu]
>   - Documentation update related to `perf sched stats diff` [Chen Yu]
>   - Bail out of `perf sched stats diff` if the perf.data files have different
>     schedstat versions [Peter Zijlstra]
> 
> v2: https://lore.kernel.org/all/20241122084452.1064968-1-swapnil.sapkal@amd.com/
> v2->v3:
>   - Add a perf unit test for basic sched stats functionalities.
>   - Describe the new tool, its usage and the interpretation of report data
>     in the perf-sched man page.
>   - Add /proc/schedstat version 17 support.
> 
> v1: https://lore.kernel.org/lkml/20240916164722.1838-1-ravi.bangoria@amd.com
> v1->v2:
>   - Add support for `perf sched stats diff`.
>   - Add column headers in the report for better readability. Use
>     procfs__mountpoint for consistency. Add a hint for enabling
>     CONFIG_SCHEDSTAT if it is disabled. [James Clark]
>   - Use a single header file for both the cpu and domain fields. Change
>     the layout of the structs to minimise padding. I tried changing
>     `v15` to `15` in the header files but it was not giving any
>     benefit, so I dropped the idea. [Namhyung Kim]
>   - Add tested-by.
> 
> RFC: https://lore.kernel.org/r/20240508060427.417-1-ravi.bangoria@amd.com
> RFC->v1:
>   - [Kernel] Print domain name along with domain number in /proc/schedstat
>     file.
>   - s/schedstat/stats/ for the subcommand.
>   - Record domain name and cpumask details, also show them in report.
>   - Add CPU filtering capability at record and report time.
>   - Add /proc/schedstat v16 support.
>   - Live mode support. Similar to the perf stat command, live mode prints
>     the sched stats on stdout.
>   - Add pager support in `perf sched stats report` for better scrolling.
>   - Some minor cosmetic changes in report output to improve readability.
>   - Rebase to latest perf-tools-next/perf-tools-next (1de5b5dcb835).
> 
> TODO:
>   - perf sched stats records /proc/schedstat, which contains CPU- and
>     domain-level scheduler statistics. We are planning to add a taskstat
>     tool which reads task stats from procfs and generates a scheduler
>     statistics report at task granularity. This will probably be a
>     standalone tool, something like `perf sched taskstat record/report`.
>   - Except for pre-processor related checkpatch warnings, we have addressed
>     most of the other possible warnings.
>   - This version supports diff for two perf.data files captured with the
>     same schedstat version, but the target is to show the diff for multiple
>     perf.data files. The plan is to support diff even when the provided
>     perf.data files have different schedstat versions.
> 
> Patches are prepared on v6.17-rc3 (1b237f190eb3).
> 
> [1] https://youtu.be/lg-9aG2ajA0?t=283
> [2] https://github.com/AMDESE/sched-scoreboard
> [3] https://lore.kernel.org/lkml/c50bdbfe-02ce-c1bc-c761-c95f8e216ca0@amd.com/
> [4] https://lore.kernel.org/lkml/3e32bec6-5e59-c66a-7676-7d15df2c961c@amd.com/
> [5] https://lore.kernel.org/all/20241122084452.1064968-1-swapnil.sapkal@amd.com/
> [6] https://lore.kernel.org/lkml/3170d16e-eb67-4db8-a327-eb8188397fdb@amd.com/
> [7] https://lore.kernel.org/lkml/feb31b6e-6457-454c-a4f3-ce8ad96bf8de@amd.com/
> 
> Swapnil Sapkal (11):
>    perf: Add print_separator to util
>    tools/lib: Add list_is_first()
>    perf header: Support CPU DOMAIN relation info
>    perf sched stats: Add record and rawdump support
>    perf sched stats: Add schedstat v16 support
>    perf sched stats: Add schedstat v17 support
>    perf sched stats: Add support for report subcommand
>    perf sched stats: Add support for live mode
>    perf sched stats: Add support for diff subcommand
>    perf sched stats: Add basic perf sched stats test
>    perf sched stats: Add details in man page
> 
>   tools/include/linux/list.h                    |   10 +
>   tools/lib/perf/Documentation/libperf.txt      |    2 +
>   tools/lib/perf/Makefile                       |    1 +
>   tools/lib/perf/include/perf/event.h           |   69 ++
>   tools/lib/perf/include/perf/schedstat-v15.h   |  146 +++
>   tools/lib/perf/include/perf/schedstat-v16.h   |  146 +++
>   tools/lib/perf/include/perf/schedstat-v17.h   |  164 +++
>   tools/perf/Documentation/perf-sched.txt       |  261 ++++-
>   .../Documentation/perf.data-file-format.txt   |   17 +
>   tools/perf/builtin-inject.c                   |    3 +
>   tools/perf/builtin-kwork.c                    |   13 +-
>   tools/perf/builtin-sched.c                    | 1027 ++++++++++++++++-
>   tools/perf/tests/shell/perf_sched_stats.sh    |   64 +
>   tools/perf/util/env.h                         |   16 +
>   tools/perf/util/event.c                       |   52 +
>   tools/perf/util/event.h                       |    2 +
>   tools/perf/util/header.c                      |  304 +++++
>   tools/perf/util/header.h                      |    6 +
>   tools/perf/util/session.c                     |   22 +
>   tools/perf/util/synthetic-events.c            |  196 ++++
>   tools/perf/util/synthetic-events.h            |    3 +
>   tools/perf/util/tool.c                        |   18 +
>   tools/perf/util/tool.h                        |    4 +-
>   tools/perf/util/util.c                        |   48 +
>   tools/perf/util/util.h                        |    5 +
>   25 files changed, 2587 insertions(+), 12 deletions(-)
>   create mode 100644 tools/lib/perf/include/perf/schedstat-v15.h
>   create mode 100644 tools/lib/perf/include/perf/schedstat-v16.h
>   create mode 100644 tools/lib/perf/include/perf/schedstat-v17.h
>   create mode 100755 tools/perf/tests/shell/perf_sched_stats.sh
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 00/11] perf sched: Introduce stats tool
  2025-08-28  4:43 ` [PATCH v4 00/11] perf sched: Introduce stats tool Sapkal, Swapnil
@ 2025-12-09 21:03   ` Ian Rogers
  2025-12-12  3:43     ` Ravi Bangoria
  2025-12-16 10:09     ` Swapnil Sapkal
  0 siblings, 2 replies; 7+ messages in thread
From: Ian Rogers @ 2025-12-09 21:03 UTC (permalink / raw)
  To: Sapkal, Swapnil
  Cc: peterz, mingo, acme, namhyung, james.clark, ravi.bangoria,
	yu.c.chen, mark.rutland, alexander.shishkin, jolsa, rostedt,
	vincent.guittot, adrian.hunter, kan.liang, gautham.shenoy,
	kprateek.nayak, juri.lelli, yangjihong, void, tj, sshegde,
	linux-kernel, linux-perf-users, santosh.shukla, sandipan.das,
	Cristian Prundeanu

On Wed, Aug 27, 2025 at 9:43 PM Sapkal, Swapnil <swapnil.sapkal@amd.com> wrote:
>
> Hello all,
>
> I missed adding the perf folks to the list. Adding them here. Sorry about that.

Hi Swapnil,

I was wondering if this patch series is still active? The kernel test
robot mentioned an issue.

> --
> Thanks and Regards,
> Swapnil
>
> On 8/26/2025 10:40 AM, Swapnil Sapkal wrote:
> > Apologies for the long delay; I was sidetracked by some other items and
> > was not able to focus on this.
> >
> > MOTIVATION
> > ----------
> >
> > Existing `perf sched` is quite exhaustive and provides a lot of insight
> > into scheduler behavior, but it quickly becomes impractical for long
> > running or scheduler-intensive workloads. For example, `perf sched record`
> > has ~7.77% overhead on hackbench (with 25 groups each running 700K loops
> > on a 2-socket 128-core 256-thread 3rd Generation EPYC server), and it
> > generates a huge 56G perf.data file which perf takes ~137 mins to prepare
> > and write to disk [1].
> >
> > Unlike `perf sched record`, which hooks onto a set of scheduler tracepoints
> > and generates samples on each tracepoint hit, `perf sched stats record` takes
> > a snapshot of the /proc/schedstat file before and after the workload, i.e.
> > there is almost zero interference with the workload run. Also, it takes very
> > little time to parse /proc/schedstat, convert it into perf samples and
> > save those samples into the perf.data file. The resulting perf.data file is
> > much smaller, so overall `perf sched stats record` is much more lightweight
> > compared to `perf sched record`.
> >
> > We, internally at AMD, have been using this (a variant of it, known as
> > "sched-scoreboard"[2]) and found it very useful for analysing the impact
> > of scheduler code changes[3][4]. Prateek used v2[5] of this patch
> > series to report his analysis[6][7].
> >
> > Please note that this is not a replacement for perf sched record/report.
> > The intended users of the new tool are scheduler developers, not regular
> > users.
> >
> > USAGE
> > -----
> >
> >    # perf sched stats record
> >    # perf sched stats report
> >    # perf sched stats diff
> >
> > Note: Although the `perf sched stats` tool supports the workload profiling
> > syntax (i.e. -- <workload> ), the recorded profile is still systemwide
> > since /proc/schedstat is a systemwide file.
> >
> > HOW TO INTERPRET THE REPORT
> > ---------------------------
> >
> > The `perf sched stats report` starts with a description of the columns
> > present in the report. These column names are given before the cpu and
> > domain stats to improve the readability of the report.
> >
> >    ----------------------------------------------------------------------------------------------------
> >    DESC                    -> Description of the field
> >    COUNT                   -> Value of the field
> >    PCT_CHANGE              -> Percent change with corresponding base value
> >    AVG_JIFFIES             -> Avg time in jiffies between two consecutive occurrences of the event
> >    ----------------------------------------------------------------------------------------------------
> >
> > Next is the total profiling time in terms of jiffies:
> >
> >    ----------------------------------------------------------------------------------------------------
> >    Time elapsed (in jiffies)                                   :       24537
> >    ----------------------------------------------------------------------------------------------------
> >
> > Next are the CPU scheduling statistics. These are simple diffs of the
> > /proc/schedstat CPU lines along with descriptions. The report also
> > prints the % relative to a base stat.

I wonder if this is similar to user_time and system_time:
```
$ perf list
...
tool:
...
 system_time
      [System/kernel time in nanoseconds. Unit: tool]
...
 user_time
      [User (non-kernel) time in nanoseconds. Unit: tool]
...
```
These events are implemented by reading /proc/stat and /proc/pid/stat:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/tool_pmu.c?h=perf-tools-next#n267

As they are events, they can appear in perf stat output and also
within metrics.

Thanks,
Ian

> >
> > In the example below, schedule() left CPU0 idle 36.58% of the time.
> > 0.45% of all try_to_wake_up() calls were to wake up the local CPU. And the
> > total wait time of tasks on CPU0 is 48.70% of the total runtime of tasks
> > on the same CPU.
> >
> >    ----------------------------------------------------------------------------------------------------
> >    CPU 0
> >    ----------------------------------------------------------------------------------------------------
> >    DESC                                                                     COUNT   PCT_CHANGE
> >    ----------------------------------------------------------------------------------------------------
> >    yld_count                                                        :           0
> >    array_exp                                                        :           0
> >    sched_count                                                      :      402267
> >    sched_goidle                                                     :      147161  (    36.58% )
> >    ttwu_count                                                       :      236309
> >    ttwu_local                                                       :        1062  (     0.45% )
> >    rq_cpu_time                                                      :  7083791148
> >    run_delay                                                        :  3449973971  (    48.70% )
> >    pcount                                                           :      255035
> >    ----------------------------------------------------------------------------------------------------
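[Editorial interjection] The "simple diffs of /proc/schedstat CPU lines" described above can be sketched in a few lines. This is an illustrative approximation, not the patch's actual implementation; the field names follow the v15 cpu-line layout shown in the report:

```python
# Sketch of the per-CPU diff behind `perf sched stats` (not the patch's code).
# Field order follows the v15 /proc/schedstat cpu line shown in the report.
CPU_FIELDS = ["yld_count", "array_exp", "sched_count", "sched_goidle",
              "ttwu_count", "ttwu_local", "rq_cpu_time", "run_delay", "pcount"]

def parse_cpu_lines(snapshot):
    """Map 'cpuN' -> {field: value} for each cpu line in a schedstat snapshot."""
    stats = {}
    for line in snapshot.splitlines():
        parts = line.split()
        if parts and parts[0].startswith("cpu"):
            stats[parts[0]] = dict(zip(CPU_FIELDS, map(int, parts[1:])))
    return stats

def diff_cpu_stats(before, after):
    """Field-wise difference between two snapshots, per CPU."""
    b, a = parse_cpu_lines(before), parse_cpu_lines(after)
    return {cpu: {f: a[cpu][f] - b[cpu][f] for f in CPU_FIELDS} for cpu in a}
```

Percentages such as sched_goidle's 36.58% then follow as sched_goidle / sched_count within a single CPU's diff.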
> >
> > Next are the load-balancing statistics. For each of the sched domains
> > (e.g. `SMT`, `MC`, `DIE`...), the scheduler computes statistics under
> > the following three categories:
> >
> >    1) Idle Load Balance: Load balancing performed on behalf of a long
> >                          idling CPU by some other CPU.
> >    2) Busy Load Balance: Load balancing performed when the CPU was busy.
> >    3) New Idle Balance : Load balancing performed when a CPU just became
> >                          idle.
> >
> > Under each of these three categories, the sched stats report provides
> > different load-balancing statistics. Along with the direct stats, the
> > report also contains derived metrics prefixed with *. Example:
> >
> >    ----------------------------------------------------------------------------------------------------
> >    CPU 0, DOMAIN SMT CPUS 0,64
> >    ----------------------------------------------------------------------------------------------------
> >    DESC                                                                     COUNT    AVG_JIFFIES
> >    ----------------------------------------- <Category busy> ------------------------------------------
> >    busy_lb_count                                                    :         136  $       17.08 $
> >    busy_lb_balanced                                                 :         131  $       17.73 $
> >    busy_lb_failed                                                   :           0  $        0.00 $
> >    busy_lb_imbalance_load                                           :          58
> >    busy_lb_imbalance_util                                           :           0
> >    busy_lb_imbalance_task                                           :           0
> >    busy_lb_imbalance_misfit                                         :           0
> >    busy_lb_gained                                                   :           7
> >    busy_lb_hot_gained                                               :           0
> >    busy_lb_nobusyq                                                  :           2  $     1161.50 $
> >    busy_lb_nobusyg                                                  :         129  $       18.01 $
> >    *busy_lb_success_count                                           :           5
> >    *busy_lb_avg_pulled                                              :        1.40
> >    ----------------------------------------- <Category idle> ------------------------------------------
> >    idle_lb_count                                                    :         449  $        5.17 $
> >    idle_lb_balanced                                                 :         382  $        6.08 $
> >    idle_lb_failed                                                   :           3  $      774.33 $
> >    idle_lb_imbalance_load                                           :           0
> >    idle_lb_imbalance_util                                           :           0
> >    idle_lb_imbalance_task                                           :          71
> >    idle_lb_imbalance_misfit                                         :           0
> >    idle_lb_gained                                                   :          67
> >    idle_lb_hot_gained                                               :           0
> >    idle_lb_nobusyq                                                  :           0  $        0.00 $
> >    idle_lb_nobusyg                                                  :         382  $        6.08 $
> >    *idle_lb_success_count                                           :          64
> >    *idle_lb_avg_pulled                                              :        1.05
> >    ---------------------------------------- <Category newidle> ----------------------------------------
> >    newidle_lb_count                                                 :       30471  $        0.08 $
> >    newidle_lb_balanced                                              :       28490  $        0.08 $
> >    newidle_lb_failed                                                :         633  $        3.67 $
> >    newidle_lb_imbalance_load                                        :           0
> >    newidle_lb_imbalance_util                                        :           0
> >    newidle_lb_imbalance_task                                        :        2040
> >    newidle_lb_imbalance_misfit                                      :           0
> >    newidle_lb_gained                                                :        1348
> >    newidle_lb_hot_gained                                            :           0
> >    newidle_lb_nobusyq                                               :           6  $      387.17 $
> >    newidle_lb_nobusyg                                               :       26634  $        0.09 $
> >    *newidle_lb_success_count                                        :        1348
> >    *newidle_lb_avg_pulled                                           :        1.00
> >    ----------------------------------------------------------------------------------------------------
> >
> > Consider the following line:
> >
> > newidle_lb_balanced                                              :       28490  $        0.08 $
> >
> > While profiling was active, the load balancer found 28490 times that the
> > load needed to be balanced on the newly idle CPU 0. The value enclosed in
> > $ is the average number of jiffies between two such events, i.e. the
> > elapsed jiffies divided by the event count.
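[Editorial interjection] The starred derived metrics and the $..$ column can be reconstructed from the direct counters. The formulas below are inferred from the example numbers (busy_lb: 136 - 131 - 0 = 5 successes, 7 / 5 = 1.40 tasks pulled per success) rather than taken from the patch, and the elapsed-jiffies input is illustrative:

```python
# Inferred formulas (not from the patch) for the derived load-balance
# metrics and the $..$ average-jiffies column in the report.
def lb_derived(count, balanced, failed, gained, elapsed_jiffies):
    success = count - balanced - failed                      # *_lb_success_count
    avg_pulled = gained / success if success else 0.0        # *_lb_avg_pulled
    avg_jiffies = elapsed_jiffies / count if count else 0.0  # $..$ column
    return success, avg_pulled, avg_jiffies
```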
> >
> > Next are the active_load_balance() stats. alb did not trigger while the
> > profiling was active, hence all values are 0.
> >
> >
> >    --------------------------------- <Category active_load_balance()> ---------------------------------
> >    alb_count                                                        :           0
> >    alb_failed                                                       :           0
> >    alb_pushed                                                       :           0
> >    ----------------------------------------------------------------------------------------------------
> >
> > Next are the sched_balance_exec() and sched_balance_fork() stats. They are
> > unused, but we kept them in the RFC for legacy reasons. Unless opposed,
> > we plan to remove them in the next revision.
> >
> > Next are the wakeup statistics, shown for every domain. Example:
> >
> >    ------------------------------------------ <Wakeup Info> -------------------------------------------
> >    ttwu_wake_remote                                                 :        1590
> >    ttwu_move_affine                                                 :          84
> >    ttwu_move_balance                                                :           0
> >    ----------------------------------------------------------------------------------------------------
> >
> > The same set of stats is reported for each CPU and each domain level.
> >
> > HOW TO INTERPRET THE DIFF
> > -------------------------
> >
> > The `perf sched stats diff` also starts by explaining the columns present
> > in the diff. It then shows the diff of the elapsed time in terms of
> > jiffies. The order of the values depends on the order of the input data
> > files. Example:
> >
> >    ----------------------------------------------------------------------------------------------------
> >    Time elapsed (in jiffies)                                        :        2763,       2763
> >    ----------------------------------------------------------------------------------------------------
> >
> > Below is a sample representing the difference in the CPU and domain stats
> > of two runs. Here the third column, with values enclosed in `|...|`, shows
> > the percent change between the two. The second and fourth columns show the
> > side-by-side representations of the corresponding fields from `perf sched
> > stats report`.
> >
> >    ----------------------------------------------------------------------------------------------------
> >    CPU <ALL CPUS SUMMARY>
> >    ----------------------------------------------------------------------------------------------------
> >    DESC                                                                    COUNT1      COUNT2   PCT_CHANG>
> >    ----------------------------------------------------------------------------------------------------
> >    yld_count                                                        :           0,          0  |     0.00>
> >    array_exp                                                        :           0,          0  |     0.00>
> >    sched_count                                                      :      528533,     412573  |   -21.94>
> >    sched_goidle                                                     :      193426,     146082  |   -24.48>
> >    ttwu_count                                                       :      313134,     385975  |    23.26>
> >    ttwu_local                                                       :        1126,       1282  |    13.85>
> >    rq_cpu_time                                                      :  8257200244, 8301250047  |     0.53>
> >    run_delay                                                        :  4728347053, 3997100703  |   -15.47>
> >    pcount                                                           :      335031,     266396  |   -20.49>
> >    ----------------------------------------------------------------------------------------------------
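[Editorial interjection] The PCT_CHANGE column appears to be the relative change of the second count with respect to the first; the formula below is an assumption checked against the sample numbers (412573 vs 528533 gives -21.94%), not code from the patch:

```python
# Assumed PCT_CHANGE formula for `perf sched stats diff`, with a 0.00
# fallback for a zero base (matching sample rows where the base count is 0).
def pct_change(count1, count2):
    if count1 == 0:
        return 0.0
    return (count2 - count1) / count1 * 100.0
```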
> >
> > Below is a sample of the domain stats diff:
> >
> >    ----------------------------------------------------------------------------------------------------
> >    CPU <ALL CPUS SUMMARY>, DOMAIN SMT
> >    ----------------------------------------------------------------------------------------------------
> >    DESC                                                                    COUNT1      COUNT2   PCT_CHANG>
> >    ----------------------------------------- <Category busy> ------------------------------------------
> >    busy_lb_count                                                    :         122,         80  |   -34.43>
> >    busy_lb_balanced                                                 :         115,         76  |   -33.91>
> >    busy_lb_failed                                                   :           1,          3  |   200.00>
> >    busy_lb_imbalance_load                                           :          35,         49  |    40.00>
> >    busy_lb_imbalance_util                                           :           0,          0  |     0.00>
> >    busy_lb_imbalance_task                                           :           0,          0  |     0.00>
> >    busy_lb_imbalance_misfit                                         :           0,          0  |     0.00>
> >    busy_lb_gained                                                   :           7,          2  |   -71.43>
> >    busy_lb_hot_gained                                               :           0,          0  |     0.00>
> >    busy_lb_nobusyq                                                  :           0,          0  |     0.00>
> >    busy_lb_nobusyg                                                  :         115,         76  |   -33.91>
> >    *busy_lb_success_count                                           :           6,          1  |   -83.33>
> >    *busy_lb_avg_pulled                                              :        1.17,       2.00  |    71.43>
> >    ----------------------------------------- <Category idle> ------------------------------------------
> >    idle_lb_count                                                    :         568,        620  |     9.15>
> >    idle_lb_balanced                                                 :         462,        449  |    -2.81>
> >    idle_lb_failed                                                   :          11,         21  |    90.91>
> >    idle_lb_imbalance_load                                           :           0,          0  |     0.00>
> >    idle_lb_imbalance_util                                           :           0,          0  |     0.00>
> >    idle_lb_imbalance_task                                           :         115,        189  |    64.35>
> >    idle_lb_imbalance_misfit                                         :           0,          0  |     0.00>
> >    idle_lb_gained                                                   :         103,        169  |    64.08>
> >    idle_lb_hot_gained                                               :           0,          0  |     0.00>
> >    idle_lb_nobusyq                                                  :           0,          0  |     0.00>
> >    idle_lb_nobusyg                                                  :         462,        449  |    -2.81>
> >    *idle_lb_success_count                                           :          95,        150  |    57.89>
> >    *idle_lb_avg_pulled                                              :        1.08,       1.13  |     3.92>
> >    ---------------------------------------- <Category newidle> ----------------------------------------
> >    newidle_lb_count                                                 :       16961,       3155  |   -81.40>
> >    newidle_lb_balanced                                              :       15646,       2556  |   -83.66>
> >    newidle_lb_failed                                                :         397,        142  |   -64.23>
> >    newidle_lb_imbalance_load                                        :           0,          0  |     0.00>
> >    newidle_lb_imbalance_util                                        :           0,          0  |     0.00>
> >    newidle_lb_imbalance_task                                        :        1376,        655  |   -52.40>
> >    newidle_lb_imbalance_misfit                                      :           0,          0  |     0.00>
> >    newidle_lb_gained                                                :         917,        457  |   -50.16>
> >    newidle_lb_hot_gained                                            :           0,          0  |     0.00>
> >    newidle_lb_nobusyq                                               :           3,          1  |   -66.67>
> >    newidle_lb_nobusyg                                               :       14480,       2103  |   -85.48>
> >    *newidle_lb_success_count                                        :         918,        457  |   -50.22>
> >    *newidle_lb_avg_pulled                                           :        1.00,       1.00  |     0.11>
> >    --------------------------------- <Category active_load_balance()> ---------------------------------
> >    alb_count                                                        :           0,          1  |     0.00>
> >    alb_failed                                                       :           0,          0  |     0.00>
> >    alb_pushed                                                       :           0,          1  |     0.00>
> >    --------------------------------- <Category sched_balance_exec()> ----------------------------------
> >    sbe_count                                                        :           0,          0  |     0.00>
> >    sbe_balanced                                                     :           0,          0  |     0.00>
> >    sbe_pushed                                                       :           0,          0  |     0.00>
> >    --------------------------------- <Category sched_balance_fork()> ----------------------------------
> >    sbf_count                                                        :           0,          0  |     0.00>
> >    sbf_balanced                                                     :           0,          0  |     0.00>
> >    sbf_pushed                                                       :           0,          0  |     0.00>
> >    ------------------------------------------ <Wakeup Info> -------------------------------------------
> >    ttwu_wake_remote                                                 :        2031,       2914  |    43.48>
> >    ttwu_move_affine                                                 :          73,        124  |    69.86>
> >    ttwu_move_balance                                                :           0,          0  |     0.00>
> >    ----------------------------------------------------------------------------------------------------
> >
> > v3: https://lore.kernel.org/all/20250311120230.61774-1-swapnil.sapkal@amd.com/
> > v3->v4:
> >   - All the review comments from v3 are addressed [Namhyung Kim].
> >   - Print short names instead of the field description in the report [Peter Zijlstra]
> >   - Fix the double free issue [Cristian Prundeanu]
> >   - Documentation update related to `perf sched stats diff` [Chen yu]
> >   - Bail out `perf sched stats diff` if perf.data files have different schedstat
> >     versions [Peter Zijlstra]
> >
> > v2: https://lore.kernel.org/all/20241122084452.1064968-1-swapnil.sapkal@amd.com/
> > v2->v3:
> >   - Add perf unit test for basic sched stats functionalities
> >   - Describe the new tool, its usage and the interpretation of report data
> >     in the perf-sched man page.
> >   - Add /proc/schedstat version 17 support.
> >
> > v1: https://lore.kernel.org/lkml/20240916164722.1838-1-ravi.bangoria@amd.com
> > v1->v2
> >   - Add the support for `perf sched stats diff`
> >   - Add column header in report for better readability. Use
> >     procfs__mountpoint for consistency. Add hint for enabling
> >     CONFIG_SCHEDSTAT if disabled. [James Clark]
> >   - Use a single header file for both cpu and domain fields. Change
> >     the layout of structs to minimise the padding. I tried changing
> >     `v15` to `15` in the header files but it was not giving any
> >     benefits, so dropped the idea. [Namhyung Kim]
> >   - Add tested-by.
> >
> > RFC: https://lore.kernel.org/r/20240508060427.417-1-ravi.bangoria@amd.com
> > RFC->v1:
> >   - [Kernel] Print domain name along with domain number in /proc/schedstat
> >     file.
> >   - s/schedstat/stats/ for the subcommand.
> >   - Record domain name and cpumask details, also show them in report.
> >   - Add CPU filtering capability at record and report time.
> >   - Add /proc/schedstat v16 support.
> >   - Live mode support. Similar to perf stat command, live mode prints the
> >     sched stats on the stdout.
> >   - Add pager support in `perf sched stats report` for better scrolling.
> >   - Some minor cosmetic changes in report output to improve readability.
> >   - Rebase to latest perf-tools-next/perf-tools-next (1de5b5dcb835).
> >
> > TODO:
> >   - perf sched stats records /proc/schedstat, which contains CPU- and
> >     domain-level scheduler statistics. We are planning to add a taskstat
> >     tool which reads task stats from procfs and generates a scheduler
> >     statistics report at task granularity. This will probably be a
> >     standalone tool, something like `perf sched taskstat record/report`.
> >   - Except for pre-processor-related checkpatch warnings, we have addressed
> >     most of the other possible warnings.
> >   - This version supports diff for two perf.data files captured with the
> >     same schedstat version, but the target is to show the diff for multiple
> >     perf.data files. The plan is to support diff even when the provided
> >     perf.data files have different schedstat versions.
> >
> > Patches are prepared on v6.17-rc3 (1b237f190eb3).
> >
> > [1] https://youtu.be/lg-9aG2ajA0?t=283
> > [2] https://github.com/AMDESE/sched-scoreboard
> > [3] https://lore.kernel.org/lkml/c50bdbfe-02ce-c1bc-c761-c95f8e216ca0@amd.com/
> > [4] https://lore.kernel.org/lkml/3e32bec6-5e59-c66a-7676-7d15df2c961c@amd.com/
> > [5] https://lore.kernel.org/all/20241122084452.1064968-1-swapnil.sapkal@amd.com/
> > [6] https://lore.kernel.org/lkml/3170d16e-eb67-4db8-a327-eb8188397fdb@amd.com/
> > [7] https://lore.kernel.org/lkml/feb31b6e-6457-454c-a4f3-ce8ad96bf8de@amd.com/
> >
> > Swapnil Sapkal (11):
> >    perf: Add print_separator to util
> >    tools/lib: Add list_is_first()
> >    perf header: Support CPU DOMAIN relation info
> >    perf sched stats: Add record and rawdump support
> >    perf sched stats: Add schedstat v16 support
> >    perf sched stats: Add schedstat v17 support
> >    perf sched stats: Add support for report subcommand
> >    perf sched stats: Add support for live mode
> >    perf sched stats: Add support for diff subcommand
> >    perf sched stats: Add basic perf sched stats test
> >    perf sched stats: Add details in man page
> >
> >   tools/include/linux/list.h                    |   10 +
> >   tools/lib/perf/Documentation/libperf.txt      |    2 +
> >   tools/lib/perf/Makefile                       |    1 +
> >   tools/lib/perf/include/perf/event.h           |   69 ++
> >   tools/lib/perf/include/perf/schedstat-v15.h   |  146 +++
> >   tools/lib/perf/include/perf/schedstat-v16.h   |  146 +++
> >   tools/lib/perf/include/perf/schedstat-v17.h   |  164 +++
> >   tools/perf/Documentation/perf-sched.txt       |  261 ++++-
> >   .../Documentation/perf.data-file-format.txt   |   17 +
> >   tools/perf/builtin-inject.c                   |    3 +
> >   tools/perf/builtin-kwork.c                    |   13 +-
> >   tools/perf/builtin-sched.c                    | 1027 ++++++++++++++++-
> >   tools/perf/tests/shell/perf_sched_stats.sh    |   64 +
> >   tools/perf/util/env.h                         |   16 +
> >   tools/perf/util/event.c                       |   52 +
> >   tools/perf/util/event.h                       |    2 +
> >   tools/perf/util/header.c                      |  304 +++++
> >   tools/perf/util/header.h                      |    6 +
> >   tools/perf/util/session.c                     |   22 +
> >   tools/perf/util/synthetic-events.c            |  196 ++++
> >   tools/perf/util/synthetic-events.h            |    3 +
> >   tools/perf/util/tool.c                        |   18 +
> >   tools/perf/util/tool.h                        |    4 +-
> >   tools/perf/util/util.c                        |   48 +
> >   tools/perf/util/util.h                        |    5 +
> >   25 files changed, 2587 insertions(+), 12 deletions(-)
> >   create mode 100644 tools/lib/perf/include/perf/schedstat-v15.h
> >   create mode 100644 tools/lib/perf/include/perf/schedstat-v16.h
> >   create mode 100644 tools/lib/perf/include/perf/schedstat-v17.h
> >   create mode 100755 tools/perf/tests/shell/perf_sched_stats.sh
> >
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 00/11] perf sched: Introduce stats tool
  2025-12-09 21:03   ` Ian Rogers
@ 2025-12-12  3:43     ` Ravi Bangoria
  2025-12-12  5:11       ` Ian Rogers
  2025-12-16 10:09     ` Swapnil Sapkal
  1 sibling, 1 reply; 7+ messages in thread
From: Ravi Bangoria @ 2025-12-12  3:43 UTC (permalink / raw)
  To: Ian Rogers
  Cc: peterz, mingo, acme, namhyung, james.clark, yu.c.chen,
	mark.rutland, alexander.shishkin, jolsa, rostedt, vincent.guittot,
	adrian.hunter, kan.liang, gautham.shenoy, kprateek.nayak,
	juri.lelli, yangjihong, void, tj, sshegde, linux-kernel,
	linux-perf-users, santosh.shukla, sandipan.das,
	Cristian Prundeanu, Sapkal, Swapnil, Ravi Bangoria

Hi Ian,

>>> Next is CPU scheduling statistics. These are simple diffs of
>>> /proc/schedstat CPU lines along with description. The report also
>>> prints % relative to base stat.
> 
> I wonder if this is similar to user_time and system_time:
> ```
> $ perf list
> ...
> tool:
> ...
>  system_time
>       [System/kernel time in nanoseconds. Unit: tool]
> ...
>  user_time
>       [User (non-kernel) time in nanoseconds. Unit: tool]
> ...
> ```
> These events are implemented by reading /proc/stat and /proc/pid/stat:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/tool_pmu.c?h=perf-tools-next#n267
> 
> As they are events then they can appear in perf stat output and also
> within metrics.

Create synthesized events for each field of /proc/schedstat?

Your idea is interesting and, I suppose, will work best when we care
about individual counters. However, for the "perf sched stats" tool,
I see at least two challenges:

1. One of the design goals of "perf sched stats" was to keep the
   overhead low. Currently, it reads /proc/schedstat once at the
   beginning and once at the end. Switching to per-counter events
   would require opening, reading and closing a large number of
   events, which would incur significant overhead.

2. Taking a snapshot in one go allows us to correlate counts easily.
   Using synthetic events would force us to read each counter
   individually, making cross-counter correlation impossible.

Thanks,
Ravi


* Re: [PATCH v4 00/11] perf sched: Introduce stats tool
  2025-12-12  3:43     ` Ravi Bangoria
@ 2025-12-12  5:11       ` Ian Rogers
  0 siblings, 0 replies; 7+ messages in thread
From: Ian Rogers @ 2025-12-12  5:11 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: peterz, mingo, acme, namhyung, james.clark, yu.c.chen,
	mark.rutland, alexander.shishkin, jolsa, rostedt, vincent.guittot,
	adrian.hunter, kan.liang, gautham.shenoy, kprateek.nayak,
	juri.lelli, yangjihong, void, tj, sshegde, linux-kernel,
	linux-perf-users, santosh.shukla, sandipan.das,
	Cristian Prundeanu, Sapkal, Swapnil

On Thu, Dec 11, 2025 at 7:43 PM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>
> Hi Ian,
>
> >>> Next is CPU scheduling statistics. These are simple diffs of
> >>> /proc/schedstat CPU lines along with description. The report also
> >>> prints % relative to base stat.
> >
> > I wonder if this is similar to user_time and system_time:
> > ```
> > $ perf list
> > ...
> > tool:
> > ...
> >  system_time
> >       [System/kernel time in nanoseconds. Unit: tool]
> > ...
> >  user_time
> >       [User (non-kernel) time in nanoseconds. Unit: tool]
> > ...
> > ```
> > These events are implemented by reading /proc/stat and /proc/pid/stat:
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/tool_pmu.c?h=perf-tools-next#n267
> >
> > As they are events then they can appear in perf stat output and also
> > within metrics.
>
> Create synthesized events for each field of /proc/schedstat?
>
> Your idea is interesting and, I suppose, will work best when we care
> about individual counters. However, for the "perf sched stats" tool,
> I see atleast two challenges:
>
> 1. One of the design goal of "perf sched stats" was to keep the
>    overhead low. Currently, it reads /proc/schedstat once at the
>    beginning and once at the end. Switching to per-counter events
>    would require opening, reading and closing a large number of
>    events which would incur significant overhead.
>
> 2. Taking a snapshot in one go allows us to correlate counts easily.
>    Using synthetic events would force us to read each counter
>    individually, making cross-counter correlation impossible.

Thanks Ravi, those are interesting problems. There are similar
problems with just reading regular counters. For example, with the
problem in this series:
https://lore.kernel.org/lkml/20251113180517.44096-1-irogers@google.com/
that was reduced to just the remaining:
https://lore.kernel.org/lkml/20251118211326.1840989-1-irogers@google.com/
we could do a better bandwidth calculation if duration_time were read
along with the uncore counters. Perhaps we can have, say, a "wall-clock"
software counter (i.e. like cpu-clock and task-clock) to allow that and
allow the group of events to be read in one go as optimized here:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/evsel.c?h=perf-tools-next#n1910

So maybe there is potential for a read-group type optimization of tool-like
counters, to do something similar to what you are doing here.
Anyway, that's a different set of things to do and shouldn't inhibit
trying to get this series to land.

Thanks,
Ian


> Thanks,
> Ravi


* Re: [PATCH v4 00/11] perf sched: Introduce stats tool
  2025-12-09 21:03   ` Ian Rogers
  2025-12-12  3:43     ` Ravi Bangoria
@ 2025-12-16 10:09     ` Swapnil Sapkal
  2025-12-17 15:37       ` Namhyung Kim
  1 sibling, 1 reply; 7+ messages in thread
From: Swapnil Sapkal @ 2025-12-16 10:09 UTC (permalink / raw)
  To: Ian Rogers
  Cc: peterz, mingo, acme, namhyung, james.clark, ravi.bangoria,
	yu.c.chen, mark.rutland, alexander.shishkin, jolsa, rostedt,
	vincent.guittot, adrian.hunter, kan.liang, gautham.shenoy,
	kprateek.nayak, juri.lelli, yangjihong, void, tj, sshegde,
	linux-kernel, linux-perf-users, santosh.shukla, sandipan.das,
	Cristian Prundeanu

Hi Ian,

On 10-12-2025 02:33, Ian Rogers wrote:
> On Wed, Aug 27, 2025 at 9:43 PM Sapkal, Swapnil <swapnil.sapkal@amd.com> wrote:
>>
>> Hello all,
>>
>> Missed to add perf folks to the list. Adding them here. Sorry about that.
> 
> Hi Swapnil,
> 
> I was wondering if this patch series was active? The kernel test robot
> mentioned an issue.

The series is active. I have a fix for the kernel test robot issue. I will
be posting the next version in a week. I did not see any more review
comments on the series; hopefully this will be the final version.

--
Thanks and regards,
Swapnil




* Re: [PATCH v4 00/11] perf sched: Introduce stats tool
  2025-12-16 10:09     ` Swapnil Sapkal
@ 2025-12-17 15:37       ` Namhyung Kim
  2025-12-18  9:46         ` Swapnil Sapkal
  0 siblings, 1 reply; 7+ messages in thread
From: Namhyung Kim @ 2025-12-17 15:37 UTC (permalink / raw)
  To: Swapnil Sapkal
  Cc: Ian Rogers, peterz, mingo, acme, james.clark, ravi.bangoria,
	yu.c.chen, mark.rutland, alexander.shishkin, jolsa, rostedt,
	vincent.guittot, adrian.hunter, kan.liang, gautham.shenoy,
	kprateek.nayak, juri.lelli, yangjihong, void, tj, sshegde,
	linux-kernel, linux-perf-users, santosh.shukla, sandipan.das,
	Cristian Prundeanu

Hello,

On Tue, Dec 16, 2025 at 03:39:21PM +0530, Swapnil Sapkal wrote:
> Hi Ian,
> 
> On 10-12-2025 02:33, Ian Rogers wrote:
> > On Wed, Aug 27, 2025 at 9:43 PM Sapkal, Swapnil <swapnil.sapkal@amd.com> wrote:
> > > 
> > > Hello all,
> > > 
> > > Missed to add perf folks to the list. Adding them here. Sorry about that.
> > 
> > Hi Swapnil,
> > 
> > I was wondering if this patch series was active? The kernel test robot
> > mentioned an issue.
> 
> The series is active. I have a fix for the kernel test robot issue and will
> be posting the next version in a week. I did not see any more review
> comments on the series, so hopefully it will be the final version.

Sorry for the delay.  I'll try to review the series next week. ;-)

Thanks,
Namhyung


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 00/11] perf sched: Introduce stats tool
  2025-12-17 15:37       ` Namhyung Kim
@ 2025-12-18  9:46         ` Swapnil Sapkal
  0 siblings, 0 replies; 7+ messages in thread
From: Swapnil Sapkal @ 2025-12-18  9:46 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ian Rogers, peterz, mingo, acme, james.clark, ravi.bangoria,
	yu.c.chen, mark.rutland, alexander.shishkin, jolsa, rostedt,
	vincent.guittot, adrian.hunter, kan.liang, gautham.shenoy,
	kprateek.nayak, juri.lelli, yangjihong, void, tj, sshegde,
	linux-kernel, linux-perf-users, santosh.shukla, sandipan.das,
	Cristian Prundeanu

Hi Namhyung,

On 17-12-2025 21:07, Namhyung Kim wrote:
> Hello,
> 
> On Tue, Dec 16, 2025 at 03:39:21PM +0530, Swapnil Sapkal wrote:
>> Hi Ian,
>>
>> On 10-12-2025 02:33, Ian Rogers wrote:
>>> On Wed, Aug 27, 2025 at 9:43 PM Sapkal, Swapnil <swapnil.sapkal@amd.com> wrote:
>>>>
>>>> Hello all,
>>>>
>>>> Missed to add perf folks to the list. Adding them here. Sorry about that.
>>>
>>> Hi Swapnil,
>>>
>>> I was wondering if this patch series was active? The kernel test robot
>>> mentioned an issue.
>>
>> The series is active. I have a fix for the kernel test robot issue and will
>> be posting the next version in a week. I did not see any more review
>> comments on the series, so hopefully it will be the final version.
> 
> Sorry for the delay.  I'll try to review the series next week. ;-)
> 
No problem.

I will be taking time off next week. I will incorporate your review 
comments after I am back and post the new series.

--
Thanks and Regards,
Swapnil

> Thanks,
> Namhyung
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-12-18  9:46 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20250826051039.2626894-1-swapnil.sapkal@amd.com>
2025-08-28  4:43 ` [PATCH v4 00/11] perf sched: Introduce stats tool Sapkal, Swapnil
2025-12-09 21:03   ` Ian Rogers
2025-12-12  3:43     ` Ravi Bangoria
2025-12-12  5:11       ` Ian Rogers
2025-12-16 10:09     ` Swapnil Sapkal
2025-12-17 15:37       ` Namhyung Kim
2025-12-18  9:46         ` Swapnil Sapkal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox