From: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com>
To: Andi Kleen <andi@firstfloor.org>
Cc: jolsa@kernel.org, linux-kernel@vger.kernel.org
Subject: Re: Optimize perf stat for large number of events/cpus
Date: Wed, 27 Nov 2019 12:16:57 -0300
Message-ID: <20191127151657.GE22719@kernel.org>
In-Reply-To: <20191121001522.180827-1-andi@firstfloor.org>
On Wed, Nov 20, 2019 at 04:15:10PM -0800, Andi Kleen wrote:
> [v8: Address review feedback. Only changes one patch.]
>
> This patch kit optimizes perf stat for a large number of events
> on systems with many CPUs and PMUs.
>
> Profiling shows that most of the overhead comes from IPIs to
> all the target CPUs. We can optimize this by using sched_setaffinity
> to set the affinity to a target CPU once and then doing
> the perf operation for all events on that CPU. This requires
> some restructuring, but cuts the set up time quite a bit.
>
> In theory we could go further by parallelizing these setups
> too, but that would be much more complicated and for now just batching it
> per CPU seems to be sufficient. At some point with many more cores
> parallelization or a better bulk perf setup API might be needed though.
>
> In addition, perf does a lot of redundant /sys accesses with
> many PMUs, which can also be expensive. This is optimized as well.
>
> On a large test case (>700 events with many weak groups) on a 94 CPU
> system I go from
>
> real 0m8.607s
> user 0m0.550s
> sys 0m8.041s
>
> to
>
> real 0m3.269s
> user 0m0.760s
> sys 0m1.694s
>
> so shaving ~6 seconds of system time, at slightly more cost
> in perf stat itself. On a 4 socket system the savings
> are more dramatic:
>
> real 0m15.641s
> user 0m0.873s
> sys 0m14.729s
>
> to
>
> real 0m4.493s
> user 0m1.578s
> sys 0m2.444s
>
> so 11s difference in the user visible set up time.
Applied to my local perf/core branch, now undergoing test builds on all
the containers.
Thanks,
- Arnaldo
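
For readers following the thread, the per-CPU batching described in the quoted cover letter can be sketched roughly as below. This is only an illustration of the idea (pin the calling thread to one CPU, run the operation for every event there, move on to the next CPU, then restore the original affinity), not the code from the patch series. The names for_each_cpu_batched, event_op_fn, op_stats and count_op are hypothetical, and the callback merely counts invocations where perf would open, enable, read or close an event.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Hypothetical per-event operation; stands in for open/enable/read/close. */
typedef int (*event_op_fn)(int event_idx, int cpu, void *ctx);

/*
 * Run op for each of nr_events events on every CPU the calling thread is
 * allowed to use. The thread migrates once per CPU, so the work done for
 * that CPU runs locally instead of being requested remotely per event.
 * Returns 0 on success, non-zero on the first failure.
 */
static int for_each_cpu_batched(int nr_events, event_op_fn op, void *ctx)
{
	cpu_set_t saved, target;
	int cpu, ev, ret = 0;

	/* Remember the original affinity so it can be restored afterwards. */
	if (sched_getaffinity(0, sizeof(saved), &saved) < 0)
		return -1;

	for (cpu = 0; cpu < CPU_SETSIZE && ret == 0; cpu++) {
		if (!CPU_ISSET(cpu, &saved))
			continue;

		/* Pin to exactly one CPU; the kernel migrates us there. */
		CPU_ZERO(&target);
		CPU_SET(cpu, &target);
		if (sched_setaffinity(0, sizeof(target), &target) < 0) {
			ret = -1;
			break;
		}

		/* Batch: handle all events for this CPU before moving on. */
		for (ev = 0; ev < nr_events; ev++) {
			ret = op(ev, cpu, ctx);
			if (ret)
				break;
		}
	}

	/* Restore the original affinity even if an operation failed. */
	if (sched_setaffinity(0, sizeof(saved), &saved) < 0)
		ret = -1;

	return ret;
}

/* Example callback (assumption for illustration): only counts calls. */
struct op_stats {
	int calls;
};

static int count_op(int event_idx, int cpu, void *ctx)
{
	struct op_stats *st = ctx;

	(void)event_idx;
	(void)cpu;
	st->calls++;
	return 0;
}
```

Calling for_each_cpu_batched(nr_events, count_op, &stats) invokes the callback once per (CPU, event) pair with the CPU as the outer loop, which is the ordering the cover letter relies on to reduce the IPI cost.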