Re: Optimize perf stat for large number of events/cpus v2

From: Jiri Olsa <jolsa@redhat.com>
To: Andi Kleen <andi@firstfloor.org>
Cc: acme@kernel.org, linux-kernel@vger.kernel.org, jolsa@kernel.org,
	eranian@google.com, kan.liang@linux.intel.com,
	peterz@infradead.org
Subject: Re: Optimize perf stat for large number of events/cpus v2
Date: Tue, 22 Oct 2019 10:02:23 +0200
Message-ID: <20191022080223.GC28177@krava>
In-Reply-To: <20191020175202.32456-1-andi@firstfloor.org>

On Sun, Oct 20, 2019 at 10:51:53AM -0700, Andi Kleen wrote:
> [The earlier v1 version had a lot of conflicts against some
> recent libperf changes in tip/perf/core. Resolve that and
> also fix some minor issues.]
> 
> This patch kit optimizes perf stat for a large number of events 
> on systems with many CPUs and PMUs.
> 
> Profiling shows that most of the overhead comes from IPIs to
> all the target CPUs. We can optimize this by using sched_setaffinity
> to set the affinity to a target CPU once and then doing
> the perf operation for all events on that CPU. This requires
> some restructuring, but cuts the set up time quite a bit.
> 
> In theory we could go further by parallelizing these setups
> too, but that would be much more complicated and for now just batching it
> per CPU seems to be sufficient. At some point with many more cores 
> parallelization or a better bulk perf setup API might be needed though.
> 
> In addition, perf does a lot of redundant /sys accesses with
> many PMUs, which can also be expensive. This is optimized as well.
> 
> On a large test case (>700 events with many weak groups) on a 94 CPU
> system I go from
> 
> real	0m8.607s
> user	0m0.550s
> sys	0m8.041s
> 
> to 
> 
> real	0m3.269s
> user	0m0.760s
> sys	0m1.694s
> 
> so shaving ~6 seconds of system time, at slightly more cost
> in perf stat itself. On a 4 socket system the savings
> are more dramatic:
> 
> real	0m15.641s
> user	0m0.873s
> sys	0m14.729s
> 
> to 
> 
> real	0m4.493s
> user	0m1.578s
> sys	0m2.444s
> 
> so an 11s difference in the user-visible setup time.
> 
> Also available in 
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/stat-scale-4
> 
> v1: Initial post.
> v2: Rebase. Fix some minor issues.

looks really helpful, I ack-ed 1st 2 patches,
I'll need more time for the rest

thanks,
jirka
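
[Editorial sketch, not part of the patch kit.] The batching idea in the
cover letter (pin the calling thread to one target CPU, run the
operation for every event on that CPU, then move on) can be illustrated
roughly as below. This is a minimal Python sketch under stated
assumptions: group_by_cpu(), for_each_cpu_batched() and the op callback
are hypothetical names invented for illustration, and the perf code
itself is C and structured differently. Only os.sched_getaffinity() and
os.sched_setaffinity() are real (Linux-only) calls.

```python
import os
from collections import defaultdict


def group_by_cpu(pairs):
    """Group (event, cpu) pairs so that each CPU is visited exactly once."""
    by_cpu = defaultdict(list)
    for event, cpu in pairs:
        by_cpu[cpu].append(event)
    # Sort by CPU number so the iteration order is deterministic.
    return dict(sorted(by_cpu.items()))


def for_each_cpu_batched(pairs, op):
    """Run op(event, cpu) for every pair, switching affinity once per CPU.

    'op' is a hypothetical stand-in for a per-event operation such as
    open, enable, read or close. The original affinity mask is restored
    afterwards, even if op raises.
    """
    saved = os.sched_getaffinity(0)  # 0 means the calling thread
    results = []
    try:
        for cpu, events in group_by_cpu(pairs).items():
            # One affinity change per CPU instead of one per event.
            os.sched_setaffinity(0, {cpu})
            for event in events:
                results.append(op(event, cpu))
    finally:
        os.sched_setaffinity(0, saved)
    return results
```

With the thread already running on the target CPU, the per-event
operation should no longer need a cross-CPU interrupt, which is where
the cover letter says most of the time went. The save and restore
around the loop corresponds in spirit to patch 4/9 in the thread
overview below.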

Thread overview: 28+ messages
2019-10-20 17:51 Optimize perf stat for large number of events/cpus v2 Andi Kleen
2019-10-20 17:51 ` [PATCH v2 1/9] perf evsel: Always preserve errno while cleaning up perf_event_open failures Andi Kleen
2019-10-22  8:01   ` Jiri Olsa
2019-11-12 11:18   ` [tip: perf/core] " tip-bot2 for Andi Kleen
2019-10-20 17:51 ` [PATCH v2 2/9] perf evsel: Avoid close(-1) Andi Kleen
2019-10-22  8:01   ` Jiri Olsa
2019-11-12 11:18   ` [tip: perf/core] " tip-bot2 for Andi Kleen
2019-10-20 17:51 ` [PATCH v2 3/9] perf pmu: Use file system cache to optimize sysfs access Andi Kleen
2019-10-23  9:47   ` Jiri Olsa
2019-10-20 17:51 ` [PATCH v2 4/9] perf affinity: Add infrastructure to save/restore affinity Andi Kleen
2019-10-23  9:59   ` Jiri Olsa
2019-10-23 13:02     ` Andi Kleen
2019-10-23 14:30       ` Jiri Olsa
2019-10-23 14:52         ` Andi Kleen
2019-10-23 16:16           ` Alexey Budankov
2019-10-23 17:19             ` Andi Kleen
2019-10-23 18:08               ` Alexey Budankov
2019-10-23 22:37                 ` Andi Kleen
2019-10-24  8:46                   ` Alexey Budankov
2019-10-20 17:51 ` [PATCH v2 5/9] perf evsel: Add iterator to iterate over events ordered by CPU Andi Kleen
2019-10-20 17:51 ` [PATCH v2 6/9] perf stat: Use affinity for closing file descriptors Andi Kleen
2019-10-20 17:52 ` [PATCH v2 7/9] perf stat: Use affinity for opening events Andi Kleen
2019-10-20 17:52 ` [PATCH v2 8/9] perf stat: Use affinity for reading Andi Kleen
2019-10-20 17:52 ` [PATCH v2 9/9] perf stat: Use affinity for enabling/disabling events Andi Kleen
2019-10-23 10:30   ` Jiri Olsa
2019-10-23 13:07     ` Andi Kleen
2019-10-22  8:02 ` Jiri Olsa [this message]
2019-10-22 14:11   ` Optimize perf stat for large number of events/cpus v2 Arnaldo Carvalho de Melo
